Only create a new pin if no other previous version matches? #826

Open
venpopov opened this issue Apr 22, 2024 · 1 comment
Labels
feature a feature request or enhancement

Comments

venpopov commented Apr 22, 2024

I was surprised by this behavior:

```r
library(pins)

board <- board_temp(versioned = TRUE)

board |> pin_write(mtcars, "mtcars")
#> Guessing `type = 'rds'`
#> Creating new version '20240422T010932Z-a4a15'
#> Writing to pin 'mtcars'
Sys.sleep(1)
board |> pin_write(mtcars[1:5, ], "mtcars")
#> Guessing `type = 'rds'`
#> Creating new version '20240422T010933Z-45238'
#> Writing to pin 'mtcars'
Sys.sleep(1)
board |> pin_write(mtcars, "mtcars")
#> Guessing `type = 'rds'`
#> Creating new version '20240422T010934Z-a4a15'
#> Writing to pin 'mtcars'
Sys.sleep(1)
board |> pin_write(mtcars[1:5, ], "mtcars")
#> Guessing `type = 'rds'`
#> Creating new version '20240422T010935Z-45238'
#> Writing to pin 'mtcars'

board |> pin_versions("mtcars")
#> # A tibble: 4 × 3
#>   version                created             hash 
#>   <chr>                  <dttm>              <chr>
#> 1 20240422T010932Z-a4a15 2024-04-22 03:09:32 a4a15
#> 2 20240422T010933Z-45238 2024-04-22 03:09:33 45238
#> 3 20240422T010934Z-a4a15 2024-04-22 03:09:34 a4a15
#> 4 20240422T010935Z-45238 2024-04-22 03:09:35 45238
```

Created on 2024-04-22 with reprex v2.1.0

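In the third call, I write the same object as in the first, yet pins creates a new version with the same hash (a4a15) instead of reusing the existing one. Normally, if the object is identical to the most recent version, it is not written again; older versions, however, are not checked.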

I would like to use pins to track data objects produced by a research pipeline, in which I might switch branches to try out different features. With the current behavior, every rerun of the pipeline after switching branches saves a new copy of an object that is already stored, which is unnecessary file duplication.

To fix this (perhaps behind an option), pin_write() could compare the new hash against all previous versions, not only the latest one. To make sure pin_read() still returns the right object, it would then need to update the "created" field of the matching version (or perhaps set a new "reactivated" field?) so that this version is considered the most recent.
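The heart of the proposal is which stored versions the new hash is compared against. The base R sketch below contrasts the two lookups on a small table shaped like the `pin_versions()` output in the reprex above. `find_version_by_hash()` and its `latest_only` argument are hypothetical illustrations, not part of pins; for simplicity it compares the short `hash` column, whereas a real check would presumably use the full hash stored in each version's metadata. It also does not attempt the "created"/"reactivated" update.

```r
# Hypothetical helper, not part of pins: look up a hash in a table shaped
# like the output of pin_versions() (columns version, created, hash).
find_version_by_hash <- function(versions, new_hash, latest_only = FALSE) {
  if (latest_only) {
    # Current behaviour: compare only against the most recent version
    versions <- versions[which.max(versions$created), , drop = FALSE]
  }
  # Proposed behaviour: compare against every stored version
  matches <- versions[versions$hash == new_hash, , drop = FALSE]
  if (nrow(matches) == 0) {
    return(NA_character_)  # no match: a new version would be written
  }
  # If several versions share the hash, pick the most recently created one
  matches$version[which.max(matches$created)]
}

# The first two versions from the reprex
versions <- data.frame(
  version = c("20240422T010932Z-a4a15", "20240422T010933Z-45238"),
  created = as.POSIXct(c("2024-04-22 03:09:32", "2024-04-22 03:09:33")),
  hash = c("a4a15", "45238"),
  stringsAsFactors = FALSE
)

# Writing mtcars again (hash a4a15) as the third step:
find_version_by_hash(versions, "a4a15", latest_only = TRUE)
#> [1] NA
find_version_by_hash(versions, "a4a15")
#> [1] "20240422T010932Z-a4a15"
```

With the latest-only lookup there is no match, so a duplicate version is written; with the all-versions lookup, the existing version could be reused instead, provided pins then treats it as the most recent for `pin_read()`.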

@juliasilge (Member)

Thanks for this feedback @venpopov!

Let's use this issue to collect thoughts on this type of change. There isn't a great workaround right now for your particular use case, because it is hard for users to check the hash themselves before writing: pins:::pin_hash() is unexported, and it takes file paths as its argument rather than an R object, so it isn't easy for a user to call.
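One rough user-side workaround that avoids `pins:::pin_hash()` entirely is to read back each stored version with the exported `pin_read()` and compare it to the new object with `identical()`. This is a sketch under stated assumptions: `pin_write_dedup()` is a hypothetical name, it assumes `identical()` is a meaningful equality for the objects being pinned, and it downloads and deserializes every version, so it only makes sense for small pins. It also does not solve the `pin_read()` ordering problem: a reused older version is not the latest, so the caller must read it back by version.

```r
# Hypothetical workaround, not part of pins: before writing, look for a
# stored version whose contents are identical() to `x`.
# Returns the matching version string, or NA if a new version was written.
pin_write_dedup <- function(board, x, name, ...) {
  if (pins::pin_exists(board, name)) {
    for (v in pins::pin_versions(board, name)$version) {
      # Reads every version in turn: acceptable for small pins, slow for big ones
      old <- pins::pin_read(board, name, version = v)
      if (identical(old, x)) {
        return(v)  # reuse this version; nothing new is written
      }
    }
  }
  # No stored version matches, so write as usual (extra args go to pin_write)
  pins::pin_write(board, x, name, ...)
  NA_character_
}
```

A pipeline step would then call `pins::pin_read(board, name, version = v)` when `v` is not `NA`, and a plain `pin_read()` otherwise, since only a freshly written version is guaranteed to be the latest.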

@juliasilge juliasilge added the feature a feature request or enhancement label Apr 22, 2024