Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement tar_sf_* family of targets #13

Open
njtierney opened this issue Mar 12, 2024 · 7 comments
Open

Implement tar_sf_* family of targets #13

njtierney opened this issue Mar 12, 2024 · 7 comments
Labels
new package suggestion of a new spatial package to support priority: medium

Comments

@njtierney
Copy link
Owner

Related to #4

We need to list the datatypes from sf that we want to support

@brownag
Copy link
Contributor

brownag commented Mar 12, 2024

Some ideas of what could be implemented for object types in {sf}

  • tar_sf() - create a target for a sf data.frame
  • tar_sf_s2() - create a target for a geodesic s2 geography (could just be tar_s2()/direct from {s2}; special case of wk_rcrd)
  • tar_sf_sfc() - create a target for a simple feature collection (homogenous collection)
  • tar_sf_sfg() - create a target for a simple feature geometry collection (heterogeneous collection)

For vector data, filetypes supported by GDAL for read/write could be used as the format for the file in the target store (including Parquet/Arrow), or you could use {arrow} per discussion in #2. Using arrow is probably a great default behavior for vector data due to efficiency/small size. Since sf objects in memory do not have issue with having an internal pointer, they can be serialized to normal 'rds' format files as well.

In the case of sfc/sfg only one column is being managed, but storage formats would likely be similar to what would be used for tar_sf() and the read/write method would just return the column of interest.

For other classes in {sf} like bbox/crs I don't think there is much benefit to having specific methods for these, they are simple enough to store with tar_target() in an ordinary "rds" format.

@defuneste
Copy link

Hello! Thanks for working on that this is great!

A quick idea/question:

sf is using two different engine (geos, s2) and use a setting function (sf_use_s2) to affect that. What could be a desired design for dealing with it in a "target" mindset? Is it something that should be set for a specific target or set at the workflow (or it depends)?

@Aariq
Copy link
Collaborator

Aariq commented Mar 22, 2024

sf is using two different engine (geos, s2) and use a setting function (sf_use_s2) to affect that. What could be a desired design for dealing with it in a "target" mindset? Is it something that should be set for a specific target or set at the workflow (or it depends)?

I'm currently including sf_use_s2() inside target functions when I need to, but I'm relatively new to using sf. Is it common to need to set sf_use_s2() for an entire workflow? If so, I can imagine this being an option in tar_sf*() that could be optionally set for the whole workflow.

@brownag
Copy link
Contributor

brownag commented Mar 22, 2024

In my work I have had cases where I set sf_use_s2(FALSE) for an entire workflow. This is often because I just don't "need" s2 and dont want to add extra boilerplate to fix geometries. I have seen many instances where geometries being operated on may be "invalid" from the perspective of s2, but would at least be functional (even if technically invalid) using geos backend.

Errors like this come up a lot with invalid geometry

Error in wk_handle.wk_wkb(wkb, s2_geography_writer(oriented = oriented,  : 
  Loop 1 is not valid: Edge 0 crosses edge 14

These things are usually fixable by sf::st_make_valid() before the operation that generates above error.

For geotargets, I think we can't assume the user is going to always be working with valid geometries, nor do we want to "fix" them for them. I think there should be a way to set/get the sf option "sf_use_s2" for a whole workflow as the results will subtly differ depending on the engine used

@Aariq
Copy link
Collaborator

Aariq commented Mar 22, 2024

Makes sense. So we'll add use_s2 to the list of options that can be set for an entire pipeline (by geotargets_options_set()) and in individual tar_sf_*() targets.

@defuneste
Copy link

I am running in the same "issue" describing by @brownag (s2 is a bit more "difficult"). If my understanding of how sf_use_s2() is correct even inside a function it will change how sf behave in an R session (and that can be tricky for non aware users).

 library("sf")
sf_use_s2()
#[1] TRUE
jim <- function() sf_use_s2(FALSE)
jim()
#Spherical geometry (s2) switched off
 sf_use_s2()
#[1] FALSE

I think adding in the options_set is a good idea.

The topic of invalid geometry is an important one that also probably deserve it specific issue (it will also depend on GEOS version).

@Aariq Aariq added the new package suggestion of a new spatial package to support label Mar 28, 2024
@Aariq
Copy link
Collaborator

Aariq commented Jun 6, 2024

Since each target is run in a separate clean R session, it shouldn't be a problem to include sf_use_s2() at the top of a function used in a single target—that shouldn't effect the behavior of other targets. For the same reason though, it could be difficult to change the default for an entire workflow using sf_use_s2() and makes sense to consider an option that gets passed to each target.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new package suggestion of a new spatial package to support priority: medium
Projects
None yet
Development

No branches or pull requests

4 participants