Add support for `resource` argument w/ improved support for spatial data inputs #8

elipousson · 2024-04-25T02:59:31Z

Following up on our exchange @alenastern (on this PR #1) about adding support for FeatureLayer and other input resource types, I went ahead and put something together that adds support for URLs for spatial or tabular data or FeatureLayers or Tables.

I also added support for the following inputs, all of which are converted to a tabular format and written to a CSV file before they are used with the API:

- File paths for spatial data
- sf or sfc input objects
- Data frames with coordinate columns

I also included some further documentation clean-up (reducing duplication, adding missing titles, expanding descriptions, adding URLs to the DESCRIPTION).

One change that didn't get incorporated in my prior PR that I kept in this one: swapping .onLoad for .onAttach. I think the approach here is more consistent with best practices but, if there is a reason you prefer the current approach, I'd be curious to learn.

I know this is a big pull request but I'm hoping the changes are all welcome. As the modified example in the README illustrates, I think the new resource argument definitely simplifies the syntax when using this package in a helpful way.


alenastern · 2024-06-06T20:07:23Z

R/call_upload_user_files.R

- geo,
- acs_data_year)
- )
+ acs_data_year = getOption("sedtR.year", 2021),


Is the logic of this code to allow a user to set a global year option using sedtR.year (that would be different from 2021) so that they would, for example, not have to keep entering 2019 as the year argument if they wanted to use 2019 data for all of their function calls in a given session?

Exactly. The main advantage would be for a user who is making multiple calls to the API and wants to set a default year at the outset.
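To make that concrete, here is a minimal sketch of the option-backed default. `get_acs_year()` is a hypothetical stand-in for the real function signature; only the `getOption("sedtR.year", 2021)` default comes from the diff above.

```r
# Hypothetical stand-in for a function whose default year is read from an option.
get_acs_year <- function(acs_data_year = getOption("sedtR.year", 2021)) {
  as.character(acs_data_year)
}

# With the option unset, the fallback value (2021) is used.
get_acs_year()

# Set a session-wide default once; options() returns the previous values.
old <- options(sedtR.year = 2019)
get_acs_year()

# An explicit argument still takes precedence over the option.
get_acs_year(2022)

# Restore the previous state.
options(old)
```

Because the default is evaluated lazily at call time, changing the option mid-session affects every later call that does not pass `acs_data_year` explicitly.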

alenastern · 2024-06-06T20:08:50Z

R/call_upload_user_files.R

 )

- stopifnot(acs_data_year %in% c("2019", "2021", "2022"))
- stopifnot(geo %in% c("city", "county", "state", "national"))


It looks like the match_acs_data_year() and match_geo() functions perform the same check of the acs_data_year and geo arguments for valid values using arg_match0 instead of stopifnot. I wasn't totally clear on why this is a preferable approach?

rlang::arg_match0() gives you a much more informative error message. For example, if you pass 2018 (or "2018") for validation, this is what you get with arg_match() (or arg_match0()):

Error:
! var must be one of "2019", "2021", or "2022", not "2018".
ℹ Did you mean "2019"?

stopifnot() gives you:

Error: var %in% as.character(c(2019, 2021, 2022)) is not TRUE

Moving validation into a helper function also reduces the number of places you'd need to change the vector of supported years if you are validating an input year in more than one location.
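For reference, a rough sketch of what such a helper could look like. This is not the exact code in the PR: the argument names `values`, `error_arg`, and `error_call` are assumptions for illustration, and it needs the rlang package installed.

```r
# Sketch of a validation helper built on rlang::arg_match0().
# The supported years live in one place, so adding a year means one edit.
match_acs_data_year <- function(acs_data_year,
                                values = c("2019", "2021", "2022"),
                                error_arg = rlang::caller_arg(acs_data_year),
                                error_call = rlang::caller_env()) {
  # arg_match0() expects a single string, so coerce numeric input first.
  rlang::arg_match0(
    as.character(acs_data_year),
    values = values,
    arg_nm = error_arg,
    error_call = error_call
  )
}

# Valid input is returned as a string.
match_acs_data_year(2021)

# Invalid input raises a classed error that lists the allowed values.
result <- tryCatch(
  match_acs_data_year("2018"),
  error = function(e) conditionMessage(e)
)
```

Passing `error_call` through means the error is reported against the user-facing function rather than the internal helper.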

alenastern · 2024-06-06T20:15:14Z

R/utils.R

+#'
+#' @noRd
+#' @importFrom purrr list_cbind
+convert_to_coords <- function(data,


We had a concern about this added functionality leading to non-point spatial data being used/interpreted improperly in the tool. For example, if someone uploaded a dataset of parks with multipolygon geometries, a park that spans multiple census tracts would be assigned a single point via the st_centroid or st_point_on_surface function. On the back-end, our tool would then "attribute" the park only to the census tract that the single point falls into for purposes of the disparity calculations. That wouldn't be quite correct, since in reality the park falls in multiple census tracts (and you'd likely want to weight the assignment by area in each tract).

We've heard from other folks about the desire to use non-point spatial data with the tool and we're about to pilot some use cases with one of our partners to determine where/how to use these data properly. Until we have a "best practice" that we're comfortable implementing as a default in sedtR, we'd rather individuals have to make the choice about how to preprocess their data to convert to points before using sedtR.

Totally fair. I think the scenario I had in mind was a case when you are trying to count parks by county or parcels by neighborhood but you could always supply an example showing how to use sf::st_centroid() to transform data in advance of passing it to the API. If you want to revisit this in the future, I can imagine including some warning similar to the default warning for sf::st_centroid() ("st_centroid assumes attributes are constant over geometries") or some check that compares the input geometry of the resource to the comparison geometry. Not sure if that is possible or desirable though. I'll plan to revise the PR to pull this out.
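An example of that kind of preprocessing might look something like the sketch below. It uses the North Carolina counties file that ships with sf as stand-in polygon data; the projected CRS (EPSG:32119) and the output column names `lon` and `lat` are assumptions chosen for illustration, not anything sedtR requires.

```r
library(sf)

# Stand-in polygon data: the North Carolina counties shipped with sf.
polygons <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Compute centroids in a projected CRS, then return to longitude/latitude.
# st_centroid() warns that attributes are assumed constant over geometries,
# which is exactly the caveat discussed above; it is silenced here on purpose.
projected <- st_transform(polygons, 32119)
points <- suppressWarnings(st_centroid(projected))
points <- st_transform(points, 4326)

# Flatten to a plain data frame with coordinate columns.
coords <- st_coordinates(points)
points_df <- data.frame(
  name = points$NAME,
  lon = coords[, "X"],
  lat = coords[, "Y"]
)

head(points_df)
```

Note that a centroid can fall outside an irregular polygon; `sf::st_point_on_surface()` is the alternative when the point must lie within the original shape.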

alenastern · 2024-06-06T20:16:01Z

R/zzz.R

- "https://equity-tool-api.urban.org",
- envir = parent.env(environment()))
+.onAttach <- function(libname, pkgname) {
+ stage <- "production"


Thanks for sharing the best practice for using .onAttach for startup messages. We agree that the approach you've implemented comports with the best practice.
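For anyone reading later, the general convention is that `.onLoad()` handles setup that should happen whenever the namespace is loaded, while `.onAttach()` is reserved for startup messages shown on `library()`. A minimal sketch follows; the option name `mypkg.stage` and the message text are made up for illustration and are not what R/zzz.R contains.

```r
# Runs whenever the namespace is loaded, including via pkg::fun().
# Suitable for quiet setup such as default options.
.onLoad <- function(libname, pkgname) {
  if (is.null(getOption("mypkg.stage"))) {
    options(mypkg.stage = "production")
  }
  invisible(NULL)
}

# Runs only when the package is attached with library().
# Messages go through packageStartupMessage() so users can silence them
# with suppressPackageStartupMessages().
.onAttach <- function(libname, pkgname) {
  stage <- getOption("mypkg.stage", "production")
  packageStartupMessage("Using the ", stage, " version of the API.")
}
```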

alenastern · 2024-06-06T20:16:50Z

R/prep_sedt_resource.R

+ return(invisible(NULL))
+ }
+
+ not_message <- "{.obj_type_friendly {resource}}"


I didn't quite understand this syntax - do you mind explaining how it works?

"obj_type_friendly" is part of the inline syntax supported by the cli package: https://cli.r-lib.org/reference/inline-markup.html

Jenny Bryan has been a convincing proponent of the "never nester" approach to code structure (I think this video is where I first encountered the term) so setting a default value for not_message avoids the need for an else in the following conditional.
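Here is a small sketch of both ideas together. `check_resource_type()` is a hypothetical checker, not the function in R/prep_sedt_resource.R, and it assumes the cli and rlang packages are installed.

```r
# Hypothetical input check showing cli inline markup and the "no else" pattern.
check_resource_type <- function(resource,
                                arg = rlang::caller_arg(resource),
                                call = rlang::caller_env()) {
  # Return early for valid input so the rest of the body needs no nesting.
  if (is.character(resource) || is.data.frame(resource)) {
    return(invisible(NULL))
  }

  # Default description of the bad input; cli fills this in when the
  # message is rendered, e.g. with a friendly name for the object's type.
  not_message <- "{.obj_type_friendly {resource}}"

  # Override the default only for the special case, so no else is needed.
  if (is.null(resource)) {
    not_message <- "{.code NULL}"
  }

  # The template is assembled first, then interpolated by cli_abort()
  # in this function's environment, where arg and resource are defined.
  message <- paste0(
    "{.arg {arg}} must be a string or a data frame, not ", not_message, "."
  )
  cli::cli_abort(message, call = call)
}

# Valid input passes silently.
check_resource_type("https://example.com/data.csv")

# Invalid input produces a formatted error.
err <- tryCatch(check_resource_type(42), error = function(e) e)
```

The template string is pasted together before the call because cli interpolates the message it is given once; it does not re-interpolate markup that arrives inside a substituted value.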

alenastern · 2024-06-06T20:19:31Z

R/call_upload_user_files.R

- )
+ acs_data_year = getOption("sedtR.year", 2021),
+ placement = "surface",
+ ...,


I wasn't quite sure of the rationale for adding the ... to this function - do you mind explaining how you'd envision it being used?

Those dots get passed from call_upload_user_files() to prep_sedt_resource() and then to either arcgislayers::arc_read() or sf::read_sf(). Basically, it is a way of exposing parameters like token (for accessing private FeatureLayers) or query (for subsetting a file or database) that would only be used if a user is supplying an ESRI FeatureService URL or a file path. Most users probably wouldn't need this option but it could be really helpful if someone needs it.

The WIP book on Tidy design principles has more details on the uses of ... arguments if it is a helpful reference: https://design.tidyverse.org/dots-after-required.html
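A stripped-down illustration of the forwarding, using base R only. The function names `upload_resource()` and `prep_resource()` are hypothetical, and `utils::read.csv()` stands in for arcgislayers::arc_read() or sf::read_sf() so the sketch runs without optional packages.

```r
# Innermost step: whatever arrives in ... is handed to the reader untouched.
prep_resource <- function(resource, ...) {
  utils::read.csv(resource, ...)
}

# User-facing function: required arguments first, then dots, then details.
upload_resource <- function(resource, ..., geo = "city") {
  data <- prep_resource(resource, ...)
  list(geo = geo, data = data)
}

# Write a small CSV to a temporary file to stand in for a real resource.
path <- tempfile(fileext = ".csv")
utils::write.csv(
  data.frame(id = 1:5, lon = -76.6, lat = 39.3),
  path,
  row.names = FALSE
)

# nrows is not an argument of upload_resource(); it travels through the
# dots and is only interpreted by utils::read.csv().
result <- upload_resource(path, nrows = 2)
nrow(result$data)
```

Because `geo` comes after the dots, it has to be named in full, which keeps reader-specific arguments from being matched to it by accident.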

alenastern · 2024-06-06T20:22:56Z

R/prep_sedt_resource.R

+ fileext = "csv",
+ placement = "surface",
+ ...,
+ arg = caller_arg(resource),


I also wasn't quite sure of the logic for adding ... and setting file = NULL as an argument in this function. It seemed to me that we'd envision this as an internal function that's only called by other functions in the package, in which case it appears that the file argument is never set to something besides the default. Were you seeing a use case for users calling this function directly?

See above for my thoughts on passing ... to this function.

Giving file a NULL default value allows a user to set resource as a URL where a file is located and pass the filename to use for the downloaded file as a separate parameter. A user could pass a GeoPackage file as the resource, e.g. "parks.gpkg", and then use the file argument to set the path and file name of the converted data, e.g. "path_to_converted_file/sedtR_parks.csv". I can't recall if I included any documentation for this feature and it may not be essential.
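Roughly, the pattern is the one sketched below. `convert_resource()` is a hypothetical simplification that takes a data frame rather than a URL or file path, so it can run without sf.

```r
# Hypothetical converter: write tabular data to CSV and return the path.
convert_resource <- function(resource, file = NULL) {
  # A NULL default means "caller does not care where the file goes",
  # so fall back to a temporary file.
  if (is.null(file)) {
    file <- tempfile(fileext = ".csv")
  }
  utils::write.csv(resource, file, row.names = FALSE)
  file
}

parks <- data.frame(name = c("A", "B"), lon = c(-76.6, -76.5), lat = c(39.3, 39.2))

# Default: the converted data lands in a temporary file.
tmp_path <- convert_resource(parks)

# Explicit: the caller decides where the converted data is kept.
out_path <- file.path(tempdir(), "sedtR_parks.csv")
kept_path <- convert_resource(parks, file = out_path)
```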

alenastern · 2024-06-06T20:28:08Z

@elipousson thank you SO MUCH for this detailed PR and apologies for how long it's taken us to get back to you! We reviewed the changes (and learned a lot about package development along the way) and made a couple of comments throughout the PR. Most of the comments are just us trying to better understand the logic of some of the changes. The one substantive change we want to make to this code is to remove the functionality to accept non-point spatial data. I provide more explanation in the in-line comment. It's something we'd love to add more support for eventually, but don't think we're there yet. We'd love for you to be part of the conversations about what that functionality looks like.

The other thing we want to do before merging the PR is to write tests for the new functionality. If you are comfortable giving us push access to your fork, we'd be happy to open a PR into your branch with the proposed changes. Alternatively, we could create a new branch in our repo that you could merge your PR into instead of main, and we could do the edits/testing there before merging into main.

Please let us know how you'd prefer to proceed!

We'd also still love to find a time to chat on the phone to better understand how you're interested in using sedtR and to share some of our upcoming development plans with you! I can follow up over email to set up a time!

elipousson · 2024-06-07T04:24:16Z

@alenastern No apologies needed at all! I still owe you and your colleagues a reply to your earlier email regarding our interest in using this package for our work in Baltimore City. I'll follow up and hope we can connect soon.

I went ahead and responded to your comments briefly to explain my rationale for the changes but please do let me know if anything is unclear or if you have something different in mind. I'm partial to the tidyverse code style for package development but I know that there isn't anything like one "right" way to do any of this.

I have "Allow edits by maintainers" checked already and I'm happy for you and your colleagues to make changes as needed. I could do a first pass to make sure POINT geometry input is always required and add some initial tests then you can revise and add as needed. If a new PR for a feature branch is better than a PR for main, that also works – whatever works for you really.

elipousson · 2024-06-26T03:09:49Z

@alenastern There are still a few changes I want to make based on our discussion the other day (especially getting tests working) but I went ahead and made the changes to disallow non-POINT geometry. The helper function didn't change too much since I wanted to maintain support for converting POINT geometry into a coordinate data frame output.

If you have a chance, could you create a new development branch and I (or you) can modify the PR so it will merge into that branch instead of main?

elipousson added 28 commits March 7, 2024 14:29

Fix typo in package prefix

2c1a120

Swap .onAttach for .onLoad

d9601dd

Also add sedt_url helper function

Address issues w/ package check

f642c18

- Refactor to use sedt_url helper function - Add check_installed to check for optional packages - Fix links and escaping for square brackets in documentation - Add package docs - Add prefixing for non-imported functions

Add .Rbuildignore and .gitignore

c3cd63f

Tidy DESCRIPTION + fix issue w/ urbnthemes Import

155c9b3

Move optional packages to Suggests and add rlang to Imports Add UrbanInstitute/urbnthemes to Remotes Add Eli Pousson to contributors

Correct issue w/ sedt_url helper

0bc6ed8

... must appear first + .envir must be passed on to `stringr::str_glue`

Make create_map interactive if session is interactive

df12054

Add map to README

6968554

Also minor adjustment to README headings + add withr to create temporary directory for file download

Merge branch 'main' of https://github.com/elipousson/sedtR

56bb274

Merge branch 'UrbanInstitute:main' into main

6c5d6b6

import rlang

b58f8d6

Make encoding explicit to avoid warnings

fb9d617

Add support for multi-format resource inputs

07402cc

Support uses the new `prep_sedt_resource()` helper function and adds `sf` and `arcgislayers` to Suggests Also clean-up inheritance for parameter definitions to reduce duplicative documentation Also make acs_data_year settable with an option

Update package docs

bdb3d6a

Merge pull request #1 from elipousson/main

5eac862

Merge changes from main onto feature branch

Fill in missing placement argument

10d4108

Revise README to use new resource argument for call_sedt_api

8d46dd5

Also drop withr from Suggests (since it isn't used in the README anymore) Also add URL + BugReports URL to DESCRIPTION

Revert changes to README for consistency w/ main

85f35f1

Remove prefixing for rlang

7217a48

Prefix not needed if full package is imported

Import download.file from utils

de5d8bd

Add stage variable for consistency

d55626d

Clean-up excess line length

de6ec66

Update NEWS

9616906

Fix default value for interactive in create_map()

75261a3

Also expand description and address excess line lengths in code/docs

Remove figure (no longer created by README)

26a0772

Only call readr function if installed

54eec65

Otherwise use `utils::write.csv()` Also make check_sedt_resource return NULL invisibly consistently

Add support for resource arg to call_upload_user_files_no_stop helper function

1b5307d

Add helpers for input checks

7dc7e5b

alenastern reviewed Jun 6, 2024


Limit resource input to POINT geometry

2b7461e
