Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for resource argument w/ improved support for spatial data inputs #8

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

elipousson
Copy link

Following up on our exchange @alenastern (on this PR #1) about adding support for FeatureLayer and other input resource types, I went ahead and put something together that adds support for URLs for spatial or tabular data or FeatureLayers or Tables.

I also added support for the following objects or inputs—all of which are converted to a tabular format and written to a CSV file before they are used with the api.

  • File paths for spatial data
  • sf or sfc input objects
  • data frame with coordinate columns

I also included some further documentation clean-up (reducing duplication, adding missing titles, expanding descriptions, adding URLs to the DESCRIPTION).

One change that didn't get incorporated in my prior PR that I kept in this one: swapping .onLoad for .onAttach. I think the approach here is more consistent with best practices but, if there is a reason you prefer the current approach, I'd be curious to learn.

I know this is a big pull request but I'm hoping the changes are all welcome. As the modified example in the README illustrates, I think the new resource argument definitely simplifies the syntax when using this package in a helpful way.

Also add sedt_url helper function
- Refactor to use sedt_url helper function
- Add check_installed to check for optional packages
- Fix links and escaping for square brackets in documentation
- Add package docs
- Add prefixing for non-imported functions
Move optional packages to Suggests and add rlang to Imports
Add UrbanInstitute/urbnthemes to Remotes
Add Eli Pousson to contributors
... must appear first + .envir must be passed on to `stringr::str_glue`
Also minor adjustment to README headings + add withr to create temporary directory for file download
Support uses the new `prep_sedt_resource()` helper function and adds `sf` and `arcgislayers` to Suggests

Also clean-up inheritance for parameter definitions to reduce duplicative documentation

Also make acs_data_year settable with an option
Merge changes from main onto feature branch
Also drop withr from Suggests (since it isn't used in the README anymore)

Also add URL + BugReports URL to DESCRIPTION
Prefix not needed if full package is imported
Also expand description and address excess line lengths in code/docs
Otherwise use `utils::write.csv()`

Also make check_sedt_resource return NULL invisibly consistently
geo,
acs_data_year)
)
acs_data_year = getOption("sedtR.year", 2021),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the logic of this code to allow a user to set a global year option using sedtR.year (that would be different from 2021) so that they would, for example, not have to keep entering 2019 as the year argument if they wanted to use 2019 data for all of their function calls in a given session?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. The main advantage would be for a user who is making multiple calls to the API and wants to set a default year at the outset.

)

stopifnot(acs_data_year %in% c("2019", "2021", "2022"))
stopifnot(geo %in% c("city", "county", "state", "national"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the match_acs_data_year() and match_geo() functions perform the same check of the acs_data_year and geo arguments for valid values using arg_match0 instead of stopifnot. I wasn't totally clear on why this is a preferable approach?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rlang::arg_match0() gives you a much more information error message. For example, if you pass 2018 (or "2018") for validation. This is what you get with arg_match (or arg_match0()):

Error:
! var must be one of "2019", "2021", or "2022", not
"2018".
ℹ Did you mean "2019"?

stopifnot() gives you:

Error: var %in% as.character(c(2019, 2021, 2022)) is not TRUE

Moving validation into a helper function also reduces the number of places you'd need to change the vector of supported years if you validating an input year in more than one location.

#'
#' @noRd
#' @importFrom purrr list_cbind
convert_to_coords <- function(data,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a concern about this added functionality leading to non-point spatial data being used/interpreted improperly in the tool. For example, if someone uploaded a dataset of parks with multipolygon geometries, a park that spans multiple census tracts would be assigned a single point via the st_centroid or st_point_on_surface function. On the back-end, our tool would then "attribute" the park only to the census tract that the single point falls into for purposes of the disparity calculations. That wouldn't be quite correct, since in reality the park falls in multiple census tracts (and you'd likely want to weight the assignment by area in each tract).

We've heard from other folks about the desire to use non-point spatial data with the tool and we're about to pilot some use cases with one of our partners to determine where/how to use these data properly. Until we have a "best practice" that we're comfortable implementing as a default in sedtR, we'd rather individuals have to make the choice about how to preprocess their data to convert to points before using sedtR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally fair. I think the scenario I had in mind was a case when you are trying to count parks by county or parcels by neighborhood but you could always supply an example showing how to use sf::st_centroid() to transform data in advance of passing it to the API. If you want to revisit this in the future, I can imagine including some warning similar to the default warning for sf::st_centroid() ("st_centroid assumes attributes are constant over geometries") or some check that compares the input geometry of the resource to the comparison geometry. Not sure if that is possible or desirable though. I'll plan to revise the PR to pull this out.

"https://equity-tool-api.urban.org",
envir = parent.env(environment()))
.onAttach <- function(libname, pkgname) {
stage <- "production"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing the best practice for using .onAttach for startup messages - we agree that the approach you've implemented comports with the best practice

return(invisible(NULL))
}

not_message <- "{.obj_type_friendly {resource}}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite understand this syntax - do you mind explaining how it works?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"obj_type_friendly" is part of the inline syntax supported by the cli package: https://cli.r-lib.org/reference/inline-markup.html

Jenny Bryan has been a convincing proponent of the "never nester" approach to code structure (I think this video is where I first encountered the term) so setting a default value for not_message avoids the need for an else in the following conditional.

)
acs_data_year = getOption("sedtR.year", 2021),
placement = "surface",
...,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't quite sure of the rationale for adding the ... to this function - do you mind explaining how you'd envision it being used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those dots get passed from call_upload_user_files() to prep_sedt_resource() and then to either arcgislayers::arc_read() or sf::read_sf(). Basically, it is a way of exposing parameters like token (for accessing private FeatureLayers) or query (for subsetting a file or database) that would only be used if a user is supplying an ESRI FeatureService URL or a file path. Most users probably wouldn't need this option but it could be really helpful if someone needs it.

The WIP book on Tidy design principles has more details on the uses of ... arguments if it is a helpful reference: https://design.tidyverse.org/dots-after-required.html

fileext = "csv",
placement = "surface",
...,
arg = caller_arg(resource),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wasn't quite sure of the logic for adding ... and setting file = NULL as an argument in this function. It seemed to me that we'd envision this as an internal function that's only called by other functions in the package, in which case it appears that the file argument is never set to something besides the default. Were you seeing a use case for users calling this function directly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above for my thoughts on passing ... to this function.

Giving file a NULL default value allows a user to set resource as a URL where a file is located and pass the filename to use for the downloaded file as a separate parameter. A user could pass a GeoPackage file as the resource, e.g. "parks.gpkg", and then user the file argument to set the path and file name of the converted data, e.g. "path_to_converted_file/sedtR_parks.csv". I can't recall if I included any documentation for this feature and it may not be essential.

@alenastern
Copy link
Collaborator

@elipousson thank you SO MUCH for this detailed PR and apologies for how long it's taken us to get back to you! We reviewed the changes (and learned a lot about package development along the way) and made a couple of comments throughout the PR. Most of the comments are just us trying to better understand the logic of some of the changes. The one substantive change we want to make to this code is to remove the functionality to accept non-point spatial data. I provide more explanation in the in-line comment. It's something we'd love to add more support for eventually, but don't think we're there yet. We'd love for you to be part of the conversations about what that functionality looks like.

the other thing we want to do before merging the PR is to write tests for the new functionality. If you were comfortable giving us push access to your fork, we'd be happy to open a PR into your branch with the proposed changes. alternatively, we could create a new branch in our repo that you could merge your PR into instead of main, and we could do the edits/testing there before merging into main.

please let us know how you'd prefer to proceed!

we'd also still love to find a time to chat on the phone to better understand how you're interested in using sedtR and to share some of our upcoming development plans with you! I can follow up over email to set up a time!

@elipousson
Copy link
Author

@alenastern No apologies needed at all! I still owe you and your colleagues a reply to your earlier email regarding our interest in using this package for our work in Baltimore City. I'll follow up and hope we can connect soon.

I went ahead and responded to your comments briefly to explain my rationale for the changes but please so let me know if anything is unclear or if you have something different in mind. I'm partial to the tidyverse code style for package development but I know that there isn't anything like one "right" way to do any of this.

I have "Allow edits by maintainers" checked already and I'm happy for you and your colleagues to make changes as needed. I could do a first pass to make sure POINT geometry input is always required and add some initial tests then you can revise and add as needed. If a new PR for a feature branch is better than a PR for main, that also works – whatever works for you really.

@elipousson
Copy link
Author

@alenastern There are still a few changes I want to make based on our discussion the other day (especially getting tests working) but I went ahead and made the changes to allow non-POINT geometry. The helper function didn't change too much since I wanted to maintain support for converting POINT geometry into a coordinate data frame output.

If you have a chance, could you create a new development branch and I (or you) can modify the PR so it will merge into that branch instead of main?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants