-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for resource
argument w/ improved support for spatial data inputs
#8
base: main
Are you sure you want to change the base?
Add support for resource
argument w/ improved support for spatial data inputs
#8
Conversation
Also add sedt_url helper function
- Refactor to use sedt_url helper function - Add check_installed to check for optional packages - Fix links and escaping for square brackets in documentation - Add package docs - Add prefixing for non-imported functions
Move optional packages to Suggests and add rlang to Imports Add UrbanInstitute/urbnthemes to Remotes Add Eli Pousson to contributors
... must appear first + .envir must be passed on to `stringr::str_glue`
Also minor adjustment to README headings + add withr to create temporary directory for file download
Support uses the new `prep_sedt_resource()` helper function and adds `sf` and `arcgislayers` to Suggests Also clean-up inheritance for parameter definitions to reduce duplicative documentation Also make acs_data_year settable with an option
Merge changes from main onto feature branch
Also drop withr from Suggests (since it isn't used in the README anymore) Also add URL + BugReports URL to DESCRIPTION
Prefix not needed if full package is imported
Also expand description and address excess line lengths in code/docs
Otherwise use `utils::write.csv()` Also make check_sedt_resource return NULL invisibly consistently
geo, | ||
acs_data_year) | ||
) | ||
acs_data_year = getOption("sedtR.year", 2021), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the logic of this code to allow a user to set a global year option using sedtR.year (that would be different from 2021) so that they would, for example, not have to keep entering 2019 as the year argument if they wanted to use 2019 data for all of their function calls in a given session?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. The main advantage would be for a user who is making multiple calls to the API and wants to set a default year at the outset.
) | ||
|
||
stopifnot(acs_data_year %in% c("2019", "2021", "2022")) | ||
stopifnot(geo %in% c("city", "county", "state", "national")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the match_acs_data_year()
and match_geo()
functions perform the same check of the acs_data_year
and geo
arguments for valid values using arg_match0
instead of stopifnot
. I wasn't totally clear on why this is a preferable approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rlang::arg_match0()
gives you a much more information error message. For example, if you pass 2018 (or "2018") for validation. This is what you get with arg_match (or arg_match0()
):
Error:
! var
must be one of "2019", "2021", or "2022", not
"2018".
ℹ Did you mean "2019"?
stopifnot()
gives you:
Error: var %in% as.character(c(2019, 2021, 2022)) is not TRUE
Moving validation into a helper function also reduces the number of places you'd need to change the vector of supported years if you validating an input year in more than one location.
#' | ||
#' @noRd | ||
#' @importFrom purrr list_cbind | ||
convert_to_coords <- function(data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a concern about this added functionality leading to non-point spatial data being used/interpreted improperly in the tool. For example, if someone uploaded a dataset of parks with multipolygon geometries, a park that spans multiple census tracts would be assigned a single point via the st_centroid
or st_point_on_surface
function. On the back-end, our tool would then "attribute" the park only to the census tract that the single point falls into for purposes of the disparity calculations. That wouldn't be quite correct, since in reality the park falls in multiple census tracts (and you'd likely want to weight the assignment by area in each tract).
We've heard from other folks about the desire to use non-point spatial data with the tool and we're about to pilot some use cases with one of our partners to determine where/how to use these data properly. Until we have a "best practice" that we're comfortable implementing as a default in sedtR, we'd rather individuals have to make the choice about how to preprocess their data to convert to points before using sedtR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally fair. I think the scenario I had in mind was a case when you are trying to count parks by county or parcels by neighborhood but you could always supply an example showing how to use sf::st_centroid()
to transform data in advance of passing it to the API. If you want to revisit this in the future, I can imagine including some warning similar to the default warning for sf::st_centroid()
("st_centroid assumes attributes are constant over geometries") or some check that compares the input geometry of the resource to the comparison geometry. Not sure if that is possible or desirable though. I'll plan to revise the PR to pull this out.
"https://equity-tool-api.urban.org", | ||
envir = parent.env(environment())) | ||
.onAttach <- function(libname, pkgname) { | ||
stage <- "production" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing the best practice for using .onAttach for startup messages - we agree that the approach you've implemented comports with the best practice
return(invisible(NULL)) | ||
} | ||
|
||
not_message <- "{.obj_type_friendly {resource}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't quite understand this syntax - do you mind explaining how it works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"obj_type_friendly" is part of the inline syntax supported by the cli package: https://cli.r-lib.org/reference/inline-markup.html
Jenny Bryan has been a convincing proponent of the "never nester" approach to code structure (I think this video is where I first encountered the term) so setting a default value for not_message
avoids the need for an else
in the following conditional.
) | ||
acs_data_year = getOption("sedtR.year", 2021), | ||
placement = "surface", | ||
..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't quite sure of the rationale for adding the ...
to this function - do you mind explaining how you'd envision it being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those dots get passed from call_upload_user_files()
to prep_sedt_resource()
and then to either arcgislayers::arc_read()
or sf::read_sf()
. Basically, it is a way of exposing parameters like token
(for accessing private FeatureLayers) or query
(for subsetting a file or database) that would only be used if a user is supplying an ESRI FeatureService URL or a file path. Most users probably wouldn't need this option but it could be really helpful if someone needs it.
The WIP book on Tidy design principles has more details on the uses of ...
arguments if it is a helpful reference: https://design.tidyverse.org/dots-after-required.html
fileext = "csv", | ||
placement = "surface", | ||
..., | ||
arg = caller_arg(resource), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wasn't quite sure of the logic for adding ...
and setting file = NULL
as an argument in this function. It seemed to me that we'd envision this as an internal function that's only called by other functions in the package, in which case it appears that the file
argument is never set to something besides the default. Were you seeing a use case for users calling this function directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above for my thoughts on passing ...
to this function.
Giving file a NULL
default value allows a user to set resource as a URL where a file is located and pass the filename to use for the downloaded file as a separate parameter. A user could pass a GeoPackage file as the resource, e.g. "parks.gpkg", and then user the file argument to set the path and file name of the converted data, e.g. "path_to_converted_file/sedtR_parks.csv". I can't recall if I included any documentation for this feature and it may not be essential.
@elipousson thank you SO MUCH for this detailed PR and apologies for how long it's taken us to get back to you! We reviewed the changes (and learned a lot about package development along the way) and made a couple of comments throughout the PR. Most of the comments are just us trying to better understand the logic of some of the changes. The one substantive change we want to make to this code is to remove the functionality to accept non-point spatial data. I provide more explanation in the in-line comment. It's something we'd love to add more support for eventually, but don't think we're there yet. We'd love for you to be part of the conversations about what that functionality looks like. the other thing we want to do before merging the PR is to write tests for the new functionality. If you were comfortable giving us push access to your fork, we'd be happy to open a PR into your branch with the proposed changes. alternatively, we could create a new branch in our repo that you could merge your PR into instead of main, and we could do the edits/testing there before merging into main. please let us know how you'd prefer to proceed! we'd also still love to find a time to chat on the phone to better understand how you're interested in using sedtR and to share some of our upcoming development plans with you! I can follow up over email to set up a time! |
@alenastern No apologies needed at all! I still owe you and your colleagues a reply to your earlier email regarding our interest in using this package for our work in Baltimore City. I'll follow up and hope we can connect soon. I went ahead and responded to your comments briefly to explain my rationale for the changes but please so let me know if anything is unclear or if you have something different in mind. I'm partial to the tidyverse code style for package development but I know that there isn't anything like one "right" way to do any of this. I have "Allow edits by maintainers" checked already and I'm happy for you and your colleagues to make changes as needed. I could do a first pass to make sure POINT geometry input is always required and add some initial tests then you can revise and add as needed. If a new PR for a feature branch is better than a PR for main, that also works – whatever works for you really. |
@alenastern There are still a few changes I want to make based on our discussion the other day (especially getting tests working) but I went ahead and made the changes to allow non-POINT geometry. The helper function didn't change too much since I wanted to maintain support for converting POINT geometry into a coordinate data frame output. If you have a chance, could you create a new development branch and I (or you) can modify the PR so it will merge into that branch instead of main? |
Following up on our exchange @alenastern (on this PR #1) about adding support for FeatureLayer and other input resource types, I went ahead and put something together that adds support for URLs for spatial or tabular data or FeatureLayers or Tables.
I also added support for the following objects or inputs—all of which are converted to a tabular format and written to a CSV file before they are used with the api.
I also included some further documentation clean-up (reducing duplication, adding missing titles, expanding descriptions, adding URLs to the DESCRIPTION).
One change that didn't get incorporated in my prior PR that I kept in this one: swapping .onLoad for .onAttach. I think the approach here is more consistent with best practices but, if there is a reason you prefer the current approach, I'd be curious to learn.
I know this is a big pull request but I'm hoping the changes are all welcome. As the modified example in the README illustrates, I think the new resource argument definitely simplifies the syntax when using this package in a helpful way.