Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting various split-apply-combine spatial workflows #74

Open
Aariq opened this issue May 20, 2024 · 2 comments
Open

Supporting various split-apply-combine spatial workflows #74

Aariq opened this issue May 20, 2024 · 2 comments

Comments

@Aariq
Copy link
Collaborator

Aariq commented May 20, 2024

So there are at least 4 different ways (yikes!) to do a split-apply-combine type workflow with spatial data just in terra that I've encountered. It may be beneficial to figure out a way to implement these workflows using targets branching in geotargets

  1. SpatRasters or SpatVectors with multiple layers can be split with [[ and recombined with c() or rast()/vect(). Also can be iterated over with lapply() or coerced to a list with as.list()
  2. SpatRasterCollections hold a list of SpatRasters, possibly with different extents, resolutions, and projections. Can be split with [ (but not [[) and combined with terra::sprc(). lapply() also works on them and can be coerced with as.list(). (see consider how tar_terra_sprc might work in dynamic and static branching #53)
  3. SpatRasterDatasets hold a list of SpatRasters with the same resolution and projection (sub-datasets). Can be split with [ or [[ and re-combined with terra::sds(). lapply() also works and can be coerced with as.list(). names are lost in the process (I assume this is a bug: Names lost when using as.list() on a SpatRasterDataset or S rspatial/terra#1513). (see create tar_terra_sds() #59)
  4. makeTiles() splits a raster into tiles that are saved to disk and returns a vector of file paths. Can be opened individually, worked on, and re-combined with merge(sprc(rast(<list of tiles as SpatRasters>))). Alternatively, the files on disk can be opened with vrt() (see Best way to use SpatRaster tiles? #69)

Some open questions:

  1. Which of these (if any) should we try to support better in geotargets (e.g. with target factories)?
  2. Are there existing similar patterns we can borrow from other Targetopia packages? (knowing that we can't customize the behavior of the iteration argument to tar_target_raw())
  3. Will any of these be easier to do if we adopt an approach like [draft] tar_terra_rast_wrap: multi-target method to preserve SpatRaster metadata #63? (I suspect that a tar_terra_tiles() would be easier like this)
  4. Should we try to figure this stuff out soon, or later after we've added basic support for sf and stars?
@Aariq
Copy link
Collaborator Author

Aariq commented May 24, 2024

A 5th way: just iterating over lists of rasters or vectors (#77).

Given that these keep coming up, and that it is confusing complicated stuff to implement (see my ongoing struggles in #76), it might be prudent to find a time we (@njtierney, I, and possibly @brownag) can talk in real-time about this. I want to get some kind of vision about how much of this we build in to geotargets and how much we just document in vignettes and leave it to the users (e.g. my response in #77, or how I implemented iterating over tiles before #76). And if we do implement some solutions, how generalized can/should they be? Would each of these 5 ways need bespoke target factories if we implemented something for them in geotargets?

@Aariq
Copy link
Collaborator Author

Aariq commented May 24, 2024

This is what a workflow for 1. (iterating over layers) looks like:

library(targets)

tar_dir({ # tar_dir() runs code from a temporary directory.
    tar_script({
        library(geotargets)
        library(terra)
        make_rast <- function() {
            a <- terra::rast(nrows = 108, ncols = 21, xmin = 0, xmax = 10)
            values(a) <- 10
            names(a) <- "layer 1"
            b <- terra::rast(nrows = 108, ncols = 21, xmin = 0, xmax = 10)
            values(b) <- 100
            names(b) <- "layer 2"
            c(a, b)
        }
        sub_lyr <- function(rast, index) {
            rast[[index]]
        }
        combine_rast <- function(rast_list) {
            real_names <- lapply(rast_list, names)
            out <- rast(rast_list)
            names(out) <- real_names
            out
        }
        list(
            tar_terra_rast(
                rast_example,
                make_rast()
            ),
            tar_target(
                layers,
                names(rast_example)
            ),
            tar_terra_rast(
                rast_list,
                sub_lyr(rast_example, layers),
                pattern = map(layers),
                iteration = "list"
            ),
            tar_terra_rast(
                rast_plus_2,
                rast_list + 2,
                pattern = map(rast_list),
                iteration = "list"
            ),
            tar_terra_rast(
                rast_combined,
                combine_rast(rast_plus_2)
            )
        )
    })
    
    tar_make()
    tar_read(rast_combined)
})
#> terra 1.7.71
#> ▶ dispatched target rast_example
#> ● completed target rast_example [0.088 seconds]
#> ▶ dispatched target layers
#> ● completed target layers [0 seconds]
#> ▶ dispatched branch rast_list_93203888e2b38c7f
#> ● completed branch rast_list_93203888e2b38c7f [0.004 seconds]
#> ▶ dispatched branch rast_list_a6b0d01bda4d921c
#> ● completed branch rast_list_a6b0d01bda4d921c [0.023 seconds]
#> ● completed pattern rast_list
#> ▶ dispatched branch rast_plus_2_2e5221e915e2a41b
#> ● completed branch rast_plus_2_2e5221e915e2a41b [0.004 seconds]
#> ▶ dispatched branch rast_plus_2_589aaf2880aa3b86
#> ● completed branch rast_plus_2_589aaf2880aa3b86 [0.003 seconds]
#> ● completed pattern rast_plus_2
#> ▶ dispatched target rast_combined
#> ● completed target rast_combined [0.022 seconds]
#> ▶ ended pipeline [0.503 seconds]
#> class       : SpatRaster 
#> dimensions  : 108, 21, 2  (nrow, ncol, nlyr)
#> resolution  : 0.4761905, 1.666667  (x, y)
#> extent      : 0, 10, -90, 90  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : rast_combined 
#> names       : layer 1, layer 2 
#> min values  :      12,     102 
#> max values  :      12,     102

Created on 2024-05-24 with reprex v2.1.0

geotargets could provide a tar_terra_map_lyr() function that would make all the targets necessary to get to rast_plus_2 from

tar_terra_map_lyr(
  rast_plus_2,
  rast_example + 2
)

I'm not sure how often this workflow would really be necessary or computationally more efficient than just rast_example + 2 (or whatever), so I'd put this potential target factory at low priority.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant