Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vrt() produces .vrts which are unnecessarily slow or even intractable work with #1490

Open
twest820 opened this issue Apr 29, 2024 · 6 comments

Comments

@twest820
Copy link

I work with a modest assortment of GeoTIFF based virtual rasters. This consists of a few different .vrts with a few hundred tiles of 10–200 MB each and total sizes of 6–100 GB. Usability of these .vrts in QGIS has been not great and, recently, some of the generated .vrts simply don't render completely or reliably. This has prompted some investigation, some of which gets into how GDAL is compiled for QGIS.

However, a significant contributor to poor render performance is terra::vrt() generates .vrts that tell GDAL it's most efficient to load rasters one line at a time via BlockYSize. This is often not the case, which is why GDAL defaults to loading entire tiles unless it's a scanline image. For example, a typical vrt() call in this use case produces a few thousand source statements like

    <ComplexSource>
      <SourceFilename relativeToVRT="1">s04320w06750.tif</SourceFilename>
      <SourceBand>8</SourceBand>
      <SourceProperties RasterXSize="2000" RasterYSize="2000" DataType="Float32" BlockXSize="2000" BlockYSize="1" /> <!-- oops! -->
      <SrcRect xOff="0" yOff="0" xSize="2000" ySize="2000" />
      <DstRect xOff="58000" yOff="36000" xSize="2000" ySize="2000" />
      <NODATA>nan</NODATA>
    </ComplexSource>

(terra 1.7.65, Win10 22H2, 5950X, 4060 Ti, the R call here is just vrt(vectorOfTilePaths, "tileFolderPath/foo.vrt", set_names = TRUE)).

A quick fix is to delete BlockXSize="2000" BlockYSize="1". Setting BlockYSize the same as RasterYSize also works and, while specifying block sizes is at least semi-discouraged on the GDAL side, might sometimes yield somewhat better performance.

.vrt usability is also substantially influenced by the number of bands in the .vrt, band datatypes, and file sizes. In principle this is partially addressed by using terra::vrt(options = c("b 1", "b 2", ...)) or similar to restrict the bands present in the .vrt. In practice, I'm finding options is ignored and the output .vrt contains all of the input tiles' bands. Additionally problematic is there's no way to say which bands to pull from which tiles. For example, to constrain file sizes to where QGIS rendering doesn't time out it might be necessary to shard 12 band floating point tiles into six band floating point tiles plus a spatially matching set of six band, 16 bit integer tiles. Useful visualizations might then call for, say, a three band .vrt consisting of floating point bands 1 and 3 plus one of the integer bands. vrt() can help with this by making floating point and integer .vrts but it's the necessary manually put the bands together and do find/replaces on the XML. Speaking from experience, this gets old fast.

I'm unfamiliar with the working relationship between rspatial and OSGeo so the most effective path to improvements here is unclear to me. A bit of searching in the GDAL repo isn't giving me gdalbuildvrt issues around this either. My solution's been writing an object model and XML serialization for the .vrt file format but that's rather a one off.

@kadyb
Copy link
Contributor

kadyb commented Apr 29, 2024

In principle this is partially addressed by using terra::vrt(options = c("b 1", "b 2", ...)) or similar to restrict the bands present in the .vrt. In practice, I'm finding options is ignored and the output .vrt contains all of the input tiles' bands.

Shouldn't it rather be vrt(options = c("-b", 1, "-b", 2))?

@twest820
Copy link
Author

Shouldn't it rather be vrt(options = c("-b", 1, "-b", 2))?

-b is rejected with Unknown option name (GDAL error 6) and c("b", 1, "b", 2) also has no effect, suggesting issues in the terra documentation. The usual GDAL format for passing options would be b=1, which doesn't work either. Both the terra source and GDAL documentation are cryptic here.

@kadyb
Copy link
Contributor

kadyb commented Apr 30, 2024

The example of resolution change from the terra doc suggests this way:

v <- vrt(ff, vrtfile, options = c("-tr", 5, 5))

For terra 1.7.71 and GDAL 3.8.2 on Windows 10 the code below works ok for me:

library("terra")
r <- rast(ncols = 100, nrows = 100, nlyrs = 3)
values(r) <- rep(c(1, 2, 3), each = 100^2)
x <- rast(ncols = 2, nrows = 2)
filename <- paste0(tempfile(), "_.tif")
ff <- makeTiles(r, x, filename)
vrtfile <- paste0(tempfile(), ".vrt")

# works fine
v <- vrt(ff, vrtfile, options = c("-b", 3, "-b", 2), overwrite = TRUE)
v
#> min values  :               3,               2
#> max values  :               3,               2

# wrong result
v <- vrt(ff, vrtfile, options = c("b 3", "b 2"), overwrite = TRUE)
v
#> min values  :               1,               2,               3
#> max values  :               1,               2,               3

@Rapsodia86
Copy link

Hi @kadyb,
I played a bit with your example and wanted to ask about the expected outcome if adding "-separate" in case of having tiles.
Five scenarios following your example:

  1. No band selection:
v1 <- vrt(ff, vrtfile, overwrite = TRUE)
v1;plot(v1)
> v1;plot(v1)
class       : SpatRaster 
dimensions  : 100, 100, 3  (nrow, ncol, nlyr)
resolution  : 3.6, 1.8  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : file4f24409f7cef.vrt 
names       : file4f24409f7cef_1, file4f24409f7cef_2, file4f24409f7cef_3 
min values  :                  1,                  2,                  3 
max values  :                  1,                  2,                  3

image

  1. Selecting specific bands:
v2 <- vrt(ff, vrtfile, options = c("-b", 3, "-b", 2), overwrite = TRUE)
v2;plot(v2)
> v2;plot(v2)
class       : SpatRaster 
dimensions  : 100, 100, 2  (nrow, ncol, nlyr)
resolution  : 3.6, 1.8  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : file4f24409f7cef.vrt 
names       : file4f24409f7cef_1, file4f24409f7cef_2 
min values  :                  3,                  2 
max values  :                  3,                  2 

image

  1. Add "-separate" and use all bands:
v3 <- vrt(ff, vrtfile, options = c("-separate"), overwrite = TRUE)
> v3;plot(v3)
class       : SpatRaster 
dimensions  : 100, 100, 4  (nrow, ncol, nlyr)
resolution  : 3.6, 1.8  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : file4f2455056a73.vrt 
names       : file4f2~56a73_1, file4f2~56a73_2, file4f2~56a73_3, file4f2~56a73_4 
min values  :               1,               1,               1,               1 
max values  :               1,               1,               1,               1 

image

Comment: Output only has four bands (tiles) based on the first band. No remaining bands. Shouldn't we expect number of tiles * number of bands? So having in total 12 bands?

  1. Add "-separate" and use selected band 3:
v4 <- vrt(ff, vrtfile, options = c("-separate","-b", 3), overwrite = TRUE)
v4;plot(v4)
> v4;plot(v4)
class       : SpatRaster 
dimensions  : 100, 100, 4  (nrow, ncol, nlyr)
resolution  : 3.6, 1.8  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : file4f2455056a73.vrt 
names       : file4f2~56a73_1, file4f2~56a73_2, file4f2~56a73_3, file4f2~56a73_4 
min values  :               1,               1,               1,               1 
max values  :               1,               1,               1,               1 

image

Comment: Shouldn't output have values of 3?

  1. Add "-separate" and use selected bands 3 & 2:
v5 <- vrt(ff, vrtfile, options = c("-separate","-b", 3,"-b", 2), overwrite = TRUE)
v5;plot(v5) 
> v5;plot(v5)
class       : SpatRaster 
dimensions  : 100, 100, 4  (nrow, ncol, nlyr)
resolution  : 3.6, 1.8  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : file4f2455056a73.vrt 
names       : file4f2~56a73_1, file4f2~56a73_2, file4f2~56a73_3, file4f2~56a73_4 
min values  :               1,               1,               1,               1 
max values  :               1,               1,               1,               1 

image

Comment: Shouldn't output have 8 bands? First four with values of 3, then the other four with values of 2?

Thanks!

@kadyb
Copy link
Contributor

kadyb commented Apr 30, 2024

@Rapsodia86, do you have GDAL version < 3.8? Because this has been changed and then the results should be as you expect.

Starting with GDAL 3.8, all bands of each input file are added as separate VRT bands, unless -b is specified to select a subset of them. Before GDAL 3.8, only the first band of each input file was placed into a new VRT band, and -b was ignored. (https://gdal.org/programs/gdalbuildvrt.html)

@Rapsodia86
Copy link

Yes, I do (win10):

> library(terra)
terra 1.7.74
> gdal(lib="")
   gdal    proj    geos 
"3.5.2" "8.2.1" "3.9.3" 

Ok, so this is expected for now.
Thank you very much for the information!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants