-
Notifications
You must be signed in to change notification settings - Fork 522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VRT form of merging #2699
base: main
Are you sure you want to change the base?
VRT form of merging #2699
Conversation
@snowman2 the code here is basically correct! The XML produced by What do you think? I want Rasterio to aim high when it comes to API design. It's pragmatic to wrap gdalbuildvrt, but I think we can do better. |
@sgillies, your logic makes sense to me - thanks for working on this 👍 |
@sgillies that reasoning makes perfect sense. Thanks! |
@gboeing in the meanwhile, you have the option of copying the new virtual_merge function into osmnx if you want. I don't imagine the interface or functionality will change much between now and 1.4.0. |
@sgillies, thoughts about adding more tests? Could start by copying some from #2698. I am thinking it would be good to verify that the bounds/transform/data of the merged VRT are as expected. Also probably a good idea to test merging multiple files. If rasters with different CRS are passed into this function, should it raise an exception? Should we test that? |
Whether to adjust output image bounds so that pixel coordinates | ||
are integer multiples of pixel size, matching the ``-tap`` | ||
options of GDAL utilities. Default: False. | ||
dst_path : str or PathLike, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is dst_path used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is dst_kwds used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are left over from the original merge. TODO: remove them.
@snowman2 @groutr (adding you since you're a key merge contributor) if we have a virtual merging method like this one, what would you think about it returning a "Virtual Dataset" object instead of XML or an infoset? It could be very lightweight, like our existing MemoryDataset. MemoryDataset is a thin wrapper around a "/vsimem/..." dataset. A virtual dataset could be a thin wrapper around an XML bytestring. The usage could be a lot like MemoryDataset's. I've got an additional method in mind: a I'll throw a quick example together. |
I like the idea, though I would suggest choosing a method name that is easier to type like |
I like that idea. |
We might consider adding internal validation against the VRT schema (https://raw.githubusercontent.com/OSGeo/gdal/master/data/gdalvrt.xsd). xmlschema seems like a nice package for validation. |
rasterio/vrt.py
Outdated
|
||
""" | ||
# To avoid a circular import. | ||
from rasterio.io import MemoryFile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to put VirtualDataset with the rest of the Dataset objects in rasterio.io?
rasterio/vrt.py
Outdated
|
||
kwargs : dict | ||
Optional dataset opening options to be passed to | ||
rasterio.open(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rasterio.open(). | |
:func:`rasterio.open`. |
rasterio/vrt.py
Outdated
from rasterio.io import MemoryFile | ||
|
||
tree = ET.fromstring(text) | ||
doc = ET.tostring(tree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not understanding why we are parsing an xml string into an elementtree, only to write it back out to an xml string. Can we not use text
as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@groutr I didn't have any intention other than 1) very basic validation (is it correct XML?) and 2) convert from bytes/unicode to just bytes.
I think we may well want to do schema validation. But for now, this is just a sketch.
@sgillies, I am thinking that it may make sense to move the |
Is the goal of the VirtualDataset to replicate the functionality of a rasterio Dataset, permit construction and then be able to render a vrt xml or is it simply serving as a VRT wrapper for MemoryFile? |
Instead of GDALBuildVRT Resolves #2573
Allow rasterio.open to take XML bytes as well as unicode strings.
else: | ||
|
||
@contextmanager | ||
def nullcontext(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this could be confused with contextlib.nullcontext
, I propose renaming this to something like:
def nullcontext(obj): | |
def dataset_opener(obj): |
And removing dataset_opener = nullcontext
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that rasterio supports python 3.9+, why not use contextlib.nullcontext directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function defined here behaves differently from contextlib.nullcontext
. nullcontext
returns None whereas this function returns the file handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contextlib.nullcontext will return whatever you pass as the argument (the default is None).
https://docs.python.org/3/library/contextlib.html#contextlib.nullcontext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - then I agree with your suggestion to use contextlib.nullcontext
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@groutr great suggestion! I will make it so.
|
||
ET.SubElement(vrtdataset, "SRS").text = crs_wkt if crs_wkt else "" | ||
ET.SubElement(vrtdataset, "GeoTransform").text = ",".join( | ||
[str(v) for v in output_transform.to_gdal()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should work as well:
[str(v) for v in output_transform.to_gdal()] | |
str(v) for v in output_transform.to_gdal() |
nodataval = nodata | ||
else: | ||
warnings.warn( | ||
"The nodata value, %s, is beyond the valid " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The nodata value, %s, is beyond the valid " | |
f"The nodata value, {nodataval}, is beyond the valid " |
else: | ||
warnings.warn( | ||
"The nodata value, %s, is beyond the valid " | ||
"range of the chosen data type, %s. Consider overriding it " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"range of the chosen data type, %s. Consider overriding it " | |
f"range of the chosen data type, {dt}. Consider overriding it " |
warnings.warn( | ||
"The nodata value, %s, is beyond the valid " | ||
"range of the chosen data type, %s. Consider overriding it " | ||
"using the --nodata option for better results." % (nodataval, dt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"using the --nodata option for better results." % (nodataval, dt) | |
"using the --nodata option for better results." |
Deferring until 1.5.0. There's a lot to this and it's not ready. |
Hi all! Just out of curiosity I wanted to follow up on this from last year. Looks like this functionality is deferred to 1.5.0. In the meantime, is there a best practice in 2024 recommended by the rasterio devs for building a VRT given the docs' "don't import both rasterio and gdal at the same time" advice? |
There's a lot of overlap between rasterio's merge and GDAL's GDALBuildVRT utility. I believe having two different ways to do the same kind of thing is a step backwards for this project. As an alternative to #2573, which wraps GDALBuildVRT, I've reimplemented the GDAL utility in Python, building on existing code in _boundless_vrt_doc.
Advantages over using GDALBuildVRT:
The name of this function and the module it lives in are undecided.
This can totally replace
gdal.BuildVRT
for OSMnx (in #2573) and doesn't require wrapping GDALBuildVRT. Just because a Rasterio user wants something that GDALX does doesn't mean we need to literally expose GDALX from Rasterio. GDAL's API design can be improved upon and Rasterio is the project to do those improvements.Resolves #2573