Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: optionally retrieve CRS from SRID in from_wkt/from_wkb #2956

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jparta
Copy link

@jparta jparta commented Jul 11, 2023

This PR aims to enable reading geoalchemy WKTs and WKBs with SRID information in them. See documentation here: https://geoalchemy-2.readthedocs.io/en/latest/elements.html

wkb_or_wkb -> wkb_or_wkt
Convert elements to strings if needed
@jparta jparta force-pushed the read_geoalchemy_WKBElement branch from 73e5748 to b9f3b31 Compare July 11, 2023 09:18
@jparta
Copy link
Author

jparta commented Jul 11, 2023

@martinfleis We have a dilemma here: either we have to devise a way to read the SRID from the raw data (hex or string), or convert the WKBElements into strings before passing them off to lib.from_wkb. Should an attempt be made to convert the elements to strings or bytes?

I solved it now by converting elements to string when needed. It's not very clean, though.

@jparta
Copy link
Author

jparta commented Jul 11, 2023

Now I've used the shapely get_srid method as suggested in #2952. EWKT reading is not working, with the following:

FAILED geopandas/tests/test_geoseries.py::TestSeries::test_from_ewkt - pygeos.GEOSException: ParseException: Unknown type: 'SRID=4326;POLYGON'
FAILED geopandas/tests/test_geoseries.py::TestSeries::test_from_ewkt_series - pygeos.GEOSException: ParseException: Unknown type: 'SRID=4326;POLYGON'

It seems GEOS is not able to read this format.
One option is to not do EWKT at all. Another is to remove the SRID header manually, which is slow and requires geopandas to keep track of possible changes to the standard.

What do you think?

@jparta jparta marked this pull request as ready for review July 11, 2023 11:05
Copy link
Member

@martinfleis martinfleis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some comments on the implementation side.

This will also need a changelog note.

Comment on lines +185 to +191
try:
len(data)
except TypeError:
raise TypeError(
"Input must be iterable (e.g. Series, list, array)"
f"found {type(data)} instead."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +198 to +201
first_srid = srid_getter(data[0])
if all(first_srid == srid_getter(obj) for obj in data):
if first_srid > 0:
shared_crs = first_srid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you can do shapely.get_srid(data) to get an array of srids without a for loop on Python side, no? And then check they're all the same.

Comment on lines +204 to +205
"Geometry elements do not agree on CRS. This may cause problems. "
"CRS could not be inferred from data.\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Geometry elements do not agree on CRS. This may cause problems. "
"CRS could not be inferred from data.\n",
"Geometries contain different SRIDs encoding different coordinate reference systems. "
"CRS could not be inferred from data.\n",

Something along these lines may be better?

Comment on lines +211 to +219
warnings.warn(
f"The elements passed share a CRS, but it does not match the "
"CRS passed as parameter.\n\n"
f"CRS passed as parameter: {param_crs}.\n"
f"CRS from data: {shared_crs}.\n"
"CRS could not be inferred from data.\n",
UserWarning,
stacklevel=4,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a custom CRS is passed, we should not try to retrieve it from SRID to avoid calling get_srid when we don't need to. So that shall be added in the first place.

Then this check is not needed as we would go this path only if param_crs is None.

@@ -174,13 +230,18 @@ def from_wkb(data, crs=None):
----------
data : array-like
list or array of WKB objects
The 'srid' can be read from geoalchemy2's WKBElement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The 'srid' can be read from geoalchemy2's WKBElement.

@@ -174,13 +230,18 @@ def from_wkb(data, crs=None):
----------
data : array-like
list or array of WKB objects
The 'srid' can be read from geoalchemy2's WKBElement.
crs : value, optional
Coordinate Reference System of the geometry objects. Can be anything accepted by
:meth:`pyproj.CRS.from_user_input() <pyproj.crs.CRS.from_user_input>`,
such as an authority string (eg "EPSG:4326") or a WKT string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
such as an authority string (eg "EPSG:4326") or a WKT string.
such as an authority string (eg "EPSG:4326") or a WKT string. If None, geopandas will try to infer CRS from the SRID of geometries (if available).

I would add this here, instead of that sentence above (note it will need better formatting to avoid linting fail).

Comment on lines +240 to +244
converted_data = vectorized.from_wkb(data)
shared_crs = _get_common_crs_from_geometries(converted_data, crs)
if shared_crs is not None:
crs = shared_crs
return GeometryArray(converted_data, crs=crs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
converted_data = vectorized.from_wkb(data)
shared_crs = _get_common_crs_from_geometries(converted_data, crs)
if shared_crs is not None:
crs = shared_crs
return GeometryArray(converted_data, crs=crs)
geoms = vectorized.from_wkb(data)
if not crs:
crs = _get_common_crs_from_geometries(geoms, crs)
return GeometryArray(geoms, crs=crs)

If _get_common_crs_from_geometries returns None when nothing useful is found, we can simplify it like this.

@@ -200,12 +261,17 @@ def from_wkt(data, crs=None):
----------
data : array-like
list or array of WKT objects
The 'srid' can be read from geoalchemy2's WKTElement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

Comment on lines +271 to +274
converted_data = vectorized.from_wkt(data)
shared_crs = _get_common_crs_from_geometries(converted_data, crs)
if shared_crs is not None:
crs = shared_crs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above. Note that you missed adapting return here.

@martinfleis martinfleis changed the title Read geoalchemy wkb / wkt element ENH: optionally retrieve CRS from SRID in from_wkt/from_wkb Jul 15, 2023
@martinfleis martinfleis added this to the 0.14 milestone Jul 15, 2023
@martinfleis
Copy link
Member

Could you also check what is the performance hit on this one? I.e. how long does it take to get srids compared to the geometry decoding. We may want to add an option to completely avoid that path, even for crs=None.

Comment on lines +305 to +306
t1_wkb_elem = WKBElement(self.t1.wkb, srid=srid).as_ewkb()
sq_wkb_elem = WKBElement(self.sq.wkb, srid=srid).as_ewkb()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not depend on geoalchemy2 in basic tests and depend on it only optionally when we want to test it explicitly? We don't have geoalchemy2 in most of our CI envs and I'd like to keep that like that. You can use shapely (set_srid) directly instead.

@martinfleis martinfleis modified the milestones: 0.14, 1.0 Sep 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants