ENH: register GeoSeries as the pandas.Series.geo accessor #3272
…me accessors, respectively
geopandas/geoseries.py
Outdated
@@ -75,6 +76,7 @@ def _geoseries_expanddim(data=None, *args, **kwargs):
    return _expanddim_logic(df)


@pandas.api.extensions.register_series_accessor("geo")
I don't think you should be registering GeoSeries itself as the accessor, but rather an accessor class (perhaps derived from GeoSeries so that all the methods exist). Registering GeoSeries directly seems a bit counter to the extension registration documentation above. Implemented like this, we don't conform to the documented expectation:
For consistency with pandas methods, you should raise an AttributeError if the data passed to your accessor has an incorrect dtype.
I think we should be doing this.
Also, with GeoSeries as the accessor, this would automatically coerce object dtypes, e.g.
import pandas as pd
from shapely.geometry import Point

s = pd.Series([Point(x, x) for x in range(3)])  # object dtype, not GeometryDtype
x = s.geo.x  # this works, even though s is not GeometryDtype
I would expect the accessor to work only for GeometryDtype Series.
Makes sense. I can look into providing a separate accessor class that delegates to GeoSeries and raises AttributeError for incorrect dtype.
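For readers following along, here is a minimal sketch of what such a delegating accessor could look like. It is not the code from the PR: the accessor name `geo_sketch` and the class name `GeoSketchAccessor` are hypothetical, chosen so the example does not collide with any real `geo` accessor. It assumes geopandas and shapely are installed.

```python
import pandas as pd
from shapely.geometry import Point

import geopandas
from geopandas.array import GeometryDtype


# "geo_sketch" and GeoSketchAccessor are hypothetical names for illustration.
@pd.api.extensions.register_series_accessor("geo_sketch")
class GeoSketchAccessor:
    def __init__(self, series):
        # pandas asks accessors to raise AttributeError for an incorrect dtype.
        if not isinstance(series.dtype, GeometryDtype):
            raise AttributeError(
                "Can only use the .geo_sketch accessor with GeometryDtype values"
            )
        # Wrap the validated Series so GeoSeries methods become available.
        self._geoseries = geopandas.GeoSeries(series)

    def __getattr__(self, name):
        # Only called for attributes not found on the accessor itself.
        # Private names are not forwarded, which avoids recursion if
        # _geoseries has not been set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._geoseries, name)


geometry_series = pd.Series(
    [Point(x, x) for x in range(3)], dtype=GeometryDtype()
)
xs = geometry_series.geo_sketch.x  # delegated to GeoSeries.x
```

With this sketch, a Series of Points that still has object dtype raises AttributeError on `.geo_sketch` instead of being silently coerced, which is the behaviour requested in the review comment.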
Done in 42ea6c5
Hi @tswast, thanks for putting this PR forward. Would you be able to share a little bit about the use case / goal you have in mind for this? I'd just like to understand this a little bit better before reviewing. The issues you have linked to are quite old, and have relatively little commentary from current maintainers of geopandas, so it'd be good to discuss what the aim is a bit before finessing the PR. You may or may not have seen that there is also #1947 and the corresponding PR #1952, which are related to this but got a bit stalled, in part because they didn't have a super clear statement of intent. I can try to infer this from the documentation notebook you've added, but it would be good to hear directly from you, rather than me guessing.
#1947 is kinda the opposite of what I'm doing here. Those are about copying the implementation of the register accessor API from pandas to geopandas so folks can register geopandas-specific accessors. This PR registers geopandas as a true extension dtype in pandas, making it possible to use geopandas consistently with other extension dtypes. In this PR, I am adding a "geo" accessor to pandas.Series and pandas.DataFrame, as proposed in #166. This builds on the work that has already been done here to make the Geometry extension dtype and extension array. To be fully transparent, my primary project at work is BigQuery DataFrames. We currently return pandas objects with the existing geopandas extension dtype when folks download a table that contains a BigQuery GEOGRAPHY column. My goal here is to make things consistent so that those downloaded objects can be much more useful and that if/when BigQuery DataFrames starts adding methods for working with those GEOGRAPHY columns we can do so without forcing users to convert things from DataFrame/Series to GeoDataFrame/GeoSeries.
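To make the situation concrete, here is a small sketch of the status quo being described: a plain pandas DataFrame whose column already carries the geopandas extension dtype, and the explicit conversion currently needed to reach geometry methods. The column names and data are made up for illustration and are not taken from BigQuery DataFrames.

```python
import numpy as np
import pandas as pd
from shapely.geometry import Point

import geopandas
from geopandas.array import GeometryDtype

# A plain pandas DataFrame with a GeometryDtype column (illustrative data only).
df = pd.DataFrame(
    {
        "geog": pd.Series([Point(x, x) for x in range(3)], dtype=GeometryDtype()),
        "value": np.arange(3, dtype="int64"),
    }
)

# Selecting the column gives a pandas.Series, not a GeoSeries, so the
# geometry-specific methods are not attached to it.
column = df["geog"]

# Without an accessor, the explicit route is to wrap the column in a GeoSeries.
xs = geopandas.GeoSeries(column).x
```

The accessor proposed in this PR would let the last step be written against the pandas Series directly, without the explicit GeoSeries wrapper.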
Great! - I hoped you'd share something like this. I think there certainly is a use case here of supporting geometrydtype data in regular pandas series/ dataframes - even if it's not the core usage mode of geopandas (I don't anticipate adding the As I understand it, the accessor registration means that
@pytest.fixture
def df():
    return pd.DataFrame(
        {
            "geometry2": pd.Series(
                [Point(x, x) for x in range(3)], dtype=GeometryDtype()
            ),
            "value1": np.arange(3, dtype="int64"),
            "value2": np.array([1, 2, 1], dtype="int64"),
        }
    )
(now in this case you could work around it by specifying The current way of going from a pandas dataframe to a geodataframe without using the constructor is via Do your users expect to be able to use geodataframe exclusive methods like
At the moment, in BigQuery DataFrames we're actually just exposing the Series accessors. I would like to support spatial joins at some point, but I'm not sure if we'd do that via an analogue to the
Maybe what I can do in the meantime is update that notebook to have an
Regarding the failing test, it's passing for me:
I am personally not a big fan of this, as I am mostly worried about the maintenance cost it will bring. Yes, simple cases work, but in order to properly support this we would need an extensive test suite to catch all the corner cases, and then ensure we squash all the potential bugs that may lurk there in the future. All that for very little gain. The example with the geometry column name is one of that sort, but there will undoubtedly be others. We struggle to keep up with maintenance and development, and this feels like a lot of maintenance that brings next to nothing. @tswast is Google Cloud willing to support the maintenance in the long term, if you want to use this feature?
Unlikely. I made this change on my personal time over the weekend. It seemed to me to be aligned with the discussions I could find about making geopandas a proper extension dtype, but I may have misread the situation. FWIW: https://pandas.pydata.org/docs/development/extending.html is a stable API now and has been around for a good few years (announcement blog post).
It seems to me that the
I think that's fair to say
I don't think I'd support this - it's not intuitive, the error is confusing and it's counter to geopandas generally trying to track this for you so this kind of thing doesn't happen - it's why we have a geodataframe subclass (historically this used to track other things like
I think in terms of trying to progress this in the short term that's a good idea. I'm a bit more optimistic than Martin in terms of potential edge cases (perhaps naively), especially for the GeoSeries case, as there should be no action at a distance - we're just attaching a way to indirectly convert GeometryDtype Series to a GeoSeries.
Thanks, done in the most recent 2 commits: https://github.com/geopandas/geopandas/pull/3272/files/17504e58fe3f00f680963949b80cdb8ede3e0a52..97dc3fb008a371c1d3a9f1c363173fa92b90f78b I appreciate the feedback so far. I'm personally finding some satisfaction in making geopandas more consistent with other packages that extend pandas. Thankfully all the hard work you've all done so far made this change quite easy to make, so not much lost if not accepted in the end.
This allows geopandas to be used like other extension dtypes.
TODO
Fixes #166
Related #680