Feature Request: geospatial column analysis #1473

Open
proselotis opened this issue Oct 5, 2023 · 2 comments
Labels
feature request 💬 Requests for new features

Comments

@proselotis

Missing functionality

As a new user of ydata-profiling, I am using the package with various types of data. One of them is geospatial data, which I currently store as separate Latitude and Longitude columns of float values. Although the bar chart and correlation outputs may be useful in some cases, it may be more useful to present this data as a map, so the user can see which areas the data covers. Unless I missed something in the documentation, there is currently no functionality for this.

I noticed that the contributing guidelines highlight this as a potential EDA improvement: extending data type support (GPS coordinates).

Proposed feature

Plot the points on a map within the variable exploration section, so users can see which areas the data covers.
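Whatever plotting library is chosen, drawing a point on a flat map starts by projecting latitude/longitude to canvas coordinates. A minimal sketch, assuming the simple equirectangular (plate carrée) projection and a hypothetical helper named `project_equirectangular`; a real implementation would more likely delegate this step to the plotting library's own projections:

```python
from __future__ import annotations


def project_equirectangular(lat: float, lon: float, width: int, height: int) -> tuple[int, int]:
    """Map (lat, lon) in degrees to (x, y) pixel coordinates on a width x height canvas.

    Hypothetical helper using the equirectangular projection: longitude maps
    linearly to x and latitude maps linearly to y, with y = 0 at 90 degrees N.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinate out of range: ({lat}, {lon})")
    x = (lon + 180.0) / 360.0 * (width - 1)
    y = (90.0 - lat) / 180.0 * (height - 1)
    return round(x), round(y)


# The centre of a 361 x 181 canvas corresponds to (lat, lon) = (0, 0).
print(project_equirectangular(0.0, 0.0, 361, 181))  # (180, 90)
```

This projection distorts areas towards the poles, which is acceptable for a quick coverage overview but not for area comparisons.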

I am happy to help contribute to this; however, there are a few different paths it could take, so I wanted to know whether the ydata team has a preference.

Data size

  • Depending on the data size, plotting the points with some packages may take too much time or compute. There are a few ways to avoid this; one of my initial thoughts is to use Datashader to render the plots.
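Datashader's core idea is to aggregate points into a fixed-size grid before anything is drawn, so the rendering cost depends on the grid size rather than on the number of rows. The sketch below shows only that aggregation step in plain Python; it is not Datashader's API, and `aggregate_to_grid` and `to_raster` are hypothetical names:

```python
from collections import Counter


def aggregate_to_grid(points, width, height):
    """Count (lat, lon) points per cell of a width x height global grid.

    Row 0 is the northernmost band and column 0 starts at 180 degrees W.
    Points outside the valid lat/lon range are skipped.
    """
    counts = Counter()
    for lat, lon in points:
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue
        # min() keeps lon == 180 / lat == -90 inside the last column / row.
        col = min(int((lon + 180.0) / 360.0 * width), width - 1)
        row = min(int((90.0 - lat) / 180.0 * height), height - 1)
        counts[(row, col)] += 1
    return counts


def to_raster(counts, width, height):
    """Expand sparse cell counts into a dense height x width list of lists."""
    grid = [[0] * width for _ in range(height)]
    for (row, col), n in counts.items():
        grid[row][col] = n
    return grid


points = [(10.0, 10.0), (10.5, 10.5), (-45.0, -90.0)]
raster = to_raster(aggregate_to_grid(points, 4, 2), 4, 2)
print(raster)  # [[0, 0, 2, 0], [0, 1, 0, 0]]
```

The resulting grid (for example 800 x 400 cells) could then be rendered as a single image by any plotting library, however many rows the column has.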

Type of data

  • It might be easiest to handle a column whose values are shapely Points. Coordinating two separate columns would likely not fit well with the current structure of ydata-profiling.
  • Alternatively, the column could be handled as a list of (latitude, longitude) tuples to reduce dependencies.
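To make the tuple option concrete, here is a hypothetical sketch (plain Python; none of these names exist in ydata-profiling) of how two separate float columns could be zipped into one column of `(latitude, longitude)` tuples, and how such a column could be recognized during type inference. Missing values are represented as `None` here; NaN coordinates fail the range check and are rejected:

```python
def combine_columns(lats, lons):
    """Zip separate latitude/longitude columns into one column of (lat, lon) tuples."""
    return [
        None if lat is None or lon is None else (lat, lon)
        for lat, lon in zip(lats, lons)
    ]


def is_lat_lon_tuple(value) -> bool:
    """True if value is a 2-tuple of numbers within valid degree ranges."""
    if not isinstance(value, tuple) or len(value) != 2:
        return False
    # bool is a subclass of int, so exclude it explicitly.
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
        return False
    lat, lon = value
    # Comparisons with NaN are always False, so NaN coordinates are rejected.
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def looks_like_geo_column(values) -> bool:
    """True if the column has at least one non-missing value and all are (lat, lon) tuples."""
    non_missing = [v for v in values if v is not None]
    return bool(non_missing) and all(is_lat_lon_tuple(v) for v in non_missing)


column = combine_columns([38.72, None, 41.15], [-9.14, -8.61, -8.61])
print(looks_like_geo_column(column))  # True
```

One question this does not settle is axis order: GeoJSON and Shapely conventionally use (x, y) = (longitude, latitude), so a tuple convention would need to be documented explicitly.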

Alternatives considered

  • ydata-profiling could use Plotly Express for plotting; however, it may struggle with larger datasets.
  • ydata-profiling could use Cartopy with matplotlib for plotting; I am unsure of its limitations on data size.

Additional context

No response

@fabclmnt added the feature request 💬 label and removed the needs-triage label on Oct 6, 2023
@fabclmnt
Contributor

fabclmnt commented Oct 6, 2023

Hi @proselotis,

Thank you for your feature request, and for your enthusiasm! Contributions are always welcome!

I would be happy to have your contribution for this feature; nevertheless, there are a few points that need further definition:

  • Users can provide latitude and longitude in several formats, and we want to keep this as easy as possible. For that reason, there is an interface decision to be made;
  • The report structure design and flow for datasets with geo-location data;
  • The library to be used for the plot: Datashader depends on Dask, so we prefer to avoid it. We will assess Cartopy in more depth to validate its potential.
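The interface question in the first point could be approached by normalizing every accepted input format to a single canonical form before profiling. A minimal sketch, assuming three hypothetical input formats (pairs, `"lat, lon"` strings, and dicts with common key names); the function name and the accepted formats are illustrative only, not a decided ydata-profiling interface:

```python
def parse_lat_lon(value):
    """Normalize one value to a (lat, lon) float tuple, or return None if invalid.

    Hypothetical accepted formats:
      - a 2-item tuple or list: (lat, lon)
      - a string "lat, lon", e.g. "38.72, -9.14"
      - a dict with "lat"/"latitude" and "lon"/"lng"/"longitude" keys
    """
    if isinstance(value, str):
        parts = value.split(",")
        if len(parts) != 2:
            return None
        raw_lat, raw_lon = parts
    elif isinstance(value, dict):
        raw_lat = next((value[k] for k in ("lat", "latitude") if k in value), None)
        raw_lon = next((value[k] for k in ("lon", "lng", "longitude") if k in value), None)
    elif isinstance(value, (tuple, list)) and len(value) == 2:
        raw_lat, raw_lon = value
    else:
        return None  # None and unrecognized types are treated as invalid

    try:
        lat, lon = float(raw_lat), float(raw_lon)
    except (TypeError, ValueError):
        return None
    # Out-of-range values (and NaN, which fails every comparison) are rejected.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon


for raw in [(38.72, -9.14), "38.72, -9.14", {"latitude": 38.72, "lng": -9.14}, "not a point"]:
    print(parse_lat_lon(raw))  # (38.72, -9.14) three times, then None
```

Returning `None` instead of raising would let invalid entries be counted alongside missing values in the report rather than aborting the whole profile.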

Let me get back to you with some more detailed requirements for this feature so we can iterate together!

@fabclmnt pinned this issue on Oct 6, 2023
@mrit64

mrit64 commented Nov 2, 2023

Hi,

An alternative could be to extend ydata-profiling to support GeoPandas. GeoPandas uses Shapely internally and can handle Points, MultiPoints, LineStrings, MultiLineStrings, Polygons and MultiPolygons.
Profiling one dataset could give information about the geometry types, the coordinate system, etc., and eventually plot the shapes on a map.
Comparing two geographic datasets may be more challenging, since the differences between their geometries would need to be represented.
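As a rough illustration of what such a profile and comparison might compute, the sketch below uses plain `(lat, lon)` tuples: a bounding box for a single dataset, and a coverage-overlap score for two datasets based on which fixed-size grid cells each one occupies. The Jaccard-style overlap metric and the function names are assumptions for illustration, not something proposed in the thread. With GeoPandas data, `GeoSeries.geom_type`, `GeoSeries.crs` and `GeoSeries.total_bounds` already expose the geometry types, coordinate system and bounds.

```python
import math


def bounding_box(points):
    """Return (min_lat, min_lon, max_lat, max_lon) for a non-empty list of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return min(lats), min(lons), max(lats), max(lons)


def occupied_cells(points, cell_deg=1.0):
    """Set of grid cells (cell_deg x cell_deg degrees) that contain at least one point."""
    return {(math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in points}


def coverage_overlap(a, b, cell_deg=1.0):
    """Jaccard similarity (0..1) of the grid cells covered by two point datasets."""
    cells_a, cells_b = occupied_cells(a, cell_deg), occupied_cells(b, cell_deg)
    union = cells_a | cells_b
    if not union:
        return 1.0  # two empty datasets are treated as having identical coverage
    return len(cells_a & cells_b) / len(union)


a = [(0.5, 0.5), (1.5, 1.5)]
b = [(0.2, 0.7), (5.0, 5.0)]
print(bounding_box(a))         # (0.5, 0.5, 1.5, 1.5)
print(coverage_overlap(a, b))  # 0.3333333333333333
```

A simple min/max bounding box is misleading for data that crosses the ±180° meridian, so a real implementation would need to handle that case separately.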

Projects
None yet
Development

No branches or pull requests

4 participants