Pandas DataFrame is reported as array, but it should be a map as it has keys method!
#353
Comments
In Python you also cannot access it via indices. I guess we should just not report it as an array, but it's not possible for us. Python doesn't really have a way to tell you what kind of indices any one object supports until you try.
The current logic follows https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes. We say anything that is a
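To illustrate what the linked abstract base classes can and cannot tell you, here is a minimal sketch. The `Table` class is a hypothetical stand-in, not pandas and not GraalPy's actual check. It shows that `Sequence` and `Mapping` only recognise types that inherit from them or are registered with them, while `Sized` is satisfied by the mere presence of `__len__`.

```python
from collections.abc import Mapping, Sequence, Sized


class Table:
    """Hypothetical container with dict-like and list-like methods."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        return self._columns[key]

    def __len__(self):
        return len(self._columns)

    def keys(self):
        return self._columns.keys()


table = Table({"a": [1, 2], "b": [3, 4]})

# Built-in types are registered with the ABCs.
list_is_sequence = isinstance([], Sequence)
dict_is_mapping = isinstance({}, Mapping)

# Sequence and Mapping are not structural: having the methods is not enough.
table_is_sequence = isinstance(table, Sequence)
table_is_mapping = isinstance(table, Mapping)

# Sized, by contrast, is structural and only looks for __len__.
table_is_sized = isinstance(table, Sized)
```

So an ABC check alone classifies such an object as neither a sequence nor a mapping, and any looser rule has to fall back on which methods happen to exist.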
The Truffle So the Pandas DataFrame reports that it Do you think it would be possible to somehow make the logic for discovering if something is an array not make the DataFrame an array? It seems that from polyglot perspective that is only causing trouble, like in this bug report. Interestingly, looking at Python docs, the
How do you check that the Pandas DataFrame is a sequence? I think checking just for the existence of
We can only check for the existence of slots though. For native types like dataframes, we can implement a different check where we go to the native side, because on the native side there are two different function pointers for mapping subscript and sequence subscript. But on the Python side, both are simply called
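A small sketch of why a slot-presence check is ambiguous on the Python side. In CPython's C API the sequence and mapping subscript slots are `sq_item` and `mp_subscript`, but both surface in Python as `__getitem__`. The helper `supports_subscript` is a hypothetical name used only for illustration.

```python
# Both built-in containers expose the same Python-level method name.
list_has_getitem = hasattr(list, "__getitem__")
dict_has_getitem = hasattr(dict, "__getitem__")


def supports_subscript(obj):
    """Slot-presence check: cannot say whether indices are positions or keys."""
    return hasattr(type(obj), "__getitem__")


values = [10, 20]
mapping = {"x": 10}

# The check gives the same answer for a list and a dict.
same_answer = supports_subscript(values) == supports_subscript(mapping)

# The difference only shows up once a concrete index is tried.
try:
    mapping[0]
    dict_accepts_zero = True
except KeyError:
    dict_accepts_zero = False
```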
However, this is a bit more work and will not happen in the next week or two. For now, you will have to work around it and use
But
It doesn't matter what slots or methods we check. Whether a dataframe can or cannot be indexed by integers cannot be determined by the type. Observe:
There's no way to tell those two cases apart until you try to access the item. And we cannot do that in
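The two cases can be sketched without pandas. `LabelTable` below is a hypothetical stand-in that, like a DataFrame, looks items up by column label. Two instances of the same type behave differently for the index `0`, depending only on the data they hold.

```python
class LabelTable:
    """Hypothetical stand-in for a column-labelled table (not pandas itself)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, label):
        # Lookup is by column label, whatever type the label has.
        return self._columns[label]


int_labelled = LabelTable({0: [1, 2], 1: [3, 4]})
str_labelled = LabelTable({"a": [1, 2], "b": [3, 4]})

# Same type and same methods, so no type-level check can separate them.
same_type = type(int_labelled) is type(str_labelled)

# Works, because 0 happens to be a column label here.
first = int_labelled[0]

# Fails, because no column is labelled 0.
try:
    str_labelled[0]
    str_accepts_zero = True
except KeyError:
    str_accepts_zero = False
```

With real DataFrames the assumption is the same: `df[0]` succeeds only when a column is labelled `0` and raises `KeyError` otherwise.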
The fact that something can accidentally be indexed by integers does not make it
... no need for that. Just claim Looks like the
By that logic, very few things can then be defined as having array elements in Python ;) Also, hasHashEntries means we can iterate the keys and values. What would you like the keys to be? The columns or the rows?
A DataFrame is not a hash, nor is it an array. The correct fix in this case would be to support neither. The correct user code in this case will be to use the pandas specific APIs directly. We don't have a DataFrame abstraction in interop so far :/
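One way user code can sidestep the ambiguity is to copy the data into a plain dict of lists on the Python side before handing it to the host language. This is only a sketch: `as_plain_mapping` is a hypothetical helper, and it assumes that `table.keys()` yields unique column labels and that `table[label]` yields an iterable of that column's values, as a DataFrame does.

```python
def as_plain_mapping(table):
    """Copy a column-labelled table into a plain dict of lists.

    Hypothetical helper: assumes `table.keys()` yields column labels and
    `table[label]` yields an iterable of that column's values.
    """
    return {label: list(table[label]) for label in table.keys()}


# A dict of lists stands in for a DataFrame so the sketch runs without pandas.
columns = {"a": [1, 2], "b": [3, 4]}
plain = as_plain_mapping(columns)
```

The result is an ordinary `dict`, so there is no question about whether it has hash entries or array elements, at the cost of copying the data.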
Python is duck-typed. We don't have anything better to rely on than presence of methods.
I think we can do this. @timfel
I think this would better be resolved using the mechanism outlined in #354. We would by default claim that a dataframe is neither an array, nor a hash. And you can then implement your own interop behavior for dataframes, i.e. define that it should behave as a hash.
Steps to reproduce:
Create a Pandas.java file.
The Pandas DataFrame can be printed in Python and is recognized as an array, but accessing its elements doesn't work and yields unexpected exceptions.