Radviz error from DataFrame which doesn't have sequential index #1285

Open
KimByoungmo opened this issue Oct 19, 2022 · 3 comments

Comments

@KimByoungmo

KimByoungmo commented Oct 19, 2022

Describe the bug
I drew a RadViz for the DataFrame shown under "Dataset" below, and the error shown under "error message" occurred.
I think this is caused by the following code in radviz.py:
[image: radviz.py]
y[i] should be changed to y.iloc[i].

To Reproduce

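import matplotlib.pyplot as plt
from yellowbrick.features import RadViz

# train_X / train_y: pandas data whose index does not start at 0 (see Dataset)
fig, ax = plt.subplots()
rad = RadViz(ax=ax,
             classes=["not diabete", "diabete"])
rad.fit(train_X, train_y)
rad.show()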

error message
KeyError Traceback (most recent call last)
File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\indexes\base.py:3800, in Index.get_loc(self, key, method, tolerance)
3799 try:
-> 3800 return self._engine.get_loc(casted_key)
3801 except KeyError as err:

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\_libs\index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\_libs\index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:2263, in pandas._libs.hashtable.Int64HashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:2273, in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
Cell In [8], line 7
1 fig,ax = plt.subplots()
2 rad = RadViz(ax =ax,
3 classes=["not diabete","diabete"],
4
5 )
----> 7 rad.fit(train_X,train_y)
8 rad.show()

File c:\Users\ksd20\Python\Python39\lib\site-packages\yellowbrick\features\radviz.py:159, in RadialVisualizer.fit(self, X, y, **kwargs)
137 """
138 The fit method is the primary drawing input for the
139 visualization since it has both the X and y data required for the
(...)
156 Returns the instance of the transformer/visualizer
157 """
158 super(RadialVisualizer, self).fit(X, y)
--> 159 self.draw(X, y, **kwargs)
160 return self

File c:\Users\ksd20\Python\Python39\lib\site-packages\yellowbrick\features\radviz.py:200, in RadialVisualizer.draw(self, X, y, **kwargs)
198 row_ = np.repeat(np.expand_dims(row, axis=1), 2, axis=1)
199 xy = (s * row_).sum(axis=0) / row.sum()
--> 200 label = self._label_encoder[y[i]]
202 to_plot[label][0].append(xy[0])
203 to_plot[label][1].append(xy[1])

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\series.py:982, in Series.__getitem__(self, key)
979 return self._values[key]
981 elif key_is_scalar:
--> 982 return self._get_value(key)
984 if is_hashable(key):
985 # Otherwise index.get_value will raise InvalidIndexError
986 try:
987 # For labels that don't resolve as scalars like tuples and frozensets

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\series.py:1092, in Series._get_value(self, label, takeable)
1089 return self._values[label]
1091 # Similar to Index.get_value, but we do not fall back to positional
-> 1092 loc = self.index.get_loc(label)
1093 return self.index._get_values_for_loc(self, loc, label)

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\indexes\base.py:3802, in Index.get_loc(self, key, method, tolerance)
3800 return self._engine.get_loc(casted_key)
3801 except KeyError as err:
...
3805 # InvalidIndexError. Otherwise we fall through and re-raise
3806 # the TypeError.
3807 self._check_indexing_error(key)

KeyError: 0

Dataset
[image: the dataset used]
The index does not start at 0 and is not sequential.
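
Until the visualizer handles such an index itself, one possible user-side workaround is to drop the index labels before calling fit. The sketch below uses small made-up stand-ins for train_X and train_y (the column names and values are hypothetical, not taken from the report). Whether this fully avoids the error in Yellowbrick 1.5 is an assumption based on the traceback, which fails at y[i].

```python
import pandas as pd

# Hypothetical stand-ins for the train_X / train_y in the report:
# the index neither starts at 0 nor is sequential.
train_X = pd.DataFrame(
    {"glucose": [148.0, 85.0, 183.0], "bmi": [33.6, 26.6, 23.3]},
    index=[5, 9, 12],
)
train_y = pd.Series([1, 0, 1], index=[5, 9, 12])

# Option 1: renumber the rows 0..n-1, discarding the old labels.
X_reset = train_X.reset_index(drop=True)
y_reset = train_y.reset_index(drop=True)

# Option 2: hand over plain NumPy arrays, which are always positional.
X_values = train_X.to_numpy()
y_values = train_y.to_numpy()
```

Either pair (X_reset, y_reset) or (X_values, y_values) could then be passed to rad.fit in place of the original objects.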

Desktop (please complete the following information):

  • OS: [Windows 10]
  • Python Version [3.9]
  • Yellowbrick Version [1.5]


@rebeccabilbro
Member

Hello @KimByoungmo and thank you for using Yellowbrick!

This is an issue that has been noted in the past (#1180), and we can definitely appreciate your use case. However, Yellowbrick is designed to work with or without pandas, treating X and y as numpy arrays. Because of that, we cannot simply change y[i] to y.iloc[i], as that would break Radviz's functionality for all users who are not passing in a dataframe.
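
The trade-off described here can be illustrated with a small standalone sketch. It does not use Yellowbrick; the Series and array below are made-up examples chosen only to show how y[i] and y.iloc[i] behave on each type.

```python
import numpy as np
import pandas as pd

# A target Series whose index neither starts at 0 nor is sequential,
# as is common after a train/test split or row filtering.
y_series = pd.Series([0, 1, 1], index=[5, 9, 12])
y_array = np.array([0, 1, 1])

# With an integer index, y_series[0] is a label lookup.
# There is no label 0, so pandas raises KeyError.
try:
    y_series[0]
    series_lookup = "ok"
except KeyError:
    series_lookup = "KeyError"

# Positional lookup works on the Series...
first_by_position = y_series.iloc[0]

# ...but NumPy arrays have no .iloc attribute, so y.iloc[i] would fail for them.
array_has_iloc = hasattr(y_array, "iloc")
```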

That said, if you have any ideas about how to change the functionality so that Yellowbrick works more easily with both pandas and non-pandas data, we're all ears! Would you be interested in opening up a PR with some exploratory code to handle non-continuously indexed dataframes that passes both our pandas and our numpy integration tests?

@KimByoungmo
Author

KimByoungmo commented Oct 20, 2022

@rebeccabilbro

I think it would be better to use the "isinstance" function.
For instance, refer to the code below:

if isinstance(my_object, (pd.Series, pd.DataFrame)):
    value = my_object.iloc[i]  # positional access, ignores index labels
else:
    value = my_object[i]  # NumPy arrays and lists are already positional

@rebeccabilbro
Member

@KimByoungmo looks like you're on the right track (though I think you'll also want an isinstance(my_object, numpy.ndarray) in there too, right?)
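
As one possible direction, and not the approach Yellowbrick actually uses, the type checks could be avoided by converting the target to a NumPy array once, before the loop that indexes it. The helper name as_positional below is hypothetical. The sketch assumes y is one-dimensional and that copying or viewing it as an array is acceptable.

```python
import numpy as np
import pandas as pd


def as_positional(y):
    """Hypothetical helper: return y as a NumPy array so y[i] is positional.

    np.asarray accepts pandas Series, NumPy arrays, and plain lists,
    so no isinstance branching is needed.
    """
    return np.asarray(y)


# The same positional lookup now works for all three input types.
from_series = as_positional(pd.Series(["a", "b", "c"], index=[5, 9, 12]))
from_array = as_positional(np.array(["a", "b", "c"]))
from_list = as_positional(["a", "b", "c"])

first_items = [from_series[0], from_array[0], from_list[0]]
```

The design choice here is to normalize the input once rather than branch at every access, which keeps the indexing code identical for pandas and non-pandas callers.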

We would welcome looking at a PR from you if you are open to contributing to Yellowbrick. We are a small team of unpaid volunteers, so we depend a lot on the contributions of our users to expand/change current functionality. You can check out our contributor's guide to see how to get started!
