Radviz error from DataFrame which doesn't have a sequential index #1285

KimByoungmo · 2022-10-19T13:51:46Z

Describe the bug
I drew a RadViz plot for the DataFrame described in the Dataset section, and fit() raised the KeyError shown in the traceback that follows.
I think the cause is this line in radviz.py:

label = self._label_encoder[y[i]]

y[i] should be changed to y.iloc[i]
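The difference between the two lookups can be seen in isolation. This is a minimal sketch with a made-up Series (the values and index labels are illustrative and not taken from the dataset):

```python
import pandas as pd

# Hypothetical target column whose index neither starts at 0 nor is
# sequential, e.g. what is left after filtering rows or splitting data.
y = pd.Series([0, 1, 1], index=[5, 9, 12])

try:
    y[0]  # integer key on an integer index is a label lookup; no label 0
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .iloc is purely positional, so it ignores the index labels.
first_by_position = y.iloc[0]
```

Here `y[0]` raises KeyError, while `y.iloc[0]` returns the first element regardless of its label.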

To Reproduce

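import matplotlib.pyplot as plt
from yellowbrick.features import RadViz

# train_X / train_y: pandas objects whose index is not 0..n-1 (see Dataset)
fig, ax = plt.subplots()
rad = RadViz(ax=ax, classes=["not diabete", "diabete"])
rad.fit(train_X, train_y)
rad.show()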

error message
KeyError Traceback (most recent call last)
File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\indexes\base.py:3800, in Index.get_loc(self, key, method, tolerance)
3799 try:
-> 3800 return self._engine.get_loc(casted_key)
3801 except KeyError as err:

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\_libs\index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\_libs\index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:2263, in pandas._libs.hashtable.Int64HashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:2273, in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
Cell In [8], line 7
1 fig,ax = plt.subplots()
2 rad = RadViz(ax =ax,
3 classes=["not diabete","diabete"],
4
5 )
----> 7 rad.fit(train_X,train_y)
8 rad.show()

File c:\Users\ksd20\Python\Python39\lib\site-packages\yellowbrick\features\radviz.py:159, in RadialVisualizer.fit(self, X, y, **kwargs)
137 """
138 The fit method is the primary drawing input for the
139 visualization since it has both the X and y data required for the
(...)
156 Returns the instance of the transformer/visualizer
157 """
158 super(RadialVisualizer, self).fit(X, y)
--> 159 self.draw(X, y, **kwargs)
160 return self

File c:\Users\ksd20\Python\Python39\lib\site-packages\yellowbrick\features\radviz.py:200, in RadialVisualizer.draw(self, X, y, **kwargs)
198 row_ = np.repeat(np.expand_dims(row, axis=1), 2, axis=1)
199 xy = (s * row_).sum(axis=0) / row.sum()
--> 200 label = self._label_encoder[y[i]]
202 to_plot[label][0].append(xy[0])
203 to_plot[label][1].append(xy[1])

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\series.py:982, in Series.__getitem__(self, key)
979 return self._values[key]
981 elif key_is_scalar:
--> 982 return self._get_value(key)
984 if is_hashable(key):
985 # Otherwise index.get_value will raise InvalidIndexError
986 try:
987 # For labels that don't resolve as scalars like tuples and frozensets

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\series.py:1092, in Series._get_value(self, label, takeable)
1089 return self._values[label]
1091 # Similar to Index.get_value, but we do not fall back to positional
-> 1092 loc = self.index.get_loc(label)
1093 return self.index._get_values_for_loc(self, loc, label)

File c:\Users\ksd20\Python\Python39\lib\site-packages\pandas\core\indexes\base.py:3802, in Index.get_loc(self, key, method, tolerance)
3800 return self._engine.get_loc(casted_key)
3801 except KeyError as err:
...
3805 # InvalidIndexError. Otherwise we fall through and re-raise
3806 # the TypeError.
3807 self._check_indexing_error(key)

KeyError: 0

Dataset
[used dataset]

The index does not start at 0 and is not sequential.
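For reference, a dataset with this kind of index and two possible workarounds can be sketched as follows. The column names and values are hypothetical stand-ins for train_X and train_y, and neither workaround has been checked here against Yellowbrick 1.5 itself; they only remove the condition that makes `y[i]` fail.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for train_X / train_y with a non-sequential index.
train_X = pd.DataFrame(
    {"glucose": [85.0, 168.0, 99.0], "bmi": [26.6, 38.0, 25.4]},
    index=[5, 9, 12],
)
train_y = pd.Series([0, 1, 0], index=[5, 9, 12])

# Option 1: rebuild a 0..n-1 index so that label i and position i coincide.
X_reset = train_X.reset_index(drop=True)
y_reset = train_y.reset_index(drop=True)

# Option 2: pass plain NumPy arrays, where y[i] is always positional.
# Note that the DataFrame column names are lost in this form.
X_arr = train_X.to_numpy()
y_arr = train_y.to_numpy()
```

Either pair (`X_reset`, `y_reset`) or (`X_arr`, `y_arr`) could then be passed to `fit` in place of the originals.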

Desktop

OS: Windows 10
Python Version: 3.9
Yellowbrick Version: 1.5


rebeccabilbro · 2022-10-19T14:08:13Z

Hello @KimByoungmo and thank you for using Yellowbrick!

This is an issue that has been noted in the past (#1180), and we can definitely appreciate your use case. However, Yellowbrick is designed to work with or without pandas, treating X and y as numpy arrays. Because of that, we cannot simply change y[i] to y.iloc[i], as that would break Radviz's functionality for all users who are not passing in a dataframe.
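To illustrate the concern, here is a minimal sketch (with a made-up array) of what an unconditional `y.iloc[i]` would do for a user who passes a NumPy array:

```python
import numpy as np

# Hypothetical target passed as a plain NumPy array instead of a Series.
y = np.array([0, 1, 1])

# NumPy arrays have no .iloc accessor, so the proposed change would fail.
has_iloc = hasattr(y, "iloc")

try:
    y.iloc[0]
    iloc_failed = False
except AttributeError:
    iloc_failed = True

# Plain indexing is already positional for arrays.
first_item = y[0]
```

The same AttributeError would occur for a plain Python list, which is why the lookup has to depend on the input type or be normalized first.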

That said, if you have any ideas about how to change the functionality so that Yellowbrick works more easily with both pandas and non-pandas data, we're all ears! Would you be interested in opening up a PR with some exploratory code to handle non-continuously indexed dataframes that passes both our pandas and our numpy integration tests?

KimByoungmo · 2022-10-20T22:38:35Z

@rebeccabilbro

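I think it would be better to use the "isinstance" function.
For instance, something like the code below:

import pandas as pd

def get_item(my_object, i):
    # pandas objects: index by position; anything else: plain indexing
    if isinstance(my_object, (pd.Series, pd.DataFrame)):
        return my_object.iloc[i]
    return my_object[i]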

rebeccabilbro · 2022-10-21T15:40:25Z

@KimByoungmo looks like you're on the right track (though I think you'll also want an isinstance(my_object, numpy.ndarray) check in there too, right?)
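One alternative to type checks, offered only as an exploratory sketch and not as how Yellowbrick handles this, would be to convert y to a NumPy array once so that `y[i]` is positional for every input type. The helper name `as_positional` is hypothetical:

```python
import numpy as np
import pandas as pd


def as_positional(y):
    """Return y as a NumPy array so that y[i] means 'the i-th element'."""
    return np.asarray(y)


# The same lookup on a Series with a non-sequential index, an ndarray,
# and a plain list.
inputs = [
    pd.Series(["a", "b"], index=[7, 3]),
    np.array(["a", "b"]),
    ["a", "b"],
]
first_items = [as_positional(y)[0] for y in inputs]
```

Whether this fits would depend on how the rest of the visualizer uses y, so it would still need to pass both the pandas and the numpy tests.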

We would welcome looking at a PR from you if you are open to contributing to Yellowbrick. We are a small team of unpaid volunteers, so we depend a lot on the contributions of our users to expand/change current functionality. You can check out our contributor's guide to see how to get started!
