New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for issue skipped_columns
not working #15947
#16071
base: master
Are you sure you want to change the base?
Conversation
skipped_columns
not working #15947
@wendycwong @tomasfryda please review this pr. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Devanshusisodiya for spending time on this issue. Unfortunately, your solution doesn't seem to me to be valid. Please don't be discouraged, H2O-3 doesn't have code base that's easy to contribute to.
h2o-py/h2o/frame.py
Outdated
@@ -146,6 +146,12 @@ def _upload_python_object(self, python_obj, destination_frame=None, header=0, se | |||
if not column_names: | |||
column_names = col_header | |||
|
|||
if skipped_columns: | |||
column_names = [column_name for column_name in column_names[:len(column_names)-len(skipped_columns)]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ignores the skipped columns so in some cases the skipped column is present in the resulting frame.
Let's have a frame:
| a | b | c |
+---+---+---+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
and if we have skipped_columns=[0,1]
, then we would get just:
| a |
+---+
| 1 |
| 4 |
but I would expect to get:
| c |
+---+
| 3 |
| 6 |
Also note that it's likely that skipped_columns
should support inputs that are column indices (List[int]
) or column names (List[str]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomasfryda I understand your point, nevertheless i understand the problem and will try to solve it.
Are there any other cases I should be aware of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also make sure you modify the csv writing accordingly, so that only those columns that are not skipped are written. And please write a test for those cases when there's skipped_columns
are at the beginning, the end, and in the middle (and possibly more cases depending on your actual implementation).
And while you're at it, you can also improve the code so that it makes sure that the tmp_path
is removed, e.g.:
tmp_handle, tmp_path = tempfile.mkstemp(suffix=".csv")
try:
.....
finally:
os.remove(tmp_path)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomasfryda I made more changes based on the requirements and the original issue description. Also I think there is no need to make any in change csv writing because it takes the updated column_names
as argument.
Please correct me if I'm wrong
Closes #15947
The changes done in
frame.py
have been validated with the required tests ofH2OFrame
being tested in/h2o-py/tests/testdir_apis/Data_Manipulation/pyunit_h2oH2OFrame.py
skipped_columns
parameter is now functional and all tests are passing.