Implement DataFrameColumn Apply and DropNulls methods #7123
base: main
cc @JakeRadMSFT
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7123      +/-   ##
==========================================
- Coverage   68.66%   68.50%   -0.16%
==========================================
  Files        1262     1263       +1
  Lines      257774   255262    -2512
  Branches    26660    26398     -262
==========================================
- Hits       176991   174862    -2129
+ Misses      73971    73696     -275
+ Partials     6812     6704     -108
Flags with carried forward coverage won't be shown.
@@ -79,6 +79,27 @@ public void Append(string value)
            Length++;
        }

        /// <summary>
        /// Applies a function to all values in the column, that are not null.
        /// </summary>
Why check for null here instead of using IsValid?
IsValid can be used here as well; however, for a StringDataFrame column it doesn't make much sense (IsValid is actually a little slower in this case). Using IsValid instead of checking for null dramatically improves performance for PrimitivesDataFrame columns without nulls (most use cases): such columns use validity buffers, so checking each value for null is very expensive, and the check can be skipped entirely when NullCount is 0.
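To illustrate the reasoning in this reply, here is a minimal, self-contained sketch of the fast path it describes. It is not the code from this PR or from Microsoft.Data.Analysis: the `SketchInt32Column` type, its bitmap layout, and its members are hypothetical stand-ins, assuming only that a primitive column stores values alongside a validity bitmap and tracks a null count.

```csharp
using System;

// Hypothetical stand-in for a primitive column: raw values plus a validity bitmap.
// This is an illustration only, NOT the Microsoft.Data.Analysis implementation.
public sealed class SketchInt32Column
{
    private readonly int[] _values;
    private readonly byte[] _validity; // one bit per row, 1 = value is present

    public long NullCount { get; }
    public int Length => _values.Length;

    public SketchInt32Column(int?[] source)
    {
        _values = new int[source.Length];
        _validity = new byte[(source.Length + 7) / 8];
        long nulls = 0;
        for (int i = 0; i < source.Length; i++)
        {
            if (source[i].HasValue)
            {
                _values[i] = source[i].Value;
                _validity[i / 8] |= (byte)(1 << (i % 8));
            }
            else
            {
                nulls++;
            }
        }
        NullCount = nulls;
    }

    // Reads a single bit from the validity bitmap.
    public bool IsValid(int index) => (_validity[index / 8] & (1 << (index % 8))) != 0;

    public int? this[int index] => IsValid(index) ? _values[index] : (int?)null;

    // Applies func in place to every non-null value.
    public void Apply(Func<int, int> func)
    {
        if (NullCount == 0)
        {
            // Fast path: no nulls, so the validity bitmap is never consulted.
            for (int i = 0; i < _values.Length; i++)
            {
                _values[i] = func(_values[i]);
            }
            return;
        }

        // Slow path: check the validity bit for each row and skip nulls.
        for (int i = 0; i < _values.Length; i++)
        {
            if (IsValid(i))
            {
                _values[i] = func(_values[i]);
            }
        }
    }
}
```

With this sketch, calling `Apply(x => x * 2)` on a column built from `{ 1, null, 3 }` leaves the null in place and yields 2, null, 6, while a column with no nulls takes the fast path and never reads the bitmap. A string column gains little from the same trick if it stores references directly, since a null comparison there is already cheap.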
# Conflicts: # src/Microsoft.Data.Analysis/DataFrameColumns/VBufferDataFrameColumn.cs
Fixes #7107 as was asked in #6144 (comment)
Additionally:
Reasons for refactoring: