Features/1411 implement sketched percentile #1420

Open
wants to merge 25 commits into
base: main

mrfh92
Collaborator

@mrfh92 mrfh92 commented Apr 3, 2024

This solves #1411 by implementing a sketched version of percentile / median and a corresponding option for the RobustScaler in the preprocessing module.

All new features integrate into the existing functionality without changing its behavior, because the current non-sketched version remains the default.
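
To illustrate how an opt-in sketched mode can coexist with an exact default, here is a minimal NumPy sketch. It is not Heat's implementation: the function name `robust_scale` and the parameters `sketched`, `sketch_fraction`, and `seed` are hypothetical, and the example assumes that "sketched" means estimating the quantiles from a uniform random subsample of the rows.

```python
import numpy as np


def robust_scale(x, sketched=False, sketch_fraction=0.1, seed=None):
    """Center each column by its median and scale it by its IQR.

    Hypothetical helper for illustration only (not Heat's RobustScaler).
    With sketched=True, the quantiles are estimated from a random subset
    of the rows instead of the full data.
    """
    if sketched:
        rng = np.random.default_rng(seed)
        n_sample = max(1, int(round(sketch_fraction * x.shape[0])))
        rows = rng.choice(x.shape[0], size=n_sample, replace=False)
        reference = x[rows]
    else:
        # Default path: exact quantiles over all rows, behavior unchanged.
        reference = x
    q25, q50, q75 = np.percentile(reference, [25, 50, 75], axis=0)
    # Note: a column with zero IQR would lead to a division by zero here.
    return (x - q50) / (q75 - q25)


# Synthetic data, used only to exercise both code paths.
data = np.random.default_rng(0).normal(size=(10_000, 3))
scaled_exact = robust_scale(data)
scaled_sketched = robust_scale(data, sketched=True, sketch_fraction=0.1, seed=0)
```

Because `sketched` defaults to `False` in this sketch, callers that do not pass it get the exact result, which mirrors the backward-compatible design described above.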

Performance comparison
on 14 MPI processes (workstation, CPU) using the "asteroids data set" (1317275 data points, 9 features); times and deviations are measured over 10 runs

| | avg time | inaccuracy* |
|---|---|---|
| median main | ~12.3s | |
| median sketched (10% of data) | 0.247s | 0.30% |
| median sketched (1% of data) | 0.01212s | 1.3% |
| median sketched (0.1% of data) | 0.00312s | 2.7% |

*Relative deviation of the sketched median from the true median, computed per feature and averaged over 10 runs; only the value for the feature with the largest deviation is reported.
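
The inaccuracy metric described in the footnote can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the benchmark script used for the numbers above: `sketched_median` and `max_relative_deviation` are hypothetical helpers, the sketch is assumed to be a uniform random subsample of the rows, and the data is synthetic rather than the asteroids data set.

```python
import numpy as np


def sketched_median(x, fraction, rng):
    """Per-feature median estimated from a random fraction of the rows."""
    n_rows = x.shape[0]
    n_sample = max(1, int(round(fraction * n_rows)))
    rows = rng.choice(n_rows, size=n_sample, replace=False)
    return np.median(x[rows], axis=0)


def max_relative_deviation(approx, exact):
    """Largest per-feature relative deviation of approx from exact."""
    return float(np.max(np.abs(approx - exact) / np.abs(exact)))


rng = np.random.default_rng(0)
# Synthetic stand-in with 9 features; values are kept away from zero so
# that the relative deviation is well defined.
data = rng.normal(loc=10.0, scale=2.0, size=(100_000, 9))
exact = np.median(data, axis=0)

for fraction in (0.1, 0.01, 0.001):
    # Average the per-feature deviation over 10 runs, then report the
    # feature with the largest averaged deviation.
    per_feature = np.mean(
        [np.abs(sketched_median(data, fraction, rng) - exact) / np.abs(exact)
         for _ in range(10)],
        axis=0,
    )
    print(f"fraction={fraction}: max avg relative deviation={per_feature.max():.4%}")
```

The printed values depend on the synthetic distribution and are not comparable to the figures in the table; the point is only the shape of the measurement.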

Contributor

github-actions bot commented Apr 3, 2024

Thank you for the PR!

codecov bot commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 91.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 91.86%. Comparing base (bc5ec47) to head (d9723e9).

| Files | Patch % | Lines |
|---|---|---|
| heat/core/statistics.py | 90.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1420      +/-   ##
==========================================
- Coverage   91.87%   91.86%   -0.02%     
==========================================
  Files          80       80              
  Lines       11822    11840      +18     
==========================================
+ Hits        10862    10877      +15     
- Misses        960      963       +3     
| Flag | Coverage Δ | |
|---|---|---|
| unit | 91.86% <91.66%> (-0.02%) | ⬇️ |

Collaborator

@mtar mtar left a comment

Some minor comments on documentation

heat/core/statistics.py (outdated, resolved)
heat/core/statistics.py (outdated, resolved)
heat/core/statistics.py (outdated, resolved)
heat/core/tests/test_statistics.py (outdated, resolved)
mrfh92 and others added 3 commits April 5, 2024 09:40
followed mtar's suggestion

Co-authored-by: Michael Tarnawa <[email protected]>
updated docstring according to review

mrfh92 and others added 4 commits April 5, 2024 09:54
updated docstring according to review
updated docstring according to review

Contributor

@ClaudiaComito ClaudiaComito left a comment

This is a great feature to have, thanks a lot @mrfh92! A few comments below...

heat/core/statistics.py (outdated, resolved)
heat/core/statistics.py (resolved)
heat/core/statistics.py (outdated, resolved)
heat/core/statistics.py (outdated, resolved)
heat/core/statistics.py (outdated, resolved)
heat/core/statistics.py (outdated, resolved)
heat/core/statistics.py (outdated, resolved)

3 participants