Code Refactor: Remove Pandas as a dependency by using the built-in csv module instead #1131

Merged (6 commits) on Jun 6, 2024

Conversation

@dzumii (Contributor) commented May 30, 2024

Description
This refactoring removes Pandas from the tracking functionality and reimplements that functionality with built-in Python libraries.

Changes Made

  • Defined a read_csv function that reads CSV files using the native csv library.
  • Replaced some other pandas data manipulation.
  • Adjusted the stats function to calculate statistics using the statistics library.
  • Adjusted the get_file_size function.
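
As a rough illustration of the first and third items, here is a minimal sketch of reading a CSV and computing summary statistics with only the standard library. The signatures, the `column_stats` helper, and the sample data are hypothetical and are not taken from ersilia/core/tracking.py; the actual implementation in this PR may differ.

```python
import csv
import io
import statistics


def read_csv(source):
    """Read CSV rows into a list of dicts keyed by the header row.

    `source` is any open text file or file-like object (hypothetical signature).
    """
    return list(csv.DictReader(source))


def column_stats(rows, column):
    """Return mean, median and sample standard deviation for a numeric column."""
    # Skip empty or missing cells so float() never sees "" or None.
    values = [float(row[column]) for row in rows if row[column] not in ("", None)]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # statistics.stdev needs at least two data points.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Illustrative in-memory data; a real call would pass an open file instead.
sample = io.StringIO("smiles,score\nCCO,1.0\nCCN,2.0\nCCC,3.0\n")
rows = read_csv(sample)
stats = column_stats(rows, "score")
```

Note that `csv.DictReader` returns every value as a string, so numeric columns have to be converted explicitly, whereas pandas infers column types on load.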

Status

  • The tracking functionality works with the changes.
  • Run time is now slightly shorter.
  • However, the output still differs in places from the output generated by the pandas-based tracking functionality, particularly for the file_size and stats functions.

To Do

  • Refine the check_types function.
  • Refine the get_file_size function.

This is still a work in progress.
Comments and suggestions would be really appreciated.

Related to #1090

Closes #1054

@dzumii dzumii force-pushed the code-refactoring-remove-pandas branch from 5a825d5 to b4d649e Compare June 3, 2024 12:42
@dzumii (Contributor Author) commented Jun 3, 2024

Fixed all inconsistencies. The tracking functionality now works as it did before the pandas dependency was removed.

@DhanshreeA DhanshreeA self-requested a review June 4, 2024 06:41
@DhanshreeA (Member) commented

@dzumii, could you also share before and after results? That is, results from the master branch and from your branch?

@miquelduranfrigola (Member) commented

@DhanshreeA @dzumii Anything you need from me here?

@DhanshreeA (Member) commented

> @DhanshreeA @dzumii Anything you need from me here?

All good on my end.

@dzumii (Contributor Author) commented Jun 4, 2024

@DhanshreeA, I have finally been able to fix the inconsistency in mismatch_type, thanks to @Malikbadmus. It took us some hours to crack.

Here is the comparison between the two metric files when I run
diff -y <txt-file-generated-from-master-before-changes> <txt-file-generated-after-removing-pandas>

I also noticed a warning when I try to close a model after using the track flags.
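
For readers who want to repeat this kind of before-and-after check without the `diff` command, the following is a minimal sketch using the standard-library `difflib` module. The function name, the labels, and the sample lines are hypothetical and do not reflect the real metric files from this PR.

```python
import difflib


def diff_metric_files(before_lines, after_lines):
    """Return unified-diff lines between two lists of text lines."""
    return list(
        difflib.unified_diff(
            before_lines,
            after_lines,
            fromfile="master",      # label only, not a real path
            tofile="no-pandas",     # label only, not a real path
            lineterm="",
        )
    )


# Made-up contents standing in for the two generated metric files.
before = ["mean: 2.0", "file_size: 10"]
after = ["mean: 2.0", "file_size: 12"]

changes = diff_metric_files(before, after)
identical = diff_metric_files(before, before)
```

An empty result means the two files match line for line, which is the outcome the refactor aims for.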

@dzumii dzumii force-pushed the code-refactoring-remove-pandas branch from c744232 to 3973044 Compare June 5, 2024 20:38
@DhanshreeA (Member) commented Jun 6, 2024

> @DhanshreeA, I have finally been able to fix the inconsistency in mismatch_type, thanks to @Malikbadmus. It took us some hours to crack.
>
> Here is the comparison between the two metric files when I run diff -y <txt-file-generated-from-master-before-changes> <txt-file-generated-after-removing-pandas>
>
> I also noticed a warning when I try to close a model after using the track flags.

Hey @dzumii, could you please report this (the warning) in an issue so we can tackle it there?

@DhanshreeA DhanshreeA merged commit d0e0f6c into ersilia-os:master Jun 6, 2024
16 checks passed
@DhanshreeA (Member) commented

LGTM on the PR! Ready to merge.

@dzumii (Contributor Author) commented Jun 6, 2024

> Hey @dzumii, could you please report this (the warning) in an issue so we can tackle it there?

Ok, will do that.

@dzumii dzumii mentioned this pull request Jun 20, 2024
@dzumii dzumii deleted the code-refactoring-remove-pandas branch June 23, 2024 15:02
@dzumii dzumii restored the code-refactoring-remove-pandas branch June 23, 2024 15:02
@dzumii dzumii deleted the code-refactoring-remove-pandas branch June 23, 2024 15:16
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

🐛 Bug: Pandas slows down build time for ersiliaos/base image
3 participants