Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve duplication between RunLogger in ersilia/core/session.py and RunTracker in ersilia/core/tracking.py #1132

Merged
merged 12 commits into from
Jun 12, 2024

Conversation

Malikbadmus
Copy link
Contributor

Description

Resolve duplication between RunLogger in ersilia/core/session.py and RunTracker in ersilia/core/tracking.py. For example, both RunTracker and RunLogger subsample result output when the output dimensions are too many to reduce storage overhead and to not clutter the monitoring dashboard.

Changes Made

  • Runlogger Class merged to the RunTracker classes in tracking.py, creating a new unified class that inherits from both base classes and includes the initialization logic for both.
  • Created a single init method that handles the initialization requirement for tracking and logging.
  • Dropped the sample_df method, and modified modified most of the functions in the RunTracker class.
  • Created a track_run function with clearer method names and simplified parameters, excluding input and result data frames, meta, and metadata from the JSON output.
  • Changed the session file path to be under ersilia_runs.
  • Modified model.py to reflect the changes.

Status

  • The tracking module works with the changes made.
  • The size of the JSON object has been significantly reduced (more than 10 times), and potential issues with large DataFrames have been avoided.
  • The session file is now rerouted to the same directory used by the rest of the logging and tracking files.
  • The memory usage during the model run was also reduced.

Related to #1090

ersilia/core/tracking.py Outdated Show resolved Hide resolved
ersilia/core/tracking.py Outdated Show resolved Hide resolved
ersilia/core/tracking.py Outdated Show resolved Hide resolved
ersilia/core/tracking.py Outdated Show resolved Hide resolved
ersilia/core/tracking.py Outdated Show resolved Hide resolved
ersilia/core/tracking.py Outdated Show resolved Hide resolved
ersilia/core/tracking.py Outdated Show resolved Hide resolved
ersilia/core/tracking.py Outdated Show resolved Hide resolved
ersilia/core/tracking.py Outdated Show resolved Hide resolved
@DhanshreeA
Copy link
Member

DhanshreeA commented Jun 4, 2024

Hey @Malikbadmus, good work so far and thank you for your efforts and the detailed explanations here! I have a few questions.

You mention:

The size of the JSON object has been significantly reduced (more than 10 times), and potential issues with large DataFrames have been avoided.

  1. Which JSON object? The one containing all the tracking information?
  2. What potential issues with large dataframes?

The memory usage during the model run was also reduced.
I understand that this might be happening because we have a single class now instead of two classes, however:

  1. This is not the memory usage we are interested in anyway. We are interested in the memory used by the actual model however this code is currently getting the memory used by the ersilia CLI python process running the model. However, I agree the code could benefit from optimization, which brings me to my second point about this...
  2. How much was the reduction? Do you have estimates/results on that? If yes, could you please share those here.

The session file is now rerouted to the same directory used by the rest of the logging and tracking files.
This is useful, thank you!

@Malikbadmus
Copy link
Contributor Author

Malikbadmus commented Jun 4, 2024

Hey @Malikbadmus, good work so far and thank you for your efforts and the detailed explanations here! I have a few questions.

You mention:

The size of the JSON object has been significantly reduced (more than 10 times), and potential issues with large DataFrames have been avoided.

  1. Which JSON object? The one containing all the tracking information?
  2. What potential issues with large dataframes?

The memory usage during the model run was also reduced.
I understand that this might be happening because we have a single class now instead of two classes, however:

  1. This is not the memory usage we are interested in anyway. We are interested in the memory used by the actual model however this code is currently getting the memory used by the ersilia CLI python process running the model. However, I agree the code could benefit from optimization, which brings me to my second point about this...
  2. How much was the reduction? Do you have estimates/results on that? If yes, could you please share those here?

The session file is now rerouted to the same directory used by the rest of the logging and tracking files.
This is useful, thank you!

@DhanshreeA , Many thanks for the review.

  1. Yes, I meant the result of the tracking run, the variable json_object.
  2. Since the json_object contains the full Input and result data frame, which can be large depending on the input file we use, excluding it should reduce memory consumption and improve performance.
  3. The peak memory used before the refactor was 4.01 MB, and after 1.135 MB.

2024-05-2818-33-15.txt

current_session.txt

@DhanshreeA
Copy link
Member

2. Since the json_object contains the full Input and result data frame, which can be large depending on the input file we use, excluding it should reduce memory consumption and improve performance.

Thanks @Malikbadmus, you're right this is absolutely not required, your suggestion is helpful.

@Malikbadmus
Copy link
Contributor Author

Malikbadmus commented Jun 4, 2024

@DhanshreeA , I have implemented the suggested changes and for the global variable, the reason I made use of it is here, though I have included an alternative in the new PR, I'll welcome your suggestion on ways of improving this solution.

@Malikbadmus
Copy link
Contributor Author

with @Inyrkz assistance , a better alternative to the Global variable has been implemented.

@Malikbadmus
Copy link
Contributor Author

@DhanshreeA , I have successfully rebased the pull request to work with the latest changes made to the master branch and resolve the conflict that emerged.

ersilia/core/tracking.py Outdated Show resolved Hide resolved
ersilia/core/tracking.py Outdated Show resolved Hide resolved
@DhanshreeA
Copy link
Member

DhanshreeA commented Jun 10, 2024

LGTM @Malikbadmus, just few more very small changes and we can merge this

ersilia/core/tracking.py Outdated Show resolved Hide resolved
@Malikbadmus
Copy link
Contributor Author

LGTM @Malikbadmus, just few more very small changes and we can merge this

@DhanshreeA , I have modified the PR to reflect these changes.

ersilia/core/tracking.py Outdated Show resolved Hide resolved
ersilia/core/tracking.py Outdated Show resolved Hide resolved
@Malikbadmus
Copy link
Contributor Author

Malikbadmus commented Jun 10, 2024

@DhanshreeA , i can understand what you mean, we just want the write_persistent_file and close_persistent_file to perform writing content and close file respectively, not creating a new file anytime the fuctions are called.

I've modified the tracking.py to reflect these changes, I've added an error handling in both functions to raise a Filenotfound error if check_file_exist returns false.

@Malikbadmus
Copy link
Contributor Author

Malikbadmus commented Jun 10, 2024

The new error handling I added to the function close_persistent_file is being thrown up, this is because if a model is run without the tracking flag( Like the GitHub workflow action does), a persistent file is not going to be created, therefore the function check_file_exist will return false.

I have added an if statement to address this in close.py

cleaning up import statement and comment

Added get_persistent_file_path function to centralize file path creation.
@DhanshreeA
Copy link
Member

LGTM @Malikbadmus merging this!

@DhanshreeA DhanshreeA merged commit 626126e into ersilia-os:master Jun 12, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants