Resolve duplication between RunLogger in ersilia/core/session.py and RunTracker in ersilia/core/tracking.py #1132

Malikbadmus · 2024-05-31T10:07:57Z

Description

Resolve duplication between RunLogger in ersilia/core/session.py and RunTracker in ersilia/core/tracking.py. For example, both RunTracker and RunLogger subsample the result output when it has too many dimensions, to reduce storage overhead and avoid cluttering the monitoring dashboard.
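The subsampling idea can be illustrated with a short Python sketch. It is not the code from RunTracker or RunLogger. The helper name subsample_columns and the max_columns cap are hypothetical, and the sketch assumes the output is described by a list of column names.

```python
from typing import List, Sequence


def subsample_columns(columns: Sequence[str], max_columns: int = 10) -> List[str]:
    """Return at most max_columns names, spread evenly across the input.

    Hypothetical helper for illustration; not part of Ersilia.
    """
    if max_columns <= 0:
        raise ValueError("max_columns must be positive")
    n = len(columns)
    if n <= max_columns:
        # Small outputs are kept whole.
        return list(columns)
    # step > 1 here, so the chosen indices are distinct and all below n.
    step = n / max_columns
    return [columns[int(i * step)] for i in range(max_columns)]


# Example: a 100-dimensional output is reduced to 10 evenly spaced columns.
wide = [f"dim_{i}" for i in range(100)]
print(subsample_columns(wide, 10))  # dim_0, dim_10, ..., dim_90
```

Evenly spaced selection keeps the sample representative of the whole output instead of only its first columns. A random sample with a fixed seed would be an equally reasonable choice.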

Changes Made

Merged the RunLogger class into the RunTracker class in tracking.py, creating a unified class that inherits from both base classes and includes the initialization logic of both.
Created a single __init__ method that handles the initialization requirements for both tracking and logging.
Dropped the sample_df method and modified most of the functions in the RunTracker class.
Created a track_run function with clearer method names and simplified parameters; it excludes the input and result DataFrames, meta, and metadata from the JSON output.
Changed the session file path to be under ersilia_runs.
Modified model.py to reflect the changes.
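The merged design described above (one class, two base classes, one initializer) can be sketched as follows. TrackingBase, LoggingBase and UnifiedRunTracker are hypothetical stand-ins, not the classes in ersilia/core/tracking.py. The sketch shows only how a single __init__ can run the setup of both parents.

```python
import logging
import time


class TrackingBase:
    """Hypothetical stand-in for the tracking side (timing, model id)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id
        self.start_time = time.time()


class LoggingBase:
    """Hypothetical stand-in for the logging side."""

    def __init__(self, logger_name: str) -> None:
        self.logger = logging.getLogger(logger_name)


class UnifiedRunTracker(TrackingBase, LoggingBase):
    """Single class carrying both responsibilities.

    The base initializers take different arguments and do not call
    super().__init__, so the one __init__ here invokes each explicitly.
    """

    def __init__(self, model_id: str) -> None:
        TrackingBase.__init__(self, model_id)
        LoggingBase.__init__(self, logger_name=f"run.{model_id}")

    def elapsed_seconds(self) -> float:
        return time.time() - self.start_time

    def log_status(self, status: str) -> None:
        # Uses state set up by both parents: model_id/start_time and logger.
        self.logger.info("%s: %s after %.2fs", self.model_id, status, self.elapsed_seconds())


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    tracker = UnifiedRunTracker("demo-model")
    tracker.log_status("finished")
```

Calling each parent explicitly is the simplest option when the bases are not designed for cooperative multiple inheritance. If they were, a single super().__init__(...) chain with keyword arguments would be the idiomatic alternative.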

Status

The tracking module works with the changes made.
The size of the JSON object has been significantly reduced (by more than a factor of 10), and potential issues with large DataFrames have been avoided.
The session file is now rerouted to the same directory used by the rest of the logging and tracking files.
The memory usage during the model run was also reduced.

Related to #1090

ersilia/core/tracking.py

DhanshreeA · 2024-06-04T08:08:53Z

Hey @Malikbadmus, good work so far and thank you for your efforts and the detailed explanations here! I have a few questions.

You mention:

The size of the JSON object has been significantly reduced (more than 10 times), and potential issues with large DataFrames have been avoided.

Which JSON object? The one containing all the tracking information?
What potential issues with large dataframes?

The memory usage during the model run was also reduced.
I understand that this might be happening because we have a single class now instead of two classes, however:

This is not the memory usage we are interested in anyway. We are interested in the memory used by the actual model, but this code currently measures the memory used by the Ersilia CLI Python process that runs the model. That said, I agree the code could benefit from optimization, which brings me to my second point...
How much was the reduction? Do you have estimates/results on that? If yes, could you please share those here.
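To illustrate the difference between the CLI's memory and the model's memory, here is a minimal Unix-only sketch using Python's standard resource module. It assumes the model runs as a direct child process of the CLI, which may not hold in Ersilia: a model served from a separate server or container would not be counted. The child command is a placeholder that only allocates memory, and run_and_measure is a hypothetical name.

```python
import resource
import subprocess
import sys
from typing import Sequence, Tuple


def run_and_measure(cmd: Sequence[str]) -> Tuple[int, int]:
    """Run cmd as a child process and return (self_peak, children_peak).

    Both values come from ru_maxrss. Unix-only: the resource module is not
    available on Windows.
    """
    # subprocess.run waits for the child, so it is counted in RUSAGE_CHILDREN.
    subprocess.run(list(cmd), check=True)
    self_peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    children_peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return self_peak, children_peak


if __name__ == "__main__":
    # Placeholder "model": a child Python process that allocates about 50 MB.
    child = [sys.executable, "-c", "buf = bytearray(50_000_000)"]
    cli_peak, model_peak = run_and_measure(child)
    print(f"CLI process peak: {cli_peak}, child (model) peak: {model_peak}")
```

On Linux, ru_maxrss is reported in kilobytes, and for RUSAGE_CHILDREN it is the peak of the largest waited-for child rather than a sum. macOS reports ru_maxrss in bytes.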

The session file is now rerouted to the same directory used by the rest of the logging and tracking files.
This is useful, thank you!

Malikbadmus · 2024-06-04T09:11:31Z

@DhanshreeA, many thanks for the review.

Yes, I meant the result of the tracking run, the variable json_object.
Since json_object contains the full input and result DataFrames, which can be large depending on the input file used, excluding them should reduce memory consumption and improve performance.
The peak memory usage was 4.01 MB before the refactor and 1.135 MB after.
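The change to json_object can be sketched as a filter applied before serialization, together with one way a peak-memory figure could be measured using the standard tracemalloc module. The key names in HEAVY_KEYS and the helper names are hypothetical. This is not necessarily how the 4.01 MB and 1.135 MB figures were obtained.

```python
import json
import tracemalloc
from typing import Any, Callable, Dict, Iterable

# Hypothetical names for the bulky entries dropped from the tracking output.
HEAVY_KEYS = ("input_dataframe", "result_dataframe", "meta", "metadata")


def build_tracking_json(run_data: Dict[str, Any], exclude: Iterable[str] = HEAVY_KEYS) -> str:
    """Serialize run information, leaving out the excluded keys."""
    excluded = set(exclude)
    summary = {key: value for key, value in run_data.items() if key not in excluded}
    return json.dumps(summary)


def peak_traced_bytes(fn: Callable[..., Any], *args: Any) -> int:
    """Peak Python heap allocation (bytes) traced while fn(*args) runs."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    # Synthetic stand-in for a run with large input/result tables.
    run = {
        "model_id": "demo-model",
        "runtime_seconds": 1.5,
        "input_dataframe": [{"input": "C" * 50} for _ in range(1000)],
        "result_dataframe": [[0.1] * 100 for _ in range(1000)],
    }
    print("full JSON peak bytes:", peak_traced_bytes(build_tracking_json, run, ()))
    print("trimmed JSON peak bytes:", peak_traced_bytes(build_tracking_json, run))
```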

2024-05-2818-33-15.txt

current_session.txt

DhanshreeA · 2024-06-04T12:11:32Z

2. Since the json_object contains the full Input and result data frame, which can be large depending on the input file we use, excluding it should reduce memory consumption and improve performance.

Thanks @Malikbadmus, you're right: this is absolutely not required. Your suggestion is helpful.

Malikbadmus · 2024-06-04T14:41:00Z

@DhanshreeA, I have implemented the suggested changes. As for the global variable, the reason I used it is explained here; I have also included an alternative in the new PR, and I'd welcome suggestions on how to improve this solution.

Malikbadmus · 2024-06-05T09:43:38Z

With @Inyrkz's assistance, a better alternative to the global variable has been implemented.

Malikbadmus · 2024-06-06T12:13:53Z

@DhanshreeA, I have rebased the pull request onto the latest changes in the master branch and resolved the conflict that emerged.

ersilia/core/tracking.py

DhanshreeA · 2024-06-10T05:21:28Z

LGTM @Malikbadmus, just a few more very small changes and we can merge this.

ersilia/core/tracking.py

Malikbadmus · 2024-06-10T07:00:50Z

@DhanshreeA, I have modified the PR to reflect these changes.

ersilia/core/tracking.py

Malikbadmus · 2024-06-10T12:55:06Z

@DhanshreeA, I understand what you mean: we want write_persistent_file and close_persistent_file to only write content and close the file, respectively, rather than creating a new file every time they are called.

I've modified tracking.py to reflect this and added error handling to both functions so that they raise a FileNotFoundError if check_file_exist returns False.
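The behavior described here, where write and close only operate on an existing file and raise FileNotFoundError otherwise, can be sketched together with the get_persistent_file_path helper added in a later commit. The function names follow the discussion, but the signatures, the base_dir argument, the session file name and the rename-on-close step are assumptions for illustration, not the code in ersilia/core/tracking.py.

```python
import os
import tempfile

# Hypothetical file name used only for this sketch.
SESSION_FILE_NAME = "current_session.txt"


def get_persistent_file_path(base_dir: str, model_id: str) -> str:
    """Build the session file path in one place (layout is hypothetical)."""
    return os.path.join(base_dir, model_id, SESSION_FILE_NAME)


def check_file_exists(base_dir: str, model_id: str) -> bool:
    return os.path.isfile(get_persistent_file_path(base_dir, model_id))


def create_persistent_file(base_dir: str, model_id: str) -> str:
    """The only helper that creates the file, once per tracked run."""
    path = get_persistent_file_path(base_dir, model_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8"):
        pass
    return path


def write_persistent_file(base_dir: str, model_id: str, content: str) -> None:
    """Append to the existing file; never create a new one."""
    path = get_persistent_file_path(base_dir, model_id)
    if not check_file_exists(base_dir, model_id):
        raise FileNotFoundError(f"No session file at {path}")
    # Mode "a" would silently create a missing file, hence the check above.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(content + "\n")


def close_persistent_file(base_dir: str, model_id: str, destination: str) -> str:
    """Finalize by moving the file to destination (hypothetical); never create one."""
    path = get_persistent_file_path(base_dir, model_id)
    if not check_file_exists(base_dir, model_id):
        raise FileNotFoundError(f"No session file at {path}")
    os.replace(path, destination)
    return destination


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as runs_dir:
        create_persistent_file(runs_dir, "demo-model")
        write_persistent_file(runs_dir, "demo-model", "run started")
        final = close_persistent_file(runs_dir, "demo-model", os.path.join(runs_dir, "session.log"))
        with open(final, encoding="utf-8") as handle:
            print(handle.read())
```

Keeping path construction in one function means the write, close and existence checks can never disagree about where the session file lives.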

Malikbadmus · 2024-06-10T15:21:09Z

The new error handling I added to close_persistent_file is being triggered. When a model is run without the tracking flag (as the GitHub workflow action does), no persistent file is created, so check_file_exist returns False.

I have added an if statement in close.py to address this.
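The caller-side guard described for close.py can be sketched like this: a run without the tracking flag never creates a session file, so the close step checks first and skips quietly instead of letting FileNotFoundError propagate. close_command is a hypothetical name, and the stand-in close_persistent_file (which renames the file with a .closed suffix) is an assumption, not Ersilia's implementation.

```python
import os
import tempfile


def close_persistent_file(path: str) -> None:
    """Stand-in helper: raises if there is no session file to close."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No session file at {path}")
    os.replace(path, path + ".closed")  # hypothetical finalization step


def close_command(path: str) -> bool:
    """Close the session only if tracking created one.

    Returns True when a session file was closed and False for untracked runs.
    """
    if os.path.isfile(path):
        close_persistent_file(path)
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as runs_dir:
        session = os.path.join(runs_dir, "current_session.txt")
        print(close_command(session))  # untracked run: nothing to close
        open(session, "w", encoding="utf-8").close()
        print(close_command(session))  # tracked run: file is finalized
```

Keeping the strict check inside the helper and the tolerant check in the caller means misuse inside the tracking code still fails loudly, while untracked runs (such as the GitHub workflow) close cleanly.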

ersilia/core/tracking.py

DhanshreeA · 2024-06-12T11:35:19Z

LGTM @Malikbadmus merging this!

Malikbadmus force-pushed the refactoring branch from d222965 to d780d0e on June 3, 2024 12:37

DhanshreeA requested changes Jun 4, 2024

Malikbadmus force-pushed the refactoring branch from 2741001 to ed6acf9 on June 6, 2024 11:58

DhanshreeA requested changes Jun 10, 2024

ersilia/core/tracking.py (outdated, resolved)

ersilia/core/tracking.py (outdated, resolved)

DhanshreeA requested changes Jun 10, 2024

ersilia/core/tracking.py (outdated, resolved)

Malikbadmus added 10 commits June 10, 2024 07:55

eb0508d: Merged the RunTracker and RunLogger class together
e5e7a7a: Remove duplications in functions
4e5f4ce: Reroute session file to the EOS directory
2051749: modified Track_run function
aabbc11: Adding comments
f27100b: Implemented PR reviews and changes
7e78c2c: Rebase Pr
085bdee: Merged the RunTracker and RunLogger class together
ee352f0: Rebase Pr
dbb27df: Implementing PR review changes

Malikbadmus force-pushed the refactoring branch from caee1cd to dbb27df on June 10, 2024 06:55

DhanshreeA requested changes Jun 10, 2024

ersilia/core/tracking.py (outdated, resolved)

ersilia/core/tracking.py (outdated, resolved)

8edc781: Added check_file_exists method to prevent redundant file creation and ensure error handling

Malikbadmus force-pushed the refactoring branch from b278dd7 to 8edc781 on June 10, 2024 13:47

Inyrkz reviewed Jun 10, 2024

ersilia/core/tracking.py (resolved)

44f405b: Added a condition for persistent tracking file

cleaning up import statement and comment Added get_persistent_file_path function to centralize file path creation.

Malikbadmus force-pushed the refactoring branch from aee3886 to 44f405b on June 10, 2024 23:42

DhanshreeA merged commit 626126e into ersilia-os:master Jun 12, 2024
16 checks passed

DhanshreeA mentioned this pull request Jun 26, 2024

[🐅 Epic]: Splunk Integration into Ersilia #1090 (open, 14 tasks)
