
[BUG] Increasing RAM Usage with DeepHyper After 10k+ Evaluations With 8+ Parameters #188

Open
evvaletov opened this issue May 18, 2023 · 3 comments

evvaletov commented May 18, 2023

Describe the bug

While using the DeepHyper optimization library, I have noticed a concerning trend: the RAM usage of the user program slowly increases, sometimes followed by a rapid increase after some time. This issue has been observed on NERSC's Perlmutter supercomputer as well as on a desktop system, across two categories of optimization problems. Specifically, the issue seems to surface after approximately 10,000 to 40,000 evaluations. The number of optimization parameters typically used in these cases ranges from 8 to 12.

I note that a common feature of the optimizations I am running is that the objective function calls an external program using subprocess. I have observed the issue with different external programs, each of which uses about 2 to 3.5 GB of RAM while running. However, the RAM used by an external program called via subprocess should be freed automatically when it completes, and I don't see any reason why that shouldn't be the case here.
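
For illustration, a minimal sketch of the kind of objective function described here (the executable name, its arguments, and the output parsing are hypothetical):

```python
import subprocess

def run(config):
    # Hypothetical external simulation call; "external_sim" and its flags
    # stand in for the actual program used in the report.
    cmd = [
        "external_sim",
        "--param-a", str(config["a"]),
        "--param-b", str(config["b"]),
    ]
    # subprocess.run blocks until the child process exits; the child's
    # 2-3.5 GB of RAM is returned to the OS at that point, so it should not
    # accumulate in the parent Python process.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # Hypothetical output format: the objective value is on the last line.
    return float(result.stdout.strip().splitlines()[-1])
```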

To Reproduce

Steps to reproduce this behavior:

  1. Run any optimization problem using DeepHyper on NERSC's Perlmutter or on a desktop system, using between 8 and 12 optimization parameters.
  2. Perform 10,000 to 40,000 evaluations.
  3. Observe the RAM usage of the user program over time (one way to log this is sketched after the list).
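
One way to record the RAM usage mentioned in step 3 is sketched below using psutil (the file name and logging interval are illustrative, not taken from the report):

```python
import csv
import time

import psutil

def log_memory(path="memory_log.csv", interval_s=60.0):
    # Write the parent-process RSS and the node-wide available RAM to a CSV
    # at a fixed interval; run this in a background thread alongside the search.
    proc = psutil.Process()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["unix_time", "process_rss_mb", "node_available_mb"])
        while True:
            writer.writerow([
                round(time.time(), 1),
                round(proc.memory_info().rss / 1e6, 1),
                round(psutil.virtual_memory().available / 1e6, 1),
            ])
            f.flush()
            time.sleep(interval_s)
```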

Expected behavior

Ideally, the RAM usage of the user program should remain relatively stable over the course of the optimization. Increases in RAM usage should be temporary and followed by corresponding decreases, reflecting defensive memory management. This should hold true even after numerous evaluations and regardless of the number of optimization parameters being used.

Screenshots

None

Desktop (please complete the following information):

OS: (1) SUSE Linux Enterprise Server 15 SP4 (Perlmutter), (2) Oracle Linux 8
Systems: (1) NERSC Perlmutter, (2) a desktop system with a 32-core CPU and 128 GB RAM
Python version: 3.9.16
DeepHyper Version: 0.5.0

Additional context

This growing RAM usage has the potential to seriously impact the ability to use DeepHyper for long-running optimization problems with more than 8 optimization parameters.

@evvaletov evvaletov added the bug Something isn't working properly, like a bad convergence label May 18, 2023
@evvaletov evvaletov changed the title [BUG] : Increasing RAM Usage with DeepHyper After 10k Evaluations With 8+ Parameters [BUG] Increasing RAM Usage with DeepHyper After 10k Evaluations With 8+ Parameters May 18, 2023
@evvaletov evvaletov changed the title [BUG] Increasing RAM Usage with DeepHyper After 10k Evaluations With 8+ Parameters [BUG] Increasing RAM Usage with DeepHyper After 10k+ Evaluations With 8+ Parameters May 18, 2023

evvaletov commented May 19, 2023

I implemented a conditional pause in a while loop in the evaluation of the objective function, triggered when the available RAM on the node falls under a threshold. With this modification, the amount of available RAM eventually fell under the threshold during an optimization and then started increasing again. I am logging the amount of available RAM vs. time to a log file and can provide one if useful.
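
A sketch of the kind of conditional pause described above, assuming psutil is used to read the node's available RAM (the threshold and polling interval are illustrative):

```python
import time

import psutil

def wait_for_available_ram(threshold_gb=8.0, poll_s=30.0):
    # Block at the start of each objective evaluation until the node-wide
    # available RAM is back above the threshold; the threshold value here
    # is illustrative, not the one used in the report.
    while psutil.virtual_memory().available / 1e9 < threshold_gb:
        time.sleep(poll_s)
```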

@Deathn0t Deathn0t self-assigned this May 19, 2023
Deathn0t (Member) commented

Hi @evvaletov ,

I will try to help with this. Something is not clear to me about the issue raised: is the RAM being consumed by the objective function or by the optimization algorithm? It looks like this is related to how the objective function is defined.

  • Could you give code examples of such a function (something easy to reproduce)?
  • Can you specify which Evaluator backend is being used (threads, process, mpi...)?

evvaletov (Author) commented

Hi @Deathn0t, I am going to put together an easily reproducible MWE and get back to you with it.
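
For reference, a minimal sketch of what such an MWE might look like, assuming the DeepHyper 0.5.x API (HpProblem, CBO, Evaluator.create with the "process" backend); the 10-parameter search space and the placeholder objective are illustrative, and the real objective would call the external program via subprocess as described above:

```python
from deephyper.evaluator import Evaluator
from deephyper.problem import HpProblem
from deephyper.search.hps import CBO

# Illustrative search space in the 8-12 parameter range where the issue occurs.
problem = HpProblem()
for i in range(10):
    problem.add_hyperparameter((0.0, 1.0), f"x{i}")

def run(config):
    # Placeholder objective; in the actual use case this would launch the
    # external program with subprocess and parse its output.
    return -sum(config[f"x{i}"] ** 2 for i in range(10))

if __name__ == "__main__":
    # The backend ("process" here) is one of the things to confirm; the report
    # does not yet specify which Evaluator method is used.
    evaluator = Evaluator.create(run, method="process", method_kwargs={"num_workers": 8})
    search = CBO(problem, evaluator)
    results = search.search(max_evals=40_000)
```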
