Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement MD workflows #1672

Open
wants to merge 95 commits into
base: main
Choose a base branch
from

Conversation

tomdemeyere
Copy link
Contributor

@tomdemeyere tomdemeyere commented Feb 7, 2024

Summary of Changes

Closes #1577.

Before going further a little consultation:

MD users use workflows all the time, just for the NVE example in this commit there is pretty much a thousand ways you can run that; doing a first NVT to relax to the desired temperature, or a first NPT to relax to the desired density, etc...

  1. For example, should we have one NVT workflow with a parameter "thermostat", or one workflow per thermostat?
  2. If you want things to be in recipe.common, atoms will have to be sent in with a calculator already attached, is that what you want?
  3. From what I understand you want to keep things simple (not something like the code here), users then build their workflows to get what they want.
  4. In any case, for now doing these kinds of workflows is difficult, because it is generally hard to transfer results between ASE calculators, which will always happen between runs, this is critical since MD needs momenta and forces to ensure proper continuation. (This is a problem I would like fixed above since it also causes problems when restarting optimization etc)

Checklist

  • I have read the "Guidelines" section of the contributing guide. Don't lie! 😉
  • My PR is on a custom branch and is not named main.
  • I have used black, isort, and ruff as described in the style guide.
  • I have added relevant, comprehensive unit tests.

@buildbot-princeton
Copy link
Collaborator

Can one of the admins verify this patch?

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Feb 7, 2024

@tomdemeyere: Happy to respond to each question but before I do, I just wanted to check that this is the full commit you wanted to share --- just the args/kwargs, right?

(In short, I really would love to have MD flows and am willing to work together to figure out how to get this done the best way possible)

Copy link

codecov bot commented Feb 7, 2024

Codecov Report

Attention: Patch coverage is 98.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.99%. Comparing base (10ee9b0) to head (65bc767).

Files Patch % Lines
src/quacc/runners/ase.py 95.91% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1672      +/-   ##
==========================================
- Coverage   99.02%   98.99%   -0.04%     
==========================================
  Files          82       84       +2     
  Lines        3387     3481      +94     
==========================================
+ Hits         3354     3446      +92     
- Misses         33       35       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tomdemeyere
Copy link
Contributor Author

@tomdemeyere: Happy to respond to each question but before I do, I just wanted to check that this is the full commit you wanted to share --- just the args/kwargs, right?

(In short, I really would love to have MD flows and am willing to work together to figure out how to get this done the best way possible)

Yes, before doing more I wanted to make sure we are on the same page.

I guess my main question here is: how to make these flows common to all codes? Should they call code specific MD jobs? Or, should a calculator be attached to the Atoms object being sent to the MD flow?

Once I know this we can work on details

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Feb 8, 2024

I guess my main question here is: how to make these flows common to all codes? Should they call code specific MD jobs? Or, should a calculator be attached to the Atoms object being sent to the MD flow?

@tomdemeyere: Good question.

Personally, I think what would likely make the most sense is to make a run_md function in quacc.runners.ase that would take an Atoms object with an attached calculator just like the other methods in there. You likely were aware of that already.

From there, we get to the recipes, and my answer is: "it depends." If it is just a simple @job (i.e. not a @flow), then this is no different than a relaxation job --- one would just be calling run_md instead of run_calc or run_ase_opt in a given code's module (e.g. quacc.recipes.emt). Since there wouldn't be much logic beyond defining default arguments and calling run_md, this is fine and doesn't introduce much duplication. I would treat the individual MD jobs in the same way as a relaxation --- we have a runner, which is called in the various recipes where it is appropriate. It does mean that there would be multiple MD modules, but we would need that anyway because every code will have different parameters (defaults) that are needed to properly run an MD calculation.

If we are talking about a @flow where there are multiple Jobs being stitched together, and we anticipate this pattern (i.e. the workflow DAG) to be calculator-agnostic, then we'll put that in common. This is the philosophy behind the existing common recipes. But we would still need individual Jobs for each code. In general, users aren't expected to call the common workflow directly. Aside from usability aspects, there are some other nuances about (de)serialization needed by workflow engines for why I say this, but I won't bore you with those details here.

To get started, I would suggest making quacc.runners.ase.run_md and then a simple demo for a cheap-to-run calculator, like EMT or LJ. That will likely be illustrative and help guide the design further.

I should also note that we can always refactor later if there is something we can refactor. No need to prematurely optimize.

For example, should we have one NVT workflow with a parameter "thermostat", or one workflow per thermostat?

If the workflow logic is staying the same regardless of thermostat, then I would suggest one job/workflow where thermostat would be a keyword argument with a sensible default.

If you want things to be in recipe.common, atoms will have to be sent in with a calculator already attached, is that what you want?

See comment above. If we are talking about a @job, then that would call run_md and be specific for the code since we need to set some sensible default parameters. If it's a @flow pattern, that can go in common but would not require passing in Atoms objects with attached calculators --- it would require passing in one or more Jobs.

From what I understand you want to keep things simple (not something like the code here), users then build their workflows to get what they want.

Hard for me to tell from the commit you shared here, but I am not opposed to complexity. In fact, we should not rely on the users to build something complex. If the @flow pattern is "trivial", like a relax then an MD run, it is debatable if we should make that its own @flow or not (honestly, it can go either way). But if it's more complex, for sure we should.

In any case, for now doing these kinds of workflows is difficult, because it is generally hard to transfer results between ASE calculators, which will always happen between runs, this is critical since MD needs momenta and forces to ensure proper continuation. (This is a problem I would like fixed above since it also causes problems when restarting optimization etc)

This is a good point. However, I think it is solvable. Let's circle back to this when the time is right.

@tomdemeyere
Copy link
Contributor Author

@Andrew-S-Rosen Thanks for all these details. I will come back to it at some point

@tomdemeyere
Copy link
Contributor Author

@Andrew-S-Rosen I think this is now a good point to have your input

Copy link
Member

@Andrew-S-Rosen Andrew-S-Rosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for kicking this off, @tomdemeyere! Really appreciate the contribution!

This is looking good. I have some comments that are mostly related to some minor restructuring.

Starting with this isolated EMT example is very helpful for me in understanding the process and figuring out what should be refactored, so I thank you for doing that.

src/quacc/recipes/emt/core.py Outdated Show resolved Hide resolved

if temperature:
MaxwellBoltzmannDistribution(
atoms, temperature_K=temperature, **initial_temperature_params
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the fixcm and fixrot as dedicated keyword arguments, the **kwargs here can be more clearly described as simply being the **maxwell_boltzmann_distribution_kwargs. That is a mouthful (although nobody is calling the name directly). Could try something like **maxwell_boltzmann_kwargs. Either way, the point is that both the docstring and name would be much clearer this way. At a glance, it's not clear what an **initial_temperature_params is.

src/quacc/recipes/emt/core.py Outdated Show resolved Hide resolved
src/quacc/recipes/emt/core.py Outdated Show resolved Hide resolved
src/quacc/runners/ase.py Outdated Show resolved Hide resolved
src/quacc/schemas/_aliases/ase.py Outdated Show resolved Hide resolved
src/quacc/schemas/_aliases/ase.py Outdated Show resolved Hide resolved
src/quacc/schemas/ase.py Outdated Show resolved Hide resolved
src/quacc/schemas/ase.py Outdated Show resolved Hide resolved
src/quacc/recipes/emt/core.py Outdated Show resolved Hide resolved
@tomdemeyere
Copy link
Contributor Author

tomdemeyere commented Feb 15, 2024

Thank you for kicking this off, @tomdemeyere! Really appreciate the contribution!

This is looking good. I have some comments that are mostly related to some minor restructuring.

Starting with this isolated EMT example is very helpful for me in understanding the process and figuring out what should be refactored, so I thank you for doing that.

I see that your comment is different than the one I received by email from git. But yes, originally my plan was to have these "_base" functions. Either per calculator or in common like you previously said.

After discussing that along with other details I was then planning to introduce recipes for the other ensembles (NVT, NPT).

Didn't see your comments originally (github mobile) will work on it at some point

@Andrew-S-Rosen
Copy link
Member

I see that your comment is different than the one I received by email from git. But yes, originally my plan was to have these "_base" functions. Either per calculator or in common like you previously said.

Apologies about the edits! I was trying to sort out where to place them and was going back-and-forth so decided to just save it for a future discussion to avoid confusion. But you got the right idea anyway.

After discussing that along with other details I was then planning to introduce recipes for the other ensembles (NVT, NPT).

That would be great!

Didn't see your comments originally (github mobile) will work on it at some point

Thank you!

@tomdemeyere
Copy link
Contributor Author

tomdemeyere commented Feb 15, 2024

Few comments:

  • Without run_kwargs there is no way to pass things to dyn.run()
  • There is still a check_convergence, although it does not make much sense for MD. By definition you reached the max number of steps, but there is a case where it makes sense:

We are currently working on a error-free quacc as part of another project. We make sure that quacc always reaches the point of writing to the database: it writes the errors that occurred, along with the available results, if any (the errors might happen at some point during the MD, previous results should be returned, actually in some cases, errors might even be part of your journey (although this might not be what Quacc was originally made for)). Currently if one is running a bunch of simulations, if errors occur, there will be a bunch of directories with no easy way to figure things out.

Because you said you might be interested in error handling I am leaving that there.

  • I changed how temperature/time etc... is written to results, feel free to tell me what you think

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Feb 15, 2024

Without run_kwargs there is no way to pass things to dyn.run()

The fmax and steps are already passed in quacc.runners.ase.run_opt, and .run() doesn't take any other keyword arguments, so we don't need it.

There is still a check_convergence, although it does not make much sense for MD. By definition you reached the max number of steps, but there is a case where it makes sense:

I agree that a check_convergence doesn't make much sense for this job type. We should remove that logic entirely.

We are currently working on a error-free quacc as part of another project. We make sure that quacc always reaches the point of writing to the database: it writes the errors that occurred, along with the available results,

One could catch the typical RuntimeError errors when ASE is called and try to give the user some flexibility via the global settings about how to handle things with the CHECK_CONVERGENCE and/or some other global parameter. But of course, feel free to continue doing what you're doing!

Currently if one is running a bunch of simulations, if errors occur, there will be a bunch of directories with no easy way to figure things out.

Indeed. This is a limitation of Parsl/Dask, which is really best thought of a task manager. Other true workflow tools like Prefect or Covalent or FireWorks have a separate database for the task metadata and would highlight when/where the job fails. But since Parsl does not have a task database of its own, there is no mechanism to easily store that info. The recommended approach would be to launch your calculations in a for loop and log any errors from the Parsl AppFuture. But of course, your mechanism is fine as well.

I changed how temperature/time etc... is written to results, feel free to tell me what you think

I like that a lot more!

@tomdemeyere
Copy link
Contributor Author

Some news on this? Just to know if it is waiting on https://gitlab.com/ase/ase/-/merge_requests/3310

Copy link
Member

@Andrew-S-Rosen Andrew-S-Rosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomdemeyere: Thank you for your enormous amount of patience with this PR. I didn't see your comment from 3 weeks ago, so my apologies about not responding and for keeping you in the dark about this.

The reason I had been reluctant to merge this was that there was a fair bit of refactoring I wanted to do to remove code duplication in the run_md method and also in the schema. This was dependent on some other PRs I had in the pipeline. I was finally able to devote time to do so properly, and now it is much cleaner. 👍 I didn't change any functionality (to the best of my knowledge).

At this point, I am quite pleased with this. Whenever you have a moment, however, I have a few remaining questions outlined below. I am happy to take over of any (small) changes that result from my questions --- but I wanted to get your professional input before attempting to proceed since I am not an MD expert. Finally, before I merge, if you have a moment to look at a similar attempt recently implemented in matcalc, it'd be nice to know if we are missing anything particularly useful that's in there. For instance, there is something about upper triangular cells that is entirely foreign to me.

Comment on lines +52 to +53
Additionally, Quacc uses the keywords `fix_com` and `fix_rot` instead of
`fixcm` and `fixrot` to fix the center of mass and rotation, respectively.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just a matter of renaming and nothing else, let's just stick with the ASE ones. If you approve, let me know and I can revert the change here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure!

Comment on lines +55 to +64
As of March 2024, ASE still accept deprecated keywords such as:
- temperature instead of temperature_K
- pressure instead of pressure_au
- compressibility instead of compressibility_au
- dt instead of timestep
All of these keywords will be accepted by Quacc, but it is recommended to use the
new keywords to avoid confusion. Everything will always be converted to the Quacc
units mentioned above.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need to mention this since ASE itself will raise a deprecation warning if they are used, right? Ultimately, it's on the user to use the current ASE names. Or maybe I'm misunderstanding the context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be deleted yes

Comment on lines +441 to +456
if "temperature" in dynamics_kwargs or "temp" in dynamics_kwargs:
LOGGER.warning(
"In quacc `temperature`, `temp` and `temperature_K` are"
" equivalent and will be interpreted in Kelvin."
)
dynamics_kwargs["temperature_K"] = dynamics_kwargs.pop(
"temperature", None
) or dynamics_kwargs.pop("temp", None)

if "pressure" in dynamics_kwargs:
LOGGER.warning(
"In quacc `pressure` and `pressure_au` are equivalent and will be"
" interpreted in GPA."
)
dynamics_kwargs["pressure_au"] = dynamics_kwargs.pop("pressure")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all these checks, or can we just rely on the fact that the user should be using the up-to-date names and if they don't, ASE will warn them or a crash will happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that, parameters are a very big mess currently, for example for NVT Berendsen we have:

temperature: float
The desired temperature, in Kelvin.

temperature_K: float
Alias for temperature

And for NPT (Nose-Hoover) :

temperature: float (deprecated)
The desired temperature in eV.

temperature_K: float
The desired temperature in K.

Sometimes temperature is in eV, sometimes in K, you don't know, surprise!

I know it's a lot of time to ask from you but your support on https://gitlab.com/ase/ase/-/merge_requests/3391 (probably not everything in this PR is good to take) would be more than welcome.

If/when this goes, I would rewrite the entire MD doc and we can finally move on with X-years old deprecated keywords and have user-friendliness and consistency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooof, yeah that is a mess! Thanks for the reminder.

Let's see how the discussion plays out and then I am happy to proceed how you suggest. I am happy to review on ASE's side if/when we get a blessing from Ask on certain components.

@tomdemeyere
Copy link
Contributor Author

@tomdemeyere: Thank you for your enormous amount of patience with this PR. I didn't see your comment from 3 weeks ago, so my apologies about not responding and for keeping you in the dark about this.

The reason I had been reluctant to merge this was that there was a fair bit of refactoring I wanted to do to remove code duplication in the run_md method and also in the schema. This was dependent on some other PRs I had in the pipeline. I was finally able to devote time to do so properly, and now it is much cleaner. 👍 I didn't change any functionality (to the best of my knowledge).

At this point, I am quite pleased with this. Whenever you have a moment, however, I have a few remaining questions outlined below. I am happy to take over of any (small) changes that result from my questions --- but I wanted to get your professional input before attempting to proceed since I am not an MD expert. Finally, before I merge, if you have a moment to look at a similar attempt recently implemented in matcalc, it'd be nice to know if we are missing anything particularly useful that's in there. For instance, there is something about upper triangular cells that is entirely foreign to me.

I had a look at the code you sent me and I am afraid I have never had issues with the upper vs lower triangular matrix (mainly because I don't often run NPT) but from what I understand this is taken care by a PR in ASE (https://gitlab.com/ase/ase/-/merge_requests/3277) by just modifying the NPT class without the need of modifying the atoms object like in matcalc.

For more context: left is my slab before, right is the slab after being passed in matcalc.

image image

To refresh the current challenges in this PR:

  • The lack of proper restart, which breaks the purpose of "MD workflows"
  • Choose a convention for the units being used and make sure that it is clear and user-friendly

For the first point, I am afraid this is not for today :/. For the later I tried to make things as clean as I could but of course feel free to discard these suggestions and do as you like.

@Andrew-S-Rosen
Copy link
Member

Thanks for the reply, @tomdemeyere! Regarding the triangular stuff, thanks for linking me to the MR. If this is just an ASE bug/inconvenience, we will take care of that upstream. I'll keep tabs on the MR.

As for this PR, not to worry about restarts. I don't think it is a dealbreaker, and I disagree that it defeats the purpose of a workflow. After all, one might want to do a non-MD job afterwards (e.g. a DFT relaxation) which would be fully supported. It's just that MD-->MD might be tricky. I wouldn't let that hold up this PR. Let me know if/when your upstream ASE MR though needs any attention from me.

The second point is really the only one I feel is necessary to make sure we get right. I trust your opinion on this and am content with leaving it as-is.

So, I think the plan (unless you disagree) should be:

  • I'll monitor the conversation at https://gitlab.com/ase/ase/-/merge_requests/3391 to see if there's anything productive that might turn up in the short term.
  • Assuming the above answer is no, I'll go ahead and merge this PR pretty much as-is sometime next week.

Thoughts?

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Jun 15, 2024

Oh, but also I still had the question about fixcm and fixrot. Did we need to rename them for some reason? Was that also an internal consistency thing?

@tomdemeyere
Copy link
Contributor Author

tomdemeyere commented Jun 15, 2024

Oh, but also I still had the question about fixcm and fixrot. Did we need to rename them for some reason? Was that also an internal consistency thing?

Most of the time you will find fixcm but occasionally you might find fix_com (mostly internally) I prefer fix_com so I used that (it is less obscure in my opinion) but I don't feel strongly about it, change it as you want.

Thoughts?

Yes! I think that's a great plan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Add some general, ASE-based MD compute jobs
3 participants