
Support for ipcMode in AWS batch #4979

Open
nhammond opened this issue May 7, 2024 · 4 comments

Comments

@nhammond

nhammond commented May 7, 2024

New feature

I would like to see the 'ipcMode: "host"' setting supported in AWS Batch to enable shared memory between processes.

Usage scenario

This came up when trying to run a workflow with many parallel STAR align processes. STAR has a shared memory option controlled with the "--genomeLoad" flag. For example, with "--genomeLoad LoadAndKeep", a process loads the genome into memory only if another process has not already loaded it, and upon completion removes it from memory only if no other process is still using it. This works fine for Docker containers on the same host via "docker run --ipc host ...", and AWS Batch supports an 'ipcMode: "host"' setting in the Batch job definition. It would be nice to enable this option via Nextflow. Using a pre-loaded genome cuts startup time by about 1 minute and reduces the RAM needed for each process from about 32 GB to about 4 GB, so the impact is significant when there are many STAR align processes.
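For reference, the pattern described above looks roughly like this outside of Batch (image name and paths are placeholders, not from an actual setup):

```shell
# Share the host's IPC namespace so STAR's shared-memory genome persists
# across containers on the same host. --ipc host is the docker-run
# equivalent of ipcMode: "host" in a Batch job definition.
docker run --ipc host -v /data:/data my-star-image \
  STAR --genomeLoad LoadAndKeep \
       --genomeDir /data/genome-index \
       --readFilesIn /data/sample_R1.fastq /data/sample_R2.fastq
```

The first container to run loads the genome index into shared memory; subsequent containers on the same host attach to it instead of re-loading it.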

Suggested implementation

It appears there are two versions of Batch job definitions, using either ContainerProperties or EcsProperties, as described in "Creating job definitions using EcsProperties" in the AWS Batch documentation. Nextflow uses the legacy ContainerProperties job definition, which does not support ipcMode or the other options described in that page (not sure whether any of the other affected options matter for Nextflow users: dependsOn, essential, name, and pidMode).

I believe enabling ipcMode would require adding support for the new EcsProperties job definition. This could continue to support the current containerOptions directive, and ipcMode could be added to the supported containerOptions.
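A minimal sketch of what such a job definition might look like, based on the ecsProperties schema in the AWS Batch docs (the name, image, command, and resource values are illustrative, not what Nextflow would actually generate):

```json
{
  "jobDefinitionName": "star-align-shared-ipc",
  "type": "container",
  "ecsProperties": {
    "taskProperties": [
      {
        "ipcMode": "host",
        "containers": [
          {
            "name": "main",
            "image": "my-star-image:latest",
            "command": ["STAR", "--genomeLoad", "LoadAndKeep"],
            "resourceRequirements": [
              { "type": "VCPU", "value": "4" },
              { "type": "MEMORY", "value": "4096" }
            ]
          }
        ]
      }
    ]
  }
}
```

Note that ipcMode sits at the task level, not the container level, so it applies to all containers in the task.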

In addition to updating the job definition format, it appears the overrides applied to the job definition at runtime would also need to change structure, from ContainerOverrides to TaskContainerOverrides. (This last point rules out the workaround of manually updating the job definition to the EcsProperties format with 'ipcMode: "host"' and continuing to run Nextflow, as this raises the error "Container overrides should not be set for ecsProperties jobs.")
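For EcsProperties job definitions, the SubmitJob request would need to nest the per-container overrides under ecsPropertiesOverride rather than containerOverrides, roughly like this (the container name and command are illustrative; the container name must match the one in the job definition):

```json
{
  "ecsPropertiesOverride": {
    "taskProperties": [
      {
        "containers": [
          {
            "name": "main",
            "command": ["bash", "-c", ".command.run"],
            "resourceRequirements": [
              { "type": "MEMORY", "value": "4096" }
            ]
          }
        ]
      }
    ]
  }
}
```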

@bentsherman
Member

AWS Batch can be pretty opaque about how it packs tasks onto the same VM. How can you be sure that the STAR align tasks will be packed together if you have other processes running at the same time?

@nhammond
Author

nhammond commented May 8, 2024

You're right, there is no control over that. In practice, when running one workflow at a time or kicking off a batch of workflows together, these STAR align jobs tend to flood all available Batch instances for a window of time, so setting aside some memory could work well. If there are many different jobs running at different stages of the workflow, it would be hard to get the memory allocations right. It's something I would like to try, but I have the same concern as you.

Is there anything else pushing us toward updating from the legacy Batch job definitions? ipcMode support would become trivial if we were already using the EcsProperties job definitions, but I know that change is a heavy lift just to enable this edge case.

@bentsherman
Member

We are planning to migrate to the AWS SDK v2 (#4741). Not sure if that's required to support EcsProperties, but it's certainly something we could fold into that effort.

@nhammond
Author

nhammond commented May 8, 2024
