Large Kernels: Use AMREX_NO_INLINE #4716

Open
ax3l opened this issue Feb 22, 2024 · 0 comments
Labels
backend: hip Specific to ROCm execution (GPUs) Performance optimization

ax3l commented Feb 22, 2024

As of early 2024, the ROCm compilers force-inline everything.

While this is generally nice, it can be problematic for very large kernels, at both compile time and runtime, in cases where we actually want to enforce a function call and jump.

We should investigate whether we have places like this where we want to add an `AMREX_NO_INLINE` to prevent the inlining. `GatherAndPush` comes to mind for some of the larger runtime combinations.
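
For illustration, a minimal host-side sketch of the pattern could look like the following. `SKETCH_NO_INLINE`, `push_one` and `push_all` are hypothetical names made up for this example and are not taken from the code base; the macro is only a stand-in so that the snippet compiles without AMReX. In the real code we would use `AMREX_NO_INLINE` on the device function in question, and whether the HIP compiler then emits an actual call is exactly what needs to be evaluated.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for AMREX_NO_INLINE so this sketch builds without AMReX.
#if defined(_MSC_VER)
#  define SKETCH_NO_INLINE __declspec(noinline)
#else
#  define SKETCH_NO_INLINE __attribute__((noinline))
#endif

// Hypothetical helper standing in for a large function body.
// The attribute asks the compiler to keep this as a real function call
// instead of pasting the body into every caller.
SKETCH_NO_INLINE
double push_one (double x, double v, double dt)
{
    return x + v * dt;
}

// Hypothetical kernel-like loop that calls the helper once per element.
void push_all (std::vector<double>& x, std::vector<double> const& v, double dt)
{
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] = push_one(x[i], v[i], dt);
    }
}
```

The attribute goes on the large callee, not on the kernel launch itself, so only the functions we pick are excluded from inlining and small helpers can still be inlined as usual.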

To be evaluated. Thanks to @zingale for bringing this up.

ax3l added the "Performance optimization" and "backend: hip" (Specific to ROCm execution (GPUs)) labels on Feb 22, 2024