Large Kernels: Use `AMREX_NO_INLINE` #4716

ax3l · 2024-02-22T18:18:25Z

As of early 2024, the ROCm compilers force-inline everything.

While generally nice, this can hurt both compile time and runtime for very large kernels, where we would actually prefer to enforce a real function call and jump.

We should investigate whether we have places like this where we want to add an `AMREX_NO_INLINE` to prevent the inlining. GatherAndPush comes to mind for some of the larger runtime combinations.
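
A minimal sketch of the idea, for illustration only. To keep it self-contained it does not include AMReX: `SKETCH_NO_INLINE` is a hypothetical stand-in for `AMREX_NO_INLINE`, and `push_one` / `push_all` are made-up names, not the actual GatherAndPush code. It is host-only; in a real GPU kernel the function would also carry the usual host/device qualifiers, and whether the compiler honors the request there still needs to be checked.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for AMREX_NO_INLINE, so this sketch builds without AMReX.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_NO_INLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#  define SKETCH_NO_INLINE __declspec(noinline)
#else
#  define SKETCH_NO_INLINE
#endif

// Placeholder for a very large per-particle routine. The attribute asks the
// compiler to keep this as a real function call instead of inlining the body
// into every call site.
SKETCH_NO_INLINE
double push_one (double x, double v, double dt)
{
    return x + v * dt;
}

// Caller loop: with the attribute above, each iteration calls push_one
// rather than containing a copy of its body.
double push_all (std::vector<double>& x, std::vector<double> const& v, double dt)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] = push_one(x[i], v[i], dt);
        sum += x[i];
    }
    return sum;
}
```

Only the large callee is annotated; small helpers would stay inlinable, so the call overhead is paid only where the inlined body is big enough to matter.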

To be evaluated. Thanks to @zingale for bringing this up.

ax3l added the "Performance optimization" and "backend: hip" (Specific to ROCm execution (GPUs)) labels Feb 22, 2024

ax3l assigned atmyers and WeiqunZhang Feb 22, 2024
