
add CUDA intrinsics #23

Open
mxmlnkn opened this issue Feb 28, 2016 · 3 comments

mxmlnkn (Contributor) commented Feb 28, 2016

I have code that makes use of warpSize and __shfl_down. The latter may be impossible to implement in alpaka, but warpSize could be mapped to something meaningful, e.g. elemDim.

psychocoderHPC (Member) commented
Which operation do you perform with warpSize?
Warps are not a layer of alpaka, so you can guard the warp intrinsics with an #ifdef on NVIDIA hardware and, on all other hardware, write your algorithm as if the warp size were one.
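
A minimal sketch of that suggestion, assuming alpaka's ALPAKA_FN_ACC function attribute and an associative operator f (warpReduce is a hypothetical name, not alpaka API):

    #include <alpaka/alpaka.hpp>

    /* Sketch of the suggested fallback: real warps behind an #ifdef
     * on NVIDIA hardware, an effective warp size of one everywhere
     * else. warpReduce is a hypothetical helper, not alpaka API. */
    template< typename T, typename F >
    ALPAKA_FN_ACC T warpReduce( T value, F f )
    {
    #ifdef __CUDA_ARCH__
        /* NVIDIA device pass: fold across the 32 lanes of a warp. */
        for ( int delta = warpSize / 2; delta > 0; delta /= 2 )
            value = f( value, __shfl_down( value, delta ) );
    #endif
        /* Any other backend: the "warp" has size one, so the input
         * value is already the reduced result. */
        return value;
    }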

mxmlnkn (Contributor, Author) commented Feb 28, 2016

I'm doing a __shfl_down reduction using warpSize. I guess it's very CUDA-specific anyway:

    /* Reduce within each warp (warpSize == 32 assumed; the assert
     * guards that assumption at runtime). */
    int constexpr cWarpSize = 32;
    assert( cWarpSize == warpSize );
    #pragma unroll
    for ( int32_t warpDelta = cWarpSize / 2; warpDelta > 0; warpDelta /= 2 )
        localReduced = f( localReduced, __shfl_down( localReduced, warpDelta ) );

    /* Lane 0 of each warp now holds the warp's partial result;
     * combine those into the global result. atomicFunc is a
     * user-side helper that applies f atomically to *rdpResult. */
    if ( threadIdx.x % cWarpSize == 0 )
        atomicFunc( rdpResult, localReduced, f );
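
(Note for newer toolkits: CUDA 9 deprecated the unqualified __shfl_down in favor of __shfl_down_sync, which takes an explicit mask of participating lanes; the loop body above then becomes:)

    /* CUDA 9+ form of the shuffle: __shfl_down_sync takes a mask of
     * participating lanes (0xffffffffu == the full warp). */
    localReduced = f( localReduced,
                      __shfl_down_sync( 0xffffffffu, localReduced, warpDelta ) );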

sbastrakov (Member) commented

@psychocoderHPC should this one be closed?

ax3l added the question label Feb 4, 2020