
add CUDA intrinsics #23

Open
mxmlnkn opened this issue Feb 28, 2016 · 3 comments

mxmlnkn (Contributor) commented Feb 28, 2016

I have code that makes use of warpSize and __shfl_down. The latter may be impossible to implement in alpaka, but warpSize could be mapped to something meaningful, e.g. elemDim.

psychocoderHPC (Member) commented
Which operation do you perform with warpSize?
Warps are not a layer of alpaka, so you can guard the warp intrinsics with an #ifdef on NVIDIA hardware and, on all other hardware, write your algorithm as if the warp size were one.
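
A minimal sketch of that suggestion, assuming alpaka's ALPAKA_FN_ACC function attribute and an associative operator f (warpReduce is a hypothetical name, not alpaka API):

    #include <alpaka/alpaka.hpp>

    /* Sketch of the suggested fallback: real warps behind an #ifdef
     * on NVIDIA hardware, an effective warp size of one everywhere
     * else. warpReduce is a hypothetical helper, not alpaka API. */
    template< typename T, typename F >
    ALPAKA_FN_ACC T warpReduce( T value, F f )
    {
    #ifdef __CUDA_ARCH__
        /* NVIDIA device pass: fold across the 32 lanes of a warp. */
        for ( int delta = warpSize / 2; delta > 0; delta /= 2 )
            value = f( value, __shfl_down( value, delta ) );
    #endif
        /* Any other backend: the "warp" has size one, so the input
         * value is already the reduced result. */
        return value;
    }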

mxmlnkn (Contributor, Author) commented Feb 28, 2016

I'm doing a __shfl_down reduction using warpSize. I guess it's very CUDA-specific anyway:

    /* Reduce within each warp (warpSize == 32 assumed; the assert
     * guards that assumption at runtime). */
    int constexpr cWarpSize = 32;
    assert( cWarpSize == warpSize );
    #pragma unroll
    for ( int32_t warpDelta = cWarpSize / 2; warpDelta > 0; warpDelta /= 2 )
        localReduced = f( localReduced, __shfl_down( localReduced, warpDelta ) );

    /* Lane 0 of each warp now holds the warp's partial result;
     * combine those into the global result. atomicFunc is a
     * user-side helper that applies f atomically to *rdpResult. */
    if ( threadIdx.x % cWarpSize == 0 )
        atomicFunc( rdpResult, localReduced, f );
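
(Note for newer toolkits: CUDA 9 deprecated the unqualified __shfl_down in favor of __shfl_down_sync, which takes an explicit mask of participating lanes; the loop body above then becomes:)

    /* CUDA 9+ form of the shuffle: __shfl_down_sync takes a mask of
     * participating lanes (0xffffffffu == the full warp). */
    localReduced = f( localReduced,
                      __shfl_down_sync( 0xffffffffu, localReduced, warpDelta ) );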

sbastrakov (Member) commented

@psychocoderHPC should this one be closed?

ax3l added the question label Feb 4, 2020