-
The key focus of ISPC is providing an efficient programming model for SIMD hardware, i.e. exploiting vector parallelism (the same level as a CUDA warp). When it comes to managing threads, ISPC's strategy is to be compatible with any external threading library the user may want to use. If you'd like to target CPU (ISPC supports x86 and ARM CPUs, as well as Intel GPUs), then it's important to emphasize the difference in job scheduling versus CUDA or ISPC for GPU. On GPU you have a device/host model: host code spawns multiple device threads over some iteration space (in CUDA, through the <<<grid, threads>>> kernel launch syntax).
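To illustrate the contrast, here is a minimal sketch of what a CUDA-style kernel might look like as an ISPC function on CPU. The function name and signature are assumptions for illustration, not code from this discussion:

```ispc
// kernels.ispc -- hypothetical port of a CUDA kernel.
// One exported function replaces one kernel launch; the foreach loop
// covers the whole iteration space (grid * block in CUDA terms), and
// ISPC distributes iterations across the gang's program instances.
export void initKernel(uniform float data[], uniform int count) {
    foreach (id = 0 ... count) {
        // 'id' plays the role of 32 * blockIdx.x + threadIdx.x
        data[id] = 0.0f;
    }
}
```

On the C++ side this becomes an ordinary function call, ispc::initKernel(buf, n), with no grid/block configuration. If you want multi-core parallelism on top of the SIMD parallelism, you launch such calls from your own threads, or use ISPC's task/launch constructs.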
Depending on the approach you take, the answers to the questions above will vary.
If ISPCRT is not used, then error checking for memory allocation is done through the standard CPU mechanisms available in C++. If ISPCRT is used, then it's similar to CUDA. Note that ISPCRT is just a wrapper over the Level Zero library for GPU, which also enables scheduling to CPU.
Synchronization also varies: in approach 1 it's handled completely on the C++ side; in approach 2 it goes through ISPCRT's task queue, similar to CUDA.
When targeting CPU, any global is a regular global variable, as you would have it in C++. Typically you would want to declare it uniform.
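A tiny sketch of what such a declaration looks like in ISPC (the variable name is made up for illustration):

```ispc
// Shared across all program instances in the gang, like a normal
// C++ global: one copy, not one per lane.
uniform int gTotalCount = 0;

export void useGlobal() {
    // Each program instance reads the same single value;
    // assigning it to a varying broadcasts it across the gang.
    int localCopy = gTotalCount;
}
```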
Check the Atomic Operations and Memory Fences section of the ISPC documentation.
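As I understand it (worth verifying against the docs), atomic_add_local is atomic with respect to the program instances in the current gang, while atomic_add_global is also atomic with respect to other threads/tasks, making it the closer analogue of CUDA's atomicAdd when kernels run from multiple threads. A hedged sketch, with a made-up function name:

```ispc
export void countPositives(uniform float vals[], uniform int count,
                           uniform int * uniform total) {
    foreach (i = 0 ... count) {
        if (vals[i] > 0.0f) {
            // Safe even if several tasks/threads run this concurrently;
            // atomic_add_local would suffice if only one gang ever
            // touches 'total'.
            atomic_add_global(total, 1);
        }
    }
}
```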
Correct.
You don't need it, unless you'd like to use ISPCRT and target both CPU and GPU at the same time.
This depends on how you do threading.
To fill the memory you can use a simple loop (e.g. a foreach on the ISPC side, or std::fill_n on the C++ side). As for working with files, ISPC doesn't have a built-in way of working with files; you will need to write to memory buffers and then write them to files on the C++ side. As for data type limits, ISPC data types have well-defined widths, e.g. an int32 is always 32 bits.
-
@dbabokin I am especially confused with implementing/porting the following CUDA lines/logic:
-
I am trying to port some CUDA C code to ISPC to run an SPMD program on the CPU.
I am having a few doubts (answers to any/some of these will help a lot):
Does the ISPC code need an error-checking mechanism like CUDA_CHECK_ERRORS, which is useful for checking proper memory allocation on the device/host?
I believe the __global__ qualifier tells the compiler to run a function on the device, and __syncthreads() is for barrier synchronization among instances. These are probably not needed in ISPC, or is there something I am missing?
I think threadIdx in CUDA C is equivalent to programIndex in ISPC, as CUDA uses threads in blocks whereas ISPC uses program instances in gangs. However, I am having trouble porting this statement: const unsigned int id = 32 * blockIdx.x + threadIdx.x;. My thought is that one block contains 32 threads and each id identifies a unique thread, so in ISPC I would use const unsigned int id = programIndex;. Will this be correct?
Should I use the "uniform" keyword for declaring global variables, as these will be shared across the gang? Also, the __shared__ keyword is used in CUDA C to make variables reside in shared memory, easing communication between threads in the same block; is a similar mechanism possible in ISPC, or should I ignore it?
atomicAdd-style atomic operations in CUDA C are good for operating safely on memory without interference from other threads. Their equivalents in ISPC are atomic_add_global and atomic_add_local. Which one should be used when porting code?
Regarding memory allocation: since there is no host/device split, I believe there is no need for two variables like var1 and device_var1 for the same value followed by a memcpy; I think of using a single variable instead. Also, for porting statements like cudaMalloc(&var, 4 * sizeof(float)) I'll use float* var = new float[4];. Am I thinking in the right direction?
Statements like cudaSetDevice(0) and cudaHostAllocPortable, used for device selection and for allocating page-locked memory as portable, are mostly useful when dealing with multiple GPU devices. Is any similar mechanism required in ISPC, or should these be ignored?
For porting the following code (which launches kernels using multidimensional grids of blocks and threads in CUDA) to ISPC:
const dim3 threads(32, 1);
const dim3 grid(1, 1);
initKernel<<<grid, threads>>>(first_var);
updatKernel<<<grid, threads>>>(device_var1,device_var2);
I plan to use these lines directly:
initKernel(first_var); // here initKernel is just the name of the function in the ISPC file
updatKernel(device_var1, device_var2);
Should I be using something else corresponding to the thread and block information, or is this plain ISPC function call fine?
Instead of std::fill_n (used for assigning values in standard C++), which I believe won't be present in ISPC (please correct me if I am wrong), I am planning to use a for loop to fill the array's elements. And instead of std::ofstream (used for writing data to files), which is used here:
"std::ofstream file1("file1.csv");" and then
"file1 << i << "," << array_of_unsigned_ints[ s ] << std::endl;" // s is the index
I am planning to use C-style file handling: FILE *file1 = fopen("file1.csv","w"); fprintf(file1, "%u,%u\n", i, array_of_unsigned_ints[s]); fclose(file1); // fprintf does the number-to-text formatting that strcat cannot. Will this be fine, or could I do something better? Also, if I ever format into a fixed character array, I believe I need to know the maximum number of characters possible in an unsigned integer, so what is it?