How can I run a for loop in parallel on a PowerVR GPU?

Hi all.
I have a for loop that performs four iterations sequentially, and I want to run all four iterations in parallel on the GPU. Is there any method for this? All suggestions are welcome.

Thank you
Palasani

Hi Palasani,

Could you please give more details about the context and what you’re trying to achieve, so we can provide good advice?

  1. Are you using a compute pipeline? If so, are you trying to make all the threads of a workgroup run in parallel?

  2. Why do you need the loop iterations to execute in parallel? Do you rely on the results of those executions in other parts of your graphics / compute pipeline? (Maybe you’re storing those results in a buffer, or in local memory in the compute case.)

Best regards,
Alejandro

Hi Alejandro,
I have 4 host buffers with data in them. I call clCreateBuffer to create 4 device buffers and then copy the data from the host buffers into the device buffers, using a for loop with 4 iterations. I think this executes sequentially and the copying takes a lot of time, which I want to reduce. Is there any method to run this for loop in parallel on the GPU?
Thanks & Regards
Palasani

Hi Palasani,

The most efficient way to copy from one OpenCL buffer to another is to call clEnqueueCopyBuffer() on the CPU, which enqueues a copy command, or clEnqueueWriteBuffer() to copy data from a host memory pointer. If the process is still too slow for you, that will be because of the bandwidth limitations of your particular system.
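A minimal sketch of a device-to-device copy (the queue, buffer and size names are just placeholders):

// Copy 'size' bytes from src_buffer to dst_buffer entirely on the device,
// without going through host memory; both offsets are in bytes.
clEnqueueCopyBuffer(queue,       // command queue
                    src_buffer,  // source cl_mem
                    dst_buffer,  // destination cl_mem
                    0,           // source offset
                    0,           // destination offset
                    size,        // number of bytes to copy
                    0, NULL,     // no event wait list
                    NULL);       // no event returned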

As to your first question: to make a for loop run in parallel on the GPU, you take the code inside the for loop and put it in a kernel, using get_global_id() to determine which data to operate on instead of a loop iterator. Then enqueue as many invocations of that kernel as you need with clEnqueueNDRangeKernel() and the work-items will execute in parallel on the GPU.
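As a rough sketch (the kernel, buffer and variable names are only illustrative, and the kernel body stands in for whatever one iteration of your loop does):

/* Kernel source (in your .cl file or kernel source string):

   __kernel void loop_body(__global const float *in, __global float *out)
   {
       size_t i = get_global_id(0);   // replaces the loop counter
       out[i] = in[i] * 2.0f;         // whatever one iteration did
   }
*/

// Host side: launch one work-item per former loop iteration (4 here)
size_t global_work_size = 4;
clSetKernelArg(kernel, 0, sizeof(cl_mem), &in_buffer);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &out_buffer);
clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                       &global_work_size, NULL, 0, NULL, NULL);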

Kind Regards,
David

Hi David,
Thank you for your reply. I have one more question: does the PowerVR GPU support parallel write operations?

Thanks & Regards
Palasani Sivaiah

Hi Palasani,

Yes, PowerVR GPUs are capable of parallel write operations.

Hi David

Can you please give an example of writing 4 buffers to four different locations on the GPU using the clEnqueueWriteBuffer function?

Thank you
Palasani Sivaiah

Hi Palasani,

In your C/C++ code:

// Non-blocking (CL_FALSE) writes: each call returns immediately after enqueuing the transfer
clEnqueueWriteBuffer(queue, buffer0, CL_FALSE, buffer_offset0, data_size0, data_ptr0, 0, NULL, NULL);
clEnqueueWriteBuffer(queue, buffer1, CL_FALSE, buffer_offset1, data_size1, data_ptr1, 0, NULL, NULL);
clEnqueueWriteBuffer(queue, buffer2, CL_FALSE, buffer_offset2, data_size2, data_ptr2, 0, NULL, NULL);
clEnqueueWriteBuffer(queue, buffer3, CL_FALSE, buffer_offset3, data_size3, data_ptr3, 0, NULL, NULL);

This sends the data copy commands to the OpenCL driver. You can then use clFinish(queue); to wait for the operations to complete, or use events via the last parameter.
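If you want to wait on the individual transfers rather than the whole queue, here is a sketch of the same four writes using events (buffer, offset and size names as above):

cl_event ev[4];
clEnqueueWriteBuffer(queue, buffer0, CL_FALSE, buffer_offset0, data_size0, data_ptr0, 0, NULL, &ev[0]);
clEnqueueWriteBuffer(queue, buffer1, CL_FALSE, buffer_offset1, data_size1, data_ptr1, 0, NULL, &ev[1]);
clEnqueueWriteBuffer(queue, buffer2, CL_FALSE, buffer_offset2, data_size2, data_ptr2, 0, NULL, &ev[2]);
clEnqueueWriteBuffer(queue, buffer3, CL_FALSE, buffer_offset3, data_size3, data_ptr3, 0, NULL, &ev[3]);
clWaitForEvents(4, ev);   // blocks until all four writes have completed
for (int i = 0; i < 4; ++i) clReleaseEvent(ev[i]);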

The documentation for this function can be found here https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clEnqueueWriteBuffer.html

Hi David
My data size is 1024000 bytes and I have 4 such buffers. I created 4 device buffers of that size and wrote the code as you showed, setting the offsets to 1024000, 1024000*2 and 1024000*3 respectively for the buffers. The problem is that the same buffer1 data is being copied into buffers 2, 3 and 4 instead of their own data. How can I set the offsets so that the 4 different host buffers’ data is written to the 4 different device buffers in parallel using threads?

Thanks & Regards
palasani

Hi Palasani,

I think you are confused about the meaning of the offset parameter. From the documentation I linked in my last reply, the offset parameter is “the offset in bytes in the buffer object to write to”, so if your buffer is 1024000 bytes in size and your offset is also 1024000 you are trying to write outside the buffer. From what I can gather, each of your host buffers goes to its own separate device buffer, so you want to set all the offsets to 0.
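In other words, something along these lines (the buffer and pointer names follow the earlier sketch and are only placeholders), with each host buffer written to its own device buffer at offset 0:

// Each write targets a different cl_mem object, so every offset is 0
size_t data_size = 1024000;   // bytes per buffer
clEnqueueWriteBuffer(queue, buffer0, CL_FALSE, 0, data_size, data_ptr0, 0, NULL, NULL);
clEnqueueWriteBuffer(queue, buffer1, CL_FALSE, 0, data_size, data_ptr1, 0, NULL, NULL);
clEnqueueWriteBuffer(queue, buffer2, CL_FALSE, 0, data_size, data_ptr2, 0, NULL, NULL);
clEnqueueWriteBuffer(queue, buffer3, CL_FALSE, 0, data_size, data_ptr3, 0, NULL, NULL);
clFinish(queue);   // wait for all four transfers to complete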

This conversation has steered well away from PowerVR-related queries, so I’d like to kindly recommend that you take these general OpenCL questions to a relevant forum such as Stack Overflow or another GPU-oriented forum. There are also some great online resources for learning OpenCL that can be found easily with a search, e.g. https://rocmdocs.amd.com/en/latest/Programming_Guides/Opencl-programming-guide.html