PBO (Pixel Buffer Object) performance issue

Hello everyone,

I have a question regarding the usage of a PBO (Pixel Buffer Object). We have implemented a data transfer from the GPU to the CPU using GL_PIXEL_PACK_BUFFER.

The implementation works, but the performance seems to be very poor. I also measured a plain glReadPixels without a PBO, and the runtime was nearly the same.

In my understanding, a call to glReadPixels with a PBO bound is asynchronous and should return immediately. However, based on my investigation, the call blocks, or at least takes a significant amount of time.
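For context, glReadPixels writes into the buffer bound to GL_PIXEL_PACK_BUFFER, and the readback can only overlap with rendering if the buffer mapped on the CPU is not the one that was just written. A minimal sketch of that scheduling, assuming a ring of three PBOs (the ring size and all function names here are hypothetical, and the GL calls are only named in comments):

```c
#include <assert.h>

/* Hypothetical ring size; 2 or 3 PBOs are typical choices. */
#define PBO_COUNT 3u

/* Slot that would be bound to GL_PIXEL_PACK_BUFFER for this frame's
 * glReadPixels call. */
static unsigned write_slot(unsigned frame)
{
    return frame % PBO_COUNT;
}

/* Slot that would be mapped on the CPU this frame (for example with
 * glMapBufferRange): the oldest one in the ring, never the one just written. */
static unsigned read_slot(unsigned frame)
{
    return (frame + 1u) % PBO_COUNT;
}

/* The read slot only holds valid pixels once the ring has been filled. */
static int read_slot_ready(unsigned frame)
{
    return frame >= PBO_COUNT - 1u;
}

/* Frame whose pixels are stored in the read slot. */
static unsigned source_frame(unsigned frame)
{
    assert(read_slot_ready(frame));
    return frame - (PBO_COUNT - 1u);
}
```

With this scheme the CPU sees data that is PBO_COUNT - 1 frames old. Mapping the same buffer in the same frame it was written would force the driver to wait for the GPU, which would look just like a blocking glReadPixels.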

One possible cause could be that I am not reading back in the GPU's native format. I render to a texture bound to an FBO and always use GL_RGBA as the internal format. Is this correct? I also ran some tests with GL_BGRA, but the performance was just as poor, so I guess both formats are wrong.

As far as I understand, if the internal format does not match the format requested in glReadPixels, the driver has to convert the data on the CPU, which blocks the glReadPixels call. What format should I use in this case?

I need to copy small patches of 16x16 pixels every frame. Do you have any general tips to improve the performance and ensure that the PBO uses DMA transfer?
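To put the transfer size in perspective, here is a small sketch of the buffer size such a patch needs, assuming a GL_RGBA / GL_UNSIGNED_BYTE readback (4 bytes per pixel) and the default GL_PACK_ALIGNMENT of 4. The helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per row after padding to GL_PACK_ALIGNMENT (1, 2, 4 or 8). */
static size_t packed_row_stride(size_t width, size_t bytes_per_pixel,
                                size_t pack_alignment)
{
    assert(pack_alignment == 1 || pack_alignment == 2 ||
           pack_alignment == 4 || pack_alignment == 8);
    size_t row = width * bytes_per_pixel;
    /* Round up to the next multiple of the alignment. */
    return (row + pack_alignment - 1) / pack_alignment * pack_alignment;
}

/* Safe upper bound on the bytes glReadPixels writes for a width x height
 * region; suitable as the size of the PBO's data store. */
static size_t readback_size(size_t width, size_t height,
                            size_t bytes_per_pixel, size_t pack_alignment)
{
    return packed_row_stride(width, bytes_per_pixel, pack_alignment) * height;
}
```

For a 16x16 RGBA8 patch, readback_size(16, 16, 4, 4) gives 1024 bytes. At that size the cost is likely dominated by synchronisation between CPU and GPU rather than by the copy itself.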

My implementation is based on the following resource: OpenGL PBO Tutorial

Thank you for your assistance.

Hi marcus,

Thanks a lot for your message and welcome to the PowerVR Developer Forum!

My advice is to take a PVRTune recording and look for bottlenecks to understand where the main issues in the application are. Let me know if you need any guidance on how to use the tool or how to do some basic profiling.

Best regards,
Alejandro