Extremely slow render to texture performance

I ran into a performance problem with render to texture. To verify it, I modified the HelloAPI code to render to a texture instead of to the screen. The screen resolution is 1920x1080, and I changed the geometry to cover the whole screen (2 triangles). Since the driver will not render if it sees that the target texture is never used, I have to do a small read of a single pixel using glReadPixels. With normal rendering to the screen, each frame takes 27 ms. With render to texture, it takes 87 ms. Is there anything wrong here?

glReadPixels is a very expensive operation that serializes the render. For this reason, it should not be used to force renders to complete. As we’re discussing in this thread, rendering to two FBOs that reference each other should force the driver to kick renders without impacting performance.



Thank you. I will try not to use glReadPixels then. One question, though: how come it takes 27 ms to render a 1920x1080 rectangle? It's doing almost nothing. Isn't that too long?


I’ve done some calculations, and the triangle rendered full screen on your platform should take ~17ms.

How have you calculated your render time? If you’ve done it in your application, did you disable vsync?

I’d recommend using PVRTune’s timing data to find the cost of your render. Even with VSync enabled, PVRTune will be able to show you the cost of your render in ms.



I just run a while loop that renders 2000 times and measure the begin and end times; the result is (end time - begin time) / 2000.

I took the code from HelloAPI and modified it. It does not seem to have vsync.

I will try the PVRTune when I get a chance.

Just curious: how did you do the calculation to come up with 17 ms? How fast does it run on your system with the SGX 540?

Your platform may still be VSync limited (which would account for the additional time). PVRTune’s timing data will make it clear if this is the case.

I used a clock speed of 304MHz (from the Pandaboard Wikipedia page). I also used the number of USSE pipes in the 540 (4) and the resolution you’re rendering to.



I got PVRTune running, but what “timing data” am I supposed to look at for vsync? I read the documentation and it does not mention vsync.

Could you elaborate on how the 17 ms is calculated? What are USSE pipes?

If v-sync is disabled, then the GPU should be constantly busy. For example, a fragment processing limited application should not have any gaps between its 3D (fragment processing) tasks.

If there are periods of time when there are no TA or 3D tasks being processed, then the render is either v-sync limited or CPU limited (a CPU limit can be confirmed by dragging the CPU load counter onto PVRTune’s graph).

Our “Getting Great Graphics Performance with the PowerVR Insider SDK” presentation from GDC 2012 gives an overview of how PVRTune can be used to identify bottlenecks.