PowerVR GPUs and Multiple Draw Threads

The generally accepted advice for rendering two very content-heavy “GPU bound” surfaces/displays on a single GPU from a single application is to render them sequentially, offloading all non-GL CPU-heavy or IO-heavy operations to background threads when possible (to take advantage of multicore). This minimizes context-swap overhead, where 1) the driver has to serialize the command streams for two different surfaces onto the shared GPU and 2) the GPU has to be switched between two competing users, adding needless cost.

Does this advice extend to PowerVR GPUs?

(The only exception I’m aware of for PowerVR GPUs is non-rendering, largely CPU-side “prep” work such as texture/buffer uploading and shader precompilation. IIRC, it’s acceptable and even recommended to offload this overhead to a background thread. But please check me on that!)

Thanks!

Hello Dark_Photon,

Sorry for the delay.

There is an added complication if the two surfaces render at different rates. For example, one surface may have a different VSync rate or a different SwapInterval.

With the same render rates the single thread solution is a good choice. With different render rates I would recommend two threads. For example, one window may have a simple scene required to draw at 30fps always. The second window may have a complex scene allowed to drop below 30fps. By having two threads the simple scene can run at 30fps whilst the complex one can drop to 20fps and lower.
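As a rough illustration of why the rates couple in a single thread, here is a back-of-the-envelope sketch. It assumes each frame simply waits for the next permitted VSync, and it ignores GPU contention between the two scenes entirely. The function name and the 5 ms / 45 ms costs are made up for the example.

```python
import math

def achieved_fps(render_ms, refresh_hz=60.0, swap_interval=1):
    """Frame rate when every frame waits for the next allowed VSync.

    Assumption: a frame that misses its slot slips to the next one, so the
    frame period is always a whole number of swap-interval slots.
    """
    slot_ms = swap_interval * 1000.0 / refresh_hz
    slots = max(1, math.ceil(render_ms / slot_ms))
    return 1000.0 / (slots * slot_ms)

# Two threads: each scene is quantized on its own.
simple_fps = achieved_fps(5.0, refresh_hz=60.0, swap_interval=2)    # about 30 fps
complex_fps = achieved_fps(45.0, refresh_hz=60.0, swap_interval=1)  # about 20 fps

# One thread: both scenes share one loop, so the simple scene inherits
# the combined cost and falls below its 30 fps target.
combined_fps = achieved_fps(5.0 + 45.0, refresh_hz=60.0, swap_interval=2)  # about 15 fps
```

In this simplified model the simple scene holds 30 fps only when it has its own loop; in the shared loop it is dragged down with the complex one.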

Please note the correct behavior of this will be specific to the platform and driver.

Thanks,
Paul

Thanks Paul. What are the “gotchas” we need to watch out for here? What GL resource contention exists within the PowerVR driver when rendering is occurring to the same GPU from multiple threads, and what can we do to avoid it (we’ve had some bad experience in the past with this)? How much overhead should we expect associated with having two threads render to the same Series 6 GPU simultaneously?

Also, I guess the multithreaded draw approach assumes that we will be able to have two hot draw threads running concurrently on the CPU rather than just one.

Also, one quick follow-up question:

As a low-risk first-step to move us toward a full-up multithreaded draw, we were thinking about:

  1. DRAW THREAD #1: Draw surface A and B (all but SwapBuffers),
  2. DRAW THREAD #2: Do the SwapBuffers for surface B
  3. DRAW THREAD #1: Do the SwapBuffers for surface A

Rendering surface B is fairly cheap, and can happen at a lower rate. So the main concern is the out-of-sync VSync clocks on the two displays, and the resulting idle time in SwapBuffers for display B due to the wait-for-VSync. We don’t want to be blocking draw thread #1 (the main draw thread) for this.

Does this approach sound reasonable?

Thanks!

The big gotcha would be eglMakeCurrent - this call causes a full flush. I believe the description you gave would require multiple eglMakeCurrent calls on thread 1.

My recommendation is two graphics contexts (shared contexts):
Thread 1 draw and Swap A
Thread 2 draw and Swap B

You could go further and use a 3rd context/thread for resource loading!

Sorry that we can’t be more specific on behavior. We do not currently have a system available with independent v-sync sources, specifically a platform that doesn’t use a compositor.

As a result I have to recommend writing a test that emulates heavy multi-threaded workloads (simple looped draw calls, etc.) and observing the behavior. Furthermore, the display controller will come from a third party and will also affect performance here.
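The shape of such a test could look like the sketch below. This is only the measurement skeleton, not GPU code: `fake_frame` is a placeholder that sleeps, and on a real target each per-thread callback would instead issue looped draw calls and a swap on that thread's own context. All names here are hypothetical.

```python
import threading
import time

def run_loop(name, frame_fn, duration_s, results):
    """Call frame_fn repeatedly for duration_s and record each frame's time."""
    times = []
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        t0 = time.perf_counter()
        frame_fn()  # placeholder for "draw + swap" on this thread's context
        times.append(time.perf_counter() - t0)
    results[name] = times

def stress(frame_fns, duration_s=0.2):
    """Run one loop per entry concurrently; return (frame count, worst ms) per loop."""
    results = {}
    threads = [
        threading.Thread(target=run_loop, args=(name, fn, duration_s, results))
        for name, fn in frame_fns.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {
        name: (len(ts), max(ts, default=0.0) * 1000.0)
        for name, ts in results.items()
    }

def fake_frame(cost_s):
    """Stand-in workload: pretends a frame costs cost_s seconds."""
    return lambda: time.sleep(cost_s)

report = stress({"A": fake_frame(0.005), "B": fake_frame(0.020)})
```

Comparing the worst-case frame time of each loop when run alone against the same loop run alongside the other would give a first idea of the contention cost on a given platform and driver.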

Any more questions - please let me know.

Thanks,
Paul

Thanks, Paul! That’s great info to have before we go down this road!

Your approach sounds like the best way to go long-term. As a low-risk first step though (short timeline; trying to minimize the amount of multithreaded code here :-), could we take your approach and tweak it a bit as follows?:

  1. DRAW THREAD #1: Draw B’s contents to texture T via FBO. Draw and Swap surface A.
  2. DRAW THREAD #2: Blit texture T to surface B. Swap surface B.

With shared contexts, this “should” avoid the need for either thread to do an eglMakeCurrent (past startup).

Details:

  • A sync object would be raised by thread #1 to let thread #2 know when it can swap surface B.
  • Further, EXT_multisampled_render_to_texture would be used for FBO rendering so that the MSAA rep never needs to be flushed to main memory.

Would this be relatively efficient?

Also, re “full flush”. Do you mean a full pipeline flush including 3D?

Yes - the full flush will include 3D.

Your approach is interesting and should avoid eglMakeCurrent calls.

Following the flow:
Thread A -> Draw B -> Sync -> Draw A -> Swap A
Thread B -> Check Sync -> Draw old texture or new -> Swap B
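The "draw old texture or new" step in that flow can be sketched on the CPU side as a single-slot mailbox. This is only a model of the hand-off logic, not GL code: `publish` stands in for the point where thread A knows the render of B's texture has completed (for example after the sync object has signalled), and the class and method names are invented for illustration.

```python
import threading

class LatestFrame:
    """Single-slot mailbox: the producer never blocks, the consumer never waits."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = None  # newest finished frame not yet picked up
        self._shown = None  # frame the consumer is currently presenting

    def publish(self, frame):
        # Producer side (thread A): overwrite any frame that was never
        # picked up, so the consumer only ever sees the newest one.
        with self._lock:
            self._ready = frame

    def acquire(self):
        # Consumer side (thread B): take the new frame if there is one,
        # otherwise keep presenting the previous one.
        with self._lock:
            if self._ready is not None:
                self._shown, self._ready = self._ready, None
            return self._shown

mailbox = LatestFrame()
mailbox.publish("T0")
first = mailbox.acquire()   # new frame available
second = mailbox.acquire()  # nothing new, so the old frame is reused
```

One assumption worth checking on the real system: thread A presumably needs at least two textures to alternate between, so that it never renders into the one thread B is still blitting from.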

Could you state what fps/swap interval the threads would be running at?

If Thread B runs at a lower rate you could selectively miss doing the Draw for B.

Thanks,
Paul

Thanks Paul! I appreciate your feedback. The redraw rate for A is 30Hz (60Hz scan-out; SwapInterval 2), and the frame rate for B might be as low as 10Hz or as high as 30Hz (30Hz scan-out; SwapInterval 1). The frame rate on B isn’t quite so critical as the view is “nearly” static. So it sounds like this approach may be practical.

Yes - I think it could allow you to skip drawing Frame B when you need more time to draw A.
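One possible skip policy, using the rates quoted above, is sketched here. The budget test and the `max_skips` limit are assumptions made for illustration: with A at 30 Hz and a 10 Hz floor for B, B may be skipped at most two A frames in a row. The per-frame cost estimates would have to come from your own timing.

```python
def effective_rate_hz(scanout_hz, swap_interval):
    """Presentation rate implied by a scan-out rate and a swap interval."""
    return scanout_hz / swap_interval

A_PERIOD_MS = 1000.0 / effective_rate_hz(60.0, 2)  # A: 30 Hz, about 33.3 ms per frame

def should_draw_b(a_cost_ms, b_cost_ms, consecutive_skips,
                  a_period_ms=A_PERIOD_MS, max_skips=2):
    """Draw B this frame only if it fits in A's budget, unless B is overdue."""
    if consecutive_skips >= max_skips:
        return True  # B would otherwise fall below its 10 Hz floor
    return a_cost_ms + b_cost_ms <= a_period_ms

# A is cheap this frame: B fits in the budget, so draw it.
fits = should_draw_b(a_cost_ms=20.0, b_cost_ms=5.0, consecutive_skips=0)
# A is expensive: skip B and give A the whole frame.
skipped = should_draw_b(a_cost_ms=31.0, b_cost_ms=5.0, consecutive_skips=0)
# A is expensive, but B has already been skipped twice: draw it anyway.
forced = should_draw_b(a_cost_ms=31.0, b_cost_ms=5.0, consecutive_skips=2)
```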

Let me know the results!

Will do! As always, I appreciate your expertise!