How does framebuffer fetch influence performance?

desperado · December 16, 2019, 1:38pm

Greetings,

Is there documentation available that describes whether, and to what extent, using GL_EXT_shader_framebuffer_fetch in GLSL fragment shaders influences rendering performance?

Regards

MartonTamas · December 16, 2019, 2:19pm

Hi,

Framebuffer fetch is well supported on our modern GPUs.
In terms of performance, FBO fetch operations use the GPU’s on-chip memory, so they do not incur any system memory bandwidth cost. While this on-chip memory is significantly faster than system memory, it does have finite bandwidth, although that bandwidth is a lot more difficult to exhaust.
Also, the FBO pixels need to fit into this on-chip memory. If you use too much, we need to reduce the number of tiles processed at a given time, which results in worse memory latency hiding.
You can check the number of tiles processed in parallel using PVRTune’s “Tiles in Flight” counter. This, of course, requires the latest PVRTuneComplete and the latest drivers.
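To make the capacity point concrete, here is a toy calculation, assuming every attachment of every pixel in a tile is kept on chip. The tile size, the on-chip budget, and the function names are hypothetical; the thread gives no actual PowerVR figures, and real hardware also has to account for MSAA samples, compression, and other state this model ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (assumption): each pixel of a tile keeps all of its attachments on chip.
// attachmentBytesPerPixel lists the size of each attachment, e.g. 4 for RGBA8.
std::size_t bytesPerTile(std::size_t tileWidth, std::size_t tileHeight,
                         const std::vector<std::size_t>& attachmentBytesPerPixel) {
    std::size_t perPixel = 0;
    for (std::size_t b : attachmentBytesPerPixel) {
        perPixel += b;  // sum of every attachment a fragment can read or write
    }
    return tileWidth * tileHeight * perPixel;
}

// How many tiles fit at once in a hypothetical on-chip budget (integer division).
std::size_t tilesThatFit(std::size_t onChipBudgetBytes, std::size_t tileBytes) {
    assert(tileBytes > 0);
    return onChipBudgetBytes / tileBytes;
}
```

With an assumed 32x32 tile and an assumed 64 KiB budget, adding an 8-byte attachment to a 4-byte color plus 4-byte depth/stencil setup doubles the footprint from 8 KiB to 16 KiB per tile and halves the number of tiles that fit, from 8 to 4.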

bests,
Marton

desperado · December 16, 2019, 2:26pm

Hello,

Thanks for the fast answer. Does that mean framebuffer fetches are faster than, say, texture fetches (because those come from system memory, right?)

In my scenario, there will be one framebuffer fetch per fragment shader invocation.

MartonTamas · December 16, 2019, 2:43pm

Hi,

They are only comparable if you are fetching the same data in the same pattern, e.g. reading a viewport-sized texture 1-to-1 in a fragment shader vs. using FBO fetch to read the previous pass’s results.
It could be faster, but the main benefit should be lower bandwidth usage.

What would be your usage scenario?

thanks,
Marton

desperado · December 16, 2019, 3:21pm

The usage scenario is as follows:

I render a number of objects with depth testing. There are subtypes of objects that need a globally ordered drawing among themselves, meaning that object type A is always in front of object type B, etc. This is currently done using the stencil test. The problem is that I have to issue a separate draw call every time I change the stencil reference value, which causes 10x more draw calls than without the test. Now I am examining multipass schemes that might be able to evaluate the order in place, and framebuffer fetch may be useful for that.

MartonTamas · December 16, 2019, 3:37pm

Hi,

Have you thought of sorting the draw calls and submitting them in the correct order? E.g. something like this:
http://realtimecollisiondetection.net/blog/?p=86
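The linked article packs draw-call state into a single integer key and sorts on it. Below is a minimal sketch of that idea, assuming a hypothetical 64-bit key with the object layer (the subtype order from the scenario above) in the top bits, then a VBO id, then quantized depth. The field widths and the DrawCall, makeKey, and sortDrawCalls names are illustrative assumptions; the article uses its own layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical draw record; the field widths below are illustrative only.
struct DrawCall {
    std::uint64_t key;    // packed sort key, compared as one integer
    std::uint32_t vboId;  // which vertex buffer to bind (hypothetical)
};

// Pack most significant field first: layer (8 bits), VBO id (16 bits), depth (24 bits).
std::uint64_t makeKey(std::uint32_t layer, std::uint32_t vboId, std::uint32_t depth) {
    assert(layer < (1u << 8));
    assert(vboId < (1u << 16));
    assert(depth < (1u << 24));
    return (std::uint64_t(layer) << 40) | (std::uint64_t(vboId) << 24) | depth;
}

// One integer sort orders by layer, then by VBO within a layer, then by depth.
void sortDrawCalls(std::vector<DrawCall>& calls) {
    std::sort(calls.begin(), calls.end(),
              [](const DrawCall& a, const DrawCall& b) { return a.key < b.key; });
}
```

Because the layer occupies the most significant bits, every draw of a lower layer sorts before any draw of a higher layer, and draws that share a VBO within a layer end up adjacent, so the buffer is bound once per run.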

bests,
Marton

desperado · December 16, 2019, 3:50pm

The problem is that I cannot do that, as the objects are stored in separate VBOs and generated at different times. Performing a global sort every time the view changes would be expensive.

MartonTamas · December 16, 2019, 4:08pm

I think they collect the draw calls each frame, sort them using this method, and then draw them in the correct order. But it’s understandable if you think that would be too expensive.

MartonTamas · December 16, 2019, 4:10pm

Note that they also had a situation where speed mattered more than flexibility:
“In the past, however, we had slightly different criteria and used a different solution. On the PS2 we were not willing to sacrifice speed for flexibility (and the needs for flexibility were lesser too) so we simply had predefined buckets for all the different combinations of layers, viewports, translucency, etc. and just inserted draw calls directly into the appropriate bin. On drawing, we would just process each bucket in whichever order we wanted to draw them, no sorting necessary! Obviously, this system is much faster with no sorting required, but it is also much less flexible, and doesn’t deal well with e.g. including depth sorting as part of the bucketing. (A very similar bucketing system was used on the PS1 as well.)”
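The bucketing scheme in the quote can be sketched as follows: a fixed set of buckets chosen up front, direct insertion into the right bucket, and a flush that walks the buckets in a fixed order, with no sort at all. The bucket count and the Draw and BucketQueue names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed set of buckets, e.g. one per layer/translucency combination.
constexpr std::size_t kNumBuckets = 4;

struct Draw {
    int id;  // stand-in for whatever a real draw record holds
};

class BucketQueue {
public:
    // Insert straight into the chosen bucket: O(1), no comparison sort.
    void submit(std::size_t bucket, Draw d) {
        assert(bucket < kNumBuckets);
        buckets_[bucket].push_back(d);
    }

    // Drain buckets in index order; within a bucket, draws keep submission order.
    template <typename Fn>
    void flush(Fn&& draw) {
        for (auto& b : buckets_) {
            for (const Draw& d : b) draw(d);
            b.clear();
        }
    }

private:
    std::array<std::vector<Draw>, kNumBuckets> buckets_;
};
```

The trade-off matches the quote: submission and flushing are cheap, but any ordering inside a bucket (such as depth sorting) is not handled and would need extra work.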

desperado · December 16, 2019, 4:27pm

The problem is that even if I did sort the draw calls, I would still have to do the VBO switch, which requires another draw call. Currently the outer loop handles the VBO switch and the inner loop handles the stencil reference value switch. Sorting the draw calls by stencil reference would just move the VBO switch to the inner loop, and the result is the same.
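A toy model of this argument, assuming every draw call covers exactly one (VBO, stencil reference) combination: swapping the loop order changes which state switches more often, but not the number of distinct combinations, so the draw-call count stays the same. Group, countDrawCalls, and ordered are hypothetical names, and the 8-bit stencil range is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical group of objects sharing one VBO and one stencil reference value.
struct Group {
    int vbo;
    int stencilRef;
};

// Model: a new draw call is needed whenever the VBO or the stencil reference
// differs from the previous group, so count those boundaries.
std::size_t countDrawCalls(const std::vector<Group>& sequence) {
    std::size_t calls = 0;
    for (std::size_t i = 0; i < sequence.size(); ++i) {
        // Assumes an 8-bit stencil buffer, so reference values lie in [0, 255].
        assert(sequence[i].stencilRef >= 0 && sequence[i].stencilRef <= 255);
        const bool stateChanged = i == 0 || sequence[i].vbo != sequence[i - 1].vbo ||
                                  sequence[i].stencilRef != sequence[i - 1].stencilRef;
        if (stateChanged) ++calls;
    }
    return calls;
}

// VBO-major mimics "outer loop over VBOs"; otherwise stencil reference is the outer key.
std::vector<Group> ordered(std::vector<Group> groups, bool vboMajor) {
    std::sort(groups.begin(), groups.end(), [vboMajor](const Group& a, const Group& b) {
        if (vboMajor) {
            return a.vbo != b.vbo ? a.vbo < b.vbo : a.stencilRef < b.stencilRef;
        }
        return a.stencilRef != b.stencilRef ? a.stencilRef < b.stencilRef : a.vbo < b.vbo;
    });
    return groups;
}
```

Under this model, sorting only helps by merging groups that share both a VBO and a reference value; it cannot go below the number of distinct combinations.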
