TBDR: when does the hardware stop caching drawing commands and start to execute them?

Hi everyone,



I’m a game developer targeting iOS devices, which have PowerVR GPUs built in.



I’ve already read the developer documents such as the “PowerVR Series 5 Architecture Guide for Developers”, “POWERVR SGX OpenGL ES 2.0 Application Development Recommendations” and “PowerVR Performance Recommendations”, and found the detailed information about the PVR SGX hardware very helpful.



I want to take as much advantage of HSR (Hidden Surface Removal) as possible, but a few issues are confusing me:


  1. Deferred rendering means the hardware keeps caching (or something like that) the drawing commands and data until some point, and then executes them all. This lets the ISP perform HSR before fragment processing, because the ISP has complete visibility information inside a tile via the Parameter Buffer (PB). But I want to know: what makes the hardware stop caching and start executing, and when does that happen?

    I know some cases that will make the hardware start executing (see the sketch after this list):

    a. some OpenGL calls that modify OpenGL objects

    b. glFlush() / glFinish()

    c. the Parameter Buffer becoming full
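
    For example, here is a rough sketch of what I have in mind (the specific calls are just my guesses at typical triggers, please correct me if I’m wrong):

    #include <GLES2/gl2.h>

    /* Calls I believe can force the hardware to stop caching and start
       executing the queued drawing commands (my guesses): */
    void possible_flush_triggers(GLuint tex)
    {
        glFlush();   /* case b: explicit flush */
        glFinish();  /* case b: explicit flush, waiting for completion */

        /* reading back the framebuffer means rendering must finish first */
        GLubyte pixel[4];
        glReadPixels(0, 0, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixel);

        /* case a: modifying an object (here, a texture) that previously
           queued commands may still reference */
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1, 1,
                        GL_RGBA, GL_UNSIGNED_BYTE, pixel);
    }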


  2. To render a scene, my application calls glDrawArrays or glDrawElements again and again with different texture bindings. How does the hardware deal with the code below?



    glBindTexture(GL_TEXTURE_2D, tex1);

    glDrawArrays(…); // 1

    glBindTexture(GL_TEXTURE_2D, tex2);

    glDrawArrays(…); // 2



    In the pseudo-code above, the two glDrawArrays calls use different textures; I want to know how the hardware deals with that.

    Does the hardware start executing the commands submitted by “glDrawArrays 1” right after it is called, just because a different texture is bound later?

    Or will the hardware cache as much as possible, even if the primitives use different textures or buffer objects (VBOs, VAOs)? (Assuming the Parameter Buffer doesn’t overflow.)





    The reason I’m confused is that I think once the hardware stops caching and starts executing drawing commands, HSR is only available for the batch of primitives submitted up to that point. I just don’t know how I could maximize the number of primitives executed together, so as to take as much advantage of HSR as possible, while glDrawArrays or glDrawElements are being called one by one.


  3. Given a rendering situation:

    – drawing two 2D quads:

    A is textured with a partially transparent image (e.g. a red opaque circle surrounded by alpha = 0 texels)

    B is overlapped by A.

    – no use of “discard” or alpha test; the PB won’t overflow

    – alpha blending may be either on or off

    – depth test enabled and the depth buffer prepared

    By the principle of HSR described in the documents I mentioned above, although A’s texture is partially transparent, the hardware still treats it as an opaque primitive when performing HSR, so the fragments of B overlapped by A are discarded by HSR. Am I right?



    P.S.: All the situations discussed above assume no use of “discard” or alpha test, and that the Parameter Buffer won’t overflow.



    Thank you very much for your attention. I look forward to your reply.

Hi Chris,



The deferred part of our “TBDR” actually refers specifically to the HSR, but you’re correct in that we do queue up commands until we’re ready to execute, which is a form of deferral. All modern GPUs actually perform this second kind of deferral, and the number of commands queued up is usually a limit set by the driver in terms of a number of frames. Typically we queue up around 3-4 frames of commands, with each frame delimited by an eglSwapBuffers() call, or the equivalent for other APIs. However, the first frame is usually executed as soon as possible, with the driver queuing up no more than this limit.
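
As a rough sketch of how that looks from the application side (illustrative only, not code from our SDK):

    #include <EGL/egl.h>
    #include <GLES2/gl2.h>

    /* Each eglSwapBuffers() call marks a frame boundary, which is the
       unit the driver queues - around 3-4 such frames may be buffered
       before the CPU is made to wait. */
    void render_loop(EGLDisplay dpy, EGLSurface surface)
    {
        for (;;) {
            glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
            /* ... submit all draw calls for this frame ... */
            eglSwapBuffers(dpy, surface); /* frame boundary */
        }
    }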



For our hardware, each frame is executed one at a time in each part of the hardware (Vertex/Fragment), and HSR will work on all the data it has stored up until either the end of a frame’s vertex data is processed or a vertex flush event occurs for another reason. This is regardless of whether that data uses different textures, blend states, shaders, etc. So yes, in the case you’ve described, HSR will be used effectively.



In the third situation you’ve described, if blending is turned off, then the entire polygon will be rendered opaque - this is due to the way OpenGL works, not to anything in our hardware/drivers. That being the case, both polygons will be rendered and culled as opaque objects. Assuming A is in front of B, then yes, those portions of B will be discarded. If, on the other hand, alpha blending were turned on, HSR would be unable to discard anything, as there’s no way of knowing which pixels are going to be entirely opaque until after the fragment shader has executed.
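
In code terms, your third situation looks something like this (a sketch; texA, texB and the vertex setup are placeholders for your own code):

    #include <GLES2/gl2.h>

    void draw_two_quads(GLuint texA, GLuint texB)
    {
        glEnable(GL_DEPTH_TEST);
        glDisable(GL_BLEND); /* both quads are treated as opaque */

        glBindTexture(GL_TEXTURE_2D, texA);
        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4); /* quad A, in front */

        glBindTexture(GL_TEXTURE_2D, texB);
        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4); /* quad B: fragments
                                                  behind A are removed
                                                  by HSR */

        /* With glEnable(GL_BLEND) instead, neither quad could act as
           an occluder, and all fragments of both would be shaded. */
    }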



Hope this helps! Let us know if you have any further questions.



Thanks,

Tobias

Hi Tobias,



Thanks a lot!



I’ve read your reply about the 3rd question.

It seems that what I thought about how the graphics pipeline works was wrong.


  1. Do you mean that either alpha blending or “discard” makes HSR unavailable?


  2. I thought that the blending operation occurs when the fragment’s color is calculated, so it wouldn’t be related to HSR.

    I mean that no matter whether blending is disabled or enabled, HSR could just treat all fragments as opaque, even if the texels that will be sampled for those fragments are transparent.



    :) I hope to hear from you.



    Chris

Hi Chris,



HSR is not entirely unavailable in these cases - blended or discardable geometry can still be occluded if an opaque object is drawn in front of it. However, geometry that may be blended or discarded cannot itself occlude anything via HSR, as that would cause incorrect renders.



To illustrate, take two pixels, A and B, where B renders in front of A. B and A both have the possibility of being discarded, but it’s impossible to tell at HSR time whether they will be, as that depends on the execution of the fragment shader. If B is discarded and A is not, then had A been occluded by B at HSR time, no pixel would ever be drawn. The user expectation, however, is that A is drawn, as this is what was intended. For this reason, HSR is unable to occlude any pixels in this case.



In another case, where B is definitely opaque, but A is discardable, if B is in front of A, HSR can remove A. This is because we always know that B is going to draw, so there’s no possibility that A needs to be drawn.
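
As a toy model of that rule (purely illustrative, not how the hardware is actually implemented):

    #include <stdbool.h>

    typedef struct {
        float depth;       /* smaller = closer, as with GL_LESS    */
        bool  blended;     /* alpha blending enabled for this draw */
        bool  may_discard; /* fragment shader contains "discard"   */
    } Fragment;

    /* A fragment may act as an occluder only if it is guaranteed
       to be written: neither blended nor possibly discarded. */
    static bool is_guaranteed_opaque(const Fragment *f)
    {
        return !f->blended && !f->may_discard;
    }

    /* Can 'front' safely hide 'back' at HSR time? Whether 'back' is
       itself opaque doesn't matter - anything can be hidden. */
    static bool hsr_can_occlude(const Fragment *front, const Fragment *back)
    {
        return is_guaranteed_opaque(front) && front->depth < back->depth;
    }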



Regards,

Tobias

Hi Tobias



Thank you for your patient reply.



I realize now that the case I described above makes no sense, even though it sounded reasonable to me.



But I still want to know: how does the hardware make sure a primitive is definitely opaque?

Or, what can I do to let the hardware know that the primitives I’m going to submit should be treated as opaque?



Are primitives submitted while alpha blending is turned off, and with no “discard” in the fragment shader, treated as opaque?



I hope to hear from you, and thank you very much! This is really important to me.





Chris

Hi Chris,



The hardware determines opacity in the most obvious way there is - if you have blending switched on in OpenGL ES, or there is a discard keyword in your shader (or if shader framebuffer fetch is used on iOS devices), it treats the object as not opaque.



If none of these conditions are true, it treats the polygon as opaque and performs effective HSR on it. So the key to utilising our HSR is to make sure you have blending turned off when you don’t need it, and to avoid using discard in your fragment shader. The same is true for Early-Z on other hardware, in fact, so it’s a very worthwhile optimisation.
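
In practice that usually means structuring a frame something like this (a sketch, not a required pattern):

    #include <GLES2/gl2.h>

    void draw_frame(void)
    {
        glEnable(GL_DEPTH_TEST);

        /* Opaque pass first: blending off, shaders without discard,
           so HSR gets its full benefit. */
        glDisable(GL_BLEND);
        /* ... draw all opaque geometry here ... */

        /* Blended pass afterwards: these draws cannot act as occluders,
           but they can still be hidden by the opaque geometry above. */
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
        /* ... draw transparent or discard-using geometry here ... */
    }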



Thanks,

Tobias

Thank you, Tobias!

I got a clear answer from your reply!