Hidden surface removal for alpha blended fragment under opaque one

Hi guys!



As the title says, I have a little problem understanding why HSR doesn’t seem to work in my “weird” situation:


  • working with SGX535 (iPhone4)
  • I don’t use depth buffer/depth test
  • basically working with 2D game engine
  • geometry is presorted on CPU so visibility is defined just by the order of draw calls
  • first I draw all my backgrounds, sprites, HUD, etc. for the actual game…
  • then I start with my “HSR test” so I draw like 10 full-screen textured quads with blending enabled to render even more “hidden” alpha blended stuff
  • then I do one opaque full-screen textured quad with blending disabled



The result is that nothing rendered before the last full-screen quad is visible, but judging by my low framerate, every single alpha blended fragment was still processed. I wonder why, because HSR should be smart enough not to render anything besides the last opaque full-screen quad.



Do you have any idea why HSR didn’t “discard” those alpha blended fragments under the opaque one? Am I missing something or is this “normal” behavior?



Thanks!

Hi Lokiman,



A depth buffer is required for HSR to function correctly. Try attaching a depth buffer to your framebuffer and see if that helps.



Regards,

Bob

Hello Bob,



Thanks for the advice, but I already tried that yesterday as I suspected it might be a requirement. It didn’t help: my FPS was the same as without it. I am sure that the depth test definitely worked, but HSR for alpha blended primitives behind an opaque one didn’t.



Btw, why is a depth buffer/depth test required for HSR? This neat feature should work without it. I mean, the GPU knows about those occluded pixels, so what’s the point of an advanced TBDR if it disables perfect HSR even when it could do it? :smiley:

Hi Lokiman,



In an entirely opaque render drawn in a painter’s algorithm order (where the draw call order is sorted before draw call submission), there will be no overdraw because the HSR will only process a fragment colour for the last object that covered a given pixel area. The GPU does this by using its tag buffer to track the object that should render a colour into a given pixel.



In a render where blended and opaque draw calls are mixed, it’s unlikely that the GPU will be able to completely remove all redundant blending work. I can’t discuss implementation details due to IP sensitivity, but essentially the GPU can only remove a certain amount of redundant blending work during the HSR process. As the number of blended layers increases, there will eventually be a point at which the GPU has to kick fragment processing.



One way of avoiding this cost is to attach a depth buffer, translate your objects along the Z axis during your draw call sorting (i.e. the objects that should be drawn first are translated away from the camera, the next object is a little closer to the camera, etc.), render all of your opaque objects with an orthographic projection, and finally render all of your blended objects in the same orthographic projection. This will build up the depth buffer when all of the opaque objects are rendered so that the GPU can use the depth values to mask out redundant blending work. Using this approach, the only fragments that will be blended are those that will affect the final rendered image.
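The CPU-side part of the approach above could be sketched roughly like this. It is only an illustration, not code from the SDK: the `Sprite` struct, the `prepare_sprites` name and the choice of a [0, 1) depth value (smaller meaning nearer) are all assumptions, and it assumes each sprite carries a unique painter's order in 0..count-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sprite record; field names are illustrative only. */
typedef struct {
    int   order;   /* painter's order: 0 is the furthest-back sprite */
    bool  blended; /* true if the sprite needs alpha blending */
    float depth;   /* filled in below: smaller means nearer the camera */
} Sprite;

/* Submission order: every opaque sprite first, then every blended sprite.
 * Within each group the original painter's order is kept (back to front). */
static int compare_for_submission(const void *a, const void *b)
{
    const Sprite *sa = a;
    const Sprite *sb = b;
    if (sa->blended != sb->blended)
        return sa->blended ? 1 : -1;
    return (sa->order > sb->order) - (sa->order < sb->order);
}

/* Gives each sprite a distinct depth derived from its painter's order,
 * then reorders the array into "opaque pass, then blended pass". */
void prepare_sprites(Sprite *sprites, size_t count)
{
    assert(sprites != NULL && count > 0);
    for (size_t i = 0; i < count; i++) {
        /* Later-drawn sprites get smaller depth values, i.e. end up nearer. */
        sprites[i].depth =
            1.0f - (float)(sprites[i].order + 1) / (float)(count + 1);
    }
    qsort(sprites, count, sizeof *sprites, compare_for_submission);
}
```

How `depth` turns into a vertex Z depends on your orthographic projection. When drawing, the opaque pass would typically run with `glEnable(GL_DEPTH_TEST)`, `glDepthMask(GL_TRUE)` and blending disabled, and the blended pass with `glEnable(GL_BLEND)` and the depth test still enabled. Turning depth writes off for the blended pass with `glDepthMask(GL_FALSE)` is common practice rather than something stated in this thread.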



You should refer to our “SGX Architecture Guide for Developers” and “PowerVR Performance Recommendations” documents for more information about the PowerVR architecture’s HSR and the best way to render your scene (the documentation can be found in our SDK or online here).



On a slight tangent, another good optimization for any render with blended sprites is to use tight fitting geometry instead of squares. This will allow you to reduce the number of blended fragments you are submitting to the GPU. An example of sprite optimization can be found here: http://www.humus.name/index.php?ID=266
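As a much simpler stand-in for the idea, here is a sketch that finds the tight bounding rectangle of the texels that are not fully transparent, so a sprite quad can at least be shrunk to that rectangle. A tight-fitting polygon would trim more than this. The `TexelRect` type, the `tight_bounds` name and the single-channel alpha input are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: inclusive texel bounds of the visible area. */
typedef struct {
    int  x0, y0;  /* minimum corner (inclusive) */
    int  x1, y1;  /* maximum corner (inclusive) */
    bool empty;   /* true if no texel exceeded the threshold */
} TexelRect;

/* Scans a width*height array of 8-bit alpha values (row-major) and returns
 * the smallest rectangle containing every texel with alpha > threshold. */
TexelRect tight_bounds(const unsigned char *alpha, int width, int height,
                       unsigned char threshold)
{
    assert(alpha != NULL && width > 0 && height > 0);
    TexelRect r = { width, height, -1, -1, true };
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (alpha[(size_t)y * (size_t)width + (size_t)x] > threshold) {
                if (x < r.x0) r.x0 = x;
                if (y < r.y0) r.y0 = y;
                if (x > r.x1) r.x1 = x;
                if (y > r.y1) r.y1 = y;
                r.empty = false;
            }
        }
    }
    return r;
}
```

The quad's positions and texture coordinates would then be built from `x0..x1` and `y0..y1` instead of the full texture, so the fully transparent border never reaches the blender.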

Hi Joe,



First thanks for providing some information about this issue.



Unfortunately I didn’t learn anything new because I am already familiar with documents like “POWERVR Series5 Graphics SGX architecture guide for developers”, and I recommend them to everyone :slight_smile: They provide quite a bit of inside info on how the SGX architecture works, but many things are discussed in a “foggy” way. I understand that it may be because of the IP :slight_smile:



But I am still curious about one thing you mentioned in your second paragraph:



"… but essentially the GPU can only remove a certain amount of redundant blending work during the HSR process. As the number of blended layers increases, there will eventually be a point at which the GPU has to kick fragment processing."



So you are saying that HSR works even for blended surfaces under/behind the opaque one, but only for a limited number of surfaces? I’m still talking about the scenario where surface visibility is defined just by the order of draw calls and the depth buffer/test is disabled. What do you mean by “at some point has to kick fragment processing”? Is this about an overflow in one of the GPU’s internal buffers (Parameter/Tag buffer)? I don’t know the sizes of those buffers, but I tried one more thing, a very simple HSR test to rule out any overflow:



glClear(GL_COLOR_BUFFER_BIT);
glDisable(GL_DEPTH_TEST);

/* three additive blended full-screen quads */
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE);
for (int i = 0; i < 3; i++) {
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}

/* one opaque full-screen quad on top */
glDisable(GL_BLEND);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);



I draw 4 full-screen quads with a very demanding fragment shader. With only 4 draw calls I assume there will be no buffer overflow and thus no forced flush somewhere in between. With this setup the GPU must know that those 3 alpha blended draw calls will be occluded by the last draw call, so HSR should filter out all 3 calls and there should be no overdraw. But my results (via FPS) show that HSR is somehow broken and not working at all. So my other question is: why is that so, and what is the reason for not removing redundant blending work?



Once again thanks for help.



PS.: I don’t want to go against the Best Practices (advice like draw opaque first and then alpha blended) but I simply don’t understand why HSR doesn’t work for my super simple and supposedly HSR friendly test :slight_smile:

HSR relies on a depth value (even when a depth buffer isn’t used) as it has to be certain that the existing fragment at a given pixel occludes the current fragment that is being processed. In the case of a 2D game with blended and opaque layers that are all, effectively, at the same depth, the GPU cannot be certain during the HSR stage that the opaque fragment in the final object is occluding the previous blended layers. You should be able to offset your opaque layer by either translating it or using glPolygonOffset() to ensure that the HSR can mask out some of the blending beneath it.

Hmmm, this is interesting but to me it doesn’t make sense at all :slight_smile: Can we discuss it a little bit more?



So does the implementation also use depth test related GL state like glDepthFunc/glDepthRange/glDepthMask/glClearDepth for HSR, even when the depth test is disabled (or no depth buffer is attached)? This sounds weird and almost like a hack to me :slight_smile: There is no need for that in such a scenario. When you rely on draw call order, HSR doesn’t need to rely on depth, but if you say it does, then the depth specific parameters don’t need to be specified because they can all be set implicitly -> for example, just use the primitive index (every newly rendered triangle increments this index) as its “z-value” and use that when comparing two pixels. A pixel with a greater index is in front of a pixel with a lower index. Nice and simple behavior, and the best thing is that this is also what I expected in the first place. I am not a GL driver programmer and I don’t know how Imagination Technologies handles that kind of stuff, but for a scenario with the depth test disabled HSR should work something like that :wink:



Now I’d better stop talking hypothetically and show you some results: I tried to assign different z coordinates (offset the layers) for those 4 draw calls like you suggested, but the result is the same: no improvement according to FPS. I tried every possible combination but to no avail ;(



So the ultimate question remains the same: does HSR really work for alpha blended layers behind an opaque one, all rendered “back to front”, and if yes, could you please send me a working example? I don’t care if you need the depth test enabled, offset polygons or any other tricks to do that. I’m just curious to see how it can be done :slight_smile:



Thanks Joe, I think we can eventually pull this off!

Apologies. The previous comment was completely incorrect. I was misinformed.

As discussed in my original post, HSR is designed to remove overdraw in completely opaque renders. It is not designed to optimize out redundant blending work.

The best way to benefit from HSR is to implement the approach outlined in my first post (use a depth buffer, render all opaque objects, render all blended objects). Rendering blended objects before opaque objects will never result in better performance than rendering all opaque objects first.
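To make the difference concrete, here is a toy per-pixel cost model. It is not a description of the hardware. It assumes that with painter's order and no depth test every blended fragment is processed, and that with the opaque-first depth approach only the blended layers in front of the topmost opaque layer survive the depth test. The function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* layers[] lists the layers covering one pixel in painter's order
 * (index 0 is furthest back); true marks a blended layer. */

/* Blended fragments processed when drawing back to front without a depth
 * test, assuming no redundant blending work is removed. */
size_t blended_cost_painter(const bool *layers, size_t count)
{
    assert(count == 0 || layers != NULL);
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (layers[i])
            n++;
    }
    return n;
}

/* Blended fragments processed when all opaque layers are drawn first with
 * depth writes, then blended layers with the depth test enabled: only the
 * blended layers in front of the topmost opaque layer pass the test. */
size_t blended_cost_depth_tested(const bool *layers, size_t count)
{
    assert(count == 0 || layers != NULL);
    size_t n = 0;
    for (size_t i = count; i-- > 0; ) {
        if (!layers[i])
            break; /* reached the topmost opaque layer */
        n++;
    }
    return n;
}
```

For the test case from this thread (three blended quads under one opaque quad) the model gives 3 blended fragments per pixel for painter's order and 0 for the depth-tested order. When the blended layers are all on top of the opaque ones, the two orders cost the same.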

Joe, thanks for your clear answer. Now I know that it doesn't work but I think ImgTech should make it work anyway - maybe sometime in future as a nice improvement, hehe (alpha blended surfaces without depth buffer when only draw call order defines z depth)  ;) 

Have a nice day!