Cost of 'static' branches in SGX shaders?

Are 'static' branches optimized out by the SGX driver?
I'm thinking of implementing alpha-test by adding an optional 'discard' shader cmd inside an if(bUniformDoAlphaTesting) branch in the shader, but I don't want the presence of the 'discard' statement to disable 'early-Z' optimizations for the non-alpha-tested branch, forcing the pixel shader to always be evaluated before z-tests occur.

Since I don't really know what's happening 'underneath the hood', I have to ask here.

Alternatively I suppose I could make a separate set of shader variations that do alpha-test, but that is more of a pain to manage.

Thx,

Kris

Using the discard keyword in a shader forces the driver to mark that shader as an alpha-test shader, and any triangles drawn with it must be fully resolved whether or not the discard is actually executed. This eliminates the benefit of the hidden surface removal inherent in the SGX TBDR. In short: this is a bad idea.

The separate shader is a better plan, although even then prefer using alpha blend to using alpha test and reserve any drawing with these shaders for after you’ve rendered opaque geometry.

For further information on this and other optimisations look at:

https://www.imgtec.com/factsheets/SDK/POWERVR%20SGX.OpenGL%20ES%202.0%20Application%20Development%20Recommendations.1.8f.External.pdf

Branching can be a method of optimisation in shaders on SGX, but not in this case.

Gordon wrote:

The separate shader is a better plan, although even then prefer using alpha blend to using alpha test and reserve any drawing with these shaders for after you've rendered opaque geometry.

OK, thanks. I really do need alpha-test in this case: I've got some intersecting foliage and numerous billboards for which per-primitive back-to-front sorting for alpha-blend is not really feasible. But based on your feedback I'll make a separate shader for it so other objects don't incur the 'discard' cost.

Ignoring the 'discard' issue for a sec, if I have a shader that does some optional heavy computation (say a spot light and an environment map lookup, instead of a simple directional light), but I put that heavy computation inside a static branch based on an "if" test against uniform const bool or int and the branch is disabled so the heavy computation is skipped, will it be as fast as a shader that doesn't include the branch/heavy computation at all?

I'm wondering if I really need to go to the trouble of generating separate shaders for all such cases, or if the driver makes such 'uber-shaders' efficient anyway so that work would be unnecessary.

Thanks,
Kris

Kris wrote:
Ignoring the 'discard' issue for a sec, if I have a shader that does some optional heavy computation (say a spot light and an environment map lookup, instead of a simple directional light), but I put that heavy computation inside a static branch based on an "if" test against uniform const bool or int and the branch is disabled so the heavy computation is skipped, will it be as fast as a shader that doesn't include the branch/heavy computation at all?

There is a small cost to the if/test itself. If the rest of the shader is very short, this cost can be significant relative to the total, but the uber-shader approach can still be acceptable if you rarely use the short path or if the number of shader permutations you would otherwise need is high.

Just to clarify - the branch would be done by the shader processors on the SGX, not in some pre-processing step by the driver.

If you want to avoid the overhead of the ‘if’ in your shader, but do not want to needlessly create separate shader files, you can use C-style preprocessor directives (#define, #ifdef, etc.) to determine the path taken when the shader is compiled by the online compiler. You can use the PVRTShaderLoadFromFile(…) function in the SDK tools to pass in multiple defines when loading a shader file. This lets you load the same shader file repeatedly with different paths defined for it, avoiding the need to create separate shader files.

The Water demo in the most recent version of the SDK implements this functionality to allow different paths to be taken through shaders, without incurring the overhead of the ‘if’.