PVRUniSCoEditor cycle count misleading?

Hello,

I’m using PVRUniSCoEditor v1.4 to optimize one of my fragment shaders. The shader is fairly large: it computes lighting for multiple lights in a single shader to save some setup cost. According to PVRUniSCoEditor, it runs in 109 cycles in total. One section of the shader looks like this:

Code:

    lowp float NdotL = clamp(dot(surface_normal, light_direction), 0.0, 1.0); // 1 cycle
    lowp float NdotR = clamp(dot(surface_normal, light_half_vect), 0.0, 1.0); // 2 cycles
    color = attenuation * vec4(light_color[0] * NdotL, sqrt(NdotR));          // 5 cycles

with all vectors declared lowp. Rearranging the code slightly, as follows:

Code:

    lowp float NdotL = dot(surface_normal, light_direction);                           // 1 cycle
    lowp float NdotR = dot(surface_normal, light_half_vect);                           // 1 cycle
    color = attenuation * clamp(vec4(light_color[0] * NdotL, sqrt(NdotR)), 0.0, 1.0); // 4 cycles

reduces the cost of this section by 2 cycles (from 8 to 6). This code is repeated for each light, and the total cost of my shader drops from 109 to 99 cycles, a 9% improvement.
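
The rearrangement moves the clamp from the two dot products onto the assembled vec4, which is only equivalent under some conditions. The sketch below emulates both orderings for a single fragment on the CPU to make those conditions concrete. It is an illustration only: it uses Python doubles, so lowp precision is not modelled, and the input values are hypothetical. Two cases stand out: when NdotR is negative (light behind the surface), the rearranged version takes sqrt() of a negative number, which GLSL ES leaves undefined; and when a light colour component can exceed 1.0, clamping after the multiply changes the result.

```python
import math


def clamp(x, lo=0.0, hi=1.0):
    """Scalar version of GLSL clamp(x, lo, hi)."""
    return min(max(x, lo), hi)


def original_order(n_dot_l, n_dot_r, light_color, attenuation):
    """First snippet: clamp both dot products, then build the vec4."""
    n_dot_l = clamp(n_dot_l)
    n_dot_r = clamp(n_dot_r)
    rgb = [c * n_dot_l for c in light_color]
    return [attenuation * v for v in rgb + [math.sqrt(n_dot_r)]]


def rearranged_order(n_dot_l, n_dot_r, light_color, attenuation):
    """Second snippet: build the vec4 from unclamped dot products, then clamp it.

    GLSL ES leaves sqrt(x) undefined for x < 0; Python's math.sqrt raises
    ValueError instead, which makes that case visible here.
    """
    rgb = [c * n_dot_l for c in light_color]
    return [attenuation * clamp(v) for v in rgb + [math.sqrt(n_dot_r)]]


if __name__ == "__main__":
    # Hypothetical inputs: light facing the surface, light colour within [0, 1].
    print(original_order(0.5, 0.25, [1.0, 0.5, 0.25], 0.8))
    print(rearranged_order(0.5, 0.25, [1.0, 0.5, 0.25], 0.8))
    # Light behind the surface: the first version yields zeros,
    # the second takes the square root of a negative number.
    print(original_order(-0.3, -0.2, [1.0, 0.5, 0.25], 0.8))
    try:
        rearranged_order(-0.3, -0.2, [1.0, 0.5, 0.25], 0.8)
    except ValueError as err:
        print("rearranged_order:", err)
```

For front-facing lights with colours in [0, 1] both functions agree, so the cheaper ordering is safe only if negative NdotR values are ruled out (or clamped) before the sqrt.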

However, when I actually run this shader on my iPhone 3GS, it renders one frame per second slower than before. I was expecting an improvement in frame rate.
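
As a rough yardstick, the editor's numbers predict at most a 109/99 ≈ 1.10× speedup, and only if the frame is limited entirely by this shader's ALU work. The sketch below turns that into an expected frame rate and expresses a one-fps change as frame time, which is easier to compare across frame rates. The baseline of 20 fps is a hypothetical value; the post does not state the measured frame rate.

```python
def frame_time_ms(fps):
    """Convert a frame rate to the time spent per frame, in milliseconds."""
    return 1000.0 / fps


def ideal_fps(baseline_fps, cycles_before, cycles_after):
    """Frame rate predicted if frame time scaled exactly with shader cycles.

    Only meaningful when the frame is entirely limited by fragment-shader ALU
    work; vertex processing, texture fetches, bandwidth and CPU time don't scale.
    """
    return baseline_fps * cycles_before / cycles_after


if __name__ == "__main__":
    baseline = 20.0  # hypothetical measured frame rate before the change
    print(f"cycle reduction: {(109 - 99) / 109:.1%}")
    print(f"ideal frame rate: {ideal_fps(baseline, 109, 99):.1f} fps")
    slower = frame_time_ms(baseline - 1) - frame_time_ms(baseline)
    print(f"1 fps slower costs {slower:.2f} ms per frame")
```

If the measured change goes the other way, as it does here, either the device compiler produced different code than the offline compiler or something other than the shader's ALU work dominates the frame.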

Is PVRUniSCoEditor misleading me somehow? Is the total cycle count not a valid way to estimate the performance of non-texture-lookup operations? Could Apple's version of the compiler be doing a worse job on the second version of the shader, or perhaps a better job on the first?

Thanks in advance for any replies.

PVRUniSCoEditor links to an offline SGX compiler (which compiler is used can be configured in the editor). This compiler isn’t necessarily identical to the one on your device, so the cycle counts it reports can sometimes differ from those of the shader that actually executes on the device.

Given the amount of calculation you are carrying out, it seems reasonable to assume that your application is limited by the processing of this shader, and optimising it in the way you have done is the obvious way to improve performance.

It looks like Apple’s compiler differs from the one we bundle with PVRUniSCoEditor in this case, as you suspect.

What is your reason for using so many lights?

Gordon, thanks for replying.

I’m working on a graphics engine to be used in a couple of upcoming projects, and I’m trying to make it as fast as possible. My test scene has four point lights, and lighting is computed per fragment. This represents the worst case rather than the common case when the engine is actually used.

Also, many lights are pretty.

I suppose I’ll have to talk to Apple when it comes to questions regarding their compiler. Do you know if it is based on a reference compiler provided by you?

I’m afraid I can’t discuss the details of Apple’s compiler. You would have to approach them.

Overall, the accuracy of the PVRUniSCoEditor output should improve in our next releases, as it will reflect more of the context in which shader instructions are used. However, unless the device uses exactly the same compiler, there is still no way to guarantee that the shader binaries will match, so the reported instruction counts may still differ from those on the device.

Four lights is still quite a lot, but if that’s the worst case then you may be okay. I would suggest aiming for a lower instruction total if you can. If you can hit a high frame rate with many lights then it may well be pretty, though I imagine fewer lights with faster rendering may be prettier ;)

We would be interested to hear how your projects progress, and if you need any help with shader optimisation, or technical support in general, please post again.

Definitely, I’m aiming for a stable 30 fps.

Thanks for your help. I’m sure I’ll call for your expertise again.