I'm using PVRUniSCoEditor v1.4 to optimize one of my fragment shaders. The shader is fairly large: it performs the lighting calculations for multiple lights in a single shader to save some setup cost. In total, it runs in 109 cycles, according to PVRUniSCoEditor. One section of the shader looks like this (the leading number on each line is the cycle count the editor reports for that line):
1 lowp float NdotL = clamp(dot(surface_normal, light_direction), 0.0, 1.0);
2 lowp float NdotR = clamp(dot(surface_normal, light_half_vect), 0.0, 1.0);
5 color = attenuation * vec4(light_color * NdotL, sqrt(NdotR));
with all vectors being lowp. Re-arranging the code a bit, as follows:
1 lowp float NdotL = dot(surface_normal, light_direction);
1 lowp float NdotR = dot(surface_normal, light_half_vect);
4 color = attenuation * clamp(vec4(light_color * NdotL, sqrt(NdotR)), 0.0, 1.0);
reduces the cycle count of this section by 2 cycles. Because this code is repeated for each light, the total cost of my shader drops from 109 to 99 cycles, a 9% improvement.
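In case it helps anyone reproduce the comparison, here is a minimal single-light sketch of both variants in one fragment shader. It is not my full shader: the varying/uniform declarations, the scalar type of attenuation, the CLAMP_EARLY switch and the write to gl_FragColor are assumptions made only to keep the example self-contained.

```glsl
// Minimal single-light sketch for comparing the two variants.
// The declarations below are assumed for illustration; only the three
// statements inside each branch come from the actual shader.
precision lowp float;

varying lowp vec3 surface_normal;
varying lowp vec3 light_direction;
varying lowp vec3 light_half_vect;
uniform lowp vec3 light_color;
uniform lowp float attenuation;   // assumed to be a scalar

void main()
{
#ifdef CLAMP_EARLY
    // Variant A: clamp each dot product before it is used.
    lowp float NdotL = clamp(dot(surface_normal, light_direction), 0.0, 1.0);
    lowp float NdotR = clamp(dot(surface_normal, light_half_vect), 0.0, 1.0);
    gl_FragColor = attenuation * vec4(light_color * NdotL, sqrt(NdotR));
#else
    // Variant B: clamp the assembled vec4 once.
    // Here sqrt() receives the unclamped NdotR, so the two variants are
    // not strictly equivalent when the dot product is negative.
    lowp float NdotL = dot(surface_normal, light_direction);
    lowp float NdotR = dot(surface_normal, light_half_vect);
    gl_FragColor = attenuation * clamp(vec4(light_color * NdotL, sqrt(NdotR)), 0.0, 1.0);
#endif
}
```

Compiling once with CLAMP_EARLY defined and once without should give the two versions to compare, both in the editor and on the device.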
However, when I actually run this shader on my iPhone 3GS, it renders one frame per second slower than before. I was hoping for, and expecting, an improvement in frame rate.
Is PVRUniSCoEditor tricking me here somehow? Is total cycle count not a valid way to estimate performance (of operations other than texture lookups)? Could Apple's version of the compiler be doing a worse job compiling the second version, or perhaps a better job on the first?
Thankful for replies.