iPad fragment shader performance

Hi,





I am trying to work out what kind of shading I can support in my iPad application. I have a single quad (two triangles) in my scene with no other geometry. I have zoomed in so that the quad completely fills the screen. The shader program I am running is:








// Vertex shader
attribute vec4 vertices;
attribute vec4 normals;

uniform mat4 ModelViewProjMatrix;
uniform mat4 ModelViewMatrix;
uniform mat4 NormalMatrix;

varying vec3 normal;   // eye-space normal
varying vec3 v;        // eye-space position

void main(void)
{
    gl_Position = ModelViewProjMatrix * vertices;
    vec4 v4 = ModelViewMatrix * vertices;
    v = v4.xyz;
    normal = normalize(NormalMatrix * normals).xyz;
}





// Fragment shader
precision highp float;

varying vec3 normal;
varying vec3 v;

void main(void)
{
    vec3 n = normalize(normal);
    if( !gl_FrontFacing ) {
        n = -n;   // flip the normal for back faces (two-sided lighting)
    }

    vec3 E = normalize(-v);
    vec3 R = -reflect(E,n);
    vec3 lspec = vec3( 0.5, 0.5, 0.5 );
    vec3 ldiff0 = vec3( 1.0, 0.986432, 0.909548 );
    vec3 ldir0 = normalize( vec3( 0.823363,0.212563,0.526204 ) );

    vec3 Idiff = max(dot(n,ldir0), 0.0) * ldiff0;
    vec3 Ispec = lspec * pow(max(dot(R,ldir0),0.0), 1.0);

    Idiff *= vec3( 1.0, 0.0, 0.0 );   // hard-coded material diffuse color

    gl_FragColor.rgb = Idiff + Ispec;
    gl_FragColor.a = 1.0;
}





I have hard-coded the colors of the light and material. I have also hard-coded the direction of the light and set the specular exponent to be low (1.0).





I find that I get a frame rate of 10 fps with this shader. If I comment out the use of gl_FrontFacing, the frame rate goes up to 15 fps. If I reduce the shader to just output a color, I get the expected 60 fps. I have disabled blending when doing this drawing.





Are there any shader performance tools available on the Mac? This seems pretty basic; am I missing something obvious?





Thanks

Hi beeps,





Unfortunately, as far as I’m aware, there are currently no performance tools available on the Mac that do this. We provide a text editor utility with an integrated offline compiler that can give you per-line instruction count estimates, which makes it very easy to spot bottlenecks and quick performance optimizations you can make in your shader code. We currently provide this utility for Windows and Linux, but plan to release a Mac version in the future (hopefully for our next SDK release). In the meantime, though, dual-booting, using a virtual machine or finding another machine that runs Windows or Linux will allow you to run this utility.





I ran our PVRUniSCoEditor utility on your code, and your vertex shader comes to ~32 instructions, which isn’t too bad. The ‘v = v4.xyz’ line is redundant as you are not using ‘v’ again. The line with normalize() is ~16 instructions by itself.
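One possible way to trim the normalize() line, sketched under assumptions that are not in the original post: the application uploads a 3x3 normal matrix and 3-component normals, so the shader normalizes a vec3 rather than a vec4. The names NormalMatrix3 and normal3 are hypothetical. Whether this actually lowers the instruction count would need checking in PVRUniSCoEditor.

```glsl
// Sketch only: NormalMatrix3 and normal3 are hypothetical names, and the
// application is assumed to supply a mat3 normal matrix and vec3 normals.
attribute vec4 vertices;
attribute vec3 normal3;

uniform mat4 ModelViewProjMatrix;
uniform mat4 ModelViewMatrix;
uniform mat3 NormalMatrix3;

varying vec3 normal;
varying vec3 v;

void main(void)
{
    gl_Position = ModelViewProjMatrix * vertices;

    // Kept because the fragment shader in this thread reads v.
    v = (ModelViewMatrix * vertices).xyz;

    // Normalizes three components instead of four.
    normal = normalize(NormalMatrix3 * normal3);
}
```

A side effect of this form is that the w component no longer takes part in the normalization. With the original vec4 attribute, w defaults to 1.0 if the application supplies only three components per normal, which would skew the result.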





Your fragment shader is ~34 instructions, which is very expensive for a full-screen effect. The first thing to note is that ‘highp’ precision is not required, as most colour operations can be done with ‘lowp’ precision far more efficiently while still giving the same result (see the OpenGL ES 2.0 Application Development Recommendations document in our SDK for more information). Secondly, normalize is a fairly expensive operation on some SGX revisions. You may find that using a normalization cube map (as the Water demo does) requires fewer cycles. The ‘pow()’ call is redundant, as anything to the power of 1 is equal to the original value. Depending on what your application is doing, ‘gl_FrontFacing’ may also be redundant, e.g. if you are culling all back-facing polygons (as you should unless you have a reason to render them), then ‘gl_FrontFacing’ will never be false.
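As a rough illustration of the normalization cube map idea (a sketch, not the Water demo's code): it assumes the application has built a cube map whose texels store unit-length directions packed as rgb = dir * 0.5 + 0.5. The sampler name normCubeMap is hypothetical. The pow(x, 1.0) call is dropped, and the gl_FrontFacing test is kept in case back faces are rendered.

```glsl
// Sketch only: normCubeMap is a hypothetical sampler. The application is
// assumed to have filled it with unit vectors encoded as dir * 0.5 + 0.5.
precision mediump float;

uniform samplerCube normCubeMap;

varying vec3 normal;
varying vec3 v;

void main(void)
{
    // Each texture fetch stands in for one normalize() call.
    vec3 n = textureCube(normCubeMap, normal).rgb * 2.0 - 1.0;
    if (!gl_FrontFacing) {
        n = -n;
    }
    vec3 E = textureCube(normCubeMap, -v).rgb * 2.0 - 1.0;
    vec3 R = -reflect(E, n);

    // The literal is already unit length, so it is not normalized again.
    const vec3 ldir0 = vec3(0.823363, 0.212563, 0.526204);
    const lowp vec3 lspec = vec3(0.5, 0.5, 0.5);
    const lowp vec3 ldiff0 = vec3(1.0, 0.986432, 0.909548);

    lowp vec3 Idiff = max(dot(n, ldir0), 0.0) * ldiff0 * vec3(1.0, 0.0, 0.0);
    // pow(x, 1.0) == x, so the pow() call is removed.
    lowp vec3 Ispec = lspec * max(dot(R, ldir0), 0.0);

    gl_FragColor = vec4(Idiff + Ispec, 1.0);
}
```

With 8-bit texels the fetched vectors are only approximately unit length. Whether two fetches beat two normalize() calls depends on the SGX revision and on texture bandwidth, so it is worth measuring both.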





Rather than zooming into the quad, you may find it easier to render your quad with an orthographic projection so that you can fit it exactly to the screen and know that it is screen-aligned. There is an example of screen-aligned quads being drawn in the Water demo, which should help you with this :)
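A minimal sketch of a screen-aligned quad, assuming the application supplies the four corners directly in normalized device coordinates, which is where an orthographic projection fitted to the screen would place them. The attribute name cornerNDC is hypothetical, and this is not the Water demo's code.

```glsl
// Sketch only: cornerNDC is a hypothetical attribute. The application is
// assumed to supply the corners (-1,-1), (1,-1), (-1,1), (1,1), for
// example as a triangle strip.
attribute vec2 cornerNDC;

void main(void)
{
    // No matrix multiply is needed: the positions are already in clip
    // space, with w = 1.0 so the perspective divide leaves them unchanged.
    gl_Position = vec4(cornerNDC, 0.0, 1.0);
}
```

Any varyings that the paired fragment shader reads (such as normal and v in the shader above) would still have to be declared and written in this vertex shader.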





Regards,


Joe

Hi Joe,





I will check out the utility on Windows for debugging my shaders. Thanks for the update on the Mac version; I look forward to testing it out.





I wasn’t actually trying to do a full-screen effect. I was trying to simulate the performance when a user tumbles around my 3D scene and most of the fragments are covered by a Phong shader that is similar to fixed-function shading but with per-pixel lighting. The aim is to get an idea of worst-case fragment shader performance so that I can at least provide a decent frame rate while tumbling.





I tried changing the precision, but it did not have a large effect on performance. I was keeping it as highp because of the pow function. I was typically testing with an exponent of 16.0 for the pow, but reduced it to 1.0 to see if it was really costly. I will play with the precision more to see how much this helps.





I need the front facing check since the models I am dealing with are not solid models and I need to have two sided lighting.





After some further testing I determined that normalize hurts a lot on the iPad. I removed the call that normalizes the light vector (it should have been normalized already, so I put in a pre-normalized version) and the frame rate went up a decent amount.
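The light-vector change might look something like the sketch below, reduced to the diffuse term to keep it short. The literal from the original shader is already unit length to about six decimal places, so it can be declared as a constant and used directly.

```glsl
// Sketch only: diffuse term of the shader above, with the per-fragment
// normalize() of the light direction removed.
precision mediump float;

varying vec3 normal;

// 0.823363^2 + 0.212563^2 + 0.526204^2 is 1.0 to about six decimal
// places, so this literal is already a unit vector.
const vec3 ldir0 = vec3(0.823363, 0.212563, 0.526204);

void main(void)
{
    vec3 n = normalize(normal);
    float ndotl = max(dot(n, ldir0), 0.0);
    gl_FragColor = vec4(ndotl * vec3(1.0, 0.0, 0.0), 1.0);
}
```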





I also removed the normalize calls for the incoming normal and for the vector from the eye to the vertex, just to check, and got a frame rate of 45 fps. I will check out the Water demo and the method you mention there; hopefully it will help.





I read the recommendations document in your SDK. Do you have any documents that give guidelines on instruction counts for the various SGX revisions, or other more specific guidance such as the cost of normalize (I don’t remember seeing that in the document, sorry if I missed it)? I have mainly written shaders for the desktop, so I am trying to get a grasp of what is reasonable on this hardware (specifically the iPad).





Thanks a lot for your help and quick feedback; I really appreciate it.


There are many factors that affect shader performance beyond cycle counts alone (texture reads etc.). The large number of pixels on the iPad makes fragment shader optimisations important, though. If 45 fps is your worst case, then I would concentrate on minimizing how often the worst case occurs and on seeing what you can save in your other shaders from here. Please ask further questions if you have any.

Did you try tweaking the precision per variable, or did you just change the default for the shader? We have gained great performance benefits from setting precisions per variable (watch out for conversion costs), and this is how all the shaders in our SDK are written. A guide to precision use is in the recommendations document. And as Joe says, I’d recommend using PVRUniSCoEditor from the SDK if you can. The instant instruction count feedback it provides on your changes is invaluable for this kind of optimization. The numbers aren’t always identical to what’s happening on a particular device, but they usually give a pretty good idea.
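A sketch of what per-variable precision could look like for the shader in this thread. The choices are assumptions rather than measured results: mediump for direction vectors, lowp for colour values in the range [0, 1], and the exponent of 16.0 mentioned in the follow-up post. The name matDiffuse is hypothetical.

```glsl
// Sketch only: the precision choices below are assumptions to be checked
// in PVRUniSCoEditor and on the device.
precision mediump float;   // default, used for the direction vectors

varying vec3 normal;
varying vec3 v;

void main(void)
{
    vec3 n = normalize(normal);
    if (!gl_FrontFacing) {
        n = -n;
    }
    vec3 E = normalize(-v);
    vec3 R = -reflect(E, n);

    // Already unit length, so no normalize() is applied.
    const vec3 ldir0 = vec3(0.823363, 0.212563, 0.526204);

    // Colour values stay within [0, 1], so lowp is assumed to suffice.
    const lowp vec3 lspec = vec3(0.5, 0.5, 0.5);
    const lowp vec3 ldiff0 = vec3(1.0, 0.986432, 0.909548);
    const lowp vec3 matDiffuse = vec3(1.0, 0.0, 0.0);   // hypothetical name

    lowp float diff = max(dot(n, ldir0), 0.0);
    // The pow() operand is mediump; only the result is stored as lowp.
    lowp float spec = pow(max(dot(R, ldir0), 0.0), 16.0);

    lowp vec3 colour = diff * ldiff0 * matDiffuse + spec * lspec;
    gl_FragColor = vec4(colour, 1.0);
}
```

Each mediump-to-lowp assignment is a potential conversion, which is the cost to watch for. Grouping the lowp work at the end of the shader, as here, is one way to keep the number of conversions small.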

Hi Gordon,





I actually just tried tweaking the default precision. I have requested access to the SDK and am just waiting for a reply. After that I will play with PVRUniSCoEditor; it seems to be what I am looking for.





The examples I am using are really just to get an idea for what I want to do during tumble and to get an idea for which lighting models make sense for this platform.





Thanks a lot for your help