iPad poor performance on texture lookup?

Hi, I’m working on apps for the iPad and have run into something weird. If I fetch from a texture only once in my fragment shader, I get 60 fps. But if I do two texture lookups, the frame rate drops to 27 fps.


Is it possible? Thanks in advance for any suggestions.


Here is my fragment shader code. My texture size is 1024x1024.

//////////////////////////////////////////////
precision mediump float;

uniform sampler2D in_texture;

varying vec4 out_tex_coord;

void main()
{
    vec4 col1 = texture2D(in_texture, out_tex_coord.xy);
    vec4 col2 = texture2D(in_texture, out_tex_coord.zw);

    gl_FragColor = (col1+col2)/2.0;
}


If your application is bandwidth limited, then any increase in bandwidth use will directly affect your frame rate. Because iOS devices are v-synced, a frame that misses the 60fps deadline has to wait for the next display refresh, so the average fps tends to fall straight towards 30 rather than degrading gradually. Getting such a drop from two texture reads seems a little extreme, however.

What texture dimensions and format are you using and what sizes of triangles are these being applied to? What filter/mipmapping modes are set for these? What other memory intensive stuff is your application doing?

Remember, on this system and most other mobile devices, there is a unified memory architecture so bandwidth use by one system can affect others.


A better answer has been pointed out to me (thanks, I missed that):

The answer is of course that the second texture lookup is a dependent lookup, because its coordinate comes from a swizzle on out_tex_coord (out_tex_coord.zw). The shader should be changed to pass the two coordinates as separate vec2 varyings:

Code:
precision mediump float;

uniform sampler2D in_texture;

varying vec2 out_tex_coord;
varying vec2 out_tex_coord2;

void main()
{
    vec4 col1 = texture2D(in_texture, out_tex_coord);
    vec4 col2 = texture2D(in_texture, out_tex_coord2);

    gl_FragColor = (col1+col2)/2.0;
}

Dependent texture reads are bad for performance. The major problem with them is that the texture fetch can only be initiated once the shader has calculated the texture coordinate to be read from and so the usual optimisations to mask the latency of a fetch can't be carried out.
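
A swizzle is only one way to end up with a dependent read. As a rough illustration, any coordinate that is computed inside the fragment shader behaves the same way on this kind of hardware. The sketch below is not from the original post: the uniform u_offset is a hypothetical name used only to show the pattern.

Code:
precision mediump float;

uniform sampler2D in_texture;
uniform vec2 u_offset;          // hypothetical uniform, for illustration only

varying vec2 out_tex_coord;

void main()
{
    // Direct read: the coordinate is an unmodified varying,
    // so the fetch can be started before the shader runs.
    vec4 col1 = texture2D(in_texture, out_tex_coord);

    // Dependent read: the coordinate is calculated here in the fragment
    // shader, so the fetch has to wait for that calculation.
    vec4 col2 = texture2D(in_texture, out_tex_coord + u_offset);

    gl_FragColor = (col1 + col2) * 0.5;
}

Moving that addition into the vertex shader and passing the result in its own vec2 varying turns the second lookup back into a direct read.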

Actually, while the corrected shader with two vec2 varyings does fix the dependent read, it’s still not very efficient: the compiler in our shader editor, PVRUniSCoEditor, reports 11 cycles for it. Using precision modifiers can make a big difference. This shader does the same thing and comes in at 4 cycles:

Code:

uniform sampler2D in_texture;

varying mediump vec2 out_tex_coord;
varying mediump vec2 out_tex_coord2;

void main()
{

    lowp vec4 col1 = texture2D(in_texture, out_tex_coord);
    lowp vec4 col2 = texture2D(in_texture, out_tex_coord2);
    gl_FragColor = (col1+col2)*0.5;
}
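
Both of the two-varying fragment shaders in this thread need a vertex shader that writes out_tex_coord and out_tex_coord2. The original vertex shader was not posted, so the following is only a minimal sketch under assumptions: the names in_position, in_tex_coord, u_mvp and u_offset are hypothetical, and the second coordinate is assumed to be the first one plus a constant offset.

Code:
attribute vec4 in_position;     // hypothetical attribute name
attribute vec2 in_tex_coord;    // hypothetical attribute name

uniform mat4 u_mvp;             // hypothetical model-view-projection matrix
uniform vec2 u_offset;          // hypothetical offset for the second lookup

varying mediump vec2 out_tex_coord;
varying mediump vec2 out_tex_coord2;

void main()
{
    gl_Position = u_mvp * in_position;

    // Calculate both coordinates per vertex so that the fragment
    // shader can use them unmodified.
    out_tex_coord  = in_tex_coord;
    out_tex_coord2 = in_tex_coord + u_offset;
}

The coordinate maths now runs once per vertex instead of once per pixel, and each texture2D call in the fragment shader receives a varying it can use as-is.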