mask shader performance

wogz · March 22, 2011, 4:00pm

Hello,

I’m implementing the renderer for my game in OpenGL ES 2.0, targeting Android devices such as the Galaxy S.

I have a fragment shader that does some tricky texture masking on a screen-aligned quad (orthographic projection).

The problem is that it currently runs at 17 fps. I have tried several things to optimize it:

+ removed all branching code (ifs) and float comparisons (3 fps increase)
+ tried using lowp everywhere (1 fps increase on average)

For instance, if I use the commented-out line in main() as the color output instead, performance goes up to 40-50 fps.

Any ideas why this runs so slowly?

Code:

#ifdef GL_FRAGMENT_PRECISION_HIGH
   // Default precision
   precision highp float;
#else
   precision mediump float;
#endif

uniform sampler2D ScreenSampler;
uniform sampler2D BGSampler;
uniform sampler2D RockSampler;
uniform sampler2D HomeSampler;
uniform sampler2D GrassSampler;
uniform sampler2D GoldenSampler;

uniform vec2 Texel;
uniform vec2 Aspect;
uniform float Wave;

varying vec2 vTextureCoord;

vec4 PixelShaderFn()
{
   vec4   bg =      texture2D(BGSampler, vTextureCoord * Aspect);
   vec4   wave1 =   texture2D(BGSampler, vec2(vTextureCoord.x - Wave * 0.03, vTextureCoord.y - Wave * 0.01));
   vec4   wave2 =   texture2D(BGSampler, vec2(vTextureCoord.x + Wave * 0.02, vTextureCoord.y + Wave * 0.02));
   vec4   rock =    texture2D(RockSampler, vTextureCoord * Aspect);
   vec4   home =    texture2D(HomeSampler, vTextureCoord * Aspect);
   vec4   grass =   texture2D(GrassSampler, vTextureCoord * Aspect);
   vec4   golden = texture2D(GoldenSampler, vTextureCoord * Aspect * 4.0);

   vec4   color =   texture2D(ScreenSampler, vTextureCoord);
   float top =     texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y - Texel.y * 4.0)).r;
   float top2 =    texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y - Texel.y * 9.0)).r;
   float top3 =    texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y - Texel.y * 22.0)).r;
   float bottom = texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y + Texel.y * 2.0)).r;
   float bottom2 = texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y + Texel.y * 4.0)).r;
   float bottom3 = texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y + Texel.y * 7.0)).r;

   float groundX = step(0.5, color.g);
   vec4   ground = grass * (1.0 - groundX) + home * groundX;
   float goldX =   color.b * color.r;
   float lavaX =   color.b - color.r;

   vec4 lava = vec4(1.0, wave1.g + wave2.g, wave1.ba);
   bg     = bg * (1.0 - lavaX) + lava * lavaX;
   top    = (top * 0.05 + top2 * 0.09 + top3 * 0.15) * (1.0 - color.r);
   bottom = (bottom * 0.2 + bottom2 * 0.6 + bottom3) * (1.0 - color.r);

   vec4 base = bg * (1.0 - color.r) + rock * color.r;
   base = base * (1.0 - bottom) + ground * (bottom - top);
   base = base * (1.0 - goldX) + golden * goldX;

   return base;
}

void main(void)
{
    //gl_FragColor = texture2D( ScreenSampler, vTextureCoord );
    gl_FragColor = PixelShaderFn();
}

JoeDavis · March 22, 2011, 4:36pm

Hi,

The main reason your performance is suffering is the large number of dependent texture reads you are performing. Every time you pass a texture coordinate that was calculated in the fragment shader to texture2D(), the thread processing that fragment has to stall until the texture data is retrieved. POWERVR hardware can hide the latency of a few dependent texture reads by scheduling other threads while the data is fetched, but when many dependent reads are performed, the hardware reaches a point where all queued threads are waiting for texture data.

You should perform independent texture reads instead of dependent reads wherever possible. A read is independent when the texture coordinate is already known before the fragment is processed (e.g. the coordinate is an unmodified varying or a constant value), which lets the hardware pre-fetch the texture data. For example, if you calculate vTextureCoord * Aspect (and vTextureCoord * Aspect * 4.0) in the vertex shader and pass the results to the fragment shader as varyings, 5 of your texture2D() calls will become independent reads and will be faster.
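Moving vTextureCoord * Aspect into the vertex shader gives the same coordinates because varyings are interpolated linearly: for a screen-aligned quad under an orthographic projection, transforming each vertex and then interpolating equals interpolating and then transforming, as long as the transform is a scale plus a constant offset. The same reasoning covers the Texel.y * k offsets of the six ScreenSampler taps and the Wave offsets of the two lava taps, since Texel and Wave are uniforms. The Python model below checks this numerically along one edge of the quad; the 800x480 screen size, the Aspect value and the Wave value are assumptions chosen only for illustration.

```python
def lerp(a, b, t):
    """Linear interpolation: how a varying is interpolated across a
    screen-aligned quad under an orthographic projection."""
    return a + (b - a) * t


def affine(uv, scale, offset):
    """Per-fragment math from the original shader: uv * scale + offset."""
    return (uv[0] * scale[0] + offset[0], uv[1] * scale[1] + offset[1])


def max_vs_fs_difference(v0, v1, scale, offset, steps=1000):
    """Largest gap between 'transform per vertex, then interpolate' (vertex
    shader path) and 'interpolate, then transform per fragment' (fragment
    shader path) along the edge from v0 to v1."""
    a = affine(v0, scale, offset)
    b = affine(v1, scale, offset)
    worst = 0.0
    for i in range(steps + 1):
        t = i / steps
        vs_path = (lerp(a[0], b[0], t), lerp(a[1], b[1], t))
        uv = (lerp(v0[0], v1[0], t), lerp(v0[1], v1[1], t))
        fs_path = affine(uv, scale, offset)
        worst = max(worst,
                    abs(vs_path[0] - fs_path[0]),
                    abs(vs_path[1] - fs_path[1]))
    return worst


# Hypothetical 800x480 screen: Aspect = (800 / 480, 1.0), Texel.y = 1 / 480.
aspect = (800 / 480, 1.0)
top_offset = (0.0, -4.0 / 480)               # the "top" tap: -Texel.y * 4.0
wave = 0.5                                   # hypothetical Wave uniform value
wave1_offset = (-wave * 0.03, -wave * 0.01)  # the wave1 lava tap

print(max_vs_fs_difference((0.0, 0.0), (1.0, 1.0), aspect, (0.0, 0.0)))
print(max_vs_fs_difference((0.0, 0.0), (1.0, 1.0), (1.0, 1.0), top_offset))
print(max_vs_fs_difference((0.0, 0.0), (1.0, 1.0), (1.0, 1.0), wave1_offset))
```

The printed differences are at floating-point rounding level. Moving all eight offset coordinates into varyings as well would need 11 vec2 varyings in total, so check the device's GL_MAX_VARYING_VECTORS (OpenGL ES 2.0 guarantees at least 8 four-component vectors) before relying on that.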

Ideally, you’ll want to reduce the number of texture samples here as much as possible. If you can, pack your textures so that instead of retrieving useful data only in the .r channel, you retrieve it in all of the .rgba channels.

If you can post a picture of your rendered effect here, or send it to devtech@imgtec.com, I can advise further. If I can understand the end effect you are going for, I think I’ll be able to suggest further optimisations.
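One way to apply the packing idea to the six ScreenSampler taps, which each read only .r, is to store the mask's red channel at several row offsets in the channels of one RGBA texel, so a single fetch returns four taps. The Python sketch below builds such a packed image on the CPU. It assumes the mask has the same resolution as the screen with GL_NEAREST filtering (so a Texel.y * k offset is exactly k rows), that row 0 of the array is the first row uploaded with glTexImage2D (t = 0), and that the mask changes rarely enough to repack on the CPU; otherwise the same packing would have to be done in a render-to-texture pass. The function and constant names are hypothetical.

```python
def pack_row_offsets(mask_r, offsets):
    """Build a packed image whose texel (x, y) holds mask_r sampled at rows
    y + d for each d in offsets (up to four, one per RGBA channel).
    Out-of-range rows are clamped, mimicking GL_CLAMP_TO_EDGE."""
    if not 1 <= len(offsets) <= 4:
        raise ValueError("an RGBA texel holds at most four taps")
    height = len(mask_r)
    width = len(mask_r[0])
    packed = []
    for y in range(height):
        rows = [min(max(y + d, 0), height - 1) for d in offsets]
        packed.append([tuple(mask_r[r][x] for r in rows) for x in range(width)])
    return packed


# The six ScreenSampler taps from the shader, as row offsets relative to the
# current pixel: top, top2, top3, bottom in one texture; bottom2, bottom3 in another.
FIRST_TEXTURE = (-4, -9, -22, 2)
SECOND_TEXTURE = (4, 7)
```

With the first packed texture, one texture2D() call on the unmodified vTextureCoord would return top, top2, top3 and bottom in .r, .g, .b and .a, and that read is independent as well.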

wogz · March 22, 2011, 5:03pm

Thanks for the tips and fast reply. Pics below.

Some more details…
The top and bottom variables give the scene a sense of depth according to what lies below and above the current pixel.

The texColor * (1.0 - variable.channel) and texColor * variable.channel terms do the masking: each mask channel selects the correct texture, which avoids ifs and color comparisons.

Any more tips would be appreciated; I will try them out and let you know =P
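To make the depth terms concrete, here is a Python mirror of the two shader lines that compute top and bottom (a CPU-side model, not GPU code; the function and constant names are hypothetical). It shows that top peaks at 0.29, but bottom can reach 1.8 when all three bottom taps read 1.0, so (1.0 - bottom) in the final blend can go negative. That matters if these terms are ever stored in an 8-bit texture, which only holds values in [0, 1].

```python
TOP_WEIGHTS = (0.05, 0.09, 0.15)   # taps at y - Texel.y * 4, 9 and 22
BOTTOM_WEIGHTS = (0.2, 0.6, 1.0)   # taps at y + Texel.y * 2, 4 and 7


def depth_terms(color_r, top_taps, bottom_taps):
    """Mirror of the shader's top/bottom lines: weighted sums of the mask's
    red channel at the offset taps, zeroed where the pixel itself is
    masked as rock (color.r = 1)."""
    open_space = 1.0 - color_r
    top = sum(w * s for w, s in zip(TOP_WEIGHTS, top_taps)) * open_space
    bottom = sum(w * s for w, s in zip(BOTTOM_WEIGHTS, bottom_taps)) * open_space
    return top, bottom


print(depth_terms(0.0, (1.0, 1.0, 1.0), (1.0, 1.0, 1.0)))  # maximum values
print(depth_terms(1.0, (1.0, 1.0, 1.0), (1.0, 1.0, 1.0)))  # rock pixel
```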
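This masking is the same linear blend that the GLSL built-in mix(x, y, a) computes (x * (1 - a) + y * a), so groundX = step(0.5, color.g) followed by grass * (1.0 - groundX) + home * groundX could also be written as mix(grass, home, step(0.5, color.g)). The Python model below mirrors the two built-ins to show that with a 0/1 weight the blend picks one texture exactly. The helper names are hypothetical, and the Python conditional only models step()'s definition; it says nothing about the instructions a particular GPU compiler emits.

```python
def step(edge, x):
    """Model of GLSL step(): 0.0 if x < edge, else 1.0."""
    return 0.0 if x < edge else 1.0


def mix(a, b, t):
    """Model of GLSL mix(): a * (1 - t) + b * t, component-wise."""
    return tuple(ai * (1.0 - t) + bi * t for ai, bi in zip(a, b))


def select_ground(grass, home, mask_g):
    """Branch-free version of: mask_g >= 0.5 ? home : grass."""
    return mix(grass, home, step(0.5, mask_g))


grass = (0.1, 0.8, 0.1, 1.0)
home = (0.5, 0.3, 0.2, 1.0)
print(select_ground(grass, home, 0.7))  # picks home
print(select_ground(grass, home, 0.2))  # picks grass
```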

[Image: Mask (ScreenSampler)]

[Image: Final render]

wogz · March 24, 2011, 11:33am

Once more I have run into a weird problem.

After changing the VS and FS to the ones below, shader compilation fails when I try to add the depth color component (bt); there are no other error messages. I tried several things but wasn’t able to get around the problem.

However, if I output bt on its own as the final color, it looks correct. Check the pictures below.

The good news is that performance increased by 10 fps.

Any ideas?

VS

Code:

uniform mat4 uMVPMatrix;
attribute vec4 aPosition;
attribute vec2 aTextureCoord;
varying vec2 vTextureCoord;
varying vec2 vTextureCoordAspect;
varying vec2 vTextureCoordAspect4;

uniform vec2 Aspect;

void main() {
    gl_Position = uMVPMatrix * aPosition;
    vTextureCoord = aTextureCoord;
    vTextureCoordAspect = vTextureCoord * Aspect;
    vTextureCoordAspect4 = vTextureCoordAspect * 4.0;
}

FS

Code:

#ifdef GL_FRAGMENT_PRECISION_HIGH
   // Default precision
   precision highp float;
#else
   precision mediump float;
#endif

uniform sampler2D ScreenSampler;
uniform sampler2D BGSampler;
uniform sampler2D RockSampler;
uniform sampler2D HomeSampler;
uniform sampler2D GrassSampler;
uniform sampler2D GoldenSampler;

uniform vec2 Texel;
uniform vec2 Aspect;
uniform float Wave;

varying vec2 vTextureCoord;
varying vec2 vTextureCoordAspect;
varying vec2 vTextureCoordAspect4;

vec4 PixelShaderFn()
{
   vec4   bg =      texture2D(BGSampler, vTextureCoordAspect );
   vec4   wave1 =   texture2D(BGSampler, vec2(vTextureCoord.x - Wave * 0.03, vTextureCoord.y - Wave * 0.01));
   vec4   wave2 =   texture2D(BGSampler, vec2(vTextureCoord.x + Wave * 0.02, vTextureCoord.y + Wave * 0.02));
   vec4   rock =    texture2D(RockSampler, vTextureCoordAspect );
   vec4   home =    texture2D(HomeSampler, vTextureCoordAspect );
   vec4   grass =   texture2D(GrassSampler, vTextureCoordAspect );
   vec4   golden = texture2D(GoldenSampler, vTextureCoordAspect4 );

   vec4   color =   texture2D(ScreenSampler, vTextureCoord);
   float top =     texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y - Texel.y * 4.0)).r;
   float top2 =    texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y - Texel.y * 9.0)).r;
   float top3 =    texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y - Texel.y * 22.0)).r;
   float bottom = texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y + Texel.y * 2.0)).r;
   float bottom2 = texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y + Texel.y * 4.0)).r;
   float bottom3 = texture2D(ScreenSampler, vec2(vTextureCoord.x, vTextureCoord.y + Texel.y * 7.0)).r;

   float groundX = step(0.5, color.g);
   vec4   ground = grass * (1.0 - groundX) + home * groundX;
   float goldX =   color.b * color.r;
   float lavaX =   color.b - color.r;

   vec4 lava = vec4(1.0, wave1.g + wave2.g, wave1.ba);
   bg     = bg * (1.0 - lavaX) + lava * lavaX;
   top    = (top * 0.05 + top2 * 0.09 + top3 * 0.15) * (1.0 - color.r);
   bottom = (bottom * 0.2 + bottom2 * 0.6 + bottom3) * (1.0 - color.r);

   vec4 bt = ground * (bottom - top);

   vec4 base = bg * (1.0 - color.r) + rock * color.r;
   base *= (1.0 - bottom);
   //base += bt;
   base = base * (1.0 - goldX) + golden * goldX;

   return base;
   //return bt;
}

void main(void)
{
    //gl_FragColor = texture2D( ScreenSampler, vTextureCoord );
    gl_FragColor = PixelShaderFn();
}
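Because outputting bt alone renders correctly, a quick sanity check is whether restoring base += bt; changes the math relative to the first shader, which compiled. The Python model below (helper names hypothetical) compares the first shader's single expression base * (1.0 - bottom) + ground * (bottom - top) with the split form, sampling top and bottom over the ranges the shader can produce.

```python
import random


def original_blend(base, ground, top, bottom):
    """First shader: base = base * (1.0 - bottom) + ground * (bottom - top)."""
    return [b * (1.0 - bottom) + g * (bottom - top) for b, g in zip(base, ground)]


def split_blend(base, ground, top, bottom):
    """Second shader with the commented-out line restored:
    bt = ground * (bottom - top); base *= (1.0 - bottom); base += bt."""
    bt = [g * (bottom - top) for g in ground]
    scaled = [b * (1.0 - bottom) for b in base]
    return [s + t for s, t in zip(scaled, bt)]


def max_difference(trials=10_000, seed=1):
    """Largest component-wise gap between the two forms over random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        base = [rng.random() for _ in range(4)]
        ground = [rng.random() for _ in range(4)]
        top = rng.uniform(0.0, 0.29)    # range of the shader's top term
        bottom = rng.uniform(0.0, 1.8)  # range of the shader's bottom term
        a = original_blend(base, ground, top, bottom)
        b = split_blend(base, ground, top, bottom)
        worst = max(worst, max(abs(x - y) for x, y in zip(a, b)))
    return worst


print(max_difference())
```

The two forms agree exactly because they perform the same multiplications and additions in the same order, which points away from the expression itself. The thread does not establish the actual cause of the failure; calling glGetShaderInfoLog() after glCompileShader() is the standard way to ask the driver for details.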

[Image: base without bt]

[Image: bt]
