Help with speeding up a pixel shader please...

This is a shader I use to add extra detail to a low-resolution texture. You can think of it as colour keying. Running this shader costs me about 6 fps, and I think it’s because of the way I use dynamic branching. Any ideas on speeding it up?

I did a test where the ifs were always true and it ran quicker, so I don’t think it’s the extra texture sampling…

My best bet is to use the lessThan function, cast the resulting bvec to a vec3 and do a mix.

For example:

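vec3 diff = texel.xyz - detail_mapping[0].xyz;
// lessThan() compares vectors component-wise and returns a bvec3, so the scalar is widened to a vec3 first
vec3 replace = vec3(lessThan(vec3(dot(diff,diff)), detail_mapping[0].www));
texel.xyz = mix(texel.xyz, texture2D(tex_sampler1,v_texcoord_detail).xyz, replace);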

But that seems like a lot of work for a simple “replace this colour with another colour if colour == A_value”. Here is the current shader:

precision lowp float;
uniform bvec4 flags;
uniform sampler2D tex_sampler0;
uniform vec4 fog_colour;
uniform vec4 detail_mapping[2];
uniform sampler2D tex_sampler1;
uniform sampler2D tex_sampler2;
varying mediump vec2 v_texcoord;
varying mediump vec2 v_texcoord_detail;
varying vec4 v_colour;
varying float v_fog_factor;

void main()
{
    vec4 texel = texture2D(tex_sampler0,v_texcoord);
    vec3 diff = texel.xyz - detail_mapping[0].xyz;

    if( dot(diff,diff) < detail_mapping[0].w )
    {
        texel = texture2D(tex_sampler1,v_texcoord_detail);
    }
    diff = texel.xyz - detail_mapping[1].xyz;
    if( dot(diff,diff) < detail_mapping[1].w )
    {
        texel = texture2D(tex_sampler2,v_texcoord_detail);
    }
    gl_FragColor = texel * v_colour;

    if( flags[2] )
    {
        gl_FragColor = mix(fog_colour,gl_FragColor,v_fog_factor);
    }
}

I’ve gained an extra 4 to 5 fps using step to do the test and mix to do the replacement.

vec3 diff = texel.xyz - detail_mapping[0].xyz;
float replace = step(dot(diff,diff),detail_mapping[0].w);
texel = mix(texel,texture2D(tex_sampler1,v_texcoord_detail),replace);
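
For anyone wanting to try the same thing, here is a rough sketch of how the whole shader might look with both tests converted to step and mix. It keeps all the uniforms and varyings from the original; the names detail1 and detail2 are just local variables made up for this sketch. Two assumptions to be aware of: both detail textures are now sampled for every fragment, and step returns 1.0 when the two values are equal, so a pixel sitting exactly on the threshold gets replaced where the original `<` test would have left it alone. Whether this is actually faster will depend on the GPU and driver.

```glsl
precision lowp float;
uniform bvec4 flags;
uniform sampler2D tex_sampler0;
uniform vec4 fog_colour;
uniform vec4 detail_mapping[2];
uniform sampler2D tex_sampler1;
uniform sampler2D tex_sampler2;
varying mediump vec2 v_texcoord;
varying mediump vec2 v_texcoord_detail;
varying vec4 v_colour;
varying float v_fog_factor;

void main()
{
    vec4 texel = texture2D(tex_sampler0, v_texcoord);

    // Hypothetical locals: both detail textures are sampled unconditionally,
    // the key test only selects which value is kept.
    vec4 detail1 = texture2D(tex_sampler1, v_texcoord_detail);
    vec4 detail2 = texture2D(tex_sampler2, v_texcoord_detail);

    // step(edge, x) is 1.0 when x >= edge, so replace is 1.0
    // when the squared colour distance is <= the threshold in .w
    vec3 diff = texel.xyz - detail_mapping[0].xyz;
    float replace = step(dot(diff, diff), detail_mapping[0].w);
    texel = mix(texel, detail1, replace);

    // As in the original, the second test runs on the possibly replaced texel
    diff = texel.xyz - detail_mapping[1].xyz;
    replace = step(dot(diff, diff), detail_mapping[1].w);
    texel = mix(texel, detail2, replace);

    gl_FragColor = texel * v_colour;

    // flags[2] is a uniform, so this branch goes the same way for every fragment
    if (flags[2])
    {
        gl_FragColor = mix(fog_colour, gl_FragColor, v_fog_factor);
    }
}
```

The fog test is left as an if here because it depends only on a uniform, not on per-pixel data.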

I would have thought the compiler would have done the same thing by itself?