Surprisingly Slow Bloom

primitive · May 7, 2012, 10:48pm

Greetings, PowerVRians,

I’m trying to optimize a bloom pass on iPad 2, and I’m running into performance problems with my shader. In the final stage, I switch from rendering to a texture target to the real backbuffer, then draw a fullscreen quad using the texture of the original scene and a lower-resolution blurred bloom buffer.

The final pass is taking about 7ms, based on a GPU utilization of 21% according to Apple’s Instruments. Is that as fast as I can expect it to go without simplifying the model, or does that performance indicate I’m doing something wrong? Switching the shader out for a very simple add of the 2 textures results in a draw time of only 4ms, so I guess it could just be that my bloom combine is too complex. Any thoughts on speeding this up?

Thanks in advance

    precision lowp float;
    precision lowp sampler2D;

    varying mediump vec2 v_TexCoord;
    varying lowp vec4 v_Color;

    vec3 AdjustSaturation(vec3 color, float saturation)
    {
        // The constants 0.3, 0.59, and 0.11 are chosen because the
        // human eye is more sensitive to green light, and less to blue.
        float grey = dot(color, vec3(0.3, 0.59, 0.11));
        vec3 result = mix(vec3(grey,grey,grey), color, saturation);
        return result;
    }

    uniform sampler2D TextureSampler;
    uniform sampler2D BaseSampler;
    uniform float BloomIntensity;
    uniform float BaseIntensity;
    uniform float BloomSaturation;
    uniform float BaseSaturation;

    void main(void)
    {
        // Look up the bloom and original base image colors.
        vec3 bloom = texture2D(TextureSampler, v_TexCoord).rgb;
        vec3 base = texture2D(BaseSampler, v_TexCoord).rgb;

        // Adjust color saturation and intensity.
        bloom = AdjustSaturation(bloom, BloomSaturation) * BloomIntensity;
        base = AdjustSaturation(base, BaseSaturation);

        // Darken down the base image in areas where there is a lot of bloom,
        // to prevent things looking excessively burned-out.
        base *= (1.0 - clamp(bloom, 0.0, 1.0)) * BaseIntensity;

        // Combine the two images.
        vec4 vout;
        vout.rgb = base + bloom;
        vout.a = 1.0;
        gl_FragColor = vout;
    }

marco · May 8, 2012, 2:47pm

Hi Primitive,

Does the saturation adjustment have a great impact on the final image?
Could you adjust the saturation of just the final fragment colour instead of the individual input colours from the textures?

Looking at the cycle counts, the reported 7ms does make sense.
You could speed it up by removing the adjustment completely; this more than halves the number of instructions that are generated (according to PVRUniscoEditor).
And as this is a fullscreen shader, every instruction you can avoid has an impact on your final performance.
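
One possible reading of the "adjust just the final colour" idea, sketched as a CPU-side reference in C so the math can be checked against known values (the shader would do the same arithmetic per pixel). This is only a sketch: `combine_single_saturation` and its single `final_saturation` parameter are assumptions, not something from the thread, and the result will not look identical to the original shader, which saturates bloom and base separately. What it does show is the grey dot product and the mix being evaluated once instead of twice.

```c
typedef struct { float r, g, b; } Vec3;

/* Same as GLSL clamp(x, 0.0, 1.0) for one channel. */
static float clamp01(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* Same as GLSL mix(a, b, t) for one channel. */
static float mixf(float a, float b, float t) { return a * (1.0f - t) + b * t; }

/* Mirrors AdjustSaturation from the shader in the first post. */
static Vec3 adjust_saturation(Vec3 c, float saturation)
{
    float grey = 0.3f * c.r + 0.59f * c.g + 0.11f * c.b;
    Vec3 out = { mixf(grey, c.r, saturation),
                 mixf(grey, c.g, saturation),
                 mixf(grey, c.b, saturation) };
    return out;
}

/*
 * Hypothetical variant: scale the bloom, darken the base where there is
 * bloom, add the two, then adjust saturation once on the combined colour.
 * final_saturation is an assumed parameter standing in for the separate
 * BloomSaturation and BaseSaturation uniforms.
 */
Vec3 combine_single_saturation(Vec3 base, Vec3 bloom,
                               float bloom_intensity, float base_intensity,
                               float final_saturation)
{
    Vec3 sum;

    bloom.r *= bloom_intensity;
    bloom.g *= bloom_intensity;
    bloom.b *= bloom_intensity;

    sum.r = base.r * (1.0f - clamp01(bloom.r)) * base_intensity + bloom.r;
    sum.g = base.g * (1.0f - clamp01(bloom.g)) * base_intensity + bloom.g;
    sum.b = base.b * (1.0f - clamp01(bloom.b)) * base_intensity + bloom.b;

    return adjust_saturation(sum, final_saturation);
}
```

Whether the single adjustment is visually acceptable would have to be judged on the actual scene; the instruction count of the real shader would still need checking in PVRUniscoEditor.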

Let me know if you need more help!

Regards,
Marco

primitive · May 8, 2012, 5:05pm

Thanks for the answer, Marco. It does have a big impact, yeah. I guess I’ll just make 9 versions of the shader to handle the cases where each of the 2 saturation parameters is 0 / 1 / SomethingElse (those cases come up a lot), which should shave another 1 or 2 ms off, and call it done.

From playing with the compiler, it looks like doing static branching on uniforms is cheaper, but not totally free, so writing 9 shaders by hand or doing my own shader preprocessor would be faster at run time. Is that understanding correct?
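
A minimal sketch of the "own shader preprocessor" route, assuming the 9 variants are produced on the host by prepending `#define` lines to a single shader body and letting the GLSL preprocessor strip the unused paths. The names `SatMode`, `BLOOM_SAT_MODE`, `BASE_SAT_MODE`, `build_variant` and `kBloomSaturationSnippet` are all hypothetical. The snippet string shows how the shader body might consume the macro: at saturation 0 the mix collapses to the grey value, at saturation 1 the colour is left untouched, and only the general case calls `AdjustSaturation`.

```c
#include <stdio.h>

/* Hypothetical modes for one saturation parameter: 0, 1, or anything else. */
enum SatMode { SAT_ZERO = 0, SAT_ONE = 1, SAT_OTHER = 2 };

/*
 * Illustrative fragment of the shader body (not a complete shader).
 * Mode 1 emits no code at all, so bloom is used as sampled.
 */
const char *const kBloomSaturationSnippet =
    "#if BLOOM_SAT_MODE == 0\n"
    "    bloom = vec3(dot(bloom, vec3(0.3, 0.59, 0.11)));\n"
    "#elif BLOOM_SAT_MODE == 2\n"
    "    bloom = AdjustSaturation(bloom, BloomSaturation);\n"
    "#endif\n";

/*
 * Writes the two #define lines followed by the shader body into out.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if out_size is too small to hold the whole variant.
 */
int build_variant(char *out, size_t out_size, const char *body,
                  enum SatMode bloom_mode, enum SatMode base_mode)
{
    int n = snprintf(out, out_size,
                     "#define BLOOM_SAT_MODE %d\n"
                     "#define BASE_SAT_MODE %d\n"
                     "%s",
                     (int)bloom_mode, (int)base_mode, body);
    if (n < 0 || (size_t)n >= out_size) {
        return -1;
    }
    return n;
}
```

Looping both modes over the three values yields the 9 sources, each compiled and linked once at load time. Since `glShaderSource` accepts an array of strings, the `#define` header and the body could also be passed as two separate strings instead of being concatenated.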
