Using textures with premultiplied alpha efficiently

th_in_gs · November 4, 2010, 12:21pm

I’m attempting to switch to using textures with premultiplied alpha in my fragment shader (because the source of the texture images uses premultiplied alpha, and it seems wasteful to un-premultiply them). My intuition tells me that this should be slightly /more/ efficient than using regular non-premultiplied textures, but I’m seeing lower performance, presumably because my calculation is not as efficient as the built-in mix() function.

My testing’s being done on an Apple iPad.

This is the (straightforward) code I’m using for non-premultiplied textures:

    lowp vec4 contentsColor = texture2D(sContentsTexture, vContentsCoordinate);
    lowp vec4 zoomedColor = texture2D(sZoomedContentsTexture, vZoomedContentsCoordinate);
    contentsColor = mix(contentsColor, zoomedColor, zoomedColor);

Using the premultiplied sZoomedContentsTexture, I switch the mix() line to:

    contentsColor = (contentsColor * (1.0 - zoomedColor)) + zoomedColor;

And see my frame rate drop.

Is there anything I can do to make this more efficient (beyond giving up on the premultiplied textures, of course)?


martin-kraus · November 4, 2010, 1:27pm

I assume the lines should be

    contentsColor = mix(contentsColor, zoomedColor, zoomedColor.a);

and

    contentsColor = (contentsColor * (1.0 - zoomedColor.a)) + zoomedColor;

How does the following perform?

    float transparency = 1.0 - zoomedColor.a;
    contentsColor = contentsColor * transparency + zoomedColor;
In the good old days of assembler-like shading languages there was an instruction for linear interpolation (corresponding to mix()) and also one that combines a multiplication and an addition (often called MAD). Maybe the compiler is clever enough to use it when the expressions are simple enough.

Also, if you don’t need the alpha channel of the result, it might be worth avoiding the computation for the alpha channel.

Well, I’m just guessing…

th_in_gs · November 4, 2010, 1:55pm

Ah, yes, you’re correct about the .a's - not sure how those went missing when I was editing the post…

Unfortunately, pulling out the subtraction doesn’t seem to help much. Maybe a /little/, but I suspect it’s just rounding:

Speeds are:

With mix(): 30fps

My premultiply-aware code: 17fps

With Martin’s extracted transparency subtraction: 18fps

martin-kraus · November 4, 2010, 2:16pm

Hmmm, is it possible to move the computation of transparency further up in the fragment shader? In general, it is a good idea to move instructions that depend on each other as far apart as possible. Since transparency is used in the computation of contentsColor, the unit might have to wait for the result of transparency before it can compute contentsColor. Of course, the computation of transparency itself depends on a texture lookup, so it should also be as far away from that lookup as possible. However, many compilers are pretty good at reordering instructions, so changing the order might not make any difference. In any case, it shouldn’t hurt to give the compiler a hint.

th_in_gs · November 4, 2010, 4:25pm

I tried moving it as far away as possible (which admittedly is not very far away - this shader doesn’t do much beyond blending textures), but it made no difference. Isn’t that to be expected though? Presumably a similar subtraction has to occur in the mix() function, and it’s plenty fast.

Is mix() a hardware routine? I’m quite confused as to why it’s faster when I’m /trying/ to do a logically simpler operation (hence my thinking there must be a better way to do what I’m trying to do).

martin-kraus · November 4, 2010, 9:53pm

It depends on the hardware, but it probably is. Have a look at the GL_ARB_fragment_program extension (http://www.opengl.org/registry/specs/ARB/fragment_program.txt); one of the commands is LRP:

    3.11.5.14  LRP: Linear Interpolation

    The LRP instruction performs a component-wise linear interpolation 
    between the second and third operands using the first operand as the
    blend factor.
    
      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
      result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
      result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
      result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;

Of course, it's up to the hardware vendor how they implement the LRP instruction, but it's likely to be rather efficient. You could also try PowerVR's shader compiler tools (I forgot the name), which should give you at least a rough idea of the number of actual instructions.

Martin Kraus, 2010-11-04 22:55:55
