Question on shader internal execution

I’m working on iOS, so series 5 and 6 GPUs. I’m wondering about a possible optimisation opportunity for an image processing system I’ve written.

At the moment, the rendering happens in 2 passes, using 2 separate shaders, all in lowp or mediump. The passes are independent of each other, so in principle they could be done in parallel.

However, the recent disclosures about the ALU layouts got me thinking. There’s a mix of 16 and 32 bit ALUs. If I combined my 2 render passes into 1 shader, I could handle 1 pass in lowp or mediump, and the other in highp. Since they’re completely separable, they could run in parallel.
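
To make the idea concrete, here is a rough sketch of what a combined shader might look like. This is not my actual code: the uniform names, the placeholder maths, and the way the two results are packed into one output are all made up for illustration.

```glsl
// Hypothetical combined fragment shader (GLSL ES 1.00).
// All names and the maths are placeholders, not the real passes.
precision mediump float;            // default precision, used by pass A

uniform sampler2D u_imageA;         // input for pass A (assumed name)
uniform sampler2D u_imageB;         // input for pass B (assumed name)
varying highp vec2 v_texCoord;

void main() {
    // Pass A: stays in mediump, as in the current shaders.
    mediump vec3 a = texture2D(u_imageA, v_texCoord).rgb;
    mediump vec3 resultA = a * a;   // placeholder maths

    // Pass B: independent of pass A, deliberately promoted to highp.
    highp vec3 b = texture2D(u_imageB, v_texCoord).rgb;
    highp float resultB = dot(b, vec3(0.299, 0.587, 0.114)); // placeholder maths

    // Packing both results into one output is an assumption made here
    // only to keep the sketch short.
    gl_FragColor = vec4(resultA, resultB);
}
```

The point is only that the two blocks share no intermediate values, so nothing in the source forces one to wait on the other.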

So, the question is, how well would that actually work, from the shader side at least? I think it mainly comes down to this question: is it likely the compiler is adapting my current shader so it runs on both 16/32bit ALUs? If it’s already filling the pipeline well, it’s going to be a net loss I’m sure, but if the 32bit ALUs are currently sitting empty it could be a big win.

Any hints on how that might play out would be welcome :)

Hi Chris,

In Series6 and Series6XT, the pipelines can only take one precision path at a given point in time, i.e. it’s not possible to process F32 & F16 work during the same cycle. The reason for this is that there is shared hardware logic between the different precision paths.



Thanks Joe, that answers it. Makes perfect sense.