textureGather performance

It appears that textureGather is natively supported on PowerVR GPUs, and that each textureGather invocation is translated into a single smp2d.fcnorm.replace.data instruction. However, for every one of these instructions the compiler also emits many additional branch and ALU instructions.

Does anyone know what these instructions are doing? I’ve put together this gist comparing the results of using textureGather vs textureFetch:

Hi castano,

Thanks for your message.

I will need to look into it and possibly ask some internal teams to help shed light on why the instruction count differs when using textureGather. I will come back with any findings.

Best regards,

Thanks Alejandro! Most of my clients would like to avoid vendor-specific shader variants, which puts me in the uncomfortable position of choosing which vendor to favor. I could use uniform branching and pick the best path dynamically, but if we could make the textureGather code path work well in all cases that would be much better!
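For reference, here is a rough sketch of what I mean by the uniform-branching fallback. It is an illustration only, not the code from my gist: the names uTex, uUseGather, vUV and gatherRed are made up, it assumes GLSL ES 3.10, a single-level texture with CLAMP_TO_EDGE wrapping, and that only the red channel is needed. The texelFetch path is not guaranteed to match textureGather bit for bit when the coordinate lands exactly on a texel boundary.

```glsl
#version 310 es
precision highp float;

// Hypothetical names, for illustration only.
uniform highp sampler2D uTex;   // texture to sample
uniform bool uUseGather;        // set once by the application, per device

in vec2 vUV;
out vec4 fragColor;

// Returns the red channel of the 2x2 bilinear footprint around uv,
// in the component order textureGather uses:
// x = (i0, j1), y = (i1, j1), z = (i1, j0), w = (i0, j0).
vec4 gatherRed(vec2 uv)
{
    if (uUseGather) {
        // Uniform branch: every invocation in the draw takes the same path.
        return textureGather(uTex, uv, 0);
    }

    // Fallback: four explicit fetches from mip level 0.
    ivec2 size = textureSize(uTex, 0);
    ivec2 base = ivec2(floor(uv * vec2(size) - 0.5));

    // Emulate CLAMP_TO_EDGE (assumption about the sampler state).
    ivec2 lo = clamp(base,     ivec2(0), size - 1);
    ivec2 hi = clamp(base + 1, ivec2(0), size - 1);

    return vec4(texelFetch(uTex, ivec2(lo.x, hi.y), 0).r,
                texelFetch(uTex, hi, 0).r,
                texelFetch(uTex, ivec2(hi.x, lo.y), 0).r,
                texelFetch(uTex, lo, 0).r);
}

void main()
{
    vec4 r = gatherRed(vUV);
    // Placeholder use of the result: average the four texels.
    fragColor = vec4(vec3(dot(r, vec4(0.25))), 1.0);
}
```

The application would set uUseGather based on whatever per-device heuristic it trusts. This keeps a single shader source, but both paths still have to be compiled and the branch itself is not free, so I would rather not need it.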