PVR GE8320 textureGather performance.md
I'm investigating the performance difference between textureGather and textureFetch on PVR GE8320.
The instruction set reference does not provide much information about the available functionality:
https://docs.imgtec.com/reference-manuals/powervr-instruction-set-reference/html/topics/data-access-instructions/ITRSMP.html
Compiling some simple shaders that load a 4x4 texture block, however, produces 4 sample instructions with textureGather versus 16 with textureFetch, which suggests that the instruction set does support this functionality. Specifically, the textureGather version uses 4 `smp2d.fcnorm.replace.data` instructions, while the textureFetch version issues 16 `smp2d.fcnorm.nncoords.replace` instructions.
In the textureGather code path, though, the compiler also emits many additional branch and ALU instructions, which suggests that the inputs or outputs of the `smp2d` instruction need extra processing to produce the desired result.
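For reference, here is a minimal CPU-side sketch of the two access patterns being compared. It is not the shader source and says nothing about what the PVR compiler actually does. It only assumes the GLSL component order for textureGather, (x, y, z, w) = (i0 j1, i1 j1, i1 j0, i0 j0), and a single-channel texture. The function names are hypothetical.

```python
# CPU-side model of loading a 4x4 texel block two ways.
# Assumption: gather results follow the GLSL textureGather component order
# (x, y, z, w) = (i0 j1, i1 j1, i1 j0, i0 j0) for one 2x2 footprint.
# All names here are hypothetical and only illustrate the access pattern.

def texel_fetch(tex, x, y):
    """Model of a single point fetch: one texel per call."""
    return tex[y][x]

def texture_gather(tex, x, y):
    """Model of a gather on the 2x2 footprint whose lowest-coordinate texel is (x, y)."""
    return (tex[y + 1][x], tex[y + 1][x + 1], tex[y][x + 1], tex[y][x])

def load_block_fetch(tex):
    """16 fetches; results arrive in row-major order."""
    return [texel_fetch(tex, x, y) for y in range(4) for x in range(4)]

def load_block_gather(tex):
    """4 gathers; each result is swizzled back into row-major order."""
    block = [None] * 16
    for gy in (0, 2):
        for gx in (0, 2):
            g = texture_gather(tex, gx, gy)
            block[(gy + 1) * 4 + gx] = g[0]      # x: (i0, j1)
            block[(gy + 1) * 4 + gx + 1] = g[1]  # y: (i1, j1)
            block[gy * 4 + gx + 1] = g[2]        # z: (i1, j0)
            block[gy * 4 + gx] = g[3]            # w: (i0, j0)
    return block

# 4x4 test texture whose texel value equals its row-major index.
tex = [[y * 4 + x for x in range(4)] for y in range(4)]
assert load_block_fetch(tex) == load_block_gather(tex)
```

Both paths return the same 16 texels. The gather path needs a quarter of the sample operations, plus a reordering step on the results.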
spark_test_sol64.textureFetch.GE8320.txt
Coefficient update program:
---------------------------------------------
Address: 0x0000000000[ 0] 0:
mov ft0, ft1, c0, c0
mov ft2, cf0
cbs ft3, cf0
or ft4, _, ft2, _, c0
spark_test_sol64.textureGather.GE8320.txt
Coefficient update program:
---------------------------------------------
Address: 0x0000000000[ 0] 0:
mov ft0, ft1, c0, c0
mov ft2, cf0
cbs ft3, cf0
or ft4, _, ft2, _, c0