I'm currently implementing a kind of reduction.
Generally, a reduction is performed by rendering rectangles multiple times, halving their size on each pass.
I have to sum the values in each 11x11 patch of a texture.
My basic implementation computes a horizontal sum over 11 texels, and then does the same thing vertically.
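To make the two-pass approach concrete, here is a minimal CPU-side sketch in Python. It assumes the patches are non-overlapping 11x11 tiles and that the texture dimensions are multiples of 11; the function name `patch_sums_two_pass` is made up for illustration, and the nested lists stand in for textures.

```python
def patch_sums_two_pass(tex, k=11):
    """Sum each non-overlapping k x k patch with a horizontal, then a vertical pass."""
    height, width = len(tex), len(tex[0])
    # Assumption of this sketch: the texture holds only whole patches.
    assert height % k == 0 and width % k == 0

    # Pass 1: each output value reads k horizontally adjacent texels.
    horizontal = [
        [sum(row[x:x + k]) for x in range(0, width, k)]
        for row in tex
    ]

    # Pass 2: each output value reads k vertically adjacent partial sums.
    cols = width // k
    return [
        [sum(horizontal[y + dy][c] for dy in range(k)) for c in range(cols)]
        for y in range(0, height, k)
    ]


# A 22x22 texture of ones holds four 11x11 patches, each summing to 121.
texture = [[1] * 22 for _ in range(22)]
print(patch_sums_two_pass(texture))
```

Each output value reads 11 inputs in either pass, which is the per-fragment cost I am worried about.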
However, reading 11 texels in a fragment shader is generally known to be slow.
In this case, could I improve the speed by doing the sum in the typical way?
(That is, by padding each 11x11 patch to a 16x16 patch and rendering multiple times.)
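As a sketch of what I mean by the typical way, the following Python code zero-pads one 11x11 patch to 16x16 and halves it repeatedly, with each output reading a 2x2 block. The helper names (`pad_patch`, `halve`, `pyramid_sum`) are hypothetical, and zero padding is my assumption for keeping the sum unchanged.

```python
def pad_patch(patch, size=16):
    """Zero-pad a square patch to size x size (zeros do not change the sum)."""
    k = len(patch)
    return [
        [patch[y][x] if y < k and x < k else 0 for x in range(size)]
        for y in range(size)
    ]


def halve(level):
    """One reduction pass: each output value reads a 2x2 block of the input."""
    n = len(level) // 2
    return [
        [
            level[2 * y][2 * x] + level[2 * y][2 * x + 1]
            + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]
            for x in range(n)
        ]
        for y in range(n)
    ]


def pyramid_sum(patch, size=16):
    """Return (sum, number of passes) for one padded patch."""
    level = pad_patch(patch, size)
    passes = 0
    while len(level) > 1:
        level = halve(level)
        passes += 1
    return level[0][0], passes


# 16 -> 8 -> 4 -> 2 -> 1 takes four passes.
patch = [[1] * 11 for _ in range(11)]
print(pyramid_sum(patch))
```

This needs four passes per patch instead of two, but each fragment reads only 4 texels.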
Actually, the two methods do not seem much different to me in computational complexity.
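A rough way to compare them is to count texel fetches per patch. The Python sketch below does only that, assuming non-overlapping patches, one fetch per texel read, and 2x2 reads in the padded reduction. It ignores texture caching, hardware filtering, and per-pass overhead, so it says nothing definite about actual GPU timing. The function names are made up for illustration.

```python
def two_pass_cost(k=11):
    """Fetches per patch for a k-tap horizontal pass plus a k-tap vertical pass."""
    horizontal = k * k  # k output fragments (one per row), k fetches each
    vertical = k        # one output fragment, k fetches
    return {
        "passes": 2,
        "fetches": horizontal + vertical,
        "max_fetches_per_fragment": k,
    }


def pyramid_cost(size=16):
    """Fetches per patch when a size x size patch is halved until it is 1x1."""
    passes = 0
    fetches = 0
    while size > 1:
        size //= 2
        fetches += size * size * 4  # every output fragment reads a 2x2 block
        passes += 1
    return {
        "passes": passes,
        "fetches": fetches,
        "max_fetches_per_fragment": 4,
    }


print(two_pass_cost())  # 121 + 11 fetches over 2 passes
print(pyramid_cost())   # 256 + 64 + 16 + 4 fetches over 4 passes
```

Under these assumptions the two-pass version does 132 fetches in 2 passes and the padded reduction does 340 fetches in 4 passes, so the totals are of the same order; the main difference is 11 reads per fragment versus 4.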