Thread scheduling

From the PowerVR Series5 Graphics.SGX architecture guide for developers.1.0.8.External:
This efficient, hardware based, data driven thread scheduling can benefit applications by hiding the latency induced by any stalls, such as dependent texture reads and branching in shaders.
Applications can take advantage of this by placing as much work as possible ahead of the point at which the shader is likely to stall, which will increase the number of instructions the hardware can use to mask latency induced by the stalls.

I fully understand the first paragraph, but could you explain the second one in more detail? In particular, what does “which will increase the number of instructions the hardware can use to mask latency” mean, and why does it help if the application places as much work as possible ahead of the point where the shader stalls, such as a texture read? Thanks.


The latency is hidden because, when fragment A stalls, the hardware can schedule another fragment (B) to run while A's stall is being resolved (e.g. while A's dependent texture data is being fetched).
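
To make the swap concrete, here is a toy model in Python. It is not how SGX is actually implemented: the `Fragment`/`schedule`/`show` names and all cycle counts are hypothetical. It assumes one instruction issues per cycle, and that each fragment runs some instructions up to a texture read, waits a fixed `latency` for the data, then runs the instructions that depend on it. The scheduler keeps running one fragment until it stalls, then switches to any fragment that is ready.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    name: str
    pre: int           # instructions up to and including the one that issues the texture read (>= 1)
    post: int          # instructions that depend on the fetched texel
    done: int = 0      # instructions issued so far
    ready_at: int = 0  # first cycle at which this fragment may issue again

    def finished(self) -> bool:
        return self.done == self.pre + self.post


def schedule(fragments: list[Fragment], latency: int) -> list[Optional[str]]:
    """Return, per cycle, the name of the fragment that issued an instruction (None = idle)."""
    trace: list[Optional[str]] = []
    current: Optional[Fragment] = None
    cycle = 0
    while not all(f.finished() for f in fragments):
        # Stay on the current fragment until it stalls or finishes, then switch
        # to the first fragment that is unfinished and not waiting on memory.
        if current is None or current.finished() or current.ready_at > cycle:
            current = next(
                (f for f in fragments if not f.finished() and f.ready_at <= cycle),
                None,
            )
        if current is None:
            trace.append(None)  # every fragment is waiting: a wasted cycle
        else:
            current.done += 1
            trace.append(current.name)
            if current.done == current.pre:
                # This instruction issued the read; its data arrives `latency` cycles later.
                current.ready_at = cycle + 1 + latency
        cycle += 1
    return trace


def show(trace: list[Optional[str]]) -> str:
    """Render a trace as one character per cycle, '.' for an idle cycle."""
    return "".join(name or "." for name in trace)


if __name__ == "__main__":
    print(show(schedule([Fragment("A", pre=2, post=2)], latency=4)))  # AA....AA
    print(show(schedule([Fragment("A", pre=2, post=2), Fragment("B", pre=2, post=2)], latency=4)))  # AABB..AABB
```

With a 4-cycle read latency, fragment A on its own leaves the pipeline idle for 4 cycles (`AA....AA`); with B available, B's two pre-stall instructions fill half of that gap (`AABB..AABB`).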


If the same shader is applied to fragments A and B, then increasing the amount of computation before the stall point in the shader means that fragment B executes for longer before it stalls, which gives fragment A more time to resolve its stall. This is what the document means by “Applications can take advantage of this by placing as much work as possible ahead of the point at which the shader is likely to stall, which will increase the number of instructions the hardware can use to mask latency induced by the stalls.”
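
The same idea as back-of-the-envelope arithmetic, again a hypothetical model rather than documented SGX behaviour: if one instruction issues per cycle and, after A stalls, each of the N - 1 other in-flight fragments runs W instructions before reaching the same stall, then the idle cycles left in A's wait of L cycles are max(0, L - (N - 1) * W). The function names and numbers below are made up for illustration.

```python
def exposed_stall_cycles(latency: int, pre_stall_work: int, other_fragments: int) -> int:
    """Cycles of fragment A's wait that no other fragment's work covers (toy model).

    Assumes one instruction per cycle and that each of the `other_fragments`
    runs `pre_stall_work` instructions before hitting the same stall.
    """
    covered = other_fragments * pre_stall_work
    return max(0, latency - covered)


def work_needed_to_hide(latency: int, other_fragments: int) -> int:
    """Smallest pre-stall instruction count that hides the whole latency (toy model).

    Requires other_fragments >= 1.
    """
    # Ceiling division: other_fragments * work must be >= latency.
    return -(-latency // other_fragments)


if __name__ == "__main__":
    # Hypothetical 8-cycle texture latency with only fragment B to switch to.
    for work in (2, 4, 8):
        print(work, exposed_stall_cycles(latency=8, pre_stall_work=work, other_fragments=1))
    # prints "2 6", "4 4", "8 0"
    print(work_needed_to_hide(latency=8, other_fragments=3))  # prints 3
```

This only counts the gap while A's read is outstanding; it does not model what the fragments do after A resumes.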

Thanks a lot for the very clear answer!