How do you maximize Transform Feedback performance on PowerVR 6 GPUs?
EXAMPLE: Consider running transform feedback iteratively: each iteration generates transformed vertices and then renders the result, and this repeats multiple times per frame. Suppose each iteration writes to and then reads from the same region of the same buffer. Is this a problem?
If so, what about using a different region of the same buffer for each iteration? What about different buffers? And what about different regions of the same buffer if all the TF passes are grouped together ahead of any of the draws that source from the buffer?
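To make the "different regions of the same buffer" variant concrete, here is a minimal sketch of how the per-iteration regions might be chosen. The names `TfRegion`, `alignUp` and `regionForIteration` are hypothetical helpers, not part of any GL or PowerVR API, and the region count and sizes are assumptions. The only GL fact relied on is that `glBindBufferRange` with `GL_TRANSFORM_FEEDBACK_BUFFER` requires the offset and size to be multiples of 4. Whether this layout actually avoids driver-side synchronization on PowerVR is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: one sub-range of a single buffer object,
// used as the transform feedback target for one iteration.
struct TfRegion {
    std::size_t offset;  // byte offset into the buffer
    std::size_t size;    // byte size of the region
};

// Round a byte count up to a multiple of `alignment` (must be a power of two).
constexpr std::size_t alignUp(std::size_t bytes, std::size_t alignment) {
    return (bytes + alignment - 1) & ~(alignment - 1);
}

// Pick the region for `iteration`, cycling through `regionCount` equally
// sized slots so that consecutive iterations do not write the same bytes.
inline TfRegion regionForIteration(std::size_t iteration,
                                   std::size_t regionCount,
                                   std::size_t bytesPerIteration) {
    assert(regionCount > 0);
    // Transform feedback buffer ranges must be 4-byte aligned.
    const std::size_t stride = alignUp(bytesPerIteration, 4);
    return TfRegion{(iteration % regionCount) * stride, stride};
}

// Intended use inside a live OpenGL ES 3.0 context (not compiled here):
//   TfRegion r = regionForIteration(i, kRegions, bytesPerIteration);
//   glBindBufferRange(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfBuffer,
//                     r.offset, r.size);
//   ... capture pass between glBeginTransformFeedback/glEndTransformFeedback,
//   then a draw whose vertex attributes start at r.offset ...
```

The "different buffers" variant would look the same, except that the iteration index selects a buffer object name instead of an offset.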
MOTIVATION: I ask because when we’ve tried TF on PowerVR before, the results were underwhelming. It was actually faster to transform the data on the CPU and then stream the now-larger vertex stream to the GPU for rendering. That doesn’t seem right; it feels like some driver-side blocking may be kicking in.
Underlying my question is: how do we avoid all implicit pipeline blocking/synchronization in the driver associated with TF and achieve completely asynchronous submission and rendering?
Thanks in advance for any tips!