I’m currently optimizing drawing performance. According to perf traces, significant CPU overhead occurs in the glDrawArrays function, most of it in a function called DrawArraysAutoIndices.
I assume that this function generates indices for glDrawArrays, since that call doesn’t require the user to supply indices.
Would it be possible to reduce the overhead of these calls by switching to glDrawElements and providing the indices myself?
I believe so. Give it a try.
You can use https://github.com/zeux/meshoptimizer
to generate cache-efficient indices.
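For anyone following along, here is a rough sketch of the first step, turning a non-indexed triangle list into unique vertices plus an index buffer. It is a plain C++ stand-in and does not use meshoptimizer. The `Vertex` layout, the `IndexedMesh` struct and the `buildIndexedMesh` name are made up for illustration, and it assumes fewer than 65536 unique vertices and no NaN values in the data.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical vertex layout: position only, to keep the sketch short.
using Vertex = std::array<float, 3>;

struct IndexedMesh {
    std::vector<Vertex> vertices;        // unique vertices, in first-seen order
    std::vector<std::uint16_t> indices;  // one index per input vertex
};

// Collapse bit-identical vertices of a non-indexed triangle list.
// Assumption: fewer than 65536 unique vertices, so 16-bit indices suffice.
IndexedMesh buildIndexedMesh(const std::vector<Vertex>& triangleList) {
    IndexedMesh mesh;
    std::map<Vertex, std::uint16_t> seen;  // vertex -> index already assigned
    for (const Vertex& v : triangleList) {
        auto it = seen.find(v);
        if (it == seen.end()) {
            // First time this vertex appears: store it and give it the next index.
            const auto index = static_cast<std::uint16_t>(mesh.vertices.size());
            it = seen.emplace(v, index).first;
            mesh.vertices.push_back(v);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

The resulting `indices` would be uploaded to a GL_ELEMENT_ARRAY_BUFFER and drawn with glDrawElements using GL_UNSIGNED_SHORT. A library such as meshoptimizer can then reorder them for better vertex cache use.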
ValidateState also takes up a noticeable share of the trace. Can you tell me what it does and how to avoid it?
OK, I switched to glDrawElements and now the burden seems to be on a function called DrawElementsIndexBO. Can you tell me what this function does and whether it’s possible to avoid that overhead?
I can’t tell you how our driver works internally, and I don’t know those details myself.
However, judging by the function name, ValidateState validates the OpenGL ES state you set up while rendering. I think you can reduce this cost by making sure you are not making redundant state changes.
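One common way to filter out redundant state changes on the application side is a small shadow cache that remembers the last value set per piece of state. Below is a minimal sketch of that idea, not anything taken from the driver. The `StateCache` class and the numeric "slot" ids are hypothetical, and it assumes all GL state changes in the app go through the cache.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical shadow cache: remembers the last value set per state slot
// (e.g. bound program, bound texture per unit, blend enable).
class StateCache {
public:
    // Returns true if the caller should issue the real GL call,
    // false if the requested value is already current.
    bool shouldApply(std::uint32_t slot, std::uint32_t value) {
        auto it = current_.find(slot);
        if (it != current_.end() && it->second == value) {
            ++skipped_;
            return false;
        }
        current_[slot] = value;
        return true;
    }

    // Forget everything, e.g. after code outside the cache touched GL state.
    void invalidate() { current_.clear(); }

    // Number of redundant changes filtered so far.
    std::size_t skipped() const { return skipped_; }

private:
    std::unordered_map<std::uint32_t, std::uint32_t> current_;
    std::size_t skipped_ = 0;
};
```

In use, a call site would look something like `if (cache.shouldApply(kProgramSlot, program)) glUseProgram(program);`, where `kProgramSlot` is an id you define yourself. Whether this actually reduces the ValidateState cost on a given driver is something only a new trace can confirm.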
DrawElementsIndexBO: I think this is simply the draw call itself. Can you reduce the number of draw calls somehow, for example with static or dynamic batching?
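As a rough illustration of static batching, the sketch below sorts draw records by a state key and merges records whose index ranges are adjacent, so each merged record can be submitted with a single glDrawElements call. The `DrawItem` struct, the `stateKey` field and `mergeDraws` are hypothetical names. It assumes all geometry lives in one shared vertex/index buffer and that draw order does not matter, which usually only holds for opaque geometry.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical draw record: a range inside one shared index buffer.
struct DrawItem {
    std::uint32_t stateKey;    // identifies program + textures + render state
    std::uint32_t firstIndex;  // offset into the shared index buffer
    std::uint32_t indexCount;  // number of indices to draw
};

// Sort by state, then merge records that share a state key and whose
// index ranges are contiguous. Fewer records means fewer draw calls.
std::vector<DrawItem> mergeDraws(std::vector<DrawItem> items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  if (a.stateKey != b.stateKey) return a.stateKey < b.stateKey;
                  return a.firstIndex < b.firstIndex;
              });

    std::vector<DrawItem> merged;
    for (const DrawItem& item : items) {
        if (!merged.empty()) {
            DrawItem& last = merged.back();
            const bool sameState = last.stateKey == item.stateKey;
            const bool contiguous =
                last.firstIndex + last.indexCount == item.firstIndex;
            if (sameState && contiguous) {
                last.indexCount += item.indexCount;  // extend previous draw
                continue;
            }
        }
        merged.push_back(item);
    }
    return merged;
}
```

Sorting by state key also groups identical state together, so even the draws that cannot be merged need fewer state changes between them.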
I have created a screenshot of the perf trace. It seems that the bulk of the time is indeed spent inside the GLES3EmitState method. However, I do not know which state changes cause which of the sub-method calls.
It would be nice if someone with deeper knowledge about the driver code could help.
If you can provide us with a PVRTrace recording of your application, we can give you some tips on how to improve performance.
I will see what I can do.
Can you tell me what data is stored in the “common store” for a standard fragment shader / vertex shader scenario?
Uniform data is stored there.
You can read more about it in our docs: https://www.imgtec.com/developers/powervr-sdk-tools/documentation/