Performance of glDraw...() implementations

I know you don’t like to answer iPhone-specific questions, but I think I’ve phrased this question in a way that neatly avoids that problem.





My first profile of my application running on the iPhone shows that over 15% of the CPU time is spent in something called CopyIndexedData(), called from glDrawElements(). If I modify the code to use glDrawArrays() instead of glDrawElements(), the hot spot changes to CopyData(). If I use vertex buffer objects, the behavior is unchanged.





It seems to me that OpenGL ES could optimize this for vertex buffer objects – since their data can only change through calls such as glMapBuffer() or glBufferSubData(), OpenGL ES can detect when the programmer is trying to modify a VBO that is still in use by a pending draw call, and do something appropriate, like wait for rendering to finish before continuing, or internally allocate a new copy of the VBO.
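To make the idea concrete, here is a minimal sketch in plain C of the "allocate a new copy" option. It is a toy model, not how any real driver works: `ToyVbo`, `Storage`, `toy_draw`, `toy_draw_done` and `toy_sub_data` are all hypothetical names invented for illustration. Drawing only records that a draw is pending on the current storage block, and the copy happens only when the application modifies a buffer that a pending draw still references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical storage block; not taken from any real driver. */
typedef struct {
    int pending_draws;      /* queued draws that still read this block */
    size_t size;            /* number of bytes in data[] */
    unsigned char data[];   /* vertex or index bytes */
} Storage;

/* Hypothetical VBO: just a pointer to its current storage block. */
typedef struct {
    Storage *current;
} ToyVbo;

static Storage *storage_new(size_t size) {
    Storage *s = calloc(1, sizeof *s + size);
    if (s) s->size = size;
    return s;
}

/* Draw call: no copy, only note that the GPU will read this block. */
static Storage *toy_draw(ToyVbo *vbo) {
    vbo->current->pending_draws++;
    return vbo->current;
}

/* The GPU has finished one draw that used block s. */
static void toy_draw_done(ToyVbo *vbo, Storage *s) {
    assert(s->pending_draws > 0);
    s->pending_draws--;
    /* A block the VBO no longer points at can go once it is idle. */
    if (s->pending_draws == 0 && s != vbo->current) free(s);
}

/* Stand-in for glBufferSubData(): copy only if a draw is pending. */
static int toy_sub_data(ToyVbo *vbo, size_t off, const void *src, size_t n) {
    Storage *s = vbo->current;
    if (off > s->size || n > s->size - off) return -1;
    if (s->pending_draws > 0) {
        Storage *fresh = storage_new(s->size);
        if (!fresh) return -1;
        memcpy(fresh->data, s->data, s->size);
        vbo->current = fresh;   /* old block is freed in toy_draw_done() */
        s = fresh;
    }
    memcpy(s->data + off, src, n);
    return 0;
}
```

In this model a VBO that is uploaded once and drawn many times never gets copied, and the cost is paid only on an update that races with a pending draw. The other option mentioned above, waiting for rendering to finish, would replace the copy branch with a stall until `pending_draws` reaches zero.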





My question is…is there anything inherent about the hardware design of the PowerVR MBX Lite that says glDraw…() has to behave this way? Does the GPU have some need to copy the data? Is this the “deferred” copy the GPU needs in order to do its tile-based deferred rendering? Or does this performance bug have nothing to do with the nature of the GPU?





Thank you in advance for any clarifications.





I’m afraid questions about the iPhone need to be addressed to Apple and we can’t really discuss the details of how they’ve implemented drivers regarding this issue here.





We are aware of the situation described, but all I can suggest is that you make a similar post on Apple’s own forums about this or contact Apple directly.





The PowerVR MBX Lite hardware doesn’t intrinsically require a copy of the data, but the driver implementation on a specific hardware platform may do so.