VertexArray data alignment for GPU?

I went to an Apple Tech Talk yesterday and the evangelist (Allan Schaffer) mentioned something very interesting about data alignment in interleaved arrays.





It seems vertex data needs to be 4-byte aligned, or else the driver will have to pad the incoming stream of vertices on glDrawArrays or glDrawElements.





For example:





// Don’t do this:
typedef struct {
    short pos[3];
    short normal[3];
    short textUV[2];
} vertex_t;





But instead pre-pad like this:





typedef struct {
    short pos[4];
    short normal[4];
    short textUV[4];
} vertex_t;
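
To make the difference between the two layouts concrete, here is a minimal sketch that checks where each attribute starts. It assumes a 2-byte short (written as int16_t to pin that down). The names vertex_packed_t, vertex_padded_t, is_4byte_aligned and print_layouts are made up for illustration and do not come from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Layout from the "don't" example: 3 + 3 + 2 shorts = 16 bytes. */
typedef struct {
    int16_t pos[3];
    int16_t normal[3];
    int16_t textUV[2];
} vertex_packed_t;

/* Pre-padded layout: 4 + 4 + 4 shorts = 24 bytes. */
typedef struct {
    int16_t pos[4];
    int16_t normal[4];
    int16_t textUV[4];
} vertex_padded_t;

/* Returns 1 when a byte offset falls on a 4-byte boundary, else 0. */
int is_4byte_aligned(size_t offset)
{
    return (offset % 4) == 0;
}

/* Prints the byte offset of each attribute and the vertex size for both layouts. */
void print_layouts(void)
{
    printf("packed: pos=%zu normal=%zu textUV=%zu size=%zu\n",
           offsetof(vertex_packed_t, pos),
           offsetof(vertex_packed_t, normal),
           offsetof(vertex_packed_t, textUV),
           sizeof(vertex_packed_t));
    printf("padded: pos=%zu normal=%zu textUV=%zu size=%zu\n",
           offsetof(vertex_padded_t, pos),
           offsetof(vertex_padded_t, normal),
           offsetof(vertex_padded_t, textUV),
           sizeof(vertex_padded_t));

    /* Every attribute of the padded layout starts on a 4-byte boundary. */
    assert(is_4byte_aligned(offsetof(vertex_padded_t, normal)));
    assert(is_4byte_aligned(offsetof(vertex_padded_t, textUV)));
}
```

In the packed layout, normal starts at byte 6, which is not a multiple of 4. In the padded layout every attribute starts on a multiple of 4, at the cost of 24 bytes per vertex instead of 16.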





It doesn’t really make sense to me, because whether or not you declare an array of size 4 in C, the stride you declare later with the GL vertex/normal/texture pointers will hide it from the GPU anyway. And as far as I know the GPU copies the data anyway.
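
For reference, this is roughly what the stride and offsets look like for the padded struct. It is only a sketch: attrib_desc_t and describe_padded_layout are hypothetical helpers, the attribute indices 0 to 2 are assumptions, and the real GL call is left in a comment so the snippet compiles without GL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Padded layout from the example above (assumes 2-byte shorts). */
typedef struct {
    int16_t pos[4];
    int16_t normal[4];
    int16_t textUV[4];
} padded_vertex_t;

/* Hypothetical record of the arguments one attribute pointer call would get. */
typedef struct {
    unsigned index;      /* attribute location (assumed: 0, 1, 2) */
    int      components; /* components GL actually reads: 3, 3, 2 */
    size_t   stride;     /* bytes from one vertex to the next */
    size_t   offset;     /* byte offset of the attribute inside a vertex */
} attrib_desc_t;

/* Fills out[0..2] with the position, normal and texture coordinate layout. */
void describe_padded_layout(attrib_desc_t out[3])
{
    assert(out != NULL);

    out[0] = (attrib_desc_t){ 0, 3, sizeof(padded_vertex_t),
                              offsetof(padded_vertex_t, pos) };
    out[1] = (attrib_desc_t){ 1, 3, sizeof(padded_vertex_t),
                              offsetof(padded_vertex_t, normal) };
    out[2] = (attrib_desc_t){ 2, 2, sizeof(padded_vertex_t),
                              offsetof(padded_vertex_t, textUV) };

    /* With GL headers and a bound VBO, each entry d would map to:
     *   glVertexAttribPointer(d.index, d.components, GL_SHORT, GL_FALSE,
     *                         (GLsizei)d.stride, (const GLvoid *)d.offset);
     */
}
```

The stride only tells GL how far apart consecutive vertices are. The start offset of each attribute inside the vertex is still passed explicitly, so the in-memory layout is exactly what the driver reads from.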





I thought that maybe the GPU/driver can perform a batch memcpy if the data is properly aligned, but then we increase the bandwidth consumption and the slight improvement is lost.
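
The bandwidth cost can be put into numbers with a small sketch. It compares the 16-byte and 24-byte layouts from the example, plus a hypothetical in-between layout (vertex_minpad_t) that pads only pos and normal, since textUV[2] is already 4 bytes long. Whether the iPhone driver is satisfied with that tighter layout is an assumption, not something stated in the talk. stream_bytes and overhead_percent are illustrative helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKED_VERTEX_BYTES ((size_t)16) /* pos[3], normal[3], textUV[2] */
#define PADDED_VERTEX_BYTES ((size_t)24) /* pos[4], normal[4], textUV[4] */

/* Hypothetical in-between layout: each attribute starts on a 4-byte
 * boundary, but textUV is not padded. 20 bytes with 2-byte shorts. */
typedef struct {
    int16_t pos[3];
    int16_t pad0;      /* unused, keeps normal at offset 8 */
    int16_t normal[3];
    int16_t pad1;      /* unused, keeps textUV at offset 16 */
    int16_t textUV[2];
} vertex_minpad_t;

/* Total bytes in a stream of vertex_count vertices. */
size_t stream_bytes(size_t vertex_bytes, size_t vertex_count)
{
    return vertex_bytes * vertex_count;
}

/* Extra size relative to the packed layout, as a whole percentage. */
size_t overhead_percent(size_t vertex_bytes)
{
    assert(vertex_bytes >= PACKED_VERTEX_BYTES);
    return (vertex_bytes - PACKED_VERTEX_BYTES) * 100 / PACKED_VERTEX_BYTES;
}
```

For 10,000 vertices the packed stream is 160,000 bytes and the fully padded one is 240,000 bytes, which is 50% more data to move. The in-between layout would be 200,000 bytes, or 25% more.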





Can anyone elaborate on this recommendation?

Is it still valid in a shader-based renderer if we upload the data with glVertexAttribPointer?





nicolasbol 2009-12-05 16:35:35

Apple support the iPhone themselves, so questions are better directed at them, but I believe this is a requirement for the iPhone drivers, although it is not necessary for the other MBX or SGX platforms that we directly support.





Unfortunately, on the iPhone reducing the size of your vertex data can be quite important for performance so padding seems undesirable, but is necessary. As I say, Apple support the platform themselves so it may be better to approach them.

If only they were as outstanding as you are at answering questions, it would be perfect! No kidding, I can get someone to look at my question within a day here, but if I send it to Apple I’m not even sure they will open my email.





I’ll try anyway, thanks Gordon

I don’t think it is required to have your vertex data 4-byte aligned on the iPhone … it used to make no difference, but I think they made some changes with the 3.x releases and now it does matter in terms of performance.





The older iPhones (pre-3GS) for some reason always copy vertex data around regardless of whether you are using VBOs, so what Gordon said about reducing the size of your vertex data is, in my experience, directly related to that copy being performed in the driver.

So switching to shorts, minimizing vertex size, etc. has nothing to do with the GPU itself (in fact, using shorts will slow things down a bit in terms of GPU processing); all of it is done to minimize the cost of the CPU preprocessing step that is applied to all vertex streams.





Look here for some extended discussions regarding this very issue:


https://devforums.apple.com/thread/6546?start=0&tstart=0








I’ve seen posts on the Oolong group complaining about crashes if POD data wasn’t aligned, although I haven’t seen it myself, so I may be incorrect. Alignment options will be available in the next release of the SDK, so it shouldn’t be a problem. Unaligned data does harm performance, as warmi says. As always, profiling both ways is probably wise.