To pad or not to pad... (Vertex data alignment on PowerVR)

Does this performance recommendation apply to all Series 6 GPUs?

[blockquote]When vertex data is interleaved, each vertex should be aligned to a four byte boundary.[/blockquote]

(This is from the PowerVR Perf guide, but I thought I’d double-check as sometimes the recommendations haven’t been updated for Series 6.)
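For reference, the four-byte rule can be checked at build time. Below is a minimal C sketch, assuming a hypothetical interleaved `Vertex` with a float position, a normal packed into four signed bytes and a two-component 16-bit texture coordinate. The `is_four_byte_aligned` helper is illustrative and not part of any PowerVR SDK API. It checks that the stride (and every attribute offset) is a multiple of four, so each vertex starts on a four-byte boundary whenever the buffer does.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>  /* int8_t, uint16_t */

/* Hypothetical interleaved vertex; every attribute starts on a 4-byte boundary. */
typedef struct {
    float    position[3];  /* offset 0,  12 bytes */
    int8_t   normal[4];    /* offset 12,  4 bytes: normalized xyz + 1 pad byte */
    uint16_t uv[2];        /* offset 16,  4 bytes: e.g. half floats or normalized shorts */
} Vertex;                  /* 20-byte stride */

static_assert(sizeof(Vertex) % 4 == 0, "vertex stride must be a multiple of 4");

/* Returns 1 if the stride and every attribute offset are multiples of 4. */
static int is_four_byte_aligned(size_t stride, const size_t *offsets, size_t count)
{
    if (stride % 4 != 0)
        return 0;
    for (size_t i = 0; i < count; ++i)
        if (offsets[i] % 4 != 0)
            return 0;
    return 1;
}

static const size_t vertex_offsets[] = {
    offsetof(Vertex, position),
    offsetof(Vertex, normal),
    offsetof(Vertex, uv),
};
```

Here the stride is 20 bytes, so every vertex starts on a four-byte boundary without any explicit padding.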

Hi Dark_Photon,

I did a quick review of the performance recommendations doc for the 4.0 SDK to make sure the most important Rogue recommendations were in there (hopefully, we’ll be able to make bigger changes for the 4.1 SDK). Here’s the new section on vertex attributes:

[blockquote]For optimal performance on PowerVR Graphics Cores, a mesh with static attribute data should:

• Use indexed triangle lists
• Interleave VBO attribute data
• Ensure that every VBO attribute is used by the shader
• Align to 16 bytes
• Avoid changing the layout of VBO attribute data[/blockquote]

These rules are applicable to SGX and Rogue.
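As an illustration of the first three bullets, here is a minimal C sketch of a quad stored as an indexed triangle list with interleaved attributes. It assumes a shader that reads only position and texture coordinates, so nothing else is stored; the `QuadVertex` type and helpers are hypothetical. In OpenGL ES the two arrays would be uploaded with `glBufferData` and drawn with `glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, 0)`.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint16_t */

/* Hypothetical interleaved vertex holding only attributes the shader reads. */
typedef struct {
    float position[3];
    float uv[2];
} QuadVertex;   /* 20-byte stride */

static_assert(sizeof(QuadVertex) == 20, "expected a tightly packed 20-byte vertex");

/* Four unique vertices shared by two triangles. */
static const QuadVertex quad_vertices[4] = {
    {{-1.0f, -1.0f, 0.0f}, {0.0f, 0.0f}},
    {{ 1.0f, -1.0f, 0.0f}, {1.0f, 0.0f}},
    {{ 1.0f,  1.0f, 0.0f}, {1.0f, 1.0f}},
    {{-1.0f,  1.0f, 0.0f}, {0.0f, 1.0f}},
};

/* Indexed triangle list: two triangles, six indices. */
static const uint16_t quad_indices[6] = { 0, 1, 2,  0, 2, 3 };

/* Storage for the indexed form (vertices + indices). */
static size_t indexed_bytes(void)  { return sizeof quad_vertices + sizeof quad_indices; }

/* Storage if every triangle corner were stored as its own vertex. */
static size_t expanded_bytes(void) { return 6 * sizeof(QuadVertex); }
```

The indexed quad takes 92 bytes including indices versus 120 bytes for six expanded vertices, and the saving grows on meshes where each vertex is shared by more triangles.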

"Ensure that every VBO attribute is used by the shader"
Shouldn’t that be worded as “don’t include unused attributes”?

Good suggestion - that’s definitely clearer. I’ll see if I can squeeze the change in before the 4.0 release.

Thanks, Joe. Two questions:

[blockquote]* Align to 16 bytes[/blockquote]

Does this mean:

  1. Align the starting address/offset of each batch to a 16-byte aligned boundary, or
  2. Align every vertex within an interleaved VBO to a 16-byte aligned boundary?

I suspect #1, as #2 could waste a lot of space (and memory bandwidth)!
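The difference between the two readings is easy to quantify. This minimal C sketch (hypothetical helper names) rounds a size up to a power-of-two alignment and computes the extra bytes that interpretation #2 would add to a whole VBO; interpretation #1 only needs `round_up` applied once to each batch's starting offset.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t */

/* Rounds size up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Extra bytes a VBO costs if every vertex's stride is padded up to align. */
static size_t padding_overhead(size_t stride, size_t vertex_count, size_t align)
{
    return (round_up(stride, align) - stride) * vertex_count;
}
```

For example, padding a 20-byte vertex to 32 bytes gives `padding_overhead(20, 100000, 16)` = 1,200,000 extra bytes over 100,000 vertices (60% more data to store and fetch), while aligning a batch that would start at offset 1000 moves it to 1008, wasting 8 bytes once.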

If #1, then these recommendations don’t say anything about alignment/padding of individual vertices within an interleaved VBO. That was my question. Any recommendation there?

Thanks.

Hmm … I was always aligning each attribute value (not the whole vertex struct), with the resulting wastage, since I assumed this related to the code reading attributes rather than code copying whole vertex structs. But who knows … maybe I was wrong.

Hi Dark_Photon, warmi,

[blockquote]Align to 16 bytes[/blockquote]
This recommendation refers to padding the attributes of every vertex to 16 byte boundaries.
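Under that reading, a vertex padded to 16 bytes might look like the following minimal C sketch. The `PaddedVertex` type is hypothetical and assumes the VBO itself starts on a 16-byte boundary, in which case vertex i starts at i * 32. The `pad` member only widens the stride passed to `glVertexAttribPointer`; it would not be enabled as a vertex attribute.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical vertex padded to a 16-byte multiple. */
typedef struct {
    float position[3];  /* offset 0,  12 bytes */
    float uv[2];        /* offset 12,  8 bytes */
    float pad[3];       /* offset 20, 12 bytes of padding, never bound as an attribute */
} PaddedVertex;         /* 32-byte stride (20 bytes of real data) */

static_assert(sizeof(PaddedVertex) % 16 == 0, "vertex stride must be a multiple of 16");

/* Byte offset of vertex i within the VBO; always a multiple of 16. */
static size_t vertex_offset(size_t i)
{
    return i * sizeof(PaddedVertex);
}
```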

[blockquote]#2 could waste a lot of space [and memory bandwidth]![/blockquote]
Definitely. I’ve softened the wording of this recommendation in the doc. It should improve the performance of GPU cache accesses but, as you’ve pointed out, is unlikely to help if it causes an application to be bottlenecked by storage space or memory bandwidth.

While making changes, I’ve also added clarification to the “Avoid changing the layout of VBO attribute data” recommendation. Here’s the revised section:

[blockquote]For optimal performance on PowerVR Graphics Cores, a mesh with static attribute data should:

• Use indexed triangle lists;
• Interleave VBO attribute data;
• Not include unused attributes.

For optimal vertex shader execution performance, meshes transformed by the same vertex shader (even if compiled into different shader programs) must have the same VBO attribute data layout.

On some devices, padding each vertex to 16 byte boundaries may also improve performance.[/blockquote]
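One way to follow the same-layout rule in application code is to describe each layout once and have every mesh drawn by a given vertex shader point at that shared description. The sketch below assumes hypothetical `AttribDesc`, `VertexLayout` and `Mesh` types whose fields mirror the index, size, stride and pointer (byte offset) arguments of `glVertexAttribPointer`; `same_layout` is an illustrative debug check, not an SDK function.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical attribute description. */
typedef struct {
    unsigned location;    /* attribute location in the vertex shader */
    int      components;  /* e.g. 3 for a vec3 */
    size_t   offset;      /* byte offset within one interleaved vertex */
} AttribDesc;

/* Hypothetical layout: stride plus the attribute list. */
typedef struct {
    size_t            stride;
    size_t            attrib_count;
    const AttribDesc *attribs;
} VertexLayout;

/* The single layout used by every mesh drawn with the "lit" vertex shader. */
static const AttribDesc kLitAttribs[] = {
    {0, 3, 0},   /* position: 3 floats */
    {1, 3, 12},  /* normal:   3 floats */
    {2, 2, 24},  /* uv:       2 floats */
};
static const VertexLayout kLitLayout = {32, 3, kLitAttribs};

/* Meshes reference the shared layout instead of owning their own copy. */
typedef struct {
    const VertexLayout *layout;
    /* buffer handles, index counts, etc. omitted */
} Mesh;

static const Mesh kRock = {&kLitLayout};
static const Mesh kTree = {&kLitLayout};

/* Returns 1 if both layouts have the same stride and identical attributes. */
static int same_layout(const VertexLayout *a, const VertexLayout *b)
{
    assert(a != NULL && b != NULL);
    if (a->stride != b->stride || a->attrib_count != b->attrib_count)
        return 0;
    for (size_t i = 0; i < a->attrib_count; ++i) {
        const AttribDesc *x = &a->attribs[i];
        const AttribDesc *y = &b->attribs[i];
        if (x->location != y->location || x->components != y->components ||
            x->offset != y->offset)
            return 0;
    }
    return 1;
}
```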

Thank you, Joe. That clarifies it. One short follow-up:

[blockquote]On some devices, padding each vertex to 16 byte boundaries may also improve performance.[/blockquote]

Could you specify which GPU series prefer this (Series 6? SGX?)? Or is this a function of the system that the PowerVR GPU is embedded in?

If the former, this would be great info to have in the Performance recommendations!

I have one more question regarding this topic.
Suppose I have 2 shaders: one uses all of the vertex attributes, whereas the other uses only some of them.
A good example is rendering a reflection texture (later used in the main rendering pass).
Which scenario is better for performance?
Do I create 2 separate vertex buffers, or can the same vertex buffer be shared across both passes?

Hi Dark_Photon,

[blockquote]Could you specify which GPU series prefer this [Series 6?, SGX?]? Or is this a function of the system that the PowerVR GPU is embedded in?[/blockquote]
The recommendation is based on the way Rogue GPUs behave. I’ll look into adding SGX recommendations to a future version of the doc. Based on preliminary discussions with our competitive analysis team though, I believe a 16 byte alignment recommendation would also apply to SGX.

Hi Ganesh,

[blockquote]Do I create 2 separate vertex buffers or the same vertex buffer can be shared across both the passes.[/blockquote]
Theoretically, creating a VBO for each render would result in the best performance as it will increase the speed at which the GPU can copy attribute data to USC registers. However, the benefit of implementing this depends very heavily on where your render is bottlenecked. You would have to implement both solutions and benchmark the performance of each to see if the performance gain is worth the added complexity of having to duplicate data into multiple VBOs. Unless you are heavily vertex processing limited, I suspect using a single interleaved VBO for both passes would be fast enough.
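The "duplicate data into multiple VBOs" option can be sketched in a few lines of C. This minimal example assumes a hypothetical interleaved `FullVertex` for the main pass and copies only positions into a compact `PositionVertex` array that could be uploaded as a second VBO for the reflection pass. The shared-VBO alternative needs no copy: the reflection pass would enable only the position attribute, with a stride of `sizeof(FullVertex)` and an offset of 0. Type and function names are illustrative assumptions.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* memcpy */

typedef struct {
    float position[3];
    float normal[3];
    float uv[2];
} FullVertex;        /* 32-byte stride: main pass reads all three attributes */

typedef struct {
    float position[3];
} PositionVertex;    /* 12-byte stride: reflection pass reads position only */

/* Copies positions into a separate array that could back a second VBO;
   this costs count * sizeof(PositionVertex) extra bytes of memory. */
static size_t extract_positions(const FullVertex *src, size_t count,
                                PositionVertex *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i)
        memcpy(dst[i].position, src[i].position, sizeof dst[i].position);
    return count;
}
```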

That’s what I needed. Thanks again!