To pad or not to pad... (Vertex data alignment on PowerVR)

dark_photon · November 30, 2015, 6:26pm

Does this performance recommendation apply to all Series 6 GPUs?:

[blockquote]When vertex data is interleaved, each vertex should be aligned to a four byte boundary.[/blockquote]

(This is from the PowerVR Perf guide, but I thought I’d double-check as sometimes the recommendations haven’t been updated for Series 6.)

JoeDavis · December 1, 2015, 8:05am

Hi Dark_Photon,

I did a quick review of the performance recommendations doc for the 4.0 SDK to make sure the most important Rogue recommendations were in there (hopefully, we’ll be able to make bigger changes for the 4.1 SDK). Here’s the new section on vertex attributes:

[blockquote]For optimal performance on PowerVR Graphics Cores, a mesh with static attribute data should:

• Use indexed triangle lists
• Interleave VBO attribute data
• Ensure that every VBO attribute is used by the shader
• Align to 16 bytes
• Avoid changing the layout of VBO attribute data[/blockquote]

These rules are applicable to SGX and Rogue.

SimonFenney · December 1, 2015, 1:49pm

"Ensure that every VBO attribute is used by the shader"
Shouldn’t that be worded as “don’t include unused attributes”?

JoeDavis · December 1, 2015, 2:20pm

Good suggestion - that’s definitely clearer. I’ll see if I can squeeze the change in before the 4.0 release.

dark_photon · December 1, 2015, 8:05pm

Thanks, Joe. Two questions:

[blockquote]* Align to 16 bytes[/blockquote]

Does this mean:

1. Align the starting address/offset of each batch to a 16-byte aligned boundary, or
2. Align every vertex within an interleaved VBO to a 16-byte aligned boundary?

I suspect #1, as #2 could waste a lot of space (and memory bandwidth)!
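
To put rough numbers on the two readings, here is a minimal C sketch. The helper names (`align_up`, `batch_start_offset`, `padded_stride`) are illustrative assumptions and do not come from the PowerVR documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of alignment.
   Assumes alignment is a power of two. */
static size_t align_up(size_t n, size_t alignment)
{
    return (n + alignment - 1) & ~(alignment - 1);
}

/* Reading 1: only the start of each batch is aligned to 16 bytes;
   the vertices inside the batch stay tightly packed. */
static size_t batch_start_offset(size_t previous_batch_end)
{
    return align_up(previous_batch_end, 16);
}

/* Reading 2: every vertex is padded so that the stride is a
   multiple of 16 bytes. */
static size_t padded_stride(size_t packed_vertex_size)
{
    return align_up(packed_vertex_size, 16);
}
```

For a hypothetical 20-byte vertex, reading 2 gives a 32-byte stride (60% more data per vertex), whereas reading 1 costs at most 15 bytes per batch.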

If #1, then these recommendations don’t say anything about alignment/padding of individual vertices within an interleaved VBO. That was my question. Any recommendation there?

Thanks.

warmi · December 1, 2015, 8:19pm

Hmm … I have always aligned at the vertex value boundary (not the vertex struct), accepting the resulting wastage, since my assumption was that this relates to the code reading attributes rather than the code copying vertex structs. But who knows … maybe I was wrong.

JoeDavis · December 3, 2015, 1:40pm

Hi Dark_Photon, warmi,

[blockquote]Align to 16 bytes[/blockquote]
This recommendation refers to padding the attributes of every vertex to 16 byte boundaries.

[blockquote]#2 could waste a lot of space (and memory bandwidth)![/blockquote]
Definitely. I’ve softened the wording of this recommendation in the doc. It should improve the performance of GPU cache accesses but, as you’ve pointed out, is unlikely to help if it causes an application to be bottlenecked by storage space or memory bandwidth.

While making changes, I’ve also added clarification to the “Avoid changing the layout of VBO attribute data” recommendation. Here’s the revised section:

[blockquote]For optimal performance on PowerVR Graphics Cores, a mesh with static attribute data should:

• Use indexed triangle lists;
• Interleave VBO attribute data;
• Not include unused attributes.

For optimal vertex shader execution performance, meshes transformed by the same vertex shader (even if compiled into different shader programs) must have the same VBO attribute data layout.

On some devices, padding each vertex to 16 byte boundaries may also improve performance.[/blockquote]
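
As an illustration of the padding recommendation, here is a sketch in C of what padding an interleaved vertex to a multiple of 16 bytes could look like. The struct names and the choice of attributes are hypothetical, and the sizes assume a typical ABI with 4-byte floats. Whether the padding pays off depends on the device and on where the application is bottlenecked, so it would need benchmarking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tightly packed interleaved vertex: 12 + 4 + 4 = 20 bytes. */
typedef struct {
    float    position[3];  /* 12 bytes */
    uint8_t  color[4];     /*  4 bytes */
    uint16_t uv[2];        /*  4 bytes */
} PackedVertex;

/* Same attributes, explicitly padded up to the next multiple of 16. */
typedef struct {
    float    position[3];
    uint8_t  color[4];
    uint16_t uv[2];
    uint8_t  pad[12];      /* unused bytes: 20 -> 32 */
} PaddedVertex;

/* Compile-time checks of the layout assumptions. */
static_assert(sizeof(PackedVertex) == 20, "expected a 20-byte packed vertex");
static_assert(sizeof(PaddedVertex) % 16 == 0, "stride must be a multiple of 16");
```

With a layout like this, `sizeof(PaddedVertex)` would be the stride passed to `glVertexAttribPointer` for each attribute, and the buffer grows by 12 bytes per vertex.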

dark_photon · December 3, 2015, 5:40pm

Thank you, Joe. That clarifies it. One short follow-up:

[blockquote]On some devices, padding each vertex to 16 byte boundaries may also improve performance.[/blockquote]

Could you specify which GPU series prefer this (Series 6? SGX?)? Or is this a function of the system that the PowerVR GPU is embedded in?

If the former, this would be great info to have in the Performance recommendations!

rgbvision2k11 · December 5, 2015, 4:50pm

I have one more question regarding this topic.
Suppose I have two shaders: one uses all the vertex attributes, whereas the other does not.
A good example is rendering a reflection texture (later used in the main rendering pass).
Which scenario is better for performance?
Should I create two separate vertex buffers, or can the same vertex buffer be shared across both passes?

JoeDavis · December 14, 2015, 2:26pm

Hi Dark_Photon,

[blockquote]Could you specify which GPU series prefer this (Series 6? SGX?)? Or is this a function of the system that the PowerVR GPU is embedded in?[/blockquote]
The recommendation is based on the way Rogue GPUs behave. I’ll look into adding SGX recommendations to a future version of the doc. Based on preliminary discussions with our competitive analysis team though, I believe a 16 byte alignment recommendation would also apply to SGX.

Hi Ganesh,

[blockquote]Do I create 2 separate vertex buffers, or can the same vertex buffer be shared across both the passes?[/blockquote]
Theoretically, creating a VBO for each render would result in the best performance as it will increase the speed at which the GPU can copy attribute data to USC registers. However, the benefit of implementing this depends very heavily on where your render is bottlenecked. You would have to implement both solutions and benchmark the performance of each to see if the performance gain is worth the added complexity of having to duplicate data into multiple VBOs. Unless you are heavily vertex processing limited, I suspect using a single interleaved VBO for both passes would be fast enough.
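
To make the two options concrete, here is a rough C sketch. The `Vertex` layout, the stride constants and the `copy_positions` helper are all hypothetical, and the reflection pass is assumed to need only positions. It shows the data duplication that a second VBO implies; it does not measure which option is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interleaved vertex used by the main pass (32 bytes). */
typedef struct {
    float position[3];
    float normal[3];
    float uv[2];
} Vertex;

static_assert(sizeof(Vertex) == 32, "expected a 32-byte interleaved vertex");

enum {
    /* Option A: the reflection pass reads positions from the shared VBO,
       stepping over the attributes it does not use. */
    SHARED_STRIDE = sizeof(Vertex),
    /* Option B: the reflection pass gets its own tightly packed VBO. */
    POSITION_ONLY_STRIDE = 3 * sizeof(float)
};

/* Option B requires duplicating the positions into a second buffer.
   dst must have room for 3 * count floats. */
static void copy_positions(const Vertex *src, size_t count, float *dst)
{
    for (size_t i = 0; i < count; ++i)
        memcpy(dst + 3 * i, src[i].position, sizeof src[i].position);
}
```

Option B stores 12 extra bytes per vertex in exchange for a position-only stream, which is the trade-off that would need benchmarking on the target device.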

dark_photon · December 14, 2015, 5:19pm

That’s what I needed. Thanks again!
