Triangle strips faster?

I’ve been doing some benchmarking on an SGX535 and I noticed that using triangle strips improves performance quite a bit. Is this true for newer PowerVR devices as well?

Hi Lefty,



This will depend on how you’re sending data down the pipeline. Are you using strips over fans or triangles? Is your data indexed or sent as arrays (drawElements or drawArrays)? And is your data sent as pointers or using VBOs?



The ideal situation for our hardware is indexed triangle lists stored in VBOs, as internally this is what our hardware uses. Any polygonal data sent to the hardware is converted into this format, so supplying it that way in the first place generally gives the best performance.



However, if you’re using client arrays rather than VBOs, you might see better performance from strips for larger objects, as a triangle strip holds less data than a triangle list for the same number of triangles. This should be entirely a CPU-side bottleneck rather than anything to do with our hardware: with client arrays the data has to be sent every frame, so the data copies add up quickly.
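To put rough numbers on that size difference, here is a minimal sketch that counts index data only. It assumes 16-bit indices and a mesh that can be covered by one continuous strip (the best case for strips); both are assumptions for illustration, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Indices needed to draw `triangles` triangles as an indexed triangle list.
constexpr std::size_t listIndexCount(std::size_t triangles) {
    return triangles * 3;
}

// Indices needed for the same triangles as ONE continuous strip.
// Best case only: real meshes usually need several strips or degenerates.
constexpr std::size_t stripIndexCount(std::size_t triangles) {
    return triangles == 0 ? 0 : triangles + 2;
}

// Bytes of index data, assuming 16-bit (GL_UNSIGNED_SHORT) indices.
constexpr std::size_t indexBytes(std::size_t indexCount) {
    return indexCount * sizeof(std::uint16_t);
}
```

For 10,000 triangles this gives 30,000 indices (60,000 bytes) for the list against 10,002 indices (20,004 bytes) for the strip, which is the amount that would be copied each frame with client arrays.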



Could you explain in what situation you’re actually seeing this improvement? Also, are you using VBOs or client arrays, and are you using indices or not?



Thanks,

Tobias

Thanks for the answer Tobias.

I’m using vbos for static geometry and client arrays for skinned meshes, all with drawElements. I’m comparing triangle groups with triangle strips and I see a 10% improvement when using triangle strips.

Considering what you said about client arrays, I tried to eliminate them from the equation.

So, now I only create triangle strips for the static meshes and compare that with rendering everything with triangle groups (all static meshes use interleaved data, vbos, and glDrawElements). I still measure triangle strips as being notably faster.

Hi Lefty,



The SGX hardware itself uses (effectively) indexed GL_TRIANGLES, so it has to force any other data into this configuration. However there are a number of factors at play before it’s as clear cut as “one is always better than the other”. In particular, vertex cache hits are an issue - are you using the PVRTGeometrySort algorithm? (It is available either through the GeoPOD exporter or as a tools function which you can call yourself.) Using this should optimise the way that the vertices and indices are arranged to maximise cache hits and save bandwidth, and get you closer to optimum performance.
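As an illustration of what forcing a strip into indexed GL_TRIANGLES involves, here is a minimal CPU-side sketch. It is not how the driver or hardware does it; the function name and the choice of 16-bit indices are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Expand a GL_TRIANGLE_STRIP index sequence into GL_TRIANGLES indices.
std::vector<std::uint16_t> stripToList(const std::vector<std::uint16_t>& strip) {
    std::vector<std::uint16_t> list;
    if (strip.size() < 3) return list;          // no complete triangle
    list.reserve((strip.size() - 2) * 3);
    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        std::uint16_t a = strip[i], b = strip[i + 1], c = strip[i + 2];
        // Skip degenerate triangles (typically used to stitch strips together).
        if (a == b || b == c || a == c) continue;
        // Every odd triangle in a strip has reversed winding; swap to restore it.
        if (i % 2 == 1) std::swap(a, b);
        list.push_back(a);
        list.push_back(b);
        list.push_back(c);
    }
    return list;
}
```

A four-index strip {0, 1, 2, 3} expands to the six list indices {0, 1, 2, 2, 1, 3}.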



I assume also that your vbos are not being updated repeatedly and are actually static? Also are you using the usage flag “GL_STATIC_DRAW” when creating them? Finally, other than the way you upload your geometry, is everything else equal (Same model, same render state, same number of draw calls etc.)?



Even if you are doing all of this, there may be times where strips might work better, but it is entirely dependent on your data set. In the general case, triangle lists are better for performance, but there will be geometry data which is hard to optimise for cache coherency, or which is too large or too small, and this may skew the results.



That said, I’d suggest that it’s generally not worth worrying about vertex performance too much: unless you’re doing something extremely complex, you’ll be texture or fragment limited well before vertex processing becomes an issue.



Thanks,

Tobias

Hi Lefty,

The SGX hardware itself uses (effectively) indexed GL_TRIANGLES, so it has to force any other data into this configuration. However there are a number of factors at play before it's as clear cut as "one is always better than the other". In particular, vertex cache hits are an issue - are you using the PVRTGeometrySort algorithm?

No, but I tried doing cache optimisation using other tools (NvTriStrip and vcacheopt). Cache optimisation gives a tiny improvement, but nowhere near as much as using triangle strips.
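One way to compare index orderings offline, before benchmarking on the device, is to count vertex cache misses per triangle. The sketch below uses a simple FIFO cache model; the model, the cache size you pass in, and the function name are all assumptions for illustration and do not describe the actual SGX vertex cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Average cache miss ratio: vertex cache misses per triangle for an
// indexed triangle list, run through a simple FIFO cache model.
double averageCacheMissRatio(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize) {
    assert(indices.size() % 3 == 0 && cacheSize > 0);
    if (indices.empty()) return 0.0;
    std::deque<std::uint32_t> cache;
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        // Hit: the vertex is already cached, nothing to do.
        if (std::find(cache.begin(), cache.end(), idx) != cache.end()) continue;
        // Miss: insert the vertex and evict the oldest entry if full.
        ++misses;
        cache.push_back(idx);
        if (cache.size() > cacheSize) cache.pop_front();
    }
    return static_cast<double>(misses) / static_cast<double>(indices.size() / 3);
}
```

Running this on the same mesh before and after an optimiser gives a hardware-independent number to compare; a lower ratio means fewer vertices are fetched and transformed more than once under this model.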

I assume also that your vbos are not being updated repeatedly and are actually static? Also are you using the usage flag "GL_STATIC_DRAW" when creating them? Finally, other than the way you upload your geometry, is everything else equal (Same model, same render state, same number of draw calls etc.)?

Yes, vbos are static and I use GL_STATIC_DRAW.
Yes, the test case I use is identical, same draw calls, same everything.


That said, I'd suggest that it's generally not worth worrying about vertex performance too much: unless you're doing something extremely complex, you'll be texture or fragment limited well before vertex processing becomes an issue.

It's possible that vertex performance is more important in my case, because I use shadow mapping (i.e. everything is drawn twice). Anyway, a 10% performance boost is not to be scoffed at (even if it's not meant to happen :-D ).

Well, whatever works for you I guess :)



I’d strongly recommend trying PVRTGeometrySort though. It’s specifically targeted at our hardware, and works differently from NvTriStrip and vcacheopt. You might see performance improve by more than 10% after all (though I can’t exactly guarantee this!).



Thanks,

Tobias

I had a look at PVRTGeometrySort. It’s not clear from the documentation how to use it. What are the parameters nBufferVtxLimit and nBufferTriLimit for? Are there any examples of how to use it?

That is very interesting. I once asked if the team could provide a full sample of creating a POD from scratch…

It seems that this function is only used by the tooling that creates PODs (the command line tool or the GUI). Apart from the reference in the documentation, "PowerVR SGX (Series 5) or Rogue (Series 6) hardware is being targeted use ‘PVRTGeometrySort’", it looks like the simplest option is to use the SDK tools to create the POD :)

Unfortunately, I do not use pod.

Hi Lefty,



Yep ok, so that is not a well documented function apparently… something we’ll try to address in the next version of our SDK. Anyway, here’s the way it should be used.



“pVtxData” should be a pointer to the vertex data of your triangle list.

“pwIdx” should be a pointer to the indices that you pass to glDrawElements. The indices should be unsigned ints, though if you need to you can change the definition of “PVRTGEOMETRY_IDX” at the top of the header file, and you should then be able to use shorts or bytes etc.

“nStride” is the same value that you would pass to glVertexAttribPointer()

“nVertNum” should be the total number of vertices stored in the array of vertex data

"nTriNum" should be the total number of triangles, i.e. the number of indices in “pwIdx” divided by three. With indexed data this is independent of the number of vertices, so it has to be supplied separately.

“nBufferVtxLimit” is the maximum number of vertices that can be stored in a buffer; we typically use 32. This basically sets the optimisation window, so the higher the value you set, the longer it will take to run, but potentially you’ll get a better output.

“nBufferTriLimit” is the same as the above but for the number of triangles. We typically set this to 64.

“dwFlags” is a combination of flags that affect the behaviour of the function. In PVRGeoPOD we set both available flags: PVRTGEOMETRY_SORT_VERTEXCACHE | PVRTGEOMETRY_SORT_IGNOREVERTS
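The parameter descriptions above can be sketched as a small helper that derives the values from an interleaved mesh. The `Vertex` layout, the `GeometrySortArgs` struct and `makeGeometrySortArgs` are hypothetical names invented for this example; the argument order in the commented-out call is assumed to follow the order in which the parameters are listed above, so check it against the SDK header before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interleaved vertex layout (not from the SDK).
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Hypothetical holder for the values described in the parameter list.
struct GeometrySortArgs {
    void*         pVtxData;
    unsigned int* pwIdx;
    int nStride;
    int nVertNum;
    int nTriNum;
    int nBufferVtxLimit;
    int nBufferTriLimit;
};

GeometrySortArgs makeGeometrySortArgs(std::vector<Vertex>& vertices,
                                      std::vector<unsigned int>& indices) {
    assert(indices.size() % 3 == 0);  // input must be a triangle list
    GeometrySortArgs args;
    args.pVtxData = vertices.data();
    args.pwIdx = indices.data();
    args.nStride = static_cast<int>(sizeof(Vertex));      // bytes per vertex
    args.nVertNum = static_cast<int>(vertices.size());
    args.nTriNum = static_cast<int>(indices.size() / 3);  // three indices each
    args.nBufferVtxLimit = 32;  // typical value quoted above
    args.nBufferTriLimit = 64;  // typical value quoted above
    return args;
}

// With the SDK header included, the call would then look roughly like:
//   PVRTGeometrySort(a.pVtxData, a.pwIdx, a.nStride, a.nVertNum, a.nTriNum,
//                    a.nBufferVtxLimit, a.nBufferTriLimit,
//                    PVRTGEOMETRY_SORT_VERTEXCACHE | PVRTGEOMETRY_SORT_IGNOREVERTS);
```

The sort would normally run once in an offline asset tool, with the reordered vertices and indices then uploaded to the VBOs as usual.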



Hope that helps.



It’s worth noting that there is a similar function for generating optimised triangle strips from triangle lists, so even if you don’t get a massive performance boost from PVRTGeometrySort, it might be worth using PVRTTriStrip to generate your triangle strip afterwards to see what sort of performance that gives you as well.



Thanks,

Tobias

Thanks. I got PVRTGeometrySort working. Unfortunately, it only increases performance by a tiny amount.

I didn’t try PVRTTriStrip, because I couldn’t figure out the parameters. It seems to be returning several triangle strips - not what I need.
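If a tool hands back several strips and a single strip is wanted, a common technique is to stitch them together with degenerate (zero-area) triangles. The sketch below is a generic version of that idea; it does not reflect the actual output format of PVRTTriStrip, and the function name and the vector-of-vectors input are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Concatenate several triangle strips into one strip, linking them with
// degenerate triangles so that a single draw call can render them all.
std::vector<std::uint16_t> joinStrips(
        const std::vector<std::vector<std::uint16_t>>& strips) {
    std::vector<std::uint16_t> out;
    for (const auto& s : strips) {
        if (s.size() < 3) continue;        // not a drawable strip
        if (!out.empty()) {
            out.push_back(out.back());     // repeat last index of previous strip
            out.push_back(s.front());      // repeat first index of next strip
            // Preserve winding: the next strip must start at an even offset.
            if (out.size() % 2 == 1) out.push_back(s.front());
        }
        out.insert(out.end(), s.begin(), s.end());
    }
    return out;
}
```

Joining {0, 1, 2, 3} and {4, 5, 6} yields {0, 1, 2, 3, 3, 4, 4, 5, 6}. The extra indices produce only zero-area triangles, though they still add to the index data that has to be processed.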

Hello Lefty ,



Will you share this very interesting usage with us, or is it classified? :)



cheers



david


Will you share this very interesting usage with us, or is it classified? :)

I thought I was sharing. What is it you want to know?

Hello



Could you describe the integration you did to be able to use PVRTGeometrySort?

Will you share the project?

Are your meshes stored in FBX, OBJ or a custom format?

I think this is very interesting because it isn’t described anywhere in the SDK, and it seems very useful, as I am thinking of removing my dependency on POD, which doesn’t give me enough information for complex joins.



david

I am using my own game engine for a commercial game. I have an asset preparation tool that converts COLLADA to a proprietary binary format. That is where I do the vertex optimisation. I don't mind sharing that code, but it's not meant for general use: I only implement features for my own needs (i.e. it only works for COLLADA exported from 3ds Max and will only convert certain types of geometry).

I understand. I believe the next SDK will handle this kind of case, which is very interesting.

Thanks a lot.