VBO Ghosting?

This ImgTech post:

  • Why GPUs don’t like to share

    seems to imply that streaming updates to a VBO will result in ghosting. Is this correct? Is this still current on Series 6 / GLES3+ drivers with MAP_UNSYNCHRONIZED and MAP_INVALIDATE?

    If so, this seems very strange, because this path (streaming a VBO with UNSYNCHRONIZED+INVALIDATE_RANGE, and with INVALIDATE_BUFFER for orphaning) can yield very high performance when streaming dynamically loaded batches to the GPU and rendering them, with no buffer ghosting per update.

    Thanks for any tips!

Should have appended to the last paragraph “…on desktop GPUs.”

Apologies: I misspoke above (skimmed the article too fast). It says the application's GL thread will block, not that it will ghost the buffer.

But I still have the same question: Is this still current on Series 6 / GLES3+ drivers with MAP_UNSYNC / MAP_INVALIDATE, which are supposed to allow App/CPU updates to a buffer without synchronizing with the GPU?

Also, one related question:

‘Dynamic attributes that change on a per-frame basis should be uploaded directly to GL instead of modifying VBOs.’

Just to make sure, this is suggesting that client arrays be used for dynamic batch data which may change on a per-frame basis?


Hello Dark_Photon,

This behaviour can be specific to the SoC used. Commonly, MAP_UNSYNCHRONIZED on a VBO will give you direct access to the buffer the GPU will read. As the PowerVR architecture is deferred, it can be difficult to determine when, and from which part of the buffer, the GPU is reading. You could determine this at frame granularity by using FenceSyncs.

‘Dynamic attributes that change on a per-frame basis should be uploaded directly to GL instead of modifying VBOs.’

This advice is for OpenGL ES 2.0, where glMapBufferRange was not standard. For a small upload this method would still be advised, as it is simple to use. If the buffer is larger, then we recommend using MapBufferRange with multiple VBOs to allow the hardware to render ahead.

Please let me know if you have any more questions.


Hey, thanks Paul!

Re MAP_UNSYNCHRONIZED, that’s great! And you’d get access to the mapped buffer with no draw thread blocking and TA kicks?

If so, I think that provides the tools needed to upload dynamically-loaded frame-to-frame reused batch data without a lot of needless VBO creation/destruction (and a lot of VBO rebind overhead when launching batches). Basically, alloc one big VBO (big enough to handle many frames of data), stream the data in front-to-back with reuse, and then when it fills up, orphan it and repeat (see Re: Optimal Streaming Strategy and Buffer Object Streaming).
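
A minimal sketch of the offset bookkeeping for that scheme, in C, assuming a single buffer of fixed capacity. `StreamBuffer`, `StreamSlot`, and `stream_alloc` are hypothetical names, not part of any GL API. The GL calls appear only in comments, and how efficiently a given driver handles the orphan step is implementation-specific.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one large streaming VBO. */
typedef struct {
    size_t capacity;  /* total size of the buffer object, in bytes */
    size_t head;      /* next free byte offset */
} StreamBuffer;

typedef struct {
    size_t offset;    /* byte offset at which to map and write the batch */
    int    orphan;    /* nonzero: buffer was full, orphan it before writing */
} StreamSlot;

/* Round v up to a multiple of a (a must be a nonzero power of two). */
static size_t align_up(size_t v, size_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Reserve `size` bytes for the next batch, filling front to back. */
static StreamSlot stream_alloc(StreamBuffer *sb, size_t size, size_t align)
{
    StreamSlot slot;
    size_t start;

    assert(size <= sb->capacity);
    assert(align != 0 && (align & (align - 1)) == 0);

    start = align_up(sb->head, align);
    if (start + size > sb->capacity) {
        /* Buffer is full: restart at the front and request an orphan.
         * With GL this would be a map with
         * GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT. */
        start = 0;
        slot.orphan = 1;
    } else {
        /* Space remains: append without synchronizing. With GL this
         * would be a map with GL_MAP_WRITE_BIT |
         * GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_INVALIDATE_RANGE_BIT. */
        slot.orphan = 0;
    }
    slot.offset = start;
    sb->head = start + size;
    return slot;
}
```

The caller would pass `slot.offset` and the batch size to glMapBufferRange, copy the vertex data, unmap, and then issue the draw call with the same offset.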

One related question is: How’s PowerVR’s support for buffer orphaning (sometimes called buffer respecification; see the 2nd link above)?

Good orphaning support is a nice-to-have, but it isn’t a make-or-break because (as you said) I could use multiple buffers and get into the explicit fence/wait sync business. Orphaning is useful though because it avoids the need for the client to synchronize with the server/GPU (and thus the need for the client to perform explicit synchronization).

Have a great weekend!

While we’re on the subject, what’d be especially cool (and useful) is if ImgTech were the first mobile GPU vendor to support mapping buffers PERSISTENT and COHERENT (see ARB_buffer_storage, core in OpenGL 4.4 since July 2013). This seems a perfect fit for mobile where (in the case of VBOs mapped UNSYNCHRONIZED) the application and GPU are literally writing into and reading from the very same memory block!

When there isn’t an OpenGL driver thread, the main thing it gets you is that there’s no need for the application to flush (unmap/map or explicit flush) to make the streamed data visible to the GPU between:

  • streaming data into a buffer object (via UNSYNCHRONIZED) and
  • issuing a draw call referring to that streamed data.

For more information, see:

  • Buffer_Object_Streaming#Persistent_mapped_streaming (OpenGL Wiki)
  • Persistent mapped buffers (Sole)
  • Persistent Mapped Buffers in OpenGL (Filipek)
  • Beyond Porting (nVidia)

Hello Dark_Photon,

As you have said, on most SoCs we are sharing the same memory. In these cases I believe we do not need to flush buffers mapped UNSYNCHRONIZED, only unmap. Memory writes from the CPU should be visible to the GPU by the next render. Please be advised this can be SoC specific again. If you are targeting only one platform then you can test and rely on the behaviour.

We do not recommend buffer re-specification currently as you are relying on the internal ghosting or stalling behaviour of the driver. Instead, we advise using FenceSync to ensure the GPU has finished a particular task before making changes to that part of the VBO.

I believe explicit multi-buffering would be the simplest way to use this.
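
As a rough sketch of that multi-buffering bookkeeping in C: the index returned below could select either one of several separate VBOs or one of several equal regions of a single VBO. `BufferRing`, `ring_begin_frame`, `ring_end_frame`, and a depth of three are assumptions for illustration, and the `fence_pending` flag stands in for a real `GLsync` handle. The GL calls are named only in comments.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFFERS 3  /* assumed depth; tune per platform */

/* Hypothetical per-application state for explicit multi-buffering. */
typedef struct {
    unsigned frame;                       /* frames submitted so far */
    int      fence_pending[NUM_BUFFERS];  /* 1 if a fence guards this slot */
} BufferRing;

/* Pick the slot to write this frame. *must_wait is set to 1 when the
 * slot was used NUM_BUFFERS frames ago and its fence has to be waited
 * on first. With GL that would be glClientWaitSync(fence,
 * GL_SYNC_FLUSH_COMMANDS_BIT, timeout) followed by glDeleteSync(fence). */
static unsigned ring_begin_frame(BufferRing *ring, int *must_wait)
{
    unsigned i = ring->frame % NUM_BUFFERS;

    assert(must_wait != NULL);
    *must_wait = ring->fence_pending[i];
    ring->fence_pending[i] = 0;  /* fence is consumed by the wait */
    return i;
}

/* Call after the frame's draw calls that read the slot are issued.
 * With GL this is where glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)
 * would create the fence stored for the slot. */
static void ring_end_frame(BufferRing *ring)
{
    ring->fence_pending[ring->frame % NUM_BUFFERS] = 1;
    ring->frame++;
}
```

With a depth of three, the CPU only waits when it gets a full three frames ahead of the GPU, which is what lets the hardware render ahead.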


[edited to correct that we still need to unmap the buffer]

Ok, thanks Paul! I think this is getting me close to where I can see implementing a dynamic upload scheme for PowerVR that doesn’t result in a lot of draw thread blocks and kicks, a lot of needless VBO binds, or a bunch of needless memcpys (to avoid the binds).