Here’s an overview of the driver’s standard texture upload process (without PBO or glTexStorage*):
Application –> driver texture copy
- The modifying function call (e.g. glTexSubImage2D) blocks while the driver does the following
- Validation is performed for the supplied parameters. If the combination is invalid, a GL error is generated (check out the glTexSubImage2D reference page for more information)
- If the parameters are valid, the driver allocates a temporary buffer of the requested resolution and format
- The driver memcpy’s the application-supplied buffer to the temporary buffer on the CPU
- The driver then returns from the function so the application can continue submitting GL calls
Graphics driver –> GPU texture copy
- For most texture formats, the GPU will perform the copy via DMA. In PVRTune these tasks are shown as Transfer Tasks. In some cases a CPU copy will be used as a fallback
- The data is twiddled during the transfer. This improves texture cache efficiency when the data is sampled by draws
- Once the copy is complete, the driver can free the temporary buffer
Note: The process explained above is simplified. To reduce the number of memory allocations and frees performed, the driver may keep temporary buffers around for some time in case they can be reused.
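To give a rough idea of what "twiddled" means in the transfer step above, here's a minimal sketch in C. It assumes a plain Z-order (Morton) layout on a square, power-of-two texture of 32-bit texels. The exact hardware layout isn't described in this post, and the function names (spread_bits16, morton_index, twiddle_rgba8) are hypothetical, so treat it as an illustration rather than what the driver or GPU actually does:

```c
#include <assert.h>
#include <stdint.h>

/* Spread the low 16 bits of v so a zero bit sits between each original bit. */
uint32_t spread_bits16(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Texel index in the ASSUMED Z-order layout:
 * x bits go to even bit positions, y bits to odd bit positions. */
uint32_t morton_index(uint32_t x, uint32_t y)
{
    return spread_bits16(x) | (spread_bits16(y) << 1);
}

/* Copy a size x size image of 32-bit texels from linear to Z-order.
 * size must be a power of two (hypothetical restriction for this sketch). */
void twiddle_rgba8(const uint32_t *linear, uint32_t *twiddled, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    for (uint32_t y = 0; y < size; ++y)
        for (uint32_t x = 0; x < size; ++x)
            twiddled[morton_index(x, y)] = linear[y * size + x];
}
```

With a layout like this, texels that are neighbours in both x and y end up close together in memory, which is why it tends to help the texture cache when draws sample the data.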
With glTexStorage2D and glTexStorage3D, the driver can perform validation of all levels of a texture up front. Unlike standard (mutable) textures, where the GPU DMA copy is deferred until the first draw using the texture, immutable-format textures also kick off asynchronous GPU texture uploads when glTexSubImage2D() is called. This means that, with immutable-format textures, you don’t have to perform texture warm-ups to avoid run-time stutters.
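Because glTexStorage2D takes the number of mip levels up front, the application needs to know that count before allocating. Below is a small sketch in C of one way to compute the level count for a full mip chain. The helper name full_mip_levels is hypothetical, and the GL calls in the comment only show where the value would go, assuming an RGBA8 2D texture:

```c
#include <assert.h>

/* Number of levels in a full mip chain: floor(log2(max(w, h))) + 1. */
unsigned full_mip_levels(unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);
    unsigned largest = width > height ? width : height;
    unsigned levels = 0;
    while (largest > 0) {
        ++levels;
        largest >>= 1;
    }
    return levels;
}

/* Hypothetical usage (needs a GL ES 3.0 context, not compiled here):
 *
 *   unsigned levels = full_mip_levels(w, h);
 *   glTexStorage2D(GL_TEXTURE_2D, levels, GL_RGBA8, w, h);
 *   // then one glTexSubImage2D() call per level to supply the data
 */
```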
[blockquote]Does it mean that GPU will be accessing that mapped memory directly while rendering? And GPU will not need to load this into driver’s memory?[/blockquote]
The GPU’s texture is stored in a twiddled layout to improve texture cache efficiency. PBOs require the data to be stored linearly in the driver’s buffer. If the glMapBufferRange() GL_MAP_INVALIDATE_RANGE_BIT or GL_MAP_INVALIDATE_BUFFER_BIT flags are used, the driver will allocate a PBO buffer without preserving the contents of the GPU side texture. If neither flag is specified, the GPU will DMA the contents of its texture to the PBO.
[blockquote] 3. […] how can PBO be faster? Is it because of that memcpy handled by CPU instead of GPU?[/blockquote]
In the case of uploading texture data from a file, PBO uploads are only faster than standard glTexSubImage2D() uploads if the data is loaded directly from the file into the PBO. If the file is loaded into application memory first and then memcpy’d to the PBO buffer, the PBO path performs as many memcpy’s as a glTexSubImage2D upload.
Standard texture upload
File–>Application buffer–>Driver buffer (e.g. copy performed by glTexSubImage2D)
Optimized PBO texture upload
File–>PBO driver buffer
(As mentioned above, there will be an asynchronous DMA from the driver’s temporary buffer to GPU memory sometime after the application’s buffer update completes)
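The optimized path above could be sketched in C roughly as follows. The idea is that the file is read straight into the pointer returned by glMapBufferRange(), with no intermediate application buffer. The helper name read_into_mapped is hypothetical, and the GL calls are only shown in a comment, since they need a live GL ES 3.0 context and assume an RGBA8 texture and a PBO handle called pbo:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read up to `capacity` bytes from `file` straight into `dst`.
 * Returns the number of bytes actually read. */
size_t read_into_mapped(FILE *file, void *dst, size_t capacity)
{
    assert(file != NULL && dst != NULL);
    unsigned char *out = dst;
    size_t total = 0;
    while (total < capacity) {
        size_t n = fread(out + total, 1, capacity - total, file);
        if (n == 0)
            break; /* end of file or read error */
        total += n;
    }
    return total;
}

/* Hypothetical usage (not compiled here):
 *
 *   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
 *   glBufferData(GL_PIXEL_UNPACK_BUFFER, size, NULL, GL_STREAM_DRAW);
 *   void *dst = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, size,
 *                   GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
 *   read_into_mapped(file, dst, size);     // File -> PBO driver buffer
 *   glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
 *   // Last argument is a byte offset into the bound PBO, not a pointer.
 *   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
 *                   GL_RGBA, GL_UNSIGNED_BYTE, 0);
 *   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
 */
```

Passing GL_MAP_INVALIDATE_BUFFER_BIT here matches the case described earlier where the driver doesn't need to preserve the previous contents.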
[blockquote]it’s possible I made a mistake in my test application[/blockquote]
Turns out I had made a mistake in my test, which I’ve now fixed. You can find my modified version of the SDK’s Texturing demo (PBO upload + TexStorage) on GitHub: https://github.com/JoeDavisIMG/Native_SDK/commits/pbo-texture-upload
Hope this helps