glProgramBinary on PowerVR GE8320 always executes GLSLCompileToUniflex and is slow

We use Unreal Engine to develop mobile games. We found that calling the glProgramBinary interface on PowerVR devices can be time-consuming, because it executes GLSLCompileToUniflex.
On another device the same call executes PVRUniflexReadHWBinary instead, which is faster than GLSLCompileToUniflex.

We want to know why this is happening and how we should solve it. Thanks, this is very important for our game!

Hi zouqingwei,

Thanks for your message, and welcome to the PowerVR Developer Forum!

There are great differences between GLSLCompileToUniflex and PVRUniflexReadHWBinary:

  • GLSLCompileToUniflex: Performs a full compilation of a shader, including parsing it, generating LLVM, optimizations, and many other stages.
  • PVRUniflexReadHWBinary: Reads a Uniflex binary, skipping many of the previous stages.

Compiling shaders is an unavoidable step for builds intended for platforms where the game could run on many different GPUs. Usually this is done on first launch, when the engine takes some time to compile the shaders for the specific GPU it is running on.

Unreal Engine provides Pipeline State Object (PSO) caching for modern APIs (Vulkan, Metal, D3D12), which pre-caches all the PSOs that are immediately needed: https://dev.epicgames.com/documentation/en-us/unreal-engine/optimizing-rendering-with-pso-caches-in-unreal-engine Take into account the steps required for the PSO system to work properly (play the game, log what is actually drawn, include this information in the build). For OpenGL ES, UE uses an equivalent mechanism called program objects: https://dev.epicgames.com/documentation/en-us/unreal-engine/manually-creating-bundled-pso-caches-in-unreal-engine

Please make sure you have the correct settings to enable PSO precaching in the .ini file: https://dev.epicgames.com/documentation/en-us/unreal-engine/pso-precaching-for-unreal-engine You could also consider profiling and reviewing detailed tracking information to understand whether something is affecting PSO performance (perhaps you have a low pool memory limit, or many different scene materials?).

I hope this is enough help. If you have Unreal Engine specific questions you can ask in their Developer Forum.

Best regards,
Alejandro

Hi Alejandro!
Thank you for your reply. I have confirmed our process. Our project already uses the UE PSO cache. We are using UE 4.26.
On the first run, UE compiles the shaders and uses glGetProgramBinary to save them to a binary file.
On later launches, UE loads the programs from that binary file with glProgramBinary.
We have many shader variants. For some variants GLSLCompileToUniflex is very fast, and for others it is very slow.


Is this related to the complexity of our shader or some specific features? Do you have any good suggestions if we want to speed up the compilation process?

Hi zouqingwei,

Thanks for your message.

To better understand the issue you are experiencing, could you please provide the PowerVR device information specified below:

Also, please provide the source code of one shader which is taking a long time so we can do some testing.
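
If it is not yet clear which variants are the slow ones, one possible way to find them is to time each program load and list those above a threshold. The following is only a minimal sketch under assumptions: `TimeLoadMs` and `FindSlowVariants` are hypothetical helper names, not part of Unreal Engine or the PowerVR SDK, and the callable passed in would have to wrap your own glProgramBinary call.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: runs `load` once and returns the elapsed wall-clock
// time in milliseconds. In a real renderer `load` would call glProgramBinary
// and then glGetProgramiv(program, GL_LINK_STATUS, ...), so that any work the
// driver defers is included in the measurement (assumption: the driver may
// defer it).
inline double TimeLoadMs(const std::function<void()>& load) {
    const auto start = std::chrono::steady_clock::now();
    load();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Returns the names of all variants whose measured load time is strictly
// greater than thresholdMs, in their original order.
inline std::vector<std::string> FindSlowVariants(
    const std::vector<std::pair<std::string, double>>& timings,
    double thresholdMs) {
    std::vector<std::string> slow;
    for (const auto& entry : timings) {
        if (entry.second > thresholdMs) {
            slow.push_back(entry.first);
        }
    }
    return slow;
}
```

The threshold is a judgment call; one frame time (for example 16 ms) is a possible starting point, but any value that separates the fast variants from the slow ones will do.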

Best regards,
Alejandro

Hi Alejandro,
Thanks for your message.

1. Device brand: Infinix HOT 30i. GPU: PowerVR GE8320
2. Android version: 12. Android SDK: 31
3. GPU driver version: OpenGL ES 3.2 build 1.13@5776728

M_BoomSH_Medium.zip (5.4 KB)
This attachment contains the vertex shader and the fragment shader.
Please help us test it.

Best regards,
Zouqingwei

Hi zouqingwei,

I used your shaders in a modified version of one example from our PowerVR SDK using binary shaders (OpenGLESBinaryShaders). This SDK sample uses glProgramBinary as well to load a shader binary previously compiled and saved to disk.
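
For reference, a binary load path with a fallback to source compilation can be sketched roughly as follows. This is a minimal sketch under assumptions, not the code of the SDK sample or of Unreal Engine: `ProgramBackend`, `LoadProgram` and `LoadResult` are hypothetical names, and the GL calls are replaced by callbacks so that the sketch runs without a GL context. The comments note which OpenGL ES calls each callback would wrap.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the driver calls (not a real API).
struct ProgramBackend {
    // Real code: glProgramBinary(program, format, data, size), then
    // glGetProgramiv(program, GL_LINK_STATUS, &ok); return ok == GL_TRUE.
    std::function<bool(std::uint32_t, const std::vector<std::uint8_t>&)> loadBinary;
    // Real code: glCompileShader + glAttachShader + glLinkProgram on the GLSL
    // source, returning the resulting link status.
    std::function<bool()> compileFromSource;
    // Real code: glGetProgramBinary(...) and write the new blob to disk.
    std::function<void()> saveBinary;
};

enum class LoadResult { FromBinary, FromSource, Failed };

// Tries the cached binary first. If there is no blob or the driver rejects
// it, compiles from source and refreshes the cached binary.
inline LoadResult LoadProgram(const ProgramBackend& gl, std::uint32_t format,
                              const std::vector<std::uint8_t>& blob) {
    if (!blob.empty() && gl.loadBinary(format, blob)) {
        return LoadResult::FromBinary;
    }
    if (gl.compileFromSource()) {
        gl.saveBinary();
        return LoadResult::FromSource;
    }
    return LoadResult::Failed;
}
```

The link status is checked before the binary is trusted because a driver is allowed to reject a binary it no longer accepts, for example after a driver update.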

I used our GPU profiler PVRTune to record an execution of this example on a Motorola G8 Power Lite (same GPU as the Infinix HOT 30i), with GPU driver version 1.11@5425693. I could verify that small OpenGL ES Compile Shader tasks are executed by the GPU when glProgramBinary is used, and that they take between 0.005 ms and 0.84 ms.

The general advice would be to lower the complexity of both shaders as much as possible, as they were above 300 lines. The PowerVR GE8320 GPU used is quite low end, and a simpler version of that shader will not only work better with glProgramBinary but also perform better at runtime.

Best regards,
Alejandro