Hi – we have a PC/console engine that we've ported to mobile (currently targeting iOS devices, i.e. PowerVR 5-series SGX GPUs), and we've run into some OpenGL ES driver behavior that we can't explain or trace to a root cause. I know you may not be able to give specific advice about iOS, but any help would be deeply appreciated!
The problem: when we enable a certain post-processing pass (bloom), CPU usage spikes and GPU utilization drops, which looks like some operation is stalling the CPU on the GPU. The symptom shows up both in Xcode's handy CPU/GPU time and utilization gauges and in the “OpenGL ES Driver Monitor”. With the pass enabled, the Driver Monitor's “hardware wait time” jumps from 0 to a large value (tens of millions; the docs don't give the units). In Xcode, CPU time rises by about 10-20 ms (roughly our whole frame time), GPU time rises by 3-4 ms, and GPU utilization drops noticeably from its usual 99%.
The passes in question are simple: each one renders into an FBO with a texture attachment, and the next pass samples the resulting texture. The first pass samples the texture the scene was rendered into. Each pass is half the size of the previous one, and no FBOs are reused anywhere in post-processing. Some example shaders are below.
We're not doing anything that has historically made the CPU wait on the GPU, such as glReadPixels, queries, mapping a buffer without the UNSYNCHRONIZED flag, or modifying resources still in use by the GPU. The behavior doesn't change when we alter shader complexity or the amount of geometry drawn. We've also tried fewer passes: the extra time goes down, but the stall is still there (the CPU still seems to wait for the GPU to finish the main rendering passes before it starts post-processing).
We'd really appreciate it if someone could look over the details below and tell us whether anything here would cause the driver to wait on the GPU, or offer any advice about what might be going wrong. I'm happy to fill in any missing details about how our frame is rendered if that would help.
Thanks much!
Example data:
Here's a GL trace of two typical blits in the post-processing chain:
```
#1115 glPushGroupMarkerEXT(0, "ImageBloomBlurMobile")
#1116 glBindFramebuffer(GL_FRAMEBUFFER, 7)
#1117 glViewport(0, 0, 800, 608)
#1118 glDisable(GL_SCISSOR_TEST)
#1119 glBindBuffer(GL_ARRAY_BUFFER, 22)
#1120 glVertexAttribPointer(0, 2, GL_FLOAT, 0, 8, nullptr)
#1121 glClearColor(1.0000000, 1.0000000, 0.0000000, 1.0000000)
#1122 glClear(GL_COLOR_BUFFER_BIT)
#1123 glUseProgram(125)
#1124 glUniform4fv(vs_uniforms_vec4[0], 5, <data>)
#1125 glActiveTexture(GL_TEXTURE0)
#1126 glBindTexture(GL_TEXTURE_2D, 17)
#1127 glDrawArrays(GL_TRIANGLES, 0, 3)
#1128 glPopGroupMarkerEXT()
#1129 glPushGroupMarkerEXT(0, "ImageBloomBlurWideMobile")
#1130 glBindFramebuffer(GL_FRAMEBUFFER, 9)
#1131 glViewport(0, 0, 416, 320)
#1132 glDisable(GL_SCISSOR_TEST)
#1133 glBindBuffer(GL_ARRAY_BUFFER, 22)
#1134 glVertexAttribPointer(0, 2, GL_FLOAT, 0, 8, nullptr)
#1135 glClearColor(1.0000000, 1.0000000, 0.0000000, 1.0000000)
#1136 glClear(GL_COLOR_BUFFER_BIT)
#1137 glUseProgram(142)
#1138 glUniform4fv(vs_uniforms_vec4[0], 5, <data>)
#1139 glActiveTexture(GL_TEXTURE0)
#1140 glBindTexture(GL_TEXTURE_2D, 18)
#1141 glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE)
#1142 glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE)
#1143 glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
#1144 glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
#1145 glDrawArrays(GL_TRIANGLES, 0, 3)
#1146 glPopGroupMarkerEXT()
```

And here are some example fragment shaders. The vertex shaders are what you'd expect. (Sorry they look odd; they're all generated automatically from our compiled HLSL shaders.)

```
#if GL_ES
precision lowp float;
#endif
// 0.166667 ~= 1/6: scale factor that averages the six taps below
const vec4 ps_c0 = vec4(0.166667, 0.0, 0.0, 0.0);
uniform sampler2D g_samSourceA;
varying vec4 v_texcoord0;
varying vec4 v_texcoord1;
varying vec4 v_texcoord2;
varying vec4 v_texcoord3;
varying vec4 v_texcoord4;
varying vec4 v_texcoord5;
void main()
{
    vec4 t0_ps;
    vec4 t1_ps;
    // Accumulate six taps of the source texture
    t0_ps = texture2D(g_samSourceA, v_texcoord0.xy);
    t1_ps = texture2D(g_samSourceA, v_texcoord1.xy);
    t0_ps = t0_ps + t1_ps;
    t1_ps = texture2D(g_samSourceA, v_texcoord2.xy);
    t0_ps = t0_ps + t1_ps;
    t1_ps = texture2D(g_samSourceA, v_texcoord3.xy);
    t0_ps = t0_ps + t1_ps;
    t1_ps = texture2D(g_samSourceA, v_texcoord4.xy);
    t0_ps = t0_ps + t1_ps;
    t1_ps = texture2D(g_samSourceA, v_texcoord5.xy);
    t0_ps = t0_ps + t1_ps;
    // Output the average of the six taps
    gl_FragData[0] = t0_ps * ps_c0.xxxx;
}
```

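To make the generated code easier to review, here is a plain-C sketch of what the first shader computes per pixel, assuming the six taps have already been fetched from `g_samSourceA`. `Rgba` and `box6_average` are illustrative names, not engine code.

```c
/* Illustrative RGBA type for this CPU-side sketch (not engine code). */
typedef struct { float r, g, b, a; } Rgba;

/* Restates the first fragment shader: sum six texture taps, then scale by
   ps_c0.x (0.166667, roughly 1/6), which yields their average. */
static Rgba box6_average(const Rgba taps[6]) {
    const float scale = 0.166667f; /* ps_c0.x from the generated shader */
    Rgba sum = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 6; ++i) {
        sum.r += taps[i].r;
        sum.g += taps[i].g;
        sum.b += taps[i].b;
        sum.a += taps[i].a;
    }
    Rgba out = {sum.r * scale, sum.g * scale, sum.b * scale, sum.a * scale};
    return out;
}
```

One difference to keep in mind: the C version accumulates in 32-bit floats, whereas the shader accumulates in `lowp`, for which the GLSL ES 1.00 spec only guarantees a range of (-2, 2). Summing six taps before scaling could therefore exceed that guaranteed range on some implementations.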
```
#if GL_ES
precision lowp float;
#endif
uniform vec4 ps_uniforms_vec4[3];
const vec4 ps_c0 = vec4(0.0, 0.0, 0.0, 0.0);
#define g_fBloomCutoff ps_uniforms_vec4[0]
#define g_fBloomStrength ps_uniforms_vec4[1]
#define g_vBloomColor ps_uniforms_vec4[2]
uniform sampler2D g_samSourceA;
varying vec4 v_texcoord0;
void main()
{
    vec4 t0_ps;
    vec4 t1_ps;
    t0_ps = texture2D(g_samSourceA, v_texcoord0.xy);
    // Tint by the bloom color, then add the cutoff bias
    t0_ps.xyz = (t0_ps.xyz * g_vBloomColor.xyz) + g_fBloomCutoff.xxx;
    // Scale by bloom strength; alpha is always written as 0
    gl_FragData[0].xyz = t0_ps.xyz * g_fBloomStrength.xxx;
    gl_FragData[0].w = ps_c0.x;
}
```
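Likewise, a plain-C sketch of the second shader's per-pixel math, using the same components the shader reads (`.x` of the cutoff and strength uniforms, `.xyz` of the bloom color). `Rgba` and `bloom_tint` are illustrative names, not engine code.

```c
/* Illustrative RGBA type for this CPU-side sketch (not engine code). */
typedef struct { float r, g, b, a; } Rgba;

/* Restates the second fragment shader: tint the sample by the bloom color,
   add the cutoff bias, scale by strength, and write alpha as 0. */
static Rgba bloom_tint(Rgba src, float cutoff, float strength, Rgba bloom_color) {
    Rgba out;
    out.r = (src.r * bloom_color.r + cutoff) * strength;
    out.g = (src.g * bloom_color.g + cutoff) * strength;
    out.b = (src.b * bloom_color.b + cutoff) * strength;
    out.a = 0.0f; /* ps_c0.x in the shader */
    return out;
}
```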