Pixel Local Storage writes garbage into framebuffer

I’m trying to implement soft particles using the GL_EXT_shader_pixel_local_storage extension.
So far it works on a Samsung A21s (Mali GPU) but produces rather strange output on a Moto E6s. This device has a PowerVR Rogue GE8320 GPU and runs Android 9. The reported OpenGL ES version is OpenGL ES 3.2 build 1.10@5130912.

Here is my rendering pipeline:

  1. Render the Buddha statue and floor plane objects into PLS with depth and color writes enabled (you can see the statue’s silhouette on both screenshots). This shader stores the fragment depth value in PLS.
  2. Render soft particles reading depth from PLS.

Clear color is dark green (0, 0.2, 0, 1).
The only expected rendered objects are particles - I’ve commented out all code for rendering into framebuffer except particles.

On Mali GPU I get no color written into framebuffer (which is expected, PLS shaders which use GL_EXT_shader_pixel_local_storage extension cannot write color information).
On PowerVR GPU I have 2 issues:

  1. Some garbage red color output from PLS-writing shaders.
  2. Values written to PLS are not available in the PLS-reading shader. Even a dummy constant test value is not saved into the PLS structure.

Here are screenshots. As a new user I can upload only 1 image so I’ve combined them - left is Mali, right is PowerVR.
You can see an outline of statue on 1st image (this is correct because I enabled depth write for PLS pass), and some soft particles.
On the 2nd image there are only some red+black garbage color outputs and no particles (they are 100% transparent because of missing depth information from PLS pass).

Shaders code:

// depth render pass - vert

#version 300 es
uniform mat4 view_proj_matrix;
in vec4 rm_Vertex;

void main() {
    gl_Position = view_proj_matrix * rm_Vertex;
}

// depth render pass - frag

#version 300 es
#extension GL_EXT_shader_pixel_local_storage : require 
precision highp float;
__pixel_local_outEXT PixelLocalStorage 
{
    layout(r32f) highp float depth; 
    layout(r32f) highp float depth2; 
} storage;

void main() {
    storage.depth = gl_FragCoord.z;
    storage.depth2 = 1.0; // even this dummy value is not stored on PowerVR
}

////////////////////////////////////////////////////////////////////////

// particles read from PLS - vert

#version 300 es
uniform mat4 view_proj_matrix;
in vec4 rm_Vertex;
in vec2 rm_TexCoord0;
out vec2 vTextureCoord;

void main() {
  gl_Position = view_proj_matrix * rm_Vertex;
  vTextureCoord = rm_TexCoord0;
}

// particles read from PLS - frag

#version 300 es
#extension GL_EXT_shader_pixel_local_storage : require 
precision highp float;
__pixel_local_inEXT PixelLocalStorage 
{
    layout(r32f) highp float depth; 
    layout(r32f) highp float depth2; 
} storage;
uniform vec3 uCameraRange; // x = 2 * near; y = far + near; z = far - near
uniform float uTransitionSize;
float calc_depth(in float z)
{
  return uCameraRange.x / (uCameraRange.y - z * uCameraRange.z);
}
in vec2 vTextureCoord;
uniform sampler2D sTexture;
uniform vec4 color;
out vec4 fragColor;

void main() {
   vec4 diffuse = texture(sTexture, vTextureCoord) * color;
   float geometryZ = calc_depth(storage.depth);
   float sceneZ = calc_depth(gl_FragCoord.z);
   float a = clamp(geometryZ - sceneZ, 0.0, 1.0);
   float b = smoothstep(0.0, uTransitionSize, a);
   fragColor = diffuse * b;
//   fragColor *= 0.01; fragColor.g += storage.depth2; // this is a test line to visualize that `depth2` has some value
}
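
For reference, the fade computed by the particle fragment shader can be reproduced on the CPU. This is only a minimal sketch that mirrors the GLSL above; the names CameraRange, makeCameraRange, calcDepth and softParticleFade are hypothetical helpers, not part of any SDK, and uTransitionSize is assumed to be greater than zero. It shows why particles become fully transparent when the depth read from PLS is wrong: if storage.depth were to read back as 0.0 (an assumption, the actual value seen on the GE8320 is unknown), the clamped difference is 0 and so is the fade factor.

```cpp
#include <algorithm>
#include <cassert>

// Mirrors the uCameraRange uniform: x = 2 * near, y = far + near, z = far - near.
struct CameraRange {
    float x, y, z;
};

CameraRange makeCameraRange(float nearPlane, float farPlane) {
    assert(nearPlane > 0.0f && farPlane > nearPlane);
    return {2.0f * nearPlane, farPlane + nearPlane, farPlane - nearPlane};
}

// Same expression as calc_depth() in the fragment shader.
float calcDepth(const CameraRange& r, float z) {
    return r.x / (r.y - z * r.z);
}

float clamp01(float v) {
    return std::min(std::max(v, 0.0f), 1.0f);
}

// Equivalent of GLSL smoothstep(edge0, edge1, x) for edge0 < edge1.
float smoothStep(float edge0, float edge1, float x) {
    const float t = clamp01((x - edge0) / (edge1 - edge0));
    return t * t * (3.0f - 2.0f * t);
}

// storedDepth plays the role of storage.depth, fragDepth of gl_FragCoord.z.
// Returns the factor `b` that the shader multiplies into the particle color.
float softParticleFade(const CameraRange& r, float storedDepth,
                       float fragDepth, float transitionSize) {
    const float geometryZ = calcDepth(r, storedDepth);
    const float sceneZ = calcDepth(r, fragDepth);
    const float a = clamp01(geometryZ - sceneZ);
    return smoothStep(0.0f, transitionSize, a);
}
```

With near = 1 and far = 100, a stored depth of 0.0 gives a fade of 0 for a particle at depth 0.5, while a stored depth of 1.0 (nothing in front of the far plane) gives a fade of 1.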

Please advise.

Hi keaukraine,

Thanks for your question and welcome to the PowerVR Developer Forum!

We have an OpenGL ES PLS deferred shading sample in our SDK (Native_SDK/examples/OpenGLES/DeferredShading in the powervr-graphics/Native_SDK repository on GitHub). Perhaps you could take a look at it and use it to fix the issues on PVR devices.

Let us know if you have questions regarding that sample or you still have issues after reviewing it.

Best regards,
Alejandro

Hello Alejandro,

Thank you for pointing me to this example, I will take a look at it!

You’re welcome keaukraine,

Let us know if you need any help with it.

Best regards,
Alejandro

I compiled the DeferredShading example (there was some trouble with the broken latest version of the Android SDK), and it runs without issues. So this doesn’t seem to be a driver bug; something must be missing in my code. I will analyze it step by step in more detail.
Once again, thank you, Alejandro, for the quick and helpful response!

Hi keaukraine,

You’re welcome, thanks for looking into our DeferredShading example and testing it.

If you need anything else please write us, we will be glad to help you.

Best regards,
Alejandro

Hello Alejandro.

I looked into DeferredShading example and am somewhat confused now.

Apparently, on the PLS2-capable GE8320 GPU this example switches to the PLS2 rendering pipeline. And because the Mali-G52 doesn’t support the PLS2 extension, a fallback PLS1 pipeline is used on that hardware. Please note that my code also uses PLS1, since PLS1 is enough for my needs.
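
To illustrate the kind of path selection described here, below is a minimal sketch and not the SDK’s actual code. It assumes the space-separated extension list has already been obtained (for example from glGetString(GL_EXTENSIONS)); PlsPath, hasExtension and selectPlsPath are hypothetical names, and the allowPls2 flag stands in for my "suppress PLS2 detection" change. One detail worth noting: the PLS1 extension name is a prefix of the PLS2 name, so a plain substring search would report PLS1 whenever PLS2 is present. The sketch compares whole tokens instead.

```cpp
#include <cassert>
#include <sstream>
#include <string>

enum class PlsPath { None, Pls1, Pls2 };

// Exact-token match against a space-separated extension list.
// A substring search would wrongly match "..._storage" inside "..._storage2".
bool hasExtension(const std::string& extensions, const std::string& name) {
    assert(!name.empty());
    std::istringstream stream(extensions);
    std::string token;
    while (stream >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}

// Prefer PLS2 when it is advertised and allowed, otherwise fall back to PLS1.
// Assumption: the caller passes the driver's full extension string.
PlsPath selectPlsPath(const std::string& extensions, bool allowPls2) {
    const bool pls1 = hasExtension(extensions, "GL_EXT_shader_pixel_local_storage");
    const bool pls2 = hasExtension(extensions, "GL_EXT_shader_pixel_local_storage2");
    if (allowPls2 && pls2) {
        return PlsPath::Pls2;
    }
    if (pls1) {
        return PlsPath::Pls1;
    }
    return PlsPath::None;
}
```

Calling selectPlsPath(extensions, false) corresponds to forcing the PLS1-only pipeline on a device that advertises both extensions.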

I modified the example to enforce a PLS1-only rendering pipeline by disabling PLS2 in the shaders and suppressing detection of this extension in the C++ code. This way it renders correctly on the Mali GPU but results in visual glitches (with no GL errors detected) on PowerVR.

However, after I set the PLS size to any non-zero value with the extension call FramebufferPixelLocalStorageSizeEXT(), rendering is fixed. Please note that this call is required for PLS2 only and is not valid on PLS1-only hardware. What seems strange and wrong is that even an obviously incorrect size of 4 or 8 bytes (which is far too small, it must be 16) still fixes PLS1 rendering.
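
As a rough sanity check of the sizes mentioned above, here is a small sketch of how the required PLS size can be estimated. It assumes that every PLS member occupies 4 bytes (to my knowledge all layout formats allowed in a PLS block, such as r32f or rgba8, are 32-bit, but please treat this as an assumption) and that 16 bytes is the guaranteed minimum storage. The constants and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: each PLS member (r32f, rgba8, ...) takes 4 bytes.
const std::size_t kBytesPerPlsMember = 4;
// Minimum PLS size an implementation is expected to provide.
const std::size_t kMinGuaranteedPlsBytes = 16;

// Bytes needed for a PLS block with the given number of members.
std::size_t plsStorageBytes(std::size_t memberCount) {
    assert(memberCount > 0);
    return memberCount * kBytesPerPlsMember;
}

// True if the block fits into the guaranteed minimum storage.
bool fitsMinimumPls(std::size_t memberCount) {
    return plsStorageBytes(memberCount) <= kMinGuaranteedPlsBytes;
}
```

Under these assumptions a block with four members needs exactly 16 bytes, and the two-float block from my soft particles shader needs 8, so a size of 4 should not be enough for either of them.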

I believe that this breaks rendering pipelines implemented for PLS1 (for wider range of GPUs, since PLS2 is not as widely adopted as PLS1).

To make it easy for you to reproduce my issue, I’ve forked the repo with updated examples here: keaukraine/Native_SDK on GitHub.

As you can see, with my first commit 8cc2c0df1c96c1966bb0631a0a0126fcdb1c3e17 I effectively disabled all PLS2 rendering. This results in on-screen garbage, as seen in the screenshot.

My second commit 19df7247d6e426be935679bcf5ce5fda8047daf0 fixed PLS1 rendering on GE8320 hardware. I have no access to other PLS2-capable devices to test whether this behavior is the same on other GPUs. Please note that this commit sets the PLS size to an incorrect value of 4, which still somehow fixes the issue.

Could you please take a look at my findings and explain why PLS1 usage requires calls to PLS2 extension command FramebufferPixelLocalStorageSizeEXT()?
Am I missing something or is it a compatibility issue between PLS1 and PLS2?

Hi keaukraine,

Thank you very much for your thoughtful research. We will analyse your code and come back to you with an explanation about the PLS1 / PLS2 usage implemented in this demo.

Best regards,
Alejandro

Hi keaukraine,

We are still working on finding out exactly what is causing those issues. We have a couple of possible causes but still haven’t verified them. I’ll keep you updated.

Best regards,
Alejandro

Hello Alejandro,

Thank you for keeping me up to date!
If possible, can you already answer one preliminary question? Is this issue hardware-specific or does it reproduce on other PowerVR GPUs (newer/older ones)? I’m curious because I have only 1 test device w/ PowerVR, and it is important to know that the rendering code in my apps runs identically on all devices with certain features, e. g. I should check only for certain GL extensions, not for a GPU vendor.

Best regards,
Oleksandr

Hi keaukraine,

Of course.

We tested a version of the broken app from your pull request on a Chromebook (PowerVR GX6250) and on an Oppo Reno Z (PowerVR GM9446), and in both cases it worked fine. I also tested it on a Moto E6 Plus (PowerVR GE8320), where it rendered incorrectly. I noticed that PLS only works properly with the first 4 bytes of information for the whole PLS structure, whereas the specification supports a minimum of 16 bytes. There could be a number of different reasons for this issue, and that is what we’re trying to find out right now, but as you can see, it is specific to some devices.

We will let you know as soon as we have more info.

Best regards,
Alejandro

Hi Alejandro,

Thank you for investigating this!
Waiting for more info.

Best regards,
Oleksandr.

Hello @AlejandroC.

Any update on this? Have you found some more clues?

Best regards,
Oleksandr

Hi keaukraine,

Sorry for the huge delay.

We think the most probable cause of the artifacts is a DDK issue, but to verify it we have to reproduce it, and for that we need to prepare a custom Android / DDK build, since internally we released the DDK version in your phone for Android 10 and not Android 9 as the manufacturer did. This will take time I’m afraid.

You should not worry about it being an application-related issue, but this is just my guess; we still need to do a proper diagnosis.

Best regards,
Alejandro

Hello Alejandro.

I’ve been distracted by another project but now I’m back to this PLS implementation.
I have some follow-up questions:

  1. You said that first 4 bytes of PLS data are stored correctly but in my app it doesn’t seem to be true - I use a single float value and still it doesn’t work correctly.
  2. If this issue is not application-related, then it should be hardware-specific (or to be more specific, a problem is in drivers). So unfortunately, this still seems to be a problem for me personally since I have only this one Android device with PowerVR GPU.
  3. You said that this diagnosis is not final yet but is rather a guess. Do you have more information now?

Also, when you mentioned that PLS works properly only with 4 bytes of data, did you mean PLS1? This device-specific bug seems to be related to setting the PLS size. But my implementation uses PLS1 (it is enough for my requirements and it is more widely adopted by various GPU vendors), and specifying the PLS size is a PLS2-only feature - as I mentioned, mixing it within PLS2 pipeline to “fix” the issue feels very wrong.

Once again, thank you for your feedback and investigation of issue!

Best regards,
Oleksandr

Hi Oleksandr,

Thanks for the feedback. Yes, the tests I did with the first 4 bytes working correctly were with PLS1 (I used a Moto E6 Plus with exact same GPU and DDK version as the ones on the device you’re using).

I’m afraid we don’t have more information than we had a couple of weeks ago. I will write to you as soon as we make progress on this.

Best regards,
Alejandro

Hello Alejandro,

OK, thank you for keeping me updated.

Best regards,
Oleksandr.