Can UBOs be stored in registers?

desperado · November 13, 2019, 10:00am

Greetings,

I have a GLSL shader that uses arrays. According to the shader editor, the array entries are currently stored in sh (shared?) registers.
My question is: If I switch to UBOs that are not larger than the current arrays, will the data still be handed over via registers?

Regards

MartonTamas · November 14, 2019, 2:32pm

Hi,

I tried out what you suggested using PVRShaderEditor.

The UBO version:

    #version 310 es
    precision mediump float;
    out vec4 fragmentColor;

    layout (std140) uniform ubo
    {
        vec4 color1;
    } ubo1;

    void main()
    {
        fragmentColor = ubo1.color1;
    }

Results in:

    0 : mov i0.e0.e1.e2.e3, sh2
    1 : f16mad(WAS:sopmad0) r1.f16.f16.e0, i0, 0.oneminus, 0.neg
        f16mad(WAS:sopmad1) r1.f16.f16.e1, sh3, 0.oneminus, 0.neg
        f16mad(WAS:sopmad2) r0.f16.f16.e0, sh0, 0.oneminus, 0.neg
        f16mad(WAS:sopmad3) r0.f16.f16.e1, sh1, 0.oneminus, 0.neg

The uniform version:

    #version 310 es
    precision mediump float;
    out vec4 fragmentColor;

    uniform vec4 color;

    void main()
    {
        fragmentColor = color;
    }

Results in:

    0 : mov i0.e0.e1.e2.e3, sh2
    1 : f16mad(WAS:sopmad0) r1.f16.f16.e0, i0, 0.oneminus, 0.neg
        f16mad(WAS:sopmad1) r1.f16.f16.e1, sh3, 0.oneminus, 0.neg
        f16mad(WAS:sopmad2) r0.f16.f16.e0, sh0, 0.oneminus, 0.neg
        f16mad(WAS:sopmad3) r0.f16.f16.e1, sh1, 0.oneminus, 0.neg

As you can see, the generated code is identical. I suppose the compiler will try to fit UBOs into shared registers as long as it has enough space.

Best,
Marton

MartonTamas · November 14, 2019, 2:36pm

It seems that with the compiler built into PVRShaderEditor the limit is 255 vec4s, e.g.

    uniform vec4 color[255];

or

    layout (std140) uniform ubo
    {
        vec4 color1[255];
    } ubo1;
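
For a sense of scale, the 255-element limit can be turned into a storage figure. This is a minimal sketch that assumes only the std140 rule that each element of a vec4 array occupies 16 bytes (four 32-bit components). The function name `std140_vec4_array_size` is made up for illustration, and nothing here describes how the PowerVR compiler actually lays out sh registers.

```python
# Size of `vec4 name[count]` under std140: each element has a 16-byte stride.
VEC4_COMPONENTS = 4
BYTES_PER_COMPONENT = 4  # one 32-bit float


def std140_vec4_array_size(count):
    """Return (number of 32-bit components, size in bytes) for `count` vec4s."""
    if count < 0:
        raise ValueError("count must be non-negative")
    components = count * VEC4_COMPONENTS
    return components, components * BYTES_PER_COMPONENT


if __name__ == "__main__":
    for count in (255, 256):
        components, size = std140_vec4_array_size(count)
        print(f"vec4[{count}]: {components} components, {size} bytes")
```

So the 255-element array corresponds to 1020 32-bit values, just under 4 KiB.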

desperado · January 20, 2020, 1:50pm

Hello,

I tried the example again for a vertex shader that looks like this:

    #version 310 es

    precision mediump float;

    in vec4 inVertex;
    out vec2 textureCoordinate;

    layout (std140) uniform ubo
    {
        vec4 position[255];
    } ubo1;

    void main()
    {
        textureCoordinate = vec2(1.0, 1.0);
        gl_Position = ubo1.position[254];
    }

Disassembly:

    --------------------- Disassembled HW Code -------------------- 
    
    0    : smadd32 ft0, c31, c24, c24, 
           mov idx0, ft0;
    
    1    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 5;
    
    2    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 4;
    
    3    : mov ft0, sh250[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 0;
    
    4    : mov ft0, sh251[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 1;
    
    5    : mov ft0, sh252[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 2;
    
    6    : mov ft0, sh253[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 3;

Still register access. Now let’s go over 255:

    #version 310 es

    precision mediump float;

    in vec4 inVertex;
    out vec2 textureCoordinate;

    layout (std140) uniform ubo
    {
        vec4 position[301];
    } ubo1;

    void main()
    {
        textureCoordinate = vec2(1.0, 1.0);
        gl_Position = ubo1.position[300];
    }

Yields:

    --------------------- Disassembled HW Code -------------------- 
    
    0    : mov ft0, ft1, c0, 1024
           mov ft2, c0
           cbs ft3, c0
           mov ft4, ft1
           lsl ft5, ft4, c0
           mov idx0, ft5;
    
    1    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 5;
    
    2    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 4;
    
    3    : mov ft0, sh178[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 0;
    
    4    : mov ft0, sh179[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 1;
    
    5    : mov ft0, sh180[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 2;
    
    6    : mov ft0, sh181[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 3;

Still register access, right? Let’s go up to 512:

    #version 310 es

    precision mediump float;

    in vec4 inVertex;
    out vec2 textureCoordinate;

    layout (std140) uniform ubo
    {
        vec4 position[512];
    } ubo1;

    void main()
    {
        textureCoordinate = vec2(1.0, 1.0);
        gl_Position = ubo1.position[511];
    }

    --------------------- Disassembled HW Code -------------------- 
    
    0    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           mov i0, sh3;
           uvsw.write ft0, 5;
    
    1    : smadd64 ft0, sh4, c1, sh2, i0, 
           mov r5.e0.e1.e2.e3, fte
           mov r4, ft0;
           ld r0, drc0, 4, r4;
    
    2    : wdf drc0
    
    3    : mov ft0, r0
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 0;
    
    4    : mov ft0, r1
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 1;
    
    5    : mov ft0, r2
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 2;
    
    6    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 4;
    
    7    : mov ft0, r3
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 3;

PVRShaderEditor 2.12, PowerVR Series6

Is this a bug in the shader editor, or do we have more register space in the vertex shader?

MartonTamas · January 20, 2020, 2:03pm

I think that’s because you used mediump, so two 16-bit floats will be packed into a single slot.

desperado · January 20, 2020, 2:06pm

Changed to highp and 511:

    #version 310 es

    precision highp float;

    in vec4 inVertex;
    out vec2 textureCoordinate;

    layout (std140) uniform ubo
    {
        vec4 position[511];
    } ubo1;

    void main()
    {
        textureCoordinate = vec2(1.0, 1.0);
        gl_Position = ubo1.position[510];
    }

Shader still uses registers:

    --------------------- Disassembled HW Code -------------------- 
    
    0    : mov ft0, ft1, c0, 1792
           mov ft2, c0
           cbs ft3, c0
           mov ft4, ft1
           lsl ft5, ft4, c0
           mov idx0, ft5;
    
    1    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 5;
    
    2    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 4;
    
    3    : mov ft0, sh250[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 0;
    
    4    : mov ft0, sh251[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 1;
    
    5    : mov ft0, sh252[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 2;
    
    6    : mov ft0, sh253[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 3;

desperado · January 20, 2020, 2:20pm

Just for fun, I entered the same shader as a fragment shader and as a vertex shader:

    #version 310 es
    
    precision highp float;
    
    in vec2 textureCoordinate;
    out vec4 fragmentColor;
    
    uniform MatrixBlock
    {
    vec4 uniformTest[256];
    } myBlock;
    
    highp int index;
    
    void main()
    {
        fragmentColor = myBlock.uniformTest[ index ];
    }

Fragment disassembly:

    --------------------- Disassembled HW Code -------------------- 
    
    0    : mov i0.e0.e1.e2.e3, sh3
    
    1    : smadd64 ft0, i1, c16, sh2, i0, 
           mov r5.e0.e1.e2.e3, fte
           mov r4, ft0;
           ld r0, drc0, 4, r4;
    
    2    : wdf drc0

Vertex disassembly:

    --------------------- Disassembled HW Code -------------------- 
    
    0    : imad16 ft0, i0.e0, c4.e0, sh2.e0
           mov idx0, ft0;
    
    1    : mov ft0, sh0[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 4;
    
    2    : mov ft0, sh1[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 5;
    
    3    : mov ft0, sh2[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 6;
    
    4    : mov ft0, sh3[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 7;

Vertex uses registers, fragment doesn’t.

MartonTamas · January 20, 2020, 2:34pm

Sorry, I realised that it only changes to non-sh if you are actually using all 255 mediump floats.
The compiler is smart enough not to allocate space for stuff you don’t need.

desperado · January 20, 2020, 2:38pm

I access the block via another uniform (index). How does it know which values index can assume?

MartonTamas · January 20, 2020, 2:41pm

Not sure why it’s different… Probably the compiler can allocate more for vertex shaders.

For vertex shaders the limit seems to be 510; 511 generates a load:

    #version 310 es

    precision mediump float;

    in vec2 textureCoordinate;

    uniform MatrixBlock
    {
        vec4 uniformTest[510];
    } myBlock;

    uniform highp int index;

    void main()
    {
        gl_Position = myBlock.uniformTest[ index ];
    }

desperado · January 20, 2020, 2:49pm

Would that be GL_MAX_VERTEX_UNIFORM_VECTORS?

MartonTamas · January 20, 2020, 2:53pm

yup, seems like it

desperado · January 20, 2020, 2:55pm

Are there official specs for PowerVR6 for that?

MartonTamas · January 20, 2020, 3:17pm

That could vary per GPU, so I’d say your best bet is to query said GL variable.

desperado · January 20, 2020, 3:23pm

GX 6250

According to here it should be 256:

I don’t get it.

MartonTamas · January 20, 2020, 3:39pm

have a look here:
http://opengles.gpuinfo.org/displayreport.php?id=3025

desperado · January 20, 2020, 3:44pm

It’s still 256, just like for fragment shaders.

I don’t understand why the compiler behaves differently in both cases.

Does the compiler just assume some value? Are there PowerVR Series6 chips that provide 512 uniform vec4 shared registers?
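
The mismatch discussed here can be written down explicitly. This is a rough sketch that assumes the value 256 reported for the GX6250 in this thread and the array lengths the thread observed being served from sh registers (255 for fragment, 510 for vertex). The helper name `exceeds_reported_limit` is hypothetical, and whether the offline compiler consults this limit at all is not established anywhere in the thread. On a device, the limit would come from `glGetIntegerv` with `GL_MAX_VERTEX_UNIFORM_VECTORS` or `GL_MAX_FRAGMENT_UNIFORM_VECTORS`.

```python
# Value reported for the GX6250 in this thread (vertex and fragment alike).
REPORTED_MAX_UNIFORM_VECTORS = 256

# Largest vec4 array length the thread saw served from sh registers, per stage.
OBSERVED_IN_SH = {"fragment": 255, "vertex": 510}


def exceeds_reported_limit(array_len, reported_max=REPORTED_MAX_UNIFORM_VECTORS):
    """True if a vec4 array of this length is larger than the reported limit."""
    return array_len > reported_max


for stage, length in OBSERVED_IN_SH.items():
    verdict = "exceeds" if exceeds_reported_limit(length) else "fits within"
    print(f"{stage}: vec4[{length}] {verdict} the reported limit "
          f"of {REPORTED_MAX_UNIFORM_VECTORS}")
```

Only the vertex case exceeds the reported limit, which is exactly the inconsistency in question.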

MartonTamas · January 20, 2020, 4:26pm

The compiler probably compiles for one particular Series6 GPU. Other GPUs could have different configurations, as in this case.

The GE8320 should have 512 for vertex shaders.

desperado · January 20, 2020, 6:38pm

I’m using a Series6 compiler profile in PVRShaderEditor. I would expect such a compiler to assume only capabilities that all Series6 GPUs have. That is, I would expect it to know that there is at least one Series6 GPU (like the Rogue GX6250 I have here) where GL_MAX_VERTEX_UNIFORM_VECTORS = 256, and thus not to produce assembly that seemingly assumes 512. At the very least I would expect it to be consistent between vertex and fragment shaders if the values are the same.

Or can you see some nifty optimization in the compiler that magically grants twice the available amount of indices to the vertex shader for certain cases?

desperado · January 20, 2020, 6:45pm

Btw according to the spec the sh registers are allocated from the common store:

How fast is the common store compared to whatever caches texelFetch uses? Does the common store use other caches close to the chip?

Regards
