Can UBOs be stored in registers?

Greetings,

I have a GLSL shader that uses arrays. According to the shader editor, the array entries are currently stored in sh (shared?) registers.
My question is: If I switch to UBOs that are not larger than the current arrays, will the data still be handed over via registers?

Regards

Hi,

I tried out what you suggested using PVRShaderEditor.

The UBO version:

    #version 310 es
    precision mediump float;
    out vec4 fragmentColor;

    layout (std140) uniform ubo
    {
        vec4 color1;
    } ubo1;

    void main()
    {
        fragmentColor = ubo1.color1;
    }

results in:

    0 : mov i0.e0.e1.e2.e3, sh2
    1 : f16mad(WAS:sopmad0) r1.f16.f16.e0, i0, 0.oneminus, 0.neg
        f16mad(WAS:sopmad1) r1.f16.f16.e1, sh3, 0.oneminus, 0.neg
        f16mad(WAS:sopmad2) r0.f16.f16.e0, sh0, 0.oneminus, 0.neg
        f16mad(WAS:sopmad3) r0.f16.f16.e1, sh1, 0.oneminus, 0.neg

The uniform version:

    #version 310 es
    precision mediump float;
    out vec4 fragmentColor;

    uniform vec4 color;

    void main()
    {
        fragmentColor = color;
    }

results in:

    0 : mov i0.e0.e1.e2.e3, sh2
    1 : f16mad(WAS:sopmad0) r1.f16.f16.e0, i0, 0.oneminus, 0.neg
        f16mad(WAS:sopmad1) r1.f16.f16.e1, sh3, 0.oneminus, 0.neg
        f16mad(WAS:sopmad2) r0.f16.f16.e0, sh0, 0.oneminus, 0.neg
        f16mad(WAS:sopmad3) r0.f16.f16.e1, sh1, 0.oneminus, 0.neg

As you can see, the generated code is identical. I suppose the compiler will try to fit UBOs into shared registers as long as there is enough space.

bests,
Marton

It seems that, with the compiler built into PVRShaderEditor, the limit is 255 * vec4, e.g.

    uniform vec4 color[255];

or

    layout (std140) uniform ubo
    {
        vec4 color1[255];
    } ubo1;
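
For a sense of scale, here is a quick sketch of how much data such an array holds. It only assumes the std140 rule that each element of a vec4 array has a 16-byte stride; the helper name is made up for illustration, and the result says nothing about how the PowerVR compiler actually maps the data onto sh registers.

```python
# std140 layout: every element of a vec4 array occupies a 16-byte stride.
VEC4_BYTES = 16

def std140_vec4_array_bytes(count):
    """Size in bytes of `vec4 name[count]` inside a std140 uniform block."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return count * VEC4_BYTES

# Array sizes that appear in this thread.
for n in (255, 301, 512):
    print(n, "vec4 ->", std140_vec4_array_bytes(n), "bytes")
```

So the 255-element case is a little under 4 KiB of uniform data, whether it is declared as a plain uniform array or as a UBO.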

Hello,

I tried the example again for a vertex shader that looks like this:

#version 310 es

precision mediump float;

in vec4 inVertex;
out vec2 textureCoordinate;

layout (std140) uniform ubo
{
    vec4 position[255];
} ubo1;

void main()
{
    textureCoordinate = vec2(1.0, 1.0);
    gl_Position = ubo1.position[254];
}

Disassembly:

    --------------------- Disassembled HW Code -------------------- 
    
    0    : smadd32 ft0, c31, c24, c24, 
           mov idx0, ft0;
    
    1    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 5;
    
    2    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 4;
    
    3    : mov ft0, sh250[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 0;
    
    4    : mov ft0, sh251[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 1;
    
    5    : mov ft0, sh252[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 2;
    
    6    : mov ft0, sh253[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 3;

Still register access. Now let’s go over 255:

    #version 310 es
    
    precision mediump float;
    
    in vec4 inVertex;
    out vec2 textureCoordinate;
    
    layout (std140) uniform ubo
    {
    vec4 position[301];
    } ubo1;
    
    void main()
    {
            textureCoordinate = vec2(1.0, 1.0);
            gl_Position = ubo1.position[300];
    }

Yields:

    --------------------- Disassembled HW Code -------------------- 
    
    0    : mov ft0, ft1, c0, 1024
           mov ft2, c0
           cbs ft3, c0
           mov ft4, ft1
           lsl ft5, ft4, c0
           mov idx0, ft5;
    
    1    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 5;
    
    2    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 4;
    
    3    : mov ft0, sh178[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 0;
    
    4    : mov ft0, sh179[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 1;
    
    5    : mov ft0, sh180[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 2;
    
    6    : mov ft0, sh181[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 3;

Still register access, right? Let’s go up to 512:

    #version 310 es
    
    precision mediump float;
    
    in vec4 inVertex;
    out vec2 textureCoordinate;
    
    layout (std140) uniform ubo
    {
    vec4 position[512];
    } ubo1;
    
    void main()
    {
            textureCoordinate = vec2(1.0, 1.0);
            gl_Position = ubo1.position[511];
    }

    --------------------- Disassembled HW Code -------------------- 
    
    0    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           mov i0, sh3;
           uvsw.write ft0, 5;
    
    1    : smadd64 ft0, sh4, c1, sh2, i0, 
           mov r5.e0.e1.e2.e3, fte
           mov r4, ft0;
           ld r0, drc0, 4, r4;
    
    2    : wdf drc0
    
    3    : mov ft0, r0
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 0;
    
    4    : mov ft0, r1
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 1;
    
    5    : mov ft0, r2
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 2;
    
    6    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 4;
    
    7    : mov ft0, r3
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 3;

PVRShaderEditor 2.12, PowerVR Series6

Is this a bug in the shader editor, or do we have more register space in the vertex shader?

I think that’s because you used mediump, so two 16-bit floats will be packed into a single slot.

Changed to highp and 511:

    #version 310 es
    
    precision highp float;
    
    in vec4 inVertex;
    out vec2 textureCoordinate;
    
    layout (std140) uniform ubo
    {
    vec4 position[511];
    } ubo1;
    
    void main()
    {
        textureCoordinate = vec2(1.0, 1.0);
        gl_Position = ubo1.position[510];
    }

Shader still uses registers:

    --------------------- Disassembled HW Code -------------------- 
    
    0    : mov ft0, ft1, c0, 1792
           mov ft2, c0
           cbs ft3, c0
           mov ft4, ft1
           lsl ft5, ft4, c0
           mov idx0, ft5;
    
    1    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 5;
    
    2    : mov ft0, c64
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 4;
    
    3    : mov ft0, sh250[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 0;
    
    4    : mov ft0, sh251[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 1;
    
    5    : mov ft0, sh252[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 2;
    
    6    : mov ft0, sh253[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 3;

Just for fun, I entered the same shader as fragment and as vertex:

    #version 310 es
    
    precision highp float;
    
    in vec2 textureCoordinate;
    out vec4 fragmentColor;
    
    uniform MatrixBlock
    {
    vec4 uniformTest[256];
    } myBlock;
    
    uniform highp int index;
    
    void main()
    {
        fragmentColor = myBlock.uniformTest[ index ];
    }

Fragment disassembly:

    --------------------- Disassembled HW Code -------------------- 
    
    0    : mov i0.e0.e1.e2.e3, sh3
    
    1    : smadd64 ft0, i1, c16, sh2, i0, 
           mov r5.e0.e1.e2.e3, fte
           mov r4, ft0;
           ld r0, drc0, 4, r4;
    
    2    : wdf drc0

Vertex disassembly:

    --------------------- Disassembled HW Code -------------------- 
    
    0    : imad16 ft0, i0.e0, c4.e0, sh2.e0
           mov idx0, ft0;
    
    1    : mov ft0, sh0[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 4;
    
    2    : mov ft0, sh1[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 5;
    
    3    : mov ft0, sh2[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 6;
    
    4    : mov ft0, sh3[idx0]
           mov ft0.e0.e1.e2.e3, ft0
           uvsw.write ft0, 7;

Vertex uses registers, fragment doesn’t.

Sorry, I realised that it only switches away from sh registers if you are actually using all 255 mediump floats :)
The compiler is smart enough not to allocate space for data you don’t need.

I access the block via another uniform (index). How does it know which values index can assume?

Not sure why it’s different… probably the compiler can allocate more for vertex shaders.

For vertex shaders the limit seems to be 510; 511 generates a load:

    #version 310 es

    precision mediump float;

    in vec2 textureCoordinate;

    uniform MatrixBlock
    {
        vec4 uniformTest[510];
    } myBlock;

    uniform highp int index;

    void main()
    {
        gl_Position = myBlock.uniformTest[ index ];
    }

Would that be GL_MAX_VERTEX_UNIFORM_VECTORS?

Yup, seems like it :)

Are there official specs for PowerVR Series6 for that?

That could vary per GPU, so I’d say your best bet is to query said GL variable.
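
A rough sketch of that check, assuming the limit has already been read on the device with glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, ...). The function name and the hard-coded number are only illustrative. Also keep in mind that this limit is specified for uniforms in the default block, so whether a UBO of the same size ends up in registers is still up to the driver's compiler.

```python
def fits_uniform_vector_limit(num_vec4, max_uniform_vectors):
    """True if an array of num_vec4 vec4 uniforms stays within the queried limit."""
    return 0 <= num_vec4 <= max_uniform_vectors

# Hypothetical value; on a real device read it via
# glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, &value).
queried_max_vertex_vectors = 256

for n in (255, 510, 511):
    ok = fits_uniform_vector_limit(n, queried_max_vertex_vectors)
    print(n, "vec4:", "within limit" if ok else "over limit")
```

Other uniforms in the same shader (such as the index used above) count against the same budget, so the usable array size is slightly smaller than the queried value.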

GX6250

According to here it should be 256:

I don’t get it.

have a look here:
http://opengles.gpuinfo.org/displayreport.php?id=3025

It’s still 256, just like for fragment shaders.

I don’t understand why the compiler behaves differently in both cases.

Does the compiler just assume some value? Are there PowerVR Series6 chips that provide 512 uniform vec4 shared registers?

The compiler probably compiles for one particular Series6 GPU. Other GPUs could have different configurations, as in this case.

GE8320 should have 512 for vertex

I’m using a Series6 compiler profile in PVRShaderEditor. I would expect such a compiler to assume only capabilities that all Series6 cards have. That is, I would expect it to know that there is at least one Series6 card (like the Rogue GX6250 I have here) where GL_MAX_VERTEX_UNIFORM_VECTORS = 256, and thus not to produce assembly that seemingly assumes 512. At the very least, I would expect it to be consistent between vertex and fragment shaders if the values are the same.

Or can you see some nifty optimization in the compiler that magically grants twice the available amount of indices to the vertex shader for certain cases?

Btw according to the spec the sh registers are allocated from the common store:

How fast is the common store compared to whatever caches texelFetch uses? Does the common store use other caches close to the chip?

Regards