GLSL Precision Qualifiers vs Vertex Attributes

Hello,

What are the rules of thumb for setting vertex attribute state with glVertexAttribPointer when the shader's precision qualifier is mediump or lowp?

If I preprocess my data by simply converting floats to half floats, what is the proper glVertexAttribPointer-style function to define the attribute state? glVertexAttribPointer with GL_SHORT as the type?
Or am I completely wrong: since glVertexAttribPointer sets client state (can anyone confirm?), does the conversion to mediump happen somewhere down the pipe when the memory is copied from client to server?

Thanks, L.

lykathea, 2012-08-23 23:54:36

Hi lykathea,

Sorry for the delay in replying to this; we've been fairly busy getting ready for our next release.

There's no real need to worry about how the types relate between your upload and how they're read in the shader; the driver will handle that for you. The only thing you really need to worry about is the *amount* of data you're sending, so if you can use half floats, you should. To use half floats, though, you'll need to check that the "GL_OES_vertex_half_float" extension is supported. Then you would pass "GL_HALF_FLOAT_OES" as the type argument of glVertexAttribPointer. The data in your client array must actually be in whatever format you pass here; the variable that reads it in the shader can use whatever precision you need.

Hope that helps?

Thanks,
Tobias

Hello Tobias,

Thank you for your reply. I'm targeting iOS, so 'GL_OES_vertex_half_float' is sadly a no-go for me.

GLfloat  FloatBuffer[BufferSize];
GLushort HalfFloatBuffer[BufferSize];

/* Convert each 32-bit float to its 16-bit half-float bit pattern. */
void ConvertBuffers(const GLfloat *src, GLushort *dst, GLsizei size)
{
    for (GLsizei i = 0; i < size; ++i)
    {
        dst[i] = FloatToHalfFloat(src[i]);
    }
}

ConvertBuffers(FloatBuffer, HalfFloatBuffer, BufferSize);

Say I preprocess my vertex data like this. I guess my FloatToHalfFloat function could even be implemented in terms of GCC's ARM-only '__fp16' type.
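For reference, FloatToHalfFloat could be written portably along the lines of the sketch below. It assumes IEEE 754 binary32 input and produces binary16 bits (1 sign, 5 exponent, 10 mantissa) with round-to-nearest-even, handling subnormals, overflow to infinity and NaN. The body is illustrative, not the poster's code; where the toolchain supports GCC's __fp16, a cast to it may do the same job.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch: IEEE 754 binary32 -> binary16 bits, round to nearest even. */
uint16_t FloatToHalfFloat(float value)
{
    uint32_t f;
    memcpy(&f, &value, sizeof f);               /* read the float's bits */

    uint32_t sign = (f >> 16) & 0x8000u;
    uint32_t exp  = (f >> 23) & 0xFFu;
    uint32_t mant = f & 0x7FFFFFu;

    if (exp == 0xFFu)                           /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;        /* rebias 127 -> 15 */
    if (e >= 0x1F)                              /* too large: infinity */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                               /* half subnormal or zero */
        if (e < -10)
            return (uint16_t)sign;              /* magnitude below 2^-25 rounds to zero */
        mant |= 0x800000u;                      /* make the leading 1 explicit */
        uint32_t shift = (uint32_t)(14 - e);    /* 14..24 */
        uint32_t half = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half & 1u)))
            half++;                             /* may become the smallest normal */
        return (uint16_t)(sign | half);
    }

    uint32_t half = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;              /* the 13 dropped bits */
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                                 /* a carry may round up to infinity */
    return (uint16_t)(sign | half);
}
```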

Now, uploading the vertex data:

glVertexAttribPointer(AttributeIndex, AttributeComponents, GL_UNSIGNED_SHORT, AttributeNormalized, AttributeStride, HalfFloatBuffer);

Is the above correct? Will the driver handle this case by itself, taking the type parameter merely as a size hint? 

Given this shader definition:
....
attribute mediump vec3 Attribute;
....

thanks, L.




lykathea, 2012-09-16 14:47:35

Hi Lykathea,

Sorry for the late reply, I have been away for a week.

The type is not merely a size hint, I'm afraid; it is in fact a hard type definition, so doing this will produce incorrect values. The only thing I can suggest is converting your values to signed shorts and passing GL_SHORT with the "normalised" flag set to GL_TRUE. These values will then be interpreted as fixed-point values between -1 and 1.
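To see what "incorrect values" means concretely: under the OpenGL ES 2.0 conversion rules, an unsigned short component c reaches the shader as the float c when normalized is GL_FALSE, and as c / 65535 when it is GL_TRUE. The sketch below (hypothetical helper name) applies that to 0x3C00, the half-float bit pattern of 1.0: the shader would see 15360.0 or about 0.234, never 1.0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value an ES 2.0 vertex shader receives for one GL_UNSIGNED_SHORT
 * attribute component c: c itself, or c / 65535 when normalized. */
float ushort_attrib_to_float(uint16_t c, bool normalized)
{
    return normalized ? (float)c / 65535.0f : (float)c;
}
```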

If your values are already normalised between -1 and 1, the conversion is simply a matter of multiplying them by 32767. If your values are unsigned (between 0 and 1), you can use GL_UNSIGNED_SHORT instead and multiply by 65535. Either way, you're essentially making the range of your data match the range of values of the short type.

By setting the "normalised" flag to GL_TRUE, these values will be read back as floating point in the shader, between -1 and 1 (or between 0 and 1 for unsigned types).
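The offline conversion described above might look like this sketch, assuming the source data is finite and already in [-1, 1] (signed) or [0, 1] (unsigned). The function names are hypothetical. Values are clamped and rounded to the nearest step rather than truncated, using 32767 and 65535, the largest values of the two 16-bit types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to [lo, hi]; assumes finite input. */
static float clamp_float(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Round half away from zero (input is already within the target range). */
static long round_to_long(float v)
{
    return (long)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* [-1, 1] floats -> shorts for glVertexAttribPointer(..., GL_SHORT, GL_TRUE, ...). */
void quantize_snorm16(const float *src, int16_t *dst, size_t count)
{
    assert(count == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < count; ++i)
        dst[i] = (int16_t)round_to_long(clamp_float(src[i], -1.0f, 1.0f) * 32767.0f);
}

/* [0, 1] floats -> unsigned shorts for GL_UNSIGNED_SHORT with GL_TRUE. */
void quantize_unorm16(const float *src, uint16_t *dst, size_t count)
{
    assert(count == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < count; ++i)
        dst[i] = (uint16_t)round_to_long(clamp_float(src[i], 0.0f, 1.0f) * 65535.0f);
}
```

The results would then be described with, for example, glVertexAttribPointer(index, 3, GL_SHORT, GL_TRUE, stride, data), and the shader reads floats in approximately [-1, 1].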

Hope that makes sense?

Thanks,
Tobias

It's been quite a while, but nonetheless … I'd like to solve this :)

Well, yes, it makes sense, but it feels like a detour around what I'm really looking for. Besides, it puts extra load on the pipeline for the [0, 1] -> mediump float conversion, which is exactly the work I wanted to get done offline: I'm trying to reduce memory bandwidth without burdening the shaders with even a single new line of code.

What I'm probably really proposing here is an OpenGL extension that would resolve the discrepancies between the type mappings of the glVertex*** APIs and shader code, like this:

GL_MEDIUMP_FLOAT / GL_HALF_FLOAT -> mediump float

The extension would also specify the bit representation of these types, e.g. GL_MEDIUMP_FLOAT -> 1s.5e.10m, the IEEE 754-2008 half-precision encoding. The GL_LOWP_FLOAT format is totally obscure here; by spec, these things are vendor dependent.
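To make the 1s.5e.10m notation concrete, here is a sketch of a decoder (hypothetical name HalfFloatToFloat) that pulls out the three fields: 1 sign bit, a 5-bit exponent with bias 15, and a 10-bit mantissa, with the special cases for zero, subnormals, infinity and NaN. It assumes float is IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch: binary16 bits (1s.5e.10m) -> binary32 float. */
float HalfFloatToFloat(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;  /* 1 sign bit */
    uint32_t exp  = (h >> 10) & 0x1Fu;              /* 5 exponent bits, bias 15 */
    uint32_t mant = h & 0x3FFu;                     /* 10 mantissa bits */
    uint32_t bits;

    if (exp == 0x1Fu) {                  /* infinity or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {               /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {              /* signed zero */
        bits = sign;
    } else {                             /* subnormal: value = mant * 2^-24 */
        exp = 113u;                      /* binary32 exponent field of 2^-14 */
        while ((mant & 0x400u) == 0) {   /* shift until the leading 1 is at bit 10 */
            mant <<= 1;
            exp--;
        }
        mant &= 0x3FFu;                  /* drop the now-implicit leading 1 */
        bits = sign | (exp << 23) | (mant << 13);
    }

    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```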

Chapter 3.4, "Promote Calculations up 'The Chain'", of the PowerVR SGX performance recommendations mentions pre-baking, while chapter 4.3, "Data Types", says:


Vertex shaders always expect attributes to be of the type ‘float’, this means that all types except ‘float’ will require a conversion. This conversion is performed in the USSE pipeline and costs a few additional cycles; thus the choice of attribute data type is a trade-off between shader cycles, bandwidth/storage requirements and precision. It is important that type conversion is considered as bandwidth is always at a premium.
Precision requirements should be checked carefully, the ‘byte’ and ‘short’ types are often sufficient, even for position information. For example, scaled to a range of 10m the ‘short’ types give you a precision of 150 μm. Scaling and biasing those attribute values to fit a certain range can often be folded into other vertex shader calculations, e.g. multiplied into a transformation matrix.
(What kind of float are we talking about here: lowp, mediump, highp, or floats in general?)
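The scale-and-bias trick from the quoted guideline can be sketched as follows, assuming positions are stored as unnormalized GL_SHORT and each mesh has a known bounding box. The struct and function names are hypothetical. For a 10 m extent the step is 10 / 65535 m, about 153 μm, consistent with the roughly 150 μm quoted, and the decode p = q * scale + bias is folded into the model matrix so the vertex shader needs no extra instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-axis decode parameters: position = q * scale + bias. */
typedef struct {
    float scale[3];
    float bias[3];
} QuantParams;

/* Map the box [min, max] onto the full GL_SHORT range [-32768, 32767]. */
QuantParams make_quant_params(const float min[3], const float max[3])
{
    QuantParams qp;
    for (int a = 0; a < 3; ++a) {
        assert(max[a] > min[a]);
        qp.scale[a] = (max[a] - min[a]) / 65535.0f;     /* size of one step */
        qp.bias[a] = min[a] + 32768.0f * qp.scale[a];   /* q = -32768 decodes to min */
    }
    return qp;
}

/* Offline step: encode a position as three unnormalized shorts. */
void quantize_position(const QuantParams *qp, const float p[3], int16_t q[3])
{
    for (int a = 0; a < 3; ++a) {
        float s = (p[a] - qp->bias[a]) / qp->scale[a];
        s = s < -32768.0f ? -32768.0f : (s > 32767.0f ? 32767.0f : s);
        q[a] = (int16_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);  /* round to nearest */
    }
}

/* out = model * decode, both 4x4 and column-major as glUniformMatrix4fv
 * expects, so the shader can transform the raw shorts directly. */
void fold_decode_into_matrix(const float model[16], const QuantParams *qp, float out[16])
{
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 3; ++c)
            out[c * 4 + r] = model[c * 4 + r] * qp->scale[c];
        out[12 + r] = model[r] * qp->bias[0] + model[4 + r] * qp->bias[1]
                    + model[8 + r] * qp->bias[2] + model[12 + r];
    }
}
```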

Or is that part made somewhat invalid by chapter 4.6, "Padding":

When vertex data is interleaved, each vertex should be aligned to a four byte boundary; when vertex data is not interleaved each element in each array of vertex data should be aligned to a four byte boundary.
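Read literally, the padding rule above rounds each vertex (interleaved) or each element (non-interleaved) up to a multiple of four bytes. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Round a byte size up to the next multiple of `alignment` (a power of two). */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Padded size of `components` values of `bytes_per_component` each. */
size_t padded_stride(size_t components, size_t bytes_per_component)
{
    return align_up(components * bytes_per_component, 4);
}
```

Under that rule, three 16-bit components (6 bytes) would occupy 8 bytes, which is still less than the 12 bytes of three 32-bit floats.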

Does that mean it doesn't really make sense to upload less than 4 bytes per element/vertex? If so, it really can't be helped at this level, but wait:

Let's say I load my vertex data in the aforementioned 1s.5e.10m format and want to upload it to the GPU. What can I do?

1 - Just convert them to full-blown float32 at runtime, and let the driver do the rest.
2 - Do as you proposed in your post.

(Option 1) is just plain wrong, since the only bandwidth I probably save here is storage -> memory; I guess on-the-fly decompression of the vertex data with zlib/snappy would save more space without wasting many more cycles. The only justification for it is probably combining compression with half-float vertex data, and that format would need to be handled with further care, since packing that much information into such a tight space makes the high-entropy problem apparent. (Option 2) puts a burden on runtime shader performance, and the code needs to be updated in non-obvious ways, as does the art pipeline.

We need (Option 3).



TL;DR version:
I guess all we need is to have names for the mediump and lowp types available in the glVertex*** family of functions, and to make the OpenGL client state aware of them, since the server obviously already knows how to handle them. That solution would also fit nicely with the statement that "The type is not merely a size hint I'm afraid, it is in fact a hard type definition ..."

Where can we call for an extension? :)

thanks, L.

EDIT: According to this Japanese website:
PowerVR SGX unified fp32 fp16 fix10 fp32 fp16 fix10
PVR SGX543MP unified fp32 fp16 fix10 fp32 fp16 fix16?

So lowp is actually fix10; at least on my iPhone 4S, it seems to be in Q10 format:

```
const char *shader_names[2] = {"GL_FRAGMENT_SHADER", "GL_VERTEX_SHADER"};
const char *precision_names[6] = {"GL_LOW_FLOAT", "GL_MEDIUM_FLOAT", "GL_HIGH_FLOAT",
                                  "GL_LOW_INT", "GL_MEDIUM_INT", "GL_HIGH_INT"};
GLint range[2], precision;

printf("[X3D.PrecisionQualifiers.Dump] START --\n");
/* GL_FRAGMENT_SHADER/GL_VERTEX_SHADER and GL_LOW_FLOAT..GL_HIGH_INT are consecutive enum values. */
for (GLenum i = GL_FRAGMENT_SHADER; i <= GL_VERTEX_SHADER; ++i)
{
    for (GLenum k = GL_LOW_FLOAT; k <= GL_HIGH_INT; ++k)
    {
        glGetShaderPrecisionFormat(i, k, range, &precision);
        printf("(%s %s) Precision: %i Range: [%i:%i]\n",
               shader_names[i - GL_FRAGMENT_SHADER], precision_names[k - GL_LOW_FLOAT],
               precision, range[0], range[1]);
    }
}
printf("[X3D.PrecisionQualifiers.Dump] END --\n");
```

Which outputs:

```
[X3D.PrecisionQualifiers.Dump] START --
(GL_FRAGMENT_SHADER GL_LOW_FLOAT) Precision: 8 Range: [0:0]
(GL_FRAGMENT_SHADER GL_MEDIUM_FLOAT) Precision: 10 Range: [15:15]
(GL_FRAGMENT_SHADER GL_HIGH_FLOAT) Precision: 23 Range: [127:127]
(GL_FRAGMENT_SHADER GL_LOW_INT) Precision: 0 Range: [23:23]
(GL_FRAGMENT_SHADER GL_MEDIUM_INT) Precision: 0 Range: [23:23]
(GL_FRAGMENT_SHADER GL_HIGH_INT) Precision: 0 Range: [23:23]
(GL_VERTEX_SHADER GL_LOW_FLOAT) Precision: 8 Range: [0:0]
(GL_VERTEX_SHADER GL_MEDIUM_FLOAT) Precision: 10 Range: [15:15]
(GL_VERTEX_SHADER GL_HIGH_FLOAT) Precision: 23 Range: [127:127]
(GL_VERTEX_SHADER GL_LOW_INT) Precision: 0 Range: [23:23]
(GL_VERTEX_SHADER GL_MEDIUM_INT) Precision: 0 Range: [23:23]
(GL_VERTEX_SHADER GL_HIGH_INT) Precision: 0 Range: [23:23]
[X3D.PrecisionQualifiers.Dump] END --
```
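To read the numbers above: the OpenGL ES 2.0 reference for glGetShaderPrecisionFormat describes range[0] and range[1] as log2 of the absolute values of the minimum and maximum representable values, and precision as log2 of the relative precision (integer formats report 0, as in the dump). The helper below, with hypothetical names, converts them into plain values; for the GL_MEDIUM_FLOAT lines it gives about ±32768 with steps of 2^-10 relative to the magnitude, in line with a half float's 10 mantissa bits.

```c
#include <assert.h>

/* Plain-number view of one glGetShaderPrecisionFormat result. */
typedef struct {
    double min_value;       /* -2^range[0] */
    double max_value;       /*  2^range[1] */
    double relative_step;   /*  2^-precision (1.0 for integer formats, where precision is 0) */
} PrecisionInfo;

/* 2^e without needing libm. */
static double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r *= 0.5;
    return r;
}

PrecisionInfo describe_precision(const int range[2], int precision)
{
    assert(precision >= 0);
    PrecisionInfo info;
    info.min_value = -pow2(range[0]);
    info.max_value = pow2(range[1]);
    info.relative_step = pow2(-precision);
    return info;
}
```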

Hi Lykathea,

Okay, I'm not sure where to begin replying to this one; that's a big post you've got there!

I'll try to answer everything I can think of here, but I certainly don't think an extension is going to happen anytime soon, I'm afraid :(

So firstly, most people think the ideal situation would be to have half and float data types instead of this arbitrary precision stuff. That would certainly make many developers' lives easier. The reason these precisions exist as they do (vendor specific) is that the hardware designed for this was originally created 6 years ago; we barely had smartphones back then! At the time, the techniques we see today hadn't really taken off on mobile, so it wasn't a concern. For future hardware I suspect you'll see things converge toward the way desktop does it, but it's not something that'll be solved in the short term, because these are hardware limitations that can't be fixed in software.

As for the translations between the API and GLSL, you've just got to assume that the precision you get will be the lower of the two: with GL_HALF_FLOAT data and a highp attribute, the precision will be limited to that of a 16-bit half float, whereas with GL_FLOAT and highp it will be limited to the bits of the highp qualifier. The values you've got look about right for the various precision qualifiers. As for the lowp qualifier, even though it's a fixed-point format, it will still just be a translation of whatever is passed into the API: if you pass in a fixed-point value, it'll be translated from fixed -> lowp, and if you pass in a float, from float -> lowp. The main thing here is that the host and shader code treat these formats as completely separate, and changing that model would require a larger change to the API than you'd think, so an extension isn't really viable, I'm afraid!
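The "lower of the two" rule can be illustrated by rounding a value to a given number of mantissa bits. The sketch below is a model only (hypothetical helpers; exponent range and subnormals are ignored, and the input is assumed finite): 10 bits stands in for half-float data or mediump, 23 bits for GL_FLOAT or highp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round a finite float to `bits` explicit mantissa bits (0..23),
 * nearest-even. Models precision only: exponent range is ignored. */
float round_mantissa(float x, int bits)
{
    assert(bits >= 0 && bits <= 23);
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    int drop = 23 - bits;                    /* low mantissa bits to discard */
    if (drop > 0) {
        uint32_t mask = (1u << drop) - 1u;
        uint32_t halfway = 1u << (drop - 1);
        uint32_t rem = u & mask;
        u &= ~mask;
        if (rem > halfway || (rem == halfway && (u & (1u << drop))))
            u += 1u << drop;                 /* a carry into the exponent is still correct */
    }
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Mantissa bits that survive when API data feeds a shader variable. */
int effective_bits(int api_bits, int shader_bits)
{
    return api_bits < shader_bits ? api_bits : shader_bits;
}
```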

As for option 2, there's no runtime shader cost here: two half-float values fit exactly into a 4-byte boundary, so there's no issue. The hardware can easily deal with this, and the conversion is handled extremely efficiently. The performance guideline is there to warn against something like a 3-byte unit of data (e.g. an interleaved half and byte), which would cause misaligned data.
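As an illustration of keeping 16-bit data on 4-byte boundaries, here is a hypothetical interleaved layout: three 16-bit position components plus one padding short, followed by two 16-bit texture coordinates, for a 12-byte stride. The C11 _Static_assert checks catch accidental layout changes at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex: every attribute starts on a 4-byte boundary. */
typedef struct {
    uint16_t position[3];   /* 3 x 16-bit = 6 bytes */
    uint16_t pad;           /* 2 bytes so the next attribute starts at offset 8 */
    uint16_t texcoord[2];   /* 2 x 16-bit = exactly 4 bytes */
} PackedVertex;

_Static_assert(sizeof(PackedVertex) == 12, "stride should be 12 bytes");
_Static_assert(offsetof(PackedVertex, texcoord) % 4 == 0,
               "texcoord should be 4-byte aligned");

/* With a VBO bound, the attribute setup would look like this (not compiled here):
 *   glVertexAttribPointer(posIndex, 3, type, normalized, sizeof(PackedVertex),
 *                         (const void *)offsetof(PackedVertex, position));
 *   glVertexAttribPointer(uvIndex, 2, type, normalized, sizeof(PackedVertex),
 *                         (const void *)offsetof(PackedVertex, texcoord));
 * where type/normalized would be GL_HALF_FLOAT_OES/GL_FALSE if that extension
 * is present, or GL_SHORT/GL_TRUE for normalised fixed-point data. */
```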

The best thing I can suggest is to submit a bug report to Apple requesting the half-float feature; otherwise, I'm afraid you'll just have to use either full floats or convert to fixed point. Sorry I can't be of more help here! :(

And a quick note about floating-point calculations: all the float precisions are quick and well optimised; integer types are the slow ones.

Regards,
Tobias