Number of Available Uniforms?

I know the OGLESv2 spec specifies a minimum of 128 uniform vectors for the vertex shader, and this is the value the emulator spits out. Is this the value used in the TI OMAP3530 SGX driver?

Basically I'm reimplementing some commonly used functionality from OGL 2.0 back into ES 2.0, and my fixed function uber shader is likely to exceed 128 uniforms. So I'll either have to pack multiple values into one uniform location or split the shader, both of which will require me to store uniform values client side...

Another question: does the OGLESv1.1 driver use the NEON extensions of the ARM Cortex-A8 for things like matrix transformations?

Adventus wrote:

I know the OGLESv2 spec specifies a minimum of 128 uniform vectors for the vertex shader, and this is the value the emulator spits out. Is this the value used in the TI OMAP3530 SGX driver?

Basically I'm reimplementing some commonly used functionality from OGL 2.0 back into ES 2.0, and my fixed function uber shader is likely to exceed 128 uniforms. So I'll either have to pack multiple values into one uniform location or split the shader, both of which will require me to store uniform values client side...


Yes, the maximum is 128 uniform vectors. Whether splitting the shader or using uniforms for multiple purposes is better depends on the shader code. What functionality are you trying to replicate?



Quote:
Another question: does the OGLESv1.1 driver use the NEON extensions of the ARM Cortex-A8 for things like matrix transformations?

This is really a question for TI but I suspect the answer is no.

Quote:
Yes, the maximum is 128 uniform vectors. Whether splitting the shader or using uniforms for multiple purposes is better depends on the shader code. What functionality are you trying to replicate?
Yeah, I figured at some point I would need to store everything client side anyway, so I've already done that and it didn't appear to produce too much overhead. Basically I'm trying to implement as much of OGL 2.0 as I can, so in the shader I currently have "to OGL spec" alpha test, materials/lighting, texture coordinate generators, fog, multitexturing and clip planes. Storing the attenuation factors as a vec3 instead of a float[3] saved me 16 uniforms; likewise I gained another 8 by combining the SpotExponent and SpotCutoff variables. I think I'll be able to get away with one shader, but would there be any performance advantages in splitting it?

Quote:
This is really a question for TI but I suspect the answer is no.
Oh OK. In that case my matrix transformations will probably outperform the OGLESv1.1 ones; the difference between NEON and VFPlite is pretty dramatic.
Adventus wrote:
Yeah, I figured at some point I would need to store everything client side anyway, so I've already done that and it didn't appear to produce too much overhead. Basically I'm trying to implement as much of OGL 2.0 as I can, so in the shader I currently have "to OGL spec" alpha test, materials/lighting, texture coordinate generators, fog, multitexturing and clip planes. Storing the attenuation factors as a vec3 instead of a float[3] saved me 16 uniforms; likewise I gained another 8 by combining the SpotExponent and SpotCutoff variables. I think I'll be able to get away with one shader, but would there be any performance advantages in splitting it?

Does that mean you have a single fragment shader as well?

Quote:
Does that mean you have a single fragment shader as well?
Yeah, that's correct. Is there a problem with this?

I'm hoping the shader compiler does some good static optimisation; otherwise I'll have to split the shaders.


Adventus wrote:
Yeah, that's correct. Is there a problem with this?

I'm hoping the shader compiler does some good static optimisation; otherwise I'll have to split the shaders.

It's massively less efficient than an optimized fragment shader for every state combination. Splitting is essential here. However, you may be able to generate lots of shaders from a single source using the preprocessor.

Quote:
It's massively less efficient than an optimized fragment shader for every state combination. Splitting is essential here.
Yeah, I'm beginning to realise that. Simply having the alpha test code in there but disabled significantly reduces my fps. I also noticed that the sheer size of my fragment shader code can destroy performance...

I presume the 64 fragment uniform vectors the emulator spits out is the same as on actual hardware?

Quote:
However you may be able to generate lots of shaders from a single source using the preprocessor.
Ahh, that's a good idea, I didn't think of that. Is it possible to pass defines to the shader compiler from your application? Or will I have to modify the shader binary/source from the app...
Adventus wrote:
I presume the 64 fragment uniform vectors the emulator spits out is the same as on actual hardware?

Yes, for this specific hardware and drivers.



Quote:
However you may be able to generate lots of shaders from a single source using the preprocessor.
Ahh, that's a good idea, I didn't think of that. Is it possible to pass defines to the shader compiler from your application? Or will I have to modify the shader binary/source from the app...

You can't modify shader binaries, but for source it's a simple case of putting '#define's in front of your source. Note that you can pass multiple strings to glShaderSource which will be concatenated.

>> Another question: does the OGLESv1.1 driver use the NEON extensions of the ARM Cortex-A8 for things like matrix transformations?

We do not write intrinsics today, but we do use the FPU option in the compiler. Ensure that you are on the latest OMAP3 SDK.


We would be interested in the results of various operations on the shader.

>> We do not write intrinsics today, but we do use the FPU option in the compiler. Ensure that you are on the latest OMAP3 SDK.
OK, that's promising; it would be a huge waste otherwise. I presume the ARM compiler has good NEON support and can vectorise pretty well. Since I've implemented my own OGL matrix transformation code (and CodeSourcery isn't so good with NEON), I'll probably have to write intrinsics or ASM to get decent performance.

PS: I'm not yet developing on actual hardware (I'm waiting for a Pandora). I'm just using the OGLES2.0 PowerVR emulator.

Me again. Sorry for the numerous questions, but it really helps.

I've shifted to a multiple program object design (using defines), but there appears to be a maximum number of program objects you can have (~256). This is nowhere near enough for the 2 clip plane states, 2 fog states, 7 types of alpha test and the 4 texture units each with 5 different environment types (6 states per unit counting disabled), and this isn't including the complicated GL_COMBINE mode: 2*2*7*(6^4) = 36288 fragment shaders. I'm going to have to compile shaders dynamically as I encounter them.

As I see it, I have two options:

1. Have a single program object and relink the fragment shader each time a state change occurs.

2. Have a program object for each fragment shader.

Which one will produce the best performance? Will my uniforms be maintained in (1) after relinking? They'll obviously need to be reuploaded in (2).

Thanks.

Just FYI - Xmas is away at the moment, but I'm sure he'll get back to you about this as soon as he can.

Adventus wrote:

As I see it, I have two options:

1. Have a single program object and relink the fragment shader each time a state change occurs.

2. Have a program object for each fragment shader.

Which one will produce the best performance? Will my uniforms be maintained in (1) after relinking? They'll obviously need to be reuploaded in (2).

Thanks.


Definitely (2). Few applications use more than a few dozen state combinations for rendering at any time, so you won't need lots of program objects. Uniform values are not maintained over a glLinkProgram call.