Number of Available Uniforms?

I know the OGLESv2 spec specifies a minimum of 128 uniform vectors for the vertex shader, and this is the value the emulator spits out. Is this the value used in the TI OMAP3530 SGX driver?

Basically I'm reimplementing some commonly used functionality from OGL 2.0 back into ES 2.0, and my fixed function uber shader is likely to exceed 128 uniforms. So I'll either have to pack multiple values into one uniform location or split the shader, both of which will require me to store uniform values client side...

Another question: does the OGLESv1.1 driver use the NEON extensions of the ARM Cortex-A8 for things like matrix transformations?

Adventus wrote:

I know the OGLESv2 spec specifies a minimum of 128 uniform vectors for the vertex shader, and this is the value the emulator spits out. Is this the value used in the TI OMAP3530 SGX driver?

Basically I'm reimplementing some commonly used functionality from OGL 2.0 back into ES 2.0, and my fixed function uber shader is likely to exceed 128 uniforms. So I'll either have to pack multiple values into one uniform location or split the shader, both of which will require me to store uniform values client side...


Yes, the maximum is 128 uniform vectors. Whether splitting the shader or using uniforms for multiple purposes is better depends on the shader code. What functionality are you trying to replicate?



Quote:
Another question: does the OGLESv1.1 driver use the NEON extensions of the ARM Cortex-A8 for things like matrix transformations?

This is really a question for TI but I suspect the answer is no.

Quote:
Yes, the maximum is 128 uniform vectors. Whether splitting the shader or using uniforms for multiple purposes is better depends on the shader code. What functionality are you trying to replicate?
Yeah, I figured at some point I would need to store everything client side anyway, so I've already done that and it didn't appear to produce too much overhead. Basically I'm trying to implement as much of OGL 2.0 as I can, so in the shader I currently have "to OGL spec" alpha test, materials/lighting, texture coordinate generators, fog, multitexturing and clip planes. Storing the attenuation factors as a vec3 instead of a float[3] saved me 16 uniforms; likewise I gained another 8 by combining the SpotExponent and SpotCutoff variables. I think I'll be able to get away with one shader, but would there be any performance advantages in splitting it?

Quote:
This is really a question for TI but I suspect the answer is no.
Oh OK. In that case my matrix transformations will probably outperform the OGLESv1.1 ones; the difference between NEON and VFPlite is pretty dramatic.
Adventus wrote:
Yeah, I figured at some point I would need to store everything client side anyway, so I've already done that and it didn't appear to produce too much overhead. Basically I'm trying to implement as much of OGL 2.0 as I can, so in the shader I currently have "to OGL spec" alpha test, materials/lighting, texture coordinate generators, fog, multitexturing and clip planes. Storing the attenuation factors as a vec3 instead of a float[3] saved me 16 uniforms; likewise I gained another 8 by combining the SpotExponent and SpotCutoff variables. I think I'll be able to get away with one shader, but would there be any performance advantages in splitting it?

Does that mean you have a single fragment shader as well?

Quote:
Does that mean you have a single fragment shader as well?
Yeah, that's correct. Is there a problem with this?

I'm hoping the shader compiler does some good static optimisation; otherwise I'll have to split the shaders.


Adventus wrote:
Yeah, that's correct. Is there a problem with this?

I'm hoping the shader compiler does some good static optimisation; otherwise I'll have to split the shaders.

It's massively less efficient than an optimized fragment shader for every state combination. Splitting is essential here. However, you may be able to generate lots of shaders from a single source using the preprocessor.

Quote:
It's massively less efficient than an optimized fragment shader for every state combination. Splitting is essential here.
Yeah, I'm beginning to realise that. Simply having the alpha test code in there but disabled significantly reduces my fps. I also noticed that the sheer size of my fragment shader code can destroy performance...

I presume the 64 fragment uniform vectors the emulator spits out is the same as on actual hardware?

Quote:
However you may be able to generate lots of shaders from a single source using the preprocessor.
Ahh, that's a good idea, I didn't think of that. Is it possible to pass defines to the shader compiler from your application? Or will I have to modify the shader binary/source from the app...
Adventus wrote:
I presume the 64 fragment uniform vectors the emulator spits out is the same as on actual hardware?

Yes, for this specific hardware and drivers.



Quote:
However you may be able to generate lots of shaders from a single source using the preprocessor.
Ahh, that's a good idea, I didn't think of that. Is it possible to pass defines to the shader compiler from your application? Or will I have to modify the shader binary/source from the app...

You can't modify shader binaries, but for source it's a simple case of putting '#define's in front of your source. Note that you can pass multiple strings to glShaderSource which will be concatenated.

>> Another question: does the OGLESv1.1 driver use the NEON extensions of the ARM Cortex-A8 for things like matrix transformations?

We do not write intrinsics today, but we do use the FPU option in the compiler. Ensure that you are on the latest OMAP3 SDK.


We would be interested in the results of various operations on the shader.

>> We do not write intrinsics today, but we do use the FPU option in the compiler. Ensure that you are on the latest OMAP3 SDK.
OK, that's promising; it would be a huge waste otherwise. I presume the ARM compiler has good NEON support and can vectorise pretty well. Since I've implemented my own OGL matrix transformation code (and CodeSourcery isn't so good with NEON), I'll probably have to write intrinsics or ASM to get decent performance.

PS: I'm not yet developing on actual hardware (I'm waiting for a Pandora). I'm just using the OGLES2.0 PowerVR emulator.

Me again. Sorry for the numerous questions, but it really helps.

I've shifted to a multiple program object design (using defines), but there appears to be a maximum number of program objects you can have (~256). This is nowhere near enough for the 2 clip plane states, 2 fog states, 7 types of alpha test and the 4 texture units each with 5 different environment types (6 states per unit counting disabled), and this isn't including the complicated GL_COMBINE mode: 2*2*7*(6^4) = 36288 fragment shaders. I'm going to have to compile shaders dynamically as I encounter them.

As I see it, I have two options:

1. Have a single program object and relink the fragment shader each time a state change occurs.

2. Have a program object for each fragment shader.

Which one will produce the best performance? Will my uniforms be maintained in (1) after relinking? They'll obviously need to be reuploaded in (2).

Thanks.

Just FYI - Xmas is away at the moment, but I'm sure he'll get back to you about this as soon as he can.

Adventus wrote:

As I see it, I have two options:

1. Have a single program object and relink the fragment shader each time a state change occurs.

2. Have a program object for each fragment shader.

Which one will produce the best performance? Will my uniforms be maintained in (1) after relinking? They'll obviously need to be reuploaded in (2).

Thanks.


Definitely (2). Few applications use more than a few dozen state combinations for rendering at any time, so you won't need lots of program objects. Uniform values are not maintained over a glLinkProgram call.