Can UBOs be stored in registers?

Yes, that compiler compiles for one specific GPU. I wouldn’t expect the vertex and fragment stages to be consistent, as the two workloads might require different setups. I can ask around about what causes the difference and get back to you :slight_smile:

I’d say the difference is that you can expect the common store to be set up and good to go when the program starts executing. Texel fetches are dynamically served and have to go through the cache hierarchy.

That would be nice.

Basically, I’m faced with the decision of putting a lookup table either in a UBO (which resides in the sh registers / common store) or in a texture (whose accesses might also be backed by a cache), and I need to determine which would be faster.
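Concretely, the two options look roughly like this (an ES 3.0-style sketch; `LUT_SIZE`, `lut_data` and the binding point are placeholder names of mine):

```c
#include <GLES3/gl3.h>

#define LUT_SIZE 256  /* number of vec4 entries, placeholder value */
extern const float lut_data[LUT_SIZE * 4];

/* Option A: the lookup table as a UBO, which should end up in the
 * sh registers / common store if it fits. */
static GLuint make_lut_ubo(void)
{
    GLuint ubo;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, sizeof(lut_data), lut_data, GL_STATIC_DRAW);
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);  /* binding point 0 */
    return ubo;
}

/* Option B: the same table as a LUT_SIZE x 1 float texture, so every
 * lookup becomes a texel fetch through the cache hierarchy. */
static GLuint make_lut_texture(void)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, LUT_SIZE, 1, 0,
                 GL_RGBA, GL_FLOAT, lut_data);
    /* RGBA32F isn’t filterable in core ES 3.0, so fetch with NEAREST. */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    return tex;
}
```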

So there’s not much explanation for the difference; it’s simply how it works.

How big is that lookup table?
I’d test performance in either case!
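If it helps, here’s a rough sketch of how I’d time the two variants with GL timer queries. This assumes a current desktop GL 3.3+ context and a function loader such as glad (on ES you’d need GL_EXT_disjoint_timer_query and its EXT-suffixed entry points); `draw_with_ubo_lut()` and `draw_with_texture_lut()` are placeholders for your two code paths:

```c
#include <stdio.h>
#include <glad/glad.h>  /* or whatever GL function loader you use */

/* Placeholder helpers: each draws the same lookup-heavy scene, with the
 * table bound as a UBO or as a texture respectively. */
extern void draw_with_ubo_lut(void);
extern void draw_with_texture_lut(void);

static GLuint64 time_pass(void (*draw)(void))
{
    GLuint query;
    GLuint64 ns = 0;

    glGenQueries(1, &query);
    glBeginQuery(GL_TIME_ELAPSED, query);
    draw();
    glEndQuery(GL_TIME_ELAPSED);

    /* GL_QUERY_RESULT blocks until the GPU has finished the pass,
     * which is fine for a benchmark. */
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);
    glDeleteQueries(1, &query);
    return ns;
}

/* Call with a current GL context, ideally averaging over many frames. */
static void compare_lut_paths(void)
{
    printf("UBO LUT:     %llu ns\n", (unsigned long long)time_pass(draw_with_ubo_lut));
    printf("texture LUT: %llu ns\n", (unsigned long long)time_pass(draw_with_texture_lut));
}
```

One caveat on a tile-based GPU like Rogue: the elapsed time covers the whole deferred render pass, so make sure the lookup is the dominant cost in the shader, otherwise the difference will drown in everything else.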

Does that mean the common store is large enough to hold more than 256 vec4s of UBO data on PowerVR Series6 chips? Is that actually possible?

OK, this spec here for the G6230:
http://www.actions-semi.com/upload/file/20170728/6363680778367191661795441.pdf
lists a 128-bit × 1024 common store. Is there a rule in the compiler to fall back when this is exhausted?

@MartonTamas: Btw, can you find out what latency the common store has compared to the L0 texture cache?

Thanks!

When it says it has space for 512 or something, then you should be able to utilise that.

lists a 128-bit × 1024 common store. Is there a rule in the compiler to fall back when this is exhausted?

I think if you use more than the available common store, your SH reads will turn into DMA (LD) operations, which are quite a bit slower.

Btw, can you find out what latency the common store has compared to the L0 texture cache?

I don’t think I can find that out, but registers being registers, you can read them in the same cycle, i.e. no cost or 1 cycle.
Texture data needs to go through the cache hierarchy, so you’d potentially have to wait for it to arrive from system memory. If it’s already in the cache, L0 should be around 3-5 cycles, L1 around 10-20 cycles, and L2 around 20-40 cycles.
DRAM should be around 100 cycles or more, but I’d definitely write a test and run it on the target GPU that you’d use.

What puzzles me is that the number of available common store entries (1024) seems to be much larger than what is returned by GL_MAX_VERTEX_UNIFORM_VECTORS (256).

Also, my assumption was that sh registers are not actual registers but pseudo-registers allocated from the common store. That’s why I wanted to know the latency of the common store, since I assume that is also the latency of the sh “registers”.

That’s 1024 × 128 bits, precisely what the hardware has. We probably need to suballocate so that we can have multiple tiles in flight, etc. I don’t think it’s wasted :slight_smile:
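To put rough numbers on it (my own back-of-the-envelope arithmetic, not something from the spec sheet):

```
common store:       1024 entries × 128 bit = 16 KiB
GL uniform limit:    256 vec4    × 128 bit =  4 KiB
```

So the GL-visible uniform space is only about a quarter of the physical store; the remainder is presumably what gets suballocated so that multiple tiles can be in flight, as described above.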

no, they are registers :slight_smile:

OK, could you please explain this sentence to me again:

“Shared Registers are allocated from the Common Store and may be indexed.”

This is what gave me the idea that sh registers are not actual registers. Are you still sticking with the 1-cycle latency?

Also, is there a definitive way to calculate the available sh register memory using glGet commands?

Temporary and Attribute registers, for example, are allocated from the Unified Store, but they are still registers :slight_smile:
So yes, 1-cycle latency.
I’d say just grab the value that GL gives you. As you can see, you might not be able to use all 512 for some reason (e.g. the driver/compiler needs some space), but you should be able to use most of them.
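For completeness, grabbing those values is just a couple of glGet calls (standard ES 2.0+ queries, nothing PowerVR-specific; needs a current context):

```c
#include <stdio.h>
#include <GLES2/gl2.h>

/* Query the uniform storage the driver guarantees will fit in
 * registers / the common store; the hardware may physically have more,
 * as discussed above. */
static void print_uniform_limits(void)
{
    GLint vert_vec4 = 0, frag_vec4 = 0;
    glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, &vert_vec4);
    glGetIntegerv(GL_MAX_FRAGMENT_UNIFORM_VECTORS, &frag_vec4);

    /* Each vec4 is 128 bits = 16 bytes. */
    printf("vertex:   %d vec4 (%d bytes)\n", vert_vec4, vert_vec4 * 16);
    printf("fragment: %d vec4 (%d bytes)\n", frag_vec4, frag_vec4 * 16);
}
```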
