Hi!
I currently have a single input texture that needs to be filtered by three different shaders and written into three different output textures: a schoolbook example of MRTs. The problem is that OpenGL ES 2 does not expose MRTs. I'm developing for the iPad 2+ and iPhone 4+.
In my current implementation I have 3 different FBOs, each assigned to a unique output texture. I currently perform the rendering like this:
- Assign the input texture to texture unit 0
- Make FBO 1 the render target
- Activate shader 1
- Draw full screen quad
- Make FBO 2 the render target
- Activate shader 2
- Draw full screen quad
- Make FBO 3 the render target
- Activate shader 3
- Draw full screen quad
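The pass sequence above might look like this in GL ES 2; the handle names and the quad-drawing helper are placeholders for whatever the real code uses:

```c
#include <GLES2/gl2.h>

// Sketch of the three-pass approach. tex_input, fbo[i], prog[i] and
// draw_fullscreen_quad() are assumed to be set up elsewhere.
extern GLuint tex_input, fbo[3], prog[3];
extern void draw_fullscreen_quad(void);

void render_three_passes(void)
{
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, tex_input);  // stays bound for all passes

    for (int i = 0; i < 3; ++i) {
        glBindFramebuffer(GL_FRAMEBUFFER, fbo[i]);
        glUseProgram(prog[i]);
        draw_fullscreen_quad();
    }
}
```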
I have a hard time measuring where the performance goes, but my gut feeling says all this will become highly sequential, mainly because I'm reusing texture unit 0 all the time.
My question is: how can I make this as fast as possible? I have multiple ideas in my head, but I thought you guys might point me to the correct one directly. Here are my ideas:
First idea:
- Assign the input texture to texture unit 1,2 & 3
- Make FBO 1 the current render target
- Activate shader 1 (using texture unit 1)
- Draw full screen quad
- Make FBO 2 the current render target
- Activate shader 2 (using texture unit 2)
- Draw full screen quad
- Make FBO 3 the current render target
- Activate shader 3 (using texture unit 3)
- Draw full screen quad
Idea 2:
- Assign the input to texture unit 1
- Use a 2x bigger output texture attached to a single FBO
- Use conditional IFs in the shader to select what’s rendered into each quadrant of the output texture
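A rough GLSL sketch of idea 2; the effect functions and variable names here are hypothetical placeholders for the three real filters:

```glsl
precision mediump float;
uniform sampler2D u_input;
varying vec2 v_uv;   // 0..1 across the 2x-sized output texture

// Placeholder effects; the real filters would go here.
vec4 effect1(vec4 c) { return c; }
vec4 effect2(vec4 c) { return c.bgra; }
vec4 effect3(vec4 c) { return vec4(vec3(dot(c.rgb, vec3(0.299, 0.587, 0.114))), c.a); }

void main()
{
    vec2 local = fract(v_uv * 2.0);   // uv inside the current quadrant
    vec4 c = texture2D(u_input, local);
    if (v_uv.x < 0.5 && v_uv.y < 0.5)
        gl_FragColor = effect1(c);
    else if (v_uv.x >= 0.5 && v_uv.y < 0.5)
        gl_FragColor = effect2(c);
    else
        gl_FragColor = effect3(c);
}
```

Note that the three quadrants would still be shaded within a single sequential pass, and the dynamic branching may carry a cost of its own.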
I also have loose ideas about using a cube texture as output, but I'm not sure about that.
So, how do I “emulate” multiple render targets in the most efficient way? I would prefer to have parallelism here!
Thanks in advance!
Kind regards, Andreas Larsson
Hi,
The Series5 architecture processes a single render pass at a time. It cannot process multiple render passes in parallel.
Because of this, the approach you are already using should be suitable. Binding the texture to unit 0 for all render passes will not result in worse performance than binding the texture to a different unit for each render pass.
You are most likely bandwidth limited. If you have not done so already, you can reduce your bandwidth overhead by compressing your input texture (e.g. PVRTC 2bpp or 4bpp).
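For reference, uploading pre-compressed PVRTC data uses glCompressedTexImage2D with the GL_IMG_texture_compression_pvrtc formats; a minimal sketch, where the data would come from a tool such as PVRTexTool:

```c
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>  // GL_COMPRESSED_RGB_PVRTC_4BPPV1_IMG etc.

// Sketch: upload pre-compressed PVRTC 4bpp data.
// Note that PVRTC textures on iOS must be square and power-of-two.
void upload_pvrtc_4bpp(GLuint tex, int w, int h,
                       const void *data, GLsizei size)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                           GL_COMPRESSED_RGB_PVRTC_4BPPV1_IMG,
                           w, h, 0, size, data);
}
```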
Additionally, if the input texture is static (i.e. you're not updating the contents of the texture, its filtering modes etc. at run-time) and the effect you are applying in a given render pass is static (i.e. the effect produces identical results each frame), then you could generate the results of the render passes off-line.
Regards,
Joe
Hi and thanks for a swift reply! 
My input texture is fully dynamic, re-rendered each frame. Three questions:
- Basically to improve performance, I need to reduce the resolution, simple as that?
- How can I measure how much time each pass takes? When I measure the time between one FBO switch and the next I always get 0 ms, whereas the whole frame takes quite some ms. Is everything batched towards the end of the frame by the driver, or what? Can I "flush" somehow?
- How would I use discard framebuffer in this case? E.g. the rendered input texture is of no further use once I've rendered the three outputs. If I do not discard, will the GPU do some copy? Can I get some performance back by discarding here?
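On the timing question: GL ES calls are queued rather than executed immediately, so a CPU timer around a pass measures almost nothing. Wrapping the pass in glFinish() forces the GPU to drain and gives a rough number. A sketch, for debugging only, since glFinish() stalls the pipelining the hardware depends on and therefore distorts real performance:

```c
#include <GLES2/gl2.h>
#include <sys/time.h>

// Wall-clock time in milliseconds.
static double now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

// Rough per-pass GPU time: drain queued work before and after the pass.
double time_pass_ms(void (*render_pass)(void))
{
    glFinish();            // finish previously queued work
    double t0 = now_ms();
    render_pass();
    glFinish();            // wait for this pass to complete
    return now_ms() - t0;
}
```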
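On the discard question: the relevant iOS extension is EXT_discard_framebuffer. Once the three output passes have sampled from the intermediate FBO, its contents can be marked as disposable so the driver need not write them back to memory. A sketch, with a placeholder handle name:

```c
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>  // glDiscardFramebufferEXT

// Hint that the bound FBO's color contents (and depth, if a depth
// buffer is attached) can be thrown away instead of preserved.
void discard_input_fbo(GLuint fbo_input)
{
    const GLenum attachments[] = {
        GL_COLOR_ATTACHMENT0, GL_DEPTH_ATTACHMENT
    };
    glBindFramebuffer(GL_FRAMEBUFFER, fbo_input);
    glDiscardFramebufferEXT(GL_FRAMEBUFFER, 2, attachments);
}
```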
Kind regards, Andreas
Not sure if my follow-up questions got hidden from you, since I accidentally checked "answered" after posting the follow-ups…