Render to Buffer - Slow performance & Error

Hi all,



Following up on my recent post “glDrawArrays() issue with varying Texture Source”, I’ve tried to implement image rotation using multiple FBOs. With desktop OpenGL on a PC I get about 50% better performance, but with OpenGL ES on my OMAP 3 device there is no performance improvement. Pardon me if I’ve made any silly mistakes.

Here are the problems I’m facing.

  1. Error when attaching the texture objects (temporarily worked around by generating the textures again during the draw pass).
  2. No performance improvement.





Here is my code snippet. Kindly give me your suggestions.

```cpp
bool OGLES2Texturing::InitView(){
    LoadShaders();
    // Sets the clear color
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    LoadVbos();         // Load VBOs to draw a quad with textures
    LoadTextures();     // Generate 3 texture objects
    LoadImageBuffers(); // Load the source image buffer (RGBA)
    LoadFBO();
    DrawOffScreen(0.6);
    return true;
}
void OGLES2Texturing::LoadTextures(){ // Generate 3 texture objects
    glGenTextures(3, m_TextureId);
    for (int i = 0; i < 3; i++) {
        glBindTexture(GL_TEXTURE_2D, m_TextureId[i]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, g_ImageWidth, g_ImageHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    }
    glBindTexture(GL_TEXTURE_2D, 0);
}
void OGLES2Texturing::LoadFBO(){ // Generate 3 framebuffer objects
    glGenFramebuffers(3, m_auiFboId);
    for (int Index = 0; Index < 3; Index++) {
        glBindFramebuffer(GL_FRAMEBUFFER, m_auiFboId[Index]);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, m_TextureId[Index], 0);
        glClear(GL_COLOR_BUFFER_BIT);
    }
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}
bool OGLES2Texturing::DrawOffScreen(GLfloat fAngle)
{
    PVRTMat4 mRot, mTrans, mMVP;
    mTrans = PVRTMat4::Translation(0, 0, 0);
    mRot = PVRTMat4::RotationZ(fAngle);
    mMVP = mTrans * mRot;
    for (int Index = 0; Index < 3; Index++) {
        glBindFramebuffer(GL_FRAMEBUFFER, m_auiFboId[Index]);
        glViewport(0, 0, g_ImageWidth, g_ImageHeight);
        glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);

        glUseProgram(m_uiProgramObject);

        glBindBuffer(GL_ARRAY_BUFFER, m_ui32VboId);
        glEnableVertexAttribArray(VERTEX_ARRAY);
        glVertexAttribPointer(VERTEX_ARRAY, 3, GL_FLOAT, GL_FALSE, m_ui32VertexStride, 0);

        glEnableVertexAttribArray(TEXCOORD_ARRAY);
        glVertexAttribPointer(TEXCOORD_ARRAY, 2, GL_FLOAT, GL_FALSE, m_ui32VertexStride, (void*)(3 * sizeof(GLfloat)));

        // (1) If I don't generate the textures again here, the output screen is left empty,
        // so I've added this line again. Correct me if I did anything wrong.
        glGenTextures(3, m_TextureId);

        glBindTexture(GL_TEXTURE_2D, m_TextureId[Index]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, g_ImageWidth, g_ImageHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, SourceImageBuffer + (g_ImageSize * Index));
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

        int i32Location = glGetUniformLocation(m_uiProgramObject, "myPMVMatrix");
        glUniformMatrix4fv(i32Location, 1, GL_FALSE, mMVP.ptr());

        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    }
    GLubyte *pOutTexData = new GLubyte[g_ImageSize * g_No_of_Frames];
    memset(pOutTexData, 0, g_ImageSize * g_No_of_Frames);
    for (int Index = 0; Index < 3; Index++) {
        glBindFramebuffer(GL_FRAMEBUFFER, m_auiFboId[Index]);
        glReadPixels(0, 0, g_ImageWidth, g_ImageHeight, GL_RGBA, GL_UNSIGNED_BYTE, pOutTexData + (g_ImageSize * Index));
    }
    Write_pOutTexData_to_File();
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    return true;
}
```
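The snippet packs several RGBA frames back to back in one buffer and finds frame `Index` at `SourceImageBuffer + g_ImageSize*Index`. Below is a minimal sketch of that offset arithmetic, assuming `g_ImageSize` means bytes per frame (width × height × 4 for `GL_RGBA`/`GL_UNSIGNED_BYTE`); `FrameBytes` and `FramePtr` are hypothetical helper names, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes per pixel for GL_RGBA + GL_UNSIGNED_BYTE.
constexpr std::size_t kBytesPerPixelRGBA = 4;

// Hypothetical helper: size of one tightly packed RGBA8 frame. Each row is
// width * 4 bytes, always a multiple of the default pack/unpack alignment
// of 4, so rows carry no padding.
constexpr std::size_t FrameBytes(std::size_t width, std::size_t height) {
    return width * height * kBytesPerPixelRGBA;
}

// Hypothetical helper: pointer to frame `index` inside a buffer that holds
// several frames back to back.
inline std::uint8_t* FramePtr(std::vector<std::uint8_t>& frames,
                              std::size_t width, std::size_t height,
                              std::size_t index) {
    const std::size_t bytes = FrameBytes(width, height);
    assert((index + 1) * bytes <= frames.size());  // stay inside the buffer
    return frames.data() + index * bytes;
}
```

For three VGA frames, `std::vector<std::uint8_t> out(FrameBytes(640, 480) * 3);` allocates 3,686,400 zero-initialised bytes (so no separate `memset` is needed) and frees itself, unlike the raw `new GLubyte[...]`; `FramePtr(out, 640, 480, Index)` would then serve as the `glReadPixels` destination.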

Hi Balaji,



From looking at your code, I can spot a fairly serious error that’s probably causing the problems. When you call glGenTextures in the middle of your render loop, you’re generating three new, undefined textures on every pass through the loop. You were getting problems before because you were writing to AND reading from the same textures at the same time: each FBO’s color attachment was also the texture you sampled while drawing into it, which gives undefined results. That you get any output at all is slightly surprising :)!



What you need to do is upload your three source textures to read from, but generate three DIFFERENT textures to use as the framebuffer attachments. After that, remove the glGenTextures call from the render loop. With that fixed, it should hopefully work and be faster.
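The fix boils down to keeping two separate sets of texture names: one that is only ever sampled (the uploaded frames) and one that is only ever attached to an FBO. Below is a GL-free sketch of that bookkeeping, assuming plain `unsigned` values stand in for the names `glGenTextures` returns; `RenderPass`, `SamplesItsOwnTarget` and `BuildPasses` are hypothetical names used only to make the rule explicit. In the first snippet, every pass sampled `m_TextureId[Index]` while that same texture was the color attachment of the bound FBO, which is exactly the case the check flags.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kFrames = 3;

// Stand-in for a GLuint texture name returned by glGenTextures.
using TextureName = unsigned;

struct RenderPass {
    TextureName sampled;   // bound with glBindTexture before glDrawArrays
    TextureName attached;  // GL_COLOR_ATTACHMENT0 of the bound FBO
};

// Sampling the texture you are rendering into is a feedback loop,
// which gives undefined results in OpenGL ES 2.0.
constexpr bool SamplesItsOwnTarget(const RenderPass& p) {
    return p.sampled == p.attached;
}

// Pair each uploaded source texture with a separate render-target texture.
std::array<RenderPass, kFrames> BuildPasses(
        const std::array<TextureName, kFrames>& sourceTextures,  // read from
        const std::array<TextureName, kFrames>& fboTextures) {   // written to
    std::array<RenderPass, kFrames> passes{};
    for (std::size_t i = 0; i < kFrames; ++i) {
        passes[i] = {sourceTextures[i], fboTextures[i]};
        assert(!SamplesItsOwnTarget(passes[i]));  // pools must not overlap per pass
    }
    return passes;
}
```

With `m_TextureId` as the source pool and a separate pool such as the `m_FBOTextureId` array used later in the thread as the attachments, no pass trips the check.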



There are some other changes I can suggest after you’ve fixed that issue, but it’s worth doing this first to see what difference that makes.



Let me know if you have any questions!

Tobias

Hi @Tobias!

Thank you very much for your valuable reply!

I’ve tried your render-to-texture approach and achieved better performance for a single frame (VGA image: ~100 ms before, ~75 ms now).



I’m looking forward to any other suggestions you have.



Thank you again for spending your valuable time.

Regards,

Balaji.R



Here is my corrected code snippet:

```cpp
InitView()
{
    LoadShaders();
    GenerateTextures();
    GenerateFBOTextures();
    LoadImageBuffers();
    LoadVbos();
    LoadFBO();
    DrawOffScreen(0.163);
    return true;
}
GenerateFBOTextures(){
    glGenTextures(3, m_FBOTextureId);
    for (int i = 0; i < 3; i++) {
        glBindTexture();
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, g_ImageWidth, g_ImageHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
        glTexParameterf(); ...
    }
}
LoadFBO(){
    ...
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, m_FBOTextureId[Index], 0);
    ...
}

DrawOffScreen(GLfloat fAngle)
{
    ...
    for (int Index = 0; Index < 3; Index++) {
        ...
        glBindFramebuffer(GL_FRAMEBUFFER, m_auiFboId[Index]);
        glBindTexture(GL_TEXTURE_2D, m_TextureId[Index]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, g_ImageWidth, g_ImageHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, g_SourceImageBuffer + (g_ImageSize * Index)); ...
        DrawArrays();
    }
    for (int Index = 0; Index < 3; Index++) {
        glBindFramebuffer(GL_FRAMEBUFFER, m_auiFboId[Index]);
        glReadPixels(0, 0, g_ImageWidth, g_ImageHeight, GL_RGBA, GL_UNSIGNED_BYTE, pOutTexData + (g_ImageSize * Index));
    }
    Write_pOutTexData_to_File(); ...
}
```

Hi Balaji,



To be honest, your updated code is about as fast as you’re going to get for this type of operation; the only real way to get better performance now is to process more textures at once. Depending on how many you add, there are some other things that could make it a bit faster as well. If you stick with the three textures you have now, you’ve basically reached the best performance you can get. Let me know if you would consider adding more textures!



Thanks,

Tobias

Hi @Tobias!

Actually, I’m trying to process live video from a camera with OpenGL. I think on the camera driver side we can configure it to handle only 3-5 frames (max). If we wanted to process a stored video file, we could add more textures. Let’s assume I’m doing playback with this method: could you suggest the maximum number of frames I can process in a single iteration?



Thanks,

Regards,

Balaji.R

Oh right, it’s live video! I didn’t realise that. In that case, what OS are you running on? There might be a much more efficient way of processing it.

@Tobias

I’m currently targeting Windows CE 7. I look forward to hearing from you.



Regards,

Balaji.R

Hi Balaji,



I’m afraid I was at a conference all last week; sorry it’s taken so long to get back to you!



Unfortunately, you’ve named one of the OSes where there’s no native mechanism to do what I was hoping. On Android, for instance, there are ways to avoid various data copy/transfer operations: the camera can write directly to GPU-accessible memory. WinCE doesn’t have any such niceties, I’m afraid :(



Regardless, if you’re processing live video and can pump more textures in at once, you can still get a speed boost. Instead of your current “upload all the textures, do all the renders, then do all the reads”, you should interleave them. You essentially need a delay of maybe 10 textures between the uploads and the renders, and then again between the renders and the reads. The point is that glReadPixels has to wait until the GPU has finished rendering the frame it reads, so reading back a frame that was drawn well earlier lets the GPU keep working instead of stalling on every frame. 10 is an entire guess at the appropriate figure; you might get better performance with more delay, so I’d make it configurable and tweak accordingly. Whatever the delay is, you’ll need that many framebuffers and twice that many textures: for a delay of 10, you’ll need 10 FBOs, 10 texture objects to upload to, and 10 texture objects to attach to the framebuffers.



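In case that isn’t clear, here’s the order you’re going for. At each step k you read back frame k-20, draw frame k-10, then upload frame k, so every texture or framebuffer is finished with before the next frame reuses it:

Upload 1
Upload 2
…
Upload 10
Draw 1
Upload 11 // Draw 1 is done with upload texture 1, so you can start re-using texture handles at this point.
Draw 2
Upload 12
…
Draw 10
Upload 20
Read 1 // Frame 1 is read back before its framebuffer is reused.
Draw 11 // You can start re-using framebuffers now.
Upload 21
Read 2
Draw 12
Upload 22
… // Many frames later, at the end of the video…
Read n-20
Draw n-10
Upload n // Final upload
Read n-19
Draw n-9
…
Read n-10
Draw n // Final draw
Read n-9
…
Read n // Final read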



Hope that makes sense… you’re essentially setting up a circular buffer of objects here. I’d also suggest putting a glFlush after each draw. I’m not 100% sure it will do anything on the WinCE drivers, but if it does, it should help regulate the frame rate. If you implement this and see a lot of stuttering, it means it isn’t working and we’ll try something else. Either way, this should in theory make things go a lot faster than your current version.
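The circular-buffer order can be generated rather than written out by hand. Below is a minimal, GL-free sketch, assuming frames numbered 1..n and the same delay D on both sides; `Op` and `BuildSchedule` are hypothetical names. Each step issues Read k-2D, then Draw k-D, then Upload k, so a slot is always consumed (drawn from, or read back) before the frame that reuses it writes into it; the `assert`s track slot ownership to check exactly that. In real code each op would wrap the corresponding `glTexImage2D`, `glDrawArrays` (into that frame's FBO) or `glReadPixels` calls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Op {
    std::string kind;   // "Upload", "Draw" or "Read"
    std::size_t frame;  // 1-based frame number
};

// Hypothetical scheduler: upload frame k, draw it `delay` steps later and
// read it back `delay` steps after that, using `delay` slots per pool.
std::vector<Op> BuildSchedule(std::size_t frames, std::size_t delay) {
    assert(delay >= 1);
    std::vector<Op> ops;
    // 0 = slot free; otherwise the frame currently occupying the slot.
    std::vector<std::size_t> uploadSlot(delay, 0), fboSlot(delay, 0);

    for (std::size_t k = 1; k <= frames + 2 * delay; ++k) {
        if (k > 2 * delay && k - 2 * delay <= frames) {  // Read k-2D
            const std::size_t f = k - 2 * delay;
            assert(fboSlot[(f - 1) % delay] == f);
            fboSlot[(f - 1) % delay] = 0;  // framebuffer free for reuse
            ops.push_back({"Read", f});
        }
        if (k > delay && k - delay <= frames) {  // Draw k-D
            const std::size_t f = k - delay;
            assert(uploadSlot[(f - 1) % delay] == f);
            assert(fboSlot[(f - 1) % delay] == 0);  // never overwrite unread output
            uploadSlot[(f - 1) % delay] = 0;        // upload texture free again
            fboSlot[(f - 1) % delay] = f;
            ops.push_back({"Draw", f});
        }
        if (k <= frames) {  // Upload k
            assert(uploadSlot[(k - 1) % delay] == 0);  // never overwrite undrawn input
            uploadSlot[(k - 1) % delay] = k;
            ops.push_back({"Upload", k});
        }
    }
    return ops;
}
```

For example, `BuildSchedule(5, 2)` yields Upload 1, Upload 2, Draw 1, Upload 3, Draw 2, Upload 4, Read 1, Draw 3, Upload 5, Read 2, Draw 4, Read 3, Draw 5, Read 4, Read 5: frame 1 is read back just before frame 3 is drawn into the same framebuffer slot. Walking the list and issuing the matching GL calls (with an optional glFlush after each draw) gives the interleaved order with a configurable delay.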



Regards,

Tobias



P.S. Sorry for the long-winded explanation! It kind of needed it, I’m afraid.