glDrawArrays() issue with varying Texture Source

Hi all,

I’m working with OpenGL ES 2.0 on an OMAP3530 development board running Windows CE 7.

My task is to load a 24-bit image file, rotate it by an angle about the z-axis, and export the result as an image buffer.

For this I’ve created an FBO for off-screen rendering, loaded the image file as a texture with glTexImage2D(), applied the texture to a quad, rotated the quad with the PVRTMat4::RotationZ() API, and read the result back with glReadPixels(). Since it is a single-frame process, I run the loop only once.

Here are the problems I’m facing now.

  1. Every API call takes a different amount of time on each run; whenever I run my application I get different processing times for all of them.

  2. glDrawArrays() takes too much time (~50 ms - 80 ms).

  3. glReadPixels() also takes too much time: ~95 ms for an 800x600 image.

  4. Loading a 32-bit image is much faster than loading a 24-bit image, so a conversion is needed.
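
For point 4, here is a minimal sketch of the 24-bit to 32-bit expansion on the CPU. It assumes tightly packed RGB rows with no padding (BMP-style rows padded to 4 bytes would need an extra stride parameter), and `ExpandRgbToRgba` is a hypothetical helper name, not an SDK function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands tightly packed 24-bit RGB pixels to 32-bit RGBA with opaque alpha.
// Assumption: the source has exactly width * height * 3 bytes (no row padding).
std::vector<unsigned char> ExpandRgbToRgba(const std::vector<unsigned char>& rgb,
                                           std::size_t width, std::size_t height)
{
    const std::size_t pixelCount = width * height;
    assert(rgb.size() == pixelCount * 3);

    std::vector<unsigned char> rgba(pixelCount * 4);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        rgba[4 * i + 0] = rgb[3 * i + 0]; // R
        rgba[4 * i + 1] = rgb[3 * i + 1]; // G
        rgba[4 * i + 2] = rgb[3 * i + 2]; // B
        rgba[4 * i + 3] = 0xFF;           // A: fully opaque
    }
    return rgba;
}
```

The address of the first element of the returned vector could then be passed as pTexData to glTexImage2D() with GL_RGBA and GL_UNSIGNED_BYTE.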

If anybody has faced or solved a similar problem, kindly share any suggestions.

Here is the code snippet from my application.

    void BindTexture()
    {
        glGenTextures(1, &m_uiTexture);
        glBindTexture(GL_TEXTURE_2D, m_uiTexture);
        // Upload the 32-bit source image as the texture
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, ImageWidth, ImageHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, pTexData);
    }

    int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, TCHAR *lpCmdLine, int nCmdShow)
    {
        // Fragment and vertex shader code
        const char* pszFragShader = "Same as in RenderToTexture sample";
        const char* pszVertShader = "Same as in RenderToTexture sample";

        CreateWindow(ImageWidth, ImageHeight); // For this I referred to the OGLES2HelloTriangle_Windows.cpp example

        // Generate and bind the framebuffer and renderbuffer (pseudocode)
        GenerateAndBindFrameAndRenderBuffers();
        // Attach the colour target. As posted this passes m_auiFbo, but the 4th argument is expected to be a texture name.
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, m_auiFbo, 0);
        glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT16, ImageWidth, ImageHeight);
        glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, m_auiDepthBuffer);

        GLfloat Angle = 0.02f;
        GLfloat afVertices[] = { /* vertices to draw a quad */ };
        glGenBuffers(1, &ui32Vbo);
        LoadVBOs(); // API calls to load the VBOs (pseudocode)

        // Draw the quad for 1 frame
        glBindFramebuffer(GL_FRAMEBUFFER, m_auiFbo);

        PVRTMat4 mRot, mTrans, mMVP;
        mTrans = PVRTMat4::Translation(0, 0, 0);
        mRot = PVRTMat4::RotationZ(Angle);
        glBindBuffer(GL_ARRAY_BUFFER, ui32Vbo);

        int i32Location = glGetUniformLocation(uiProgramObject, "myPMVMatrix");
        mMVP = mTrans * mRot;
        glUniformMatrix4fv(i32Location, 1, GL_FALSE, mMVP.ptr());

        // Pass the vertex data
        glVertexAttribPointer(VERTEX_ARRAY, 3, GL_FLOAT, GL_FALSE, m_ui32VertexStride, 0);
        // Pass the texture coordinates data
        glVertexAttribPointer(TEXCOORD_ARRAY, 2, GL_FLOAT, GL_FALSE, m_ui32VertexStride, (void*) (3 * sizeof(GLfloat)));

        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
        // Read the rendered image back from the FBO
        glReadPixels(0, 0, ImageWidth, ImageHeight, GL_RGBA, GL_UNSIGNED_BYTE, pOutTexData);

        glBindBuffer(GL_ARRAY_BUFFER, 0);
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
        eglSwapBuffers(eglDisplay, eglSurface);
        return 0;
    }



Hi Balaji,

Generally a lot of work has to be done to set up the render on our hardware, which is why you’re seeing quite a large overhead. There are two main things that you need to know about using a GPU for this.

  1. The GPU is asynchronous to the CPU. When you ask it to do work, it doesn’t do it immediately; it takes time to “accelerate”, if you will, but then runs really fast. Doing this sort of stop-start of a single render is going to be slow for this reason alone. When you call glReadPixels, you’re stalling the CPU whilst it waits for the GPU to finish.
  2. Transferring data between CPU and GPU accessible memory is slow. Mobile platforms have very limited bandwidth, which seriously slows these operations down, and unfortunately it’s not something that can be sped up.
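
Because of point 1, a timer wrapped around a single GL call tends to show when the driver happened to block rather than how long the GPU spent on that call, which may be part of why the per-call numbers vary between runs. A rough sketch of a profiling helper that forces a sync before stopping the clock is below. `TimeWithSyncMs` is a hypothetical name, and the code assumes a C++11 toolchain; on Windows CE you would probably substitute QueryPerformanceCounter for std::chrono.

```cpp
#include <chrono>

// Runs `work`, then `sync`, and returns the elapsed wall-clock time in ms.
// `sync` is meant to be a call that waits for the GPU (an assumption of this
// sketch), so the measured interval covers the queued work as well.
template <typename Work, typename Sync>
double TimeWithSyncMs(Work work, Sync sync)
{
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    work();
    sync();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

For example, `TimeWithSyncMs([]{ glDrawArrays(GL_TRIANGLE_STRIP, 0, 4); }, []{ glFinish(); })` would time the draw including its GPU execution. glFinish() itself stalls the pipeline, so this is only useful for profiling and should not stay in the final render loop.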

So the hardware is designed to run a lot of steps simultaneously, rather than a single render as you’ve described. One way to speed this up is simply to have multiple operations running in parallel, if that works for your app. If you’re doing it as part of a file conversion app, batching multiple files would be the best way to do this.

The second question to ask is: can you leave the texture in GPU memory rather than pulling it out? If you want this to be realtime, presumably you’re drawing to the screen anyway?

If you can explain your use case and what your targets are, I should be able to give you some more specific advice. Let me know if this makes sense/helps!



Thank you @Tobias for your kind reply.

Here is my situation.

1) Our device displays a video file on screen using DirectShow (without using the GPU/OpenGL).

2) I have to load each video frame onto the GPU, rotate it by a specific angle, and then send the frame back to the display device. Since the GPU is not handling the display process, I cannot display the image directly from the GPU.

3) So in my case it is also possible to post-process two consecutive frames.

4) I have to integrate my code as a DirectShow filter (next step).



Hi Balaji,

Ah ok so you’re using video, that makes more sense. Yes if you can process consecutive frames then you should be able to see a large performance increase. The trick here is to use multiple texture objects (3 or 4 would be ideal), and the same number of framebuffers. You then cycle through each one every frame, so that the GPU can parallelise effectively. Then don’t call glReadPixels in the frame that you draw to it - instead wait until the last minute to read from it before you draw to it - if you read it right before you bind it to draw to it again it’ll give the GPU time to complete the render and avoid most of the stalling otherwise associated with glReadPixels.

In other words, some pseudo code:

```
Create textures (1,2,3)
Create fbos (1,2,3)

frame1:
Upload to texture1
Draw to fbo1

frame2:
Upload to texture2
Draw to fbo2

frame3:
Upload to texture3
Draw to fbo3

frame4:
Read from fbo1(result of frame1)
Upload to texture1
Draw to fbo1

//And so on...
```

Each texture upload in this case would be a new frame of your video. This means you still have a couple of frames of delay, but you can keep processing frames at the same time.
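
To make the rotation concrete, here is a minimal C++ sketch of just the bookkeeping, with no GL calls. `FrameStep` and `PlanFrame` are hypothetical names for illustration, and frames are counted from 0 here, so frame 3 in the code corresponds to frame4 in the pseudocode.

```cpp
#include <cassert>
#include <cstddef>

// What to do with the texture/FBO ring on a given frame.
struct FrameStep {
    std::size_t slot;        // index of the texture/FBO pair used this frame
    bool        hasReadback; // true once this slot holds an earlier render
    std::size_t readFrame;   // frame whose result is read (valid if hasReadback)
};

// Plans one frame for a ring of `slotCount` texture/FBO pairs.
// Intended per-frame order: read back the old result from the slot (if any),
// then upload the new frame to the slot's texture, then draw into its FBO.
FrameStep PlanFrame(std::size_t frame, std::size_t slotCount)
{
    assert(slotCount > 0);
    FrameStep step;
    step.slot = frame % slotCount;
    step.hasReadback = frame >= slotCount;
    step.readFrame = step.hasReadback ? frame - slotCount : 0;
    return step;
}
```

In the render loop you would call PlanFrame once per frame; when hasReadback is set, bind that slot's FBO and call glReadPixels before uploading and drawing into the slot again. When the video ends, the results still sitting in the ring (up to slotCount frames) need one more read each.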

Hope that all makes sense?

Thanks,

Thank you for your kind answer, @Tobias. I’m working on it…