First time drawing into texture is slow

I’m using a framebuffer with several large (1280x720) 2D textures. The first time I draw into one of the textures, it takes 40-60 ms, but after that, drawing into the same texture is fast. How can I avoid these slow first accesses? I haven’t noticed this issue on a competing mobile GPU.



This is on a Galaxy Nexus (PowerVR SGX 540) with Android 4.2.2. My application deals with video frames, but below is a simple program that demonstrates the problem using only glClear(). Run the program and watch the Android LogCat output: the first glClear() calls take 40-60 ms and later ones take 0 ms. Tap the screen to draw again, or kill the program and restart it to see the 40-60 ms again. Thanks.



package com.example.gltest;

import javax.microedition.khronos.egl.EGLConfig;
import javax.microedition.khronos.opengles.GL10;

import android.app.Activity;
import android.opengl.GLES20;
import android.opengl.GLSurfaceView;
import android.os.Bundle;
import android.os.SystemClock;
import android.util.Log;
import android.view.View;

public class MainActivity extends Activity implements GLSurfaceView.Renderer {

    private static final String TAG = "MainActivity";

    private static final int NUM_TEXTURES = 10;

    // texture dimensions, fairly big
    private static final int WIDTH = 1280;
    private static final int HEIGHT = 720;

    private int mFramebuffer;
    private int[] mTextures = new int[NUM_TEXTURES];

    @Override
    public void onSurfaceCreated(GL10 gl, EGLConfig config) {
        int[] values = new int[1];

        GLES20.glGenFramebuffers(1, values, 0);
        checkGlError("glGenFramebuffers");

        mFramebuffer = values[0];

        GLES20.glGenTextures(mTextures.length, mTextures, 0);
        checkGlError("glGenTextures");

        for (int i = 0; i < mTextures.length; i++) {
            GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, mTextures[i]);
            checkGlError("glBindTexture");

            // define level-0 storage with no initial pixel data
            GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_RGBA,
                    WIDTH, HEIGHT, 0, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE,
                    null);
            checkGlError("glTexImage2D");

            GLES20.glTexParameterf(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_NEAREST);
            GLES20.glTexParameterf(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
            GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE);
            GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE);
            checkGlError("glTexParameter");

            GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, 0); // unbind
            checkGlError("glBindTexture");
        }

        Log.i(TAG, "onSurfaceCreated complete");
    }

    @Override
    public void onDrawFrame(GL10 gl) {
        GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, mFramebuffer);
        checkGlError("glBindFramebuffer");

        // do this 5 times
        for (int iterations = 0; iterations < 5; ++iterations) {
            // glClear each texture
            for (int i = 0; i < mTextures.length; ++i) {
                // attach texture i as the FBO's color buffer
                GLES20.glFramebufferTexture2D(GLES20.GL_FRAMEBUFFER,
                        GLES20.GL_COLOR_ATTACHMENT0, GLES20.GL_TEXTURE_2D,
                        mTextures[i], 0);
                checkGlError("glFramebufferTexture2D");

                // time how long glClear() blocks this thread
                long start = SystemClock.uptimeMillis();
                GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT);
                long end = SystemClock.uptimeMillis();
                checkGlError("glClear");

                Log.i(TAG, "clear[" + i + "]: " + (end - start) + " ms");

                // detach the texture again
                GLES20.glFramebufferTexture2D(GLES20.GL_FRAMEBUFFER,
                        GLES20.GL_COLOR_ATTACHMENT0, GLES20.GL_TEXTURE_2D,
                        0, 0);
                checkGlError("glFramebufferTexture2D");
            }
        }

        GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0); // unbind
        checkGlError("glBindFramebuffer");
    }

    @Override
    public void onSurfaceChanged(GL10 gl, int width, int height) {}

    private void checkGlError(String op) {
        int error;
        while ((error = GLES20.glGetError()) != GLES20.GL_NO_ERROR) {
            Log.e(TAG, op + ": glError " + error);
            throw new RuntimeException(op + ": glError " + error);
        }
    }

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        final GLSurfaceView view = new GLSurfaceView(getApplication());
        view.setEGLContextClientVersion(2);
        view.setRenderer(this);
        view.setRenderMode(GLSurfaceView.RENDERMODE_WHEN_DIRTY);
        view.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                view.requestRender();
            }
        });

        setContentView(view);
    }
}


Observed output:


onSurfaceCreated complete
clear[0]: 52 ms <-- first glClear() of texture 0 is slow
clear[1]: 97 ms <-- first glClear() of texture 1 is slow
clear[2]: 70 ms
clear[3]: 69 ms
clear[4]: 52 ms
clear[5]: 48 ms
clear[6]: 45 ms
clear[7]: 73 ms
clear[8]: 60 ms
clear[9]: 64 ms
clear[0]: 0 ms <-- second glClear() of texture 0 is fast
clear[1]: 0 ms <-- second glClear() of texture 1 is fast
clear[2]: 0 ms
clear[3]: 0 ms
clear[4]: 0 ms
clear[5]: 0 ms
clear[6]: 0 ms
clear[7]: 0 ms
clear[8]: 0 ms
clear[9]: 0 ms

Hi,



For optimal performance, we recommend binding each off-screen texture to its own FBO. Changing attachments will incur a cost.



Additionally, I would suggest deferring the glClear() operations until you have other draw operations to submit along with them. When you call glClear() on its own, you incur the base-level latency of submitting work to the GPU, which makes the clear look like an expensive operation; in fact, a clear is an extremely efficient operation on a tile-based deferred renderer (TBDR). If you instead issue the clear together with the draws that follow it, the base latency of the GPU render will be hidden.



Regards,

Joe

Thanks for your reply, but I don’t understand how it answers the question, because:


  1. Changing attachments does not seem to incur a large cost because:

    (a) In the output above, the later calls to glClear() take only “0 ms”, which is relatively fast and suggests that changing attachments is cheap. Only the initial calls to glClear() are very slow.

    (b) I modified the code to use an FBO for each texture (code is below), and the timing/output did not change. If changing attachments were a large cost, the timing should have improved, but it didn’t.


  2. My actual program does not use glClear(); it uses glDrawArrays() and a shader to draw a video frame into the texture. The sample code uses glClear() only because it is a simple way to reproduce the problem: the first use of a texture is very slow compared to subsequent uses. I don’t think glClear() is the problem in the sample code, because the base-level latency of submitting work to the GPU does not seem to be large: the later calls to glClear() take only “0 ms”.



    Thus, while I very much appreciate your time and reply, I don’t understand how it explains what is going on or how to deal with it.



    If your concern is that glClear() is not used properly in the sample, I’m open to suggestions on modifications to better demonstrate the problem. The problem that I’m really trying to deal with is: the first use of textures is very slow compared to subsequent usage.



    Thanks for your time. Below is the updated sample source code that uses an FBO for each texture instead of changing attachments.



package com.example.gltest;

import javax.microedition.khronos.egl.EGLConfig;
import javax.microedition.khronos.opengles.GL10;

import android.app.Activity;
import android.opengl.GLES20;
import android.opengl.GLSurfaceView;
import android.os.Bundle;
import android.os.SystemClock;
import android.util.Log;
import android.view.View;

public class MainActivity extends Activity implements GLSurfaceView.Renderer {

    private static final String TAG = "MainActivity";

    private static final int NUM_TEXTURES = 10;

    // texture dimensions, fairly big
    private static final int WIDTH = 1280;
    private static final int HEIGHT = 720;

    private int[] mFramebuffers = new int[NUM_TEXTURES];
    private int[] mTextures = new int[NUM_TEXTURES];

    @Override
    public void onSurfaceCreated(GL10 gl, EGLConfig config) {
        GLES20.glGenFramebuffers(mFramebuffers.length, mFramebuffers, 0);
        checkGlError("glGenFramebuffers");

        GLES20.glGenTextures(mTextures.length, mTextures, 0);
        checkGlError("glGenTextures");

        for (int i = 0; i < mTextures.length; i++) {
            GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, mTextures[i]);
            checkGlError("glBindTexture");

            // define level-0 storage with no initial pixel data
            GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_RGBA,
                    WIDTH, HEIGHT, 0, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE,
                    null);
            checkGlError("glTexImage2D");

            GLES20.glTexParameterf(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_NEAREST);
            GLES20.glTexParameterf(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
            GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE);
            GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D,
                    GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE);
            checkGlError("glTexParameter");

            GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, mFramebuffers[i]);
            checkGlError("glBindFramebuffer");

            // Attach texture i to FBO i (done once, at setup time)
            GLES20.glFramebufferTexture2D(GLES20.GL_FRAMEBUFFER,
                    GLES20.GL_COLOR_ATTACHMENT0, GLES20.GL_TEXTURE_2D,
                    mTextures[i], 0);
            checkGlError("glFramebufferTexture2D");

            GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0); // unbind
            checkGlError("glBindFramebuffer");

            GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, 0);
            checkGlError("glBindTexture");
        }

        Log.i(TAG, "onSurfaceCreated complete");
    }

    @Override
    public void onDrawFrame(GL10 gl) {
        // do this 5 times
        for (int iterations = 0; iterations < 5; ++iterations) {
            // glClear each texture
            for (int i = 0; i < mTextures.length; ++i) {
                GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, mFramebuffers[i]);
                checkGlError("glBindFramebuffer");

                // time how long glClear() blocks this thread
                long start = SystemClock.uptimeMillis();
                GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT);
                long end = SystemClock.uptimeMillis();
                checkGlError("glClear");

                Log.i(TAG, "clear[" + i + "]: " + (end - start) + " ms");

                GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0); // unbind
                checkGlError("glBindFramebuffer");
            }
        }
    }

    @Override
    public void onSurfaceChanged(GL10 gl, int width, int height) {}

    private void checkGlError(String op) {
        int error;
        while ((error = GLES20.glGetError()) != GLES20.GL_NO_ERROR) {
            Log.e(TAG, op + ": glError " + error);
            throw new RuntimeException(op + ": glError " + error);
        }
    }

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        final GLSurfaceView view = new GLSurfaceView(getApplication());
        view.setEGLContextClientVersion(2);
        view.setRenderer(this);
        view.setRenderMode(GLSurfaceView.RENDERMODE_WHEN_DIRTY);
        view.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                view.requestRender();
            }
        });

        setContentView(view);
    }
}


Observed output:


onSurfaceCreated complete
83 ms <-- first glClear() of texture 0 is slow
64 ms <-- first glClear() of texture 1 is slow
83 ms
59 ms
65 ms
59 ms
55 ms
54 ms
41 ms
68 ms
0 ms <-- second glClear() of texture 0 is fast
0 ms <-- second glClear() of texture 1 is fast
0 ms
0 ms
0 ms
0 ms
0 ms
0 ms
0 ms
0 ms
0 ms

Hi,



You’re right that my previous suggestions shouldn’t affect the test; that was an oversight on my part. Both are still relevant for optimal performance in complex renders, though, as they reduce driver overhead.



I’ve spoken to the driver team, and they’ve informed me that a backing surface is allocated when an attachment is used for the first time, which would explain the cost you’re seeing. This behaviour is specific to Series5 & Series5XT drivers. The process has been optimised in our Series6 drivers.
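For a sense of scale, the backing surface allocated on first use of each texture in the samples above is at least WIDTH x HEIGHT x 4 bytes, assuming GL_RGBA/GL_UNSIGNED_BYTE is stored as 4 bytes per texel. Any padding or tile alignment the driver adds is not modeled here, so the real figure may be larger. The Java below only does that arithmetic for the 1280x720 textures used in the test:

```java
// Rough lower bound on the backing-store size per texture.
// Assumption: 4 bytes per texel (GL_RGBA + GL_UNSIGNED_BYTE), no driver padding.
class BackingStoreSize {
    static final int BYTES_PER_TEXEL = 4;

    static long bytesPerTexture(int width, int height) {
        // widen to long before multiplying to avoid int overflow on big surfaces
        return (long) width * height * BYTES_PER_TEXEL;
    }

    public static void main(String[] args) {
        long one = bytesPerTexture(1280, 720);
        long ten = 10 * one; // NUM_TEXTURES in the samples above
        System.out.println("one texture:  " + one + " bytes ("
                + String.format("%.2f", one / (1024.0 * 1024.0)) + " MiB)");
        System.out.println("ten textures: " + ten + " bytes ("
                + String.format("%.2f", ten / (1024.0 * 1024.0)) + " MiB)");
    }
}
```

That is 3,686,400 bytes (about 3.5 MiB) per texture and 36,864,000 bytes (about 35 MiB) for the ten textures in the test, so each first use triggers a sizeable allocation.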



Regards,

Joe

Thanks for contacting the driver team and finding out the definitive cause; I appreciate it very much. Is there any workaround? Maybe there is a way to do the slow first accesses for multiple textures asynchronously, in parallel?



Presently I just make the user wait up front while the attachments are used for the first time (this can take 2-3 seconds at the start of video playback). The user experience isn’t great, but it is the best I have come up with so far (and it is better than having the slow accesses during video playback).



Thanks again.

You should be able to use a background GL thread (T#2) to offload the work from your main thread (T#1). We’ve never tried this, though, so we can’t say for certain how much of a difference it would make. As FBOs can’t be shared between contexts (and each thread needs its own context), you would have to:


  1. T#1: Create a background GL thread with its own EGL context that shares textures with the main context (backed by a 1x1 pbuffer, or by no surface if EGL_KHR_surfaceless_context is available)

  2. T#2: Create an FBO

  3. T#2: Bind a texture to it

  4. T#2: Call glClear() (the allocation will be done at this point, as the driver postpones allocation until the first operation is issued)

  5. T#2: Once the glClear() returns, send a signal to your main thread that the texture is available for use

  6. T#1: Attach the texture (whose backing store has now been allocated) to an FBO

  7. T#1: Reuse the texture attachment without any overhead :)
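The hand-off in these steps can be sketched in plain Java so that it runs off-device. This is a minimal model, not a GL implementation: the GL work is hidden behind a hypothetical warmUp callback (not an Android API) that, on a real device, would run on T#2 with a second EGL context sharing textures with T#1's context, and would create an FBO, attach the texture and call glClear(). A single-thread ExecutorService stands in for T#2, and a BlockingQueue carries the "texture ready" signal of step 5. Whether T#2 also needs a glFinish() before signaling, so that T#1's context sees the allocated texture, is an assumption to check on the target driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Models the T#1/T#2 hand-off only; no GL calls are made here.
class TextureWarmer {
    // Stands in for the background GL thread (T#2).
    private final ExecutorService glThread = Executors.newSingleThreadExecutor();
    // Carries the "texture is ready" signal from T#2 to T#1 (step 5).
    private final BlockingQueue<Integer> ready = new LinkedBlockingQueue<>();

    // Queues first-use work for every texture on T#2. `warmUp` is hypothetical:
    // on a device it would create an FBO, attach the texture and glClear() it
    // (steps 2-4) using T#2's shared EGL context.
    void warmAll(int[] textureIds, IntConsumer warmUp) {
        for (int id : textureIds) {
            glThread.execute(() -> {
                warmUp.accept(id);
                ready.add(id); // step 5: signal T#1
            });
        }
    }

    // Called on T#1: the next warmed texture id, or null after timeoutMs.
    Integer nextReady(long timeoutMs) throws InterruptedException {
        return ready.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        glThread.shutdown();
    }
}

class WarmUpDemo {
    public static void main(String[] args) throws InterruptedException {
        TextureWarmer warmer = new TextureWarmer();
        int[] textures = {1, 2, 3};
        warmer.warmAll(textures, id -> {
            // hypothetical: GL first-use work for texture `id` on T#2
        });

        // T#1: collect textures as they become ready (steps 6-7 would attach them).
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < textures.length; i++) {
            order.add(warmer.nextReady(1000));
        }
        warmer.shutdown();
        System.out.println(order);
    }
}
```

Because the single-thread executor runs tasks one at a time in submission order, the demo prints [1, 2, 3]; on a device, each id that T#1 takes from nextReady() is what steps 6 and 7 would attach to an FBO and reuse.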



There may be a more efficient/cleaner solution to your problem though. If you can explain your use case (e.g. why you're allocating the surfaces, what you're rendering to them), I can look into possible alternatives.

Thanks,
Joe

Thanks again for the continued replies. My use case is the following:



My Android app displays the frames of a video file out of order. To do this, there are two phases:



  1. Allocate a set of textures up front, which are reused repeatedly in phase 2.
  2. Use android.media.MediaCodec to decode some frames and draw them into those textures, then draw the textures to the actual window in a different order than they were decoded. Phase 2 repeats for many seconds or minutes.


This actually works pretty well and fast, except for the slow first draw into each texture. Earlier prototypes of my app accessed the textures for the first time in phase 2, but that dropped the phase 2 frame rate too much (until every texture had been accessed once). So I changed my app to access each texture for the first time in phase 1, so that phase 2 always runs at full speed.
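The phase 1 approach described above amounts to a loop that touches each pre-allocated texture once and reports progress as it goes. A minimal sketch, assuming a hypothetical firstUse callback that, on the GL thread, would bind that texture's FBO and issue its first glClear() or draw, and a progress callback that could drive the progress indicator (on Android the indicator update would have to be posted to the UI thread, for example with Activity.runOnUiThread):

```java
import java.util.function.DoubleConsumer;
import java.util.function.IntConsumer;

class PhaseOneWarmUp {
    // Touches each texture once so phase 2 never pays the first-use cost.
    // `firstUse` is hypothetical: on the GL thread it would bind the texture's
    // FBO and issue its first glClear() or draw. `progress` receives the
    // completed fraction in (0, 1], e.g. to update a progress indicator.
    static void run(int[] textureIds, IntConsumer firstUse, DoubleConsumer progress) {
        for (int i = 0; i < textureIds.length; i++) {
            firstUse.accept(textureIds[i]);
            progress.accept((i + 1) / (double) textureIds.length);
        }
    }

    public static void main(String[] args) {
        int[] textures = {10, 11, 12, 13};
        run(textures,
                id -> { /* hypothetical first-use GL work for texture `id` */ },
                p -> System.out.printf("warm-up %.0f%%%n", p * 100));
    }
}
```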

That works pretty well, except that phase 1 takes a while and I just show the user a spinning progress indicator during that time. That is OK, but I am wondering whether there is a reasonable way to improve on that user experience, or whether I should just live with it.

Thanks.