Hi guys,
= Short version =
The following code leaks memory on PowerVR GPUs and leads to a crash after 17 iterations on a Nexus S and 22 iterations on a Droid Razr (Android OS, mostly Java API):
while (true) {
    GLuint textureId;
    glGenTextures(1, &textureId);
    log("textureId: ", textureId);
    glBindTexture(GL_TEXTURE_2D, textureId);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 2048, 2048, 0,
                 GL_RGB, GL_UNSIGNED_SHORT_5_6_5, 0);
    drawSomethingUsingTheTexture(textureId);
    glDeleteTextures(1, &textureId);
}
Non-PowerVR GPU output:
textureId: 1
textureId: 1
textureId: 1
textureId: 1
…
textureId: 1
(keeps going like this)
PowerVR GPU output:
textureId: 70001
textureId: 140002
textureId: 210003
…
textureId: 1190017
glDrawArrays(GL_TRIANGLES, 0, 3) failed: 0x0505 (GL_OUT_OF_MEMORY)
= Long version =
I have two threads; let's call them the UI thread and the Upload thread. The UI thread does the rendering, should be fast and should never block; we don't care about the Upload thread's performance.
The way I do it is pretty straightforward: eglCreateContext() in the UI thread, pass the created context to the Upload thread and create a compatible shared context there. In the Upload thread, instead of eglCreateWindowSurface() I use eglCreatePbufferSurface() (1x1 pixel in size). Everything is great, eglMakeCurrent() is happy (no bad match or anything).
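For reference, the Upload thread setup looks roughly like this (a minimal sketch; display, config and uiContext are placeholders for whatever the UI thread created, not the actual code):

#include <EGL/egl.h>

/* Runs on the Upload thread. 'display', 'config' and 'uiContext' are
   assumed to come from the UI thread's EGL setup. */
EGLint ctxAttribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
EGLContext uploadContext =
    eglCreateContext(display, config, uiContext /* share group */, ctxAttribs);

/* A 1x1 pbuffer just so the context can be made current without a window. */
EGLint pbufAttribs[] = { EGL_WIDTH, 1, EGL_HEIGHT, 1, EGL_NONE };
EGLSurface pbuffer = eglCreatePbufferSurface(display, config, pbufAttribs);

eglMakeCurrent(display, pbuffer, pbuffer, uploadContext);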
The Upload thread then enters a loop in which it sits and waits for texture creation commands, and then passes the created texture ids (with all data uploaded to them) to the UI thread, and the UI thread is happy because it's running fast and doesn't block. On all devices except those with a PowerVR GPU, that is. Those devices aren't happy: they leak memory and crash.
In an attempt to reduce the test case, I got rid of the UI thread; there is only the main Activity thread, which sits in a loop and creates textures with null data until it crashes (the loop is shown in the Short Version section).
Interestingly, doing the same in a GLSurfaceView seems to work just fine, but the only difference between it and my code is that I'm using a pbuffer instead of a window surface, and that's probably not the reason. Or is it?
Not calling drawSomething(…) results in the texture not being uploaded until the first draw call, which blocks the UI thread. The PowerVR docs also explicitly say that I should do this warm-up draw. The method consists of a glDrawArrays(GL_TRIANGLES) call with a simple vertex/fragment shader pair attached.
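For completeness, drawSomething() is roughly the following (a sketch only; the 'program' handle and the uTex/aPos names are placeholders, not my actual code):

#include <GLES2/gl2.h>

/* Warm-up draw: one tiny triangle with the texture bound, so the driver
   actually touches the texture. 'program' is assumed to be a trivial
   textured vertex/fragment shader pair compiled elsewhere. */
static void drawSomethingUsingTheTexture(GLuint textureId, GLuint program)
{
    static const GLfloat verts[] = { 0.0f, 0.0f,  1.0f, 0.0f,  0.0f, 1.0f };

    glUseProgram(program);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, textureId);
    glUniform1i(glGetUniformLocation(program, "uTex"), 0);

    GLint pos = glGetAttribLocation(program, "aPos");
    glEnableVertexAttribArray(pos);
    glVertexAttribPointer(pos, 2, GL_FLOAT, GL_FALSE, 0, verts);

    glDrawArrays(GL_TRIANGLES, 0, 3);
}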
I’m probably doing something wrong, like expecting the hardware to behave in a way it isn’t guaranteed to behave. Any tips are appreciated!
I would also appreciate any tips regarding general GPU hardware behavior, even if they’re not related to this issue - I’m very much interested in various pitfalls and tricks that would help me understand the hardware better (and PowerVR GPUs are proving to be the most difficult to develop for so far, with lots and lots of confusing or annoying behavior).
Thanks a lot in advance!
One thing I found is that replacing the pbuffer with a window surface and adding eglSwapBuffers() helps, and there is no leak after that. It doesn’t work without eglSwapBuffers() however, leading me to believe that it’s rather a driver bug; at least I see no reason why resource usage should be tied to eglSwapBuffers() (as opposed to e.g. glFlush(), which doesn’t help btw).
As this is designed to be executed in the (background) Upload thread, I don’t have a window surface to draw to anyway, so this is not a solution at all. Ideally I would like not to render anything at all in this thread, but then I don’t see how else I can make sure that the created texture is fully constructed and won’t block the UI thread upon the first draw. I tried inserting fences, but they all report completion and the UI thread still blocks (for up to 20 frames, depending on texture size).
Any tips are appreciated, thanks!
Hi,
It sounds like you’re hitting two problems:
Single thread case
For other GPUs, the render may be kicked as soon as drawSomething() is called. Our driver will postpone work until the frame needs to be rendered, for example when eglSwapBuffers() is called. In your single-threaded case, it seems that the driver on your device is allocating more and more memory, but never kicking the render.
Multi-threaded case
Do you mean that you are calling drawSomething() in your Upload thread? Similarly to the case above, the driver will not render anything until it needs to. The best way to solve this is to call drawSomething() (with the texture bound) in your UI thread to force the texture upload. This should be better for performance as the texture data upload to the driver will be done in a separate thread.
For more information about the architecture, you should refer to our “PowerVR Performance Recommendations” and “SGX Architecture Guide for Developers” documents in the SDK and online here
Regards,
Joe
The best way to solve this is to call drawSomething() (with the texture bound) in your UI thread to force the texture upload. This should be better for performance as the texture data upload to the driver will be done in a separate thread.
I fail to connect the two points. Isn't the way I do it uploading texture data in a separate thread, too?
I call glDraw* in my Upload thread because I noticed that this is the only way to force texture upload on some GPUs (I'm not 100% sure right now but I believe this includes some PowerVR GPUs too), and calling "drawSomething() (with the texture bound) in your UI thread to force the texture upload" as you suggest results in significant stalls (up to 20 frames for large textures) which is simply unacceptable. I can try special-casing some GPUs and not making a draw call for them, but honestly I don't see a big reason for doing so.
In summary:
- every texture upload results in a significant stall without a glDraw* in the Upload thread
- everything is butter smooth with a glDraw* in the Upload thread, but it leaks memory like crazy
it seems that the driver on your device is allocating more and more memory, but never kicking the render.
What's interesting is that it does force the texture upload, though! The UI thread shows very smooth animation with the uploaded textures (but crashes after 15-18 uploads, having exhausted all the memory).
What I don't understand is why the render isn't kicked even after I call glFlush()/glFinish(). Am I understanding the OpenGL docs correctly in that glFlush() should flush all pending operations, and all of them MUST be completed by the time glFinish() returns? Perhaps they do complete (because my textures are fully uploaded, and there are no stalls or anything in the UI thread), but something somehow still references my textures?
It does however work if I create an EGLSyncKHR (with or without EGL_SYNC_FLUSH_COMMANDS_BIT_KHR), effectively fixing the leak and solving the problem *happyface*
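For anyone else hitting this, the workaround looks roughly like the following sketch (EGL_KHR_fence_sync entry points looked up via eglGetProcAddress; 'display' is assumed to be whatever EGLDisplay is current on the Upload thread):

#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Called on the Upload thread after the upload + warm-up draw. */
static void waitForUpload(EGLDisplay display)
{
    PFNEGLCREATESYNCKHRPROC createSync =
        (PFNEGLCREATESYNCKHRPROC) eglGetProcAddress("eglCreateSyncKHR");
    PFNEGLCLIENTWAITSYNCKHRPROC clientWaitSync =
        (PFNEGLCLIENTWAITSYNCKHRPROC) eglGetProcAddress("eglClientWaitSyncKHR");
    PFNEGLDESTROYSYNCKHRPROC destroySync =
        (PFNEGLDESTROYSYNCKHRPROC) eglGetProcAddress("eglDestroySyncKHR");

    EGLSyncKHR sync = createSync(display, EGL_SYNC_FENCE_KHR, NULL);
    /* Flush this context's commands and block until the GPU signals the fence. */
    clientWaitSync(display, sync, EGL_SYNC_FLUSH_COMMANDS_BIT_KHR, EGL_FOREVER_KHR);
    destroySync(display, sync);
}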
For more information about the architecture, you should refer to our "PowerVR Performance Recommendations" and "SGX Architecture Guide for Developers" documents in the SDK and online here
Read them both; that's exactly where the drawing-in-the-Upload-thread idea comes from (5.6.1 Texture Warm-Up).
Best regards,
Denis
The problem is that draw calls in your Upload thread will only be executed by the driver if the target surface is being updated. As you’re not issuing an eglSwapBuffers() in your Upload thread, the driver never executes this work.
Texture uploads are done in two stages:
- Copy from the address provided by the application to a buffer in the driver’s memory. This is a blocking operation.
- The driver then copies its buffer into the GPU’s memory. This copy is done asynchronously by the GPU, i.e. it does not block your UI thread.
Stage 1 is done in your Upload thread when glTexImage2D() is called. As this is a blocking operation, moving it into a second thread makes sense.
Stage 2 must be done by issuing a draw call on a thread that renders, otherwise the driver will optimize out the work. If you issue the draw on your UI thread, it will have a minimal cost as it will occur asynchronously (see the sketch below).
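A rough sketch of that split (the width/height/pixels and the window surface are placeholders, and drawSomethingUsingTheTexture() stands in for whatever draw you already issue):

/* Upload thread: stage 1 - hand the pixel data to the driver (blocking copy). */
glBindTexture(GL_TEXTURE_2D, textureId);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
             GL_RGB, GL_UNSIGNED_SHORT_5_6_5, pixels);

/* UI thread: stage 2 - a draw call with the texture bound lets the driver
   schedule the copy into GPU memory as part of the next frame. */
glBindTexture(GL_TEXTURE_2D, textureId);
drawSomethingUsingTheTexture(textureId);
eglSwapBuffers(display, windowSurface);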
The memory leak does sound like a driver bug. I’m curious if the approach above would allow you to avoid this problem though.
Some driver configurations don’t immediately kick work when glFlush() or glFinish() are called. This allows the driver to kick the work when it is best for performance instead of when the application has chosen to do so. Our "What’s the best way to micro-benchmark my render on PowerVR hardware?" FAQ item touches on this.
That’s interesting that eglSyncKHR helps. Glad you found a workaround!
"If you issue the draw on our UI thread, it will have a minimal cost as it will occur asynchronously."
I obviously tried that first, but I’m getting significant stalls (up to 20 frames on older hardware), which is simply unacceptable (e.g. it can happen during scrolling in the Photo Gallery).
I also managed to speed things up a bit by using Gralloc (I wish it were better documented and accessible without dlopen/dlsym hacks though), so I’m mostly satisfied now.