Freezing inside GLES 2.0 driver

Hi there,





We’ve got a very successful game which is currently in the top 3 paid games on the Android store. Unfortunately, the device sporadically hard resets from inside the GLES2 driver, and only on a Nexus S (SGX 540). There’s nothing we can do to debug this other than logging before and after the glDrawElements/glDrawArrays call that the app makes and never returns from.





Is there any way of working out what has caused this? We get no debug output on crash, the device just resets. We’ve tried logging all pertinent states etc, but it all looks fine.
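
A minimal sketch of the before/after logging described above, assuming the log goes to a file that is forced to storage on every write so the last line survives a hard reset. The helper names (`breadcrumb`, `breadcrumb_last`) and the log path are hypothetical, not part of any SDK. An "enter" line without a matching "leave" line points at the draw call that never returned.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append one line to the log and fsync it before
 * returning, so the line is on storage even if the device resets next. */
int breadcrumb(const char *path, const char *msg)
{
    char line[256];
    int len = snprintf(line, sizeof line, "%s\n", msg);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;                      /* message too long for the buffer */

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    int ok = write(fd, line, (size_t)len) == (ssize_t)len && fsync(fd) == 0;
    close(fd);
    return ok ? 0 : -1;
}

/* After the reboot: copy the last line of the log into `out` (newline
 * stripped). Returns 0 on success, -1 if the log is missing or empty. */
int breadcrumb_last(const char *path, char *out, size_t size)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char line[256];
    int found = 0;
    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\n")] = '\0';
        snprintf(out, size, "%s", line);
        found = 1;
    }
    fclose(f);
    return found ? 0 : -1;
}
```

In use, each draw would be bracketed by `breadcrumb(path, "enter ...")` and `breadcrumb(path, "leave ...")`, with the current shader, buffer bindings and vertex counts written into the message. Opening and syncing the file on every call is slow, so this is only suitable for a debug build.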





I’m hoping to get some help without getting pinged between you guys and the phone manufacturer.





Many thanks,





Steve.

Hi Steve,

Have you tried using PVRTrace (https://www.imgtec.com/powervr/insider/pvrtrace.asp)? This can help you identify anything suspicious in your rendering that might be upsetting the graphics driver.

If PVRTrace doesn’t help, you can send us a minimal reproduction binary for this issue (devtech@imgtec.com). Ideally this would be a frame based render and would have a few frames that render correctly followed by a frame where the error occurs. Failing that, we could use a binary of the actual game as long as you can make it clear what steps we would have to take to reproduce the problem. Working with a minimal reproduction will make it much easier for us to isolate the issue though.

We are happy to sign an NDA with your company before receiving any binaries. If you email devtech@imgtec.com requesting an NDA, we should be able to turn this around within a few days.

Thanks,
Joe



Hi Joe,





Thanks, I’ll take a look at PVRTrace. The real killer is that it’s not very reproducible. We have to set the AI playing against itself and it normally freezes on Nexus S within 30 minutes. I can send a special build with this ‘hack’ in, of course.





If I can’t see anything amiss, I’ll get in touch soon. We hit #1 today in Android Games and things have gone a bit crazy. I’ll request an NDA anyway, to get things moving.





Thanks for your help,





Steve.

Hi Steve,

Is there anything that changes in the rendering when the error occurs?

If not, it sounds like it could be an accumulative error in the driver, and if this is the case it should also be reproducible with fewer draw calls.

Would it be possible to try and narrow down the error by making fewer GL draw calls? Even with the tools we have, it can take quite a while to track down errors like this. Anything you can do to help us have a clearer understanding of where the problem is will make it quicker and easier for us to help.
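
One way to narrow down the draw calls is to let only a window of each frame's calls through and shrink that window between runs. The sketch below is a hypothetical filter (the `DrawFilter` type and its functions are made up for illustration) and assumes every draw in the app goes through a single wrapper that can consult it.

```c
#include <stdbool.h>

/* Hypothetical per-frame draw-call filter for bisecting a hang. */
typedef struct {
    unsigned first;  /* index of the first draw call in a frame to let through */
    unsigned count;  /* number of consecutive calls to let through */
    unsigned seen;   /* draw calls seen so far in the current frame */
} DrawFilter;

/* Call once at the start of every frame. */
void draw_filter_begin_frame(DrawFilter *f)
{
    f->seen = 0;
}

/* Call before every draw; returns true if the draw should be issued. */
bool draw_filter_allow(DrawFilter *f)
{
    unsigned idx = f->seen++;
    return idx >= f->first && idx - f->first < f->count;
}
```

A wrapper would then read `if (draw_filter_allow(&filter)) glDrawElements(...);`. Halving `count` (or moving `first`) between soak runs narrows down which group of calls is needed to trigger the freeze, at the cost of one long run per step.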

Congrats on making it to the #1 spot :slight_smile:

Joe



Hi Joe,





It’s very erratic. It can happen in 10 minutes or 2 hours. Unfortunately there’s no ‘smoking gun’ scenario that can lead to an obvious repro case. Best way is to just play through levels 1 to 3 (about 15 mins).





Things we’ve tried:


Draw no 3D : Never crashes (ran overnight)


Draw no particles : Crashes


Draw no skinned chars : Crashes


Disable all vertex arrays before/after draw calls (to put them in known state before they get set) : Crashes





Current thinking is that it’s maybe something to do with the interaction between 2D/3D rendering (glDrawArrays with an index/vertex buffer of 0 for 2D, vs VBOs in 3D). We can’t disable 2D rendering as the display is swapped with the 2D calls too.





This only happens on the Nexus S, too. Unfortunately, it’s a very common device.





Re:PVRTrace, I’ve managed to copy all the PVR libs to the device, but no pvrtrace.cfg is produced anywhere on the device after running the app, so I’m stuck on stage 3 of 8.





Regards,





Steve.

Hi Steve,





When you say you disabled 3D, was the 2D rendering still enabled?


Would it be possible to hack the swap into the 3D rendering path so that you can disable the 2D rendering path?





Have you tried forcing all rendering to use the same shader (possibly something simple like a block colour or applying a single texture)?





Can you post the driver version of your Nexus S and any other PowerVR Android devices you might be targeting (you can query this with “adb shell cat /proc/pvr/version”)? If you don’t have any other PowerVR devices, then you can send (email or FTP) a binary over to us so we can run it on a few other devices to see if the problem has been fixed with newer drivers.

The pvrtrace.cfg file will only be created if your application is allowed to write to the / or /sdcard directories on your target device. If the creation is failing, you can create this file manually and push it to the device.

Thanks,
Joe









Hi Joe,





Yeah, the 2D was still rendering the HUD/2D items.


I’ve not tried using the same shader for all yet, I’ll set them to a “justcolour” shader and see what happens.





The Nexus S that messes up is:

Version 1.6.16.4131.1.1 (release) smdkc110_android
System Version String: SGX540 S5PC110

Samsung Galaxy (works fine):

Version 1.5.15.3152 (release,Sep 1 2010 21:34:13) smdkc110_android
System Version String: SGX540 S5PC110

Archos Tablet (works fine):

Version 1.6.16.4061
System Version String: SGX revision = 1.2.5
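
Since the affected and unaffected devices differ in driver version, an app could in principle log the version at startup. The sketch below assumes /proc/pvr/version contains a line beginning `Version a.b.c.d`, as in the strings quoted above, and that the app is allowed to read the file (which may not hold on every device). The `PvrVersion` type, its field names and both functions are hypothetical.

```c
#include <stdio.h>

/* Hypothetical holder for the first four numeric fields of the version line. */
typedef struct {
    unsigned major, minor, branch, build;
} PvrVersion;

/* Parse a line of the form "Version a.b.c.d ..."; anything after the fourth
 * number is ignored. Returns 1 on success, 0 otherwise. */
int pvr_parse_version(const char *line, PvrVersion *out)
{
    return sscanf(line, "Version %u.%u.%u.%u",
                  &out->major, &out->minor, &out->branch, &out->build) == 4;
}

/* Scan a file (e.g. /proc/pvr/version) for the first parsable version line.
 * Returns 1 if one was found, 0 if the file is unreadable or has none. */
int pvr_read_version(const char *path, PvrVersion *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;

    char line[256];
    int ok = 0;
    while (!ok && fgets(line, sizeof line, f))
        ok = pvr_parse_version(line, out);
    fclose(f);
    return ok;
}
```

Logging the parsed fields alongside crash reports would show whether freezes cluster on one build number, without needing adb access to each user's device.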





I’ll have a play with the pvrtrace.cfg file.





Steve.








stevehaggerty 2011-12-14 15:22:54

Hi Joe,





I’m back on this now.





After rooting my phone, remounting partitions as writeable, chmod’ing and copying config files/PVRTrace libs around, I finally got the program to produce a trace file, only for the PVRTrace tool to display a dialog box saying “Error reading file” when I try to load it :frowning:





For about 2 seconds’ worth of running, it’s producing a 24 MB file.





Any ideas / strategies we can take a look at? I don’t want to have to mark this device as not compatible unless I exhaust all avenues of attack.





Many thanks,





Steve.





PS:I think the previous post MAY be spam. :wink:

Hi Steve,

Sorry to hear the Trace tool crashed on you. Did you download the version from the PVRTrace webpage I mentioned above, or did you use the version in the SDK package? The SDK version is out of date and has a number of known bugs that have been fixed in the separate PVRTrace package.

You can choose in the pvrtrace.cfg file whether it will record rendering data or not (i.e. if it will capture textures, vertices etc). Setting this value to 0 will cause the recording to only contain API calls. You should try setting it to 0 to see if it helps you get around the GUI tool crash.

Did you try using a simple colour shader for the whole scene? Can you also try rendering the scene to an offscreen buffer at a lower resolution (e.g. half or quarter of the device’s resolution) and see if the error is reproducible?

Are there any textures or VBOs that are updated regularly? Due to the way that the render is deferred in a TBDR, uploading new textures via TexImage2D etc or changing the contents of VBOs while rendering means that the drivers have to, for example, store the previous VBO in memory so that the current frame referencing this VBO can be processed and also allocate memory for the updated VBO so this data can be used by the next frame.



Hi Joe,





Thanks, I’ll try the slimline GL capture. I was using the link you sent for PVRTrace.





We’re certainly modifying VBO contents, in a fairly controlled manner. The rain/snow is done by modifying a VBO. If I change the number of particles from 1000 to 8000 the freeze occurs in a matter of seconds. I tried double buffering the VBOs (modifying one, rendering the other), and I even tried calling glDrawArrays on client memory and avoiding particle VBOs altogether, to no avail.





Steve.

Let me know if the GL trace works. Feel free to send over any trace you record so I can have a closer look if you are still having problems.

Ah, ok. Sounds like that may be the problem. If possible, can you avoid updating the VBOs at all? For example, store all rain particles as one large object and do a single matrix transformation on it, or even split it into a few objects that are translated differently? This is likely to be the most efficient approach for rendering on the GPU, and should allow you to work around the problem.

If that’s not an option, can you ensure your VBOs that need updating contain the smallest amount of data possible, i.e. only vertex positions? Any other data (e.g. texcoords) should be stored in a static VBO, as the driver has to duplicate memory for all of a VBO when it’s being updated. If this doesn’t help, then triple buffering the VBOs or avoiding VBOs completely for the dynamic data might work.
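
The triple-buffering suggestion can be sketched as a small ring of buffer names, assuming the dynamic data holds positions only and each frame writes to a different buffer than the previous two frames did. The `VboRing` type and `vbo_ring_acquire` are hypothetical; the GL calls are shown only as comments so the rotation logic stands on its own.

```c
#define VBO_RING_SIZE 3

/* Hypothetical ring of buffer object names for one dynamic vertex stream. */
typedef struct {
    unsigned ids[VBO_RING_SIZE]; /* names created once, e.g. with glGenBuffers */
    unsigned next;               /* slot to hand out on the next acquire */
} VboRing;

/* Returns the buffer to fill and draw with this frame, then advances the
 * ring, so a given buffer is not rewritten until VBO_RING_SIZE - 1 other
 * frames have been submitted in between. */
unsigned vbo_ring_acquire(VboRing *r)
{
    unsigned id = r->ids[r->next];
    r->next = (r->next + 1) % VBO_RING_SIZE;
    return id;
}

/* Per-frame use (GL calls omitted from the compiled sketch):
 *
 *   unsigned vbo = vbo_ring_acquire(&rain_ring);
 *   glBindBuffer(GL_ARRAY_BUFFER, vbo);
 *   glBufferSubData(GL_ARRAY_BUFFER, 0, size_in_bytes, positions);
 *   glDrawArrays(...);
 */
```

Whether three buffers are enough depends on how many frames the driver keeps in flight, which is not something the thread establishes, so the ring size here is a starting point to experiment with.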

Joe



Hi Joe,





I’m still getting no joy with PVRTrace. I’ve turned the texture/buffer collection off and I’m still getting the tool barfing on my capture file. I’ve even tried the Linux version, with the same results.





I’ll forward you a capture file and see if you guys can make head/tail of it.





BTW:I removed all our dynamic VB code and the driver still hangs.





Thanks for your help,





Steve.


Hello Steve,





Did you happen to find the cause of the random reboots on the Nexus S?





Thanks


This sounds very much like that old “favourite”, “SharedBufferStack waitForCondition”. It causes the phone to hang randomly and then eventually reboot.

http://code.google.com/p/android/issues/detail?id=20833

Hi all,



Did you manage to fix the problem?



I have the same issue on my Galaxy Nexus. The app makes a call to glDrawArrays and never returns. It seems to me the problem arises when using an FBO.



Thanks for your help



Br



Sebastien

Heh, 3 year delay, but no, there was no solution. Blocking the device on Google Play was the only answer.