SGX Hardware Recovery triggered

Hi,



I am using an OMAP5 EVM-based platform running Linux. While running one of our applications, I get the following SGX lockup detection messages. Is it always necessary to fix such an error even if the platform recovers successfully, or is it acceptable as long as it always recovers?





[13262.587249] PVR_K: HWRecoveryResetSGX: SGX Hardware Recovery triggered

[13262.594451] PVR_K: SGX debug (SGX_DDK_Linux_CustomerTI sgxddk 19 1.9@2166536)

[13262.603027] PVR_K: (P0) EUR_CR_CORE_ID: 01191201

[13262.608856] PVR_K: (P0) EUR_CR_CORE_REVISION: 00010106

[13262.614562] PVR_K: (P0) EUR_CR_EVENT_STATUS: 243C2780

[13262.620727] PVR_K: (P0) EUR_CR_EVENT_STATUS2: 000000A0

[13262.626525] PVR_K: (P0) EUR_CR_BIF_CTRL: 00000000

[13262.632415] PVR_K: (P0) EUR_CR_BIF_BANK0: 00000007

[13262.638183] PVR_K: (P0) EUR_CR_BIF_INT_STAT: 00080000

[13262.644104] PVR_K: (P0) EUR_CR_BIF_FAULT: 00000000

[13262.649841] PVR_K: (P0) EUR_CR_BIF_MEM_REQ_STAT: 00000000

[13262.655761] PVR_K: (P0) EUR_CR_CLKGATECTL: 002AA6AA

[13262.661529] PVR_K: (P1) EUR_CR_EVENT_STATUS: 043C2780

[13262.667419] PVR_K: (P1) EUR_CR_EVENT_STATUS2: 000000A8

[13262.673248] PVR_K: (P1) EUR_CR_BIF_CTRL: 00000000

[13262.679199] PVR_K: (P1) EUR_CR_BIF_BANK0: 00000007

[13262.684936] PVR_K: (P1) EUR_CR_BIF_INT_STAT: 00080000

[13262.690856] PVR_K: (P1) EUR_CR_BIF_FAULT: 00000000

[13262.696624] PVR_K: (P1) EUR_CR_BIF_MEM_REQ_STAT: 00000000

[13262.702514] PVR_K: (P1) EUR_CR_CLKGATECTL: 002AA6AA

[13262.708282] PVR_K: Checking EDM memory context (index = 7, PD = 0xaf786000)

[13262.715820] PVR_K: Found MMU context for page fault 0x00000000

[13262.722137] PVR_K: GPU memory context is for PID=646 (insmod)

[13262.728363] PVR_K: No PDE found

[13262.731719] PVR_K: Checking TA memory context (index = 0, PD = 0x9c880000)

[13262.739196] PVR_K: Found MMU context for page fault 0x00000000

[13262.745483] PVR_K: GPU memory context is for PID=1191 (viewmanager_Map)

[13262.752685] PVR_K: No PDE found

[13262.756072] PVR_K: Checking 3D memory context (index = 0, PD = 0x9c880000)

[13262.769134] PVR_K: Found MMU context for page fault 0x00000000

[13262.776672] PVR_K: GPU memory context is for PID=1191 (viewmanager_Map)

[13262.783752] PVR_K: No PDE found

[13262.787231] PVR_K: Checking PTLA memory context (index = 0, PD = 0x9c880000)

[13262.794738] PVR_K: Found MMU context for page fault 0x00000000

[13262.801055] PVR_K: GPU memory context is for PID=1191 (viewmanager_Map)

[13262.808044] PVR_K: No PDE found

[13262.811553] PVR_K: SGX Host control:

[13262.815307] PVR_K: (HC-0) 0x00000001 0x00000000 0x00000000 0x00000004

[13262.822784] PVR_K: (HC-10) 0x00000000 0x0000000A 0x00068010 0x00000003

[13262.829864] PVR_K: (HC-20) 0x00000001 0x00000001 0x00000000 0x000036F7

[13262.836975] PVR_K: (HC-30) 0x000D4766 0x0493FD94 0x00000000 0x00000000

[13262.844024] PVR_K: (HC-40) 0x00000000 0x00000000 0x00000000 0x00000000

[13262.851104] PVR_K: (HC-50) 0x00000000 0x00000000 0x00000000 0x00000000

[13262.858306] PVR_K: (HC-60) 0x00000000 0x00000000 0x00000000 0x00000000

[13262.865417] PVR_K: (HC-70) 0x00000000 0x00000000 0x00000000 0x00000000

[13262.872406] PVR_K: (HC-80) 0x00000000 0x00000000 0x00000000 0x00000000

[13262.879486] PVR_K: SGX TA/3D control:

[13262.883422] PVR_K: (T3C-0) 0xF4003000 0xF4003120 0xF4002000 0xF4129800

[13262.890533] PVR_K: (T3C-10) 0x00000000 0x00000000 0x00000000 0xF4002980

[13262.897705] PVR_K: (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000

[13262.904876] PVR_K: (T3C-30) 0x00000000 0x00000000 0x00000000 0x00000000

[13262.912078] PVR_K: (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000002

[13262.919250] PVR_K: (T3C-50) 0x00000000 0x00000000 0x00000001 0x00003291

[13262.926361] PVR_K: (T3C-60) 0x000034A0 0xF41DF810 0xF41DF7B8 0xF4000000

[13262.933563] PVR_K: (T3C-70) 0xAF786000 0xF4004000 0xF41A8C00 0xF41DF810

[13262.940673] PVR_K: (T3C-80) 0xF4121F60 0xF41AC6A0 0xF41DF7B8 0xF4122440

[13262.948028] PVR_K: (T3C-90) 0x91FF34FF 0x90FF34FF 0x00000000 0x00000000

[13262.955108] PVR_K: (T3C-A0) 0x00000000 0x00000000 0x00000000 0x00000000

[13262.965087] PVR_K: (T3C-B0) 0x00000000 0x00000000 0x00000000 0x00000000

[13262.972320] PVR_K: (T3C-C0) 0x00000000 0x00000000 0x00000000 0x00000000

[13262.979583] PVR_K: (T3C-D0) 0x000088CE 0x000088CD 0xF4005000 0xF4010820

[13262.986694] PVR_K: (T3C-E0) 0xF4002020 0xF411FCC0 0xF411FCC0 0x00000000

[13262.993927] PVR_K: (T3C-F0) 0x00000000 0x000004A7 0x000004A7 0x00000000

[13263.001037] PVR_K: (T3C-100) 0x00000001 0x00000001 0xDE676F24 0x00000000

[13263.008392] PVR_K: (T3C-110) 0x55DE2482 0x00000000 0x00000000 0x00000000

[13263.017089] PVR_K: SGX Kernel CCB WO:0xF7 RO:0xF7
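
For anyone comparing dumps like the one above, here is a minimal Python sketch that pulls the `EUR_CR_*` register values out of the log and lists the ones that differ between the `(P0)` and `(P1)` blocks. The function names (`parse_registers`, `diff_units`) are made up for this example, and the script only matches the line format shown in this thread; it does not decode what any register bit means.

```python
import re

# Matches lines such as:
# [13262.614562] PVR_K: (P0) EUR_CR_EVENT_STATUS: 243C2780
REG_LINE = re.compile(r"PVR_K:\s+\((P\d+)\)\s+(EUR_CR_\w+):\s+([0-9A-Fa-f]+)")


def parse_registers(log_text):
    """Return {unit: {register_name: int_value}} from a PVR_K debug dump."""
    regs = {}
    for unit, name, value in REG_LINE.findall(log_text):
        regs.setdefault(unit, {})[name] = int(value, 16)
    return regs


def diff_units(regs, a="P0", b="P1"):
    """Return {register_name: (value_a, value_b)} for registers that differ."""
    common = regs.get(a, {}).keys() & regs.get(b, {}).keys()
    return {
        name: (regs[a][name], regs[b][name])
        for name in sorted(common)
        if regs[a][name] != regs[b][name]
    }


# A few lines copied from the dump above.
sample = """\
[13262.614562] PVR_K: (P0) EUR_CR_EVENT_STATUS: 243C2780
[13262.620727] PVR_K: (P0) EUR_CR_EVENT_STATUS2: 000000A0
[13262.638183] PVR_K: (P0) EUR_CR_BIF_INT_STAT: 00080000
[13262.661529] PVR_K: (P1) EUR_CR_EVENT_STATUS: 043C2780
[13262.667419] PVR_K: (P1) EUR_CR_EVENT_STATUS2: 000000A8
[13262.684936] PVR_K: (P1) EUR_CR_BIF_INT_STAT: 00080000
"""

if __name__ == "__main__":
    for name, (v0, v1) in diff_units(parse_registers(sample)).items():
        print(f"{name}: P0={v0:08X} P1={v1:08X} xor={v0 ^ v1:08X}")
```

Running this over dumps from several lockups makes it easier to see whether the same registers differ each time, which may be useful detail to include in a report to TI.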



Thanks & Regards,

Viks

Hi Viks,



The platform should recover, but it’s best to avoid these issues if you can as they will reduce your application’s performance and may introduce instability.



It’s possible that relying on undefined API behaviour is causing the issue. I’d recommend recording your application with PVRTrace and reviewing the output of PVRTraceGUI’s Static Analysis to see if any errors or warnings have been identified. If the application is OK, then you should report the issue to TI to see if you’re hitting a known driver bug.



Regards,

Joe

Thanks Joe for your quick answer.



Thanks & Regards,

Vikash

Hi Joe,



I noticed the following errors while running the application, at the time of the lockup. Do they point to a known issue or give any hint for resolving the lockup?



PVR:(Error): Render Timeout! LastFrame: 565 [1657, /sgxkick_client.c]

PVR:(Warning): PB Watermark Info - Alloc: 0x22a , Free: 0x5f8 [486, /sgxrender_targets.c]

PVR:(Warning): PB Watermark Info - Alloc: 0x22a , Free: 0x5f8 [486, /sgxrender_targets.c]
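
To track these warnings over a longer run, a small Python sketch like the one below can extract the `Alloc` and `Free` values from each "PB Watermark Info" line and convert them from hex. The function name `parse_pb_watermarks` is invented for this example, and the units of the two values are not stated in the log, so the script only reports the raw numbers.

```python
import re

# Matches lines such as:
# PVR:(Warning): PB Watermark Info - Alloc: 0x22a , Free: 0x5f8 [486, /sgxrender_targets.c]
PB_LINE = re.compile(
    r"PB Watermark Info - Alloc:\s*(0x[0-9A-Fa-f]+)\s*,\s*Free:\s*(0x[0-9A-Fa-f]+)"
)


def parse_pb_watermarks(log_text):
    """Return a list of (alloc, free) integer pairs, one per warning line."""
    return [(int(a, 16), int(f, 16)) for a, f in PB_LINE.findall(log_text)]


# The two warning lines from the log above.
sample = """\
PVR:(Error): Render Timeout! LastFrame: 565 [1657, /sgxkick_client.c]
PVR:(Warning): PB Watermark Info - Alloc: 0x22a , Free: 0x5f8 [486, /sgxrender_targets.c]
PVR:(Warning): PB Watermark Info - Alloc: 0x22a , Free: 0x5f8 [486, /sgxrender_targets.c]
"""

if __name__ == "__main__":
    for alloc, free in parse_pb_watermarks(sample):
        print(f"alloc={alloc} free={free}")
```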



Regards,

Vikash

Hi Viks,



It sounds like you’re filling the Parameter Buffer and are subsequently hitting a driver bug. You should ask TI if there’s a newer graphics driver available for the platform in case this resolves the issue. However, for optimal performance you should always try to avoid filling the Parameter Buffer.



In the latest version of PVRTune, there is an SPM counter (part of counter group #2) that identifies when Parameter Buffer overflow events have occurred. To use this, I would recommend dropping the counter onto the Graph View and using the Counter Properties dialog to change the Y axis value to a small value (e.g. 4) to make it easier to see when the value of the graphed counter changes.



Regards,

Joe

Hi Joe,



Previously I was using the SGX Linux DDK 1.9@2166536 from TI. When I asked for the latest DDK, TI gave me 1.9@2253347, and with this version I still see the SGX lockup after running the navigation application.

Does this latest version look sufficient for this issue?



I am currently in the process of capturing a PVRTrace recording of this lockup and will analyse it using PVRTraceGUI, as you suggested in your previous reply.



Regards,

Vikash

Hi Viks,



Looking at the DDK commit messages, there was a fix in 2257028 that modified the Parameter Buffer resize heuristics. I suspect that the driver’s PB resizing is responsible for the issues you’re seeing. TI should be able to help you to configure the driver to disable PB resize to see if it resolves the issue.



Regards,

Joe

Hi Joe,



I tried to observe the PB overflow with the latest PVRTune (SDK 3.3), but I can’t find the SPM counter there. Does this require the latest driver?



Also, the last microkernel trace entries I am seeing are as follows:



[ 241.097167] PVR_K: (MKT-1FB) 000A1C09 01600800 0000F91E AD000184 MKTC_3DLB_END

[ 241.104919] PVR_K: (MKT-1FC) 000A1C5F 01600800 0100FA1D AD000281 MKTC_TALB_FINDTA

[ 241.113220] PVR_K: (MKT-1FD) 000A1D01 01600800 0100FA1D AD000282 MKTC_TALB_END

[ 241.121185] PVR_K: (MKT-1FE) 000EBBB9 01600800 0101041F AD000A01 MKTC_TIMER_POTENTIAL_3D_LOCKUP

[ 241.130493] PVR_K: (MKT-1FF) 000EBC0C 01600800 0101041F AD000A0A MKTC_TIMER_LOCKUP

[ 241.138671] PVR_K: SGX Kernel CCB WO:0xE3 RO:0xE3
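
When the microkernel trace is long, a short Python sketch such as the following can find the entries whose name contains `LOCKUP` and print the few entries that precede each one. The function names (`parse_mkt`, `lockup_context`) are invented here, the regular expression only matches the `(MKT-xxx)` line format shown in this thread, and the four hex words are kept as raw integers because their meaning is not documented in the log.

```python
import re

# Matches lines such as:
# [ 241.130493] PVR_K: (MKT-1FF) 000EBC0C 01600800 0101041F AD000A0A MKTC_TIMER_LOCKUP
MKT_LINE = re.compile(
    r"\(MKT-([0-9A-Fa-f]+)\)\s+((?:[0-9A-Fa-f]{8}\s+){4})(MKTC_\w+)"
)


def parse_mkt(log_text):
    """Return a list of (index, [word0..word3], name) for each trace line."""
    entries = []
    for idx, words, name in MKT_LINE.findall(log_text):
        entries.append((int(idx, 16), [int(w, 16) for w in words.split()], name))
    return entries


def lockup_context(entries, before=3):
    """For each entry whose name contains LOCKUP, return it with up to
    `before` preceding entries."""
    windows = []
    for i, (_, _, name) in enumerate(entries):
        if "LOCKUP" in name:
            windows.append(entries[max(0, i - before): i + 1])
    return windows


# The trace lines from the post above.
sample = """\
[ 241.097167] PVR_K: (MKT-1FB) 000A1C09 01600800 0000F91E AD000184 MKTC_3DLB_END
[ 241.104919] PVR_K: (MKT-1FC) 000A1C5F 01600800 0100FA1D AD000281 MKTC_TALB_FINDTA
[ 241.113220] PVR_K: (MKT-1FD) 000A1D01 01600800 0100FA1D AD000282 MKTC_TALB_END
[ 241.121185] PVR_K: (MKT-1FE) 000EBBB9 01600800 0101041F AD000A01 MKTC_TIMER_POTENTIAL_3D_LOCKUP
[ 241.130493] PVR_K: (MKT-1FF) 000EBC0C 01600800 0101041F AD000A0A MKTC_TIMER_LOCKUP
"""

if __name__ == "__main__":
    for window in lockup_context(parse_mkt(sample)):
        print(" -> ".join(name for _, _, name in window))
```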



Thanks for this valuable information. I will contact TI to see if I get any help in this case.



Best Regards,

Vikash

Hi Vikash,



Sorry for the massive delay here. I’m doing a clean-up of the forum and spotted that this discussion was unresolved.



I’d forgotten to mention before that the latest version of PVRTune includes a Search widget. From here, you can search for a string in all counter and timing data that has been captured. If an SPM event has occurred, a search for the term will list all SPM events in the recording.



Regards,

Joe