For now, regarding the early termination of compute shaders: the idea is to do as much (useless) computation as possible in each thread, with only a few work groups, so as to get an empirical estimate of the raw peak GFLOPS.
To put it simply, I am multiplying mat4 variables in a for loop with a large number of iterations. It works well up to a certain iteration count, and the overall computation time is still under a second. I'm not convinced that a generic watchdog is relevant here: my process has already completed far more compute-intensive matrix multiplications on the CPU side without being killed.
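For concreteness, the estimate I am after looks roughly like this (a minimal sketch; the 112 FLOPs per mat4 multiply and all the numbers in the example call are illustrative assumptions, not measurements):

```python
# Rough GFLOPS estimate for a mat4-multiply loop running in a compute shader.
# All concrete numbers below are illustrative assumptions, not measurements.

FLOPS_PER_MAT4_MUL = 64 + 48   # 16 result elements, each 4 multiplies + 3 adds

def estimate_gflops(iterations, invocations, elapsed_seconds):
    """Total FLOPs executed across all invocations, divided by wall time."""
    total_flops = iterations * FLOPS_PER_MAT4_MUL * invocations
    return total_flops / elapsed_seconds / 1e9

# e.g. 1e6 loop iterations, 64 invocations (one group of 64), 0.5 s wall time
print(estimate_gflops(1_000_000, 64, 0.5))  # → 14.336
```

The point of early termination mattering is visible in this formula: if the driver kills the dispatch early but the measured `elapsed_seconds` is short, the estimate comes out inflated.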
What I am guessing is that there is some absolute limit on the number of cycles a single dispatch (or thread) can execute, defined by the driver, and I would like to know this value, or at least where to obtain it.
With the help of the debug_dump binary, I noticed that the runs giving me wrong results (an inflated GFLOPS estimate, due to early termination) show up as:
Recovery 1: PID = 1358, frame = 0, HWRTData = 0x00000000, EventStatus = 0x00000400, Guilty Overrun
CRTimer = 0x00003873E1BA, OSTimer = 332.622666312, CyclesElapsed = 1003083264
PreResetTimeInCycles = 51456, HWResetTimeInCycles = 28160, TotalRecoveryTimeInCycles = 79616
Here is the full debug_dump log:
debug_dump_log.txt (11.9 KB)
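As a sanity check on that recovery record: CyclesElapsed is about 1e9 cycles, which at a plausible GPU clock would correspond to a window on the order of a couple of seconds. The 500 MHz clock below is purely my assumption, not a value from the log:

```python
# Convert the CyclesElapsed figure from the recovery record into wall time.
# The GPU clock frequency is an assumption; substitute the real device clock.

cycles_elapsed = 1_003_083_264     # CyclesElapsed from the debug_dump record
assumed_clock_hz = 500_000_000     # 500 MHz, purely illustrative

seconds = cycles_elapsed / assumed_clock_hz
print(f"{seconds:.3f} s")          # → 2.006 s
```

If the cap really is a fixed cycle count rather than a fixed wall time, the allowed duration would scale inversely with the GPU clock, which would be useful to know when comparing devices.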
So there, I launched my program twice: once with a reasonable amount of computation (PID 1357), which does not appear in this debug log, and once with a much higher number of iterations (PID 1358). This "Guilty Overrun" entry makes me think the driver is responsible for the early termination of the compute shaders, and that is what I'd like more information about.
To be clear, I think it makes sense that a GPU program cannot be allowed to stall the GPU, and therefore that there is a limit on its execution time/total cycles; I just want to be fully aware of that limit.
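If such a per-dispatch cycle budget does exist, one obvious way to work within it would be to split the loop across several smaller dispatches. Roughly (both the budget and the per-iteration cost below are hypothetical numbers, not values from the driver):

```python
import math

# Split a long compute workload into several dispatches so each one stays
# under a hypothetical per-dispatch cycle budget. Both constants are made up.

CYCLE_BUDGET_PER_DISPATCH = 800_000_000   # hypothetical watchdog headroom
CYCLES_PER_ITERATION = 120                # hypothetical cost of one mat4 multiply

def split_dispatches(total_iterations):
    """Return (number_of_dispatches, iterations_per_dispatch)."""
    max_iters = CYCLE_BUDGET_PER_DISPATCH // CYCLES_PER_ITERATION
    num = math.ceil(total_iterations / max_iters)
    return num, math.ceil(total_iterations / num)

print(split_dispatches(100_000_000))  # → (16, 6250000)
```

Of course this only works if I know the budget, which is exactly the value I am asking about.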