GPU Load and Utilization

Hi PowerVR Team,

We are working on an embedded platform with Imagination’s GE7800 GPU and a dual-core ARM Cortex-A57 CPU. The ARM cores run Linux. We are using PVRTuneDeveloper to profile the GPU, and we are trying to obtain the GPU load or utilization for a specific use case. To do that, we are looking at the following:

  1. Renderer Active and Tiler Active percentages
  2. Render Time and Tiler Time

Are there any other counters that we should consider? Also, our profiling shows that a number of the Tiler and Renderer tasks overlap in time. However, the counters mentioned above do not take this overlap into account. Given this, what is the best way to obtain a GPU load or utilization value?
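The overlap problem can be made concrete with a minimal sketch. It assumes the Tiler and Renderer tasks are available as hypothetical (start, end) intervals in milliseconds, for example read off a profiler timeline. The numbers are invented, and the helper `merged_busy_time` is illustrative only, not part of PVRTune. Adding Tiler Time and Render Time counts overlapped periods twice, while merging the intervals counts them once:

```python
# Hypothetical task intervals (start_ms, end_ms); illustrative values only,
# not PVRTune output.

def merged_busy_time(intervals):
    """Total time covered by at least one interval (overlaps counted once)."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new run.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps (or touches) the current run: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

tiler = [(0.0, 4.0), (10.0, 14.0)]
renderer = [(2.0, 9.0), (12.0, 18.0)]
frame_ms = 20.0

naive = merged_busy_time(tiler) + merged_busy_time(renderer)
combined = merged_busy_time(tiler + renderer)
print(f"sum of times: {naive:.1f} ms, union: {combined:.1f} ms")
print(f"union as % of frame: {100 * combined / frame_ms:.0f}%")
```

With these made-up intervals the plain sum (21 ms) exceeds the 20 ms frame, while the merged time (17 ms, 85%) does not. Note that the merged figure still only says that some Tiler or Renderer work was present, not how busy the GPU actually was.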

Thanks and regards,
Subramanya.

Hi Subbu,

For your first question:

  • Renderer Active and Tiler Active are binary GPU counters: they only show whether any Renderer or Tiler task is present on the GPU. This means that if even a small workload is sent to the GPU, these counters will show 100%, which does not realistically reflect how busy the GPU is.
  • I would recommend monitoring the following counters:
    Processing load: pixel: % of time the Shader Processor is busy shading fragments.
    Processing load: vertex: % of time the Shader Processor is busy shading vertices.
    Processing load: compute: % of time the Shader Processor is busy processing compute kernels (if your application uses compute).
    I would also recommend watching the GPU memory interface load counter, to see how busy the GPU is with bandwidth, i.e. reads from and writes to system memory.
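The difference between a binary "Active" counter and a "Processing load" counter can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the samples are hypothetical, and the two functions are simplified models of the two kinds of counter, not how the hardware counters are actually computed. A light Renderer task that is present for the whole window, but keeps the Shader Processor busy only about 10% of the time, reads 100% on the binary counter:

```python
# Hypothetical per-sample readings over one profiling window (illustrative
# values only). Each sample: (renderer_task_present, shader_busy_fraction).
samples = [
    (True, 0.10),
    (True, 0.05),
    (True, 0.15),
    (True, 0.10),
]

def active_percent(samples):
    """Simplified binary-style counter: % of samples with a task present."""
    return 100.0 * sum(1 for present, _ in samples if present) / len(samples)

def processing_load_percent(samples):
    """Simplified load-style counter: average % of time the shader was busy."""
    return 100.0 * sum(busy for _, busy in samples) / len(samples)

print(f"Active-style counter: {active_percent(samples):.0f}%")
print(f"Load-style counter: {processing_load_percent(samples):.0f}%")
```

In this model the Active-style value is 100% while the load-style value is about 10%, which is why the Processing load counters give a more realistic picture of how busy the GPU is.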

For your second question:

  • GPU counters are tied to specific parts of the GPU. When Tiler and Renderer tasks run at the same time, their counters will overlap, because they measure different parts of the GPU.
  • There is no single GPU counter that gives a global utilization value, so you will need to interpret these counters. For example, if your application has no compute workloads, you could add Processing load: pixel and Processing load: vertex and divide by 2 to get a rough idea of how busy the GPU is. In an ideal, balanced application with no bottlenecks, which manages to use every part of the GPU to the maximum, this number would be close to 100%. Another option is to take the maximum of the two, max(Processing load: pixel, Processing load: vertex), which gives a more realistic value of how heavily the busier of those parts of the GPU is being utilized.
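The two ways of combining the counters described above can be written down directly. This is a minimal sketch: the function name `combined_loads` and the input values are hypothetical, and it assumes both counters have already been sampled as percentages (0 to 100) for an application with no compute work:

```python
def combined_loads(pixel_pct, vertex_pct):
    """Two rough single-number summaries of shader load (no compute work).

    pixel_pct, vertex_pct: sampled values of the "Processing load: pixel"
    and "Processing load: vertex" counters, as percentages (0-100).
    """
    average = (pixel_pct + vertex_pct) / 2  # close to 100% only if both are saturated
    peak = max(pixel_pct, vertex_pct)       # load of the busier of the two
    return average, peak

# Hypothetical readings for a fragment-heavy workload.
avg, peak = combined_loads(pixel_pct=80.0, vertex_pct=20.0)
print(f"average: {avg:.0f}%, max: {peak:.0f}%")  # average: 50%, max: 80%
```

For an unbalanced, fragment-heavy workload like this one, the average (50%) understates how busy fragment shading is, while the maximum (80%) reflects the part of the GPU that is doing most of the work.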

Best regards,
Alejandro