I run a blog on netbook gaming. Since I own an Intel Cedar Trail netbook, I have the SGX545-based GMA 3600. I’ve been trying to figure out how my GPU compares to desktop PC GPUs like Nvidia’s GeForce 7000 series. Here’s what I’ve come up with from Wikipedia, ImgTec’s press release, and Intel’s datasheet:
The GMA 3600 has a core config of 4:2:4 (unified shaders, texture mapping units, and render output units). The shaders and TMUs are clocked at 400 MHz, while the ROPs are clocked at 200 MHz. This gives a pixel/texture fillrate of 800 million/s and 6.4 GFLOPS of shader performance, which pegs it a little above a GeForce 6200 LE or a Radeon 9200 SE.
However, ImgTec cites a fillrate of 1 gigapixel per second, which I can only guess comes from the efficiency of its TBDR. In that case, I get 2 GP/s, the same as a Radeon 9700. I’m not sure which is the more accurate figure, so I hope someone out there has an answer.
Intel’s datasheet claims one vertex can be processed every 8 cycles, with a peak vertex/triangle ratio of 0.5. Since the SGX545 can process 80 million triangles/s at 400 MHz, there should be a maximum of 160 million vertices/s when no pixel shaders are being used, about as fast as a GeForce 7150 or a Radeon 9600.
I haven’t been able to find any way to equate unified shaders with dedicated (i.e. non-unified) pixel shaders, so any help there would be greatly appreciated. For now, I’ll assume that one GMA 3600 shader is equivalent to one pixel shader when no vertex shaders are running, which would make the GMA 3600 equal to a GeForce 6500 or Radeon 9600 Pro in pixel shading.
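For reference, the on-paper numbers above all come from the same rule: peak rate = number of units × operations per clock × clock. The sketch below reproduces them under stated assumptions: one pixel per clock per ROP, one texel per clock per TMU, and 4 flops per clock per shader (the last is implied by the 6.4 GFLOPS figure rather than stated anywhere in the post). The helper name is made up for illustration.

```python
# Sketch of the on-paper arithmetic, assuming peak = units x ops/clock x clock.
# Assumptions: 1 pixel/clock per ROP, 1 texel/clock per TMU,
# 4 flops/clock per unified shader (implied by the 6.4 GFLOPS figure).

def peak_rate_millions(units: int, ops_per_clock: float, clock_mhz: float) -> float:
    """Hypothetical helper: peak rate in millions of operations per second."""
    return units * ops_per_clock * clock_mhz

# 4 ROPs at the assumed 200 MHz render clock
pixel_fill = peak_rate_millions(4, 1, 200)      # Mpixels/s
# 2 TMUs at the 400 MHz core clock
texel_fill = peak_rate_millions(2, 1, 400)      # Mtexels/s
# 4 unified shaders at 400 MHz
shader_mflops = peak_rate_millions(4, 4, 400)   # MFLOPS

print(f"pixel fill:  {pixel_fill:.0f} Mpixels/s")
print(f"texel fill:  {texel_fill:.0f} Mtexels/s")
print(f"shader rate: {shader_mflops / 1000:.1f} GFLOPS")
```

Note that the pixel and texel figures only agree at 800 million/s because the post assumes the ROPs run at half the TMU clock.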
I hope at least a small part of that made sense and that you weren’t bored to death reading it. So now for the real question: Is any of this correct? I know PowerVR GPUs shouldn’t be directly compared to traditional GPUs, but all I can really find is mobile-to-mobile GPU comparisons.
Thanks in advance!
SGX545 at 400 MHz has the following peak rates:
800 Mpixels/sec textured fillrate (2 textured pixels per clock)
6.4 Gflops (4 flops per clock per USSE, 4 USSEs total)
80 M transformed triangles/sec (5 clocks per triangle)
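To make the per-clock figures above concrete, here is a minimal sketch that turns them back into per-second peaks at 400 MHz. All inputs are taken from the list; the only assumption is that each peak is simply the per-clock figure multiplied by the core clock.

```python
# Peak rates for a 400 MHz SGX545, derived from the per-clock figures above.
CORE_CLOCK_MHZ = 400

PIXELS_PER_CLOCK = 2       # textured pixels per clock
FLOPS_PER_USSE_CLOCK = 4   # flops per clock per USSE
USSE_COUNT = 4
CLOCKS_PER_TRIANGLE = 5    # clocks to transform one triangle

fill_mpix = PIXELS_PER_CLOCK * CORE_CLOCK_MHZ                        # Mpixels/s
gflops = FLOPS_PER_USSE_CLOCK * USSE_COUNT * CORE_CLOCK_MHZ / 1000  # GFLOPS
mtris = CORE_CLOCK_MHZ / CLOCKS_PER_TRIANGLE                         # Mtriangles/s

print(f"{fill_mpix} Mpixels/s, {gflops} GFLOPS, {mtris:.0f} Mtriangles/s")
```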
So the ROPs (SGX doesn’t really have the same implementation of the back end of the traditional GPU pipeline as the GPUs you’re trying to compare it to, but let’s call it a ROP for the sake of comparison) run at core clock, not half as you’ve speculated.
We’ve long marketed SGX using a 2.5x multiplier for fillrate, which takes our TBDR architecture into account and assumes an average per-pixel opaque overdraw of 2.5x. That gives you 2 Gpixels/sec, but only under those assumptions, which require a particular kind of workload to hold.
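A rough way to see where the 2 Gpixels/sec figure comes from is sketched below. The modelling assumption (mine, not ImgTec's) is that an immediate-mode renderer shades every opaque layer, while the TBDR shades roughly one opaque layer per pixel, so an IMR would need raw fillrate × overdraw to keep up. The function name is hypothetical.

```python
# Sketch of the "2.5x" fillrate multiplier described above.
# Assumption: an IMR shades every opaque layer, a TBDR shades ~one layer per
# pixel, so an IMR needs raw_fill * overdraw to match the TBDR's visible output.

RAW_FILL_MPIX = 800   # peak textured fillrate at 400 MHz
AVG_OVERDRAW = 2.5    # assumed average per-pixel opaque overdraw

def imr_equivalent_fill(raw_fill: float, overdraw: float) -> float:
    """Hypothetical helper: IMR fillrate needed to match the TBDR."""
    return raw_fill * overdraw

print(imr_equivalent_fill(RAW_FILL_MPIX, AVG_OVERDRAW))  # 2000.0
# With no overdraw (e.g. a single full-screen quad) the advantage disappears.
print(imr_equivalent_fill(RAW_FILL_MPIX, 1.0))           # 800.0
```

This is why the multiplier is workload-dependent: the effective figure scales directly with how much opaque overdraw the scene actually has.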
As you’ve found out, it’s very difficult to compare those on-paper peaks to those of a traditional PC-based IMR like a GeForce or a Radeon, because the architectures and available resources for the GPUs are so completely different.
I wouldn’t like to guess which PC GPU is closest in the real world to a 400 MHz GMA3600; only benchmarking can tell you that. Doing it from on-paper stats is fraught with assumptions and demands a deep understanding of both architectures.
I hope that helps!
Thank you Rys, that helps a lot! I still have a few more questions, though. In Intel’s datasheet, a “vertex rate” of 1 transformed triangle/8 clocks is given. Where did this number come from? Also, a 400 MHz SGX545 (in this case, one unchanged from ImgTec’s design) takes 5 clocks per triangle and cranks out 80 M of them per second, as you said. That suggests a single unit performs those triangle transformations. Is that unit one of the USSEs, or a dedicated transformation engine akin to a pre-DirectX 10 vertex shader?
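The gap between the two figures is easier to see side by side. The sketch below assumes only that triangle rate = clock / clocks per triangle, and applies both the 5-clock figure from the reply and the 8-clock figure from Intel’s datasheet to the GMA 3600 (400 MHz) and GMA 3650 (640 MHz). Which figure actually applies to Intel’s parts is exactly the open question.

```python
# Triangle rates implied by the two clocks-per-triangle figures in the thread.
# Assumption: rate (Mtriangles/s) = clock (MHz) / clocks per triangle.

def triangles_per_sec_millions(clock_mhz: float, clocks_per_triangle: float) -> float:
    return clock_mhz / clocks_per_triangle

for name, clock in [("GMA 3600", 400), ("GMA 3650", 640)]:
    for source, cpt in [("ImgTec, 5 clk", 5), ("Intel, 8 clk", 8)]:
        rate = triangles_per_sec_millions(clock, cpt)
        print(f"{name} @ {clock} MHz, {source}: {rate:.0f} Mtri/s")
```

At 400 MHz the two figures give 80 M versus 50 M triangles/s, a 1.6x difference from the clocks-per-triangle number alone.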
On the subject of ROPs and clock speeds, Intel’s datasheet cites a fillrate of 4 pixels/clock, which is where my presumption of 4 ROPs came from. The datasheet also specifies a “render clock” of 200 MHz regardless of the 400 MHz core clock of the GMA 3600 or the 640 MHz clock of the GMA 3650. I could only guess that this render clock applied to those 4 pixels/clock. Is there some other core component clocked at 200 MHz?
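At 400 MHz, Intel’s description (4 pixels/clock at a 200 MHz render clock) and the reply’s (2 pixels/clock at core clock) give the same peak, so the GMA 3600 alone cannot tell them apart. The sketch below shows where they would diverge, assuming purely for illustration that each model scales as written to the 640 MHz GMA 3650; neither the datasheet nor the reply confirms how the GMA 3650 actually behaves.

```python
# Two descriptions of the same 800 Mpixels/s peak at 400 MHz:
#   Intel datasheet: 4 pixels/clock on a fixed 200 MHz "render clock"
#   Reply above:     2 pixels/clock at the core clock
# Assumption (illustration only): each model scales as written to 640 MHz.

def fill_mpix(pixels_per_clock: int, clock_mhz: int) -> int:
    return pixels_per_clock * clock_mhz

RENDER_CLOCK_MHZ = 200  # fixed, per Intel's datasheet

for core_mhz in (400, 640):
    intel = fill_mpix(4, RENDER_CLOCK_MHZ)
    core_clock_model = fill_mpix(2, core_mhz)
    print(f"core {core_mhz} MHz: Intel model {intel}, "
          f"core-clock model {core_clock_model} Mpixels/s")
```

A fillrate benchmark on a GMA 3650 would be one way to see which description holds.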
I never fully understood what Intel meant by “vertex/triangle ratio” in the datasheet. It claims an average rate of 1 vtx/tri and a peak rate of 0.5 vtx/tri. What does that mean? I would think the peak value would be higher than the average.
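One possible reading, which is an assumption and not something the datasheet states here, is that the ratio counts vertices the GPU must transform per triangle drawn, in which case lower is better. The sketch below computes that ratio for a triangle strip and for an indexed grid mesh where every interior vertex is shared; the function names are hypothetical.

```python
# One possible reading of "vertex/triangle ratio" (assumption): vertices that
# must be transformed per triangle drawn, so a LOWER ratio means less work.

def strip_ratio(triangles: int) -> float:
    """A triangle strip of n triangles uses n + 2 vertices."""
    return (triangles + 2) / triangles

def grid_ratio(width: int, height: int) -> float:
    """Indexed grid of width x height quads, two triangles per quad, with
    every vertex shared: (w + 1)(h + 1) vertices for 2wh triangles."""
    vertices = (width + 1) * (height + 1)
    triangles = 2 * width * height
    return vertices / triangles

print(f"independent triangles: {3.0:.2f}")
print(f"strip of 1000:         {strip_ratio(1000):.2f}")
print(f"100x100 grid:          {grid_ratio(100, 100):.2f}")
```

Under this reading, strips land near 1 vertex per triangle and well-shared meshes approach 0.5, which would be consistent with the datasheet calling 0.5 the peak and 1 the average.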
Thanks again for your response, Rys!