A couple of days ago I started porting some OpenGL code over to the iPhone and ran into a rather interesting (but not yet frustrating) performance issue.
I’m rendering 3 parallel cylinders (64 triangles each) along the Z axis (into the screen). When they’re far away, it’s all cool: steady 60 fps, no worries. But when they come closer and start covering a large chunk of the screen, the framerate drops to 30 and below.
I thought I must’ve hit the fillrate limit, but when I added some quads covering half the screen there was no noticeable impact on the framerate. And when I made the cylinders shorter on the far side, the framerate went up far more than the freed-up screen space (a couple thousand pixels at most) would account for. So the rasterizer couldn’t have been the bottleneck.
Then I decided to reduce the vertex count to 8 polys per cylinder and… here we go, steady 60 fps regardless of distance. The objects covered approximately the same screen space, so definitely no fillrate bottleneck.
This made no sense at all until I read somewhere about the tile-based PowerVR MBX renderer. What I’m thinking is: when the GPU draws a tile, it needs to transform all polygons covering that tile, which wouldn’t be a problem if the triangles were small. But when a polygon covers a large number of tiles, it would need to be transformed individually for each tile. In my case, the tiles covering the far-away parts of the cylinders would have caused the renderer to process all 64 polygons over and over again, leading to a vertex-stage bottleneck.
Are there any other plausible explanations? Any comments appreciated!
Vertices are only transformed once; however, there is a small cost for each tile a triangle covers, since the renderer has to keep track of which triangles touch each tile. It may sound counter-intuitive, but could you try subdividing the cylinders along their length so you get smaller triangles?