Hi all,
I’m working on a sprite-based 2D game, and whilst performance is fine in the simulator, the frame rate has halved on the actual device.
Currently I turn off GL_DEPTH_TEST and simply draw my layered sprites back to front.
Can anyone with greater knowledge than me confirm that by enabling GL_DEPTH_TEST I will increase my frame rate, because only the nearest pixels actually get rendered, rather than my current method where everything is drawn?
If this is correct, must I add a ‘z’ component to my sprites and alter my vertex draw to accommodate this? The ‘z’ would simply determine the depth for the z buffer - it has no spatial purpose as this is 2D.
Another area of confusion is whether using compressed PVR textures can actually boost render speed. Obviously a file size saving can be made, but I’ve read conflicting opinions on whether render speed is increased. One argument stated that an ARGB8888 texture converted to ARGB4444 was ‘just as fast’ as the compressed version, and without the quality degradation.
I’d like some concrete proof of this! ;-))))
Finally, is there a proper GUI version of the PVR conversion tool for the Mac? I can’t find one.
Any comments/suggestions/help appreciated.
Cheers
When you say you draw your objects from back to front, do you mean that every object has the same Z coordinate (e.g. 0.0), and you’re just rendering your scene so objects nearer the camera are drawn last?
POWERVR Tile Based Deferred Rendering (TBDR) hardware is particularly good when drawing opaque objects because the hardware performs efficient Hidden Surface Removal (HSR), which can massively reduce overdraw, regardless of object submission order. Because of this, you should find that the use of depth buffers for your 2D game will make little or no difference to performance in this case.
On other architectures, though, your current approach is suboptimal: there, using a depth buffer will reduce overdraw (the number of fragments that are overwritten in the frame buffer), as you will only be processing fragments that are visible.
A Z value does have a use in 2D games. Although you can consider the game 2D, having objects behind one another essentially means you can render in 3D, but with screen-aligned polygons. Rendering in this way allows you to take advantage of the performance boost a depth buffer can provide. To do this, you should give your objects a Z coordinate value based on how far they should be from the camera.
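As a rough illustration (OpenGL ES 1.1; the vertex structure and names here are placeholders, not from your code), the change amounts to something like this:

/* Sketch: z-layered 2D sprites with depth testing (OpenGL ES 1.1).
   The vertex layout and names are illustrative only. */
#include <OpenGLES/ES1/gl.h>

typedef struct {
    GLfloat x, y;
    GLfloat z;  /* picks the layer for the depth buffer; no spatial meaning in 2D */
} SpriteVertex;

void drawSprite(const SpriteVertex verts[4])
{
    glVertexPointer(3, GL_FLOAT, sizeof(SpriteVertex), verts); /* 3 components now, not 2 */
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}

void renderScene(void)
{
    glEnable(GL_DEPTH_TEST);                            /* instead of disabling it */
    glEnableClientState(GL_VERTEX_ARRAY);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); /* clear depth as well as colour */
    /* ...sprites can now be submitted in any order; the depth test
       (default GL_LESS) keeps whichever fragment is nearest... */
}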
The storage space saving of compressed textures for deployment (such as PVR textures) is a nice bonus, but it isn’t the main reason textures should be as small as possible. Smaller textures mean less data has to be sent to the graphics core when textures are accessed, which means memory bandwidth use is lower. As memory bandwidth is a regular bottleneck in any 3D graphics hardware, the ability to compress texture data into the smallest space possible can reduce this bottleneck significantly.
Yes, the current SDK release (2.7) has a version of PVRTexToolGUI for Mac. You can find it in the Utilities directory of the SDK.
Hi Joe,
You confirm what I thought then.
Basically, although my game is ‘flat’ (2D), sprites appear in front of/behind each other, and therefore currently everything is drawn. Whereas with a z buffer enabled, only the visible bits would be drawn (ignoring alpha transparency for the moment).
Hmmmmm, I downloaded the Mac PVRTexToolGUI and it wasn’t a .dmg or anything; not sure what (or how!) to run.
Any guidance appreciated.
Cheers
On rereading your first post, I noticed you were comparing the performance of the iPhone Simulator and the iPhone 3G device. As the Simulator is not using the same underlying graphics hardware, you will never get true performance data when compared to the actual device. This will be the reason you have seen such a big difference in FPS in your application.
I’ve edited my first comment to correct it. For your 2D sprite game, performance should be equally good on the 3G with and without a depth buffer because of the way HSR works on POWERVR hardware. On other architectures, using a depth buffer can make a huge difference.
Alpha transparency, on the other hand, can affect performance. As our Application Development Recommendations document (found in the SDK) highlights, you should render all opaque geometry first, then alpha-tested objects (although these should be avoided as much as possible - the reasons for this are discussed in the same document), and then finish by rendering alpha blended objects.
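Sketched as a frame loop (OpenGL ES 1.1; the three draw functions are hypothetical placeholders for your own batching code), the ordering looks like this:

/* Minimal ES 1.1 sketch of the recommended submission order.
   The three draw functions are hypothetical placeholders. */
#include <OpenGLES/ES1/gl.h>

extern void drawOpaqueSprites(void);
extern void drawAlphaTestedSprites(void);
extern void drawBlendedSpritesBackToFront(void);

void renderFrame(void)
{
    /* 1. Opaque geometry first - gets the full benefit of HSR. */
    glDisable(GL_BLEND);
    glDisable(GL_ALPHA_TEST);
    drawOpaqueSprites();

    /* 2. Alpha-tested geometry next (avoid where possible). */
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GREATER, 0.5f);
    drawAlphaTestedSprites();
    glDisable(GL_ALPHA_TEST);

    /* 3. Alpha-blended geometry last, back to front. */
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    drawBlendedSpritesBackToFront();
    glDisable(GL_BLEND);
}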
All of our Mac utilities use X11. There are instructions on setting up the SDK for your Mac in our SDK documentation.
Sorry, I am confused!
Are you saying the use of a z buffer on the iPhone in this instance will have little/no effect?!
Erm, that’s worrying!
My vertex batching/texture atlasing etc. is optimal, so I assumed I was hitting fill rate.
:-(((
Mark
Actually I am really really confused!
If the graphics chip is being optimal anyway and only rendering what isn’t occluded, then I don’t understand why my frame rate drops so drastically - I am certainly not CPU bound, of that I am sure.
I am filling the whole screen, so simply changing from one texture to another to draw an image that is, say, 95% occluded should result in little alteration; i.e. only 5% of it is visible, and therefore the draw is minimal, so it’s just the texture swap overhead, which should also be very small.
I guess I am missing something here!!!
Cheers
Yes, the use of a depth buffer in your case should have no impact, as hidden surface removal is performed regardless, to ensure the only fragments processed are those that will affect the final rendered frame.
Can you provide a little more information about what you are trying to render, and where you are seeing a problem?
You shouldn’t compare the performance of the Simulator against the device - they are different platforms, so will not give the same result.
In the test you’ve mentioned, are you only drawing opaque geometry? Are you filling the screen with a single polygon, or something more complex? When you say you’ve swapped between textures, are their bits per pixel different, are you using mip-mapping, and are their texture filtering modes different?
It sounds like you’re hitting a texture memory bandwidth bottleneck instead of the fill rate bottleneck you thought you were hitting. Our performance recommendations document explains how you should compress, filter and create mip-map levels for textures to get the most out of the available memory bandwidth.
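For illustration, a bandwidth-friendly PVRTC texture setup might look something like the sketch below. It assumes the mip levels were generated offline (e.g. by PVRTexTool) and are packed contiguously, largest level first, as they are in a PVR file:

/* Sketch: upload a pre-built PVRTC 4bpp mip chain (OpenGL ES 1.1 on iPhone).
   The data layout is the PVR-file convention: levels packed largest first. */
#include <OpenGLES/ES1/gl.h>
#include <OpenGLES/ES1/glext.h>

static GLsizei pvrtc4LevelSize(int w, int h)
{
    /* PVRTC 4bpp stores 4 bits per texel in 4x4 blocks, with a
       minimum addressable region of 8x8 texels (32 bytes). */
    int bw = w < 8 ? 8 : w;
    int bh = h < 8 ? 8 : h;
    return (GLsizei)(bw * bh / 2);
}

void uploadPvrtc4MipChain(GLuint tex, int w, int h,
                          const unsigned char* data, int numLevels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    for (int level = 0; level < numLevels; ++level) {
        GLsizei size = pvrtc4LevelSize(w, h);
        glCompressedTexImage2D(GL_TEXTURE_2D, level,
                               GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG,
                               w, h, 0, size, data);
        data += size;
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
    }
    /* Mip-mapped minification keeps texture fetches small and cache-friendly. */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
}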
Can you provide one or more screen shots of your test/s so I have a better idea of what you’re trying to achieve?
Thanks Joe,
What you say makes sense; and yes I now think it’s a texture bandwidth issue.
Currently I draw a scrolling landscape (2D) made up of quads (two tris on iPhone) of 32x32 pixels each.
The landscape uses a single texture, so each ‘tile’ maps to UVs within that one texture.
I have some areas where there are no tiles, i.e. the background shows through.
Using the same technique I wish to place another scrolling tile layer underneath, i.e. a parallax scrolling effect.
Maybe even a few semi-transparent layers.
As the ‘holes’ in the foremost landscape are quite limited, it is occluding most of the parallax layers, and as you say, with or without a z buffer the parallax layers require very little visible draw.
Currently all textures are the same bit depth, with no mipmaps, but yes, the filtering (blending and alpha) changes for the parallax. No point having an opaque layer on top of another opaque layer :-))))
You’ve been really helpful, so any further pointers are certainly appreciated.
Cheers
Ahhh, OK. I’m starting to get a better idea of where you’re seeing problems now.
You can only benefit from hidden surface removal when using opaque layers - you cannot determine the exact visibility of alpha blended surfaces during HSR, because a given fragment could be fully opaque, blended, or completely transparent. This means that for every blended layer you are introducing overdraw (every blended fragment’s colour has to be calculated, then mixed with the current contents of the colour buffer). Although POWERVR hardware can handle blending very efficiently, as all Read-Modify-Write operations on the colour buffer are performed in on-chip memory, you should break objects down into opaque areas and small regions that need to be blended to get the biggest benefit from the hardware’s HSR process.
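To make that concrete, the split amounts to something like this (illustrative only - the two draw functions are placeholders for however you divide your geometry):

/* Illustrative: split a mostly-opaque layer into an opaque core and a
   small blended fringe, so HSR can reject everything behind the core. */
#include <OpenGLES/ES1/gl.h>

extern void drawOpaqueInteriorQuads(void);  /* fully opaque texels only */
extern void drawBlendedEdgeQuads(void);     /* soft edges / holes only */

void drawLandscapeLayer(void)
{
    /* Pass 1: opaque interior, blending off - participates fully in HSR. */
    glDisable(GL_BLEND);
    drawOpaqueInteriorQuads();

    /* Pass 2: only the small regions that genuinely need blending,
       ideally submitted after all opaque geometry in the frame. */
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    drawBlendedEdgeQuads();
    glDisable(GL_BLEND);
}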
I’ve already given a number of performance recommendations related to this in another topic recently, which is available through this link
I see, so having a full screen tiled parallax alpha blended layer is going to cause me big big problems!! :-))
Presumably making the texture smaller would help; but eventually I will hit fill rate due to overdraw?
Will check out your link.
Cheers
Hi again,
I’ve now got the Mac PVRTexTool, and although I have not implemented PVR textures, I thought I would simply reduce an ARGB8888 texture to ARGB4444, as this should have some benefit.
I encode and the output file seems OK, but I can’t seem to save it.
i.e. whatever I save (even with a different file name), when I load it back in, it’s 32bpp again.
What am I doing wrong??
Cheers
You will see some benefit with ARGB4444 (16bpp) vs ARGB8888 (32bpp), although PVRTC (2bpp or 4bpp) is significantly better again.
Are you using the command line or the GUI PVRTexTool? If you encode a .pvr file and open it again in PVRTexTool GUI you should be able to see what format it’s in by clicking the green Get Info button in the toolbar. Is this how you’re checking it?
To encode RGBA4444 with the command line, use a line like:
./PVRTexTool -fOGL4444 -itexture.png
Hi,
Yeah I am using the GUI version.
I currently have no PVRTC support; and have realised that a PNG can’t be 16bpp!!
As a test I will need to frig a 16bpp load for now, i.e. a PVR; but it might be easier to put PVRTC in straight away. However, I am worried the compression artefacts will show, as it’s a 2D app so the textures are obvious, as opposed to a 3D app, where slight visual errors through compression are less noticeable.
Cheers
PNG stores image data in its own compression system - it’s not a container for data like PVR is.
In our SDK’s Tools folder there is some loading code for PVRs - PVRTTexture.h has the function names. This will allow you to load any PVR format available for the API.
For a 2D app I would suggest trying PVRTC 2bpp for textures, then PVRTC 4bpp, and going to 16-bit (565 for opaque, or 4444/1555 for transparent) on a case by case basis. Once loaded into GL, textures are all treated the same, and the loading code I mention handles the different formats for you.

In our own projects we prepare a script file that calls PVRTexTool specifying the format for each texture - as we add further graphics we just add to the script. I would recommend a similar process for your project. You can review what textures look like with PVRTexTool GUI or in-game. Often compression artefacts aren’t as obvious on the mobile screen as on your desktop - I found backgrounds were fine as PVRTC 2bpp, for instance.
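As a sketch of such a script (the file names are just examples, and the format flags follow the legacy syntax shown earlier in this thread - check your version’s help output for the exact names):

./PVRTexTool -fOGLPVRTC2 -ibackground.png
./PVRTexTool -fOGLPVRTC4 -isprites.png
./PVRTexTool -fOGL565 -ihud_opaque.png
./PVRTexTool -fOGL4444 -ihud_alpha.png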
OK!
Just tried PVRTC; hmmmm, some quality issues!
Alpha seems to go a bit funny on certain sprites (I guess anything with alpha could go in a separate texture that isn’t compressed, to avoid this issue).
I get lots of visual errors around my tiles.
The texture atlas places each 32x32 ‘tile’ 2 pixels away from its neighbour to avoid any UV errors.
I noticed when I did the encode to 4bpp, these blank alpha gaps showed up as differences.
Why would that be??
thanks
Just tried it on the real device with Instruments, and my frame rate is even lower!!!
Now I really am lost!
You may find PVRTC does introduce problems with transparent areas - 16bpp might be better for these textures. PVRTC is lossy compression and losing information in the alpha channel can lead to completely transparent areas no longer being completely transparent, unfortunately.
I don’t understand how your framerate can be reduced… Instruments does have a certain amount of performance overhead - does the framerate improve if you don’t run with it (I tend to use my own fps counter for this reason)?
Did you start using a depth buffer? How are you loading your textures? Is this definitely being called only once?
Possibly unrelated, but you should glClear every frame and use glDiscardFramebufferEXT so that unnecessary data doesn’t need to be stored.
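For example (a sketch for ES 1.1 with EXT_discard_framebuffer; it assumes a colour + depth framebuffer, so adjust the attachment list to whatever you actually attach):

/* Sketch of the suggested per-frame bracket on iPhone. */
#include <OpenGLES/ES1/gl.h>
#include <OpenGLES/ES1/glext.h>

void beginFrame(void)
{
    /* Clearing every frame tells a tile-based renderer the previous
       contents are dead, so nothing needs reloading into the tiles. */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
}

void endFrame(void)
{
    /* Depth contents aren't needed after the frame, so discard them
       rather than paying to resolve them out to memory. */
    const GLenum discards[] = { GL_DEPTH_ATTACHMENT_OES };
    glDiscardFramebufferEXT(GL_FRAMEBUFFER_OES, 1, discards);
    /* ...then present the colour renderbuffer as usual... */
}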
Hi there,
No depth buffer in use (no need for one!); not sure what you mean by how I’m loading my textures - just the usual route :-))
But yes everything is called once.
The alpha issues I can deal with. But the speed issue I can’t :-)))
Best get back to looking at 1555 or 4444 format; but I need to fiddle my loader for those.
I do glClear every frame, but found that if I didn’t clear every frame (i.e. no colour or z clear) I got a frame-rate boost (I guess that makes sense).
Though, even though texture compression causes artefacts, I didn’t expect my fps to drop; that shouldn’t happen!!!
Cheers
OK, I’ve hacked in the 1555 format, i.e. I convert the 8888 PNG on the fly and upload the new texture once.
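(For reference, the conversion is roughly along these lines - a sketch assuming tightly packed RGBA8888 input:)

/* Sketch: RGBA8888 -> RGBA5551 conversion and single upload.
   Assumes tightly packed input and a texture object already bound. */
#include <OpenGLES/ES1/gl.h>
#include <stdlib.h>

void upload5551(const unsigned char* rgba8888, int w, int h)
{
    unsigned short* out = malloc((size_t)w * h * sizeof *out);
    for (int i = 0; i < w * h; ++i) {
        const unsigned char* p = rgba8888 + i * 4;
        out[i] = (unsigned short)(((p[0] >> 3) << 11) |  /* R: 5 bits */
                                  ((p[1] >> 3) <<  6) |  /* G: 5 bits */
                                  ((p[2] >> 3) <<  1) |  /* B: 5 bits */
                                  ( p[3] >> 7));         /* A: 1 bit  */
    }
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_SHORT_5_5_5_1, out);
    free(out);
}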
No frame-rate change on device!!!
I don’t get where my problem is!! :-((
More info.
If I take out my parallax draw texture (it’s only 128x128, and now of course 16bpp with no alpha), my frame rate hits near 60fps (hurray).
So I am unsure where the problem is… my parallax texture is small and it is visible on only a tiny portion of the screen, so its draw is nominal; yet the difference between drawing it and not is around 15fps, so very significant.
I should point out that my test device is an iPhone 3G (not the 3GS!) - I am trying to get 60fps on minimum spec (sub-spec, according to Apple) hardware :-)))