I am a near-beginner when it comes to GPU programming, so please forgive in advance any imprecise questions and anything I might have overlooked.
Currently I am writing an application that needs to process image data on the GPU. The runtime and framework I have to use are CPU-centric. Image data is provided as RGBA bitmaps. I have to put this image data, as well as the processing algorithm and data, under GPU control in the form of texture and vertex data, process it with OpenGL ES 2 using special shaders, and then read back the resulting RGBA data to be used by the runtime/framework. My target platforms are the iPad (2 or later) and Android-based mobile hardware (mostly Tegra 2/3).
My question is: What would be the fastest way to share memory or exchange data between CPU and GPU?
As of now, I simply use glTexImage2D to send bitmap data from the CPU to the GPU. After processing, I use glReadPixels to read the result back to CPU land. This seems quite slow, since copying is involved. It also seems odd: isn't it the case that most mobile platforms (including the iPad) use memory shared between the CPU and GPU?
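To put a rough number on the copying: glTexImage2D and glReadPixels each move the whole bitmap at 4 bytes per pixel for RGBA, so every processing pass transfers the image twice. The C sketch below only does that arithmetic. The function names are hypothetical helpers, not part of any GL API, and 1024x768 is assumed as the iPad 2 screen size.

```c
#include <stddef.h>

/* Hypothetical helper: bytes in one tightly packed RGBA8 bitmap
   (4 bytes per pixel: red, green, blue, alpha). */
size_t rgba_bytes(size_t width, size_t height)
{
    return width * height * 4u;
}

/* Hypothetical helper: bytes moved per processing pass, assuming one
   upload (glTexImage2D) plus one readback (glReadPixels) of the same size. */
size_t round_trip_bytes(size_t width, size_t height)
{
    return 2u * rgba_bytes(width, height);
}
```

For a 1024x768 frame, rgba_bytes(1024, 768) is 3,145,728 bytes (3 MiB), so each pass copies 6 MiB. The byte count alone does not capture the synchronization cost of glReadPixels, which is usually the bigger problem.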
I looked into the GL_OES_mapbuffer extension. Would that be a faster alternative on systems that support this extension? Any other ideas, hints or pointers?
I appreciate any help on this, since it is quite hard to find information or experienced advice on this topic.
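Before relying on an extension such as GL_OES_mapbuffer, the code has to check that the driver advertises it. In ES 2.0 the string returned by glGetString(GL_EXTENSIONS) is a space-separated list of names, and a plain strstr() can report a false match when one name is a prefix of another. The sketch below does an exact token match; it takes the string as a parameter (has_gl_extension is a hypothetical name) so it can be tested without a GL context. As far as I know, OES_mapbuffer in ES 2.0 only maps buffer objects such as vertex buffers, and only for writing, so it may help with uploading vertex data but would not replace glReadPixels for readback.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a complete, space-delimited token in
   `extensions` (for example the string from glGetString(GL_EXTENSIONS)).
   Matching whole tokens avoids false positives where one extension name
   is a prefix of another. Assumes `name` itself contains no spaces. */
bool has_gl_extension(const char *extensions, const char *name)
{
    if (extensions == NULL || name == NULL || *name == '\0')
        return false;

    size_t len = strlen(name);
    const char *p = extensions;

    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        /* ...and end at the end of the string or right before a space. */
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += len; /* keep searching after this partial match */
    }
    return false;
}
```

In an app this would be called once after context creation, passing (const char *)glGetString(GL_EXTENSIONS) as the first argument.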
Thank you in advance
This is a common problem that developers sometimes face. Unfortunately there's no simple solution for what you've asked, but I'll try to help as much as I can.
Firstly, the reason that this is so slow:
The call that really slows your app down is glReadPixels. Normally the GPU and CPU execute in parallel. Different operations take different amounts of time on each processor, so GL calls are more like "requests" that the driver queues up and executes when the GPU is ready.
However, when you call glReadPixels, you expect all the calls up to that point to have been processed; a partially rendered texture is essentially useless. Since the GPU is usually a little behind the CPU in the command stream, the CPU has to wait for the GPU to work through all commands up to the glReadPixels. Once the GPU catches up, it in turn has to wait for the CPU to give it new commands, which can't happen until the data has been copied. This serialization stalls both the CPU and the GPU at times when they would otherwise be busy, so unfortunately a lot of time is wasted doing nothing.
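The cost of that serialization can be illustrated with a deliberately simplified model (an assumption for illustration, not a measurement): each frame needs cpu_ms of CPU work and gpu_ms of GPU work. If the CPU waits in glReadPixels every frame, the two never overlap; if the GPU is allowed to run one frame behind, they form a two-stage pipeline. The function names below are hypothetical.

```c
/* Toy model, times in milliseconds. Not a measurement of real hardware. */

/* CPU and GPU take turns: the CPU blocks in glReadPixels each frame,
   so every frame costs the sum of both stages. */
unsigned long serialized_ms(unsigned long frames, unsigned long cpu_ms,
                            unsigned long gpu_ms)
{
    return frames * (cpu_ms + gpu_ms);
}

/* Two-stage pipeline: the GPU processes frame i while the CPU prepares
   frame i + 1, so after the first frame the slower stage sets the pace. */
unsigned long pipelined_ms(unsigned long frames, unsigned long cpu_ms,
                           unsigned long gpu_ms)
{
    if (frames == 0)
        return 0;
    unsigned long slower = cpu_ms > gpu_ms ? cpu_ms : gpu_ms;
    return cpu_ms + gpu_ms + (frames - 1) * slower;
}
```

With 10 ms on each side and 100 frames, the model gives 2000 ms when serialized versus 1010 ms when overlapped, so roughly half the time is spent waiting. A plain glReadPixels every frame puts you in the serialized case; the model only shows what that stall costs.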
I hope that explains why, although the memory is shared, there is a high cost associated with reading it back.
Our usual recommendation in this situation is, depending on what you're actually trying to achieve, either to avoid reading the pixels back or not to involve the GPU at all. There are a couple of options that make this work better, but they're not especially simple and require various extensions which may or may not exist on your target devices. Also, iOS devices use their own context-handling library (EAGL instead of EGL), so I am not sure which of these paths would be available to you on Apple hardware.
Can I ask what effect/output it is that you are trying to get out? I may be able to help you better if I know what you're aiming for. If you don't want to discuss project details on the public forums please feel free to email us at email@example.com.
Thank you for this comprehensive answer. I will follow your suggestion and follow up directly via email.