Texture transfer to GPU memory is too slow on OMAP3

Hi,





I am trying to build a test application that loads 24-bit BMP images, rotates each image, and displays the rotated images on TI’s OMAP 3530 BeagleBoard.


I started from one of the OpenVG training samples (OVGImage.cpp) that came with OMAP35x_Graphics_SDK_3_00_00_06. What I did was:





In the initialization:

- created a VGImage once using vgCreateImage() in VG_sRGBA_8888 format
- loaded the first 24-bit RGB BMP image into CPU memory
- converted the 24-bit RGB data into 32-bit RGBA data
- transferred the 32-bit image data to the GPU using vgImageSubData()
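
For reference, the CPU-side conversion step could look roughly like the sketch below. It is only an illustration, not the code from the SDK sample: the function names `bmp24_stride` and `bgr24_to_rgba32` are hypothetical, and it assumes uncompressed 24-bit BMP pixel data (B, G, R byte order, rows padded to a multiple of 4 bytes) and that the destination format stores R in the most significant byte of each 32-bit word, which is how I read the OpenVG description of VG_sRGBA_8888. Alpha is set to fully opaque, and row order is left unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per row of 24-bit BMP pixel data: rows are padded to 4 bytes. */
static size_t bmp24_stride(size_t width)
{
    return (width * 3 + 3) & ~(size_t)3;
}

/*
 * Convert 24-bit BGR rows into 32-bit words laid out as R,G,B,A from the
 * most significant byte to the least significant byte (assumed to match
 * VG_sRGBA_8888). Alpha is forced to 0xFF (opaque).
 */
static void bgr24_to_rgba32(const uint8_t *src, size_t src_stride,
                            uint32_t *dst, size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y) {
        const uint8_t *row = src + y * src_stride;
        uint32_t *out = dst + y * width;
        for (size_t x = 0; x < width; ++x) {
            uint32_t b = row[3 * x + 0];
            uint32_t g = row[3 * x + 1];
            uint32_t r = row[3 * x + 2];
            out[x] = (r << 24) | (g << 16) | (b << 8) | 0xFFu;
        }
    }
}
```

For a 512x512 image the source stride is 1536 bytes (no padding needed) and the destination buffer is 512 * 512 * 4 bytes.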





In the render module:

i. set the matrix mode to VG_MATRIX_IMAGE_USER_TO_SURFACE
ii. called vgLoadIdentity() and then vgRotate()
iii. called vgDrawImage() for the first image
iv. loaded a new BMP into CPU memory, converted it to 32-bit RGBA data, transferred it to the GPU, and repeated steps i-iii








When I built and ran the application on the BeagleBoard, I measured the following CPU times:

- load a BMP image into CPU memory: 40 msec
- convert to 32-bit RGBA format: 13 msec
- vgImageSubData(): 500~512 msec





I understand that transferring a texture to the GPU takes time, but 500 msec per frame seems like too much.


When I upload the texture to the GPU just once and reuse it for subsequent rotations or translations, I see no delay. But I want to update the image data every frame.





Any help would be appreciated!





Thank you.





johnh

What is the size of the BMP you are using?



The BMP images I used are 512x512.

Is your goal only to rotate a BMP image? Or will you be using the SGX for further graphics work after the rotation? If it is only rotation, there may be other ways to do it. Please explain your use case.



OK. Here's what I want to do. For a given image sequence (i.e., video), my application on the CPU side reads a BMP image from storage, sends it to the SGX for rotation by a user-specified angle, and reads the rotated image back from the GPU for further processing. This procedure should be performed for each image in real time (e.g., 30 frames/sec). The SGX is used only for image rotation.

The videos are typically more than 10 minutes long, so it would not be possible to upload all the images to the SGX up front.



Thanks.



johnh

The short answer is that what you’re trying to do is going to be difficult to achieve through standard API calls, as they require the images to be processed/copied before rendering after they have been uploaded. Without access to specialised extensions, which TI are responsible for, there isn’t much else that can be done in this case, I’m afraid. Even then, you are transferring a large texture to the GPU and back each frame, which is going to be costly in bandwidth whatever approach is taken.
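
To put rough numbers on the bandwidth point, here is a small sketch that uses only figures quoted in this thread (512x512 frames, 4 bytes per pixel, 30 frames/sec, roughly 500 msec per vgImageSubData() call). The helper names are hypothetical, and this is raw arithmetic only: it ignores driver overhead, format conversion inside the driver, and synchronisation costs.

```c
#include <stddef.h>

/* Bytes in one uncompressed frame. */
static size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Sustained rate in MiB/s needed to move bytes_per_frame at the given fps. */
static double required_mib_per_s(size_t bytes_per_frame, double fps)
{
    return (double)bytes_per_frame * fps / (1024.0 * 1024.0);
}

/* Effective rate in MiB/s implied by moving `bytes` in `ms` milliseconds. */
static double measured_mib_per_s(size_t bytes, double ms)
{
    return ((double)bytes / (1024.0 * 1024.0)) / (ms / 1000.0);
}
```

With these figures one frame is 1 MiB, so an upload plus a readback at 30 frames/sec needs about 60 MiB/s sustained, whereas a 500 msec upload corresponds to only about 2 MiB/s. The per-frame budget at 30 frames/sec is about 33 msec, against the 553 msec or so measured for load, convert and upload.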





I’m sorry I can’t be of more help.