Frame drop

Hi,





I have achieved about 56 fps on average for realtime video frame processing on the iPhone 4.


The quality, i.e. the size (480x320) and the frame rate, is almost the same as the Camera app's.





Thanks for the great GPU on iPhone!





Now I’m trying a Kuwahara filter on it.


I know it does too many texture lookups, and that is OK.





Since it is quite slow, about 1 fps, I can see every frame.


And I noticed that only one frame out of two is displayed.





Memory bandwidth looks saturated.


But that is not the problem, since simple rendering achieves close to 60 fps.


The only difference is the pixel shader.


If the previous pixel shader task is still in progress, is the next one discarded, so that the frame is dropped?





Thanks,


Takenori      

I’m unclear what the data source of your application is. Were you comparing against the camera app because you are using a camera source as well, or are you using a video source?





I think it may be the data source that is dropping frames rather than the graphics core.


If you are using a camera source, then how are you certain you are dropping frames? It may be possible that the camera’s capture rate will be reduced if you are putting a heavy load on the device.


If you are using a video source, are you sure that the video decoder is not dropping frames in an attempt to keep the video running at the correct playback speed?





As this is a very platform-specific question, I think you are best off contacting Apple for an answer.


Alternatively, you could send us your shader so we can help you optimise it to give you better performance, which should (hopefully) cause the issues you have been experiencing to disappear :slight_smile:

Thanks, Joe.





It is realtime video frame capture through the camera with AVFoundation. So the source is not video (a movie from a file) but the camera. I have two more simple filters, gray and sepia, as well. They can achieve up to 60 fps.





> If you are using a camera source, then how are you certain you are dropping frames?





I have two atomic flags, needsVideoFrame and hasVideoFrame.





The video capturing thread callback immediately returns if needsVideoFrame is false. If true, it copies the data in the right format, lets the OpenGL renderer grab it, sets needsVideoFrame to false, and sets hasVideoFrame to true.





The rendering thread loads the data into a texture with glTexImage2D if hasVideoFrame is true. Then it sets needsVideoFrame to true and hasVideoFrame to false.





So a new video frame comes in only after the previous image has been processed. Thus, by looking at the produce/consume text output and the actual result on the display, I found that one frame is dropped. In other words, only one frame update happens on the display while two produce/consume cycles are shown.
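A minimal sketch of that two-flag handshake, assuming C11 atomics (the real project may use OSAtomic or volatile flags instead; the buffer size and the function names here are illustrative, not from the actual code):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

static atomic_bool needsVideoFrame = true;  /* renderer wants a new frame  */
static atomic_bool hasVideoFrame   = false; /* a frame is ready for upload */

static unsigned char sharedBuffer[480 * 320 * 4]; /* hypothetical shared frame */

/* Capture-thread callback: hand a frame over only when one is wanted. */
void captureCallback(const void *frame, size_t size) {
    if (!atomic_load(&needsVideoFrame))
        return;                            /* renderer still busy: skip frame */
    memcpy(sharedBuffer, frame, size);     /* copy in the right format */
    atomic_store(&needsVideoFrame, false);
    atomic_store(&hasVideoFrame, true);
}

/* Render thread, once per drawn frame. */
void renderFrame(void) {
    if (atomic_load(&hasVideoFrame)) {
        /* upload sharedBuffer with glTexImage2D here */
        atomic_store(&hasVideoFrame, false);
        atomic_store(&needsVideoFrame, true);
    }
    /* draw with the current texture */
}
```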





I can send the Xcode project, because it is just an experiment. Where should I send it?





Thanks,


Takenori

From the explanation you have given, it’s difficult to determine whether the problem you are having comes from a bug in your own code (e.g. threading issues), from the camera server choosing to capture at a lower frame rate, or from an actual issue with your GL drawing code.





If you send a stripped down test case of the issue you are experiencing to devtech@imgtec.com, I’ll try to find the cause of the problem :slight_smile:

No, I don’t think it’s a bug. But the device looks to be under too much contention from the Kuwahara fragment shader, so this issue should go away if the shader is improved.





Any chance to improve the following? I have already moved the coordinate calculations to the vertex shader and removed duplicated texture2D calls. I’m wondering about using multiple passes. Is there a recommended lookup direction, such as by row, to get better cache hits? Or does it simply depend on the number of texture lookups, since the texture is twiddled?





Thanks,


Takenori








```glsl
// Sampling pattern around the centre texel C:
//          10
//     0     1     2
// 11     3     C     5     12
//     6     7     8
//          13

uniform sampler2D sampler2d;
varying mediump vec2 myTexCoord;

const int radius = 2;
const mediump float n = float(radius * radius + 2); // 6 texels per region

varying mediump vec2 myTexCoord0;
varying mediump vec2 myTexCoord1;
varying mediump vec2 myTexCoord2;
varying mediump vec2 myTexCoord3;
//varying mediump vec2 myTexCoord4; equal to myTexCoord
varying mediump vec2 myTexCoord5;
varying mediump vec2 myTexCoord6;
varying mediump vec2 myTexCoord7;
varying mediump vec2 myTexCoord8;

varying mediump vec2 myTexCoord10;
varying mediump vec2 myTexCoord11;
varying mediump vec2 myTexCoord12;
varying mediump vec2 myTexCoord13;

void main()
{
    mediump vec3 m[4];
    mediump vec3 s[4];
    for (int k = 0; k < 4; ++k) {
        m[k] = vec3(0.0);
        s[k] = vec3(0.0);
    }

    mediump vec3 c;
    mediump vec3 cc;

    // Each texel is read once and accumulated into every region it belongs
    // to. (An alternative grouping, one block per region, would repeat the
    // texture2D calls for the shared texels; this per-texel order avoids that.)

    // 0
    c = texture2D(sampler2d, myTexCoord0).rgb;
    m[0] = c;
    s[0] = c * c;

    // 1
    c = texture2D(sampler2d, myTexCoord1).rgb;
    cc = c * c;
    m[0] += c; s[0] += cc;
    m[1] += c; s[1] += cc;

    // 2
    c = texture2D(sampler2d, myTexCoord2).rgb;
    m[1] += c;
    s[1] += c * c;

    // 3
    c = texture2D(sampler2d, myTexCoord3).rgb;
    cc = c * c;
    m[0] += c; s[0] += cc;
    m[2] += c; s[2] += cc;

    // 4 - C (centre, shared by all four regions)
    c = texture2D(sampler2d, myTexCoord).rgb;
    cc = c * c;
    m[0] += c; s[0] += cc;
    m[1] += c; s[1] += cc;
    m[2] += c; s[2] += cc;
    m[3] += c; s[3] += cc;

    // 5
    c = texture2D(sampler2d, myTexCoord5).rgb;
    cc = c * c;
    m[1] += c; s[1] += cc;
    m[3] += c; s[3] += cc;

    // 6
    c = texture2D(sampler2d, myTexCoord6).rgb;
    m[2] += c;
    s[2] += c * c;

    // 7
    c = texture2D(sampler2d, myTexCoord7).rgb;
    cc = c * c;
    m[2] += c; s[2] += cc;
    m[3] += c; s[3] += cc;

    // 8
    c = texture2D(sampler2d, myTexCoord8).rgb;
    cc = c * c;
    m[2] += c; s[2] += cc;
    m[3] += c; s[3] += cc;

    // 10
    c = texture2D(sampler2d, myTexCoord10).rgb;
    cc = c * c;
    m[0] += c; s[0] += cc;
    m[1] += c; s[1] += cc;

    // 11
    c = texture2D(sampler2d, myTexCoord11).rgb;
    cc = c * c;
    m[0] += c; s[0] += cc;
    m[2] += c; s[2] += cc;

    // 12
    c = texture2D(sampler2d, myTexCoord12).rgb;
    cc = c * c;
    m[1] += c; s[1] += cc;
    m[3] += c; s[3] += cc;

    // 13
    c = texture2D(sampler2d, myTexCoord13).rgb;
    cc = c * c;
    m[2] += c; s[2] += cc;
    m[3] += c; s[3] += cc;

    // Pick the region with the smallest variance and output its mean.
    mediump float min_sigma2 = 1e+2;
    for (int k = 0; k < 4; ++k) {
        m[k] /= n;
        s[k] = abs(s[k] / n - m[k] * m[k]);

        mediump float sigma2 = s[k].r + s[k].g + s[k].b;
        if (sigma2 < min_sigma2) {
            min_sigma2 = sigma2;
            gl_FragColor = vec4(m[k], 1.0);
        }
    }
}
```
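For reference, the final loop is the standard Kuwahara selection step: for each region $R_k$ with $n$ texels it computes the mean and the variance via $E[x^2] - E[x]^2$, per channel,

$$
m_k = \frac{1}{n}\sum_{i \in R_k} c_i, \qquad
\sigma_k^2 = \left|\frac{1}{n}\sum_{i \in R_k} c_i^2 \;-\; m_k^2\right|,
$$

and the output is $m_{k^*}$ for the $k^*$ minimising $\sigma_{k,r}^2 + \sigma_{k,g}^2 + \sigma_{k,b}^2$.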

I’ve had a look around various resources on the web, and also at an article in GPU Pro, but I’m sceptical that you will be able to achieve an interactive frame rate with this technique without stripping the effect down into something much simpler and less accurate.





If you put this code into our PVRUniSCoEditor utility and generate cycle counts, you will see there are 286 cycles estimated for the shader (which does not take into account the cost of a large number of varyings and other factors that cannot be taken into consideration in offline compilation). Just for reference, I would suspect a full screen fragment shader on an iPhone 4 would struggle to achieve a decent frame rate with more than 30-40 cycles for each fragment (this is only an estimate rather than a test I’ve performed).





You should see some improvement if you unroll your for loop, reduce the number of texture reads you are performing (i.e. sample fewer neighbouring pixels, if possible) and/or reduce the number of fragments that need to be processed (e.g. perform a render pass at 1/4 of the resolution of the display), but you may still struggle to achieve an interactive frame rate.
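As a sketch of that last suggestion, assuming OpenGL ES 2.0 (the sizes, names and the pass-through second pass are illustrative, not from the original project): render the filter into a half-width, half-height offscreen texture, then upscale it to the screen with a cheap pass-through shader.

```c
#include <OpenGLES/ES2/gl.h>

GLuint fbo, lowResTex;
GLsizei w = 480 / 2, h = 320 / 2;   /* quarter the fragment count */

void setupLowResTarget(void) {
    glGenTextures(1, &lowResTex);
    glBindTexture(GL_TEXTURE_2D, lowResTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, lowResTex, 0);
}

void drawFrame(void) {
    /* Pass 1: expensive Kuwahara shader at low resolution. */
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glViewport(0, 0, w, h);
    /* ... bind the Kuwahara program, draw a full-screen quad ... */

    /* Pass 2: cheap upscale of lowResTex to the screen. */
    glBindFramebuffer(GL_FRAMEBUFFER, 0); /* on iOS, the view's framebuffer */
    glViewport(0, 0, 480, 320);
    glBindTexture(GL_TEXTURE_2D, lowResTex);
    /* ... bind a pass-through program, draw a full-screen quad ... */
}
```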

I had a quick look with PVRUniSCoEditor before, so I am aware of the terrible per-fragment cost :slight_smile:





> Just for reference, I would suspect a full screen fragment shader on an iPhone 4 would struggle to achieve a decent frame rate with more than 30-40 cycles for each fragment (this is only an estimate rather than a test I’ve performed).





Thanks for the info. Yes, I will keep the fragment shader as lean as you suggest.





I’ll come back to the Kuwahara filter sometime in the future.





Feel free to post again if you manage to find any good algorithms or optimisations to achieve a decent frame rate. I’d be interested to know if it’s possible :slight_smile:





Also, you should read the performance recommendations document in our SDK for additional optimisations you may be able to implement.

Not a decent rate yet, but better. Any ideas to improve this?





The new design is to pack the left mean and the right mean into one integer: for each colour channel, the high nibble holds the left mean and the low nibble holds the right mean.





1 2 O 3 4





a: mean of (1 + 2 + O)


b: mean of (O + 3 + 4)





Thus, with hardware filtering, as in the Bloom example, it now needs only two extra lookups, one up and one down.
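To make the packing concrete, a worked example for one channel (the numbers are illustrative): take a left mean $a = 147 = \mathtt{0x93}$ and a right mean $b = 86 = \mathtt{0x56}$. The packed byte keeps the top nibble of each:

$$
p = (a \,\&\, \mathtt{0xF0}) \;|\; \big((b \gg 4) \,\&\, \mathtt{0xF}\big)
  = \mathtt{0x90} \;|\; \mathtt{0x05} = \mathtt{0x95} = 149
$$

In the shader this texel arrives as $149/255$; multiplying by $255/16$ gives $9.3125$, whose integer part $9$ is the left nibble and whose fractional part $0.3125 = 5/16$ is the right nibble, so each mean survives with 4-bit precision.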





Here’s how to pack.





I do this after converting ABGR to ARGB.





```c
// p:  input ARGB pixels of one row; p2: packed output row.
// For each channel, the high nibble stores the left mean (pixels i-2..i)
// and the low nibble stores the right mean (pixels i..i+2).
for (int i = 2; i < num - 2; i++) {
    // red
    int a = (((p[i-2] >> 16) & 0xff) + ((p[i-1] >> 16) & 0xff) + ((p[i] >> 16) & 0xff)) / 3;
    int b = (((p[i] >> 16) & 0xff) + ((p[i+1] >> 16) & 0xff) + ((p[i+2] >> 16) & 0xff)) / 3;
    int red = a & 0xf0 | (b >> 4) & 0xf;

    // green
    a = (((p[i-2] >> 8) & 0xff) + ((p[i-1] >> 8) & 0xff) + ((p[i] >> 8) & 0xff)) / 3;
    b = (((p[i] >> 8) & 0xff) + ((p[i+1] >> 8) & 0xff) + ((p[i+2] >> 8) & 0xff)) / 3;
    int green = a & 0xf0 | (b >> 4) & 0xf;

    // blue
    a = ((p[i-2] & 0xff) + (p[i-1] & 0xff) + (p[i] & 0xff)) / 3;
    b = ((p[i] & 0xff) + (p[i+1] & 0xff) + (p[i+2] & 0xff)) / 3;
    int blue = a & 0xf0 | (b >> 4) & 0xf;

    p2[i] = p[i] & 0xff000000 | (red << 16) & 0x00ff0000 | (green << 8) & 0x0000ff00 | blue & 0x000000ff;
}
```
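A small sketch to sanity-check the round trip on the CPU, mirroring the floor/fraction arithmetic the shader performs (the function name is illustrative, assuming the packing above):

```c
#include <stdio.h>

/* Recover the two 4-bit means from one packed channel byte. */
static void unpackChannel(int packed, int *left, int *right) {
    *left  = (packed >> 4) & 0xf;  /* high nibble: left mean  / 16 */
    *right = packed & 0xf;         /* low nibble:  right mean / 16 */
}

int main(void) {
    int left, right;
    unpackChannel(0x95, &left, &right);
    printf("left=%d/16 right=%d/16\n", left, right); /* left=9/16 right=5/16 */
    return 0;
}
```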





They are then unpacked and used in the fragment shader below.





```glsl
uniform sampler2D sampler2d;
varying mediump vec2 myTexCoord;
uniform mediump float offsetK;

void main()
{
    // Centre texel plus one texel up and one texel down; each holds the
    // packed left/right means in its nibbles.
    mediump vec3 c  = texture2D(sampler2d, myTexCoord).rgb;
    mediump vec3 c1 = texture2D(sampler2d, myTexCoord + vec2(0.0,  offsetK)).rgb;
    mediump vec3 c2 = texture2D(sampler2d, myTexCoord + vec2(0.0, -offsetK)).rgb;

    // Unpack: the integer part of value * 255 / 16 is the high nibble
    // (left mean), the fractional part is the low nibble (right mean).
    c1 = c1 * 255.0 / 16.0;
    mediump vec3 upperLeft  = vec3(floor(c1.r), floor(c1.g), floor(c1.b)); // 0..15
    mediump vec3 upperRight = vec3(c1.r - upperLeft.r, c1.g - upperLeft.g, c1.b - upperLeft.b); // 0..1
    upperLeft /= 16.0; // 0..1

    c = c * 255.0 / 16.0;
    mediump vec3 left  = vec3(floor(c.r), floor(c.g), floor(c.b)); // 0..15
    mediump vec3 right = vec3(c.r - left.r, c.g - left.g, c.b - left.b); // 0..1
    left /= 16.0; // 0..1

    c2 = c2 * 255.0 / 16.0;
    mediump vec3 lowerLeft  = vec3(floor(c2.r), floor(c2.g), floor(c2.b)); // 0..15
    mediump vec3 lowerRight = vec3(c2.r - lowerLeft.r, c2.g - lowerLeft.g, c2.b - lowerLeft.b); // 0..1
    lowerLeft /= 16.0; // 0..1

    mediump vec3 m[4];
    mediump vec3 s[4];

    // 0: upperLeft + left
    m[0] = upperLeft + left;
    s[0] = upperLeft * upperLeft + left * left;

    // 1: upperRight + right
    m[1] = upperRight + right;
    s[1] = upperRight * upperRight + right * right;

    // 2: lowerLeft + left
    m[2] = lowerLeft + left;
    s[2] = lowerLeft * lowerLeft + left * left;

    // 3: lowerRight + right
    m[3] = lowerRight + right;
    s[3] = lowerRight * lowerRight + right * right;

    // Variance: pick the pair of means with the smallest variance.
    mediump float min_sigma2 = 1e+2;
    for (int k = 0; k < 4; ++k) {
        m[k] /= 2.0;
        s[k] = abs(s[k] / 2.0 - m[k] * m[k]);

        mediump float sigma2 = s[k].r + s[k].g + s[k].b;
        if (sigma2 < min_sigma2) {
            min_sigma2 = sigma2;
            gl_FragColor = vec4(m[k], 1.0);
        }
    }
}
```
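One detail not shown above is the value of offsetK; presumably it is one texel vertically, i.e. 1/textureHeight, set from the app side. A hedged sketch of that (the function name and the 320-texel height are assumptions, not from the original project; call it with the program bound via glUseProgram):

```c
#include <OpenGLES/ES2/gl.h>

/* Assumed: program is the linked program for the shader above, and the
   packed texture is texHeight texels tall, so offsetK steps one texel. */
void setOffsetK(GLuint program, float texHeight) {
    GLint loc = glGetUniformLocation(program, "offsetK");
    glUniform1f(loc, 1.0f / texHeight); /* e.g. 1/320 for a 480x320 frame */
}
```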