Performance of glCopyBufferSubData in GPU -> GPU case

Hello,

Does anyone have data on how glCopyBufferSubData performs when both the source and the target buffers are in graphics device memory, compared with, say, uploading the source from CPU memory via glBufferSubData?

And how does it compare with a transform-feedback-based solution?

Regards

Hi,

On a unified memory architecture, where the CPU and GPU share the same physical memory, the performance of the two cases should be about the same.
However, feel free to profile the CPU-to-GPU and GPU-to-GPU cases and compare the timings.
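As a rough starting point for such profiling, here is a minimal sketch that uses GL_TIME_ELAPSED timer queries. It assumes a current desktop OpenGL 3.3+ context and an already included loader header. On OpenGL ES the EXT_disjoint_timer_query extension offers equivalent calls with an EXT suffix. The names BENCH_HAVE_GL, time_gpu_to_gpu_ns, time_cpu_to_gpu_ns and copy_throughput_gib_s are made up for this example. Also note that the query only measures GPU time, so any CPU-side staging the driver does inside glBufferSubData may not show up in the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a byte count and an elapsed time in nanoseconds (as returned by a
   GL_TIME_ELAPSED query) into GiB/s. Returns 0.0 for a zero duration. */
static double copy_throughput_gib_s(size_t bytes, uint64_t elapsed_ns)
{
    if (elapsed_ns == 0)
        return 0.0;
    double seconds = (double)elapsed_ns / 1e9;
    double gib = (double)bytes / (1024.0 * 1024.0 * 1024.0);
    return gib / seconds;
}

/* BENCH_HAVE_GL is a hypothetical macro: define it only after including a
   GL 3.3+ loader header, and call the functions below with a current context. */
#ifdef BENCH_HAVE_GL

/* GPU -> GPU: time a glCopyBufferSubData between two existing buffers. */
static uint64_t time_gpu_to_gpu_ns(GLuint src, GLuint dst, GLsizeiptr size)
{
    GLuint query = 0;
    GLuint64 ns = 0;
    glGenQueries(1, &query);
    glBindBuffer(GL_COPY_READ_BUFFER, src);
    glBindBuffer(GL_COPY_WRITE_BUFFER, dst);
    glBeginQuery(GL_TIME_ELAPSED, query);
    glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER, 0, 0, size);
    glEndQuery(GL_TIME_ELAPSED);
    /* GL_QUERY_RESULT waits until the GPU has finished the timed commands. */
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);
    glDeleteQueries(1, &query);
    return (uint64_t)ns;
}

/* CPU -> GPU: time a glBufferSubData upload from client memory. */
static uint64_t time_cpu_to_gpu_ns(GLuint dst, const void *data, GLsizeiptr size)
{
    GLuint query = 0;
    GLuint64 ns = 0;
    glGenQueries(1, &query);
    glBindBuffer(GL_COPY_WRITE_BUFFER, dst);
    glBeginQuery(GL_TIME_ELAPSED, query);
    glBufferSubData(GL_COPY_WRITE_BUFFER, 0, size, data);
    glEndQuery(GL_TIME_ELAPSED);
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);
    glDeleteQueries(1, &query);
    return (uint64_t)ns;
}

#endif /* BENCH_HAVE_GL */
```

It is probably worth running each case several times with a few different buffer sizes and discarding the first iterations, since drivers often defer allocation or the actual copy until first use.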

Regards,
Dihara