Hi, there are a lot of things I've been wondering about. I don't actually have much practical experience with modern 3D rendering, so some of this might sound strange or silly; please let me know if so. For things that are not supported by OGL ES 2, I'd be curious to know whether the SGX hardware is capable of supporting them, and whether there's any chance of OGL ES 2 eventually offering support in some form. Of course I know this information is probably confidential, but in case it's not..
1. When color masking is used so that the entire color isn't drawn, does it still cause a big performance hit, or does the tile renderer special-case this? Even better, would it be capable of not performing framebuffer updates at all, for instance for a pure stencil-buffer pass? This would be useful for rendering per-polygon data beyond what the render buffer can handle in one pass. Since it'd just be using the stencil comparators, of which there are 8 (no shading/texture lookups necessary), and since it'd have 1/4th the bandwidth requirements on the output, I'd expect it to be really fast if it worked this way.
2. I understand SGX's OGL ES 2 implementations don't currently support depth textures but do support packed depth-stencil. This only makes sense if there's a way to read back the depth and stencil values. How does this work, and does it require reading back the color buffer too? And is it possible to read back stencil only, as a stream of 8-bit values?
3. Related to the last question: despite the lack of depth textures, is it possible to explicitly set the entire scene's depth and stencil buffers before rendering begins? And is it possible to do so using the same packed format mentioned above?
4. Also related: what do the 24-bit depth values represent, and how are they converted from the internal 32-bit floating-point format? Is it possible to read back the full 32-bit format?
5. Is it possible for a fragment shader to read the pixel's depth and/or stencil value? The latter would be useful for using stencil to modify shader functionality rather than just determine whether the pixel is drawn or not - otherwise you'd need two passes with two shaders and opposite stencil functions.
6. What kind of overhead is there in changing the stencil function, depth function, current shader, and bound texture? Is the SGX capable of switching seamlessly between these things per pixel? Because of the TBDR I expect that the primitive currently being rendered will change a lot, as opposed to with an IMR. If there's significant overhead, is there a recommended sort order/priority, or can you confidently assign these things per primitive without fear of killing performance?
7. I understand why fragment shader discard kills performance, i.e. how it works against the TBDR, but from what I hear it will hurt performance significantly even if discards are rarely actually taken. Why is this the case? I would expect that, at worst, having a conditional discard in the shader would slow down a whole tile if any are taken, but not the whole scene. How much is this penalty diminished when discard-capable shaders are sorted separately, as the documentation recommends? Is it faster to discard near the start of a shader's execution instead of later on, or does the entire shader get executed regardless?
8. Is there any way to output color buffer pixels that are not vec4? Preferably some way to output integer scalars or vec2s instead. Probably not full 32 bits per component, since I understand integers to only be 16-bit, but that'd still offer a lot more than 8 bits per channel if you want to somehow express >8-bit data without using full-on floating-point formats. In that regard, is there a way to set colors directly from a shader, as they're intended to appear in the color buffer, rather than going through conversion from floating-point vectors to whatever the output format is? I'm guessing no on both of these.
I think that's all I have for now; if I think of others I'll post them here. Thanks.
Sorry it took quite a while to answer.
1. If color output is masked, or there is no color attachment to the current FBO, fragment processing will not take place unless the shader contains discard, or alpha test is enabled.
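For reference, a pure stencil pass of the sort asked about in question 1 would be set up with standard OpenGL ES 2.0 state along these lines (just a sketch of the generic API; nothing here is SGX-specific, and whether the hardware fast-paths it is exactly the question above):

```c
/* Sketch: render to the stencil buffer only, with all color and depth
 * writes masked off.  Standard OpenGL ES 2.0 calls. */
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); /* no color output */
glDepthMask(GL_FALSE);                               /* no depth writes */

glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, 1, 0xFF);          /* every fragment passes    */
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);  /* tag covered pixels       */

/* ... draw the geometry that should mark the stencil buffer ... */

glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);     /* restore state   */
glDepthMask(GL_TRUE);
```

Per the answer above, with color output masked like this (and no discard or alpha test), fragment processing is skipped entirely.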
2. There are situations where depth and stencil need to be written to memory (e.g. if you use a depth/stencil renderbuffer multiple times and don’t clear it). Using a packed depth/stencil format means the representation in memory will be more compact.
3. There is no way of explicitly setting the depth and stencil buffer contents (short of drawing a point for each pixel).
4. An OpenGL ES application can’t access depth buffer contents, so there is no conversion from/to user data.
5. No, the fragment shader can’t access those values. Stencil-based branching may actually be more efficient than shader-based branching.
6. You should keep state changes at a minimum as there is a significant overhead per draw call. The right sort order depends on your requirements, but sorting by transparency, then by shader probably makes the most sense.
7. Discard affects all tiles which are covered by the rendered object (unless it can be eliminated by overlapping geometry which has been rendered first), regardless of whether fragments get discarded or not. Some SGX variants can take advantage of early discard, however the shader compiler should normally choose the best location.
8. Currently there is no such functionality in OpenGL ES.