GPU Compressor


I've recently been working on a CUDA-based version of the PVR compressor. I'm really close to being done, but my compressed images have weird bright spots. The spots appear in different places every time I run the compression, and I'm at a loss for how to get rid of them. From what I can tell, during the optimization of the A and B representative images the colors go out of bounds: they become either negative or greater than 255. I've tried clamping the colors, but that just leaves me with white spots instead of colored spots. I've also tried replacing the color with that of the original image, but that blurs the entire image slightly, enough to be noticeable. I've uploaded the original images and my compressed images and included the links below. You should be able to open the PVR files with PVRTexTool.

Original Image (lorikeet):

https://docs.google.com/leaf?id=0B3wSV14HINbvMmFiYjJmMDAtYzgyMy00YjhhLWJmMDMtYWM2M2ZlNWQ3OWMy&hl=en_US

GPU Compressed Image (lorikeet):
https://docs.google.com/leaf?id=0B3wSV14HINbvYmEzZDRjZGItZGIxZS00OWYzLTgzNDktZGRmZjg5NmZkOWJj&hl=en_US

Original Image (lena):
https://docs.google.com/leaf?id=0B3wSV14HINbvNDc1MWE5NDUtMjdhNS00MzFlLWEzNDAtNDhmYjhlMzg0M2Nl&hl=en_US
GPU Compressed Image (lena):
https://docs.google.com/leaf?id=0B3wSV14HINbvNmZlN2RjMTctZmQ5Mi00NjQzLWJiYTctMmQ1NGI2NDcwYThk&hl=en_US

I've been following the PVR paper (https://www.imgtec.com/powervr/insider/docs/PVRTextureCompression.pdf) almost exactly. The only deviation that I've made is that I use a matrix of size 49x2 instead of 121x8 during optimization. Since I'm using a GPU, I've found that running the SVD on more threads with smaller matrices works faster. The other obvious solution would be to just use the old optimized color value if the new one seems funky but because of the way I've laid out the data, this would require otherwise unnecessary extra storage so that's going to be my last option if nothing else works.

Anyway if anyone has any ideas on how to fix the issue, I would really appreciate it. My version hasn't really been optimized but it's already about 10x faster than the one in the PVRTexLib and I plan on making my source code available once I'm finished with it. So the sooner I can sort this bug out, the sooner I can make it available.


EDIT: updated image links



jynnantonix 2011-07-25 23:16:53

Hi jynnantonix

(very HHGTTG BTW),

Yes, the compressor described in the paper originally used SVD to update the representative colours, but I ran into the problem that, if the modulation values used to create the matrix were far from ideal (which would often happen if the original guesses weren’t great) the calculated colours could go a bit crazy.  Also, the fact that you could still get an underconstrained matrix was a nuisance.

The current PVRTexTool uses a different approach altogether and that, in turn, is soon to be replaced by a “3rd generation” technique.

Having said all that, there may be a couple of things you can try. You may already be doing this but, for the local 7x7 neighbourhood of pixels, only put the error/differences between the current approximation and the source values into the 49*RGBA vector. Using SVD you should then be able to compute two “updates” to apply to the current central representatives.  Also, don’t allow it to step too far in one go as that may be indicative of a poor initial guess.
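A minimal sketch of the "correct the error" idea, written in Python for readability rather than CUDA. It assumes a single colour channel and a 49x2 matrix whose rows say how strongly each window pixel depends on the A and B representatives. It solves the two-unknown least-squares problem with the normal equations rather than SVD, which is a swapped-in simplification. The names solve_update and max_step are hypothetical and do not come from the paper or PVRTexLib.

```python
def solve_update(rows, residuals, max_step=8.0, eps=1e-9):
    """Least-squares update (dA, dB) for one colour channel.

    rows      -- 49 (wa, wb) pairs: dependence of each pixel on A and B
    residuals -- 49 values of (source - current approximation)
    max_step  -- assumed limit on how far one pass may move a representative
    """
    # Normal equations: (M^T M) d = M^T r, which is a 2x2 system.
    saa = sum(wa * wa for wa, _ in rows)
    sab = sum(wa * wb for wa, wb in rows)
    sbb = sum(wb * wb for _, wb in rows)
    ra = sum(wa * r for (wa, _), r in zip(rows, residuals))
    rb = sum(wb * r for (_, wb), r in zip(rows, residuals))

    det = saa * sbb - sab * sab
    if abs(det) < eps:
        # Underconstrained window: leave the representatives unchanged.
        return 0.0, 0.0

    da = (sbb * ra - sab * rb) / det
    db = (saa * rb - sab * ra) / det

    # Do not step too far in one go.
    def clamp(v):
        return max(-max_step, min(max_step, v))

    return clamp(da), clamp(db)
```

The returned values are added to the current A and B representatives, and the whole image is revisited over several passes.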

Simon

PS: Any chance of OpenCL instead of CUDA? :slight_smile:

Oh, one other thing: the http://www.filedropper.com/lorikeet-512 etc links just take me to the main filedropper page.  Are there some extra bits missing from the URLs?


It seems filedropper has a limit on how long they will host files. I've updated my post with links to the images on my google docs page.

I tried what you suggested - calculating an update to the representative values rather than completely new values and it did get rid of the spots I was talking about. However, the image didn't look nearly as good as it did before (not including the spots). I think this has more to do with my initial wavelet filter than anything else since the updates will be applied to it. So I'll be messing around with different filters to see what I can get.

I also implemented what I mentioned in my first post, just re-using the old representative value if the new one is out of bounds. This seems to be working pretty well. It's not as crisp as the PVRTexLib version but it's really close.
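The fallback described above might look something like the following Python sketch. It assumes the previous representative is still available (the extra storage mentioned in the first post) and that colours are RGBA tuples; accept_or_keep is a hypothetical name.

```python
def accept_or_keep(old_rgba, new_rgba, lo=0, hi=255):
    """Return new_rgba if every channel lies in [lo, hi], else old_rgba."""
    in_bounds = all(lo <= c <= hi for c in new_rgba)
    return new_rgba if in_bounds else old_rgba
```

Unlike clamping, this never pins a channel to 0 or 255, so it avoids the white spots at the cost of skipping that update.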

One question I did have is regarding the values that go into the Mw matrix. The paper says that the values of the matrix are a combination of the distance and modulation value of each pixel in the local block. I haven't been able to figure out how I should combine the distance; nothing I've tried has really worked so my Mw matrix currently only has the modulation value for each of the pixels. Do you think this would make a difference?

And I've never actually used OpenCL before. I'm using CUDA now because the compressor is going to be incorporated into a bigger project, much of which also uses CUDA. But I think once I'm satisfied with my version it wouldn't be too hard for me to write it in OpenCL to allow either option.

Lorikeet with my fix:
http://docs.google.com/leaf?id=0B3wSV14HINbvMTc0YjIxMDYtZTU0OS00MjY1LTg0ODQtYzAwMGQ2NTg0MzYy&hl=en_US

Lorikeet with pixel update fix:
https://docs.google.com/leaf?id=0B3wSV14HINbvNmY2YTAwNDAtZGI3ZS00ZGI0LWI4MWYtZDBmZDgxYWRlNDY3&hl=en_US

jynnantonix 2011-07-25 23:40:40


jynnantonix wrote:
It seems filedropper has a limit on how long they will host files. I've updated my post with links to the images on my google docs page.

Thanks. I'll take a look.
Quote:

I tried what you suggested - calculating an update to the representative values rather than completely new values and it did get rid of the spots I was talking about. However, the image didn't look nearly as good as it did before (not including the spots).

Curious.  The current (2nd gen) and upcoming 3rd gen compressors both use a "correct the error" approach and it seems to work quite well. You will need to run a few update passes though.

Quote:

One question I did have is regarding the values that go into the Mw matrix. The paper says that the values of the matrix are a combination of the distance and modulation value of each pixel in the local block. I haven't been able to figure out how I should combine the distance; nothing I've tried has really worked so my Mw matrix currently only has the modulation value for each of the pixels. Do you think this would make a difference?

Oh, it should make a big difference.

For the sake of simplicity in the following, let's just stick to the 4bpp case.  AFAIU,  you are taking pixel windows that are 7x7 in size and centred on locations of the form [i*4 + 2, j*4 + 2].   Because the underlying upscale is a 4x4 bilinear, you'll need to scale your weights accordingly.

Now, assuming you are using the "correct the error" approach I suggested, the values for the top-left pixel (or any corner for that matter) of your window would be weighted by 1/16, i.e. an X weight of 1/4 * a Y weight of 1/4.  The next pixel along the top row would be 2/16 (2/4 * 1/4). The very centre pixel would have a weight of 1, and its immediate left, right, top and bottom neighbours all have weights of 3/4. I hope that makes sense.

You then combine these 'distance' weights with the blend ratios of the mod values to produce each pair of values in your 49x2, Mw matrix.
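The weighting described above can be sketched in Python as follows. The 7x7 distance weights are the product of two 1D ramps (1/4, 2/4, 3/4, 1, 3/4, 2/4, 1/4). The way they are combined with the modulation values is an assumption: a pixel is taken to decode as (1 - m) * A + m * B, so each row of the 49x2 matrix holds the pixel's dependence on A and on B. The function names are hypothetical.

```python
def distance_weights():
    """7x7 bilinear weights for a window centred on a representative."""
    # 1D ramp: 1/4, 2/4, 3/4, 1, 3/4, 2/4, 1/4
    axis = [1 - abs(k - 3) / 4 for k in range(7)]
    return [[wy * wx for wx in axis] for wy in axis]


def build_mw(mod):
    """Build the 49x2 matrix from a 7x7 grid of modulation values in [0, 1].

    Assumption: a pixel decodes as (1 - m) * A + m * B, so m is the
    blend ratio toward B.
    """
    w = distance_weights()
    return [(w[y][x] * (1 - mod[y][x]), w[y][x] * mod[y][x])
            for y in range(7) for x in range(7)]
```

For example, a corner pixel with modulation 0 contributes the row (1/16, 0), and the centre pixel with modulation 0.5 contributes (0.5, 0.5).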


Quote:

And I've never actually used OpenCL before. I'm using CUDA now because the compressor is going to be incorporated into a bigger project, much of which also uses CUDA. But I think once I'm satisfied with my version it wouldn't be too hard for me to write it in OpenCL to allow either option.

That's fair enough. I only ask because OpenCL will (or should eventually) be on a larger set of platforms.



SimonF 2011-07-26 17:53:40

So the distance weight was the thing I was missing. Using a correct-the-error approach and the matrix set up like you suggested, I’ve gotten my images to look almost exactly like the ones created by the PVRTexLib, with one exception: there seems to be some sort of ‘wrinkling’ effect on the borders of my compressed images. I can make the wrinkles move around to different parts of the image but I can never seem to get rid of them. I’ve figured out that it has something to do with the weight matrix, the bilinear upscale, and the index of the top left pixel in my 7x7 window. Maybe I’m not centering the pixels properly when I upscale, because it seems like the three are just very slightly off in one direction or the other. It’s made the past two days really frustrating because I’m so tantalizingly close to having it done. If you’ve run into this problem or have any ideas on what might be causing it, that would be really helpful.

Each image below shows the result of optimizing pixel pair [i, j] with a different index for the top left pixel of the window. I run 100 optimization passes to make the effect more obvious, because the wrinkles seem directly tied to the number of optimization passes.

Index of top left pixel in 7x7 window [4i, 4j]:
https://docs.google.com/leaf?id=0B3wSV14HINbvODk3Mzc0YmItNGQ3Zi00MTExLTllZjktYThjNjM0OGVmZGIy&hl=en_US

Index of top left pixel in 7x7 window [4i+1, 4j+1]:
https://docs.google.com/leaf?id=0B3wSV14HINbvYjkxMzI3ZDItZTk1Ny00OGU1LWI0MjUtOGJjYjg4MzJjYTAz&hl=en_US

Index of top left pixel in 7x7 window [4i+2, 4j+2]:
https://docs.google.com/leaf?id=0B3wSV14HINbvM2Y3MGQ5NTctZTQ5OC00NjFlLThlYWQtYzM4NDg0YzMwN2I4&hl=en_US

Index of top left pixel in 7x7 window [4i+3, 4j+3]:
https://docs.google.com/leaf?id=0B3wSV14HINbvOWIzZWJkNzMtZTk5NC00Njg5LWJkMDMtZDkwYTMyYTllY2Fi&hl=en_US

Index of top left pixel in 7x7 window [4i-1, 4j-1]. This is actually really close to the original if I only run a few optimizations but when I run more, you can see that the problem hasn’t really gone away:
https://docs.google.com/leaf?id=0B3wSV14HINbvYTBiYWMxOWQtN2RjYy00MDI3LTljMzMtNmNjOGJkOTU2NWVh&hl=en_US

Index of top left pixel in 7x7 window [4i-2, 4j-2]:
https://docs.google.com/leaf?id=0B3wSV14HINbvMjFkYWJkZDAtOTFiMS00ZjU3LWE3YjItOGYwNzI5ZDczYWIz&hl=en_US

Index of top left pixel in 7x7 window [4i-3, 4j-3]:
https://docs.google.com/leaf?id=0B3wSV14HINbvOTU2M2JiNTItM2NhYS00ODhhLWExZDYtMjJlMjY0YjYwYTBj&hl=en_US

Pastebin to my bilinear resize method:
http://pastebin.com/AFN85Sf1




Assuming the top left pixel of the image is 0,0  then the top left pixel of any of your 7x7 windows should be at [4*i - 1, 4*j - 1].  Note that the image is toroidal so the top wraps around to bottom and the left edge wraps with the right.
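As a small Python sketch of that indexing: the window for representative (i, j) starts at (4*i - 1, 4*j - 1), and the modulo makes coordinates wrap around the image edges. Treating i as the horizontal index is an assumption, and window_coords is a hypothetical name.

```python
def window_coords(i, j, width, height):
    """Pixel coordinates (x, y) of the 7x7 window for representative (i, j).

    The top-left pixel is (4*i - 1, 4*j - 1). Coordinates wrap because the
    image is treated as a torus.
    """
    x0, y0 = 4 * i - 1, 4 * j - 1
    # Python's % returns a non-negative result for a positive modulus,
    # so -1 wraps to width - 1 (or height - 1).
    return [((x0 + dx) % width, (y0 + dy) % height)
            for dy in range(7) for dx in range(7)]
```

In C or CUDA the % operator can return a negative result for negative operands, so the wrap there needs something like ((x % width) + width) % width.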

Now I had a quick look at 2 of your images (the [4*i - 1, 4*j - 1] and [4*i, 4*j]) and I agree that they are weird.  At a guess, it appears some sort of ringing/oscillation starts up and just gets worse and worse - like a positive feedback loop.

In what order do you process the windows, and do you update the representatives for the windows sequentially or do you, effectively, do all of them in parallel? Do you fully optimise one region before moving on to the next, or do you process the whole image in several passes?


SimonF 2011-07-29 09:34:43

I figured out the issue. My bilinear upscale wasn’t centering the image properly and that was throwing everything else off. The compressed images look great now.
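For anyone hitting the same problem, here is a rough Python sketch of a centred 4x4 bilinear upscale for one channel. It is not the code from the pastebin above. It assumes each low-resolution value sits exactly on output pixel (4*i + 2, 4*j + 2), which matches the window centres discussed earlier, and it wraps toroidally at the edges. upscale_4x is a hypothetical name.

```python
def upscale_4x(low):
    """Bilinearly upscale a 2D grid by 4 in each axis with toroidal wrap.

    Assumption: low[j][i] sits exactly on output pixel (4*i + 2, 4*j + 2).
    """
    h, w = len(low), len(low[0])
    out = [[0.0] * (4 * w) for _ in range(4 * h)]
    for y in range(4 * h):
        # Floor division keeps this correct for y < 2 (wraps to the last row).
        j0, fy = divmod(y - 2, 4)
        j0, j1 = j0 % h, (j0 + 1) % h
        for x in range(4 * w):
            i0, fx = divmod(x - 2, 4)
            i0, i1 = i0 % w, (i0 + 1) % w
            top = low[j0][i0] * (4 - fx) + low[j0][i1] * fx
            bot = low[j1][i0] * (4 - fx) + low[j1][i1] * fx
            out[y][x] = (top * (4 - fy) + bot * fy) / 16
    return out
```

With this placement, a pixel at distance d (in pixels) from a representative along one axis gets weight (4 - d) / 4, which agrees with the 1/4, 2/4, 3/4, 1 distance weights used in the 7x7 window.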

Thanks so much for your help :slight_smile:

I’m glad to hear it. 


Hello jynnantonix,

I have a question for you:
is it possible for you to share your GPU compressor?

I’m currently working with a large amount of textures to be compressed, and the process takes hours!

Having a GPU compressor would reduce my compression time.

Even if you are not able to share your compressor, could you at least show a comparison of compression times between the two tools?

Thanks in advance!