I'd be very interested in an optional Pillow-SIMD downsampling resize that produces 16 bit output internally and then uses a dither to convert from 16 bit to 8 bit. Photoshop does this by default and it produces superior downsampling. Without keeping the color resolution higher, you can end up with visible color banding in resized 8 bit images that wasn't visible in the source image.
I am curious if the reason that Pillow-SIMD is more than 4x faster than IPP is due to features IPP supports - like higher internal resolution - that Pillow-SIMD doesn't? The reported speeds here are amazing, and I'm definitely going to check this project out and probably use it, but I'd love a little clarity on what the tradeoffs are against IPP or others. I assume there are some.
Each resampling algorithm will internally produce some high-precision result before cutting it to 8 bits. For Pillow-SIMD it is 32-bit integers. Currently, I haven't considered dithering, but it is a very interesting idea. Do you have any links for further reading about downsampling banding and dithering?
About IPP's features: the comparison is pretty fair. It uses the same input, the same algorithm and filters, and produces pretty much the same output. If IPP uses more resources internally to produce the same output, then maybe it shouldn't.
Shame on me, I still haven't added the link to the IPP test file I used. Here it is: https://gist.github.com/homm/9b35398e7e105a3c886ab1d60bf598d...
It is a modified version of the ipp_resize_mt program from IPP's examples. If you have IPP installed, you'll easily find and build it.
> Do you have any links for further reading about downsampling banding and dithering?
Sadly, no, I wish I did. I just made some expensive mistakes printing giclee images from downsampled digital files, and whipped up my own dither for converting 16bit to 8bit. It wasn't until it bit me that I noticed Photoshop does it better than most apps because dithering is on by default. That's when I went looking and found an option for it in Photoshop's settings.
The main banding problem when downsampling is with slowly changing gradients. Sky and interior walls, for example. I bump into it a lot with digital art too, since the source images don't have any noise. But even when there's noise in the source image, downsampling 2x or more with a good filter can eliminate the noise and cause gradients to stabilize and show their edges in 8-bit color. In my experience, the problem is more common in print than in on-screen resized images, but it's still pretty easy to spot on a screen, especially in the darks, and especially when JPEG-compressing the results.
Implementation-wise, the 16-to-8 bit dither is nowhere near as sensitive as the dithers we normally see converting 8 bits to black & white or when posterizing. Almost anything you come up with will do. You don't need any fancy error diffusion or anything like that. Here's what I do: imagine the filtered 16-bit result as an 8.8 fixed-point number in the [0, 256) range, so the least significant 8 bits are a fraction in the [0, 1) range. I add a random number between -0.5 and +0.5 before rounding to the nearest integer. Voilà, drop the low 8 bits and the result is a dithered 8-bit value.
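To make that concrete, here is a minimal sketch of the per-sample dither in plain Python. The function name dither_16_to_8 and the clamp at the end are my own additions for illustration, not anything from Pillow-SIMD or Photoshop.

```python
import math
import random


def dither_16_to_8(value16, rng=random):
    """Dither one 16-bit sample (read as 8.8 fixed point) down to 8 bits.

    Hypothetical helper for illustration; `rng` is anything with a
    .random() method returning a float in [0, 1).
    """
    # Read the 16-bit integer as an 8.8 fixed-point number in [0, 256).
    real = value16 / 256.0
    # Add uniform noise in [-0.5, 0.5) before rounding.
    noisy = real + (rng.random() - 0.5)
    # Round to nearest, then clamp so values near the top cannot overflow.
    return max(0, min(255, math.floor(noisy + 0.5)))


if __name__ == "__main__":
    rng = random.Random(0)
    # 128.25 in 8.8 fixed point: comes out as 128 about 3/4 of the time
    # and as 129 about 1/4 of the time, so the average keeps the fraction.
    samples = [dither_16_to_8(128 * 256 + 64, rng) for _ in range(20000)]
    print(sum(samples) / len(samples))
```

A sample that is already an exact 8-bit value (low byte zero) passes through unchanged, so the dither only touches pixels that actually carry extra precision.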
What I just described will be way slow in your world if you call a random number function every pixel, so don't do that. :P For Pillow-SIMD you'd want a random number lookup table or something slightly smarter than a random() function. And I dither on the color channels separately, but there might be some way to make it blaze by dithering the brightness and rounding all three channels up or down at the same time. I've just never tried to optimize it the way you're doing, but if you find a way and release anything that dithers, I would LOVE to hear about it.
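A rough sketch of the lookup-table idea, using integers only. The table size, the seed, and the names NOISE_TABLE and dither_row are all made up for illustration; a real SIMD version would look quite different, but the arithmetic is the same: adding an integer in [0, 256) and shifting right by 8 is the fixed-point form of adding roughly -0.5 to +0.5 and rounding to nearest.

```python
import random

# Assumption: a power-of-two table size so the index can wrap with a mask.
TABLE_SIZE = 4096
_table_rng = random.Random(12345)
# Precomputed noise, one byte per entry, in the range 0..255.
NOISE_TABLE = [_table_rng.randrange(256) for _ in range(TABLE_SIZE)]


def dither_row(row16, offset=0):
    """Dither a row of 16-bit (8.8 fixed point) samples to 8-bit values.

    `offset` shifts the starting position in the noise table; passing a
    different offset per row keeps the pattern from lining up vertically.
    """
    out = []
    for i, value in enumerate(row16):
        noise = NOISE_TABLE[(offset + i) & (TABLE_SIZE - 1)]
        # Add noise to the low 8 bits, drop them, and clamp the overflow case.
        out.append(min(255, (value + noise) >> 8))
    return out


if __name__ == "__main__":
    row = [128 * 256 + 64] * 16  # a flat patch at 128.25
    print(dither_row(row, offset=0))
    print(dither_row(row, offset=977))
```

No random number function is called per pixel here, only a table read, an add, and a shift.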
I suspect on a GPU it will be better to use 32-bit floating point internally. But yeah, dithering the output when converting back to integers would be great.
In Photoshop I always convert to 16-bit linear color before doing any kind of compositing or resampling.
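For anyone who wants to try the same thing outside Photoshop, here is a small sketch of that conversion, assuming the 8-bit source is sRGB-encoded (the comment above doesn't say which color space is involved). The function name srgb8_to_linear16 is hypothetical; the formula is the standard sRGB decoding curve.

```python
def srgb8_to_linear16(value8):
    """Convert one 8-bit sRGB-encoded sample to a 16-bit linear-light value.

    Assumes sRGB encoding; other color spaces use different curves.
    """
    c = value8 / 255.0
    # Standard sRGB decoding: a linear toe plus a 2.4-exponent power segment.
    if c <= 0.04045:
        linear = c / 12.92
    else:
        linear = ((c + 0.055) / 1.055) ** 2.4
    return round(linear * 65535)


if __name__ == "__main__":
    for v in (0, 1, 64, 128, 255):
        print(v, srgb8_to_linear16(v))
```

The dark end shows why 16 bits matter here: the lowest sRGB codes map to linear values far smaller than one 8-bit step, so an 8-bit linear buffer would crush them together.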