
Agreed; I do this all the time as well. I don't think of it as "guessing" per se; I think of it as forming a hypothesis based on the intersection of my existing knowledge of the problem and any new observations [1], then testing it to figure out whether the hypothesis was correct. If it's proven incorrect, the process helps me re-form a new hypothesis, and the cycle continues until the problem area becomes clear enough to solve outright.

[1] By reading more of the code, RTFM, observing logs, tracing with a debugger, varying inputs, etc.


This post covers a topic I've spent some time with in the past, and it is generally a good overview, but unfortunately it gets the idea of "linear RGB" wrong. That means all of the results need some attention, including the Go implementation. Maybe for a part II post?

Each color value (e.g. red) is represented by a value from zero to full intensity. It's easiest to think of it as a number between 0 and 1 in a linear space. You could use a floating point number for that, or a quantized/fixed point value. For example the 10-bit quantized value round(r_linear*1023) in the range 0 to 0x3ff.

8-bit RGB color components are "encoded" from their linear version with a transfer curve (aka gamma compression). For sRGB, the curve is a piecewise combination of a linear segment and a power curve; a good overview is [1]. There are many different encodings, including sRGB, BT.601, BT.709, etc. Then there's "full range" vs. "video range"... it can get complex pretty quickly.

Because of gamma encoding, an 8-bit R_sRGB red value is not equal to round(r_linear*255). You have to first compress r_linear via the gamma curve, then quantize that 0..1 value to 8 bits. When going in reverse (expanding an 8-bit sRGB value to linear), you generally take R_sRGB/255 to produce a value in the 0..1 range and then apply the inverse gamma curve to get the linear value. These computations can be done in floating point, fixed point, or with lookup tables.
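
As a concrete sketch of those two directions in Go, using the standard sRGB constants (this is illustrative code, not taken from the post's library):

    package main

    import (
        "fmt"
        "math"
    )

    // srgbToLinear expands an 8-bit sRGB component to a linear value in [0, 1].
    func srgbToLinear(c uint8) float64 {
        s := float64(c) / 255.0
        if s <= 0.04045 {
            return s / 12.92
        }
        return math.Pow((s+0.055)/1.055, 2.4)
    }

    // linearToSRGB compresses a linear value in [0, 1] back to an 8-bit sRGB component.
    func linearToSRGB(l float64) uint8 {
        var s float64
        if l <= 0.0031308 {
            s = 12.92 * l
        } else {
            s = 1.055*math.Pow(l, 1/2.4) - 0.055
        }
        return uint8(math.Round(math.Min(math.Max(s, 0), 1) * 255))
    }

    func main() {
        // Every 8-bit code should survive the decode/encode round trip.
        for c := 0; c <= 255; c++ {
            if linearToSRGB(srgbToLinear(uint8(c))) != uint8(c) {
                fmt.Println("round-trip mismatch at", c)
            }
        }
        fmt.Println("done")
    }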

The takeaway is that you can't represent 8-bit sRGB color components in linear light with just 8 bits without losing precision. You need at least 12 bits for linear sRGB, and many implementations just go straight to 32-bit floats for simplicity.

These conversions are required whenever you combine (blend) pixels encoded in sRGB: for each pixel operation X, you decode sRGB to linear, perform X, then encode back to sRGB. It's expensive! That's why GPUs offer texture formats that specify a gamma encoding like sRGB, so a pixel shader can blissfully work in linear color, with the conversions done for it in hardware as pre- and post-shader operations. On the CPU? You have to do it all yourself...

Because of that, many software libraries don't bother with the proper gamma conversion and just compute everything in the logarithmic (gamma encoded) domain. And most of the time, it looks OK! But it really is just a "cheap" approximation -- sometimes it can look quite bad compared to the (proper) linear computation...
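
To make the difference concrete, here's a quick sketch (using the gamma 2.2 approximation rather than the exact piecewise curve) that averages a black and a white pixel both ways; the cheap version lands around 127, the linear-light version around 186:

    package main

    import (
        "fmt"
        "math"
    )

    const gamma = 2.2 // simple stand-in for the exact sRGB curve

    func decode(c uint8) float64 { return math.Pow(float64(c)/255, gamma) }
    func encode(l float64) uint8 { return uint8(math.Round(math.Pow(l, 1/gamma) * 255)) }

    func main() {
        a, b := uint8(0), uint8(255)

        // "Cheap" blend directly on the gamma-encoded values.
        cheap := uint8((uint16(a) + uint16(b)) / 2)

        // Proper blend: decode to linear, average, encode back.
        proper := encode((decode(a) + decode(b)) / 2)

        fmt.Println(cheap, proper) // 127 vs 186: the cheap result is far too dark
    }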

As far as I can tell, none of the Go standard library does linear blending; and all of the image formats are assumed to be sRGB encoded. There are some third-party packages like [3] that can do some of the color management on a 16-bit linear image format (RGBA64 == 16 bits/component RGBA).

The other thing the author might consider is revising the "Why?" footnote to the "Random Noise (grayscale)" section. What the author is actually doing there is just using a cheap approximation to a rounding function: round(x) ~= floor(x + 0.5). In general, doing a round like that introduces a bias [2]. That section can be summarized as: after every pixel operation, round and clamp back to the valid range.
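
In Go, that round-and-clamp step might look like the following sketch; math.RoundToEven gives the unbiased ties-to-even behavior discussed in [2]:

    package main

    import (
        "fmt"
        "math"
    )

    // clampRound quantizes a pixel value back to 0..255 after an operation,
    // using round-half-to-even to avoid the upward bias of floor(x + 0.5).
    func clampRound(x float64) uint8 {
        r := math.RoundToEven(x)
        if r < 0 {
            return 0
        }
        if r > 255 {
            return 255
        }
        return uint8(r)
    }

    func main() {
        fmt.Println(clampRound(127.5), clampRound(128.5)) // 128 128: ties go to the even value
        fmt.Println(clampRound(-3.2), clampRound(300.0))  // 0 255: clamped to the valid range
    }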

[1] https://blog.johnnovak.net/2016/09/21/what-every-coder-shoul...

[2] http://www.cplusplus.com/articles/1UCRko23/

[3] https://github.com/mandykoh/prism


Thanks a lot for this; I was hoping to learn through writing this blog post. Correctness is the highest priority for me, so I'm glad to improve my library.

I will update the library to use 16-bit color everywhere (0-65535), and update the blog post to note this.

As for the rounding, that's another great point, and thanks for the link. I will change the library and blog post to round to the even number on ties.

Edit: I've updated the blog post. I'd appreciate it if you could check it out and let me know if I made any mistakes with the update.


Just a small correction: logarithmic != gamma encoded. Most software uses sRGB encoding, which is close to a power function (often referred to as a gamma function). Logarithmic encoding is often used for HDR images, but it is not what most software uses.


When I referred to the "logarithmic domain", I wasn't talking about a purely log/exp transfer function, but one that is "log-like". Perhaps it is more accurate to say "non-linear domain"... but I hope you get the idea. :)

The sRGB transfer function is a piecewise combination of a linear segment and a power segment, but it can be closely approximated by a single power function with gamma ~= 2.2 [1]. Either way, the conversion between linear and non-linear is generally referred to as "gamma correction", even when the transfer function is not a pure power law.
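
If you're curious just how close they are, here's a quick sketch that compares the exact piecewise decode against a pure 2.2 power law over all 8-bit codes (the worst case comes out well under 1% of full scale):

    package main

    import (
        "fmt"
        "math"
    )

    // srgbDecode is the exact sRGB transfer function: a linear segment near
    // black and a power segment elsewhere.
    func srgbDecode(s float64) float64 {
        if s <= 0.04045 {
            return s / 12.92
        }
        return math.Pow((s+0.055)/1.055, 2.4)
    }

    func main() {
        maxDiff := 0.0
        for c := 0; c <= 255; c++ {
            s := float64(c) / 255
            if d := math.Abs(srgbDecode(s) - math.Pow(s, 2.2)); d > maxDiff {
                maxDiff = d
            }
        }
        fmt.Printf("max |piecewise - pow(x, 2.2)| = %.4f\n", maxDiff)
    }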

[1] https://en.wikipedia.org/wiki/SRGB#The_forward_transformatio...


sRGB was engineered to be indistinguishable from a power 2.2 function, even though it's harder to calculate.


Even if the article gets the "linear RGB" idea wrong, it doesn't matter for the results, because each channel in all explored palettes is still 1-bit: either on, or off, sometimes with the constraint that red and green cannot be on at the same time.


Unfortunately not when you're performing error diffusion, where the residuals are added to neighboring pixels. If you're doing it in the non-linear domain, you're going to get a different result during that diffusion step, even when you're dithering to a target 1-bit/pixel image.

You can see this visually in Surma's excellent blog post [1]: look for the gradient strips in the "Gamma" section.

[1] https://surma.dev/things/ditherpunk/
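
For the curious, here's a minimal sketch of what diffusing in linear light looks like: 1-bit Floyd-Steinberg over a grayscale buffer, with a gamma 2.2 approximation standing in for the exact sRGB curve. This is my own illustration, not code from either article.

    package main

    import (
        "fmt"
        "math"
    )

    const gamma = 2.2 // stand-in for the exact sRGB transfer curve

    // ditherToBlackAndWhite converts gamma-encoded 8-bit grayscale pixels
    // (row-major, width w) to 0/255 with Floyd-Steinberg error diffusion,
    // accumulating the residuals in linear light.
    func ditherToBlackAndWhite(pix []uint8, w int) {
        h := len(pix) / w

        // Decode everything to linear up front so the residuals are meaningful.
        lin := make([]float64, len(pix))
        for i, p := range pix {
            lin[i] = math.Pow(float64(p)/255, gamma)
        }

        for y := 0; y < h; y++ {
            for x := 0; x < w; x++ {
                i := y*w + x
                old := lin[i]
                var quantized float64
                if old >= 0.5 { // threshold in linear light
                    quantized = 1
                    pix[i] = 255
                } else {
                    quantized = 0
                    pix[i] = 0
                }
                err := old - quantized

                // Push the residual onto the usual Floyd-Steinberg neighbors.
                if x+1 < w {
                    lin[i+1] += err * 7 / 16
                }
                if y+1 < h {
                    if x > 0 {
                        lin[i+w-1] += err * 3 / 16
                    }
                    lin[i+w] += err * 5 / 16
                    if x+1 < w {
                        lin[i+w+1] += err * 1 / 16
                    }
                }
            }
        }
    }

    func main() {
        // Dither a small horizontal gradient as a smoke test.
        const w, h = 16, 4
        pix := make([]uint8, w*h)
        for y := 0; y < h; y++ {
            for x := 0; x < w; x++ {
                pix[y*w+x] = uint8(x * 255 / (w - 1))
            }
        }
        ditherToBlackAndWhite(pix, w)
        fmt.Println(pix[:w]) // first row of the dithered output
    }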


Good point, perhaps I should've picked some different palettes. But the ideas behind the blog post generalize to all palettes.


> none of the Go standard library does linear blending;

I would understand a C++ library from the 1990s getting this wrong, or some toy project not bothering to implement colour management properly.

But to develop a new programming language for the 2010s to 2020s and blithely assume that images are always 8-bit sRGB is lazy beyond belief...

To put things in perspective, this would be roughly the same as making an application around the same time that simply assumes that the screen resolution is a fixed 1024x768 pixels.


I'd also love to read the thesis, especially to see how they structured the lock.

I recently wrote a readers-writers (subtree) lock for a K-ary tree and it was quite a challenge with a nontrivial implementation. It was much easier to start with a single mutex-like (subtree) lock (one reader or writer), and then extend it to the general multi-reader/writer case.

I can only imagine that a lock-free version (of even a single reader/writer) must be even more of a challenge. I hope it's not a terribly long wait, but there could be much to learn from the simpler lock-based scheme in the meantime. :)

On the topic of access to the research, I found a talk by the first author [1] a helpful companion to the paper, with slides for what appears to be the same talk elsewhere [2]. I haven't seen any implementations outside of pseudocode in the paper -- it would be nice to see if there are any tricks to generating the random ranks cheaply, or the alternative that uses a pseudo-random function of the key.

Thanks to the OP for sharing this. I'm keen to try it on another project where I was hesitant to use red-black trees due to their complexity.

[1] https://www.youtube.com/watch?v=NxRXhBur6Xs

[2] http://knuth80.elfbrink.se/wp-content/uploads/2018/01/Tarjan...
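
For what it's worth, the paper draws each rank from a geometric distribution (rank k with probability 1/2^(k+1)), so one cheap trick is to count trailing zero bits of a random word; the keyed alternative can do the same on a seeded hash of the key. A sketch of both, as my own illustration rather than the authors' code:

    package main

    import (
        "fmt"
        "hash/maphash"
        "math/bits"
        "math/rand"
    )

    // randomRank draws a geometric rank (P[rank = k] = 1/2^(k+1)) by counting
    // trailing zero bits of a uniformly random 64-bit word.
    func randomRank() int {
        return bits.TrailingZeros64(rand.Uint64() | 1<<63) // the |1<<63 caps the rank at 63
    }

    // keyedRank derives the rank deterministically from the key via a seeded
    // hash -- the "pseudo-random function of the key" alternative.
    func keyedRank(seed maphash.Seed, key string) int {
        var h maphash.Hash
        h.SetSeed(seed)
        h.WriteString(key)
        return bits.TrailingZeros64(h.Sum64() | 1<<63)
    }

    func main() {
        seed := maphash.MakeSeed()
        fmt.Println(randomRank(), keyedRank(seed, "example-key"))
    }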


They would likely get another big speedup by doing this: the iDCT gets faster as you perform a "DCT downscaling" operation, because it requires fewer adds and multiplies [1].

You could probably get another speedup, independent of DCT downscaling, by operating in YCbCr before the colorspace conversion to RGB. For example, for 4:2:0 encoded content (the majority of JPEG photographs), you process half as many samples overall, because each chroma plane is only a quarter the size of the luma plane.

When you combine both techniques, you can have your cake and eat it too: for example, to downsample 4:2:0 content by 50% you can DCT-downscale only the Y plane, keeping the Cb/Cr planes as they are, before the colorspace conversion to RGB. No Lanczos required!

If you need a downsample other than {1/n; n = 2, 4, 8}, you can round up to the nearest supported n and then perform a Lanczos resample to the final resolution: the resampling filter will then be operating on a lot less data.
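
One way to pick that factor (my own sketch, erring on the conservative side so the Lanczos pass only ever shrinks the image further, never enlarges it):

    package main

    import "fmt"

    // dctScale picks the coarsest DCT downscale 1/n (n in {1, 2, 4, 8}) that
    // keeps the image at or above the requested target scale; the remaining
    // factor is left for a conventional resampler such as Lanczos.
    func dctScale(target float64) (n int, remaining float64) {
        for _, cand := range []int{8, 4, 2} {
            if 1.0/float64(cand) >= target {
                return cand, target * float64(cand)
            }
        }
        return 1, target
    }

    func main() {
        // e.g. a 30% downscale: DCT-decode at 1/2 size, then Lanczos from 50% down to 30%.
        n, rem := dctScale(0.30)
        fmt.Printf("DCT downscale 1/%d, then resample by %.2f\n", n, rem)
    }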

On quality: I once saw a comparison roughly equating DCT downscaling with bilinear filtering (if I can find the reference I'll update this comment). With the example above, it really depends on what you compare against: if the baseline is a 4:2:0 image decoded to RGB, where the chroma is first pixel-doubled or bicubic-upsampled before the conversion to RGB and then downsampled, the Lanczos-free technique above may well look just as good, because it doesn't modify the chroma at all. Ultimately it's best to try and compare.

Lastly you could leverage both SIMD and multicore by processing each of the Y, Cb, and/or Cr planes in parallel.

[1] http://jpegclub.org/djpeg/


> This format doesn't really blaze new ground (EDIT: it does in the sense that it defines a container format to express image-y constructs like still images and sequences of images in an ISO media container, but see my other comment that asks how this is similar but different to video [3]), but if this repackaging and the resulting code donation and political clout helps it gain traction, we still would gain a lot.

You need to think of HEIF as the container, disconnected from the codec used to compress the media content. In that sense, HEIF is a massive trailblazer. It gives us an ISO standard for describing image and mixed media consistent with existing methods (e.g. MP4), while allowing for new constructs (useful features of the future, in addition to tiling, depth maps, stereo, etc.) and new codecs as they are developed and adopted. HEIF is a universal, forward-looking format.

Until now, almost all of the major imaging formats were tied to the codec and not terribly extensible to new use cases. While BPG is a great format in the short term (it gave us HEVC-coded images around 2.5 years ago), it isn't an ideal choice for the long term when viewed in that light.


TIFF and Exif (which JPEG is based on) are not tied to any particular codec and are extensible, so HEIF is not a game changer.

BTW: Did you know that you can embed LPCM/μ-Law PCM/ADPCM audio data in JPEG/Exif?


> TIFF and Exif (which JPEG is based on) are not tied to any particular codec and are extensible, so HEIF is not a game changer.

TIFF is a great format, but its extensibility is limited. You can't readily contain a video track in addition to a photo, or a burst sequence that utilizes inter-frame prediction.

> BTW: Did you know that you can embed LPCM/μ-Law PCM/ADPCM audio data in JPEG/Exif?

Yes; I once owned point-and-shoot cameras that did this. The audio was pretty poor quality because they didn't employ compression and wanted to keep the file sizes small.

However, you're limited to the 64k maximum JPEG marker segment size for non-image metadata, or ugly hacks like chaining segments (as is done for ICC profiles). Exif is strictly limited to being contained within 64k. How big is your audio track again? ;-)

JPEG has had its time as a wildly successful format, but it has also held back the imaging world from adopting a standardized way to cater for new applications: bursts (as above), mixed media (photo + video + audio), depth maps, stereo images, alpha (that isn't a hack), lossless, and so on. HEIF has all the key ingredients, including extensibility, to support these modern applications and grow as the universal container format for decades to come.


That seems like a very demanding requirement, even more so than video. Why not just rename it to video? We just need an image, not a video that's called an image.


> I'd be curious if the two "representations" are losslessly convertible, for example.

They are: you could just extract the HEVC NAL units and re-write them into an MP4 or QuickTime container, making sure to properly place the codec configuration box, etc.

HEIF also goes beyond a sequence of frames, in that it can describe alpha planes, depth maps, tiling, etc. In that case there might not be an analog with a standard video. If you really wanted to decompose a HEIF container, you might choose to extract the raw media into elementary streams (for HEVC or AVC; or if you're using the JPEG codec, just plain JPEG files) adjacent to any metadata like Exif, etc. This is essentially what Nokia's conformance files are [1].

[1] https://github.com/nokiatech/heif_conformance/tree/master/bi...


That's actually not a bad idea!

The HEIF examples on the Nokia HEIF site do something similar; they implement a HEVC decoder (and do the HEIF container processing) all in JavaScript using http://www.libde265.org. The images you see are decoded into a HTML5 canvas.


While the full standard text and machinery can get quite complex, you can still construct relatively simple files housing a single image, a thumbnail, and Exif that aren't much more complex than a typical JPEG file. Compared to JPEG, where you need to define quantization tables, Huffman tables, etc. in marker segments, much of the codec-related complexity in HEIF is layered within the compressed image payload (e.g. HEVC NAL units).

Besides the coding efficiency, JPEG really hit the big time because of the readily available libjpeg cross-platform implementation. In this case, HEIF can leverage existing implementations of the ISO Base Media File Format box model; it builds heavily on that standard.

Nokia is doing the right thing by releasing their format handling library, though perhaps they could loosen up their license to include commercial use. ;-) Hopefully there are some developers amongst us inspired enough to start a new open project that, like libjpeg, brings HEIF to the masses.


Yes! This is one of the motivations behind HEIF: it's codec agnostic. You can use HEIF with JPEG and AVC, for example, or some new codec down the line.

It's also extensible, in that you can define new item types and box structures to describe any new imaging construct you like. For example, you could host images with alpha, depth maps, stereo left/right channels, a HDR image with each of the raw brackets and the final fused image, etc.

The format relies upon the ISO Base Media File Format's file-type branding to narrow the infinite possibilities down to a clear set of capabilities required for playback, in the same way MPEG-4 does with video codec profiles and levels.


HEIF is actually very closely related to MP4 (or really, the ISO Base Media File Format (BMFF); and historically, QuickTime which is what they are both based on).

You are right -- the main difference is that BMFF was originally designed for time-based media (video, audio) and isn't in itself well suited to media that isn't time-based, like images and their associated metadata (e.g. Exif).

HEIF extends the ISO BMFF so that you can have untimed media -- one or more photos, a collection -- or even mixed media comprising both untimed and timed media. Apple's live photo is a good example of mixed media, comprising a photo and a video.

But the HEIF container format goes so much further. You could have image sequences -- for example a higher-quality version of Animated GIF (using I-frames only, or P/B for more compression) with looping -- tiling, auxiliary images like depth maps and alpha channels, or even stereo images with a left and right channel.

Though HEIF originally came out of the HEVC standards track, hence the words "High Efficiency" in the name, it was later extended to include other codecs like JPEG and AVC. There's no reason it couldn't be extended to include VP9, PNG, or any other codec.

Think of HEIF as a versatile, extensible, standardized container format. The media coding scheme is separable. This is a big deal for the future of image/media coding because we're no longer locked into "yet another format" that's tied to the codec. With the ISO BMFF box model, HEIF can grow to accommodate new constructs and codecs for generations to come.


> Think of HEIF as a versatile, extensible, standardized container format. The media coding scheme is separable. This is a big deal for the future of image/media coding because we're no longer locked into "yet another format" that's tied to the codec.

There were similar approaches in the past and they failed. Electronic Arts' IFF, and later TIFF, were also designed as extensible containers. The vast majority of the software handling these formats supports only the most popular codecs; the fringe ones pushing the format forward get ignored.

