Which is called text shaping.

bradrn · on Feb 25, 2024

No, they’re different. ‘Shaping’ is deciding how to combine adjacent characters, mostly by changing their forms as needed, and secondarily by adjusting their horizontal or vertical positions. ‘Layout’ then involves taking those shaped characters and placing them on the page to form words and sentences. There’s some overlap — e.g. you could use shaping to implement the mojikumi rules in section 3.1.2 here — but by and large they’re separate things.
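To make the distinction concrete, here is a toy sketch in Python. It is not how HarfBuzz or any real shaper works: real shapers read substitution and positioning rules from font tables. The `LIGATURES` and `ADVANCE` tables, the default width, and the function names `shape` and `layout` are all made up for illustration.

```python
# Toy model only: the rule table and glyph widths below are hypothetical.
LIGATURES = {("f", "f", "i"): "ffi", ("f", "i"): "fi"}
ADVANCE = {"ffi": 14, "fi": 9}   # widths of the ligature glyphs
DEFAULT_ADVANCE = 6              # width assumed for every other glyph


def shape(text):
    """Shaping: map characters to glyphs, merging adjacent ones when a rule applies."""
    glyphs = []
    i = 0
    while i < len(text):
        for n in (3, 2):  # try the longest rule first
            key = tuple(text[i:i + n])
            if key in LIGATURES:
                glyphs.append(LIGATURES[key])
                i += len(key)
                break
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs


def layout(words, line_width):
    """Layout: place already-shaped words on lines, breaking before an overflow."""
    lines, current, x = [], [], 0
    for word in words:
        glyphs = shape(word)
        width = sum(ADVANCE.get(g, DEFAULT_ADVANCE) for g in glyphs)
        needed = width if not current else DEFAULT_ADVANCE + width  # inter-word space
        if current and x + needed > line_width:
            lines.append(current)
            current, x = [], 0
            needed = width
        current.append(glyphs)
        x += needed
    if current:
        lines.append(current)
    return lines


print(shape("office"))
print(layout(["office", "fig", "tree"], 60))
```

Note that `layout` only ever sees glyphs and their widths. It never looks at which characters were merged, which is roughly the separation described above.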

sylware · on Feb 26, 2024

I reread the descriptions of those C++ diarrheas, the abominations that are harfbuzz and icu, and indeed, this is a cluster f*ck of "text-layout-shaping" terminology.

Until I get a plain and simple C99 implementation in the worst-case scenario, I would stay away from those.

dougfelt · on Feb 26, 2024

Good luck finding a 'plain and simple' implementation that is as feature complete as either of those. International text layout is complex.

sylware · on Feb 26, 2024

There was a Unicode text shaper from Japan, a C library, l17n or something like that. But Google is unable to find it again. Maybe it is gone.

But the right way is to avoid those C++ abominations like hell (there are still people who think coding C++ makes them smart; toxic at best, if not worse) and to start with Roman script, then incrementally add supported languages. I did venture into harfbuzz, and there is no salvation: the code is so C++ brainf*cked that it is better to start clean and lean.

astrange · on Feb 26, 2024

Funny enough, ICU was originally written in C++ by Apple for Pink/Taligent, then ported to Java, then ported back to C++.

sylware · on Feb 26, 2024

Weird, I got it before they started to enshittify ICU with C++; I remember having a plain and simple C version.

dhosek · on Feb 25, 2024

Text shaping is not a thing. Unicode provides some basic parameterization (e.g., indicating where valid line breaks can occur in a text), but does not provide much more in the way of layout specification, nor should it. It’s a text encoding specification, not a typography specification. You will also notice that it does not include, e.g., specifications about Latin alphabet typographic ligatures either (e.g., f+f+i → ﬃ). The existence of some ligatures in the Unicode standard is a sop to backwards compatibility with older encodings only (this is also why Unicode includes superscript/subscript digits, box drawing characters and a number of inconsistencies in how different scripts are managed).
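The compatibility status of those characters can be checked from the Unicode Character Database itself. A small sketch using Python's standard `unicodedata` module (the helper name `describe` is just for illustration): the ffi ligature and the superscript two both carry a compatibility decomposition, so NFKC normalization folds them back to plain letters and digits, while NFC leaves them alone.

```python
import unicodedata


def describe(ch):
    """Return (name, decomposition mapping, NFC form, NFKC form) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFC", ch),
        unicodedata.normalize("NFKC", ch),
    )


for ch in ("\uFB03", "\u00B2"):  # the ffi ligature, superscript two
    name, decomp, nfc, nfkc = describe(ch)
    print(f"U+{ord(ch):04X} {name}: decomposition={decomp!r} NFC={nfc!r} NFKC={nfkc!r}")
```

The `<compat>` and `<super>` tags in the decomposition field mark these as compatibility characters. Picking the ligature glyph for an ordinary "ffi" sequence is left to the font and the shaper.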