Requirements for Japanese Text Layout

omoikane · on Feb 25, 2024

I like how they have a dedicated section for mixing Japanese and Western text:

https://www.w3.org/TR/jlreq/#japanese_and_western_mixed_text...

I wonder if there are considerations for other combinations, such as mixing Japanese with right-to-left text. Maybe the other combinations are not nearly as common?
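The mixed-text rules in that jlreq section apply where Japanese and Western runs meet, so a natural first step in any implementation is finding those boundaries. Below is a minimal sketch. It assumes a small, simplified list of Unicode block ranges stands in for "Japanese"; real engines use the Unicode Script and Line_Break properties instead. The function names are hypothetical.

```python
from itertools import groupby

# Assumed, simplified code-point ranges treated as "Japanese" text.
JAPANESE_RANGES = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation (、。「」 etc.)
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
]


def is_japanese(ch: str) -> bool:
    """True if the character falls in one of the assumed Japanese ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)


def split_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters into ('ja' | 'western', run) pairs."""
    return [
        ("ja" if ja else "western", "".join(chars))
        for ja, chars in groupby(text, key=is_japanese)
    ]


print(split_runs("私はW3Cの仕様を読む"))
```

Each boundary between a `ja` run and a `western` run is where mixed-text rules, such as inter-script spacing, would then be applied.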

knuckleheadsmif · on Feb 26, 2024

It is common in some contexts. Unicode also has formatting rules for a lot of this built in, although not the kind of rules you’ll find in a manual of style or a typesetting rulebook.
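One concrete example of the rules Unicode does define is the Bidi_Class property. Every character carries one, and the Unicode Bidirectional Algorithm uses it when left-to-right and right-to-left text are mixed. Python's standard `unicodedata` module exposes it. Japanese characters are strongly left-to-right (`L`), just like Latin letters, while Hebrew is `R` and Arabic is `AL`. This sketch only shows the property lookup, not the full reordering algorithm:

```python
import unicodedata

# One sample character per script; bidirectional() returns its Bidi_Class.
samples = {
    "hiragana": "あ",
    "kanji": "漢",
    "latin": "A",
    "hebrew": "א",
    "arabic": "ع",
    "digit": "1",
}

for label, ch in samples.items():
    print(f"{label:9} U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
```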

Japanese can also flow top to bottom and right to left, even within books that are otherwise read left to right.
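To make the vertical flow concrete: in vertical Japanese, characters run top to bottom within a column, and columns advance from right to left. Below is a toy sketch that computes which cell each character lands in. It assumes fixed-size character cells, with no punctuation rotation, line-breaking rules, or proportional glyphs, and the function names are hypothetical.

```python
def vertical_rl_positions(text: str, chars_per_column: int, num_columns: int):
    """Map each character to (char, x, y) for vertical right-to-left layout.

    x counts columns from the LEFT edge of the page and y counts rows from
    the top, so the first column of text is the rightmost one.
    """
    positions = []
    for i, ch in enumerate(text):
        column, row = divmod(i, chars_per_column)
        if column >= num_columns:
            raise ValueError("text does not fit on the page")
        x = num_columns - 1 - column  # columns advance right to left
        positions.append((ch, x, row))  # characters advance top to bottom
    return positions


def render(text: str, chars_per_column: int, num_columns: int) -> str:
    """Draw the layout as a text grid, using ideographic spaces for empty cells."""
    grid = [["\u3000"] * num_columns for _ in range(chars_per_column)]
    for ch, x, y in vertical_rl_positions(text, chars_per_column, num_columns):
        grid[y][x] = ch
    return "\n".join("".join(row) for row in grid)


print(render("縦書きの例です", chars_per_column=3, num_columns=3))
```

Reading the printed grid column by column, starting from the right, recovers the original text. On the web, this flow is what CSS `writing-mode: vertical-rl` requests.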

wahnfrieden · on Feb 25, 2024

The web has great support for Japanese. Apple's native support on iOS and macOS is relatively abysmal, with piss-poor support for basic needs like ruby text.

nxobject · on Feb 25, 2024

Yeah – my understanding is that, if you really want a cross-platform product, you'll have to license a commercial engine.

wahnfrieden · on Feb 25, 2024

I like using web views, including rendering them to PNG or PDF.

Apple has some janky old UIKit (and, less so, AppKit) support for Japanese that can be used from SwiftUI, so I get by without needing web views for every label. However, it's still nice to just use a web view for primary content.

There's basically zero input support on native Apple platforms for advanced Japanese features like ruby text editing, but it can be bolted on cleverly with some work. Or just use a web view.
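For the "just use a web view" route, ruby is plain HTML: a `<ruby>` element containing the base text, plus an `<rt>` element holding the reading. Below is a minimal sketch of generating that markup from Python. The `(base, reading)` pair list is a hypothetical input format, not any Apple or web API.

```python
from html import escape
from typing import Optional


def ruby_html(segments: list[tuple[str, Optional[str]]]) -> str:
    """Build HTML with ruby annotations.

    Each segment is a hypothetical (base_text, reading) pair; a reading of
    None means the segment is emitted as plain text without ruby.
    """
    parts = []
    for base, reading in segments:
        if reading:
            parts.append(f"<ruby>{escape(base)}<rt>{escape(reading)}</rt></ruby>")
        else:
            parts.append(escape(base))
    return "".join(parts)


print(ruby_html([("漢字", "かんじ"), ("を", None), ("読", "よ"), ("む", None)]))
```

The browser then handles positioning the reading above the base text. Escaping the input keeps stray `<` or `&` characters from breaking the markup.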

tupuc_speedrap · on Feb 26, 2024

This makes sense; most people from California rarely even leave their state, let alone go to a different country.

astrange · on Feb 26, 2024

Japan is the second largest iOS market.

re5i5tor · on Feb 25, 2024

In season 2 of Tokyo Vice, they show the character Jake, a reporter for the newspaper Meicho Shimbun, using a native Japanese “word” processor to type up his stories. Really fascinating, and it seems to be authentic.

dhosek · on Feb 25, 2024

I remember being absolutely fascinated by the ability of 1980s/early 1990s Japanese laptops to manage input with a standard keyboard. Especially given the computing resources of the time, it seemed like serious dark magic.

wodenokoto · on Feb 26, 2024

The first time I watched Battle Royale, I thought the hacking scene was movie magic, because there was no way a computer could just change letters into Japanese characters like that.

astrange · on Feb 26, 2024

Unlike Jake (Adelstein) himself, who, whenever I've read any of his stories, has obviously made them up. Especially the ones where he claims to know everyone in the yakuza.

o11c · on Feb 25, 2024

Hm, I'm having a weird and interesting issue with this website.

I'm using Firefox ESR with the (recommended) Auto Tab Discard [1] extension to save memory for idle background tabs. Unlike prior extensions, AFAIU this one should largely be using native browser code paths.

When I return to the jlreq page after it has idled out, all the Japanese text comes back and the "English" button is inoperative until I click one of the other language options.

Whose bug is this?

[1]: https://addons.mozilla.org/en-US/firefox/addon/auto-tab-disc...

mjevans · on Feb 26, 2024

I have no proof, but I suspect it's a Firefox bug, as I've noticed other behavior that looks a LOT like some process forcing a tab to reload without that trigger being passed to extensions (e.g. Dark Reader, leaving pages unexpectedly blindingly bright).

ATD might be useful as a way of triggering this behavior more reliably, since it exercises the discard-and-reload path for tabs.

ngcc_hk · on Feb 25, 2024

Very interesting. And providing both Japanese and English versions (both separately and together) is a godsend.

Cloudef · on Feb 25, 2024

Funny that this pops up on the front page just as I posted this on Show HN: https://github.com/Cloudef/zig-budoux

This site also nicely demonstrates how poorly web browsers line-break Japanese text.
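Part of the problem is that, without spaces, a browser may break a line between almost any two Japanese characters, splitting phrases in the middle. BudouX-style segmenters split the text into phrases first, and the output can then be marked up so breaks only fall between phrases. The sketch below covers only that last markup step. It assumes the phrase list has already been produced and is hard-coded here rather than coming from any BudouX API. The CSS shown is one common approach: `word-break: keep-all` suppresses breaks between CJK characters, and `<wbr>` re-allows a break at each phrase boundary.

```python
from html import escape


def phrases_to_html(phrases: list[str]) -> str:
    """Join pre-segmented phrases so browsers only break between them.

    word-break: keep-all stops breaks inside CJK runs; <wbr> marks an
    allowed break at each phrase boundary; overflow-wrap: anywhere lets an
    over-long phrase still break rather than overflow its container.
    """
    body = "<wbr>".join(escape(p) for p in phrases)
    return f'<span style="word-break: keep-all; overflow-wrap: anywhere">{body}</span>'


# Hypothetical segmenter output, hard-coded for illustration.
print(phrases_to_html(["今日は", "いい", "天気です。"]))
```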

nxobject · on Feb 25, 2024

Everything you wanted to know about Japanese typography, and more.

tkgally · on Feb 25, 2024

Yes, indeed. I’ve had a professional interest in Japanese typography for many years, having worked on many Japanese dictionaries and other publications, but this document contains much that I didn’t know.

nxobject · on Feb 25, 2024

+1 to that: the document's note sections contain a lot of typographical style rules and aesthetic rules of thumb that, while probably out of scope for a purely algorithmic W3C-style specification, are very useful (e.g. indentation, conventional head/subhead/running section formatting, etc.)

sylware · on Feb 25, 2024

Isn't text shaping more in Unicode's scope than the W3C's?

phonon · on Feb 25, 2024

This isn't glyph shaping, it's text layout.

sylware · on Feb 25, 2024

Which is called text shaping.

????

bradrn · on Feb 25, 2024

No, they’re different. ‘Shaping’ is deciding how to combine adjacent characters, mostly by changing their forms as needed, and secondarily by adjusting their horizontal or vertical positions. ‘Layout’ then involves taking those shaped characters and placing them on the page to form words and sentences. There’s some overlap — e.g. you could use shaping to implement the mojikumi rules in section 3.1.2 here — but by and large they’re separate things.

sylware · on Feb 26, 2024

I reread the descriptions of those C++ diarrheas, the abominations that are HarfBuzz and ICU, and indeed, it's a clusterf*ck of "text-layout-shaping" terminology.

Until I get a plain and simple C99 implementation in the worst-case scenario, I'll stay away from those.

dougfelt · on Feb 26, 2024

Good luck finding a 'plain and simple' implementation that is as feature complete as either of those. International text layout is complex.

sylware · on Feb 26, 2024

There was a Unicode text shaper from Japan, a C library called l17n or something like that, but Google can't find it anymore. Maybe it's gone.

But the right way is to avoid those C++ abominations like hell (there are still people who think coding in C++ makes them smart; at best that's toxic, and no less bad) and to start with Roman script, then incrementally add supported languages. I did venture into HarfBuzz: there is no salvation. The code is so C++-brainf*cked that it's better to start clean and lean.

astrange · on Feb 26, 2024

Funnily enough, ICU was originally written in C++ by Apple for Pink/Taligent, then ported to Java, then ported back to C++.

sylware · on Feb 26, 2024

Weird. I got it before they started to enshittify ICU with C++; I remember having a plain and simple C version.

dhosek · on Feb 25, 2024

Text shaping is not a thing. Unicode provides some basic parameterization (e.g., indicating where valid line breaks can occur in a text), but does not provide much more in the way of layout specification, nor should it. It’s a text encoding specification, not a typography specification. You will also notice that it does not include specifications for Latin typographic ligatures either (e.g., f+f+i → ﬃ). The existence of some ligatures in the Unicode standard is a sop to backwards compatibility with older encodings only (this is also why Unicode includes superscript/subscript digits, box drawing characters and a number of inconsistencies in how different scripts are managed).
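The legacy status of those characters is visible in the Unicode data itself. U+FB03 (ﬃ) and the superscript digits carry compatibility decompositions, so NFKC normalization folds them back to ordinary letters and digits. Box-drawing characters have no decomposition at all. Here is a quick check with Python's standard `unicodedata` module; it shows only the character data, not how fonts form ligatures at render time:

```python
import unicodedata

# ﬃ (ligature), ² (superscript two), ─ (box drawing light horizontal)
for ch in ["\ufb03", "\u00b2", "\u2500"]:
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"decomposition={unicodedata.decomposition(ch)!r}, "
        f"NFKC={unicodedata.normalize('NFKC', ch)!r}"
    )
```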