Requirements for Japanese Text Layout (w3.org)
62 points by nxobject on Feb 25, 2024 | 28 comments


I like how they have a dedicated section for mixing Japanese and Western languages:

https://www.w3.org/TR/jlreq/#japanese_and_western_mixed_text...

I wonder if there are considerations for other combinations, such as mixing Japanese with right-to-left text. Maybe the other combinations are not nearly as common?


It is common in some contexts. Unicode also has formatting rules for a lot of this stuff, although not the kind of rules you’ll find in a manual of style or a typesetting rule book.

Japanese can flow top to bottom and right to left as well, within books that are read left to right.
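
As one concrete illustration of the kind of rule Unicode does carry: every character has a bidirectional class, which the Unicode Bidirectional Algorithm uses to order mixed left-to-right and right-to-left text. A minimal sketch with Python's standard `unicodedata` module; the helper name `bidi_classes` and the sample string are made up for illustration:

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Hypothetical sample: Japanese followed by Hebrew.
sample = "日本語 עברית"
for ch, cls in bidi_classes(sample):
    # "L" = left-to-right, "R" = right-to-left, "WS" = whitespace
    print(f"U+{ord(ch):04X} {cls}")
```

This only reports the per-character property; actually reordering a line is the job of a bidi implementation in the layout engine.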


The web has great support for Japanese. Apple's native support on iOS and macOS is relatively abysmal, with piss-poor support for basic needs like ruby text.


Yeah – my understanding is that, if you really want a cross-platform product, you'll have to license a commercial engine.


I like using web views, including rendering them to PNG or PDF.

Apple has some janky old UIKit and less-so AppKit support for Japanese that's portable to use from SwiftUI so I get by without needing web views for every label. However it's still nice to just use a web view for primary content.

There's zero input support for advanced Japanese features on native Apple platforms basically, like ruby text editing, but they can be bolted on cleverly with work. Or just use a web view.


This makes sense, most people from California rarely even leave their state let alone go to a different country


Japan is the second largest iOS market.


In season 2 of Tokyo Vice, they show character Jake, a reporter for newspaper Meicho Shimbun, using a native Japanese “word” processor to type in his stories. Really fascinating and seems to be authentic.


I remember being absolutely fascinated by the ability of 1980s/early 1990s Japanese laptops to manage input with a standard keyboard. Especially given the computing resources of the time, it seemed like serious dark magic.


The first time I watched Battle Royale I thought the hacking scene was film magic, because no way could the computer just change letters into Japanese characters like that.


Unlike Jake (Adelstein) himself, who has obviously made things up in every story of his I've read. Especially the ones where he claims to know everyone in the yakuza.


Hm, I'm having a weird and interesting issue with this website.

I'm using Firefox ESR with the (recommended) Auto Tab Discard [1] extension to save memory for idle background tabs. Unlike prior extensions AFAIU this should be using largely native browser code paths.

When I return to the jlreq page after it has idled out, all the Japanese text comes back and the "English" button is inoperative until I click one of the other language options.

Whose bug is this?

[1]: https://addons.mozilla.org/en-US/firefox/addon/auto-tab-disc...


I have no proof, but I suspect it's a Firefox bug, as I've noticed other behavior that seems a LOT like this: some process forces a tab to reload, but that trigger isn't fed to plugins (e.g. Dark Reader, and pages that are unexpectedly blindingly bright).

ATD might be useful as a way of triggering this behavior more reliably, by exercising tab discard and reload.


Very interesting. And providing both Japanese and English versions (both separately and together) is a godsend.


Funny that this pops up on the front page just as I posted this on Show HN: https://github.com/Cloudef/zig-budoux

This site also nicely demonstrates how poorly web browsers break Japanese text.


Everything you wanted to know about Japanese typography, and more.


Yes, indeed. I’ve had a professional interest in Japanese typography for many years, having worked on many Japanese dictionaries and other publications, but this document contains much that I didn’t know.


+1 to that – in the document's note sections there are a lot of typographical rules of style and aesthetic rules of thumb that are probably out of the scope of a W3C-style purely algorithmic specification, but are very useful (e.g. indentations, conventional head/subhead/running section formatting, etc.)


Isn't text shaping more in Unicode's scope than in the W3C's scope?


This isn't glyph shaping, it's text layout.


Which is called text shaping.

????


No, they’re different. ‘Shaping’ is deciding how to combine adjacent characters, mostly by changing their forms as needed, and secondarily by adjusting their horizontal or vertical positions. ‘Layout’ then involves taking those shaped characters and placing them on the page to form words and sentences. There’s some overlap — e.g. you could use shaping to implement the mojikumi rules in section 3.1.2 here — but by and large they’re separate things.
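
To make the distinction concrete, here is a toy sketch, not how a real shaper or layout engine works. The ligature table, glyph names, and the fixed one-unit advance width are all made-up assumptions; real engines read this information from font tables. `shape` turns characters into glyphs by merging sequences, and `layout` then places those glyphs on lines:

```python
# Hypothetical ligature table: character sequence -> glyph name.
LIGATURES = {"ffi": "f_f_i", "fi": "f_i"}

def shape(text):
    """Shaping: map characters to glyphs, merging sequences into ligatures."""
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest ligature first so "ffi" wins over "fi".
        for seq in sorted(LIGATURES, key=len, reverse=True):
            if text.startswith(seq, i):
                glyphs.append(LIGATURES[seq])
                i += len(seq)
                break
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs

def layout(glyphs, max_width, advance=1):
    """Layout: place shaped glyphs on lines, wrapping when a line is full."""
    lines, current = [], []
    for glyph in glyphs:
        if (len(current) + 1) * advance > max_width:
            lines.append(current)
            current = []
        current.append(glyph)
    if current:
        lines.append(current)
    return lines

glyphs = shape("office")
print(glyphs)
print(layout(glyphs, max_width=3))
```

Note that layout operates on the output of shaping: the ligature counts as one glyph when deciding where the line wraps.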


I reread the descriptions of those C++ diarrheas, the abominations that are HarfBuzz and ICU, and indeed, this is a cluster f*ck of "text-layout-shaping" terminology.

Until I get a plain and simple C99 implementation, in the worst-case scenario I would stay away from those.


Good luck finding a 'plain and simple' implementation that is as feature complete as either of those. International text layout is complex.


There was a Unicode text shaper from Japan, a C library, l17n or something like that. But Google is unable to find it again. Maybe it is gone.

But the right way is to avoid those C++ abominations like hell (there are still people who think coding C++ makes them smart; at best toxic, but no less worse) and start with Roman script, incrementally adding supported languages. I did venture into HarfBuzz; there is no salvation. The code is so C++-brainf*cked that it's better to start clean and lean.


Funny enough, ICU was originally written in C++ by Apple for Pink/Taligent, then ported to Java, then ported back to C++.


Weird, I got it before they started to enshittify ICU with C++; I remember having a plain and simple C version.


Text shaping is not a thing. Unicode provides some basic parameterization (e.g., indicating where valid line breaks can occur in a text), but does not provide much more in the way of layout specification, nor should it. It’s a text encoding specification, not a typography specification. You will also notice that it does not include, e.g., specifications about Latin alphabet typographic ligatures either (e.g., f+f+i → ffi). The existence of some ligatures in the Unicode standard is a sop to backwards compatibility with older encodings only (this is also why Unicode includes superscript/subscript digits, box drawing characters and a number of inconsistencies in how different scripts are managed).
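
The compatibility point can be checked from Python's standard `unicodedata` module. A small sketch (the helper name `compat_info` is made up here): the precomposed ffi ligature and the superscript two both carry a compatibility decomposition, and NFKC normalization folds them back to plain characters.

```python
import unicodedata

def compat_info(ch):
    """Return (character name, decomposition mapping, NFKC-normalized form)."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKC", ch),
    )

# U+FB03 LATIN SMALL LIGATURE FFI and U+00B2 SUPERSCRIPT TWO
for ch in ("\uFB03", "\u00B2"):
    name, decomposition, nfkc = compat_info(ch)
    print(f"U+{ord(ch):04X} {name}: {decomposition!r} -> {nfkc!r}")
```

The `<compat>` and `<super>` tags in the decomposition field mark these as compatibility characters rather than something text is expected to be encoded with; choosing an ffi ligature glyph for plain "ffi" is left to the font and the shaper.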



