Hacker News

Can this be used to build a grammar checker for the Japanese language?


Even if one could verify grammatical correctness, there are many ways to produce unnatural Japanese phrases.

To give an easy example: 9つ (9 things) is natural, but 10つ sounds extremely strange. However, 10個 sounds fine. When the number is large enough, it's also common to not use 助数詞 (counter words) at all.
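A toy sketch of just that つ/個 rule, to show how narrow such a check is. The function name `with_counter` is made up for illustration, and the rule only covers the cases mentioned above; the case where large numbers drop the counter entirely is not modeled because no threshold is given.

```python
def with_counter(n: int) -> str:
    """Attach a counter to a positive integer (toy rule from the comment above)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # 1つ through 9つ read naturally; from 10 up, fall back to the generic counter 個.
    return f"{n}つ" if n <= 9 else f"{n}個"


print(with_counter(9))   # 9つ
print(with_counter(10))  # 10個
```

A real checker would also need to pick the counter based on what is being counted, which this does not attempt.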

Sometimes, grammatical mistakes are natural Japanese. For instance, there is a concept of ら抜き言葉 (words with ら dropped), where people will say e.g. 寝れない ("I can't sleep") instead of 寝られない. This is an error in conjugation, yet it's natural language and applies to a few other words, too.
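As a rough illustration, here is how the two forms relate mechanically. This is a sketch that assumes the caller already knows the verb is an ichidan verb (like 寝る); the helper name `potential_negatives` is hypothetical, and the code cannot tell ichidan verbs apart from godan verbs that happen to end in る.

```python
from typing import Tuple


def potential_negatives(ichidan_verb: str) -> Tuple[str, str]:
    """Return (standard, ら-dropped) negative potential forms of an ichidan verb.

    Assumption: the input is a dictionary-form ichidan verb such as 寝る or 食べる.
    """
    if len(ichidan_verb) < 2 or not ichidan_verb.endswith("る"):
        raise ValueError("expected a dictionary-form verb ending in る")
    stem = ichidan_verb[:-1]  # 寝る -> 寝
    standard = stem + "られない"  # textbook form
    ra_dropped = stem + "れない"  # ら抜き form, common in everyday speech
    return standard, ra_dropped


print(potential_negatives("寝る"))  # ('寝られない', '寝れない')
```

A checker that only knows the first form would flag the second as an error, even though both are in everyday use.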

Validating both grammar and word choice is still insufficient to judge the naturalness of a Japanese phrase. A common "mistake" made by many Japanese speakers is writing 「違和感を感じる」. The verb is redundant because of the 「感」 in 「違和感」. The "correct" verb to use in this case is 覚える. In practice, however, either choice is understandable and considered correct (except by those who know the trivia 「違和感は覚えるもの!」).
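If a tool wanted to surface this at all, a surface-level pattern is about the best it could do. The sketch below is an assumption-heavy toy: the pattern `REDUNDANT_KAN` and the function `style_notes` are invented for illustration, and it emits a note rather than an error because the usage is widely accepted.

```python
import re
from typing import List

# Matches a run of kanji ending in 感, followed by を感じ (e.g. 違和感を感じる).
# Purely a surface pattern; it does no real parsing.
REDUNDANT_KAN = re.compile(r"([\u4e00-\u9fff]+感)を感じ")


def style_notes(sentence: str) -> List[str]:
    """Return optional style notes, never hard errors."""
    notes = []
    for match in REDUNDANT_KAN.finditer(sentence):
        noun = match.group(1)
        notes.append(f"note: 「{noun}」 already contains 感; some writers prefer 覚える")
    return notes


print(style_notes("彼の発言に違和感を感じる"))
print(style_notes("彼の発言に違和感を覚える"))  # []
```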

Sometimes redundancy does get a phrase judged incorrect (see 二重敬語, double honorifics, for an example). In other cases, nobody will debate the correctness of the phrase.


No, because when people say "grammar checker", they mean more than checking whether a sentence is (formally speaking) grammatical. You'd expect it to also check word choice, orthography, etc. Those aren't part of the syntactic structure, so a checker built on this alone would accept many badly flawed sentences.

Besides, the grammar this project uses is unlikely to accurately reflect the actual grammar of modern spoken or written Japanese, and it's unlikely to be anywhere near complete; that means it would also produce quite a lot of false-positive "ungrammaticals".

Something _like_ this can certainly be used as part of a grammar checker. But in that case, you shouldn't implement it in TypeScript's type system in the first place.


No. Japanese is very context sensitive, and like any natural language, has ambiguities. Japanese is loaded with dajare (puns).

Grammar checking basically needs AI - you need to train some model to understand common phrases and sentence structure. Before LLMs there was software like MeCab[1] which did this and gave good results, but modern LLMs are much more capable.

[1]:https://taku910.github.io/mecab/


Japanese grammar is so simple because it doesn't matter that much.

Most of the "rules" are common patterns made into guidelines, and they'll change depending on the speaker, context, society of the time (the "correct" way is fully dictated by the majority). And you could break the grammar rules as long as the other accepted guidelines are OK.

As a parallel you can learn to mechanically drive a car, but driving it "correctly" will require full knowledge of traffic code, societal rules and how to reasonably handle conflicting situations, including crashing it into a tree if it means avoiding a packed school bus.


No, it's missing basic things, accepts ungrammatical sentences, and is fundamentally flawed by being based on nihongo kyouiku (Japanese-language education) grammar.



