
This happens in Turkish too. I believe the reason is that movie subtitles were used for training without stripping out the comments and intros that subtitle authors leave in them.

leaving personal comments, jokes, reactions, intros in subtitles is very common in eastern cultures.

Turkish readers will probably remember “esekadam iyi seyirler diler” ("esekadam wishes you a pleasant viewing") :)



Kind of mindblowing considering who it is we're talking about. Of all companies, OpenAI couldn't be bothered to throw an LLM at this problem? Finding amorphously phrased but clearly recognizable needles in large numbers of haystacks seems like a patently perfect task for them.


Don't even need an LLM, a regex would have sufficed (I've used my fair share of community sourced subtitles, and comments are almost always in a different font, colour, between brackets, etc etc).
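To make the regex idea concrete, here is a rough sketch, assuming the subtitles have already been parsed into plain-text cues. The patterns and the names `is_author_comment` and `clean_cues` are my own illustrative guesses, not anything a real training pipeline is known to use, and font or colour cues would need format-specific parsing that this skips.

```python
import re

# Hypothetical heuristic 1: the whole cue is wrapped in brackets,
# which is how many community subtitle authors mark their asides.
BRACKETED = re.compile(r"^\s*[\[\(\{].*[\]\)\}]\s*$", re.DOTALL)

# Hypothetical heuristic 2: common credit or greeting phrases.
# The phrase list is an assumption and would need tuning per language.
CREDIT = re.compile(
    r"(subtitles?\s+by|translated\s+by|synced\s+by|iyi\s+seyirler)",
    re.IGNORECASE,
)


def is_author_comment(cue_text: str) -> bool:
    """Return True if the cue looks like an author note rather than dialogue."""
    text = cue_text.strip()
    return bool(BRACKETED.match(text) or CREDIT.search(text))


def clean_cues(cues):
    """Drop cues flagged as author comments, keeping the original order."""
    return [cue for cue in cues if not is_author_comment(cue)]


if __name__ == "__main__":
    sample = [
        "Hello there.",
        "[translator: lol this scene]",
        "esekadam iyi seyirler diler",
        "I said (quietly) no.",
    ]
    print(clean_cues(sample))
```

Note that the bracket rule also drops sound descriptions like "[door slams]", which is probably what you want for speech training data anyway, but it is a blunt filter and will miss anything the author did not mark up.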


That name translates as "Donkey Man" btw :D



