
In Russian it often hallucinates "Субтитры сделал DimaTorzok" ("Subtitles by DimaTorzok") at the end of things. Interestingly, I wasn't able to find any YouTube videos with that name in the subtitles, so it's not like it's in a lot of training data.


I tried googling this and found questions from Telegram users asking why voice message recognition sometimes produces this phrase and who this person is. I also found this thread [1] claiming that the "Subtitles by DimaTorzok" line comes from some Russian YouTube gaming videos such as [2].

[1] https://github.com/openai/whisper/discussions/2372

[2] https://www.youtube.com/watch?v=FAqyUuahMlc&t=401s


Yeah, I know about this from Telegram, because they use Whisper for voice message recognition. There are a bunch of other artifacts it often produces.


Could it be someone distributing subs online, e.g. showing up in the opensubtitles.org dataset?


Or possibly someone subtitling pirated movies? That seems to be a common thing, according to other comments.



