
nice.

FYI that stock-symbol regexp is buggy: e.g. it will match $42.36, which obviously isn't a symbol.

Indeed, symbols are tough to nail correctly, e.g. people omitting the $, symbols containing periods, etc.
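A minimal sketch of a stricter pattern, assuming a symbol is a `$` followed by 1 to 5 uppercase letters, with an optional one-letter share-class suffix such as `$BRK.B`. Requiring a letter immediately after the `$` is what rules out dollar amounts like $42.36. The function name `find_tickers` is hypothetical, and bare symbols without the `$` are deliberately not handled, since they collide with ordinary capitalized words.

```python
import re

# Assumed ticker shape: "$" + 1-5 uppercase letters, optionally ".X" (e.g. $BRK.B).
# (?<!\w) stops matches glued to a preceding word; (?!\w) stops "$TOOLONG"
# from matching a 5-letter prefix. A digit after "$" never matches, so
# prices like $42.36 are skipped.
TICKER_RE = re.compile(r"(?<!\w)\$([A-Z]{1,5}(?:\.[A-Z])?)(?!\w)")


def find_tickers(text: str) -> list[str]:
    """Return the symbols (without the leading $) found in text."""
    return TICKER_RE.findall(text)


if __name__ == "__main__":
    print(find_tickers("Bought $GME at $42.36, also $BRK.B."))
```

A trailing sentence period is left alone because the lookahead only rejects word characters, so `$BRK.B.` still yields `BRK.B`.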

Offhand, I'd build a little neural-net classifier (e.g. https://fasttext.cc/ ) and train it on a slew of example posts that are and aren't about stonks. To get training data, use regexps to pull candidate posts, then go through them by hand: at 20+ per minute that's about 1,200 per hour. Or outsource to Amazon Mechanical Turk at $0.25 per 10 including verification, i.e. about $30 per 1,200. Also, there are probably easy cases you can classify 100% correctly with regexps alone, which increases the training set size.
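One way the data-gathering step might look, as a sketch only: hand labels are merged with a few high-precision regexp rules, and each example is written in fastText's supervised input format (`__label__<name>` followed by the text). The rules, the trading-word list, the label names `stock`/`other`, and all function names here are hypothetical choices for illustration, not anything from the thread.

```python
import re
from typing import Dict, List, Optional

# Hypothetical high-precision rule: auto-label a post "stock" only when it has
# a $TICKER *and* an obvious trading word. Everything else needs a human label.
TICKER_RE = re.compile(r"(?<!\w)\$[A-Z]{1,5}(?:\.[A-Z])?(?!\w)")
TRADING_WORDS_RE = re.compile(r"\b(?:calls|puts|shares|earnings|short)\b", re.IGNORECASE)


def auto_label(post: str) -> Optional[str]:
    """Return 'stock' for posts the rules are confident about, else None."""
    if TICKER_RE.search(post) and TRADING_WORDS_RE.search(post):
        return "stock"
    return None


def to_fasttext_line(label: str, post: str) -> str:
    """One training example: '__label__<label>' then whitespace-normalized text."""
    text = " ".join(post.split())
    return f"__label__{label} {text}"


def build_training_lines(posts: List[str], hand_labels: Dict[int, str]) -> List[str]:
    """Merge hand labels (post index -> label) with rule labels.
    Hand labels take precedence; posts with no label at all are skipped."""
    lines = []
    for i, post in enumerate(posts):
        label = hand_labels.get(i) or auto_label(post)
        if label is not None:
            lines.append(to_fasttext_line(label, post))
    return lines


if __name__ == "__main__":
    posts = [
        "Loading up on $GME calls before earnings",
        "Paid $42.36 for lunch",
        "Thoughts on $AMC?",
    ]
    for line in build_training_lines(posts, {1: "other", 2: "stock"}):
        print(line)
```

Written one per line to a file, the output could then be passed to the `fasttext` Python package's `fasttext.train_supervised(input=...)`. Keeping the auto-label rules strict matters more than coverage, since any rule errors become label noise in the training set.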

I'm happy to help if you like.
