FYI, that stock regexp is buggy: e.g. it will match $42.36, which obviously isn't a symbol.
Indeed, symbols are tough to nail correctly: people leave off the $, some symbols contain periods, etc.
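For what it's worth, here is a rough sketch of a stricter pattern that at least rejects dollar amounts. The pattern and the name `find_tickers` are my own assumptions, not the regexp from the original post: it assumes 1-5 uppercase letters with an optional one- or two-letter class suffix after a period. It still only catches symbols written with a leading $, so it doesn't solve the missing-$ problem.

```python
import re

# Assumed shape of a cashtag: "$", 1-5 uppercase letters, optional ".X" or ".XY" suffix.
# The lookbehind/lookahead stop matches glued to other letters, digits, or a second "$".
TICKER_RE = re.compile(
    r"(?<![A-Za-z0-9$])\$([A-Z]{1,5}(?:\.[A-Z]{1,2})?)(?![A-Za-z0-9])"
)

def find_tickers(text):
    """Return the cashtag symbols found in text, without the leading '$'."""
    return TICKER_RE.findall(text)

print(find_tickers("Paid $42.36 for $GME and $BRK.B today"))
```

Prices like $42.36 fall through because the first character after the $ has to be an uppercase letter.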
Offhand, I'd build a little neural net classifier (e.g. https://fasttext.cc/ ) and train it on a slew of example posts that are/aren't about stonks. To get training data, use regexps to pull candidates and then run through them by hand (at 20+ per minute that's 1200+ per hour), or outsource to Amazon MTurk ($0.25 per 10, incl. verification = $30 per 1200). Also, there are probably easy ones you can classify 100% correctly with regexps, to increase the training set size.
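A minimal sketch of the "label the easy ones with regexps, hand-check the rest" step, under some loud assumptions: the rules, the keyword list, and the names `weak_label` and `to_fasttext_line` are all hypothetical and would need tuning on real posts. The output lines use fastText's default supervised format, where each line starts with `__label__<name>`.

```python
import re

# Hypothetical rules; none of these come from the original post.
CASHTAG = re.compile(r"(?<![A-Za-z0-9$])\$[A-Z]{1,5}(?:\.[A-Z]{1,2})?(?![A-Za-z0-9])")
TRADING_WORDS = re.compile(r"\b(?:shares?|calls?|puts?|earnings|dividends?)\b", re.IGNORECASE)
# Any bare all-caps token might be a ticker written without "$".
BARE_CAPS = re.compile(r"\b[A-Z]{1,5}\b")

def weak_label(post):
    """Return 'stock', 'other', or None when a human should decide."""
    has_tag = bool(CASHTAG.search(post))
    has_word = bool(TRADING_WORDS.search(post))
    if has_tag and has_word:
        return "stock"
    if not has_tag and not has_word and not BARE_CAPS.search(post):
        return "other"
    return None  # ambiguous: send to the hand-labeling queue

def to_fasttext_line(label, post):
    """Format one example as a single fastText supervised-training line."""
    text = " ".join(post.split())  # collapse newlines and repeated spaces
    return f"__label__{label} {text}"

posts = [
    "Bought 10 shares of $GME this morning",
    "my cat knocked over the lamp again",
    "Thinking about AAPL before earnings",
]
train_lines, review_queue = [], []
for post in posts:
    label = weak_label(post)
    if label is None:
        review_queue.append(post)
    else:
        train_lines.append(to_fasttext_line(label, post))

print(train_lines)
print(review_queue)
```

Writing `train_lines` to a text file, one per line, should give you something the fastText Python bindings can train on via `fasttext.train_supervised(input=path)`. The review queue is what you'd label by hand or send to MTurk.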
I'm happy to help if you like.