
"Related Questions": tokenize => random hash/project (tokens) => TF-IDF => KD-tree lookup
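A rough sketch of that pipeline in Python, stdlib only. Everything named here (`DIM`, `bucket`, `build_idf`, `vectorize`, `related`) is made up for illustration, and the IDF smoothing is just one common choice. To keep it short, a brute-force nearest-neighbour scan stands in for the KD-tree; with real data you would build the tree once over the document vectors and query that instead.

```python
import hashlib
import math
import re
from collections import Counter

DIM = 64  # hypothetical size of the hashed feature space


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bucket(token, dim=DIM):
    # Stable hash: Python's built-in hash() is salted per process.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % dim


def build_idf(docs):
    # Smoothed IDF per token, plus the document count.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return idf, n


def vectorize(text, idf, n_docs, dim=DIM):
    # Project tokens into `dim` buckets, weight by TF-IDF, L2-normalize.
    vec = [0.0] * dim
    unseen_idf = math.log(1 + n_docs) + 1.0  # token with document frequency 0
    for tok, tf in Counter(tokenize(text)).items():
        vec[bucket(tok, dim)] += tf * idf.get(tok, unseen_idf)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def related(query, docs, k=2):
    # Linear scan by Euclidean distance; a KD-tree would replace this loop.
    idf, n = build_idf(docs)
    q = vectorize(query, idf, n)
    scored = []
    for i, doc in enumerate(docs):
        d = vectorize(doc, idf, n)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, d)))
        scored.append((dist, i))
    return [i for _, i in sorted(scored)[:k]]
```

`related(question_text, all_questions, k=5)` returns the indices of the closest questions. Hash collisions merge unrelated tokens into one bucket, so `DIM` trades memory against accuracy.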

"automatically infer tag": tokenize / shingle Q&A, ORDER tokens + bigrams BY TF-IDF(tokens + bigrams)
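Something like the following, as a minimal sketch. `terms` and `suggest_tags` are hypothetical names, the corpus is assumed to fit in memory, and the smoothed IDF formula is one common variant rather than anything prescribed above.

```python
import math
import re
from collections import Counter


def terms(text):
    # Unigrams plus word bigrams (2-shingles) of the lowercased text.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def suggest_tags(post, corpus, k=3):
    # Document frequency of every term across the corpus.
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(terms(doc)))

    tf = Counter(terms(post))

    def score(term):
        # Smoothed TF-IDF; a term present in every document scores 0.
        return tf[term] * math.log((1 + n) / (1 + df[term]))

    # ORDER BY score DESC, ties broken alphabetically for determinism.
    ranked = sorted(tf, key=lambda t: (-score(t), t))
    return ranked[:k]
```

In practice you would probably intersect the top-ranked terms with the site's existing tag vocabulary, since raw TF-IDF happily ranks rare typos first.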

In both cases a global IDF estimate can be held in memory using a Counting Bloom Filter (or a traditional Solr index).
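For the Counting Bloom Filter option, a small sketch under stated assumptions: the class names, the table size and the number of hash functions are all illustrative, and reading a count back as the minimum over an item's counters is the usual trick borrowed from count-min sketches. Collisions can only inflate a counter, so the document frequency is never underestimated and the IDF is never overestimated.

```python
import hashlib
import math


class CountingBloomFilter:
    def __init__(self, size=1024, hashes=4):
        self.size = size
        self.hashes = hashes
        self.counts = [0] * size

    def _slots(self, item):
        # Derive `hashes` slot indices by salting one stable hash function.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for slot in self._slots(item):
            self.counts[slot] += 1

    def estimate(self, item):
        # Minimum over the item's counters: an upper bound on its true count.
        return min(self.counts[slot] for slot in self._slots(item))


class IdfEstimator:
    def __init__(self):
        self.df = CountingBloomFilter()
        self.n_docs = 0

    def add_document(self, tokens):
        self.n_docs += 1
        for tok in set(tokens):  # count each token once per document
            self.df.add(tok)

    def idf(self, token):
        # Clamp to n_docs so an inflated estimate cannot push the IDF below 0.
        df = min(self.df.estimate(token), self.n_docs)
        return math.log((1 + self.n_docs) / (1 + df))
```

Memory stays fixed at `size` counters no matter how large the vocabulary gets, which is the whole appeal over keeping an exact term-to-count dictionary.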


