
Is this any more than fancy outlier detection? (Genuinely asking; this is not my field.)

i.e. if the majority of data fed to a system is bad, will it work?



They do different things. Both are useful.

Outlier detection finds data points that look different from your existing data in some way, i.e. points that "lie outside" what is usual. Sometimes the assumption is that outliers are generated by an entirely different process from the rest of the data. It's important not to conflate "outliers" with legitimate data points that happen to fall in the tails of the data distribution.
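To make the outlier side concrete, here is a minimal sketch of one common approach, a robust z-score based on the median and the median absolute deviation (MAD). The function name `mad_outliers` is hypothetical, and the 0.6745 scaling and 3.5 cutoff are just a widely used convention, not anything specific to this thread.

```python
import numpy as np

def mad_outliers(values, threshold=3.5):
    """Flag points whose robust z-score exceeds `threshold` (assumed cutoff)."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # No spread to measure against, so flag nothing.
        return np.zeros(values.shape, dtype=bool)
    # 0.6745 rescales the MAD to be comparable to a standard deviation
    # under a normal distribution.
    robust_z = 0.6745 * (values - median) / mad
    return np.abs(robust_z) > threshold

readings = [10, 11, 9, 10, 12, 10, 95]
flags = mad_outliers(readings)
```

Note that this only flags points; it does nothing about labels that are wrong on otherwise ordinary-looking points, which is the case the other techniques target.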

The techniques I listed attempt to actually compensate for some known or estimable level of badness in the data. For example, gold loss correction (GLC) essentially estimates the probability that a data point's label is wrong, using a small set of trusted labels, and uses that to adjust the output of the model during training.
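A rough sketch of the GLC idea, under simplifying assumptions: a classifier trained on the noisy labels gives class probabilities for a small trusted set, those are averaged per true class to estimate a corruption matrix, and that matrix is then applied to a model's predictions before computing the loss on noisy labels. The function names and toy numbers below are hypothetical, and this is not a full training loop.

```python
import numpy as np

def estimate_corruption_matrix(noisy_probs, true_labels, num_classes):
    """Estimate C[i, j] ~ p(noisy label = j | true label = i).

    noisy_probs: predictions of a model trained on noisy labels,
                 evaluated on the trusted examples, shape (n, num_classes).
    true_labels: trusted labels for those examples, shape (n,).
    """
    C = np.zeros((num_classes, num_classes))
    for i in range(num_classes):
        rows = noisy_probs[true_labels == i]
        if len(rows) > 0:
            C[i] = rows.mean(axis=0)
    return C

def corrected_nll(clean_probs, noisy_labels, C):
    """Negative log-likelihood of noisy labels after applying C."""
    # p(noisy = j | x) = sum_i p(true = i | x) * C[i, j]
    noisy_pred = clean_probs @ C
    picked = noisy_pred[np.arange(len(noisy_labels)), noisy_labels]
    return -np.mean(np.log(picked + 1e-12))

# Toy trusted set: two examples of class 0, two of class 1.
trusted_probs = np.array([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.3, 0.7]])
trusted_labels = np.array([0, 0, 1, 1])
C = estimate_corruption_matrix(trusted_probs, trusted_labels, num_classes=2)
```

The point of the design is that the model is allowed to predict the true class while still being scored against noisy labels, because the estimated noise is accounted for between the prediction and the loss.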



