
Of course false positives matter; the naive heuristic “everything is AI-generated” has zero false negatives, and its positives are mostly false. In the OP, half of the positives are false. That’s not a useful signal IMO. You couldn’t use that to police homework, for example.
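To put numbers on it: here is a quick Python sketch of why "flag everything" looks perfect on false negatives but is useless on precision. The function name and the sample labels are made up purely for illustration.

```python
def precision_recall(predicted, actual):
    """Precision and recall for boolean predictions (True = 'AI-generated')."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # human text flagged as AI
    fn = sum(1 for p, a in pairs if not p and a)    # AI text that slipped through
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical sample: 2 of 10 documents are actually AI-generated.
actual = [True, True] + [False] * 8

# Naive heuristic: flag everything as AI-generated.
naive = [True] * len(actual)
print(precision_recall(naive, actual))  # (0.2, 1.0)
```

Recall is 1.0 (zero false negatives) no matter what, while precision just equals the base rate of AI text. "Half of the positives are false" corresponds to a precision of 0.5.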


I said they might not matter for quite a few use cases, not that they don't matter for any use case.

E.g., if you were Sam Altman at OpenAI and your use case were mostly vetting training data to tell whether it is AI-generated (so you can exclude it), you would probably care much more about false negatives than false positives: false positives just shrink your training set slightly, while false negatives pollute it.
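One rough way to formalize this is to weight the two error types differently. The sketch below uses arbitrary, hypothetical error counts for two detectors and hypothetical per-error costs (one error type counted as 10x worse than the other). It only shows that the same pair of detectors ranks differently depending on which error is the expensive one.

```python
def weighted_cost(fp, fn, cost_fp, cost_fn):
    """Total cost of a detector's errors given per-error costs."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical error counts for two detectors on the same corpus.
STRICT = {"fp": 50, "fn": 2}    # flags aggressively: many FPs, few FNs
LENIENT = {"fp": 5, "fn": 20}   # flags cautiously: few FPs, many FNs


def preferred(cost_fp, cost_fn):
    """Return the name of the cheaper detector under the given costs."""
    strict_cost = weighted_cost(STRICT["fp"], STRICT["fn"], cost_fp, cost_fn)
    lenient_cost = weighted_cost(LENIENT["fp"], LENIENT["fn"], cost_fp, cost_fn)
    return "strict" if strict_cost < lenient_cost else "lenient"


# Training-data filtering: a missed AI document (FN) is the costly error.
print("filtering:", preferred(cost_fp=1, cost_fn=10))   # filtering: strict
# Homework policing: wrongly accusing a student (FP) is the costly error.
print("homework:", preferred(cost_fp=10, cost_fn=1))    # homework: lenient
```

With these made-up numbers the strict detector costs 70 vs. 205 for filtering, but 502 vs. 70 for homework.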

Of course they matter if you are marking homework (where, conversely, false negatives aren't actually that important!), but it's pretty trivial to think of use cases where the opposite is true.


Even for that, false positives could end up mattering, e.g. if the data being incorrectly excluded turns out to be the most important part of the training set.

