Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Seems like the author is missing a trick?

You can use a simple hash table without probing to classify each number as either (1) first instance of a number in the table, (2) known duplicate of a number in the table, or (3) didn't fit in the table.

If you use a hash table with n buckets and one item per bucket and the input is random, then at least 63% of the distinct numbers in the input will fall into cases 1 or 2 in expectation. Increasing table size or bucket size improves this part of the math.

On really big inputs, it's probably still healthy to do a pass or two of radix sort so that the hash table accesses are cheap.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: