All of these bizarro explanations leave me wondering if I've missed something. Can't this be solved elegantly via the pigeonhole principle?

There are one million numbers to be sorted, drawn from a space of one hundred million. That means that no number may be more than 100 away from the next number once fully sorted (7 bits). Therefore, you can simply encode the sorted numbers as deltas between consecutive values: 7-bit deltas * one million numbers = 7,000,000 bits ≈ 875 KB < 1 MB of RAM.

EDIT: I should've been clearer: no number may be more than 100 away from the next number *on average* once fully sorted. Therefore it's an average of at most about 7 bits per number. Duplicates are even easier, since a duplicate is a delta of zero and costs only one bit to encode.

EDIT 2: As for the encoding to be used, I think a universal code or a Golomb code would probably be sufficient. They can get quite close to the entropy of the delta distribution.
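
For concreteness, here's a rough sketch of one way this could look: Rice coding the deltas (Rice codes are Golomb codes with a power-of-two parameter; k = 6 is just my guess at a reasonable setting, and uniformly random input is an assumption). On that input it lands around 8.1 bits per number, inside the 1 MB budget:

    import random

    def rice_len(delta, k=6):
        # Rice code (Golomb with M = 2^k): a unary quotient (q ones plus a
        # terminating zero) followed by k raw remainder bits.
        return (delta >> k) + 1 + k

    # One million numbers below one hundred million, duplicates allowed.
    nums = sorted(random.randrange(100_000_000) for _ in range(1_000_000))

    total_bits, prev = 0, 0
    for x in nums:
        total_bits += rice_len(x - prev)  # sorted, so deltas average ~100
        prev = x

    print(total_bits / len(nums))   # ~8.1 bits per number on random input
    print(total_bits <= 8 * 2**20)  # True: fits in 1 MB with a little headroom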



Duplicates are allowed, so you could, for example, have 500,000 zeros and 500,000 copies of 99,999,999, which are more than 100 apart.


That is one of the more ideal distributions: every zero delta costs a single bit, and the one number at the jump point gets a large but globally inconsequential number of bits. Bits per number is still well under 7.
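
To put rough numbers on it, here's that distribution costed with Elias gamma on delta+1 (an illustrative choice of universal code, so that a zero delta costs exactly one bit):

    # Elias-gamma length for a positive integer n is 2*floor(log2(n)) + 1 bits.
    def gamma_len(n):
        return 2 * (n.bit_length() - 1) + 1

    # 1,000,000 deltas starting from prev = 0: 999,999 zero deltas
    # (each encoded as gamma(1), one bit) and a single jump of 99,999,999.
    bits = 999_999 * gamma_len(0 + 1) + gamma_len(99_999_999 + 1)
    print(bits)  # 1,000,052 bits, i.e. ~1 bit per number -- about 122 KB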


> That means that no number may be more than 100 away from the next number once fully sorted (7 bits)

It's possible that 90% of the numbers are < 1 million and the remaining 10% are > 91 million. That's a potential delta of 90 million.


You would of course use a delta scheme that can encode any gap in O(log n) bits. At that point a lopsided distribution of numbers actually helps you: cramming the first 90% of the numbers into 1% of the space saves about six bits per number, and six bits saved on each of 900,000 numbers buys a budget of roughly 54 extra bits for each of the remaining 100,000. Even if you spread out the remaining 10%, the price is going to be far less than sixty bits each.
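
A quick back-of-envelope check of that claim, using the same illustrative gamma-on-delta+1 scheme on the distribution described above:

    import random

    def gamma_len(n):
        return 2 * (n.bit_length() - 1) + 1

    # 90% of the numbers below one million, 10% above ninety-one million.
    low  = [random.randrange(1_000_000) for _ in range(900_000)]
    high = [random.randrange(91_000_000, 100_000_000) for _ in range(100_000)]
    nums = sorted(low + high)

    bits, prev = 0, 0
    for x in nums:
        bits += gamma_len(x - prev + 1)  # +1 so zero deltas are encodable
        prev = x

    print(bits / 8 / 2**20)  # roughly 0.4 MB: comfortably under the limit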


Hmm, that doesn't follow: I can have 0, plus the remaining 999,999 numbers filling 99,000,002 through 100,000,000.



