I was thinking of ways to solve the problem by eliminating the overhead of checking whether a random number was already drawn, when Vitter's solution was to make the random number generator produce random numbers in a sorted manner, thus eliminating that overhead entirely.
EDIT: Then again... if the algorithm is to be used to shuffle a list, wouldn't returning the selection in a sorted manner defeat the entire purpose? If k = n, output would be input...
>The main one is that eventually you’re ignoring too many cards and the algorithm doesn’t finish on time (this is the coupon collector’s problem).
This will take longer time at each iteration and when both MAX_TWEET_ID and and TWEET_COUNT is large and close to (or equal to) each other it may take very long to complete.
> (Stated more formally: given non-negative integers k and n with k <= n, generate a list of k distinct random numbers, each less than n. We reasonably ask that this be done in O(k) time and O(1) additional space. Finally, the distribution must be uniform.)
As k approaches n the expected number of draws needed to get k unique values increases faster than O(k).
It will get slow towards the end once each loop iteration has a very low chance of being productive (only happens when you're choosing all or almost all elements). You can fix that, though there's potential tradeoffs.