The last time I wanted an unpredictable, non-repeating sampling of 64-bit numbers, I used a symmetric cipher in CTR mode. Essentially, start with a random key and a random value, return the encrypted version of it, then the encrypted version of its increment, and so on. Returns every number in the range, with no predictability.
The thing the Vitter algorithm does which this doesn't is that it produces the samples in order. This is why it's useful with tape drives, or sampling market data through the day: you can start at the beginning, and take the samples without ever having to seek backwards.
It is also the only reason to use this algorithm :) It's a pretty important requirement that I don't think the author emphasized enough.
If you want them shuffled (or don't care if they're shuffled), better use a permutation function.
BTW there are much faster permutation functions than cryptographic ones (you don't need the security aspect for this application). Look to modern PRNGs.
This is a good approach. Thanks to bijective mapping between original and encrypted spaces the result is guaranteed to be unique. And since the mapping function is a secure encryption algorithm, the output has uniform distribution. Well done.
This works, but only if you're sampling the exact same space as the block size. For example, if your cipher outputs 64-bit blocks, but you want to sample in 2^32, you're back to the drawing board, because there's no way to truncate the cipher output without risking duplicates.
You can use the "hasty pudding trick" to truncate the cipher output without risking duplicates. It is inefficient if the block space of the cipher is much larger than the sampling space, though.
However, you can also use a Feistel network to create a block cipher for an arbitrary bit length. So, you can always bound the block space to be no more than twice the sampling space, which is okay.
A variable-size block construction would help here. It doesn't even have to be cryptographically secure, only bijective and statistically good enough similar to PRNGs sans CS. It doesn't even have to be keyed.
Another good tool for this is the linear feedback shift register, and most programmers outside of crypto seem to be unaware of it too. A "complete" N-bit LFSR gives you 2^N non-repeating pseudorandom values. You start with some value (the seed), and feed it into your LFSR operation along with your chosen bitmask (the "taps"), and you get a difficult-to-predict value back out.
"Complete" LFSRs are created by carefully selecting your taps, and there's plenty of literature available for a broad range of taps for many different numbers of output bits.
Your cipher may have an LFSR buried somewhere in it, I guess they're a common component in crypto. (I know dick-all about crypto, mostly.)
Usual caveats for PRNGs apply: LFSRs are cool and magical and super fast, but the next number in the sequence can be computed by anyone else that can figure out your LFSR and seed, and since a complete LFSR will hit each value in 2^N exactly once per cycle, it doesn't even count as pseudorandom.
But, if you need a random-looking non-sequential non-repeating selection widget, an LFSR just might do the trick.
I’ve used this trick with a 64-bit block cipher (base-32 encoding the result) to generate short, unique public identifiers (to put in URLs) for entries in a database based on an integer primary key.
This is a good way to pick a few 64 bit numbers without duplicates, if 64 bit numbers is what you need.
I didn't see any replies to you mention this part, but Vitter's algorithm works on an unknown number of samples. You can pick a 64-bit number uniformly when you know in advance you have exactly 2^64 choices. But how would you pick samples from a stream with equal probability, without knowing how many items are in the stream? Maybe it's 3, maybe it's 2.493e85, you won't know until you run out, and when you do, you want to have picked some items from the stream with uniform probability.
Incorrect, you do know this number beforehand. The parameter `N` in the code is the upper bound on the number of values, and the parameter `n` is the number of samples.
Wait, are symmetric ciphers guaranteed to cycle through the full space of block-sized numbers when applied to their output like that? I thought that was just "close to" true?
This is my homespun solution whenever I need a deterministic, non-repaeting mapping of a sequence. Except I rely - shame on me - on a variable blocksize encryption scheme I've hacked together for just this purpose.
Output - of the mapping as well as of anything I run through the encryption algo - passes all pseudo-random validity testing I throw at it, but obviously I would never, ever consider using the thing where real security was a concern.
Choosing an initial random number, then keep modulo-incrementing it by a generator (a number larger than, and relatively prime to, the stack size) wouldn't work as well?