The last time I wanted an unpredictable, non-repeating sampling of 64-bit number...

twic · on Sept 13, 2019

The thing the Vitter algorithm does which this doesn't is that it produces the samples in order. This is why it's useful with tape drives, or sampling market data through the day: you can start at the beginning, and take the samples without ever having to seek backwards.

tripzilch · on Sept 16, 2019

It is also the only reason to use this algorithm :) It's a pretty important requirement that I don't think the author emphasized enough.

If you want them shuffled (or don't care if they're shuffled), better use a permutation function.

BTW there are much faster permutation functions than cryptographic ones (you don't need the security aspect for this application). Look to modern PRNGs.

garganzol · on Sept 13, 2019

This is a good approach. Thanks to bijective mapping between original and encrypted spaces the result is guaranteed to be unique. And since the mapping function is a secure encryption algorithm, the output has uniform distribution. Well done.

LolWolf · on Sept 13, 2019

Indeed, but this assumes that modern crypto is secure! :)

whymauri · on Sept 13, 2019

When our crypto prof told us we're not sure if one way functions exist, everyone's jaw dropped.

convivialdingo · on Sept 13, 2019

I know this, as I work in crypto.

But I’d have never considered using it for this - it doesn’t register in the brain as anything other then crypto.

Awesome solution, thanks for hacking my brain.

nemo1618 · on Sept 13, 2019

This works, but only if you're sampling the exact same space as the block size. For example, if your cipher outputs 64-bit blocks, but you want to sample in 2^32, you're back to the drawing board, because there's no way to truncate the cipher output without risking duplicates.

twic · on Sept 13, 2019

You can use the "hasty pudding trick" to truncate the cipher output without risking duplicates. It is inefficient if the block space of the cipher is much larger than the sampling space, though.

However, you can also use a Feistel network to create a block cipher for an arbitrary bit length. So, you can always bound the block space to be no more than twice the sampling space, which is okay.

CobrastanJorji · on Sept 13, 2019

What's the hasty pudding trick? Is it related to the Hasty Pudding cipher?

twic · on Sept 13, 2019

It is. Perhaps it's not officially called that, but that's how it was described to me:

https://groups.google.com/d/msg/sci.crypt/gjYclOVJDrc/tiv4ic...

the8472 · on Sept 13, 2019

A variable-size block construction would help here. It doesn't even have to be cryptographically secure, only bijective and statistically good enough similar to PRNGs sans CS. It doesn't even have to be keyed.

thaumaturgy · on Sept 13, 2019

Another good tool for this is the linear feedback shift register, and most programmers outside of crypto seem to be unaware of it too. A "complete" N-bit LFSR gives you 2^N non-repeating pseudorandom values. You start with some value (the seed), and feed it into your LFSR operation along with your chosen bitmask (the "taps"), and you get a difficult-to-predict value back out.

"Complete" LFSRs are created by carefully selecting your taps, and there's plenty of literature available for a broad range of taps for many different numbers of output bits.

Your cipher may have an LFSR buried somewhere in it, I guess they're a common component in crypto. (I know dick-all about crypto, mostly.)

Usual caveats for PRNGs apply: LFSRs are cool and magical and super fast, but the next number in the sequence can be computed by anyone else that can figure out your LFSR and seed, and since a complete LFSR will hit each value in 2^N exactly once per cycle, it doesn't even count as pseudorandom.

But, if you need a random-looking non-sequential non-repeating selection widget, an LFSR just might do the trick.

justicz · on Sept 13, 2019

I’ve used this trick with a 64-bit block cipher (base-32 encoding the result) to generate short, unique public identifiers (to put in URLs) for entries in a database based on an integer primary key.

garganzol · on Sept 13, 2019

There is also Skip32 cipher for 32-bit values. Interestingly enough, it was initially designed to cover the scenarios like yours.

dahart · on Sept 15, 2019

This is a good way to pick a few 64 bit numbers without duplicates, if 64 bit numbers is what you need.

I didn't see any replies to you mention this part, but Vitter's algorithm works on an unknown number of samples. You can pick a 64-bit number uniformly when you know in advance you have exactly 2^64 choices. But how would you pick samples from a stream with equal probability, without knowing how many items are in the stream? Maybe it's 3, maybe it's 2.493e85, you won't know until you run out, and when you do, you want to have picked some items from the stream with uniform probability.

tripzilch · on Sept 16, 2019

Incorrect, you do know this number beforehand. The parameter `N` in the code is the upper bound on the number of values, and the parameter `n` is the number of samples.

dahart · on Sept 17, 2019

I don't know which code you mean, but I was referring to Vitter's basic "Algorithm R", which does not depend on knowing the number of samples or an upper bound in advance. https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_R

SilasX · on Sept 13, 2019

Wait, are symmetric ciphers guaranteed to cycle through the full space of block-sized numbers when applied to their output like that? I thought that was just "close to" true?

ghusbands · on Sept 13, 2019

Yes, it's bijective. For every input value, it provides a unique output that will decrypt back to that input.

You might be misunderstanding the algorithm. We're only encrypting an incrementing number, not ever encrypting something that is already encrypted.

SilasX · on Sept 13, 2019

Ah, okay. Thanks! Maybe I was thinking of hash functions, for which it's not guaranteed that every output has an (n-sized) pre-image?

Edit: To clarify, I do now understand that you were proposing:

random number X -> output encrypt(X), encrypt(X+1), encrypt(X+2), etc

Which I had misinterpreted as:

random number X -> output encrypt(X), encrypt(encrypt(X)), encrypt(encrypt(encrypt(X))) etc.

HelloNurse · on Sept 13, 2019

Counter mode means encrypting successive (all different) numbers, not the previous output. There is no issue with cycles.

SilasX · on Sept 13, 2019

Yes, the original commenter replied to say that and I agreed to it.

https://news.ycombinator.com/item?id=20962758

interfixus · on Sept 14, 2019

This is my homespun solution whenever I need a deterministic, non-repaeting mapping of a sequence. Except I rely - shame on me - on a variable blocksize encryption scheme I've hacked together for just this purpose.

Output - of the mapping as well as of anything I run through the encryption algo - passes all pseudo-random validity testing I throw at it, but obviously I would never, ever consider using the thing where real security was a concern.

epx · on Sept 13, 2019

Choosing an initial random number, then keep modulo-incrementing it by a generator (a number larger than, and relatively prime to, the stack size) wouldn't work as well?

eternalban · on Sept 13, 2019

In a similar vein, I use a PRNG (based on Blake2b cryptographic hash) and mask off bits for the required domain.

someguydave · on Sept 13, 2019

Why not simply increment the CTR IV with a monotonically increasing counter?

toolslive · on Sept 13, 2019

you can also use a Lehmer random number generator of full period.

ghusbands · on Sept 13, 2019

Like all LCGs, the output of the Lehmer RNG is very predictable, but it would indeed suffice for the purpose given in the article.