Higher quality random floats (corsix.org)
116 points by ingve on Oct 18, 2023 | 44 comments


The holy grail of random float generation is a function that takes two bounds and can generate ALL representable floating point numbers within the given bounds with the proper distribution, here uniform (and does so efficiently).

I've written such an implementation with explanations [0], but it doesn't handle subnormals correctly. I've yet to find the time to revisit the problem.

[0] https://github.com/camel-cdr/cauldron/blob/main/cauldron/ran...


I don't think that would be a good idea -- sticking to positive normals and truncating below is enough for float32 and float64. I mean, consider that 1.1754944f-38 is a normal 32 bit float.

The probability that your "perfect" code would ever get triggered is such that humankind has not built enough compute to expect to get that unlucky yet.

(You may ask: why then care about that possibility if it's dead code anyway? Because of security vulns in the context of flawed RNGs. A bad underlying RNG leading to bad distributions is expected; a bad underlying RNG leading to RCE is not OK. So don't ever output zero or subnormals!)

For bfloat16 the same holds; for ieee float16, well, I have no clue what we want.


I don't think such a function would be a good default, but I think there may be applications, probably some complex very large simulations, where this might matter.


You think there might be applications like complex very large simulations where an event with probability 1:10^38 matters?

You are aware that the current age of the universe is < 10^27 nanoseconds, just for comparison?


But what if you want a random number between two very small floats?


Fair enough, if a user asks for a random float between [0, 1e-38f] then subnormals are expected.

I was just thinking about the (0,1) case, under the mistaken assumption that one could map it to (a,b) via multiplication/addition, but you're right -- if you want (a,b) perfectly (*) then it's not obvious to me.

(*) up to inaccuracies of cosmologically negligible scale


> The holy grail of random float generation is a function that takes two bounds and can generate ALL representable floating point numbers within the given bounds with the proper distribution, here uniform (and does so efficiently).

Uniform as in all representable numbers have equal probability? Or uniform as in approximating a uniform distribution over real numbers?


Uniform over the real numbers.


It’s a little unclear what you mean by that without further explanation. Do you mean that conceptually one selects a real number at random and then rounds that real number to the closest representable float? (And if so, which rounding mode?)


Floats aren't uniformly spaced over the reals (e.g. larger floats are sparser), so the above approach accounts for that when generating the number: it selects representable floats non-uniformly, with weights chosen to precisely counteract the uneven spacing, so that the picks are evenly distributed in relation to where they fall in the reals. It does so directly.


How much entropy do you need for that?


Average entropy is low because the events are rare. Worst-case though, you need a number of bits linear in the number of states the exponent can have, plus the number of bits of precision you need. Subnormality doesn't make that much worse because they all have the same exponent (the thing that consumes a lot of worst-case bits), and because their relative frequencies are precisely what you would obtain by filling the significand with uniformly random bits.

It's been a while since I've thought about this problem, so I'll reserve the right to be off by a small constant additive or multiplicative factor. You need around 24+256 bits for 32-bit IEEE-754 floats (roughly half that if you're sticking to a range like 0 to 1), and 53+2048 for 64-bit. The average number of bits required is closer to 25 and 54, respectively.
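In Python, for concreteness (just restating that worst-case accounting; the one-bit-per-exponent-state framing is my assumption):

    # Assumed worst-case budget: one uniform bit per possible exponent
    # decrement, plus the bits of significand precision.
    f32_worst = 24 + 2**8    # 280 bits for binary32
    f64_worst = 53 + 2**11   # 2101 bits for binary64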


what does subnormal mean?


Usually floating point numbers are defined as `1.fraction * 2^(exponent-bias)`, but if the stored exponent is zero, `0.fraction * 2^(1-bias)` is used instead, to allow representing very small numbers.

IIRC I also explained it here: https://m.youtube.com/watch?v=VHJUlRiRDCY&t=2774s
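For illustration, a rough Python sketch (the helper name is made up) that splits a float32 into its fields and shows which of the two formulas applies:

    import struct

    def decode32(x):
        # Split an IEEE-754 binary32 into stored exponent and fraction.
        bits, = struct.unpack('<I', struct.pack('<f', x))
        frac = bits & 0x7FFFFF
        exp = (bits >> 23) & 0xFF
        if exp == 0:
            return f'subnormal: 0.{frac:023b} * 2^(1-127)'
        return f'normal: 1.{frac:023b} * 2^({exp}-127)'

    print(decode32(1.1754944e-38))   # smallest normal float32
    print(decode32(1e-39))           # below that: subnormal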


And the reason subnormals exist at all is the implicit 1-bit. Normally it's there because without it there would be multiple representations of the same value (e.g. all values with mantissa 0 would represent 0 regardless of exponent), but if it were always there you could never represent the value 0 itself. So for the lowest exponent value the implicit bit is set to 0 instead, which allows representing 0, and also those other very tiny values, since there's still an entire mantissa to give meaning to in addition to the value 0.


Subnormals or denormalized floats can lead to some unexpected results. IIRC dividing a subnormal by a subnormal returns infinity.

I experienced this where a colleague had tried to reduce the number of divisions by rewriting (a/x)*(b/y)*(c/z) to (a*b*c)/(x*y*z).

The problem was that in certain circumstances the terms were all small, but of comparable magnitude. Thus the latter led to a subnormal divided by a subnormal. The fix was to undo the optimization and do the divisions first, which led to the multiplication of three numbers of order 1.


> IIRC dividing a subnormal by a subnormal returns infinity.

This should not be true in a conformant implementation of IEEE 754-2008. You can get infinity by dividing by a subnormal (due to overflow), but that should only be possible with sufficiently large numerators.

> The problem was that in certain circumstances the terms were all small, but of comparable magnitude. Thus the latter led to a subnormal divided by a subnormal. The fix was to undo the optimization and do the divisions first, which led to the multiplication of three numbers of order 1.

My guess is that this problem was actually the numerator being subnormal and the denominator underflowing all the way to zero and therefore the result going to infinity.
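A toy Python reconstruction of that guess (magnitudes invented; note Python raises where C would quietly produce inf):

    a = b = c = 1e-105   # all terms small, of comparable magnitude
    x = y = z = 1e-110

    print((a / x) * (b / y) * (c / z))   # ~1e15: each quotient is fine

    num = a * b * c                      # ~1e-315: subnormal, nonzero
    den = x * y * z                      # ~1e-330: underflows to exactly 0.0
    try:
        print(num / den)                 # in C: nonzero / 0.0 == +inf
    except ZeroDivisionError:
        print('denominator underflowed to zero')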


It's been over a decade, so I struggle to recall the details.

> My guess is that this problem was actually the numerator being subnormal and the denominator underflowing all the way to zero and therefore the result going to infinity.

Yes, thinking about it, you're probably correct.


Yeah, I think a good general guideline is that if you see a bunch of divisions like that, the author did it intentionally (to avoid overflow or subnormals).

Anyway, I wonder if the optimization even helps at all. Either way the critical path is a division and two multiplications. That seems like the sort of thing that, between SIMD and OoO execution, a modern CPU ought to be able to work out, haha.

No promises but I’d want to benchmark it before caring.


Maybe it depends on the platform? I tried 5e-323 / 6e-323 in the JavaScript console and it returned 0.8333333333333334 (and e-323 should be subnormal for double precision).


Depends on control registers like e.g. MXCSR. It's an utter mess. Consider e.g. https://news.ycombinator.com/item?id=32738206


thanks!


This problem is much more acute in Float32.

Another sample implementation (maybe easier to read?) is in https://discourse.julialang.org/t/output-distribution-of-ran...

As far as I remember, the main reasons for not rocking the boat on that in julia were: Everybody generates rand floats wrong, so doing it right breaks expectations and compat; there is a perf price of maybe 0.5-1.5 cycles/number to pay; and this is nontrivial to SIMD (basically because we don't have SIMD TZCNT / LZCNT / access to an exponential distribution).


> the main reasons for not rocking the boat on that in julia were: Everybody generates rand floats wrong, so doing it right breaks expectations and compat

Ugh, I’m sorry they had to deal with this, and sorry about the (understandable) choice they made.


What kind of things are relying on random floats being generated wrong?


Comparability between implementations: Say GPU vs CPU, or between languages, or to pseudocode in old papers.

A typical example where the badness of the floats bites you is when you compute something like log(rand()), or 1/x, or more generally map your uniform (0,1] interval through a CDF to generate a different distribution, and the mapping has a singularity at zero. This is super standard -- it's e.g. how you generate exponentially distributed numbers. Or, if you generate random vectors in an N-d ball, you use a singular CDF to compute the magnitude of your vector, multiplied with a normalized normal-distributed N-d variable for the angular component.
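For concreteness, the textbook inverse-CDF exponential sampler that this bites (a generic sketch, not any particular library's code):

    import math, random

    def exponential(rate=1.0):
        # Inverse-CDF transform x = -log(u)/rate, singular at u == 0.
        # With the usual generator u is a multiple of 2^-53, so the grid
        # is coarsest exactly where the transform is most sensitive.
        u = 1.0 - random.random()   # shift [0, 1) to (0, 1]
        return -math.log(u) / rate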

Then the bad random floats (i.e. the non-smooth distribution close to zero) introduce real and visible bias after transformation. Afaiu the problem is well-known and serious people correct for it somewhere. If you fix the underlying problem without revisiting the now-obsolete fixes, then your results are biased again.

I'm not arguing for keeping the bad random float generation everywhere. I think it should be fixed, not just in julia but everywhere. I'm just saying that it's not a no-brainer and there is discussion and compat and communication and code audit involved in doing such a momentous change.

Also, I'm not really up to putting in the work to champion that at the moment (arguing on the internet doesn't count ;)).


Nice. I am not surprised that the approach presented here was discussed in Julia circles already some 5 years ago. So it was about half as fast, if I interpret the thread correctly.


That thread was in 0.6 days on my long-dead broadwell using DSFMT as a C library with afaiu hand-written intrinsic code for bulk generation of floats. We switched RNG to xoshiro in the meantime which is faster and generates 64 bit numbers natively (as opposed to dsfmt which generated 53 bit natively...). So don't trust that these old timings represent current julia; I updated the thread with current timings.

I'd be very happy if this can of worms could be reopened, but am currently not active enough in julia dev to champion it.

Also somebody would need to figure out something very clever for AVX2 / NEON. (AVX512 has the required instructions.)

Also I can't imagine the mess with GPU -- if rand statistics differ widely between CPU and GPU that's a no-go, and I don't know what works well on which GPUs.


Very, very nice.

Julia currently uses the same trick as Lua (generate uniform Float64 bit pattern between 1 and 2 by masking a random UInt64 and reinterpreting, then subtract 1).
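In Python terms, a sketch of that trick (not the actual Julia or Lua source):

    import random, struct

    def unit_float():
        # 0x3FF << 52 is the bit pattern of 1.0; OR in 52 random mantissa
        # bits for a uniform float in [1, 2), then shift down to [0, 1).
        bits = 0x3FF0000000000000 | random.getrandbits(52)
        x, = struct.unpack('<d', struct.pack('<Q', bits))
        return x - 1.0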

I wonder what the speed is.

Having the “other” half-open unit interval in addition to the traditional one might come in handy for several applications.


For rngs that natively generate 64 bits (some only give you 52 bits at a time), the speed is really good.


Readers may also be interested in this old comment of mine: https://news.ycombinator.com/item?id=30895669. It uses the method by Downey briefly mentioned at the end of the article.


Hm, this method seems odd to me. Unbiased, yes, but then rounded to the nearest float. Consider each floating point value a bin. Then this will fill bins where the coverage is sparse faster. E.g. say the possible floats are 1, 10, 100 and so on, and you generate from 0 to 10000. There will be far more in bin 1000 than in bin 100.


This is exactly how random floats are computed in the Zig standard library:

https://github.com/ziglang/zig/pull/10428/


This is awesome and the first precedent I've ever seen for a standard library doing the right thing on rand floats. Big kudos to the zig people and thanks for brightening my day!


Oh, cute. It looks like they ever-so-slightly overweight the probability of values whose mantissa is entirely zeroes, though. For example, the probability of hitting exactly 0.5 should be 2^-54 + 2^-55, whereas Zig looks to give it 2^-54 + 2^-54.


After thinking about the problem a bit, I've come up with a completely different approach. I'm sure this is known, but I can't find anything so I'd appreciate if someone can point me to a citation.

Generate two independent doubles a and b the "usual way", meaning from [0, 1) with equal spacings of 2^-53. Then calculate 2^-53*b + a. Assuming for simplicity that the rounding mode is towards zero, this gives a new random double in [0, 1). I've only done the math in my head so I'm not 100% sure on all the numbers, but I believe it scores pretty well on the criteria in the blog post; tentative results are

Probability of zero: 2^-106

Smallest nonzero: 2^-106

Largest non-seeable: approx. 2^-54

Distinct values: 2^58.7

The usual rounding mode (round to nearest, ties to even) will generate numbers in [0, 1] with very slight biases that need closer analysis. One could also go further and do e.g. 2^-106*c + 2^-53*b + a, but something like that should only be needed for single-precision floats.

The obvious downside is that you always need two random integers and not just in the slow path. However, SIMD becomes easier because you don't need AVX512 for lzcnt.
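A quick sketch of the proposal (with the default round-to-nearest, so subject to the slight biases mentioned):

    import math, random

    def two_word_rand():
        a = random.getrandbits(53) * 2.0**-53   # the "usual way", in [0, 1)
        b = random.getrandbits(53) * 2.0**-53
        # b refines the 2^-53-wide gap above a; the final add rounds
        # to nearest, rather than the truncation assumed above.
        return math.ldexp(b, -53) + a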


FWIW, there is a recipe for this in the Python docs[0]. It was developed in collaboration with Allen Downey and Tim Peters:

    from random import Random
    from math import ldexp

    class FullRandom(Random):

        def random(self):
            # 53-bit significand: implicit leading 1 plus 52 random bits
            mantissa = 0x10_0000_0000_0000 | self.getrandbits(52)
            exponent = -53
            x = 0
            while not x:
                # each all-zero 32-bit draw pushes the exponent down by
                # another 32; bit_length() locates the 1 bit that stops it
                x = self.getrandbits(32)
                exponent += x.bit_length() - 32
            return ldexp(mantissa, exponent)
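
Usage is the same as any other `Random` subclass:

    fr = FullRandom()
    fr.seed(42)
    print(fr.random())   # can now return values far below 2^-53
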
[0] https://docs.python.org/3/library/random.html#recipes


Often the naive method is all right, as you aren't really concerned by the fact that numbers close to zero don't have the maximum precision possible – particularly when your use case has a certain "native scale" and what amounts to a fixed-point approach works fine. And especially if the results are going to be discretized at some point anyway (e.g. computer graphics).


One could argue that the most important criterion to optimize for is ensuring that the expected value is exactly 0.5. That is a more practical criterion than trying to allow super low values like 2^-70, which are indeed expressible as floating point but just less important than maintaining the simple behavior of a proper average value.


One could argue that the most important criterion is that if you take any interval between 0 and 1 you get the same density... at least that would be a uniform distribution, since even a normal, exponential, or whatever else could have mean 0.5, and you could be simulating that instead.


Optimizing for that criterion, you obtain a "random number generator" which deterministically returns 0.5 every time.


The proposed generator has, assuming a truly uniform integer generator, a mean expected value of 0.5 + 2^-152 [1]. If you could ever distinguish this difference, you would be far more likely to have seen bias from the underlying RNG much earlier.

[1] Start with the first `rand_between_zero_and_one` snippet. `x = ((x + 1) >> 1) + (e << 52)` can be rewritten as `d = (1.0 + ((x + 1) >> 1) * 2^-52) * 2^(e - 1023)` (since it always generates a normal number). `E[(x + 1) >> 1]` exactly equals 2^51, and `E[2^e] = 2^1022 * 2^-1 + 2^(1022-1) * 2^-2 + ... + 2^(1022-74) * 2^-75 + 2^(1022-75) * 2^-75 = 2^1023 (1/4^1 + 1/4^2 + ... + 1/4^75 + 0.5/4^75) = 2^1023 (1/3 + 1/(6*4^75))`. So `E[d] = (1 + 2^51 * 2^-52) * (1/3 + 1/(6*4^75)) = 1/2 + 1/4^76`.
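The arithmetic checks out with exact rationals (a quick mechanical check of the terms exactly as enumerated in [1]):

    from fractions import Fraction as F

    # E[2^e]: e = 1022 - k with probability 2^-(k+1) for k = 0..74,
    # plus the final 2^-75 tail at e = 947.
    E2e = sum(F(2)**(1022 - k) / F(2)**(k + 1) for k in range(75))
    E2e += F(2)**947 / F(2)**75

    Em = F(2)**51                        # E[(x + 1) >> 1]
    Ed = (1 + Em / F(2)**52) * E2e / F(2)**1023
    assert Ed == F(1, 2) + F(1, 4)**76   # mean is 0.5 + 2^-152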


The final methods seem equivalent to reading a non-float binary fractional number .xyz... from a random bitstream and stopping at the point where further digits will be truncated in the float representation.

E.g. if x = 1, read the 52 remaining mantissa bits and stop; if x = 0, decrement the exponent and repeat. There's no bias in such a process.
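A literal Python rendering of that process (a sketch; it ignores the astronomically unlikely subnormal/zero tail):

    import math, random

    def from_bitstream(bit=lambda: random.getrandbits(1)):
        exponent = -1
        while bit() == 0:           # each leading 0 halves the scale
            exponent -= 1
        mantissa = 1 << 52          # the 1 just read is the leading bit
        for i in range(52):         # remaining fraction bits; anything
            mantissa |= bit() << (51 - i)   # further would be truncated
        return math.ldexp(mantissa, exponent - 52)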


Lovely article and a very nice C implementation.



