
With truly random data, patterns should randomly appear.


There's a pretty easy proof of why compressing truly random data is impossible. Let C be a compression algorithm. Assume that for all bit strings S, C(S) is no longer than S; that there exists at least one string S' such that C(S') is shorter than S'; and that for any two distinct bit strings S1 and S2, C(S1) != C(S2).

Let S'' be the shortest string that C shortens, let M be the length of S'', and let N < M be the length of C(S''). Because S'' is the shortest string C shortens, and because C never lengthens anything, C leaves the length of every string of length N unchanged. There are 2^N strings of length N, and since their images under C are pairwise distinct strings of length N, every string of N bits is the image of some N-bit string. But S'' also has an image of length N, which must collide with the image of one of those strings, a contradiction.

Thus, any compression algorithm which is injective (in effect, reversible) and successfully shortens some strings must lengthen some others.
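A minimal sketch of the pigeonhole fact the proof rests on (not from the thread, just an illustration): for any length m there are 2^m bit strings of length m but only 2^m - 1 strings of strictly smaller length, so no injective map can send every length-m string to a shorter one.

```python
# Pigeonhole count behind the proof: 2**m strings of length m, but only
# 2**m - 1 strings of strictly smaller length (counting the empty string),
# so an injective compressor cannot shorten every length-m string.

def strings_of_length(m: int) -> int:
    return 2 ** m

def strings_strictly_shorter(m: int) -> int:
    # sum of 2**k for k = 0 .. m-1, which equals 2**m - 1
    return sum(2 ** k for k in range(m))

for m in range(1, 20):
    # there is always exactly one more string of length m than there are
    # candidate shorter outputs, so at least one string cannot shrink
    assert strings_strictly_shorter(m) == strings_of_length(m) - 1
```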

What this short proof shows is that even though random data may have patterns, no compression algorithm can successfully leverage this for every random string.


Encrypted data is "indistinguishable from random data" but it is not actually random data. No patterns will appear.


If patterns appear in random data, and no patterns appear in encrypted data, then it's trivially distinguishable from random data: look for patterns.


To be clear, I'm not claiming that encrypted data is compressible. I'm just pointing out that any detectable "patterns" that appear in random data but not encrypted data are inconsistent with the claim that random and encrypted data are indistinguishable. Also if you find such a "pattern" that's probably a good starting point for breaking whatever cipher you're using.



I'm sorry, that article is far too long for me to tell what you are trying to say. All I can gather is that I was wrong.


tl;dr Random data may appear to have patterns, but those patterns do not repeat, so they cannot be compressed.


Ah, but they can repeat! There is a nonzero chance that any sequence of any length will occur in a random sequence.

So while a compression algorithm applied to a very large amount of random data is unlikely to reduce its size, it is quite possible to achieve some compression on smaller blocks.
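A quick sanity check of the "patterns can repeat" claim (a sketch; the pattern, length, and seed are arbitrary choices): a fixed 8-bit pattern occurs in an n-bit random string about (n - 7) / 256 times on average.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

n = 100_000
bits = ''.join(random.choice('01') for _ in range(n))

pattern = '11111111'  # arbitrary 8-bit pattern to search for
# each of the n - 7 windows matches with probability 1 / 2**8
expected = (n - len(pattern) + 1) / 2 ** len(pattern)  # roughly 390

# count overlapping occurrences of the pattern
count = sum(bits[i:i + 8] == pattern for i in range(n - 7))
```

So repeats do show up constantly; the catch (as the proof above shows) is that no reversible algorithm can exploit them for every input.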


Any given _example_ of a random sequence may have less than maximum entropy, and so may be compressible, but a truly random sequence _in general_ will not be compressible. So you can build a compression algorithm that compresses random data in some cases, but no algorithm can compress every arbitrary random input. Does this help?
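An empirical sketch of that distinction using zlib (the block size and count here are arbitrary): individual random blocks essentially never shrink in practice, because any accidental redundancy has to outweigh the format's framing overhead.

```python
import os
import zlib

# compress a batch of random 1 KiB blocks at maximum effort
blocks = [os.urandom(1024) for _ in range(200)]
sizes = [(len(b), len(zlib.compress(b, 9))) for b in blocks]

# how many blocks actually got smaller, and the average size change
shrunk = sum(c < o for o, c in sizes)
avg_growth = sum(c - o for o, c in sizes) / len(sizes)
```

On random input, deflate falls back to stored blocks plus a few bytes of header and checksum, so `avg_growth` comes out positive: on average the "compressed" data is larger.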


You should take a look at the almost-15-year-old (and unsolved) "Random Compression Challenge":

http://marknelson.us/2012/10/09/the-random-compression-chall...


That is still your problem, and you need to take responsibility for it. It is wiser to learn and know what you're talking about; don't just add ambient, useless, incorrect noise.


Saying something true but burying it is not wise either.



