Also, if there are issues with using the 'wrong' encryption, I feel that's kind of a straw man argument. Please feel free to upload a file over 5MB which can't be reduced in size by any of the various compression tools.
Also, keep in mind that I never said it would be a huge benefit at all; I only said that SOME compression was possible SOME of the time.
That's not encrypted with AES256; that's encrypted with AES256 and then encoded with base64.
Do you understand the difference between base64 and binary data? Base64 carries only six bits of information per byte. It's designed to let data transit cleanly through channels which only allow text, such as JSON. Binary data is eight bits per byte, and binary data is what encryption and compression algorithms output. Naturally, if you take data which uses only six bits per byte and run it through a compression algorithm that can use all eight, you can achieve what looks like good compression. But the gain is illusory, because you expanded the original by 33% when you applied base64 in the first place! All your attempts at compression can be beaten by simply decoding the base64 back into the original binary data.
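Here's a minimal sketch of that effect in Python (standard library only; os.urandom stands in for AES256 output, since good ciphertext is statistically indistinguishable from random bytes, and the exact sizes shown are approximate):

```python
import base64
import os
import zlib

ciphertext = os.urandom(100_000)        # stand-in for raw AES256 output
encoded = base64.b64encode(ciphertext)  # six bits of data per output byte

print(len(ciphertext))                  # 100000
print(len(encoded))                     # 133336: the +33% base64 overhead
print(len(zlib.compress(encoded)))      # a bit over 100000: "compression"
                                        # merely claws back that overhead
print(len(zlib.compress(ciphertext)))   # slightly MORE than 100000: the raw
                                        # binary does not compress at all
```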
It doesn't matter what magnitude of compression you specified. Random data cannot be compressed at all on average; this is a simple mathematical certainty with a straightforward proof. Encrypted data looks and acts like random data. If you find a way to reliably compress the output of a modern, secure encryption algorithm (not the base64 encoding, but the original binary), then you'll have found a massive security hole in it, which will make you famous.
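For what it's worth, the 5MB challenge upthread is easy to simulate (a sketch, again using os.urandom as a stand-in for well-encrypted data; exact byte counts vary from run to run):

```python
import bz2
import lzma
import os
import zlib

data = os.urandom(5 * 1024 * 1024 + 1)  # just over 5MB of random bytes

for name, compress in (("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)):
    out = compress(data)
    print(f"{name}: {len(data)} -> {len(out)} ({len(out) - len(data):+d} bytes)")
# All three come back slightly LARGER than the input.
```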
And as a note: It's fine to use base64 if you want. But then you also need to output the compressed file as base64. And you'll see that 3753 octets equate to 5004 base64 characters, and the file is actually significantly larger than it was before compression.
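The arithmetic behind those numbers, as a quick sanity check (base64 emits four characters per three input bytes, rounding the last group up):

```python
import base64
import math

n = 3753                                # compressed size in octets
print(4 * math.ceil(n / 3))             # 5004
print(len(base64.b64encode(bytes(n))))  # 5004, regardless of content
```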
> Also, keep in mind that I never said it would be a huge benefit at all; I only said that SOME compression was possible SOME of the time.
This is a common misconception. The pigeonhole principle tells us that any scheme that ever achieves some compression must also cause some expansion. The only reason compression algorithms work at all is that they are more likely to compress than to expand on real inputs, because we know something about the probability distribution of the plaintexts.
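To make the counting explicit (a back-of-the-envelope sketch, independent of how any particular compressor works):

```latex
% There are 2^n distinct inputs of exactly n bits, but strictly fewer
% bit strings shorter than n bits:
\[
  \sum_{k=0}^{n-1} 2^k = 2^n - 1 < 2^n .
\]
% So a lossless (injective) scheme that shrinks even one n-bit input
% must expand some other input to n bits or more.
```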
The pigeonhole argument treats compression as a black box with input and output, and makes no assumption about how the compression works.
Your idea, compressing only some inputs and not others, doesn't change the fact that your algorithm can still be treated as a black box with inputs and outputs, so the pigeonhole principle still applies. Many people have made this argument before, so we are very familiar with it, and very familiar with why it is wrong.
Furthermore, compression by its very nature works very poorly on data that appears uniformly distributed, as encrypted data does.
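One last sketch to illustrate that distribution point (exact ratios depend on the input and on zlib's settings):

```python
import os
import zlib

skewed = b"the quick brown fox jumps over the lazy dog " * 2048
uniform = os.urandom(len(skewed))       # stand-in for ciphertext

print(len(zlib.compress(skewed)) / len(skewed))    # tiny: repetitive text
                                                   # compresses enormously
print(len(zlib.compress(uniform)) / len(uniform))  # just over 1.0: no gain
```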