As eru said elsewhere, you've created a stream cipher. To be secure, the
generated keystream (the thing you xor against the plaintext) must not have any
biases: the attacker shouldn't be able to guess the contents of any keystream
byte (with P > 1/2^8). Otherwise, an attacker can probabilistically recover
parts of the plaintext.
I played around with your C implementation a little[1]. I put it into 8-bit
mode, and generated 16 random keys and output their keystreams[2]. Here's the
first 32 bytes of those keystreams, in hex:
As you can see, there are some significant biases here. For example, as an
attacker, once I have a ciphertext, I can guess that bytes 18 and 19 of the
keystream were (hex) "1112", and have a very good chance of being right.
While the 32-bit version isn't as bad, I think there's still significant biases.
I generated 24495 keys. Here's the distribution for keystream byte 1 (script at
[3]):
I should really get to sleep so I didn't do the statistics on this, but I'm
fairly sure that's not a uniform distrubtion (i.e., bytes aren't being drawn
uniformly from [0:255]). Also, keystream byte 0 only ever has 128 values,
instead of 255.
I did not look at 64-bit.
All in all, I wouldn't use your cryptosystem :) but there's no shame in that!
Crypto is hard! and youre only going to get better.
I'm not really sure how you can get similar results with completely different keys. I'm looking in to this. I tried generating a meg encrypted zeroes in 8bit mode and the most used 8 bit value was only 1% more common then the least used value.
This looks fishy to me. Follow my thought: If you are looking at the very first value coming out of the stream, that value should have zero bias from the algorithm. Why? because it is an xor of two different values in the key. If both values are properly random then so should their XOR.
Note that at this point the algorithm has not yet started modifying itself, so the original key is still intact. The self modifying code may still be broken, but i think we can be fairly sure the first value has the exact same bias as the random number generator.
it's not standard with symmetric encryption in general; it's a "known plaintext attack". http://en.wikipedia.org/wiki/Known-plaintext_attack - with aes, for example, even if you know the plaintext there is no known way to obtain the key (faster than brute force search).
I played around with your C implementation a little[1]. I put it into 8-bit mode, and generated 16 random keys and output their keystreams[2]. Here's the first 32 bytes of those keystreams, in hex:
0002010203fd050607b0090a0b250d0e0f8b111213a4151617f5191a1be6003e 3b4a00790369050607004b170b820d0e0fb7111213d0005f1796191a1b321d1e 00130102036b05060734090a0b260d0e0fca111264141516178700671b534ae8 df7c806903d10506077fcc670b583600dec617580a2b8b161712191a1b1c00a7 00260102032f1d0082a0090a0bd30d0e0f4ca8c616aa151617a400a34f1c9e1e af7cb175036705060786090a0bc90d0e0f2a111213831516173700501bfa1d1e 000236ca58110506079c00090b860d0e0f8f111251009e1617b4191a1b311d1e 00250102039700ec0729090a0b2a0d0e0f961112133a151617e3a200956764b8 002a0102031e050607ed003c0b260d0e0fcf111213c000151a18191a1bb21d1e 413c0102031505060745090a0b060d0e0fc6111213b815161784191a1b341d1e 00890102039500b807a200570ba70d0e0fa31112134c1516172d191a1ba03900 00cd01020332050607e100b10bdd0d0e0f0d111213261516170e191a1b041d1e 0076010203c30506074e03c7880069000eb8078d54134604a61a142d1a0600c3 00f101020344050607a8090a0b0e0d0e0f11111213e6151617f3191a1b291d1e 00f1007a0369aab10068090a0b53b1ac0fd1111213531516171800ee1bde1d1e 0036010203570506076c00d60b050d0e0f2f27583f03591617bc09471b141d1e
As you can see, there are some significant biases here. For example, as an attacker, once I have a ciphertext, I can guess that bytes 18 and 19 of the keystream were (hex) "1112", and have a very good chance of being right.
While the 32-bit version isn't as bad, I think there's still significant biases. I generated 24495 keys. Here's the distribution for keystream byte 1 (script at [3]):
I should really get to sleep so I didn't do the statistics on this, but I'm fairly sure that's not a uniform distrubtion (i.e., bytes aren't being drawn uniformly from [0:255]). Also, keystream byte 0 only ever has 128 values, instead of 255.I did not look at 64-bit.
All in all, I wouldn't use your cryptosystem :) but there's no shame in that! Crypto is hard! and youre only going to get better.