>It is left as an exercise to the reader to think about why it’s impossible to have cycles in this graph.
This was funny. Suppose you wanted to build a node that linked to itself. You'd have to find a fixed point in the combination of functions that adds other data to the link and hashes it. Finding a fixed point of a hashing function is hard.
In fact, for an ideal 256-bit cryptographic hash function, modeled as a random oracle, it takes on the order of 2^128 iterations before reaching a periodic point. The expected cycle size of the reached periodic point is also on the order of 2^128. There exists a fixed point for your set of files only if the cycle size of the reached periodic point is 1.
Using the big-step, little-step (Floyd's tortoise-and-hare) cycle detection algorithm to avoid using gigantic amounts of memory, you're then looking at an average of 1.5 * 2^129 iterations of updating your graph of 256-bit cryptographic hashes in order to discover you've hit a periodic point.
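For concreteness, here's a minimal sketch of that tortoise-and-hare idea in Python. It truncates SHA-256 to 24 bits purely so the walk terminates in a demo-friendly time; with the full 256-bit digest the same loop would run for roughly 2^128 steps. The `b"node:"` framing is a stand-in for "combine the link with other node data before hashing".

```python
import hashlib

def step(link: bytes) -> bytes:
    # One "update the graph" iteration: embed the current link in a node
    # and hash it. Truncated to 3 bytes (24 bits) only so this demo
    # terminates; a real content-addressed store keeps all 32 bytes.
    return hashlib.sha256(b"node:" + link).digest()[:3]

def reached_cycle_length(start: bytes) -> int:
    # Floyd's "big-step, little-step": O(1) memory cycle detection.
    slow = step(start)
    fast = step(step(start))
    while slow != fast:
        slow = step(slow)        # little step: one hash per loop
        fast = step(step(fast))  # big step: two hashes per loop
    # slow now sits somewhere on the cycle; walk once around to measure it.
    length = 1
    probe = step(slow)
    while probe != slow:
        probe = step(probe)
        length += 1
    return length
```

A fixed point of `step` exists only if some cycle has length 1; `reached_cycle_length(b"\x00")` reports the size of whichever cycle that particular starting link happens to fall into.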
Offhand, I don't know the probability that iterating from a given starting point reaches a fixed point, for a random mapping of 256-bit values to 256-bit values, but my intuition is that it's vanishingly small. If anyone has an elegant derivation of the probability, I'd love to see it.
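Not a derivation, but a quick Monte Carlo sketch of the same question on a toy domain: for uniform random mappings on n points, how often does iteration from a random start end in a cycle of length 1? The n = 256 domain size and trial count are arbitrary demo choices.

```python
import random

def fraction_ending_in_fixed_point(n: int, trials: int, seed: int = 1) -> float:
    # Estimate P(iteration from a random start reaches a fixed point)
    # for uniform random mappings f : {0..n-1} -> {0..n-1}.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        f = [rng.randrange(n) for _ in range(n)]  # a fresh random mapping
        x = rng.randrange(n)
        seen = {}                                 # value -> step index
        steps = 0
        while x not in seen:
            seen[x] = steps
            x = f[x]
            steps += 1
        cycle_len = steps - seen[x]               # rho's cycle length
        hits += (cycle_len == 1)
    return hits / trials
```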
The expected number of fixed points in a random permutation is 1. This is an application of linearity of expectation: for a random permutation f on N elements and a given input X, the probability that f(X) = X is 1/N, i.e. the probability that X is a fixed point is 1/N. There are N possible choices of X, and so by linearity of expectation the expected number of fixed points is N * 1/N = 1.
This doesn't tell you anything about concentration bounds or whatever, but it's a neat fact nonetheless.
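A quick empirical check of that linearity-of-expectation argument (the sizes here are arbitrary demo values):

```python
import random

def mean_fixed_points(n: int, trials: int, seed: int = 0) -> float:
    # Average number of fixed points over `trials` uniform random
    # permutations of {0, ..., n-1}.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += sum(1 for i, v in enumerate(perm) if i == v)
    return total / trials
```

The result should hover around 1 regardless of n.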
Unfortunately, it's a random mapping, not necessarily a random permutation. An ideal block cipher would be modeled as a random permutation. Though, in this particular case the domain and range are the same, so unless I'm missing something, the expected number of fixed points comes out to 1 by the same reasoning.
Indeed, the expected number of fixed points is the same for both permutations and mappings. Quite curious when you think about it, because the distribution of fixed points is obviously different! (Consider the case n = 2 for the simplest example: there are mappings with exactly 1 fixed point, but every permutation of 2 elements has either 0 or 2 fixed points.)
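The n = 2 case is small enough to enumerate exhaustively:

```python
from itertools import permutations, product

def fixed_points(f) -> int:
    # Treat a tuple f as the mapping i -> f[i] and count f[i] == i.
    return sum(1 for i, v in enumerate(f) if i == v)

n = 2
# All n^n mappings vs. all n! permutations of {0, 1}.
mapping_fps = sorted(fixed_points(f) for f in product(range(n), repeat=n))
perm_fps = sorted(fixed_points(p) for p in permutations(range(n)))
# Both families have mean 1, but the distributions differ:
# mappings give [0, 1, 1, 2]; permutations give [0, 2].
```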
Note though that a correct block cipher is necessarily a permutation, because it's invertible (by definition, a permutation is just an invertible mapping with domain and range equal). A hash function on the other hand needn't be a permutation even when you restrict the domain to inputs of the same bit length as the hash output.
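You can see the non-permutation behavior directly by restricting a hash to a tiny fixed-width domain (1 byte here, keeping only the first byte of SHA-256 — a demo-only restriction):

```python
import hashlib

def h(x: int) -> int:
    # Map one byte to one byte via truncated SHA-256.
    return hashlib.sha256(bytes([x])).digest()[0]

outputs = [h(x) for x in range(256)]
# An invertible map on 256 values would produce 256 distinct outputs;
# the truncated hash collides, so it cannot be a permutation.
distinct = len(set(outputs))
```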
Actually, I should point out that you can start the iteration by initializing the hash links to random values, which means there isn't just one cycle you could potentially arrive at for a given graph of files. I didn't mean to imply that discovering a cycle of size > 1 for a given file graph proves that no fixed points exist for that file graph.
I'm pretty sure it's "just" extraordinarily difficult. With some things, the difference between the two is vanishingly small. Like, in principle, flipping a bit takes a certain minimum amount of energy (the Landauer limit). So you can take how many bit flips the best algorithm for something needs, convert it to energy, and compare it to the total energy output of the sun for the next one hundred years, then call it pretty much impossible if the algorithm's number is bigger.
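Sketching that arithmetic for the numbers in this thread (the flips-per-hash figure is a made-up ballpark, and everything runs at the theoretical Landauer minimum, which real hardware misses by many orders of magnitude):

```python
import math

K_B = 1.380649e-23               # Boltzmann constant, J/K
T = 300.0                        # room temperature, K
E_FLIP = K_B * T * math.log(2)   # Landauer limit per bit erasure, ~2.9e-21 J

HASH_EVALS = 1.5 * 2**129        # the cycle-detection figure from upthread
FLIPS_PER_HASH = 1e5             # made-up ballpark for one hash evaluation

# Energy for the 2^128-scale cycle search vs. a 2^256-scale brute force,
# both at the Landauer floor.
e_search = HASH_EVALS * FLIPS_PER_HASH * E_FLIP
e_preimage = 2**256 * FLIPS_PER_HASH * E_FLIP

SUN_WATTS = 3.8e26               # approximate solar luminosity
e_sun_century = SUN_WATTS * 100 * 365.25 * 24 * 3600
```

Interestingly, at the Landauer floor the 2^128-scale search lands well under a century of solar output, while the 2^256-scale one blows far past it; it's the enormous gap between real hardware and that floor that keeps the former in the "pretty much impossible" bucket too.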