Since it's about compressing one specific sequence of bytes, and compression time/cost doesn't matter - only decompression - wouldn't searching for a set of [neural network inputs + weights + processing code + error correction] (I'm thinking in the direction of GANs or auto-encoders) have a high chance of finding improvements? Not 100% sure it would be in the spirit of the contest, or whether the cost of training would offset the reward...
A fully NN solution (as opposed to the light sprinkling all competitive solutions use) requires big binaries for the most part, whereas an arbitrary program can cheaply memorize exact sequences to exploit repetition, using only a tiny NN to fine-tune predictions. A pure NN solution like a Transformer-XL does turn in record-setting performance on natural language datasets including WP... unfortunately, the required NN model sizes alone tend to be larger than the entire uncompressed WP corpus here, and so have zero chance of ever minimizing model+compressed-data. (An observation which I think supports the idea that the Hutter Prize has long outlived its intended use for measuring progress towards AGI; it's now just sort of a 'demo scene' version of AI benchmarking, testing intelligence within extreme constraints of sample efficiency/compute, rather than a truly useful benchmark.)
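To make the size argument concrete, here's a back-of-the-envelope sketch (the parameter count is my own illustrative assumption, roughly Transformer-XL-Large scale, not a figure from this thread):

```python
# Why serialized weights alone can sink a pure-NN entry:
# the Hutter Prize corpus (enwik9) is 10^9 bytes uncompressed,
# and a few hundred million float32 parameters already rival that.

corpus_bytes = 10**9              # enwik9: 1 GB of Wikipedia text
params = 257_000_000              # assumed Transformer-XL-Large-ish size
bytes_per_param = 4               # float32

model_bytes = params * bytes_per_param
print(model_bytes)                # serialized weights, uncompressed
print(model_bytes > corpus_bytes) # model alone exceeds the whole corpus
```

Even aggressive weight quantization only buys a constant factor, so the "archive" can never get near the leaderboard this way.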
The fact that small compressors like gzip or zpaq do so well at small data/compute but then can't compete as you scale up to tens or hundreds of gigabytes of text (amortizing the cost of those fancy NN models) can be considered a version of the Bitter Lesson: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
The bitter lesson seems to be striving for the wrong goal. Replicating human performance will at best give you human performance. Humans are incredibly inefficient: they require multiple orders of magnitude more computational resources just to match a machine. Since we do not have an abundance of computational resources, trying to replicate humans will always yield worse results than simply running conventional algorithms.
I interpreted the rules as saying only decompression needs to finish within 8 hrs; training could be arbitrarily long? That would also permit the theoretical strategy of finding the sequence somewhere in pi, and just storing the index plus a pi generator...
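The pi trick runs into a standard counting argument, though (my sketch, not from the thread): for an arbitrary n-byte sequence, the expected starting index among the digits of a normal number is around 256^n, so writing the index down takes about n bytes - as large as the data you were hoping to compress away.

```python
import math

def index_bytes_needed(n_bytes):
    """Bytes needed to store the *expected* pi-index of an n-byte sequence.

    A given n-byte string matches roughly one position in every 256**n,
    so the expected index is ~256**n, which needs log2(256**n) = 8*n bits.
    """
    index_bits = 8 * n_bytes
    return math.ceil(index_bits / 8)   # == n_bytes: no savings on average

for n in (10, 1_000, 10**9):
    print(n, index_bytes_needed(n))    # the index is as big as the data
```

You can get lucky on one specific sequence, but averaged over all inputs the scheme is a no-op, which is why it never shows up on the leaderboard.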