Since it's about compressing one specific sequence of bytes, and compression time/cost doesn't matter - only decompression - wouldn't searching for a set of [neural network inputs + weights + processing code + error correction] (I'm thinking in the direction of GANs or auto-encoders) have a high chance of finding improvements? Not 100% sure it would be in the spirit of the contest, or whether the cost of training would offset the reward...
A fully NN solution (as opposed to the light sprinkling all competitive solutions use) requires big binaries for the most part, whereas an arbitrary program can cheaply memorize exact sequences to exploit repetition, using only a tiny NN to fine-tune predictions. A pure NN solution like a Transformer-XL does turn in record-setting performance on natural language datasets including WP... unfortunately, the required NN model sizes alone tend to be larger than the entire uncompressed WP corpus here, and so have zero chance of ever minimizing model+compressed-data. (An observation which I think supports the idea that the Hutter Prize has long outlived its intended use for measuring progress towards AGI; it's now just sort of a 'demo scene' version of AI benchmarking, testing intelligence within extreme constraints of sample efficiency/compute, rather than a truly useful benchmark.)
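To make the size argument concrete, here's a back-of-the-envelope sketch (the parameter count is my own illustrative assumption, roughly Transformer-XL-Large scale, not a figure from this thread):

```python
# Why serialized weights alone can sink a pure-NN entry:
# the Hutter Prize corpus (enwik9) is 10^9 bytes uncompressed,
# and a few hundred million float32 parameters already rival that.

corpus_bytes = 10**9              # enwik9: 1 GB of Wikipedia text
params = 257_000_000              # assumed Transformer-XL-Large-ish size
bytes_per_param = 4               # float32

model_bytes = params * bytes_per_param
print(model_bytes)                # serialized weights, uncompressed
print(model_bytes > corpus_bytes) # model alone exceeds the whole corpus
```

Even aggressive weight quantization only buys a constant factor, so the "archive" can never get near the leaderboard this way.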
The fact that small compressors like gzip or zpaq do so well at small data/compute but then can't compete as you scale up to tens or hundreds of gigabytes of text (amortizing the cost of those fancy NN models) can be considered a version of the Bitter Lesson: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
The bitter lesson seems to be striving for the wrong goal. Replicating human performance will at best give you human performance. Humans are incredibly inefficient: they require multiple orders of magnitude more computational resources just to match a machine. Since we do not have an abundance of computational resources, trying to replicate humans will always yield worse results than simply running conventional algorithms.
I interpreted the rules as saying only decompression needs to finish within 8 hrs; training could be arbitrarily long? That would also permit the theoretical strategy of finding the sequence somewhere in pi, and just storing the index plus a pi generator...
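The pi trick runs into a standard counting argument, though (my sketch, not from the thread): for an arbitrary n-byte sequence, the expected starting index among the digits of a normal number is around 256^n, so writing the index down takes about n bytes - as large as the data you were hoping to compress away.

```python
import math

def index_bytes_needed(n_bytes):
    """Bytes needed to store the *expected* pi-index of an n-byte sequence.

    A given n-byte string matches roughly one position in every 256**n,
    so the expected index is ~256**n, which needs log2(256**n) = 8*n bits.
    """
    index_bits = 8 * n_bytes
    return math.ceil(index_bits / 8)   # == n_bytes: no savings on average

for n in (10, 1_000, 10**9):
    print(n, index_bytes_needed(n))    # the index is as big as the data
```

You can get lucky on one specific sequence, but averaged over all inputs the scheme is a no-op, which is why it never shows up on the leaderboard.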