"Lossless compression requires identifying the pattern that produced an input as perfectly as possible": no, it requires identifying it absolutely perfectly. This includes all vacuous information as well. E.g., in the Wikipedia example, if there were a 2d scatter plot of a sample from a bivariate uniform distribution, lossless compression would require "memorizing" all of the plotted points.

Predicting perfectly is very different from predicting well. Machine learning is about the latter, while lossless compression is about the former.



This is not true. If you have a good predictor, you only need a few bits to store each piece of information. One way is simply to record the places where your prediction is wrong. The ideal way is to split the set of possibilities so that exactly half of the probability mass falls on each side; every bit then tells you which side to go down.
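Here's a rough sketch of that halving idea (Shannon-Fano-Elias-style; the names and the toy distribution are illustrative, not any particular library's API). Each emitted bit halves the current interval until it fits inside the symbol's probability interval, so a symbol with probability p costs roughly -log2(p) bits:

    def encode(symbol, probs):
        # probs: dict mapping symbols to probabilities summing to 1.
        # Find the symbol's cumulative-probability interval [s_lo, s_hi).
        c = 0.0
        for s, p in sorted(probs.items()):
            if s == symbol:
                s_lo, s_hi = c, c + p
                break
            c += p
        target = (s_lo + s_hi) / 2   # any point inside the interval works
        lo, hi, bits = 0.0, 1.0, []
        # Each bit halves [lo, hi); stop once it sits inside [s_lo, s_hi).
        while lo < s_lo or hi > s_hi:
            mid = (lo + hi) / 2
            if target >= mid:
                bits.append(1)       # go down the upper half
                lo = mid
            else:
                bits.append(0)       # go down the lower half
                hi = mid
        return bits

    # "b" has probability 1/4, so it costs about -log2(1/4) = 2 bits:
    print(encode("b", {"a": 0.5, "b": 0.25, "c": 0.25}))  # -> [1, 0]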

So instead of using 64 bits to specify the x/y coordinates of every point on the plot, you could use a much smaller number to represent how far each point diverges from its predicted location. Each halving narrows down the possible locations the point could be in, so you only need enough bits to specify those remaining possibilities, not all of them.
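For instance, a minimal sketch assuming the predictor's error is bounded by some known max_error (residual_bits and max_error are made-up names for illustration):

    import math

    def residual_bits(actual, predicted, max_error):
        # Bits needed to store `actual` losslessly, given a prediction
        # guaranteed to be within `max_error` of it.
        residual = actual - predicted
        assert abs(residual) <= max_error
        # There are 2*max_error + 1 possible residuals, so ceil(log2)
        # bits are enough to pick out one of them.
        return math.ceil(math.log2(2 * max_error + 1))

    # A predictor accurate to within 7 units needs 4 bits per coordinate
    # instead of 64, since ceil(log2(15)) == 4.
    print(residual_bits(1_000_003, 1_000_000, max_error=7))  # -> 4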



