If the machine is "anonymous", it does matter. Scenario #1: a human gets insider information that Solar City will be bought. He then makes his anonymous "AI machine" predict that Solar City is a great stock to buy!
That's not possible. The data is encrypted. None of the participants can see which stocks they are training on, or anything else about the data. Numerai turns stock prediction into a pure ML problem.
Doesn't the encrypted chart still need to display price history or volume? If so, it seems like it would be a trivial task to match it up with its real-world counterpart.
It would be easy to match an obfuscated stock market dataset with some third-party dataset, and this has happened in many Kaggle competitions (data leaks). That's why the encryption here is important.
But how do you encrypt a stock's historical performance without removing the information (performs better in summer, went up after 9/11…) hidden inside it?
You can add noise, but I doubt that will be enough.
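A minimal sketch of why noise alone may not be enough, using entirely synthetic data (the tickers, return series, and noise level are made up and say nothing about Numerai's actual files): if an obfuscated dataset still carried a return history, an attacker could correlate it against public series and pick the best match. Here the noise is as large as the signal, yet the true series still stands out because its correlation (about 0.7) dwarfs the chance correlations between independent series.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def best_match(obfuscated, public):
    """Return the public ticker whose returns correlate most with the obfuscated series."""
    return max(public, key=lambda t: pearson(obfuscated, public[t]))


rng = random.Random(42)

# Hypothetical public daily returns: 50 made-up tickers, 250 trading days each.
public = {f"T{i}": [rng.gauss(0, 0.02) for _ in range(250)] for i in range(50)}

# "Obfuscate" one series: strip its name and add noise with the same volatility.
secret = "T17"
obfuscated = [r + rng.gauss(0, 0.02) for r in public[secret]]

print(best_match(obfuscated, public))
```

This only applies if the obfuscated data preserves something like a time series of prices or returns; it does not show that any particular provider's encrypted features can be reversed.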
The model has no idea what stock it is predicting. Each security is just a random ID. Further, there are no encrypted charts and no way to backtest your model outside of the small amount of data Numerai provides. Download the CSVs; they are much smaller than I expected.