A bit late to reply, but none of this actually answers my question. I don't doubt that distillation from behavior is possible; I doubt that it's possible when 90% of o1's behavior is never returned from the API. If the chain-of-thought process is what improves the results, then distillation without that chain of thought to train on should not produce comparable results.