The author said the original liquid paper specifies random starting weights. My guess is that each time you redo the randomization you get a slightly different random "personality", which then updates self-referentially over time. You have to start somewhere, after all. If you're going to normalize anyway, I guess you could also start with all 1s.
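To make the two starting points concrete, here is a minimal sketch, assuming the weights are a plain list of positive numbers that gets normalized to sum to 1. The function names are made up for illustration and are not from the paper.

```python
import random

def normalize(weights):
    """Scale the weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def random_start(n, seed=None):
    """Random positive starting weights, normalized (hypothetical helper).

    A different seed gives a different starting "personality".
    """
    rng = random.Random(seed)
    # The small offset keeps every weight strictly positive.
    return normalize([rng.random() + 1e-9 for _ in range(n)])

def uniform_start(n):
    """All-1s starting weights; after normalizing, each weight is 1/n."""
    return normalize([1.0] * n)
```

With all 1s, every run starts from the same neutral point. With random weights, each seed starts somewhere different, and whatever update rule follows has to pull it away from that.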
Update: Even if this is a good idea, and I'm not sure it is, it probably makes sense to move away from the random weights quickly at first and then slow down.
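One way to get "fast early, then slow down" is a step size that shrinks with the number of updates. This is only a sketch under my own assumptions: the schedule, its constants, and the `target` list (a stand-in for whatever the self-referential update actually computes) are all hypothetical.

```python
def step_size(t, initial=0.5, decay=0.1):
    """Hypothetical schedule: big steps at first, smaller ones as t grows."""
    return initial / (1.0 + decay * t)

def update(weights, target, t):
    """Move each weight a fraction step_size(t) of the way toward target.

    `target` is an assumed placeholder for the real update signal.
    """
    lr = step_size(t)
    return [w + lr * (g - w) for w, g in zip(weights, target)]
```

With these constants the first update closes half the gap to the target, and by the tenth it closes only a quarter. If both the weights and the target sum to 1, the result still sums to 1, so no extra normalization is needed.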