The observation itself is also partially incorrect. Here's a video I watched a few months ago that goes deeper into the whole question of how you deal with subnetworks:

https://youtu.be/WW1ksk-O5c0?list=PLCq6a7gpFdPgldPSBWqd2THZh... (timestamped)

At the timestamp they discuss how the original ICLR results only held for extremely tiny models and didn't transfer to larger ones. The adaptation needed to fix this is to train densely first for a few epochs; only then can you start increasing sparsity.
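A minimal sketch of what that schedule could look like (my own illustration, not the talk's actual recipe; the epoch counts, the 0.9 target, and per-tensor magnitude pruning are all assumptions):

    import torch

    def sparsify(model, sparsity):
        # Zero the smallest-magnitude fraction of each weight tensor
        # and return boolean keep-masks. Per-tensor magnitude pruning
        # is assumed; the talk may use a different criterion.
        masks = {}
        for name, p in model.named_parameters():
            if p.dim() < 2:        # skip biases and norm params
                continue
            k = int(sparsity * p.numel())
            if k < 1:
                continue
            thresh = p.detach().abs().flatten().kthvalue(k).values
            masks[name] = p.detach().abs() > thresh
            p.data *= masks[name]
        return masks

    def train(model, loader, opt, loss_fn,
              dense_epochs=3, total_epochs=12, final_sparsity=0.9):
        masks = {}
        for epoch in range(total_epochs):
            # Dense warmup for the first few epochs, then ramp the
            # sparsity linearly toward the final target.
            if epoch >= dense_epochs:
                frac = (epoch - dense_epochs + 1) / (total_epochs - dense_epochs)
                masks = sparsify(model, final_sparsity * frac)
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
                # Re-apply masks so pruned weights stay at zero.
                with torch.no_grad():
                    for name, p in model.named_parameters():
                        if name in masks:
                            p *= masks[name]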



Watched the video - thanks

Ioannou is saying the paper's idea for training a sparse network doesn't work in non-toy networks (the paper's method for selecting promising weights early doesn't improve the network).

BUT the term "lottery ticket" refers to the true observation that a small subset of weights drives the functionality (see any pruning paper). It's great terminology because those subnetworks really are coincidences of the random initialization.

All that's been disproven is that paper's specific method for creating a sparse network based on this observation.
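For reference, the prune-and-rewind procedure from the original paper looks roughly like this (a sketch, assuming a PyTorch model and simple per-tensor magnitude pruning; train_fn stands in for whatever training loop you use):

    import copy
    import torch

    def lottery_ticket(model, train_fn, sparsity=0.9):
        # 1. Remember the random initialization.
        init_state = copy.deepcopy(model.state_dict())
        # 2. Train the dense network.
        train_fn(model)
        # 3. Build the "winning ticket" mask from the
        #    largest-magnitude trained weights.
        masks = {}
        for name, p in model.named_parameters():
            if p.dim() < 2:
                continue
            k = int(sparsity * p.numel())
            if k < 1:
                continue
            thresh = p.detach().abs().flatten().kthvalue(k).values
            masks[name] = p.detach().abs() > thresh
        # 4. Rewind the surviving weights to their initial values
        #    and zero everything else.
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p *= masks[name]
        # 5. Retrain the sparse subnetwork (train_fn would also need
        #    to re-apply the masks after each optimizer step). The
        #    claim is that this matches the dense network's accuracy.
        train_fn(model)
        return model, masks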



