It means that 97% of the outputs from the layer are zero; only 3% are active _at a time_. You can’t get rid of the other 97% entirely because the active 3% isn’t static. I think the paper is saying that they can reasonably accurately predict the active 3% at least well enough to make it run faster without losing too much accuracy.