
The vectors don't need to be orthogonal, because neural networks use non-linearities. The softmax in attention effectively lets you pack as many vectors into 1D as you want and still unambiguously pick them out.
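A minimal sketch of the idea (not anyone's actual attention implementation, and assuming a squared-distance score rather than a plain dot product): pack many items as distinct scalars on a 1D line, and let a sharp softmax pick out any one of them nearly one-hot.

```python
import numpy as np

# Hypothetical illustration: pack N "vectors" as distinct scalars in 1D.
keys = np.linspace(0.0, 1.0, 10)   # 10 items packed into the interval [0, 1]
values = np.arange(10.0)           # payload associated with each key

def attend(query, beta=1000.0):
    # Non-linear (negative squared-distance) score; a large beta makes
    # the softmax weights nearly one-hot on the closest key.
    scores = -beta * (query - keys) ** 2
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

attend(keys[3])  # ≈ 3.0: the 4th item is retrieved almost unambiguously
```

With a larger beta (a sharper softmax) the keys can be packed arbitrarily densely and the retrieved value still snaps to a single item, which is the sense in which the non-linearity removes the need for orthogonality.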

