
These exotic (non-linear-algebra-based) embedding representations are often slow to take off unless they have an obvious use case.

The other one I've always been curious about is Poincaré Embeddings [1], where the embedding space also encodes a hierarchical structure.

There are a couple of issues keeping these from becoming popular:

1. Querying the embeddings requires more math knowledge than just "lol cosine similarity". It also requires writing custom query code.

2. You can often match their performance with regular embeddings by just adding dimensions and training more. So the advantage of exotic embeddings has to come from the information captured by the more complex mathematical abstraction.

So they need a killer use case to become popular; it's hard to move the needle.

[1]: https://arxiv.org/abs/1705.08039
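To make point 1 concrete, here's a minimal sketch (numpy, illustrative values only) comparing the one-liner cosine similarity people default to with the hyperbolic distance you'd need for querying Poincaré embeddings, following the formula from the paper [1]:

```python
import numpy as np

def cosine_similarity(u, v):
    # The "lol cosine similarity" default: one dot product, flat geometry.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def poincare_distance(u, v):
    # Hyperbolic distance in the Poincaré ball (Nickel & Kiela, 2017):
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    sq_diff = np.sum((u - v) ** 2)
    sq_u = np.sum(u ** 2)
    sq_v = np.sum(v ** 2)
    return np.arccosh(1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v)))

# Toy points inside the unit ball: points near the boundary sit "deep"
# in the hierarchy, and distances blow up near the edge, which is how
# the geometry encodes tree structure.
u = np.array([0.1, 0.0])
v = np.array([0.9, 0.0])
print(cosine_similarity(u, v))  # 1.0 -- cosine thinks they're identical
print(poincare_distance(u, v))  # large -- hyperbolic distance does not
```

The two points lie on the same ray from the origin, so cosine similarity can't distinguish them at all, while the Poincaré distance treats them as far apart; that's the hierarchical information a vanilla vector query throws away, and why off-the-shelf ANN indexes built around cosine/dot-product don't work here out of the box.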



Super interesting.

Do you think it makes sense to have a group of models, each with more ad-hoc embeddings, and coordinate them to respond according to the domain of the input?

Do multi-modal models use the same embedding type/structure for an image, sound, text?


I think your first question is open for people to explore.

The answer to the second is yes - it's all vector embeddings, and the modalities are aligned to each other by training on a dataset of matched pairs (e.g. images with captions).
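A minimal sketch of how that pairing aligns two modalities, in the style of CLIP's contrastive objective (random arrays stand in for encoder outputs; shapes and the temperature value are illustrative assumptions):

```python
import numpy as np

def l2_normalize(x):
    # Project embeddings onto the unit sphere so dot products are cosines.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Cosine similarity between every image and every caption in the batch.
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature
    # The matching caption for image i sits at index i (the diagonal);
    # the loss pushes each diagonal entry above its row's other entries.
    idx = np.arange(len(logits))
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, idx].mean()

# Stand-ins for what an image encoder and a text encoder would emit
# for a batch of 3 (image, caption) pairs.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(3, 8))
text_emb = rng.normal(size=(3, 8))
print(contrastive_loss(image_emb, text_emb))
```

Minimizing this over many matched pairs is what drags the two encoders' outputs into one shared vector space, which is why a single embedding type can serve images, sound, and text at once.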

The real use for exotic embeddings will have to be in analyzing the embeddings themselves, I think; otherwise it's easier to shove normal vectors downstream into other models.



