
Author here - yes, "binary vectors" means quantizing to one bit per dimension. Normally a vector takes 4 * dimensions bytes of space (where 4 = sizeof(float)); after binary quantization it takes dimensions / 8 bytes, a 32x reduction. Some embedding models, like nomic v1.5[0] and mixedbread's new model[1], are specifically trained to retain quality after binary quantization. Not all models are, though, so results may vary. In general, for really large vectors, like OpenAI's large embeddings model with 3072 dimensions, it kind of works even if they didn't specifically train for it.

[0] https://twitter.com/nomic_ai/status/1769837800793243687

[1] https://www.mixedbread.ai/blog/binary-mrl
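To make the space math concrete, here's a minimal sketch of binary quantization in NumPy. The sign-thresholding and Hamming-distance comparison shown are the common approach described in the linked posts; the variable names and dimension count are just illustrative.

```python
import numpy as np

dims = 1024
rng = np.random.default_rng(0)
embedding = rng.standard_normal(dims).astype(np.float32)

# Binary quantization: keep only the sign of each dimension (1 bit each),
# then pack 8 bits per byte.
bits = (embedding > 0).astype(np.uint8)
packed = np.packbits(bits)

print(embedding.nbytes)  # 4096 bytes = 4 * dims (float32)
print(packed.nbytes)     # 128 bytes = dims / 8, a 32x reduction

# Similarity between two binary vectors is typically Hamming distance:
# XOR the packed bytes and count the differing bits.
other = rng.standard_normal(dims).astype(np.float32)
other_packed = np.packbits((other > 0).astype(np.uint8))
hamming = int(np.count_nonzero(np.unpackbits(packed ^ other_packed)))
```

A lower Hamming distance means the two vectors agree on more dimensions' signs, which is why models trained for this objective hold up well after quantization.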



Thank you! As you keep posting your progress, and I hope you do, adding these references would probably help ward off crusty fuddy-duddies like me (or at least give them more to research either way) ;)



