
It is a really recent development. Even if this architecture is technically superior, it could take time before a model using it becomes competitive.

Or maybe it does not pan out at all. We are still at the stage where people are throwing everything at the wall to see what sticks. Some promising ideas that work at small scale do not work at larger scales.



This. Hyperparameter tuning and training involve a lot of model-specific black magic. Transformers have had time to mature; it'll take a while for other architectures to catch up, even if they have a higher potential ceiling.


Definitely agree that a lot of work going into hyperparameter tuning and maturing the ecosystem will be key here!

I see the Mamba paper as the `Attention Is All You Need` moment for this architecture - it might take a little while before we get everything optimised to the point of a GPT-4 (it took 6 years for transformers, but it should be faster than that now, with all the attention on ML).


Another interesting point is that the hardware isn't really optimised for Mamba yet either - ideally we'd want more fast SRAM so that we can keep larger hidden states on-chip efficiently.
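
To make the SRAM point concrete, here's a minimal sketch (not the official Mamba kernel) of the selective state-space recurrence the hardware-aware scan computes. The function name, parameter names, and shapes are illustrative assumptions; the takeaway is that every channel carries a `d_state`-sized hidden state that you'd like to keep resident in fast on-chip memory rather than spilling to HBM.

```python
# Minimal reference sketch of a selective SSM scan:
#   h_t = A_t * h_{t-1} + B_t * x_t,   y_t = C_t . h_t
# Names and shapes are assumptions for illustration, not the real kernel API.
import numpy as np

def selective_scan(x, A_bar, B_bar, C):
    """x: (seq_len, d_model); A_bar, B_bar: (seq_len, d_model, d_state);
    C: (seq_len, d_state). Returns y: (seq_len, d_model)."""
    seq_len, d_model = x.shape
    d_state = A_bar.shape[-1]
    h = np.zeros((d_model, d_state))   # the hidden state we'd want to keep in SRAM
    y = np.zeros((seq_len, d_model))
    for t in range(seq_len):           # real kernels fuse and parallelise this loop
        h = A_bar[t] * h + B_bar[t] * x[t][:, None]  # per-channel state update
        y[t] = h @ C[t]                # read out each channel's state
    return y
```

The reason SRAM capacity matters: the fused kernel wants `d_model * d_state` values live per sequence position, so the available on-chip memory bounds how large you can make the state before the scan has to start round-tripping through slower memory.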



