Hacker News

Given that the API is similar to those of existing libraries, I'm curious whether performance is better with this one. And if so, what's stopping existing libraries from being as fast? IIRC, PyTorch at least has a Metal backend.

The README mentions unified memory, but what stops other frameworks from modeling copies as no-ops? I wonder if MLX makes larger architectural decisions based on CPU-GPU communication being cheap.
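To make the no-op-copy idea concrete, here is a toy sketch (plain Python, not MLX's actual implementation or API): under a discrete-memory model, a device transfer must duplicate the bytes, while under unified memory both processors address the same allocation, so the "transfer" can hand back the same buffer. The `Buffer` class and both `to_device_*` functions are hypothetical illustrations.

```python
# Toy model of the two memory regimes; `data` stands in for the actual bytes.

class Buffer:
    """A single named allocation."""
    def __init__(self, data):
        self.data = data

def to_device_discrete(buf, copy_log):
    # Discrete memory (separate CPU and GPU address spaces):
    # every transfer materializes a new allocation.
    copy_log.append(buf)
    return Buffer(list(buf.data))

def to_device_unified(buf, copy_log):
    # Unified memory: CPU and GPU share one address space, so the
    # "copy" is a no-op that returns the original buffer untouched.
    return buf

cpu_buf = Buffer([1.0, 2.0, 3.0])

log = []
gpu_buf = to_device_discrete(cpu_buf, log)
assert gpu_buf is not cpu_buf and len(log) == 1   # a real copy happened

log = []
gpu_buf = to_device_unified(cpu_buf, log)
assert gpu_buf is cpu_buf and len(log) == 0       # no-op transfer
```

A framework whose internals assume the discrete model (explicit `.to(device)` staging, transfer scheduling, double-buffering) can't simply treat copies as free; the question is how much of MLX's design starts from the unified assumption instead.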


