I think they are mainly -dev and -schnell. Both models are 12B parameters. -pro is the most powerful and raw, -dev is a guidance-distilled version of it, and -schnell is a step-distilled version (you can get pretty good results in 2-8 steps).
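For anyone unsure what "guidance distilled" means in practice, here is a rough toy sketch, not the actual FLUX code. With classic classifier-free guidance, each sampling step runs the model twice (with and without the prompt) and extrapolates between the two outputs. A guidance-distilled student is trained to produce that combined result in a single pass, taking the guidance scale as an input. The names `teacher`, `student`, and `cfg_predict` are made up for illustration, and the student below is a hard-coded formula standing in for what would really be a learned network.

```python
# Toy illustration only: these functions are hypothetical stand-ins, not FLUX internals.
passes = {"teacher": 0, "student": 0}

def teacher(x, conditioned):
    """Pretend denoiser; each call counts as one forward pass."""
    passes["teacher"] += 1
    shrink = 0.9 if conditioned else 1.0
    return [v * shrink for v in x]

def cfg_predict(x, scale):
    """Classifier-free guidance: two passes, then extrapolate toward the prompt."""
    uncond = teacher(x, conditioned=False)
    cond = teacher(x, conditioned=True)
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

def student(x, scale):
    """Guidance-distilled stand-in: the scale is an input, one pass per step.
    A real student would be trained to match cfg_predict; here it is the closed form."""
    passes["student"] += 1
    return [v * (1.0 - 0.1 * scale) for v in x]

x = [1.0, -2.0, 0.5]
guided = cfg_predict(x, scale=3.5)
distilled = student(x, scale=3.5)
print(passes)  # the teacher needed two passes for this step, the student one
```

So the per-step cost is roughly halved. Step distillation, as in -schnell, is a separate trick that reduces the number of steps instead.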
Something about -pro must be better than -dev, or it wouldn't be made API-only. But what exactly? How does guidance distillation affect it, and what quality remains in -dev?