The notation in the code will be very familiar to anyone comfortable with the underlying research and math. The "conceptual" documentation is in the literature.
What you're asking for is the rough equivalent of asking a C programmer to name their loop variables "index" instead of "i." Everyone familiar with C programming knows what "i" means in the context of a for loop. Similarly, everyone familiar with transformers knows what "gelu" and "attn" mean.
This isn’t a good comparison. “i” is domain-independent: it’s used across an entire language, not just within one specialist field. In fact, it’s used across the entirety of computer science (and originated in maths), so it spans a whole discipline and is even interdisciplinary.
They should use proper variable names if they want the code to be understood by non-specialists, by people who use different terminology, and by people looking back at the code in the future, when the terminology may have changed.
I don’t know about this domain, but single-letter variable names and the like (AKA “match the equation”) are a curse when non-CS engineers and scientists write code. They often break code conventions, leaving IDEs lit up like a Christmas tree when you open the source. There’s a good reason CS moved from register letters to something closer to natural language.
So then why don't you believe me when I tell you that all of these variable names are extremely standard, and will be familiar to anyone who has written deep learning code before?
I feel like you both have good points. Yes, a lot of the variables are very ML-specific and are commonly named that way. However, I feel like that encourages the same researchers (who are obviously not software engineers) to give the rest of their variables sub-par names as well. Why would you give any variable a name longer than a single word if so many of the ones you regularly encounter are just `w`, `u`, `x`, `hparam`, and so on?
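To make the two positions concrete, here is a rough sketch of the same function written both ways. It assumes the common tanh approximation of GELU and is not taken from the codebase under discussion; the longer function and variable names are hypothetical, picked only for illustration.

```python
import math


def gelu(x):
    # "Match the equation" style: terse names that mirror the paper's notation.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def gaussian_error_linear_unit(pre_activation):
    # Same computation with descriptive (hypothetical) names for each step.
    cubic_correction = pre_activation + 0.044715 * pre_activation ** 3
    tanh_term = math.tanh(math.sqrt(2.0 / math.pi) * cubic_correction)
    return 0.5 * pre_activation * (1.0 + tanh_term)
```

Both compute the same value. Which one reads better probably depends on whether you arrive from the paper or from a general software background.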
I'm a software engineer with a background in ML, so even though I somewhat know the domain language, I still get mad at the blatant disrespect for PEP 8. That being said, this is definitely one of the better codebases I have come across. It feels like it could be understood and worked with fairly easily. I have seen far, far worse code accompanying research papers.