To people saying that writing a compiler is almost always overkill: that's because you and the author mean different things by the word "compiler". Or, in other words, that's because your intuition is based on the inflexible languages you're working with.
It's not an accident that the talk/article is about Clojure[0]. A Lisp. In Lisp, "writing a compiler" is a fancier way of saying "writing a possibly code-walking macro", which is still just a fancy way of saying "writing a function that transforms a tree into another tree, and having it run compile-time on parsed program source as the input". 99% of the usual compiler work - tokenizing, building an AST, generating machine/bytecode, optimizing - is handled for you by the host Lisp compiler. All the mechanics are there, and all you have to do is to add the brains you need.
--
[0] - Clojure itself actually makes this slightly more difficult than it looks, because of decisions it made that make the syntax slightly less obviously homoiconic.
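The "function that transforms a tree into another tree" can be sketched outside Lisp too. Here is a rough Python analogue (the names and nested-list AST are my own invention, not the article's code): a tiny "compiler pass" that folds fully-constant arithmetic subtrees.

```python
import math

# Hedged sketch: a "compiler pass" as a plain tree-to-tree function.
# Nested lists stand in for s-expressions; the pass folds subtrees
# whose arguments are all constants.
def fold(expr):
    if not isinstance(expr, list):
        return expr                      # atom: number or symbol
    op, *args = [fold(e) for e in expr]  # fold children first (bottom-up)
    if op in ('+', '*') and all(isinstance(a, (int, float)) for a in args):
        return sum(args) if op == '+' else math.prod(args)
    return [op, *args]

print(fold(['+', 1, 2, ['*', 'x', ['+', 3, 4]]]))
# → ['+', 1, 2, ['*', 'x', 7]]
```

In a Lisp, the same transform would run at compile time as a macro over actual program source; here it's an ordinary runtime function over a list literal, which is exactly the point: the "compiler" is just a function.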
There is a joke among Lisp programmers that they don't write their code in Lisp; they write, in Lisp, the language they want to write their code in, and then solve the problem in that language. It's often not true, but writing DSLs in a Lisp is technically much simpler than in other languages, I reckon, if your problem domain actually requires such a thing.
Lisp is not alone in that regard, to be honest I suspect it is even more true in FORTH. In FORTH you define words, which you then write your program in. These words very quickly become problem/domain-specific.
Speaking as someone who has written maybe 100 lines of code in both languages, thus someone who should not be taken too seriously, there seem to be a lot of similarities between Lisp and Forth.
Speaking as someone who has written maybe 10000 lines of code in both languages, and may be worth taking a little more seriously, the similarities are there but the macros in Forth are just somehow off.
There is extremely little written online about Forth macros. The best book I've found on the subject is Let Over Lambda, whose latter chapters talk about Forth macros and how it's possibly the best macro system second only to Lisp. And yet I don't know of anyone, outside possibly Chuck Moore in his "Glow" system for CAD for circuits, that can exploit layer upon layer of Forth macros.
The beauty of Lisp macros, really, when you dive to the bottom of it, is not what it appears to be. It's not about being able to turn one thing into another, no matter how much flexibility it allows in doing so. Any language can do that - JavaScript, for one: just use strings and eval and you can create interesting macros. It's not about a domain-specific language: every fucking language lets you define words in it with which you'll do the work, and often a derived syntax too (operator overloading). No. And it's not about brevity: because of S-expressions, Lisp is really not that brief compared to other languages like C (when you are doing everything right).
The beauty of Lisp macros is simply and only that you can do all these things over and over again, layer upon layer, "ten or twenty layers of macros deep" (to paraphrase Paul Graham), with each layer making the next layer no more difficult to work with. Nothing else does that. Not even Forth.
And a note on homoiconicity: it appears standard Forth is homoiconic... maybe. Mostly. Not transparently so like Lisp (see above), where no matter how much you exploit homoiconicity, your code never degrades in structure in a way that would prevent you from doing so an (n+1)th time [1]. No, in Forth, you get mostly-homoiconicity, but there are caveats. However, F18 code, featured in the GA144 chip, is basically homoiconic, both because it's Forth with pointers and because it's assembly, and it does allow a lot of macro-like behavior, including an 18-bit self-replicating virus. It's nice. But it's still not Lisp.
+1 to this. I did this literally last week for a toy project and it was hilarious to be reminded of how easy it was.
I know I'm just rehashing the above comment, but: if your target language is complicated, and your host language is complicated, sure, it can be a pain. If they're both lisps, you can almost do it in your sleep.
Writing a compiler is still almost always overkill, because implementing the compiler is one of the smallest parts of creating an effective programming language.
Let's say you're starting a project, and you're choosing a language to write it in. You have two options:
Language A:
1. Has multiple mature compiler/interpreter implementations.
2. Has lots of tooling built around it.
3. Has extensive documentation.
4. Has a diverse community to hire programmers from.
5. Fits the problem domain okay.
Language B:
1. Has one buggy implementation that returns mangled stack traces on errors. See Kernighan's Lever[1].
2. Has no tooling.
3. The documentation is the compiler. So... no documentation.
4. The only programmer that knows it is the one who wrote the compiler.
5. Fits the problem domain as well as the programmer understood the problem domain when he started writing the compiler.
I know which language Dan McKinley[2] would choose. :)
You might say, "But my language uses the same homoiconic syntax as Lisp, so the tooling of Lisp carries over, and it doesn't require more documentation than just normal functions, and Lisp programmers can pick it up easily." To which I would respond, "Sounds like you took the long way to implementing a library." I'd level this criticism against a lot of Racket languages, which are basically just shorthand for a bunch of require statements. I'd rather copy/paste those same require statements.
The fact that Lisps make creating a compiler so easy is actually a downside, because it leads people to write compilers without thinking through the full secondary effects of doing so. This is one of many heads of the Hydra that is The Lisp Curse[3].
You're still thinking of languages as these huge things that can be offered as products. It's not the way a Lisper thinks about it.
> To which I would respond, "Sounds like you took the long way to implementing a library."
And I'd respond, that library is the language. There's a spectrum of complexity of what you can call "a language". General-purpose programming languages like you seem to be considering are one end of that spectrum. The other end of that spectrum are the abstraction layers and APIs you're coding to. Take e.g. OpenGL API - you can view it as a set of functions, but once you consider the rules about calling them, you may as well call that a language.
So when you're designing a "DSL" in a Lisp, it's literally no different than designing any other module. You have to name things, you have to figure out how they should be used. Lisps just don't force you to shoehorn your API into built-in language syntax. They don't force you to write "open-file" and "close-file" because your language only has functions and no built-in "with" construct; they let you add a "with" construct in four lines of code, making the API cleaner.
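For illustration only, here is a Python analogue of that "with" point. The job of the four-line Lisp macro - pairing every open with a guaranteed close so call sites never repeat the boilerplate - maps roughly onto a context manager (`with_resource` and the logging scheme are invented for this sketch):

```python
from contextlib import contextmanager

# Rough analogue of a Lisp "with-" macro: acquire/release is written
# once, and every call site gets the pairing for free.
@contextmanager
def with_resource(name, log):
    log.append(f"open {name}")
    try:
        yield name.upper()           # the "handle" handed to the body
    finally:
        log.append(f"close {name}")  # runs even if the body raises

log = []
with with_resource("file", log) as handle:
    log.append(f"use {handle}")
print(log)  # ['open file', 'use FILE', 'close file']
```

The difference the thread is arguing about: Python had to ship `with` as a language feature, while a Lisp user can add an equivalent construct as an ordinary macro.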
Most DSLs sit somewhere between "API for a stack" and "API for OpenGL". They're just code - code that in some cases happens to run at compile time and operate on other code. But on the user level, on the API level, it's no different at all from using any other library. A documented DSL is no harder to use than a documented function library; a function library lacking documentation is no easier to use than similarly undocumented macro-based DSL.
Some people seem to have an irrational fear of "less mainstream" things, even though they're not fundamentally more difficult than the mainstream things. I've been working with Lisp professionally for a while now, and I've seen plenty of really complex and hairy DSLs and had to debug some when they broke. Fixing a broken macro or its use is not fundamentally different from debugging Java code of similar complexity, but when the DSL works, it can vastly improve readability of the code that uses it.
1. Fixing broken generated code is fundamentally harder than debugging non-generated code, because the code generator doesn't appear in the call stack, and because a line of generated code doesn't necessarily even have any specific line number in the input program.
2. "Tooling" includes type systems that can detect and prevent bugs at compile time, and IDEs that can highlight the bug and often automatically fix the bug with a shortcut action.
You can write your own type system in Lisp, but testing a type system is uniquely harder than testing ordinary functions. Type systems prove facts about a program, and type-system bugs show up as cases where the system can't prove the safety of something that's actually safe, or where it wrongly accepts something that's actually unsafe.
Finding important truths that a proof system can't prove (or finding falsehoods that a proof system can generate) is much harder than ensuring that a well-scoped function generates valid outputs for its inputs.
3. Documenting languages (especially type systems, especially type system errors) is also much harder than documenting function parameters.
In a non-Lisp language, developer-users don't have to understand how the language (and especially the type system) is implemented, because the language devs keep the language small and well tested. (Yes, the Java compiler does have bugs sometimes, but odds are, any given bug is in your code, not the compiler.)
Keeping the language small allows a community to form around documenting it. If you write your own DSL, there's no language community to support you.
And, sure, having a big community of people supporting each other and answering questions is always a huge help, even just for libraries, so there is some pressure to stick with mainstream languages. But languages have come and gone from the mainstream while Lisp has remained sidelined.
Lisp makes it easy to generate code, and that's the "Lisp curse," because writing your own mini language is usually a bad idea.
CLOS started as a portable library for Common Lisp. It was implemented as an embedded language extension and provides on the user level a set of macros: DEFCLASS, DEFGENERIC, DEFMETHOD, etc...
There is no reason such an embedded DSL can't have documentation, find widespread use, etc. In the Lisp world, there are a bunch of examples like these - where DSLs are written for larger groups of users and not just the original implementors.
One has to understand though, that not all attempts have the same quality and adoption. But that's fine in the Lisp world: one can do language experiments - if tools are designed for larger groups, one can just put more effort into them to improve the quality for non-implementors.
> CLOS started as a portable library for Common Lisp. It was implemented as an embedded language extension and provides on the user level a set of macros: DEFCLASS, DEFGENERIC, DEFMETHOD, etc...
Right, that shows that if a highly skilled group of people with community support put in a lot of effort, they can do something that is extremely difficult to do.
The Scheme community has not had such success. All three of the Scheme object systems I've experimented with suffer from all four of the problems I mentioned. And at least one of them was recent enough that they had the advantage of having CLOS to guide the way on how to do it right.
I'm not saying that DSLs are impossible to create successfully. I'm saying that many Lispers drastically underestimate the difficulty of doing so in proportion to the value.
Keep in mind that you have people in this thread claiming that "DSLs are trivial to implement in Lisp." And I'm saying, DSLs are trivial to implement counterproductively in Lisp, but implementing a high quality DSL is still very hard.
> Right, that shows that if a highly skilled group of people with community support put in a lot of effort, they can do something that is extremely difficult to do.
I don't think it's 'extremely' difficult to do. Object systems in Lisp were experimented with by the dozens for some time, in varying qualities.
CLOS OTOH is on the 'extremely difficult' side, since it has a lot of features and even may provide its own meta-object protocol. But even then it is possible to leverage a lot of Lisp features (like a well-documented code generation/transformation system) and thus reduce some implementation complexity.
When CLOS was designed, several well-documented OOP extensions already existed (and were used): Flavors, New Flavors, LOOPS, Common LOOPS, Object Lisp, CommonObjects, a bunch of frame languages like FRL, KEE, ...
> DSLs are trivial to implement in Lisp
Some are, some are not. There is a wide range of approaches. This starts relatively simple for some 'embedded DSLs' and gets more difficult for non-embedded DSLs.
There is a decades-long practice of developing languages in Lisp (since the 1960s), especially embedded DSLs, and thus there is a wide range of them, including many well-documented ones.
Many Lispers know that this CAN be a lot of work, given that many complex DSLs have been developed, many of which have taken more than a person-year (or even dozens) to develop, and some of which have been maintained over a decade or more. In many projects one has also seen the limits of Lisp (speed, debugging, delivery, etc.) for this.
> There is no reason such an embedded DSL can't have documentation, find widespread use etc.
Yes, actually, there is. It turns out that type systems are actually hard. They're hard to understand even when the documentation is as good as it can possibly be. (Look at all the confusion around Rust's borrow checker. Rust's borrow-checker documentation is world class; this is just a challenging area to understand.)
Have you ever had that feeling where someone explained a mathematical proof to you and you just couldn't understand it, even though it was presented as clearly as it possibly could be? You had to stop and think about it, play with it; then you thought you got it, but then you realized you didn't, and then, finally, you got it, and you couldn't even really understand how you misunderstood it in the first place.
Type systems are literally proof tools. Designing your own type system and documenting it is harder than solving your actual problem.
This yak hair is made of steel fiber. Don't shave it.
> Have you ever had that feeling where someone explained a mathematical proof to you and you just couldn't understand it, even though it was presented as clearly as it possibly could be? You had to stop and think about it, play with it; then you thought you got it, but then you realized you didn't, and then, finally, you got it, and you couldn't even really understand how you misunderstood it in the first place.
I thought this is what programming anything other than trivial, repetitive web CRUD looks like?
I mean, seriously, you have to stop and think sometimes. I'd say, fairly often. And you and 'kerkeslager keep bringing up type systems for some reason, as if this is something one would reasonably want to write in a project that's not strictly a type research project. CL already has a type system that's adequate for most tasks, it's easy to add and document new types. It's not Haskell, but then again Lisps aren't what you want to pick up if you need a proper type system.
Designing your own DSL is not about designing a type system, except for the most trivial (or convoluted) meanings of "designing".
> You're still thinking of languages as these huge things that can be offered as products.
No, I'm really really not.
You're assuming my criticisms are because I don't understand how DSLs work in Lisp, but I would request that you not make that assumption.
> And I'd respond, that library is the language. There's a spectrum of complexity of what you can call "a language". General-purpose programming languages like you seem to be considering are one end of that spectrum. The other end of that spectrum are the abstraction layers and APIs you're coding to. Take e.g. OpenGL API - you can view it as a set of functions, but once you consider the rules about calling them, you may as well call that a language.
You might as well not call it a language, though. I understand what you're saying about the spectrum, and I do agree that a library of functions is a DSL, but a library of functions is a special case of a DSL that doesn't have any of the usual downsides of a DSL. You can't really use a library of functions to justify the complexity of other DSLs.
A library of macros isn't a library of functions, and a library of macros has all the downsides I talked about. And if you're talking about DSLs as being part of what makes Lisp special, then you're not talking about libraries of functions, because almost any language can do that.
Let's not get lost in a semantic argument here: when we're talking about DSLs, libraries of functions aren't the central example of a DSL that we're talking about.
> So when you're designing a "DSL" in a Lisp, it's literally no different than designing any other module. You have to name things, you have to figure out how they should be used. Lisps just doesn't force you to shoehorn your API into built-in language syntax. It doesn't force you to write "open-file" and "close-file" because your language only has functions and no built-in "with" construct; it lets you add the "with" constructs with four lines of code, making the API cleaner.
Not being forced to shoehorn your API into the built-in language syntax is exactly the problem I'm talking about.
Either you shoehorn your API into the built-in language syntax, or your users have to learn a new syntax, which mangles your stack traces through the macro expansions, breaks your tooling, isn't documented, and is only understood by you.
Incidentally, Lisp (Common Lisp) definitely implements a with-like syntax. It's been a while since I used it, but I remember this clearly.
> A documented DSL is no harder to use than a documented function library; a function library lacking documentation is no easier to use than similarly undocumented macro-based DSL.
That is very much your opinion, and not one that is borne out by my experience.
> Some people seem to have an irrational fear of "less mainstream" things, even though they're not fundamentally more difficult than the mainstream things.
Again, please don't make assumptions about me. I'm not avoiding macros because they're less mainstream, I'm avoiding them because I've experienced a lot of pain from them.
> Fixing a broken macro or its use is not fundamentally different from debugging Java code of similar complexity, but when the DSL works, it can vastly improve readability of the code that uses it.
It's true that fixing a broken macro is not fundamentally different from debugging Java code of similar complexity. You can write bad code in any language. But if we're trying to write good code, limiting the complexity is a huge priority, and you can't deny that introducing macros introduces complexity. Comparing to Java misses the point: we can look at Lisp with macro-based DSLs versus Lisp without macro-based DSLs, and see the benefits there.
Just to be clear: my criticisms here aren't of Lisp as a whole. I've been working a lot in Racket lately and there are a lot of things I like about it--it's batteries-included in the way Python used to be. I really hope Racket takes off more.
This applies equally to libraries or DSLs. The developer can document them clearly and extensively, or not at all.
"Incidentally, Lisp (Common Lisp) definitely implements a with-like syntax."
As a macro, most likely?
"and you can't deny that introducing macros introduces complexity"
It can also eliminate a massive amount of incidental complexity, by eliminating lots of repetitive, boilerplate, copy-pasted code, where bugs will inevitably pop up as an error is fixed in one place but not another, or a subtle typo creeps in, or a programmer forgets to clean up a resource... etc. etc.
"I really hope Racket takes off more."
Does Racket use macros less aggressively or egregiously than other Lisps? (Serious question, I honestly don't know.)
I like to think of it as concentrating complexity. This allows a programmer to focus intensely on a small area of the code with the benefit of easing the development process of a much greater portion of the code base.
> This applies equally to libraries or DSL. The developer can document them clearly and extensively, or not at all.
This is true, but functions usually don't require as much documentation. With a function, I usually know when the arguments will be evaluated and how, for example: this isn't an assumption I can make with a macro. If you don't change syntax you don't need to document changes in syntax.
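A minimal sketch of that evaluation-order point, in Python for concreteness (the `my_if_*` names are hypothetical): a function receives already-evaluated arguments, while a macro-like construct can decide whether a branch gets evaluated at all - exactly the kind of behavior that needs documenting.

```python
calls = []

def expensive():
    calls.append("ran")
    return 42

def my_if_fn(cond, then_val, else_val):
    # Function: both branches were already evaluated by the caller.
    return then_val if cond else else_val

def my_if_lazy(cond, then_thunk, else_thunk):
    # Macro-like: the branch is chosen before anything is evaluated.
    return then_thunk() if cond else else_thunk()

my_if_fn(False, expensive(), 0)          # expensive() runs anyway
my_if_lazy(False, expensive, lambda: 0)  # expensive() never runs
print(calls)  # ['ran']
```

With a real Lisp macro the thunks are invisible - the macro rewrites the call site - which is precisely why a reader can't assume function-like evaluation and the behavior has to be spelled out in the docs.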
> "Incidentally, Lisp (Common Lisp) definitely implements a with-like syntax."
> As a macro, most likely?
Possibly, but a mature Common Lisp implementation has likely taken care to preserve stack traces and make sane decisions about how to expand, is probably thoroughly documented, etc. A macro implemented by your average programmer has little in common with this, and if you're doing all this work, the idea that DSLs are trivial to implement in Lisp begins to erode.
> It can also eliminate a massive amount of incidental complexity, by eliminating lots of repetitive, boilerplate, copy-pasted code, where bugs will inevitably pop up as an error is fixed in one place but not another, or a subtle typo creeps in, or a programmer forgets to clean up a resource... etc. etc.
That's all true, and I'm not completely against their use. DSLs can be an effective tool. But I think that tool is a double-edged sword which needs to be used much more judiciously than the glib, "Do you have a problem? Write a compiler!".
> Does Racket use macros less aggressively or egregiously than other Lisps?
I don't think so. It's really the programmers, not the language, that use macros, but I would say that the culture around Racket has a big focus on creating DSLs, so if anything Racket programmers use macros more aggressively and more egregiously. :)
But there are other upsides. Racket is the only language I know of nowadays that ships with a decent, truly cross-platform desktop UI library. Python and Java were once contenders, but support for Java has dropped quite a bit, and Tk isn't even used by many Python programmers any more (Qt is just so much better - but it doesn't ship with the language).
> 3. The documentation is the compiler. So... no documentation.
I don't understand this one. Shouldn't small compilers for small languages be excellent documentation? I can't really ground the point or my response, but it seems that clever tooling should be able to analyze the compiler to create good documentation. Information could include type annotation, a timeline of the creation of features of the DSL or any other number of things that could be kept during the creation of the compiler.
> Sounds like you took the long way to implementing a library
DSLs aren't libraries. When would you want one over the other?
The Racket languages I've seen have impressed me with their simplicity and focused scope, I would never think the same of C with includes or js with requires. The point is to be writing a file which is perfectly scoped to the issue, right? I imagine it would suck to "write" html if it were a library in C, rather than a file format with all the benefits that comes from being its own thing.
> Sounds like you took the long way to implementing a library.
You are right that DSLs should be about as natural as libraries. Linq in C#, regex, and SQL are good examples of DSLs you probably rely on regularly.
I suggest looking at the symbolic differentiator and circuit simulator in SICP. It is composed of just a few functions and is far cleaner than a library of functions on a custom data structure.
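For readers without SICP at hand, here is a toy version of that symbolic differentiator, sketched in Python with tuples standing in for s-expressions (this is my own stripped-down rendition, not SICP's code, and it omits the simplification step):

```python
# Toy symbolic differentiator in the spirit of SICP's: a small
# recursive tree-to-tree transform over ('op', left, right) tuples.
def deriv(expr, var):
    if isinstance(expr, (int, float)):
        return 0                      # d(constant)/dvar = 0
    if isinstance(expr, str):
        return 1 if expr == var else 0
    op, a, b = expr
    if op == '+':                     # sum rule
        return ('+', deriv(a, var), deriv(b, var))
    if op == '*':                     # product rule
        return ('+', ('*', a, deriv(b, var)), ('*', deriv(a, var), b))
    raise ValueError(f"unknown operator {op!r}")

print(deriv(('*', 'x', 'x'), 'x'))
# ('+', ('*', 'x', 1), ('*', 1, 'x'))
```

The whole thing is a handful of clauses precisely because the "DSL" and the host data structure coincide - which is the comparison being made against a conventional library over a custom AST class hierarchy.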
> Linq in C#, regex, and SQL are good examples of DSLs you probably rely on regularly.
Right, but with the exception of Linq, those all have addressed all the problems I brought up:
1. Has multiple mature compiler/interpreter implementations.
2. Has lots of tooling built around it.
3. Has extensive documentation.
4. Has a diverse community to hire programmers from.
They were able to address these issues because the DSLs solved very common problems that a lot of people had, and even with that incentive, it took a lot of effort from a lot of people to get there. A new DSL you implement to solve your specific issue is not going to trivially address these issues.
And Linq is an "exception that proves the rule". Linq has not taken off quite as well as the initial hype would have indicated. I'm not sure where it is now, because I haven't worked on .NET in over 4 years, but I remember times when it was unclear whether Linq would be supported in future Microsoft products. It's very difficult even for an organization with Microsoft's clout to create a DSL which is successful.
> I suggest looking at the symbolic differentiator and circuit simulator in SICP. It is composed of just a few functions and is far cleaner than a library of functions on a custom data structure.
I suggest looking at the 4 problems mentioned in my posts, and noting that all of them still apply to the SICP example(s) of DSLs.
But we are comparing a small library of functions with a DSL of similar complexity.
> You might say: But my language uses the same homoiconic syntax of Lisp, so the tooling of Lisp carries over, and it doesn't require more documentation than just normal functions, and Lisp programmers can pick it up easily.
Doesn't that address them?
Why are those 4 issues more applicable to a Lisp DSL than a library?
To learn a Lisp DSL, I read the docs for the relevant ideas, and look at a few examples. What about that process is different than a library, except that it tends to be natural and flexible for certain problems? The SICP examples would be a major pain to implement as a typical library.
Well, I don't typically use languages like that or trees per se, but usually when I'm trying to do something I don't know how to do, I define a data structure that expresses what I want to do as naturally and succinctly as possible, and then try to write code to transform it into the actions I want.
And then I get bogged down in edge cases, and one day I realize: hey, I really should, as a matter of course, treat tricky coding as feedback telling me to tweak my data structure(s) to make them simpler to parse, eliminating bugs in advance by reducing my cognitive load.
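A tiny sketch of that workflow (the `pipeline` spec and `run` interpreter are invented for illustration): express the intent as data first, then write the transform that turns the data into actions.

```python
# Declarative spec: what to do, as plain data.
pipeline = [
    ("strip",),
    ("replace", "-", "_"),
    ("upper",),
]

def run(value, spec):
    # Interpreter: turn each (method, *args) entry into an action.
    for step, *args in spec:
        value = getattr(value, step)(*args)
    return value

print(run("  hello-world  ", pipeline))  # HELLO_WORLD
```

When the interpreter's edge cases start multiplying, that's the signal described above: simplify the spec's shape rather than pile special cases into the code that walks it.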
Oleg has worked in other non-Lisp languages, notably members of the ML family. I know that you want to be all pro-Lisp and such, but it's just another family of languages, really; there's nothing magical or special about them. All compilers use an anamorphic parser, a hylomorphic reducer, a paramorphic optimizer, and a catamorphic instruction selector.
It's not about what the language provides, but about what the language doesn't require. Lisps can be low-ceremony, but they're not the only ones. Trees might be universal data structures, but they're not the only ones.
I'm not saying Lisps are unique in this regard, but I am saying that complaints that I see in today's HN discussion most likely stem from lack of exposure to that kind of programming. In the mainstream world, "language" and "compiler" are products. In Lisp world, they're no different than functions and classes.
Out of language families providing this level of capability, Lisps are probably closest to mainstream, and that's because of Clojure.
In the mainstream world, we study a single category of types and logic at a time. There is an ∞-category Tomb whose objects are languages and arrows are compilers; we don't talk about it much. Lisp is certainly not somehow an arena where we transcend computing concepts; it's a lightweight syntax for untyped lambda calculus, at best. A single topos, not the entire multiverse of topoi.
I could restate your entire argument for ANS FORTH and I'm not sure whether it would lose anything. That should clue you into the idea that your argument is really about the age of the language family, and the corresponding maturity of philosophy and reasoning around members of the family.
The only thing that Lisp makes easy is the parser, by dint of reusing the host Lisp's s-expression reader. The most trivial and surface-level part of a compiler, really.
> It's not an accident that the talk/article is about Clojure
On the other hand, I'm surprised that the optimizations in question don't happen automatically for Clojure programs. Surely the underlying JVM is capable of constant propagation and folding? Maybe it's the let-from-let optimization that actually gets the 10% speedup.
Oh, you do say it in the article: "However, for the maximum reach, we want our game to run in the web browsers." I missed that, thanks.
But even so, JavaScript compilers aren't stupid. No need to do this for me of course, but if you got around to measuring the performance impact of the two optimizations separately, I'd be interested in that. (And the ClojureScript developers might be interested too.)