Learn C and build your own Lisp

tomp · on April 4, 2014

In the last chapter, Bonus Projects, the author mentions Static Typing & type systems. If anyone's interested in learning more:

I've implemented a few simple implementations of basic (and not so basic) type systems[1]. Currently, only 3 type systems are finished (Hindley Milner's Algorithm W, and Daan Leijen's extensible rows and first-class polymorphism), while gradual typing is almost done (based on [2], in branch 'dev').

[1] https://github.com/tomprimozic/type-systems

[2] Jeremy G. Siek, Manish Vachharajani - Gradual Typing with Unication-based Inference - http://ecee.colorado.edu/~siek/dls08igtlc.pdf

jnbiche · on April 4, 2014

Oh wow! This is extremely useful, thank you. I've been wanting to develop my own statically-typed language, but almost all tutorials on the web are for dynamic languages with little mention given to type systems. I've already started a toy dynamically-typed language, but the step up from that to a static language seems significant, particularly given the dearth of information I can find on the web (aside from dense academic papers).

I know the dragon book probably covers this, but I can't justify spending 130 USD on a book at the moment.

And I'm working in OCaml, so this is perfect.

Please consider putting out a donation link so people can support your work (Bitcoin would be most convenient for me, but I'll definitely try to donate regardless).

Also, some kind of open source license would be great (MIT or BSD, maybe?).

Nav_Panel · on April 4, 2014

Here's a different book you might be interested in ("Practical Foundations for Programming Languages":

http://www.cs.cmu.edu/~rwh/plbook/book.pdf

It gets pretty math-heavy at times, at least in the beginning (you can probably skip the first chapter if it's too rough), but ultimately the book is about programming language design and different ways of designing and evaluating typed (or un(i)typed) languages. It also looks at a large number of important languages including PCF, Typed/Untyped Lambda Calculus, System F, Gödel's T and ALGOL (the last of which is interesting because it uses what we now call the IO Monad).

I'm taking the class (15-312) right now, and as homework we typically implement an interpreter for some language using SML (OCaml would likely work fine for the general use case), and then prove a few things about the language. Our languages rely on our Abstract Binding Tree (ABT) infrastructure that we developed early on in the course. You can find the course page at http://www.cs.cmu.edu/~rjsimmon/15312-s14/index.html if you want to check out what we're working on.

chriswarbo · on April 4, 2014

PFfPL is a great book. It manages to present many seemingly different language features in a surprisingly unified way.

It's very opinionated at times (usually against anything "dynamic", eg. typing and binding ;) ) but I think it's worth it for the clarity that it allows. Of course, Bob Harper has the clout to back up those opinions!

ExceptionRaised · on April 4, 2014

I try to take a break from proving progress and preservation by reading hackernews and of course I find someone talking about 15-312 :/ Either way, I would agree with you in recommending the text.

oggy · on April 4, 2014

I've been wanting to read that for a while. Another candidate is Pierce's "Types and Programming Languages". Has anybody by any chance read both an can compare them?

georgewfraser · on April 5, 2014

I've read both and Pierce is much better. I have nothing but respect for Robert Harper, a great thinker in this space, but Pierce is a better writer.

tel · on April 4, 2014

I'd be interested in this as well. I'm about halfway through TaPL and have often considered reading Harper's book instead.

notthemessiah · on April 5, 2014

Pierce will get you working in Coq ASAP, but Harper will provide you with a much richer context.

me2i81 · on April 5, 2014

TAPL isn't Coq-focused. Pierce and his collaborators have another book available online, Software Foundations, which is written as a Coq literate program: http://www.cis.upenn.edu/~bcpierce/sf/index.html

tomp · on April 4, 2014

Well, the purpose of this project is purely to experiment with type systems, so this is far from a programming language (e.g. in a real compiler, you would want to save file/line information for AST nodes, so that the type-checker can report an error, not simply abort. Also, a compiler would not just type-infer/check the AST, but would annotate the AST with inferred types, so that they can be used in later stages (optimization, ...)). Shoot me an email if you need help with something. Also, check out [1].

> Please consider putting out a donation link so people can support your work (Bitcoin would be most convenient for me, but I'll definitely try to donate regardless).

I'd prefer comments/bugs/interesting discussion.

> Also, some kind of open source license would be great (MIT or BSD, maybe?).

I'll do that ASAP, in the meantime, it's public domain/CC-whatever, with the exception of file first_class_polymorphism/propagate.ml and propagation of types tests in first_class_polymorphism/test_infer.ml and first_class_polymorphism/test_infer.ml, which are heavily inspired by Daan Leijen's reference implementation.

[1] https://github.com/typeclassy/plzoo

jnbiche · on April 4, 2014

Oh, I know this project wasn't to create an entire compiler. But it's a great explanation of a missing puzzle piece in my own project.

Hope I didn't offend by suggesting donations -- wasn't my intent.

Anyway, thanks again for putting this out there. It will be a big help.

Edit: Wow, plzoo is another great resource, thank you. I found a reference to the plzoo project and website at one point but the site itself no longer seems operational ("no one here but us chickens", it says) and I didn't think the source code would be available anywhere. Very happy to see it's up on github.

jbclements · on April 5, 2014

Looks like no one's mentioned Benjamin Pierce's authoritative book on this, "Types and Programming Languages" (TAPL). I would certainly put it way, way above the Dragon Book. Bob Harper can certainly defend his opinions as well as anyone, but I think Pierce is currently the standard text here.

Also, if you're looking to add a static type system to a Lisp-like language, you should take a look at Shriram Krishnamurthi's Programming Languages and Interpretation, second edition, at http://cs.brown.edu/courses/cs173/2012/book/ . It's a free online book, and I'm currently using it as the text in my programming languages class.

m_mueller · on April 4, 2014

Something that might be useful for you: I've heard a talk where the speaker described how F# allows defining one's own type system as an input to the compiler. The language is apparently very extendible this way. It's also open source but I can't find information about its license just now. If I were to build a custom type system that's probably where I'd start to look at, even though I'm not a .NET developer.

jnbiche · on April 4, 2014

Hmm, maybe he was talking about type providers? These let you "customize" F# (kind of like macros) but not really implement a new language.

But I'd definitely be interested in that talk if you can remember the name of the speaker or the venue/conference so I can try to Google it.

m_mueller · on April 4, 2014

It was "S4174 - How to Design a Language Integrated CUDA Compiler with LLVM" at the GTC 2014. Videos should be up in a month or so.

gmfawcett · on April 4, 2014

You might also enjoy Matías Giovannini's write-up of a typed concatenative language in Ocaml:

http://alaska-kamtchatka.blogspot.ca/2009/01/essence-of-conc...

mcmillhj · on April 4, 2014

You can get a used copy of the Dragon book for relatively cheap. The PDF is probably floating around on the internet somewhere as well.

jnbiche · on April 4, 2014

Yeah, I'm always cautious about pirated pdf's floating around the net, since they frequently carry malware. Probably not an issue for Linux systems, but I prefer not to take a chance.

I'd definitely pick up a used copy of the Dragon book if I could find one for a reasonable price. Unfortunately, this book is used frequently in college CS courses, so the demand for used copies stays high, and so does the price.

rayiner · on April 4, 2014

Holy shit. I've been looking for something like this! I had a difficult time understanding Siek's gradual typing papers without any code.

tomp · on April 4, 2014

Yeah, me too... it's not an easy paper, the algorithm is quite convoluted, and it took me a while to understand... I'll add some explanation of the basic idea and my version of the algorithm.

thinkpad20 · on April 4, 2014

Thanks for that paper, I'm currently designing a language and I'm wondering if I should use static or gradual typing. I'm currently doing static, but I want to investigate gradual. Anyway thanks! And I'll check out your code too...

tomp · on April 4, 2014

I hope you find it useful. Also, I don't know what kind of language you're designing, but if it's an object oriented language, you might be more interested in some other papers on gradual typing, for example:

[1] J Siek, W Taha - Gradual Typing for Objects - http://www.cs.colorado.edu/~siek/gradual-obj.pdf [2] T Wrigstad, FZ Nardelli, S Lebresne, J Östlund, J Vitek - Integrating Typed and Untyped Code in a Scripting Language - https://www.cs.purdue.edu/homes/jv/pubs/popl10.pdf [3] A Rastogi, A Chaudhuri, B Hosmer - The Ins and Outs of Gradual Type Inference - http://www.cs.umd.edu/~avik/papers/iogti.pdf

sanxiyn · on April 4, 2014

This is a good stuff. 4 years ago, I wrote a Lisp in 500 lines of C in one day after finishing SICP. Typing it out did made what I learned more concrete.

https://gist.github.com/sanxiyn/523967

visarga · on April 5, 2014

It might be easier to understand your program than to learn the language directly.

pera · on April 4, 2014

wow one day? did you reach satori? :)

unwind · on April 4, 2014

I always find it so hard to help out proof-reading code for causes like this.

Often, the code is more complicated than I find reasonable, while omitting things that make a lot of sense in "real" code, and it's very hard to know as an outside reader what the exact motivation for each decision was, by the author.

A few such things that caused me to WTF:

The initial few examples use a pointlessly static and global line buffer, instead of declaring the line buffer where it's being used.

There is hardly any const in the code, even for cases where it obviously should (to me) be used, i.e. for variables that are never written once given their initial value.

A magic number (the buffer size 2048) is repeated in the code, and even encoded into a comment, instead of just using "sizeof buffer".

I do think I found an actual bug (on http://www.buildyourownlisp.com/chapter4_interactive_prompt)... the readline() implementation ignores its prompt argument and uses a hardcoded string, instead. The same function also does strcpy() followed by using strlen() to truncate the string it just copied; that really doesn't sit well with me.

DSMan195276 · on April 4, 2014

I would agree, like all code this author definitely has a few quirks about how they write their C code, and I think all of your criticisms are valid. Their readline implementation is definitely sloppy, but they also just throw it in there without really talking about it. Adding to the readline issue, they definitely should have const'd the strings for readline and add_history. It's not just good practice, in this case it'd be required to conform to how readline's declaration is. I'm surprised they don't get warnings about passing a const char * into a char . Also, talking about the strcpy() and strlen(), the catch there is that it's actually removing the '\n' from fgets, it's not actually trying to terminate the string. That said, they should have just saved the strlen(buffer) result in a size_t and then used it later. The real function should look something like this:
/* Fake readline function */ char* readline(const char* prompt) { fputs(prompt, stdout); fgets(buffer, sizeof buffer, stdin); size_t bufsiz = strlen(buffer); char* cpy = malloc(bufsiz + 1); strcpy(cpy, buffer); cpy[bufsiz - 1] = '\0'; return cpy; }

unwind · on April 4, 2014

Well, yes, I know what the strlen() is for, that's why I wrote truncate (not terminate); it is shortening the string by 1 character.

I see two problems with your code: it's over-allocating, since it's going to do the truncation there's no need to add 1 for the termination, it balances out with the newline; and there's absolutely no point in using strcpy() when you know the length.

DSMan195276 · on April 4, 2014

Fair enough. It wasn't obvious to me what the strlen was doing at first glance, and I doubt it's obvious to any 'beginners' reading the book so it's worth noting either way.

It's true, you're right, I was keeping it a bit in-line with the book, if you change the malloc to be one less, then you can't use strcpy since it'll write the null term to memory you don't have (It might actually work on some platforms since it's just one byte but it's undefined behavior). If you just memcpy the string with bufsiz then there's no issue though.

cliveowen · on April 4, 2014

If there's one thing that I really can't stand are magic numbers. The K&R says to always use constants, which means this has been a good practice for more than 40 years and I really can't grasp how someone can hope to teach something when his own knowledge lacks the basics.

john_b · on April 4, 2014

I haven't read through all of the author's content yet, but there is a certain benefit to using magic numbers in introductory material. Numbers impose very little conceptual overhead on people, especially "familiar" numbers like 2048 or other common buffer sizes. This allows the reader to save their precious meatspace memory for storing the novel concepts the author introduces.

But it depends how the code is presented. "Here's an example of..." is different than "Here's how you should do..."

aidenn0 · on April 4, 2014

Also lots of people still use #define for magic numbers; I learned C in the late 80s with a book by Thomas Plum, and it recommended using enum whenever possible in preference of #define.

The reason given was superior debug support. Debuggers have gotten better at handling #defines, but #defines are still second class citizens. I've personally worked with a debugger that could precisely find all references of an enum, but only grep for a #define. In code with lots of conditional compilation, that makes a huge difference.

michaelochurch · on April 4, 2014

In examples, magic numbers are OK. It's in large production systems (that will require tuning, which becomes impossible with "magic numbers") that they're a code smell. I'd much rather, in a small example, see:

    new int[2048];

than

    #define TWENTYFORTYEIGHT 2048
    new int[TWENTYFORTYEIGHT];

Also, it's generally OK to use a "magic number" if you know that the constant will only be used in that one place, or that there is no way it will ever change. Where they're bad is when they represent undocumented constraints across a program (i.e. "this 64 and that 64 are the same and can't be changed independently".)

ant512 · on April 4, 2014

Tell me what the number represents rather than what the value of the number is. How about this?

    #define BUFFER_LENGTH 2048
    new int[BUFFER_LENGTH];

spc476 · on April 6, 2014

And Standard C already defines a constant you can use: BUFSIZ.

john_b · on April 4, 2014

In my experience, aliasing numbers to words has never improved any code in any way whatsoever, and sometimes causes more headaches since numbers are easy and natural to reason about numerically (big surprise...), while values such as TWENTYFORTYEIGHT impose more mental overhead and open you up to typos.

It strikes me as a symptom of the "magic numbers are evil witchcraft" religion gone to the extreme. The whole point of banishing magic numbers is to improve code's clarity and robustness to change. I've seen

  #define ZERO 0

and

  #define ONE 1

more times than I care to admit. I have yet to see the value of either change.

theseoafs · on April 4, 2014

Programmers that replace 0 with ZERO, 1 with ONE, and 2048 with TWENTYFORTYEIGHT are severely misunderstanding why magic numbers are a bad thing in the first place. TWENTYFORTYEIGHT means exactly the same thing as 2048 (that is to say, TWENTYFORTYEIGHT means roughly nothing), but replacing 2048 with a symbolic constant that actually means something (say, BUFFER_SIZE) is a good idea.

andyroid · on April 4, 2014

I would say ONE_TB is an improvement over 1099511627776 when it comes to "natural to reason about" numbers, but yes, defining ZERO and ONE is moronic.

nostrademons · on April 4, 2014

A lot of times TiB is spelled 1024 * 1024 * 1024 * 1024 or 1 << 40, relying on compile-time constant folding to compute the final value. I usually find that a lot more readable than remembering the actual number.

dalke · on April 4, 2014

#define M_PI 3.14159265358979323846 is usually nicer than seeing (and possibly mistyping) those digits of pi each time.

zenlikethat · on April 5, 2014

Yeah, we were always taught that zero, one, empty string and other "edge case" values were O.K. to have in code directly. I'm inclined to agree.

blt · on April 4, 2014

I dislike the use of a parser generator libary. If it were any other language, sure. But Lisp is so easy to parse. Writing parsers is fun, and it would be a good exercise. I applaud you for wanting to teach beginners how to install libraries, but this seems like the wrong choice.

Props for mentioning conditional compilation early. It's underrepresented in books but essential for real life.

gmfawcett · on April 4, 2014

If you're interested in this, you might also enjoy "Write yourself a Scheme in 48 hours." It uses Haskell rather than C as the implementation language.

https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_...

rasur · on April 4, 2014

I am working through this slowly - it is excellent (although I find myself referring to the Haskell report tutorial too, when I find something that needs further clarification).

A word of warning - the pdf version is not as accurate as the html version. Prepare for some head scratching if you use the pdf.

agumonkey · on April 4, 2014

WYASI48 is mentioned as an inspiration somewhere IIRC

rui314 · on April 4, 2014

If you are interested in this, you may find one of my Lisp implementation interesting too. It's implemented less than 1k lines of heavily-commented C, but it even supports a macro system and copying GC in addition to the common features you would usually see in small Lisps.

https://github.com/rui314/minilisp

anateus · on April 4, 2014

Sounds like this may subvert Greenspun's Tenth Rule:

"Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

http://en.wikipedia.org/wiki/Greenspun's_tenth_rule

acqq · on April 5, 2014

Actually the project depends on the library which claims to implement the named rule:

https://github.com/orangeduck/mpc

JasonFruit · on April 4, 2014

If I decided to work through this, and thought it was really good, I'd be unable to make a donation because I don't dabble in cryptocurrencies. Why not make donating "real" money possible, as well?

orangeduck · on April 4, 2014

Thanks for the suggestion. I'll open this option up if possible.

jafaku · on April 4, 2014

It's called "fiat", not "real".

pseut · on April 5, 2014

FWIW, cryptocurrencies are far more fiat than dollars. You can pay taxes in dollars, for example, so they're not even a pure fiat currency.

JasonFruit · on April 4, 2014

Not all non-cryptocurrencies are fiat currency.

jafaku · on April 4, 2014

I doubt he was talking about precious metals...

EdwardDiego · on April 4, 2014

If the author is reading this, thank you very much - I've always wanted to experiment with C, but never had a project to use it on. Writing a Lisp interpreter (compiler?) is not one that occurred to me, but it definitely interests me.

Cheers for your hard work. :)

mkhattab · on April 5, 2014

Thank you for writing this and sharing it for the common good. I've been wanting to relearn C since I've switched to Python for work. Hopefully this book will be good practice because I've been wanting to implement a Lisp interpreter on a micro controller for quite sometime. I have plenty of PIC18F4550 micro controllers lying around.

Also, I liked the cat pictures and hope you'll add more of those in the next edition, perhaps.

userbinator · on April 5, 2014

Someone needs to make a complementary "Learn Lisp and build your own C."

CGudapati · on April 4, 2014

How good is this book for a complete beginner? I am starting to learn C in my free time and I have no objective way to judge this book.

simias · on April 4, 2014

Skimming through the first chapters of the book I can say it doesn't seem to assume any prior knowledge of C or even a background in programming, it explains all the basic concepts (conditionals, types etc...)

That being said I'm not entirely convinced that implementing a programming language is the best toy project for learning C since one of the first things you have to write is a parser and we all know string handling in C is a pity.

That being said it looks very interesting, I know C quite well but I might read it for the "implementing lisp" parts.

orangeduck · on April 4, 2014

Author here. Actually a parser combinator library is used for the parsing. As you mention, I think there should be some interesting things inside for people with any experience in C.

Xophmeister · on April 4, 2014

I've only just had a quick scan through, but it looks thorough and well-paced; there are examples, exercises and useful looking pull-outs. Anecdotally, It definitely looks worthwhile for a self-teaching beginner and going in the "implementing your own LISP" direction is an interesting approach.

csl · on April 4, 2014

I also did a scan of the book; it seems pretty good!

However, if you're serious about learning C then I strongly recommend getting the K&R book [0]. It's short and quickly gets down to business --- the first chapter alone gives you a condensed but working overview of the language as a whole.

Even experienced C programmers seem to keep the book around as it's good as a reference as well.

I strongly recommend people to learn C. It's a small, beautiful and very powerful language and the lingua franca for language ABIs. In fact, many popular dynamic languages are implemented in C (Python, Ruby, Lua and countless others).

Your time will be well spent!

[0]: http://en.wikipedia.org/wiki/The_C_Programming_Language

HillRat · on April 4, 2014

Or King's C Programming: A Modern Approach, which lacks the conciseness (and affordability) of K&R but is probably the best single text out there, and unlike K&R is updated through C99.

krick · on April 4, 2014

In fact, this one (implementing Lispy thing, I mean) seems the best to me so far. Why? Paradoxically, because it's so obviously incomplete, so it makes you search something by yourself while you read it and helps to actually understand a little bit more.

Both K&R and "Modern Approach" lack that word "Modern" very much, yet are written with undertone "it's everything you have to know about C", when it isn't. Person who doesn't know C very likely doesn't know how OS works, what architecture layers are behind the software he uses every day. He just knows somehow he wants to learn C, but doesn't really understand what C is, yet he knows that virtually everything is written in C, and "everything" usually doesn't run in terminal, but processes and produces sound, images, video, can have GUI, use some external devices, run in parallel, run on GPU. Also, it's pretty obvious (especially if you've used some language like Python already) that nobody writes everything from scratch these days, but there're many libraries that proved to be useful.

And stuff like branching and cycles seems to be pretty obvious even for somebody without coding experience, as far as I can judge from what I've seen so far. Yet we have plenty books that spends 20 pages to explain "if" keyword and mentions every function in standard library (why?! it's 2014, people, we have cplusplus.com now!)and covers nothing in sense of what useful libraries are out there, what are these domains where you still should use C today (because feet to meter converter shouldn't be written in C today and you probably should choose Python if there's no specific reason to use C), what tools to use for testing and debugging and such.

And while it can be justified for K&R (jeez, how old is that book!) it's just ridiculous that book that has "Modern" (well, it was 2008, but still…) in it's title covers almost nothing of what person who wants to program in C should know today. Worse, you'll see that only after reading these 800+ pages.

"C the Hard Way" is a little bit better, but still not as good as this one. This one also isn't perfect, but is the best of what I've seen so far.

kjs3 · on April 7, 2014

Kim King wrote my Modula-2 textbook. If this is as good as that, it's probably something worth picking up.

yk · on April 4, 2014

I was one of the beta readers. The book makes quite a different trade off from most other beginner C books. Most other books follow the K&R model of short examples and a lot of detail on the topic in question, while Build your own Lisp is just one project. So you actually see how the parts fit together.

I would think that either way you can learn C. Which way is better for you depends if you are more motivated by going through one long project, or if you prefer to have many small ones.

visarga · on April 5, 2014

I think it's better at teaching Lisp.

DSMan195276 · on April 4, 2014

Admittedly, while I'm interested to look at this book, it really looks to me like you should come in with at least a basic knowledge of C. The book appears to hold your hand through most of what goes on to create a basic Lisp-like language, but it really jumps fairly quickly over the basic C stuff at the beginning so you can get to the coding. That may actually be fine for you if you already have a basic C knowledge though.

krick · on April 4, 2014

> it really jumps fairly quickly over the basic C stuff at the beginning

In fact, for me it seems to be the best thing about this book. It mentions everything you need to hear once to be able to use search engine and clarify anything you don't understand. So help yourself! You don't know syntax of "switch" statement? Google it (or duckduckgo, or whatever)! You don't know how to use printf? You've been shown the way to cplusplus.com already, take your time and find out everything you want to know about that function. Then come back for a new piece of information to think about.

The only two things I guess are missing is "make" (I'd mention it from the start to help everybody save some time, with gdb and valgrind) and a little bit more information about pointers from the very beginning, because searching for help about "free" function will be no good if you don't know about stack/heap allocations.

fsloth · on April 4, 2014

Like others, I would suggest K&R. Perhaps after that, Lourdon's "Mastering algorithms with C". I've found that structuring any program around well understood datastructures and keeping it KISS after that takes you a very long way. Also applies to C. Like others have commented, stylistically the code in the linked book is perhaps not the best place to learn the best conventions for C. But! Writing an interpreter gives a huge kick and reading other peoples code is always nice.

For real world projects, it's not a bad idea to implement your own memory manager (even if it's only a wrapper for malloc and free). For real world interpreters Lua (www.lua.org) is written in C, is quite widely used and is not too large - if interpreters are your interest. If you are into lisps Structure and Interpretation of Computer Programs http://mitpress.mit.edu/sicp/full-text/book/book.html contains a the instructions for writing a lisp interpreter in any language.

sanxiyn · on April 4, 2014

I disagree with recommendation of Lua. Lua is a production-grade interpreter, and a production-grade interpreter is usually not a good educational material for those "interested in interpreters".

In particular, Lua uses no intermediate representation (there is no Lua AST!), which makes its compiler portion a quite confusing read. Its interpreter portion is no less confusing, being a register VM with a register window scheme and a non-recursive eval using goto and a separate call stack. Do not be fooled by "not too large" aspect. Lua is worth studying, but it is definitely not a beginner material.

z3phyr · on April 4, 2014

There is one excellent book I would recommend to everybody -> http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_H...

adamcanady · on April 4, 2014

At my undergrad institution, we have a class that does a project very similar to this. I recently completed the class and project and have a bit of advice for anyone looking to take up this challenge on their own.

First off, a small note: our project used C to implement Scheme (a lisp dialect), so it's similar but not exactly the same as this.

I'd recommend starting off with reading about the principles and coming up with a solution to each of the problems on your own instead of following a specific pattern as outlined in the book. For example, in our project, we decided to learn about the C basics, then just figure out how to make it 'act' like Scheme if given an input. Eventually, we thought of doing everything in a linked-list style to make organization and recursion easier and more natural, but coming to this conclusion on our own was very helpful.

Another thing is valgrind. As far as I could find, the text only mentions valgrind in one paragraph, but it's an excellent tool to check for memory leaks and errors.

Also, as mentioned in the book, a bonus is adding in GC. This turns out to be a pretty easy and a fun exploration of the different techniques available if you try a couple out for performance.

Our code in case you're interested: https://bitbucket.org/adamcanady/cs251_interpreter/src

_19qg · on April 4, 2014

Fine to implement something. But I would wish that the language implemented actually looks like Lisp or Scheme.

Currently this claims to be Lisp, but it is some strange version of it.

Using a Scheme or Lisp has a lot of advantages:

* one can compare it with other actually working implementations

* there is already a wealth of material to learn from or where code could be reused

* many books exist

* the language actually has already got some thought

A good example is the book on Scheme 9:

http://www.t3x.org/s9book/index.html

SixSigma · on April 4, 2014

On the compiler course taught by Brucee (Bruce Ellis - one half of Mark V. Shaney - the other half being Rob Pike) he got you to do two projects : write a Lisp implementation in C, and a C compiler in Lisp. Most instructive.

If you have an HP printer with a Postscript renderer and you can get an image of it, you can find a digitised photograph of him too :)

swah · on April 4, 2014

Would tcc be good enough a compiler for this? Its very easy to install on Windows compared to Mingw/cygwin etc.

sanxiyn · on April 4, 2014

akkartik · on April 4, 2014

In similar vein to tomp's comment: if someone's interested in the bonus project of implementing macros, I've implemented a simple lisp with first class macros (fexprs): http://github.com/akkartik/wart. Moreover, you can write anonymous macros out of more primitive constructs.

Over time it's accumulated other features: http://akkartik.name/post/wart. But hopefully it's still easy to understand, because it has thorough unit tests. If you have a question about some line of code you can try changing it. Seeing what tests break is a great way to get feedback as you learn, in my experience.

mcescalante · on April 4, 2014

I've already programmed quite a bit of C and C++, and this seems like a great read + resource. I'd be curious to know how a non-developer or somebody coming from something like Ruby or Python that had never developed C felt about this.

wedesoft · on April 4, 2014

I was trying to implement an interpreter for a minimal (lazy) language in C [1]. It uses Church encoding and other encodings. E.g. here are the definitions of false, true, identity, and pair:

    f_ = proc_self(lambda(v0));
    t_ = proc(lambda(v1), f());
    id_ = proc(v0, f());
    pair_ = lambda3(op_if(v0, v1, v2));

For the moment I gave up on it though but maybe it might serve as inspiration ;)

[1] https://github.com/wedesoft/blc/blob/master/src/x.c#651

arcatek · on April 4, 2014

PL101[1], from Nathan's University, is also a pretty good way to learn the basics of programming language compilers.

[1] http://nathansuniversity.com/

quakkels · on April 6, 2014

Thanks for writing this. I find your writing style to be very easy to follow. I've wanted to play with a toy language for some time, so I'm very excited to dive into your book with both feet. I'm working on Windows and am up to chapter 6. Very enjoyable. The only suggestions I would offer is perhaps to mention that AST stands for Abstract Syntax Tree. It seems this book is targeted to less experienced programmers, so it may be helpful to mention what AST means in this context. Thank you for your hard work and willingness to share your knowledge.

quakkels · on April 6, 2014

Well, just made it to the next chapter where you talk about AST. :-)

ilbe · on April 4, 2014

Wow, very timely, yesterday I needed to brush up on C and was wondering how I would go about it. Thanks. Very minor note: the link to "Learn C the Hard Way" on the faq page links back to the faq page.

orangeduck · on April 4, 2014

Thanks. I've fixed the link.

zafka · on April 4, 2014

Very Cool, I don't want to spend much time here now as I am at work......, but I love the concept. I have been using C for almost twenty years, and I think this would be a fun check.

pflanze · on April 4, 2014

I've just begun starting a group to create a Scheme implementation from scratch in London (UK). If you're already somewhat proficient in Lisp/Scheme and interested in participating, see [1] for further info and send me a message through meetup.com or through the link on my contact page.

[1] http://www.meetup.com/London-SICP-Study-Group/messages/board...

kyllo · on April 4, 2014

Thanks for sharing, this is really helping me to understand in a little more detail how compilers work. And the doge example gave me a good chuckle.

codemonkeymike · on April 5, 2014

I had made a lisp interpreter using this book I found in my schools library. http://www.amazon.com/Data-Structures-Advanced-Approach-Usin...

sgy · on April 4, 2014

Of course C is not the only option, and perhaps not an option at all for newbies. JavaCC can be a good medium to build your own language too. They recently added the ability to generate C/C++ code (https://java.net/projects/javacc)

Is there any tree building capabilities using C?

quantabytes · on April 4, 2014

Your Bitcoin and Dogecoin addresses are mixed up.

EDIT: Fixed now.

stantona · on April 4, 2014

This is extremely cool, for someone with some C experience but wants to dig into writing a lisp.

matt_heimer · on April 4, 2014

There is also a pretty good Compilers MOOC going on right now https://class.coursera.org/compilers/lecture/preview

gregburd · on April 5, 2014

My favorite small and understandable implementation of Lisp in C is by Ian Piumarta. [1]

[1] http://piumarta.com/software/lysp/

jameshk · on April 4, 2014

This is awesome, I will certainly be using this to learn more Lisp.

crnixon · on April 4, 2014

Is the source of the text on GitHub or elsewhere? I'd love to send pull requests for some of the things mentioned in these comments, like magic numbers not being in constants.

orangeduck · on April 4, 2014

Hi crnxion. The source can be found here: https://github.com/orangeduck/BuildYourOwnLisp

I welcome corrections/edits.

donniezazen · on April 4, 2014

I am new to programming. What is it so special about Lisp that I keep hearing about it in Hacker's News (often in Machine Learning and from old programmers)?

agentultra · on April 4, 2014

You will get different answers from different people. There doesn't appear to be any commonly accepted universal truths in Lisp (otherwise we'd all be programming in some variant of it by now). However there are plenty of good ideas which many more popular languages have adopted.

What made it special for me was discovering the link between symbolic computing and the lambda calculus. The kernel at the center of every Lisp is a substrate upon which languages are built. eval knows nothing about your program but can read some Lisp forms and turn itself into a machine that can execute it. This is immensely powerful. From a shockingly small number or primitives you could build, theoretically, any language you need.

The more superficial qualities, for me, are a distinct lack of frivolous, obfuscating syntax; its uniformity; and its direct representation. I don't have to maintain an exhaustive mental mapping of syntactic digraphs and operators, I don't have to remember precedence rules, I don't have to remember the grammatical distinctions between expressions and statements, and I certainly don't have to maintain a running interpreter/compiler in my head. Armed with a simplified mental model of the lambda calculus such as the substitution method is enough to work out what any program is doing (and indeed due to the nature of Lisps is also easy to interactively verify).

Every other language I've worked with requires memorizing a laundry list of special cases and weird rules (ie: how array and pointer parameters are changed by the compiler in C, precedence rules as mentioned, etc) imposed for efficiency's sake and often maligned with good engineering practices. Some strange monkey-brained part of me fools myself into believing I am becoming a better programmer when I pollute my mind with these things... but I've learned over the years that it's just a trick. The ideas are important and the implementation is just the painful cost we pay to make them reality.

Lisp just happens to have the least cognitive load in my case.

platz · on April 4, 2014

I agree with everything, though the idea of least cognitive load I think only runs skin deep (syntax).

This helpful for small scripts which need to be scanned quickly, but anything non-trivial will already have a significant layer of abstraction which takes time to parse.

Lisp people love their macros. Clojure has fancy data structures and control flow. It will take time to understand these things whether its in lisp or sandskrit.

agentultra · on April 4, 2014

> the idea of least cognitive load I think only runs skin deep (syntax)

Perhaps... I'm not aware of any empirical study into the matter so my claim is mere speculation.

There are, for example, plenty of highly productive Perl and Haskell programmers. Those languages are notorious for the gobs of arcane syntax. And yet their proponents claim it's an advantage.

"Cognitive load," in my case refers to the amount of information about the language and its compiler/runtime I have to recall in order to estimate how a given piece of code will execute.

kazagistar · on April 5, 2014

Enough use of macros means cognitive load for even simple code can be massive though. You can't really know what the result will be, especially if the entire program can be rewritten in some cases by a macro. If you constrain their use enough, sure, you will not run into these problems. But in any language it comes down to cognitive load vs expressiveness, in how you chose to wield it. Generally, more dense code requires more thinking to understand, and the resulting complexity has to do with the underlying structure, rather then the particulars used to represent it.

Dewie · on April 4, 2014

> There are, for example, plenty of highly productive Perl and Haskell programmers. Those languages are notorious for the gobs of arcane syntax.

I don't see anything arcane about Haskell's syntax. There are no surprises in vanilla\* Haskell syntax - no cruft in the syntax of expressions, no complicated syntactic sugar, you can see what is and isn't an infix function at a glance, same with type constructors (capital letters), etc.

What might be arcane is some peoples use of user-defined infix operators. But I don't know if I would lump that in with ''Haskell's syntax'', since that isn't part of the grammar of the language. But YMV.

\* I can't speak for GHC extensions or template Haskell

kyllo · on April 4, 2014

Yeah, and it's important to remember that programming languages are for humans, after all. So if this paradigm is easier to reason about and makes you more productive (than coding in something imperative like C or Java or Python), then that makes it a great language for you.

yk · on April 4, 2014

Basically Lisp is based on a very different paradigm than most mainstream languages. So if you know Lisp and program in something else, then half of the time you are reminded that the current problem would be really easy in lisp.

JimmyM · on April 4, 2014

Homoiconicity is a neat thing when you just get started, although there's some interesting discussion around that subject here - http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-...

muhuk · on April 7, 2014

Thanks for sharing. Interesting discussion.

I think the author confuses the concept (homoiconicity) and it's practical consequences. I think this quote from the comments below summarizes it:

    > But the fact of the matter is that programmers interact with text, not with data structures.
    
    Not Lisp programmers. It's trivial to implement an environment where code will be represented graphically and won't be ever represented as text.
    
    That's, actually, the point: language semantics isn't tied to textual representation.

sum_guy_juno · on April 4, 2014

Simplicity and Homoiconicity. Its an all around elegant language.

osho · on April 4, 2014

Can i get an pdf/epub version of this book?

skittles · on April 4, 2014

Anyone know why clang on mac os x doesn't know about edit/history.h? The line editing portion of the book requires it.

exef · on April 5, 2014

You can try to install with MacPorts Readline with terminal command: sudo portu install readline

And after that you include with #include <readline/read line.h> #include <readline/history.h>

jdp · on April 4, 2014

That would be for libedit: http://thrysoee.dk/editline/

jdstraughan · on April 4, 2014

I've installed this, but I still get this error:

fatal error: 'editline/history.h' file not found

Any advice would be greatly appreciated.

melonman · on April 5, 2014

Are you using Mavericks? You need to import the library using #include <histedit.h>.

Other problem is that the code won't work with this library. You need to rework it using the following example: http://www.cs.utah.edu/~bigler/code/libedit.html

There's a lot more to set up and tear down when using the Mavericks' editline; it's not so user-friendly for a beginner as the code in the example.

aaronsnoswell · on April 5, 2014

Got halfway through this today - most fun I've had programming in ages. Thanks for the link and thanks to the author!

NAFV_P · on April 4, 2014

> Cat • cannot speak or program

A caption for a picture from chapter 5, preceding a look at the grammar of Doge - DogeLisp anyone?

leondutoit · on April 5, 2014

Awesome. Hope I can make time to go through it - looks like good fun. Thanks for putting it out there.

SneakerXZ · on April 4, 2014

Does it use a generator to generate lexer and parser or author is suggesting to write them manually?

mcmillhj · on April 4, 2014

I imagine if the goal is to design from the ground up, you wouldn't be using a generator.

orangeduck · on April 4, 2014

Author here. A parser combinator library is used to do the parsing.

hamburglar · on April 4, 2014

My one criticism is that it's a fairly big leap from learning C from ground zero to being a client of a complex, high-level library like mpc. I feel like the moment you hit the call to mpca_lang, anyone who really was learning C at the beginning shifted from "learning C" mode to "copy and paste C code I don't understand" mode.

SneakerXZ · on April 4, 2014

Too bad, always everyone who teaches how to write a programming language uses generators but actual production programming languages never use them. Except of a few langugages.

I wonder why...

kevinnk · on April 4, 2014

It's hard to generate high quality error messages with current generators unfortunately. More common nowadays is building first with a generator and switching to recursive decent when it becomes too unwieldy.

yk · on April 4, 2014

You should perhaps submit it to r/C_Programming/ since you looked there for beta readers.

elwell · on April 4, 2014

This looks really cool. I wish I had the time to read it.

aidenn0 · on April 4, 2014

Out of curiosity, what are you doing with your time that is currently more important than reading it?

elwell · on April 4, 2014

Working on my startup, a social photo/video aggregator that organizes by event, and has B2C and B2B aspects: http://wesawit.com

Any feedback is appreciated.

visarga · on April 5, 2014

It' slow.

DOM loaded in 59 seconds.

Content loaded in 180 seconds and weighs 12MB

It's also downloading mp4 files in the background.

....

4 minutes passed and it is still working on the banner-video.mp4

Also, it has problems with scrolling (freezing). I use a MacBook Air 11'' from 2012.

zenlikethat · on April 5, 2014

I get DOMContentLoaded in ~1.41 seconds on from here, but the background video downloading is worrisome.

Grandparent, currently your website has slurped 31.3 MB (and counting!) over the wire for a simple landing page. What gives man? Seems that everything which should be setup on your page already, is.

halayli · on April 4, 2014

lval* v = malloc(sizeof(lval)); v->type = LVAL_QEXPR;

For goodness sake, when teaching C, make sure to check alloc's return values.

notastartup · on April 4, 2014

Using this guide, is it possible to create a programming language in a non-English language?

Jemaclus · on April 4, 2014

The basic principles apply whether you're using English or not. You should theoretically be able to write a programming language in any real-world language you want.

JoelMcCracken · on April 4, 2014

Can anyone briefly explain what makes the lisp that the author develops unique?

pjmlp · on April 4, 2014

Please don't. Rather pick something like "Compiler Construction" from Niklaus Wirth and learn how to write compilers using memory safe system programming languages.

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

orangeduck · on April 4, 2014

I wrote this book over several months. It took a few days to get the courage to share it online. It probably took you about 20 seconds to not read my book, dismiss it, and post a comment telling other to do the same. If you think an alternative is better that's fine, but how about you take an extra 20 seconds to think about the situation next time you are going post a comment.

Iftheshoefits · on April 4, 2014

The person you're responding to has a history of hating on C around here. I wouldn't take what he wrote personally (easier said than done, I know). That said, I offer the following:

Almost any language can be (ab)used in an "unsafe" manner. Conversely, it is entirely possible to create elegant, efficient, and safe code using C, and there is no particular reason to not use C for this project. In fact, C is marvelous in its simplicity and is an excellent candidate for it.

On a scale of 1 to 10 I'd put my skill with C++ around a 6 or 7. For C I'd go lower, around 3 or 4. I appreciate the work you put into this, and I intend to read your book to help me improve my skills with C.

So thanks for your effort.

groovy2shoes · on April 4, 2014

> In fact, C is marvelous in its simplicity and is an excellent candidate for it.

While I don't disagree with your comment, I'd like to point out that the Oberon language (used in pjmlp's suggested book) is even simpler than C. It's worth checking out if you value simplicity in your programming languages.

http://www.inf.ethz.ch/personal/wirth/Oberon/Oberon07.Report...

iiiears · on April 6, 2014

@OrangeDuck - Just wanted to thank you. Your style is clear and ideas progress easily from the first chapter onward.

The links offered in this thread are wonderful, The University links are impressive. HN is a tuition free education.

karuth · on April 5, 2014

I had the time to read a good portion of this book. Really well written. I hope some of the negative comments don't discourage you from sharing your work next time around. I learnt something useful this time.

zenlikethat · on April 5, 2014

OP, I hear you on this one and commend you for sticking it to the trolls. For what it's worth, I LOVE the book and think you rock for making it.

antocv · on April 4, 2014

I took a look at your book and was immersed unil my eyes started itching from strain, I got to the variables chapter.

Very well written, fun to read and immensly useful. Wish I had your book 10 years ago. Thank you.

pjmlp · on April 4, 2014

Sorry but I think the world is better server by younger generations learning that C is not the only way to do systems programming and there are better, safer, alternatives.

zafka · on April 4, 2014

Perhaps you could suggest your favorite method as an alternative? If I get to know you, i might change my opinion, but from the little I know I would suggest you avoid a career in teaching.

pjmlp · on April 4, 2014

I did suggest an alternative book.

As for your remark, I did teach back in 1998 - 1999 for first year CS students, Pascal and C++.