Show HN: Lisp in C Tutorial Series

kazinator · on Jan 14, 2020

  >         } else if (strcmp(op_symp->valuep, "if") == 0) {
  >             return sf_if(argsp, resultpp, envp);

Why have a "part 1: symbols" if you're going to end up dispatching operators based on strcmp?

A proper Lisp interpreter tests whether an atom is the if symbol using eq comparison, not string comparison. That's a very fundamental thing in Lisp, and is part of what makes Lisp dialects languages for "symbolic processing".

eq comparison just tests whether the left object and the right object are the same object. So in implementation terms, it might compare two machine words (which might be addresses).

In other words, if your objects are represented by pointers in C, then it's just:

   } else if (op_symp == if_s) {
     return ...
   }

where if_s is a variable or macro or whatever that refers to the if symbol. Somewhere in the code, at startup, that gets interned:

   if_s = intern("if" /*, ... additional arguments, like what package */);

When we read the input expression, the "if" text will also get passed to the intern function, which will return the same pointer. So then op_symp will be the same object as if_s, making the comparison true.
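To make that concrete, here is a minimal sketch of what such an intern function might look like. The symbol struct, the linked-list table, and the is_if_form helper are assumptions for illustration, not the tutorial's actual code; a real implementation would likely use a hash table and take a package argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol object: the name is stored once, at creation. */
typedef struct symbol {
    char *name;
    struct symbol *next;   /* next entry in the intern table */
} symbol;

/* Simplest possible intern table: a linked list of all interned symbols. */
static symbol *symtab = NULL;

/* Return the unique symbol named `name`, creating it on first use. */
symbol *intern(const char *name)
{
    assert(name != NULL);

    for (symbol *s = symtab; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;              /* already interned: same pointer */

    symbol *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    s->name = malloc(strlen(name) + 1);
    if (s->name == NULL)
        abort();
    strcpy(s->name, name);
    s->next = symtab;              /* push onto the table */
    symtab = s;
    return s;
}

/* Evaluator-side check: one pointer comparison, no string scan. */
int is_if_form(const symbol *op, const symbol *if_s)
{
    return op == if_s;
}
```

Note that strcmp still appears, but only inside intern, which runs when text is read. By the time the evaluator sees the form, the operator is already an object and the dispatch is a pointer comparison.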

The point is not just that this is faster, which it is: comparing two machine words beats scanning strings character by character every time we dispatch on a symbol. The point is to be working with structured objects and not raw text, even when it comes to identifiers.

One nice thing about symbols being objects is that if we create a brand new symbol object without interning it, then it is guaranteed to be unique (different from any other symbol and indeed any other object). That is true regardless of its name; even if its name clashes with that of an existing symbol, they are not the same object. In C terms, memory allocation always produces a new object that is different from any other live object in the program.
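A rough sketch of that uniqueness property, assuming a hypothetical make_symbol constructor that always allocates and never consults any table (comparable in spirit to make-symbol in Common Lisp):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol object for illustration. */
typedef struct symbol {
    char *name;
} symbol;

/* Always allocate a fresh symbol; no table lookup, so no sharing. */
symbol *make_symbol(const char *name)
{
    assert(name != NULL);

    symbol *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    s->name = malloc(strlen(name) + 1);
    if (s->name == NULL)
        abort();
    strcpy(s->name, name);
    return s;
}
```

Two calls to make_symbol("if") give two distinct pointers whose names compare equal as strings, so an eq-style test tells them apart while a strcmp-based test cannot.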

Also note that the string "if" is not the value of the symbol if, but its name. For example, in ANSI Common Lisp, (symbol-name 'if) -> "IF". What we understand to be the value of a symbol is its binding to an object in some environment, which has no effect on its name.
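The name/value split could be sketched like this. The binding struct, the int stand-in for a Lisp object, and the lookup function are all assumptions for illustration; the only point is that the value lives in the environment, keyed by the symbol's identity, while the name stays on the symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol: it carries a name, not a value. */
typedef struct symbol {
    const char *name;
} symbol;

/* One environment entry: binds a symbol to a value. */
typedef struct binding {
    const symbol *sym;
    int value;                     /* stand-in for a Lisp object */
    const struct binding *next;    /* enclosing bindings */
} binding;

/* Find the innermost binding of sym, comparing by identity.
   Returns 1 and stores the value in *out if bound, else 0. */
int lookup(const binding *env, const symbol *sym, int *out)
{
    assert(out != NULL);

    for (; env != NULL; env = env->next) {
        if (env->sym == sym) {
            *out = env->value;
            return 1;
        }
    }
    return 0;
}
```

The same symbol can be bound to different values in different environments, or be unbound, and its name is the same in every case.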

ksaj · on Jan 14, 2020

This kind of error shows up often when people who are freshly learning something decide they should make a tutorial of it before they even understand the subject at anything nearing an authoritative level. It literally means everyone who follows the tutorial is doomed to make the same mistakes - it becomes part of their foundation since "that's how I learned it!" And if they're not still on board when/if the mistakes are discovered and corrected, the misinformation spreads virally.

Nobody should ever follow a tutorial from someone who is still in the learning phase. Those mistakes are part of the learning, and they certainly don't belong in a tutorial. It's better to make your own mistakes than to make someone else's mistakes part of your repertoire.

It's also why there are a gazillion unfinished tutorials out there. People who are not at all ready or capable of making sensible tutorials eventually peter out, either because it is more complicated than they expected, or more work than they expected, which leaves incomplete and highly erroneous material behind to spread virally to unwitting people who unfortunately found the wrong material to learn from.

cellularmitosis · on Jan 14, 2020

mostly, I'm writing the tutorial I wish existed.

cellularmitosis · on Jan 14, 2020

Thanks for the feedback! Here is an earlier version of the reader which featured symbol interning: https://gist.github.com/cellularmitosis/10656a3fbc1690bdaa50...

I decided to leave it out until I introduce a hash table implementation.

EDIT: also, thanks again for your work on TXR!

kazinator · on Jan 14, 2020

You should make this a URL submission so that clicking on it goes to that gist URL; then add your above blurb as a comment.

cellularmitosis · on Jan 14, 2020

ah, drat