Hacker News
The First Lisp Compiler (texdraft.github.io)
208 points by texdraft on June 1, 2022 | hide | past | favorite | 52 comments


> In LISP 1.5, function definitions were stored on the property list of the function's name...

The fact that this was once the norm but has since been done away with saddens me. I've always been fascinated with "live environments" but felt they only went halfway if they didn't include the source code itself. If I'm going to be updating something in a running system, I want to know what source code was used to get the system to the state it's currently in, and preferably be able to query that source code as data. Of course, that code could be kept within source control, but then it's a shadow of the running system -- a map of the territory and not the territory.

As far as I know, the only languages/environments where this functionality is still available are Tcl, Smalltalk, and to some extent stored procedures within an RDBMS.


Forth as well. ANS Forth defines the word SEE <https://forth-standard.org/standard/tools/SEE> that shows the current source of a given word.


Minor caveat on SEE. If the Forth implementation uses "threaded" code the resulting source code can be quite close to the original text.

If the Forth compiler generates native code the one-to-one relationship with the source is lost and SEE will typically show the compiled code as Assembly Language.


JavaScript does this: functions have a toString() method which returns their source code. Python also does something like this. Though it's just a debugging tool; you can't do much practical with it in real applications (like editing it and instantiating a new function from it) because it doesn't include the closure the function was defined in.
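
As an aside, the Python version of this is inspect.getsource, which works for any function whose defining file is available on disk. A minimal sketch, using a stdlib function so the source is guaranteed to be findable:

```python
import inspect
import textwrap

# inspect.getsource retrieves the text of a function's definition,
# much like calling toString() on a JavaScript function.  It needs
# the defining file on disk, which is why a stdlib function is used.
src = inspect.getsource(textwrap.dedent)
assert src.lstrip().startswith("def dedent")

# As noted above, the string alone doesn't capture the closure the
# function was defined in, so it can't faithfully round-trip through
# compile()/exec() in the general case.
```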


Also same for Rebol & Red.

And it's easy to view the source code from the console (REPL). Some examples from Rebol2...

    >> source func

    func: func [
        "Defines a user function with given spec and body."
        [catch]
        spec [block!] {Help string (opt) followed by arg words (and opt type and string)}
        body [block!] "The body block of the function"
    ][
        throw-on-error [make function! spec body]
    ]


    >> source source

    source: func [
        "Prints the source code for a word."
        'word [word!]
    ][
        prin join word ": "
        if not value? word [print "undefined" exit]
        either any [native? get word op? get word action? get word] [
            print ["native" mold third get word]
        ] [print mold get word]
    ]


Even when you can access and modify the source in-place, it may be tricky when closures are involved.


This is not the norm in Forth due to the very spartan environments it originated in, but most mature implementations include a pretty comprehensive decompiler as SEE. (Ironically, this is where the Forth claim of having a modular compiler comes apart, because SEE is always monolithic and nonextensible, or at least I haven’t seen it done any other way.)


Doesn't SBCL have this functionality?


The DOCUMENTATION function shows documentation from a function's definition.


R has this (being secretly a lisp that only looks C-like on the surface).


many Lisp systems record the source location and/or the source itself (especially a source level interpreter needs that)


Do you mind enumerating what those systems are? I've only played around with Allegro CL and SBCL myself.


LispWorks for example has a documented and extensible mechanism to record the locations of definitions:

http://www.lispworks.com/documentation/lw80/lw/lw-dspecs-ug....

Common Lisp has standard function FUNCTION-LAMBDA-EXPRESSION:

  * (defun foo (a)
      (my-if (> a 10) 'big 'small))
  FOO

  * (function-lambda-expression #'foo)
  (LAMBDA (A) (BLOCK FOO (MY-IF (> A 10) 'BIG 'SMALL)))
  T
  FOO


Maybe collectively we can. In Emacs/SLIME/CCL,SBCL on ARM32,x86-64, ESC + '.' (command 'find-tag') on a symbol in the REPL opens a buffer with the definition of that symbol, provided you have the sources (it's neither decompiling nor reading the definition from the binary, but follows a hint to the source file). So this is more a functionality of the environment / tools, rather than the compiler or language. Works beautifully though, and in most cases you don't really want the sources embedded within the binary.


> So this is more a functionality of the environment / tools, rather than the compiler or language.

Doesn't FUNCTION-LAMBDA-EXPRESSION apply here?


Didn't even know about this one. It is one of those 'optional' features -- the standard puts it like "Any implementation may legitimately return nil as the lambda-expression of any function." And indeed, CCL chooses to do just that. SBCL seems to put in some effort to decompile the function.


Worst case scenario, you can probably patch your compiler's COMPILE function to store the data somewhere before performing the actual compilation.
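
A rough Python analogue of that idea (recording_compile and _source_registry are invented names for this sketch): wrap the compile step so it stashes the source text before handing it off.

```python
# Hypothetical sketch: a compile wrapper that records source text
# before compiling, analogous to patching a Lisp COMPILE function
# to stash definitions somewhere first.
_source_registry = {}

def recording_compile(name, source):
    _source_registry[name] = source                # stash the source first
    return compile(source, f"<{name}>", "exec")    # then actually compile

code = recording_compile("foobar", "def foobar(a, b, c):\n    return a * (b + c)\n")
ns = {}
exec(code, ns)

assert ns["foobar"](2, 3, 4) == 14
assert "def foobar" in _source_registry["foobar"]
```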

As for CCL, it seems to work for me just fine as long as you set CCL:*SAVE-DEFINITIONS* to T:

    Clozure Common Lisp Version 1.12.1 (v1.12.1) LinuxX8664

    For more information about CCL, please see http://ccl.clozure.com.

    CCL is free software.  It is distributed under the terms of the Apache
    Licence, Version 2.0.
    ? (defun foobar (a b c) (* a (+ b c)))
    FOOBAR
    ? (function-lambda-expression #'foobar)
    NIL
    NIL
    FOOBAR
    ? (setf ccl:*save-definitions* t)
    T
    ? (defun foobar (a b c) (* a (+ b c)))
    FOOBAR
    ? (function-lambda-expression #'foobar)
    (LAMBDA (A B C) (DECLARE (CCL::GLOBAL-FUNCTION-NAME FOOBAR)) (BLOCK FOOBAR (* A (+ B C))))
    NIL
    FOOBAR


CCL has a feature called source-note to record locations and source.


Not a Common Lisp implementation but Clojure can find the source for a symbol (provided the source code file is in the classpath) with the clojure.repl/source function (doc: https://clojure.github.io/clojure/clojure.repl-api.html#cloj... .) All development environments support jump to definition.


And Emacs


Julia has this


Tail call elimination!!!

I had no idea that this was in LISP 1.5. If you had asked me, I would have sworn it was Steele, 1977. Wikipedia supports that[1], albeit one might not consider it the most reliable source.

So apparently (partial) TCE has been around since at least 1961. In that light, it's baffling that it's not supported more universally.

[1] https://en.wikipedia.org/wiki/Tail_call


The OP describes the support in HLC for TCO as limited (only for self-recursion, if I get it right, possibly more limited than that). Steele may still have been first to describe general TCO.


Tail call optimization in Scheme is automatic. This LISP 1.5 compiler simply recognizes certain patterns and rewrites them. Not quite the same thing, but I was surprised as well when I first looked into the code.


You could edit Wikipedia, citing Hart and Levin's memo introducing the compiler.


Note the date (1961), and the target hardware (IBM 7090 series), when considering whatever came after and existing hardware capabilities.


Funnily enough, I was talking with someone who wanted to make a Lisp run on a very small ARM SoC, and we discovered that the 7090 that Lisp 1.5 was developed on likely had more core+drum than the SoC had RAM.


Not even uLisp? That looks like a PIC-like CPU to me.


This might have been before uLisp (definitely prior to my knowledge of uLisp), and they wanted to write the interpreter for it themselves. IIRC it had 16k of RAM, so it definitely could have run uLisp.

[edit]

I should also point out that just storing the names of all of the symbols in the common lisp specification exceeds the RAM requirements of uLisp. Obviously builtins can go in ROM, but it gives you an idea of the sizes involved.


I see, thanks for the overview.


uLisp was too large for a small flight computer I made for a hobbyist rocket - I wanted to use it but had so little RAM I had to go down to C. I think I could have run lisp-to-c to generate from uLisp, which I'll try on my next project.


In such contexts it is more like assembly with a C-like macro processor than proper ANSI C anyway, and if you go the code generation route, then you would be better off using the DSL capabilities of Lisp to generate assembly directly.


People really underestimate how much RAM a Lisp/Scheme needs.

Building lists out of pairs and then using them as your intermediate format creates a lot of garbage.


> Building lists out of pairs and then using them as your intermediate format creates a lot of garbage.

I wonder if a Lisp-like language which used vectors rather than pairs as its fundamental data structure might be a better fit for severely memory-constrained systems? On average, a vector takes about half the memory of the equivalent list (one word per element instead of a two-word cons cell).

Such a language would end up looking rather different to Lisp though. Cons cells encourage CAR/CDR and recursion. A vector-centric language would naturally lead to a more iterative programming style.
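
The memory gap is easy to demonstrate in any language. Here is a sketch in Python, modelling cons cells as 2-tuples; the exact byte counts are CPython-specific (and larger than Lisp's two-words-per-cell), but the direction of the comparison is the point:

```python
import sys

# Build the same 1000-element sequence two ways and compare footprints.
# A cons-cell list is modelled as nested 2-tuples; a vector as a single
# Python list.  Both measurements exclude the element objects themselves.
n = 1000

cons_list = None
for x in reversed(range(n)):
    cons_list = (x, cons_list)          # one (car . cdr) pair per element

pair_bytes = 0
node = cons_list
while node is not None:
    pair_bytes += sys.getsizeof(node)   # one allocation per element
    node = node[1]

vector = list(range(n))
vector_bytes = sys.getsizeof(vector)    # one header + one slot per element

assert vector_bytes < pair_bytes        # the flat layout is much smaller
```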


Lisps have supported vectors since the mid-'70s....


I know that, but that wasn't my point. Yes, all major Lisps support vectors; but they all use lists a lot more than they use vectors. One of Lisp's major features is "code as data", but that code is almost all lists of lists (as opposed to vectors of vectors). I was talking about a Lisp-like language in which vectors, not lists, are the primary data structure. Possibly even one which doesn't have cons cells at all, only vectors. I am suggesting such a language might possibly work better in severely memory poor environments than a more traditional Lisp would.


Then you're also aware that one of the beautiful things about Lisp is how it allows you to abstract the hardware beyond the core primitives; there is nothing that prevents an optimizing Lisp compiler from mapping what look like lists to vectors at the machine-code level.

One reason why such an optimization isn't common is most likely the current usage of Lisp across the industry, and the commercial implementations being a niche product.


How would you save memory using vectors for example relative to using CDR-coding?


In principle, you can save the same amount of memory with CDR-coding.

In practice though, you run into some issues (1) RPLACD/set-car! makes CDR-coding much more complicated (I suppose you can just ban it – in Racket, pairs are immutable by default, although there is also a separate mutable pair type); (2) CDR-coding generally only works if you construct the list up-front, the existence of CONS can encourage a coding style in which you don't do that; (3) to fix (2), you can force the list to be CDR-coded by duplicating it, but you have to remember to do that at right points – if you don't do it, you'll miss out on the benefits of CDR-coding, but if you do it when you don't have to you are unnecessarily harming performance; (4) since CDR-coded and non-CDR-coded lists are the same type, and indistinguishable without peeling off the covers of the implementation, it makes it harder for the programmer to address (2) and (3) correctly.


Well 1MB is more than enough; that was the maximum virtual address space size of a PDP-10 where many of the precursors to common lisp were developed. You can do a fair amount with 100kB as well. uLisp runs with 34k minimum (32k ROM + 2k RAM).


People really should learn the Lisp/Scheme data types beyond lists, and the value types.

Hint: they have been supported since Interlisp-D, ZetaLisp, Common Lisp, ... so there is enough documentation and books with which to educate yourself.


The RAM usage is tied to the size of the reachable set, plus some slack filled with garbage that depends on how you tune the garbage collection thresholds.

By today's standards, the RAM usage isn't necessarily huge.

Here is the TXR Lisp compiler recompiling stdlib/compiler.tl -> stdlib/compiler.tlo, as seen in top:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
  11488 kaz       20   0   17800  14804   2964 R  98.1  0.7   0:07.07 txr   
                           ^^^^^  ^^^^^
On the order of a bash session. It's a lot of RAM by 1982 standards at the institutional level, and even by 1992 standards at the consumer level, but today it means nothing.

You can easily see a Bash process with a footprint on that order.

It could be reduced by tuning the garbage collector. One way to do that is to build for less memory use (useful for embedded). Here it is with txr rebuilt using #define CONFIG_SMALL_MEM 1 in config.h:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
  12838 kaz       20   0   11964   9768   3140 R  99.0  0.5   0:10.39 txr         
Bash footprints for comparison:

  $ ps aux | head -1 ; ps aux | grep bash
  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  kaz       1093  0.0  0.1   9288  2132 pts/2    Ss+  May15   0:01 -bash
  kaz       2833  0.0  0.0   8904  1992 pts/0    Ss   May15   0:00 -bash
  kaz       3509  0.0  0.2  10532  4988 pts/1    Ss+  May15   0:28 -bash
  kaz       7898  0.0  0.1   8968  2212 pts/3    Ss+  May20   0:00 -bash
Lists are used for everything: the compiler produces a list-based assembly code which is used from then on, all the way through assembly. There is an optimizer which divides it into basic blocks, which are objects put into a graph, with the instructions still being lists. The peephole pattern matching is done on lists. The compiler does not bother using destructive append (nconc) for stitching together fragments of code; just straight garbage-generating appends. Same with most of the other rewriting that happens later.

In a computer in 1960, your compiler would be capped to the physical memory available. That would be the RAM use. The garbage collector would have to be called whenever the memory is exhausted, or else the show would stop. A successful compilation would demonstrate that the compiler needed no more memory than what the machine has. The closer its actual usage would be to the available memory, the longer it would take, due to the frequent garbage collections required to stay afloat.

I'd say that given people's expectations today, shaped by experiences with everyday software, they likely greatly overestimate how much RAM you need for Lisp compiling.


Appendix: the VM footprint number doesn't really give us a breakdown since it includes executable and shared libs mappings. I ran the second compile again with the memory-optimized build, and this time captured a pmap.

Here you can also see the full command, confirming the compile job:

  $ pmap 24087
  24087:   ./txr --in-package=sys --compile=stdlib/compiler.tl:stdlib/compiler.tlo.tmp
  08048000   1660K r-x-- txr
  081e7000      4K r---- txr
  081e8000     12K rw--- txr
  081eb000    124K rw---   [ anon ]
  08c03000   6188K rw---   [ anon ]
  b7c5e000      8K rw---   [ anon ]
  b7c60000   1876K r-x-- libc-2.27.so
  b7e35000      4K ----- libc-2.27.so
  b7e36000      8K r---- libc-2.27.so
  b7e38000      4K rw--- libc-2.27.so
  b7e39000     12K rw---   [ anon ]
  b7e3c000    116K r-x-- libz.so.1.2.11
  b7e59000      4K r---- libz.so.1.2.11
  b7e5a000      4K rw--- libz.so.1.2.11
  b7e5b000     28K r-x-- libffi.so.6.0.4
  b7e62000      4K r---- libffi.so.6.0.4
  b7e63000      4K rw--- libffi.so.6.0.4
  b7e64000     12K r-x-- libdl-2.27.so
  b7e67000      4K r---- libdl-2.27.so
  b7e68000      4K rw--- libdl-2.27.so
  b7e69000     36K r-x-- libcrypt-2.27.so
  b7e72000      4K r---- libcrypt-2.27.so
  b7e73000      4K rw--- libcrypt-2.27.so
  b7e74000    156K rw---   [ anon ]
  b7e9b000   1024K r-x-- libm-2.27.so
  b7f9b000      4K r---- libm-2.27.so
  b7f9c000      4K rw--- libm-2.27.so
  b7fba000      8K rw---   [ anon ]
  b7fbc000     12K r----   [ anon ]
  b7fbf000      8K r-x--   [ anon ]
  b7fc1000    152K r-x-- ld-2.27.so
  b7fe7000      4K r---- ld-2.27.so
  b7fe8000      4K rw--- ld-2.27.so
  bf8d9000    200K rw---   [ stack ]
   total    11700K
You can see the 11700K fairly closely matches the earlier VIRT figure of 11964.

Anyway, look at the [ anon ] heap area: it's like 6-something megs. That's it. That's where all the dynamic Lisp stuff is. All the predefined symbols and function bindings and whatnot, and all the objects allocated during the compile job.

libz is new; I integrated libz into TXR in just the most recent release. It happens to be number 277, so I code named it (L)Z77.


For the love of.... the compiler doesn't need a stupid initialism. Just say "the compiler". There is nothing memorable or disambiguating or intuitive or efficient about initialisms.


Hmm, bookmarked. I'm making a structure editor -- yeah, it never gets old! This is quite detailed and helpful. Lisp and Standard ML are two languages whose basics are worth learning. Currently I use the WebAssembly spec as a reference for design decisions.


"Bernard S. Greenburg"

It's Greenberg.


Fixed, thanks.


What does the prog form with parameters do? I only know about progn in CL.


The "classic" prog still exists in Common Lisp. You probably haven't heard of it because it's almost never used. Here is the description in the HyperSpec: http://clhs.lisp.se/Body/m_prog_.htm#prog In Common Lisp terms, it's a combination of LET, BLOCK, and TAGBODY.


There's also progv which, off the top of my head, is the only thing in the CL standard that lets you establish bindings (dynamic only for obvious reasons) on a non-constant list of symbols.


It's amazing that the translation to Common Lisp is as minimal as it gets, with the code being pretty much identical.

For the folks that insist Clojure is a Lisp, try doing that as an exercise.


It is a Lisp, as much as Racket is a Scheme.



