> In LISP 1.5, function definitions were stored on the property list of the function's name...
The fact that this was once the norm but has since been done away with saddens me. I've always been fascinated with "live environments" but felt they only went half way if they didn't include the source code itself. If I'm going to be updating something in a running system, I want to know what source code was used to get the system to the state it's currently in, and preferably be able to query that source code as data. Of course, that code could be kept within source control, but then it's a shadow of the running system -- a map of the territory and not the territory.
As far as I know, the only languages/environments where this functionality is still available are Tcl, Smalltalk, and to some extent stored procedures within an RDBMS.
Minor caveat on SEE.
If the Forth implementation uses "threaded" code, the resulting source can be quite close to the original text.
If the Forth compiler generates native code, the one-to-one relationship with the source is lost, and SEE will typically show the compiled code as assembly language.
JavaScript does this: functions have a toString() method which returns their source code. Python also does something like this. Though it's just a debugging tool; you can't do much practical with it in real applications (like edit it and instantiate a new function from it) because it doesn't include the closure the function was defined in.
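A quick Python sketch of this limitation, using inspect.getsource (the function names here are made up for illustration): you can get the text of a closure back, but re-instantiating a function from that text loses the captured environment.

```python
import inspect
import textwrap

def make_adder(n):
    def add(x):
        return x + n
    return add

add5 = make_adder(5)
print(add5(1))                  # 6
src = inspect.getsource(add5)   # the literal text of the inner def
print(src)

# Re-instantiating from that text loses the closure: n is now unbound.
ns = {}
exec(textwrap.dedent(src), ns)
try:
    ns["add"](1)
except NameError as err:
    print("closure lost:", err)
```

So the source is recoverable as text for debugging, but not as live, editable program data the way the LISP 1.5 property-list scheme allowed.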
And it's easy to view the source code from the console (REPL). Some examples from Rebol2...
>> source func
func: func [
    "Defines a user function with given spec and body."
    [catch]
    spec [block!] {Help string (opt) followed by arg words (and opt type and string)}
    body [block!] "The body block of the function"
][
    throw-on-error [make function! spec body]
]
>> source source
source: func [
    "Prints the source code for a word."
    'word [word!]
][
    prin join word ": "
    if not value? word [print "undefined" exit]
    either any [native? get word op? get word action? get word] [
        print ["native" mold third get word]
    ] [print mold get word]
]
This is not the norm in Forth due to the very spartan environments it originated in, but most mature implementations include a pretty comprehensive decompiler as SEE. (Ironically, this is where the Forth claim of having a modular compiler comes apart, because SEE is always monolithic and nonextensible, or at least I haven’t seen it done any other way.)
Maybe collectively we can. In Emacs/SLIME with CCL or SBCL (on ARM32 or x86-64), ESC + '.' (the 'find-tag' command) on a symbol in the REPL opens a buffer with the definition of that symbol, provided you have the sources (it's neither decompiling nor reading the definition from the binary, but follows a hint to the source file). So this is more a feature of the environment/tools than of the compiler or language. It works beautifully though, and in most cases you don't really want the sources embedded within the binary.
Didn't even know about this one. It is one of those 'optional' features -- the standard puts it like "Any implementation may legitimately return nil as the lambda-expression of any function." And indeed, CCL chooses to do just that. SBCL seems to put in some effort to decompile the function.
Worst case scenario, you can probably patch your compiler's COMPILE function to store the data somewhere before performing the actual compilation.
As for CCL, it seems to work for me just fine as long as you set CCL:*SAVE-DEFINITIONS* to T:
Clozure Common Lisp Version 1.12.1 (v1.12.1) LinuxX8664
For more information about CCL, please see http://ccl.clozure.com.
CCL is free software. It is distributed under the terms of the Apache
Licence, Version 2.0.
? (defun foobar (a b c) (* a (+ b c)))
FOOBAR
? (function-lambda-expression #'foobar)
NIL
NIL
FOOBAR
? (setf ccl:*save-definitions* t)
T
? (defun foobar (a b c) (* a (+ b c)))
FOOBAR
? (function-lambda-expression #'foobar)
(LAMBDA (A B C) (DECLARE (CCL::GLOBAL-FUNCTION-NAME FOOBAR)) (BLOCK FOOBAR (* A (+ B C))))
NIL
FOOBAR
Not a Common Lisp implementation but Clojure can find the source for a symbol (provided the source code file is in the classpath) with the clojure.repl/source function (doc: https://clojure.github.io/clojure/clojure.repl-api.html#cloj... .) All development environments support jump to definition.
I had no idea that this was in LISP 1.5. If you had asked me, I would have sworn it was Steele, 1977. Wikipedia supports that[1], albeit one might not consider it the most reliable source.
So apparently (partial) TCE has been around since at least 1961. In that light, it's baffling that it's not supported more universally.
The OP describes the support in HLC for TCO as limited (only for self-recursion, if I get it right, possibly more limited than that). Steele may still have been first to describe general TCO.
Tail call optimization in Scheme is automatic. This LISP 1.5 compiler simply recognizes certain patterns and rewrites them. Not quite the same thing, but I was surprised as well when I first looked into the code.
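To make the distinction concrete, here is a Python sketch (the function names are invented) of what rewriting self-recursion into a loop amounts to: the recursive "call" in tail position becomes a rebinding of the parameters plus a jump back to the top. This is the pattern-based rewrite; full Scheme-style TCO covers arbitrary tail calls, not just calls to the same function.

```python
# Self-tail-recursive form: each call would normally grow the stack,
# and Python will hit its recursion limit on long inputs.
def length_rec(xs, acc=0):
    if not xs:
        return acc
    return length_rec(xs[1:], acc + 1)   # call in tail position

# What a pattern-rewriting compiler effectively turns it into:
# rebind the parameters, loop back to the top, constant stack space.
def length_loop(xs, acc=0):
    while True:
        if not xs:
            return acc
        xs, acc = xs[1:], acc + 1        # the "call" becomes assignment

# Well past Python's default recursion limit of 1000:
print(length_loop(list(range(5000))))    # 5000
```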
Funny enough, I was talking with someone who wanted to make a Lisp run on a very small ARM SoC, and we discovered that the 7090 that Lisp 1.5 was developed on likely had more core+drum storage than the SoC had RAM.
This might have been before uLisp (definitely prior to my knowledge of uLisp), and they wanted to write the interpreter for it themselves. IIRC it had 16k of RAM, so it definitely could have run uLisp.
[edit]
I should also point out that just storing the names of all of the symbols in the common lisp specification exceeds the RAM requirements of uLisp. Obviously builtins can go in ROM, but it gives you an idea of the sizes involved.
uLisp was too large for a small flight computer I made for a hobbyist rocket - I wanted to use it but had so little RAM I had to go down to C. I think I could have run lisp-to-c to generate from uLisp, which I'll try on my next project.
In such contexts it is more like assembly with a C-like macro processor than proper ANSI C anyway, and if you go the code generation route, then you would be better off using the DSL capabilities of Lisp to generate assembly directly.
> Building lists out of pairs and then using them as your intermediate format creates a lot of garbage.
I wonder if a Lisp-like language which used vectors rather than pairs as its fundamental data structure might be a better fit for severely memory-constrained systems? On average, vectors take up half the memory consumption of lists.
Such a language would end up looking rather different to Lisp though. Cons cells encourage CAR/CDR and recursion. A vector-centric language would naturally lead to a more iterative programming style.
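The "half the memory" claim can be put in rough numbers (word counts here are an assumption about a typical boxed implementation, ignoring the element payloads, which cost the same either way):

```python
# Rough per-sequence word counts for an n-element sequence.
def cons_list_words(n):
    return 2 * n            # each cons cell: one car slot + one cdr slot

def vector_words(n, header=1):
    return header + n       # one length/header word + n element slots

for n in (1, 10, 1000):
    print(n, cons_list_words(n), vector_words(n))
# as n grows, the list costs twice what the vector does
```

The ratio approaches 2:1 for long sequences, which is where the "half the memory" figure comes from; for very short sequences the vector header eats into the savings.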
I know that, but that wasn't my point. Yes, all major Lisps support vectors; but they all use lists a lot more than they use vectors. One of Lisp's major features is "code as data", but that code is almost all lists of lists (as opposed to vectors of vectors). I was talking about a Lisp-like language in which vectors, not lists, are the primary data structure. Possibly even one which doesn't have cons cells at all, only vectors. I am suggesting such a language might possibly work better in severely memory poor environments than a more traditional Lisp would.
Then you're also aware that one of the beautiful things about Lisp is how it lets you abstract away from the hardware beyond the core primitives; there is nothing that prevents an optimizing Lisp compiler from mapping what look like lists to vectors at the machine-code level.
One reason why such optimization isn't common is most likely due to the current usage of Lisp across the industry, and the commercial implementations being a niche product.
In principle, you can save the same amount of memory with CDR-coding.
In practice though, you run into some issues:
(1) RPLACD/set-cdr! makes CDR-coding much more complicated (I suppose you can just ban it – in Racket, pairs are immutable by default, although there is also a separate mutable pair type);
(2) CDR-coding generally only works if you construct the list up-front, and the existence of CONS can encourage a coding style in which you don't do that;
(3) to fix (2), you can force the list to be CDR-coded by duplicating it, but you have to remember to do that at the right points – if you don't do it, you'll miss out on the benefits of CDR-coding, but if you do it when you don't have to, you are unnecessarily harming performance;
(4) since CDR-coded and non-CDR-coded lists are the same type, and indistinguishable without peeling off the covers of the implementation, it is harder for the programmer to address (2) and (3) correctly.
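For intuition, here is a toy model of CDR-coding in Python (the representation and names are invented for illustration): a list constructed up-front is packed into one contiguous block, one tagged slot per element, rather than two-word cons cells chained by pointers.

```python
# Tags: CDR_NEXT means "the rest of the list starts at the next slot";
# CDR_NIL marks the last element. A real implementation would also need
# a CDR_POINTER tag (an indirection) the moment RPLACD mutates a cdr,
# which is exactly issue (1) above.
CDR_NEXT, CDR_NIL = "next", "nil"

def cdr_code(pylist):
    """Pack a whole list, known up-front, into one flat block."""
    block = []
    for i, x in enumerate(pylist):
        tag = CDR_NIL if i == len(pylist) - 1 else CDR_NEXT
        block.append((tag, x))
    return block

def car(block, i=0):
    return block[i][1]

def cdr(block, i=0):
    tag, _ = block[i]
    return None if tag == CDR_NIL else (block, i + 1)  # just an index bump

b = cdr_code([1, 2, 3])
print(car(b))        # 1
rest = cdr(b)
print(car(*rest))    # 2
```

Note that cdr_code only works because the whole list is available at once, which is issue (2): code that builds lists element-by-element with CONS never gets the compact form.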
Well 1MB is more than enough; that was the maximum virtual address space size of a PDP-10 where many of the precursors to common lisp were developed. You can do a fair amount with 100kB as well. uLisp runs with 34k minimum (32k ROM + 2k RAM).
The RAM usage is tied to the size of the reachable set, plus some slack filled with garbage that depends on how you tune the garbage collection thresholds.
By today's standards, the RAM usage isn't necessarily huge.
Here is the TXR Lisp compiler recompiling stdlib/compiler.tl -> stdlib/compiler.tlo, as seen in top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11488 kaz 20 0 17800 14804 2964 R 98.1 0.7 0:07.07 txr
^^^^^ ^^^^^
On the order of a bash session. It's a lot of RAM by 1982 standards at the institution level, and even 1992 standards at the consumer level, but today it means nothing.
You can easily see a Bash process with a footprint on that order.
It could be reduced by tuning the garbage collector. One way to do that is to build for less memory use (useful for embedded). Here it is with txr rebuilt using #define CONFIG_SMALL_MEM 1 in config.h:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12838 kaz 20 0 11964 9768 3140 R 99.0 0.5 0:10.39 txr
Bash footprints for comparison:
$ ps aux | head -1 ; ps aux | grep bash
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kaz 1093 0.0 0.1 9288 2132 pts/2 Ss+ May15 0:01 -bash
kaz 2833 0.0 0.0 8904 1992 pts/0 Ss May15 0:00 -bash
kaz 3509 0.0 0.2 10532 4988 pts/1 Ss+ May15 0:28 -bash
kaz 7898 0.0 0.1 8968 2212 pts/3 Ss+ May20 0:00 -bash
Lists are used for everything: the compiler produces list-based assembly code which is used from then on, all the way through assembly. There is an optimizer which divides the code into basic blocks, which are objects placed into a graph, but the instructions are still lists. The peephole pattern matching is done on lists. The compiler does not bother using destructive append (nconc) for stitching together fragments of code; just straight garbage-generating appends. Same with most of the other rewriting that happens later.
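A minimal sketch of that kind of list-based peephole matching (the instruction names here are invented; this is not TXR's actual instruction set):

```python
# Toy peephole pass over list-structured instructions: scan the stream,
# match small patterns, emit a rewritten stream (fresh list, no mutation,
# in the same garbage-generating spirit described above).
def peephole(insns):
    out = []
    for insn in insns:
        # pattern: (mov r, r) is a no-op -> drop it
        if insn[0] == "mov" and insn[1] == insn[2]:
            continue
        # pattern: (push x) immediately followed by (pop x) cancels
        if out and out[-1][0] == "push" and insn == ("pop", out[-1][1]):
            out.pop()
            continue
        out.append(insn)
    return out

prog = [("mov", "r1", "r1"), ("push", "r2"), ("pop", "r2"), ("add", "r1", "r2")]
print(peephole(prog))   # [('add', 'r1', 'r2')]
```

Each match/rewrite allocates fresh structure rather than destructively splicing, which is exactly the trade-off the comment describes: more garbage, simpler code.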
In a computer in 1960, your compiler would be capped to the physical memory available. That would be the RAM use. The garbage collector would have to be called whenever the memory is exhausted, or else the show would stop. A successful compilation would demonstrate that the compiler needed no more memory than what the machine has. The closer its actual usage would be to the available memory, the longer it would take, due to the frequent garbage collections required to stay afloat.
I'd say that given people's expectations today, shaped by experiences with everyday software, they likely greatly overestimate how much RAM you need for Lisp compiling.
Appendix: the VM footprint number doesn't really give us a breakdown since it includes executable and shared libs mappings. I ran the second compile again with the memory-optimized build, and this time captured a pmap.
Here you can also see the full command, confirming the compile job:
You can see the 11700K fairly closely matches the earlier VIRT figure of 11964.
Anyway, look at the [ anon ] heap area: it's like 6-something megs. That's it. That's where all the dynamic Lisp stuff is. All the predefined symbols and function bindings and whatnot, and all the objects allocated during the compile job.
libz is new; I integrated libz into TXR in just the most recent release. It happens to be number 277, so I code named it (L)Z77.
For the love of.... the compiler doesn't need a stupid initialism. Just say "the compiler". There is nothing memorable or disambiguating or intuitive or efficient about initialisms.
Hmm, bookmarked. I'm making a structure editor -- yeah, it never gets old! This is quite detailed and helpful. Lisp and Standard ML are two languages worth learning the basics of. Currently I use the WebAssembly spec as a reference for design decisions.
The "classic" prog still exists in Common Lisp. You probably haven't heard of it because it's almost never used. Here is the description in the HyperSpec: http://clhs.lisp.se/Body/m_prog_.htm#prog In Common Lisp terms, it's a combination of LET, BLOCK, and TAGBODY.
There's also progv which, off the top of my head, is the only thing in the CL standard that lets you establish bindings (dynamic only for obvious reasons) on a non-constant list of symbols.