This is awesome, great job! How long before we can write ShareLaTeX in ShareLaTeX? ;)
I once got as far as writing an HTTP server that used the flexibility of TeX macros to turn headers like 'content-length:' into TeX commands that recorded their values, and then processed the incoming request as if it were TeX code. It wasn't pretty...
This is obviously impressive. AVR is surprisingly annoying to emulate! If you're looking to emulate just enough to run C, here are two platforms that are significantly less painful:
* MSP430, whose instruction set fits on a single Wikipedia page and is also nicer to program than AVR
* The c4 compiler's stack machine VM, which was designed specifically to be a minimal machine on which to run C in the smallest number of lines of code.
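To give a feel for how little such a VM needs, here is a minimal sketch of an accumulator-plus-stack interpreter loosely modeled on c4's. The opcode names mirror a small subset of c4's (IMM, PSH, ADD, SUB, MUL, EXIT), but the numbering, the list-based stack and the behavior of EXIT (it returns the accumulator here) are simplifications for illustration, not c4's actual code.

```python
# Hypothetical opcode numbering; c4 defines its own enum with many more opcodes.
IMM, PSH, ADD, SUB, MUL, EXIT = range(6)

def run(code):
    """Run a tiny accumulator + stack VM in the spirit of c4 (simplified sketch)."""
    pc = 0        # index of the next word in `code`
    a = 0         # accumulator: binary ops combine it with the top of the stack
    stack = []    # operand stack (c4 uses a raw memory region and a pointer instead)
    while True:
        op = code[pc]
        pc += 1
        if op == IMM:            # a = next word
            a = code[pc]
            pc += 1
        elif op == PSH:          # push the accumulator
            stack.append(a)
        elif op == ADD:          # a = pop() + a
            a = stack.pop() + a
        elif op == SUB:          # a = pop() - a
            a = stack.pop() - a
        elif op == MUL:          # a = pop() * a
            a = stack.pop() * a
        elif op == EXIT:         # simplified: return the accumulator
            return a
        else:
            raise ValueError(f"bad opcode {op} at {pc - 1}")

# (2 + 3) * 4
prog = [IMM, 2, PSH, IMM, 3, ADD, PSH, IMM, 4, MUL, EXIT]
```

Calling `run(prog)` evaluates `(2 + 3) * 4`. Keeping one operand in an accumulator is what lets a C compiler emit such simple code for this kind of machine: every binary operator is just "push left, compute right, combine".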
AVRs have a few quirks because the original ISA didn't anticipate parts with more than 8K of program memory, so some longer jumps were tacked on later. A few instructions have special cases for common operations (e.g. bit-level access to some, but not all, I/O registers). The ISA is kind-of-but-not-really orthogonal, with comparatively many instructions for an 8-bit controller (about 120).
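The jump situation shows up directly in the encodings. Per the AVR instruction set manual, RJMP is one 16-bit word with a 12-bit signed word offset (enough to cover 4K words, i.e. 8 KB of flash), while JMP is a two-word instruction carrying a 22-bit absolute word address with its high bits scattered through the first word. A minimal decoding sketch (function names are illustrative):

```python
def decode_rjmp(word, pc):
    """Target word address of an RJMP (1100 kkkk kkkk kkkk) located at word address pc."""
    if word & 0xF000 != 0xC000:
        raise ValueError("not an RJMP")
    k = word & 0x0FFF
    if k & 0x800:              # sign-extend the 12-bit offset
        k -= 0x1000
    return pc + k + 1          # relative to the following instruction

def decode_jmp(word1, word2):
    """Target of a two-word JMP: 1001 010k kkkk 110k, then 16 more address bits."""
    if word1 & 0xFE0E != 0x940C:
        raise ValueError("not a JMP")
    # k21..k17 sit in bits 8..4 and k16 in bit 0 of the first word
    hi = ((word1 >> 3) & 0x3E) | (word1 & 0x1)
    return (hi << 16) | word2
```

For example, `0xCFFF` is `rjmp .` (a jump to itself, offset -1), and `decode_jmp(0x940D, 0x2345)` yields word address `0x12345`.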
It's actually not terribly difficult. There are good references, and I remember it took me about 4 days to write the very first version of simavr[0].
What's harder to emulate are all the peripherals, and all their quirks.
In simavr the 'core' code has remained pretty much unchanged (apart from the odd cosmetic tweak) over the last 7 years, while most of the work has gone into emulating the peripheral blocks.
* All of I/O space is dual-addressed, through both the memory map and the I/O port instructions.
* Because there are insanely simple AVR parts that don't have a stack, the stack pointer is itself a peripheral.
* Half the register file is unusable for 16-bit instructions.
* The half that is usable includes 3 register pairs that have special pointer semantics.
* To accommodate pointer/array semantics in C, those three register pairs have an almost MOD/RM-ian collection of variant loads and stores, each specialized to a particular X, Y, or Z register.
* Because the stack pointer itself is a 16-bit I/O peripheral that is written one byte at a time, you have to implement interrupt handling properly to support the dance AVR-GCC does when setting it.
* The instruction encoding itself is a shotgun blast; read the mask-and-lookup-table logic in an AVR disassembler, for instance, which is something you don't even need to do for X86.
* For all that trouble, you don't even get a consistent instruction size: most instructions are one 16-bit word, but a few, such as JMP and CALL, take two.
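The dual addressing and the "stack pointer is a peripheral" points can be sketched together. On classic AVRs such as the ATmega328P, I/O address `n` is also data-space address `n + 0x20`, and SPL/SPH are the I/O registers at `0x3D`/`0x3E`. The class below is a hypothetical minimal model assuming that layout and a `0x900`-byte data space; a real emulator would also dispatch these accesses to peripheral handlers.

```python
IO_OFFSET = 0x20            # classic AVR: I/O address n is data address n + 0x20 (assumed layout)
SPL_IO, SPH_IO = 0x3D, 0x3E # stack pointer low/high bytes as I/O registers

class DataSpace:
    """Hypothetical data space where IN/OUT and LD/ST reach the same storage."""

    def __init__(self, size=0x900):
        self.mem = bytearray(size)

    # LD/ST/LDS/STS path: plain data-space addresses
    def read_data(self, addr):
        return self.mem[addr]

    def write_data(self, addr, value):
        self.mem[addr] = value & 0xFF

    # IN/OUT path: I/O addresses translated into the same bytes
    def read_io(self, io_addr):
        return self.read_data(io_addr + IO_OFFSET)

    def write_io(self, io_addr, value):
        self.write_data(io_addr + IO_OFFSET, value)

    @property
    def sp(self):
        # The stack pointer is nothing more than two 8-bit I/O registers, SPH:SPL
        return (self.read_io(SPH_IO) << 8) | self.read_io(SPL_IO)

d = DataSpace()
d.write_io(SPH_IO, 0x08)    # e.g. OUT 0x3E, r29
d.write_data(0x5D, 0xFF)    # e.g. STS 0x005D, r28 (same register as I/O 0x3D)
```

Because SP is updated by two separate byte writes, an interrupt arriving between them would see a half-updated pointer, which is why AVR-GCC brackets those writes with interrupt handling.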
I'd still rather do AVR than X86, but perhaps not by much.
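The mask-and-lookup-table style of decoding, and the variable instruction size, look roughly like this. The table is a hand-picked subset of encodings taken from the AVR instruction set manual; a real disassembler needs the full set and careful ordering, since the first matching entry wins.

```python
# (mask, value, mnemonic, size in 16-bit words): a small illustrative subset
OPCODES = [
    (0xFE0E, 0x940C, "JMP",  2),   # 1001 010k kkkk 110k + 16-bit address
    (0xFE0E, 0x940E, "CALL", 2),   # 1001 010k kkkk 111k + 16-bit address
    (0xFE0F, 0x9000, "LDS",  2),   # 1001 000d dddd 0000 + 16-bit address
    (0xFE0F, 0x9200, "STS",  2),   # 1001 001d dddd 0000 + 16-bit address
    (0xF000, 0xC000, "RJMP", 1),   # 1100 kkkk kkkk kkkk
    (0xF000, 0xE000, "LDI",  1),   # 1110 KKKK dddd KKKK
    (0xFC00, 0x0C00, "ADD",  1),   # 0000 11rd dddd rrrr
    (0xFF00, 0x9A00, "SBI",  1),   # 1001 1010 AAAA Abbb
    (0xFFFF, 0x0000, "NOP",  1),
]

def decode(words, pc):
    """Return (mnemonic, size) for the instruction at word index pc."""
    w = words[pc]
    for mask, value, name, size in OPCODES:
        if (w & mask) == value:
            return name, size
    return "UNKNOWN", 1

def disassemble(words):
    """Walk a program, skipping the extra operand word of two-word instructions."""
    pc, out = 0, []
    while pc < len(words):
        name, size = decode(words, pc)
        out.append((pc, name))
        pc += size
    return out
```

In `disassemble([0x940C, 0x0000, 0xE00F, 0x0000])`, the first `0x0000` is JMP's address operand and must be skipped rather than decoded as a NOP; only the last word is a real NOP.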
I thought the Harvard architecture was pretty common for microcontrollers, where firmware lives permanently in Flash memory. PIC microcontrollers have opcodes that are 12 and 14 bits, for example.
It certainly isn't uncommon. I didn't mean that it is strange in comparison to other microcontrollers, just that Harvard can be strange for the author of an emulator, depending on their background.
Well, TeX is Turing-complete, so any computable program can in principle be written and executed in TeX. Back in the '80s, an MIT undergrad named Andrew Marc Greene wrote a BASIC interpreter in TeX (BaSiX, https://www.ctan.org/tex-archive/macros/generic/basix).