# Object format and target platform

The semantics of the language are closely tied to the layout of its
objects; it is critical for efficient interpretation that the new
layout be similar to the original design.

Key features of the original format:
* `WORD` size: 36-bit
* object size: 2 words
* `WORD`:pointer size ratio: 2:1

## `WORD` size

Shrinking `WORD`s to 32 bits would pose more backward compatibility
hazards than extending them to 64 bits. `WORD`, and derived types like
`FIX`, must support 64-bit operations.
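
To make the width argument concrete, here is a minimal sketch of why 36-bit values rule out a 32-bit `FIX` but fit comfortably in a 64-bit one. The names `mdl_fix`, `FIX36_MAX`, and `fits_in_36_bits` are hypothetical, and the sketch assumes the original 36-bit integers were two's complement.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t mdl_fix; /* hypothetical name: FIX as a 64-bit signed integer */

/* Largest value of a 36-bit two's-complement integer: 2^35 - 1. */
#define FIX36_MAX ((mdl_fix)0x7FFFFFFFF)

/* Returns nonzero if v is representable in 36-bit two's complement. */
static inline int fits_in_36_bits(mdl_fix v) {
    return v >= -FIX36_MAX - 1 && v <= FIX36_MAX;
}
```

Any value for which `fits_in_36_bits` holds is representable in `mdl_fix`, but `FIX36_MAX` itself is already larger than `INT32_MAX`, so old code that relies on the full 36-bit range would overflow a 32-bit `FIX`.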

## Pointer size

On modern platforms, two pointer sizes are available: 32-bit and
64-bit. With 64-bit pointers, keeping the 2:1 ratio would make every
object four 64-bit words (256 bits), even though most of that would be
dead space for most `TYPE`s. The Muddle object layout is only
reasonable with 32-bit pointers.

## 64-bit `WORD`s, 32-bit pointers

So in order to maintain compatibility with objects designed for a 2:1
pointer size ratio, our target platform is constrained to look like
the x32 ABI: an ILP32 environment with access to instructions that
operate on 64-bit words.
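
As a rough illustration of the 2:1 ratio, a two-`WORD` object might be laid out as below. The names `mdl_word`, `mdl_ptr`, and `mdl_object`, and the split of the first `WORD` into `type` and `aux` halves, are hypothetical and not taken from the original format. The sketch only shows that a pointer occupies half a `WORD` and that the object stays at two `WORD`s.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t mdl_word; /* hypothetical name for a 64-bit WORD */
typedef uint32_t mdl_ptr;  /* hypothetical name for a packed 32-bit pointer */

/* Hypothetical two-WORD object; the field split is an assumption. */
typedef struct {
    uint32_t type;     /* first WORD, one half: TYPE information */
    mdl_ptr  aux;      /* first WORD, other half: room for one pointer */
    union {
        mdl_word word; /* second WORD as a full 64-bit value (e.g. a FIX) */
        mdl_ptr  ptr;  /* or as a packed pointer, leaving half the WORD spare */
    } value;
} mdl_object;

/* Compile-time checks of the two ratios the text depends on. */
static_assert(sizeof(mdl_word) == 2 * sizeof(mdl_ptr),
              "WORD:pointer size ratio is 2:1");
static_assert(sizeof(mdl_object) == 2 * sizeof(mdl_word),
              "an object is exactly two WORDs");
```

If `mdl_ptr` were widened to 64 bits, the first assertion would fail unless `mdl_word` grew to 128 bits, which is the 256-bit object the previous section rules out.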

It is not necessary to compile for the actual x32 ABI, as support for
x32 executables is not widespread. The implementation can instead be
compiled for x86-64, provided it ensures that the memory used for
storing objects is mapped in the low 32 bits of its address space. It
can then narrow 64-bit pointers to 32-bit values for storage in
objects.
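
The narrowing and widening step might look like the following sketch. The names `mdl_ptr`, `ptr_is_packable`, `pack_ptr`, and `unpack_ptr` are hypothetical. How the object heap gets placed below 4 GiB is left out; on Linux, one candidate is `mmap` with the `MAP_32BIT` flag, which confines the mapping to the low 2 GiB.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t mdl_ptr; /* hypothetical packed pointer type */

/* Returns nonzero if the address lies in the low 32 bits of the address space. */
static inline int ptr_is_packable(const void *p) {
    return (uintptr_t)p <= UINT32_MAX;
}

/* Narrow a native pointer for storage in an object.
 * The assertion catches memory that was not allocated in the low region. */
static inline mdl_ptr pack_ptr(const void *p) {
    assert(ptr_is_packable(p));
    return (mdl_ptr)(uintptr_t)p;
}

/* Widen a stored value back into a native pointer. */
static inline void *unpack_ptr(mdl_ptr v) {
    return (void *)(uintptr_t)v;
}
```

Keeping the casts behind two small functions means the rest of the interpreter never handles a raw 32-bit address, so a later change of representation (for example, an offset from a heap base) would stay local.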

## Implications for target platforms

### x86-64 (primary target)

Each Muddle process will be restricted to 4 GB of address space for
its objects. That "should be enough for anybody," right?

Nonstandard pointers add a wrinkle to FFI, but only a superficial one:
Muddle objects need to be GC-managed, so they can't be externally
allocated anyway; and FFI-pointers belong in `WORD`s, not object
pointers.

Packed pointers do complicate the use of an off-the-shelf GC such as
Boehm-Demers-Weiser (BDW): the library would need modification to
recognize them.

### x86-32 (possible port)

It wouldn't be hard to port the interpreter to 32-bit systems, but
allowing `FIX`es to use 32-bit arithmetic there could break old code
that assumes they are at least 36 bits wide, and any new code that
assumes they are 64 bits. In the future, we should consider switching
to explicitly sized `FIX32`/`FIX64` or the like; for now let's just
make `FIX` 64-bit.

# Interpretation

The semantics documented in M-PL effectively mandate that the
interpreter act like an AST-interpreter:
* any `ATOM` could be rebound at any time to something that takes its
  arguments as a `CALL` and edits, as syntax, the code of its caller
  -- so a non-AST interpreter would still have to keep the AST
  available, and would have to detect whether a `CALL` modified its
  caller (e.g. by setting a `WRITE` monitor for all AST values)
* debugging mechanisms like `FRAME` allow direct inspection and even
  modification of the call stack, which requires non-internal
  subroutine calls to use the MCALL calling convention -- so most of
  the optimization a JIT or bytecode interpreter could perform is
  rendered moot

The central implementation decision is how to overcome or avoid the
difficulties of interpreting a language whose control flow is more
advanced than the host language's. Here, coroutines, re-entrant
stack-allocated continuations, and non-local return are not
straightforward in C-like languages. Options include:
* stackless interpreter
* assembly-language interpreter that can follow the program's complex
  control flow
* implement just the compiler and use a metacircular evaluator to get
  an interpreter

The stackless approach goes well with an AST interpreter, and from the
wording in M-PL's Appendix I think it was the original
implementation's approach (although it's also possible they matched
the program's control flow, since that wouldn't have been hard for a
program written in PDP-10 assembly).

## Stackless interpreter

Implement the interpreter core in plain C. Decouple the interpreter's
control flow from the interpreted program:
* Allocate the CONTROL STACKs on the heap. The interpreter core
  subroutines use slots in their `FRAME` for local variables that may
  need to be persisted across a `RESUME`.
* Track program execution explicitly as a state machine.

Advantages:
* portability and readability of C

Disadvantages:
* using state machines in place of direct control flow and eliminating
  local variables makes for tricky C (cf. Boost.Asio)
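
The two points above (heap-allocated frames and an explicit state machine) can be sketched on a toy evaluator. Everything here is hypothetical: the `Node` AST with only literals and addition, the `Frame` layout, and the state names are stand-ins chosen for brevity, not Muddle's actual `FRAME` or CONTROL STACK format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature AST: a literal or an addition of two subtrees. */
typedef struct Node {
    enum { NODE_LIT, NODE_ADD } kind;
    int64_t lit;
    const struct Node *lhs, *rhs;
} Node;

/* Where a suspended ADD frame picks up when control returns to it. */
typedef enum { ST_START, ST_AWAIT_LHS, ST_AWAIT_RHS } State;

/* Heap-allocated frame: "locals" live here, not on the C stack. */
typedef struct Frame {
    struct Frame *parent;
    const Node *node;
    State state;
    int64_t lhs_val; /* local persisted while the right operand is evaluated */
} Frame;

static Frame *push_frame(Frame *parent, const Node *node) {
    Frame *f = malloc(sizeof *f);
    if (!f) abort();
    f->parent = parent;
    f->node = node;
    f->state = ST_START;
    f->lhs_val = 0;
    return f;
}

/* Evaluate without recursing in C: one flat loop steps the top frame. */
static int64_t eval(const Node *root) {
    Frame *top = push_frame(NULL, root);
    int64_t result = 0; /* value produced by the most recently popped frame */
    while (top) {
        const Node *n = top->node;
        if (n->kind == NODE_LIT) {
            result = n->lit;
        } else if (top->state == ST_START) {
            top->state = ST_AWAIT_LHS;
            top = push_frame(top, n->lhs); /* "call" the left operand */
            continue;
        } else if (top->state == ST_AWAIT_LHS) {
            top->lhs_val = result;
            top->state = ST_AWAIT_RHS;
            top = push_frame(top, n->rhs); /* "call" the right operand */
            continue;
        } else {
            result = top->lhs_val + result;
        }
        Frame *done = top; /* "return": pop and resume the parent frame */
        top = done->parent;
        free(done);
    }
    return result;
}
```

The C stack depth stays constant however deeply the AST nests, and a suspended computation is just a chain of `Frame` objects. The cost is visible too: what would be a three-line recursive function becomes a hand-written state machine.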

# License

Copyright (C) 2018 Keziah Wesley

You can redistribute and/or modify this file under the terms of the
GNU Affero General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any
later version.

This file is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public
License along with this file. If not, see
<http://www.gnu.org/licenses/>.