Bare Metal x86 FORTH

(home) 2021-01-15

In this post I develop a simple FORTH system which can be put on a USB stick and will boot on an x86 PC into an interactive REPL. It's really just the beginning of a self-hosted development environment but it does have the basic ability to accept keyboard input, display results on screen, execute FORTH words and enter integers.

I'm exploring the idea of how little assembler we can use to bootstrap an interactive development environment on a new machine.

abstraction and portablity

The less assembler we have to write to bootstrap our system, the less work we have to do to port the system onto another platform. FORTH allows us to start writing the FORTH system in FORTH itself, long before we have anything as complicated as a self-hosted compiler or assembler We write forth words using assembler data definitions.

The aim is to write the minimum amount of assembler possible and start writing FORTH as early as possible. This will certainly not provide the most speed/size efficient system, but will provide the most portable system.

x86 16-bit code

I am forced to write this system in 16-bit x86 real mode assembler, the assembler of the 8086 processor which lives on in every intel chip.

The reason for this is that I want to make use of the BIOS code for accessing screen, keyboard and disk and the PC BIOS only runs as long as the processor remains in real mode.

Even if you are familiar with x86 assembler this, restricted machine, may be new to you.

The system is built using the nasm assembler since it provides an easy way to develop 16-bit, x86 code and generate flat executables suitable for PC boot. It also provides macros which make things easier.

FORTH allows for a very simple VM core

With just 2 assembly routines, enter and exit, FORTH provides a VM core engine. One must decide on registers to store the 2 required stacks (data and return) and also the code pointer and then these two routines provide the engine for all FORTH control flow.

I'll just assume that you know FORTH is a stack machine and all function/word arguments are passed explicitly on the data stack. Calling a function/word just uses the name of that word. There is no parameter passing; parameters are expected to be on the stack.

As usual the full code is linked at the end.

FORTH VM briefly described

I'll try to describe the FORTH VM approach here briefly. Rather than using normal call/ret, FORTH takes control of this explicitly with a return stack and jumps. FORTH functions (called words) can be seen as a vector of pointers to code (Code Field Addresses/CFA). The machine's code pointer points at these vectors and jumps to the code pointed at by the vector entries (2 indirection levels). To move to the next instruction/word/function we just increment the code pointer.

To call another FORTH word we just point the code pointer to it's CFA vector and push the address of the next CFA so we can return here.

Eventually we need to call real code. So all CFA's actually point to assembler and we just jump there. For primitive words, this is fine. For FORTH words there is a special assembler prelude (called ENTER) which actually does the return stack push, and the change of the code pointer to point to the current word's CFA vector. So the call is actually handled inside the callee rather than by the caller.

The end of a CFA vector is just another (primitive) FORTH word called exit. This contains assembler code to pop the code pointer off the return stack and jump to it, thus continuing from the word after the one calling this word.

This is a little mind-bending to explain, but the actual code to do it is shorter than even this brief description and is listed below.

Here's some more info on the threaded code approach.

VM register definitions and code

I've chosen the following register setup for my code pointer and stacks (the 8086 is very limited, and there is not a lot of choice).

I'll define some assembler macros to rename the chosen registers to more FORTH-like names:

%define codeptr di
%define rstack bp
%define dstack sp

As an example, the top-level function in this system looks like this. It begins with the enter prelude macro and is then just a list of pointers to assembly code blocks, the final one being exit.

  dw cls, lit, str_welcome, puts, interpret, exit

Here are the enter and exit macros which provide the VM code I need to inline in other places.

Note that non-FORTH, primitive words don't actually use the return stack. They don't start with the enter code and have their own primexit code at the end.

;; this asm code is inlined at the start of every forth word to setup the
;; code pointer to point at the start of the CFA vector
%define entry_code_len 18 ; must be updated if macro enter is changed
%macro enter 0
  mov codeptr, [codeptr] ; code-pointer = this word's ENTER code
  add codeptr, entry_code_len ; offset code-pointer to just after ENTER code i.e. start of cfa vector
  next ; run first cfa in this word's cfa vector
;; this asm code is inlined at the end of every primitive/pure asm word
;; to continue executing with the next CFA in the vector
%macro primexit 0
  add codeptr, 2 ; move code-pointer to next cfa vector entry
;; forth primitive used to exit forth words

Some other helper macros used above:

;; execute next word
%macro next 0
  jmp [codeptr]
;; push to return stack for forth word call
%macro rpush 0
  mov dx, codeptr
  add dx, 2
  sub rstack, 2
  mov word [rstack], dx
;; pop from return stack to resume after word call
%macro rpop 0
  mov codeptr, [rstack]
  add rstack, 2


To write useful code we need loops. I've decided to use the tail-call approach to loops. loop allows a FORTH word to return to its beginning and that's enough to implement loops. We find the current word's beginning (it's CFA vector) by cheating. We look at the entry on the return stack. It points just after the word calling this one. If we step back we can get the CFA for this word. Then we just skip the entry prelude assembler code and have a pointer to the CFA vector for this word. We jump there and we have looped. Simple!

  ; jump back to start of current function (tail call)
  mov codeptr, [rstack] ; rpeek
  sub codeptr, 2
  mov codeptr, [codeptr] ; current word entry
  add codeptr, entry_code_len ; skip entry prelude to cfa vec

conditional branches

Conditional exit is like normal exit at the end of a word (in fact exit can be used anywhere in a word) but it only happens if the value popped off the stack is zero. Again this is enough to implement if/then/else and while loops. You just need to make a separate word for each IF statement and exit early to avoid the body of the IF.

  pop ax
  cmp ax, 0
  jnz .donothing
  rpop ; exit

In fact I initially just had conditional exit, but I later added a conditional jump which adds an offset to the code pointer if the stack top is zero. It requires manually counting the offset from after the instruction to the next instruction and changing that every time the intervening code changes length. a FORTH compiler would handle this automatically by providing an if/then/else construct.

  add codeptr, 2
  pop ax
  cmp ax, 0
  jnz .donothing
  add codeptr, [codeptr]

Here's an example use of an unconditional jump instead of loop and a variant of zexit.

  dw dup, cfetch, zexitkeep, emit, inc, jump, -7*2, exit


Since CFA vectors just have code addresses how do we get numbers/addresses on the data stack? There is a special primitive word called lit which takes the next CFA in the vector, pushes it on the data stack and skips it (since it's not really a CFA).

  add codeptr, 2
  mov ax, [codeptr]
  push ax

Machine IO

The only things we need to create a minimal, self-hosted, development environment on a platform are

The x86 BIOS provides reasonably simple interfaces to these which makes developing this system easier than for other platforms (see discussion of ARM systems below).

I started off proving that I can get the screen and keyboard working. With the eventual goal of having some kind of interactive REPL.

Here we see main, the top-level entry word for the system. It's actually written in FORTH not assembler.

Actually main is not allowed to return, and interpret is a loop.

  dw cls, lit, str_welcome, puts, interpret, exit

In real FORTH this would look something like this:

: main cls ." Welcome to FORTH!" interpret ;

where ." is a magical compiler immediate which puts the following string somewhere in memory and compiles in [lit, addr, puts] instead

screen output

The FORTH word emit puts a single character on screen and updates the cursor position for the next character. We can build more complicated string output words on that basis.

Here is the underlying bios call for character output and the FORTH word emit which calls it and keeps track of the cursor update. I don't include the cursor update code since it's a bit involved. It uses conditionals to check overflow and a global to keep the cursor position (see code linked at end).

;; forth word to emit character on stack to screen and update cursor
  dw putc, crsinc, exit
;; bios call to write a character at current cursor position
  pop ax
  mov ah, 9 ; write attr char at cursor
  mov bl, green
  mov cx, 1 ; repeat
  int 0x10

puts for writing strings uses emit and shows conditionals and loops:

  dw dup, cfetch, zexitkeep, emit, inc, loop, exit
  dw puts_, drop, exit

The final exit is actually never reached since loop always returns to the start of the word, but I left it in for consistency.

The reason for puts/_puts is that the looping function exits with the string pointer on the stack and we don't want that so we write a wrapper function which drops it.

keyboard input

I use a BIOS call to wait for a keypress and get the keycode and the equivalent ascii character. For now I ignore the keycode and just push the ascii character on the stack.

;; bios call which waits for a keypress and returns both the keycode and equivalent ascii char
;; (I only keep the ascii char)
  xor ax,ax ; wait for key (=> AH/AL scan/ascii)
  int 0x16
  xor ah,ah ; zero keycode
  push ax

readwordlen uses key to read a space-delimited string into a 64 byte buffer at address text. FORTH strings have a byte length prefix so this word increments that as well as storing the input characters.

;; read a string from keyboard until space character, store at [text]
  dw key, dup, lit, space, equal, zjump, 2*2, drop, exit
  dw dup, emit, lit, text, dup, cfetch, add, inc, swap, cstore, lit, text, cincstore, loop, exit
;; 64 byte text buffer
text: times 65 db 0 ;  buffer for user input byte[0]=len

So far I've used zero-terminated/c-style strings in the rest of the code so nulterm terminates this string.

;; zero terminate a FORTH style string
nulterm: ; ( string - )
  dw dup, cfetch, add, inc, lit, 0, cstore, exit
;; clear the string at [text]
  dw lit, text, lit, 0, cstore, exit

Normally FORTH processes space-delimited words one at a time and my current REPL only takes one word at a time. A better approach would be to read until enter is pressed and then interpret each space-delimited word in the line.

still to do

an ARM based system?

I initially planned to write this system on ARM (raspberry pi 4) but I ran into the following problems.

the code

The full code is linked below and I've tested it with QEMU (v5.2) on archlinux. I've also booted a laptop from a usb stick containing forth.bin.

Building forth.bin from forth.nasm:

$ nasm -f bin -o forth.bin forth.nasm

Running the result in QEMU

$ qemu-system-x86_64 -drive file=forth.bin,format=raw,index=0,media=disk

Copying to a USB stick (at /dev/sdb). Warning, don't run this command unless you know what you're doing. You could wipe your internal drive.

# dd if=forth.bin of=/dev/sdb bs=512 count=1 conv=fsync
// personally I use cat
# cat forth.bin >/dev/sdb; sync


Tags: forth lowlevel baremetal x86 16bit minimalism bootstrapping (home)