Sugar - a typed lispy language targeting webasm/wat (part 1)

(home) 2020-12-06

Despite what I may have said in an earlier post, it's not all that nice to code directly in wat. So I've started work on a compiler. A very thin compiler. The language should feel very similar to our, beloved, wat, but save us from typing all those i32.const, local.get.

Some points up front:

node.js/npm
That feeling when ...
People say Minimalist System
Then say "just use npm install ..."

sugar example

Here is the malloc.wat example from the previous blog post on webasm re-written in sugar. It's quite a bit more concise and convenient than the original raw wat (see below).

(previous post on wat /blog/2020-12-03-webasm-forth-with-lisp-syntax.html )

; malloc.sugar - sugar example code
(memory (import "js" "mem") 10)
(import "console" "log" (func js_log (i i32)))
(global memend i32 #x1000)
(global sbrk (mut i32) 0)
; ----- allocate some memory in wasm-space
(defun (export "malloc") (len i32) :result i32
  (locals (i32 (newbrk (+ sbrk len)) (mem sbrk)))
  (if (>= newbrk memend) (return 0))
  (set sbrk newbrk)
  mem)
; ----- write bytes in wasm-space memory using js callback
(defun (export dump_range) (start i32 len i32)
  (locals (i32 i (end (+ start len))))
  (for (i start end)
    (js_log (get.u8* i))))

And this is what you would have written if you were coding directly in wat.

(module
  (memory (import "js" "mem") 10)
  (import "console" "log" (func $js_log (param $i i32)))
  (global $memend i32  (i32.const 4096))
  (global $sbrk (mut i32)  (i32.const 0))
  ;; ----- allocate some memory in wasm-space
  (func (export "malloc") (param $len i32) (result i32)
    (local $newbrk i32)
    (local $mem i32)
    (local.set $newbrk (i32.add (global.get $sbrk) (local.get $len)))
    (if (i32.ge_u (local.get $newbrk) (global.get $memend))
      (return (i32.const 0)))
    (local.set $mem (global.get $sbrk))
    (global.set $sbrk (local.get $newbrk))
    (local.get $mem))
  ;; ----- write bytes in wasm-space memory using js callback
  (func (export "dump_range") (param $start i32) (param $len i32)
    (local $i i32)
    (local $end i32)
    (local.set $end (i32.add (local.get $start) (local.get $len)))
    (local.set $i (local.get $start))
    (block $break2 (loop $head1
      (br_if $break2 (i32.eq (local.get $i)  (local.get $end)))
      (call $js_log (i32.load8_u (local.get $i)))
      (local.set $i (i32.add  (i32.const 1) (local.get $i)))
      (br $head1)))))

using sugar

Write a your-source.sugar source file in your favorite text editor. Then run sugar on it to produce your-source.wat

This is how I use sugar (with a makefile for calling sugar, then wat2wasm): /src/sugar.mk

$ mk malloc.wasm
./sugar malloc.sugar >malloc.wat
summary: globals:2 functions:3 macros:2
wat2wasm malloc.wat

(see below for installing common lisp)

compiler

The compiler is very small and may be interesting to read, for that reason. /src/sugar

It includes a macro system, which I have used to implement inc and for but it's not yet available for use from sugar itself. Also the full CL defmacro/destructuring-bind lambda list syntax is non trivial to replicate; not to mention backquote.

compiler structure

Here is the overall compiler structure

loop:
  read a form [from source file]
  (compile form) => stdout
compile form:
  integer => (i32.const int)
  symbol => (local/global.get varname) [check in environment/scope]
  list/s-expr/cons =>
    if (is-macro? form)
      (compile (expand-macro form))
    else:
      is-special? form => (compile-special form) [defun/import/export/global/set...]
      is-function? =>
        for all args (compile arg)
        call func with args
      otherwise => error undefined function

common lisp?

Too many brackets?

Rule 1: There are no brackets, only indentation.
Rule 0: We suffer the brackets cos it gives us defmacro and that's worth a lot of suffering.

Did you ever generate code? Did you think that was cool? You'll love common lisp. Try it for 2 weeks. Money back guarantee if not satisfied.

value-if

Lisp has value-IF i.e. you can use IF anywhere you use a value. That's like C's ternary but more powerful. At first I thought WAT doesn't have it, but (if) has an optional (result type) clause, like (func) so that it's branches can return (type matched) values. So you can do clever things like this: note the (result i32) clause between if and condition.

(set x
  (if (result i32) (>= z limit)
    (comp-true-value y)
    (comp-false-value s 14 q)))

wat (select) is like the C ternary operator although it can have blocks/statement sequences inside.

select v if: select has all 3 clauses always evaluated while (if) has only the conditional and the chosen branch evaluated. The other (if) branch is not evaluated.

future directions

user accessible macros (defmacro)

Implement (defmacro ...) for sugar.

shortcut and/or

At the moment I only have the bitwise and/or operators '&' and 'bor' and next on the list is having shortcutting logical and/or operators like CL.

Ideally I'd like to have lisp-style and/or where the expression value is returned rather than just 0 or 1. Also standalone and/or so we can have things like

(and (file-existsp fname) (process-file fname))
(or (setup-completep) (error setup-failed))
Shortcut and/or actually compile to nested (if) expressions.

missing operators/functions

A whole slew of function counterparts for other types (i64,f32,f64) are missing. The compiler cannot type-infer which one (see below) so these would have explicit names like +f64, set.f64 etc.

There are many other relops missing etc. They are fairly trivial to add.

arrays

Currently I'm using pointer get*/set* but I'd like to have aref and implicit index to pointer arithmetic.

loops

Some more loops. while repeat etc. The underlying block, loop, br_if, br are exposed so writing loops is possible. It would only be the case of adding some more macros to enable:

(while condition body...)
(forever body...)

Also continue, break would be nice.

structures / defstruct

Some kind of struct might be nice to have. Even if all the fields had to be i32 initially i.e. just a vector with named, instead of numbered, fields.

better syntax/semantic checking

Even the current error messages drop you into the SBCL debugger, which can be a scary place - [ctrl-d] to exit.

type tracing

Knowing the types of values would allow the compiler to check signatures of calls of user functions and builtins. It would also allow me to infer the correct type for set rather than requiring set.f32 etc.

LET

Locals must be defined at the start of a function. This means implementing LET (scoped blocks with local variables) is not trivial. The current, single-pass, compiler would need to become far more complex to first decide how many locals are required in a function, then allocate them up-front and possibly rename them if there are clashes.

self hosting

The holy grail of compiler writers is that they rewrite their compiler in their new language. Common Lisp is so far above the level of wat that this would be a large undertaking. First I'd need to write malloc...

installing common lisp

Setting up sugar/common lisp may be trivial or hard depending on how familiar with common lisp you are :)

Installing common lisp (I use SBCL) is outside the scope of this post. (Tell me it's harder than npm ...)

I refer you to [Zach Beane's] https://www.quicklisp.org/beta/

SBCL might even be in your package manager:

$ pacman -Ss sbcl

sugar is run as a unix script. I've made a custom sbcl core since I prefer iterate over loop. Here is a very brief explanation of how to make such a core, starting with plain, vanilla sbcl.

$ sbcl
* (ql:quickload '#:iterate)
* (sb-ext:save-lisp-and-die "sbcl-iterate")
$ sudo mv sbcl-iterate /usr/share/sbcl/sbcl-iterate

source

mistakes

[2020-12-07] My original compiler structure had a macro expansion bug. I was misled since I was in a block where I had just checked that (form) was a consp/list and I assumed this remained true inside that block. However, I missed the fact that I am modifying the form inside the block, with (expand-macro). It's possible (expand-macro) returns a non-list e.g. a symbol. Then my list assumptions break.

compile form:
  ...
  list/s-expr/cons =>
    while: (is-macro? form)
      form := (expand-macro form) <---- bug: this can change form into a non-list
    is-special? form => ...
    is-function? => ...

First I fixed this with an extra list check in is-macro and also in the is-special/is-function parts, and a recursive call to compile for non-list forms.

I later thought if I had used a functional approach and avoided the mutation, I would not have made this mistake. My visual hint that I was dealing with a list would have remained true. It turns out the functional solution is even more elegant than the original and doesn't have the bug (+1 to functional purists).

The functional solution, doesn't loop, it recurses on compile. Then compile gets the chance to decide again what type of form expand-macro returned (nested macros will be recursively expanded).

compile form:
  ...
  list/s-expr/cons =>
    if: (is-macro? form)
      (compile (expand-macro form)) <--- recurse, don't loop, no mutation
    else: is-special?/is-function? ...

I have backported the fix into the attached compiler source code: /src/sugar

I discovered the bug while implementing the base cases for shortcut and/or i.e. (and x) and (or x).

references

Tags: sugar sugarlang sugarland wat wasm wabt common-lisp compiler (home)