I could have written it myself faster

2020-12-18

When starting a new project we face a choice: use an existing tool, package or library, or write our own.

The modern, fashionable way seems to be to use an existing package. I tend strongly in the other direction, and usually write my own. However, recently, since I was writing for an audience, I decided to try using a package instead. This is what happened.

The project, in this case, was reading and parsing a .wasm file. I wanted to understand the .wasm format for another project I had planned.

I might have done this in C, but I decided on Common Lisp since I wanted a formal description of the format which I could use for other tools later. CL is good at that.

I chose a Lisp binary-parsing package, lisp-binary, pretty much at random from Quicklisp.

wasm reader - using lisp-binary

(Disclaimer: I am disparaging neither lisp-binary nor its author. Writing a generic system to parse binaries is non-trivial and the author gives example parsers for a few real-world binary formats. His package is good. This article discusses the higher-level choice between re-using and writing from scratch.)

In the beginning I was struggling to understand the .wasm spec and the package API simultaneously. Many times I knew exactly what code I wanted to write but struggled to find out how lisp-binary wanted me to describe it. This often led me to think I could have written my own parser in less time.

I wanted a reader, and then to build arbitrary tools that take the reader's output and produce a completely different format (i.e. not .wasm).

I was curious as to why the author creates structs rather than simple tagged lists. Using structs means an external tool has to include the definitions of those structs too and implies it will also depend on lisp-binary where there was no real need.

The main lisp-binary tutorial stops just before getting to the part in which I was most interested, namely how the structures produced by the reader can be pattern-matched by another tool.
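
To make the tagged-list idea concrete, here is a minimal sketch in plain CL. The tags (:type-section, :func-type) and the function summarize are made up for illustration; they are taken neither from lisp-binary nor from either of my readers. A consuming tool needs nothing beyond ecase and destructuring-bind to pick such output apart.

```lisp
;; Hypothetical reader output: every node is a list whose first element is a tag.
(defparameter *example-section*
  '(:type-section
    (:func-type (:i32 :i32) (:i32))
    (:func-type () ())))

(defun summarize (node)
  "Dispatch on the tag of NODE and return a small summary list."
  (ecase (first node)
    (:type-section
     ;; The rest of the node is a list of :func-type nodes.
     (list :types (mapcar #'summarize (rest node))))
    (:func-type
     ;; A :func-type node carries a parameter list and a result list.
     (destructuring-bind (params results) (rest node)
       (list :arity (length params) :results (length results))))))

;; Usage: (summarize *example-section*)
```

No struct definitions have to be shared between the reader and the consumer here; the only contract is the shape of the lists.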

I later realized that lisp-binary seems targeted at creating a reader/filter/writer style of program, which probably explains both the mismatch with what I was trying to do (just a reader) and the design decisions that seemed strange from my viewpoint. No package author can guess all the things a user might want to do with the package. That cannot be expected of him.

Again, I'm sure that lisp-binary's author could quickly achieve what I wanted using his own system, but it wasn't clear to me.

I did, finally, create a .wasm parser using lisp-binary: /src/wasm-read.lisp.

Afterwards I decided it was worth rewriting the parser as an exercise, to see whether I really could have written my own parser faster than learning and using an existing one, or whether I was mistaken.

Points of note:

Even though I chose the re-use path I still had to write the more difficult parts myself, namely the reader for LEB128-format integers.
I struggled to find the proper API methods for doing simple things in the beginning (would this amortize if I were to write 10 parsers?). The project wasn't long enough to make me feel like an expert in lisp-binary by the end either.
The hard part of this task should have been understanding the .wasm spec, but understanding lisp-binary was at least as hard. Writing a parser from scratch is not easy either, but it uses standard tools with which I am already familiar.
lisp-binary has lots of pre-defined readers for various types. Almost none of them were useful to me. Even the .wasm strings were different since the length prefix was a LEB128 integer (a type not supported by the package; again, no package can guess every type a user might want).
The tricky edge cases used an eval clause, which means the final file description is no longer purely declarative. Also, eval doesn't allow arbitrary code injection for parsing, only for selecting the type of the object being read. So there were some cases I couldn't work around.
There was an edge case (reading an unknown number of blocks until EOF) which wasn't handled by the lisp-binary package. The author has an extensive comment explaining how he wanted to include this feature but it proved difficult within the architecture. Obviously, he had put some considerable thought into it. It wasn't just an oversight. General tools are harder than specific tools.
I spent a lot of time reading the source of the package to figure out what I could and couldn't do.
The package targets a slightly different use-case than mine. While it seems like just what I need at first, the difference in approach becomes apparent on delving deeper.
I'm sure the package author is very efficient at creating binary readers using his package, and that reinforces my point. He understands the package so well because he wrote it himself.
I ended up including lots of code from the package which I don't use at all, e.g. all the complicated readers which didn't quite match the ones I needed.
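
For readers who haven't met LEB128: each byte carries 7 bits of the value, least significant group first, and the high bit says whether another byte follows. The sketch below shows an unsigned reader and a .wasm-style string reader built on it, using only standard CL. It is an illustration, not the code from either of my files; the names read-uleb128 and read-wasm-name are made up, and the string reader assumes ASCII-only names (real .wasm names are UTF-8, and decoding that portably needs more than code-char).

```lisp
(defun read-uleb128 (stream)
  "Read an unsigned LEB128 integer from STREAM, a stream of (unsigned-byte 8)."
  (loop with value = 0
        for shift from 0 by 7
        for octet = (read-byte stream)
        ;; The low 7 bits of each octet are the next group of the value.
        do (setf value (logior value (ash (logand octet #x7f) shift)))
        ;; Bit 7 set means another octet follows.
        while (logbitp 7 octet)
        finally (return value)))

(defun read-wasm-name (stream)
  "Read a LEB128 length prefix, then that many octets, as a string.
ASSUMPTION: the name is plain ASCII; UTF-8 is not decoded here."
  (let* ((size (read-uleb128 stream))
         (octets (make-array size :element-type '(unsigned-byte 8))))
    (unless (= (read-sequence octets stream) size)
      (error "Unexpected end of file inside a name"))
    (map 'string #'code-char octets)))
```

Both functions expect a stream opened with :element-type '(unsigned-byte 8), for example via with-open-file.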

wasm reader - scratch-written using standard CL

I realize that the second time writing the same program should go faster. After all I should now know the .wasm spec (or at least be better able to read the spec this time). I did also re-use some code, for example the LEB128 reader, but very little in fact and this is code I needed to write regardless of the approach I took. I attempted to discount these advantages.

Writing this parser was a much more pleasurable experience than using the package. I'm fairly familiar with Common Lisp's reading and byte manipulation functions, so there was no cognitive struggle between what I wanted to do and actually writing the code.

If I hadn't known how to write this kind of parser, would using a package have been easier? I cannot really comment. I feel that already knowing how to write the parser meant I could understand the lisp-binary package more easily, since I could understand its architecture. I think I would have struggled a lot more with lisp-binary as a beginner.

My final reader code, /src/wasmrd.lisp, ended up a little longer, although it includes a reader for signed LEB128 which is missing from the original. Considering that mine also includes all the reader infrastructure required, which the original got from the lisp-binary package, my scratch-written reader could in fact be considered substantially shorter.

$ wc -l wasm-read.lisp wasmrd.lisp
301 wasm-read.lisp (lisp-binary)
340 wasmrd.lisp (scratch-written)
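
For comparison with the unsigned case, a signed LEB128 reader differs only at the end: bit 6 of the last byte is the sign bit, and if it is set the value is sign-extended. This is a minimal sketch in standard CL, not the code from wasmrd.lisp; the name read-sleb128 is made up for illustration.

```lisp
(defun read-sleb128 (stream)
  "Read a signed LEB128 integer from STREAM, a stream of (unsigned-byte 8)."
  (loop with value = 0
        for shift from 0 by 7
        for octet = (read-byte stream)
        ;; Accumulate 7 bits per octet, least significant group first.
        do (setf value (logior value (ash (logand octet #x7f) shift)))
        ;; Bit 7 set means another octet follows.
        while (logbitp 7 octet)
        finally
           ;; SHIFT still refers to the last group read, so the value is
           ;; (+ shift 7) bits wide. Bit 6 of the last octet is the sign.
           (return (if (logbitp 6 octet)
                       (- value (ash 1 (+ shift 7)))
                       value))))
```

Because CL integers are arbitrary precision, the sign extension is a single subtraction rather than the bit-masking needed in a fixed-width language.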

Points of note:

The new reader is very easy for me to read and understand. It's written in my language, not someone else's.
The new reader's structure very closely matches .wasm's and is easy to follow when thinking about .wasm files. There is no extra abstraction to learn.
Solving the problem of reading an unknown number of blocks was trivial. It's not difficult, just difficult in lisp-binary.
It was easy to incorporate my reader output directly into the tool I was writing. I was producing mostly a tagged list as output (there is a top-level struct; I blame lisp-binary mind-control...).
I could have made my reader shorter using macros since there is some repetitive code. Is it worth it for something that probably can't be re-used? I suppose if it were to be maintained by someone else, making it neater and shorter would be worthwhile.
When including the size of the lisp-binary package, my reader is much smaller. My tool also has one less external dependency to worry about.
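
To show why reading an unknown number of blocks is trivial with the standard functions, here is a minimal sketch, assuming the blocks in question are .wasm sections: after the 8-byte header, each section is a one-byte id, a LEB128 size and then that many payload bytes. Passing NIL as the eof-error-p argument makes read-byte return NIL at end of file, which ends the loop. The function names and the (:section id payload) output shape are made up for illustration and are not taken from wasmrd.lisp.

```lisp
(defun read-uleb128 (stream)
  "Read an unsigned LEB128 integer from STREAM, a stream of (unsigned-byte 8)."
  (loop with value = 0
        for shift from 0 by 7
        for octet = (read-byte stream)
        do (setf value (logior value (ash (logand octet #x7f) shift)))
        while (logbitp 7 octet)
        finally (return value)))

(defun read-wasm-sections (stream)
  "Collect (:section id payload) lists until STREAM reaches end of file."
  (loop for id = (read-byte stream nil nil) ; NIL at EOF instead of an error
        while id
        collect (let* ((size (read-uleb128 stream))
                       (payload (make-array size
                                            :element-type '(unsigned-byte 8))))
                  (unless (= (read-sequence payload stream) size)
                    (error "Section ~D is truncated" id))
                  (list :section id payload))))

(defun read-wasm-file (path)
  "Check the 8-byte header of the file at PATH, then read every section."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((header (make-array 8 :element-type '(unsigned-byte 8))))
      (read-sequence header in)
      ;; Magic number \0asm followed by version 1, little-endian.
      (unless (equalp header #(0 97 115 109 1 0 0 0))
        (error "~A does not start with a version 1 .wasm header" path))
      (read-wasm-sections in))))
```

The payloads are left as raw byte vectors here; a real reader would go on to parse each one according to its section id.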

the code

Tags: software-reuse bloat not-invented-here-syndrome