Bicameral, Not Homoiconic
If you spend enough time reading internet discussions of programming
languages, youâll learn that Lispy languages have a special property:
they are homoiconic. This property is vested with mystical
powers that both enrich Lisps and debase its competitors.
I have programmed in, and built, Lisps since the late 1980s. My blog
is called âparenthetically speakingâ. And yet Iâm here to tell you
that this term is mostly nonsense. However, there is something
specialâ
usefulâ
transporting its essence to other languages.
1Â (Weak) Homoiconicity
What, supposedly, is homoiconicity? You will hear things like: the
property that âa property of some programming languages that allows
programs to be represented as data within the languageâ, or with
ârepresentedâ substituted by âmanipulatedâ, or more simply as
âcode as dataâ.
Letâs tease these apart a bit. Consider the following Python code:
This is clearly a program. But can I represent this as a datum
within the language? Sure:
is a perfectly good representation. (Well, it may be good but
itâs not great; weâll return to that!) Can I manipulate
it? Sure, I can concatenate strings to create it:
will produce that program, and
will take it apart into constituent pieces.
Does that make Python homoiconic?
Of course, thereâs nothing special about Python here. We can use
JavaScript to represent and manipulate JavaScript programs, C to do
the same to C programs, and so on. Essentially, any programming
language with a string datatype seems to be homoiconic. Heck, we
didnât even need strings: we could just as well have represented the
programs as numbers (e.g., using
Gödel numbering).
One of the traits of a good definition is that it be non-trivial: it
must capture some things but it must also exclude some things. Itâs
not clear that this notion of homoiconicity excludes much of anything.
2Â (Strong) Homoiconicity
But thereâs a reasonable objection to what we wrote above. All that
weâve done is written, combined, and taken apart strings. But
strings are not necessarily programs; strings are just strings,
a form of data. Data are data, but programsâ
that we can runâ
How do we turn data into programs? We do need some language support
for that. We need something that will take some agreed-upon data
representation of a program and treat it like a program,
i.e., do whatever the program would have done. Typically, this is a
function called eval: it evaluates the datum, performing the
effects described in the datum, just as if it were a program. (Note
that eval really treats âdata as codeâ, not âcode as
dataâ.)
So maybe eval is the real characteristic of homoiconic
languages? Maybe. Itâs certainly true that eval is a
distinctive feature, and some languages have it while others donât:
that is, it non-trivially distinguishes between languages. But itâs
worth noting:
-
Many languages, including Python and JavaScript, have an
eval. If theyâre all homoiconic, then clearly this isnât a
particularly Lispy trait. -
eval interacts poorly with its lexical environment,
thereby making it hard to even program with effectively.
We showed that JavaScriptâs
eval is not one but four operations and there are eight
contexts that determine which of the four to use. This kind of
complexity is overwhelming. -
The complexity might be worth it if eval were a good
idea, but itâs often a bad idea in programs! It makes code statically
invisible, making every other aspect of program managementâstatic
analysis, compilation, security checking, and moreâmuch, much harder
(or, for some important and useful kinds of analysis, impossible).
This seems like a disappointing way to end: homoiconic languages are
ones that have a complex, excessively-powerful feature that we
probably shouldnât use but is anyway found in many languages that are
not Lispy at allâŠwhich certainly doesnât seem to be a good way to
describe what makes Lispy languages distinctive.
But this just shows why we shouldnât be talking about homoiconicity at
all. Letâs talk about whatâs actually interesting instead.
3Â The Parsing Pipeline
Letâs talk briefly about the classical parsing pipeline. For decades,
weâve been taught to think of parsing a program as having two phases:
tokenization (sometimes colloquially called âlexingâ) followed by
parsing (not colloquially called âyaccingâ), and more. What do
they do?
A program is a list, or stream, of characters. A language
implementation needs to first understand what a program is saying
before it can make the program do what it says. That is, it has to
understand the meaning of a program. This process of determining
meaning is usually done in several stages, which make sense once we
understand them in terms of their expressiveness.
A typical dataflow in the front-end of a language implementation looks
like this:
What does this mean?
Just as you donât read this article at the level of individual
letters, given a program like hello = 1, we donât want to deal
with it at the level of each letter: h, e, l, âŠ
either. Instead, weâd like to start by thinking of it as having three
tokens: hello, =, and 1. Note that in the
process, we can ignore whether or not the = was surrounded by
spaces (which Python does not require, but some languages
do).Weâll return to the elided spaces later! This is the job
of the scanner.
So the input list or stream of characters gets transformed, by the
scanner, into an output list or stream of tokens. But as a reader, you
still donât really read the program at the level of tokens; instead,
you assign meaning. That is, you presumably read the above as
âassign hello to 1â.The actual meaning of = in
Python is much more messy. This is the job of the parser: to
create a data structure representing the high-level semantics of the
language. In this case, it might create an abstract syntax tree of a
âvariable assignmentâ-type node with two children, one the variable being
assigned (here, hello) and one the value expression (here, just
1). If the actual expression were more complicatedâ
program were hello = 1 + 2â
complicated tree representing addition with two children.This
being Python, of course the actual meaning of + is also messy.
The exact mechanisms by which this form of parsing happens is somewhat
outside scope here. For many languages, grammars are ambiguous and
top-down parsing strategiesâ
ofâ
process is complicated, algorithmically fascinating, and often bad at
giving good errors. There are very interesting trade-offs here that we
often donât pay enough attention to.
for instance, experimented with
parsing
without using a scanner. Both conceptually and practically, it is
still useful to have this separation:
-
The scanner usually sits at the lowest level of the Chomsky
hierarchy: it is regular. It is therefore quick and efficient and its
rules are easy to reason about in depth. -
The parser, in contrast, is usually context-free or worse,
making it a much more complicated beast.
We should want to take advantage of complexity theory in building our
tools! This also represents a good separation-of-concerns, whereby we
could imagine building multiple parsersâ
speed or space (or both) for better error messagesâ
same scanner.
In what follows, therefore, I will take the scanner as a
given. Thus, the traditional pipeline then has just one stage:
the parser. I therefore call this the unicameral pipeline.
4Â The Bicameral Analogy
Many countries have what is called a bicameral legislature:
âbiâ meaning two, âcameraâ meaning room. That is, there are two
houses of parliament. India, for instance, has the Lok Sabha and Rajya
Sabha; England has the House of Commons and House of Lords; the USA
has the House of Representatives and Senate. In all cases, there is a
âlowerâ house and âupperâ house.
My goal here is not to endorse various class-related associations or
to delve into the actual practices, but rather to employ this as a
useful, rough analogy. A theory of bicamerality is this. The lower
house is a bit more rowdy and boisterous. It filters out some of the
more absurd legislations, but it allows through many that are perhaps
not entirely wise. The upper house is more calm and deliberative; in
legislatures, its members are either appointed or serve longer terms,
so they do not need to bend to every populist will. Therefore, the
upper house usefully serves as a filter.
So: we have two houses. The lower house filters out dreck, but lets
through several bills, some of which are unwise. The upper house
passes judgment on these bills, allowing only the ones it considers
wise. Enough political science, back to computer science.
5Â Bicameral Syntax
As Iâve alluded above, having just a parser gives you a unicameral
syntax. What Lispy languages give you is a bicameral
syntax. But they are not unique in this; other notations, such as XML
and JSON, are also bicameral. To make my point, I will use these other
notations.
First of all, all these notations have to get through a scanner. In
XML, you canât write
thatâs just not even a valid opening tag. In JSON, you canât write
“hi without a closing “; thatâs just not a valid string. We
can think of these as basic tokenization failures. So letâs assume our
input passes the scanner and we get a list or stream of tokens.
Even once youâve written proper tokens, there are still things you
cannot do in XML or JSON. For instance, the following full document is
not legal in XML:
because we didnât âcloseâ
isnât legal either,
because we didnât preserve the nesting structure (we should close
bar befor we close foo). The same is true in JSON: we canât
open a structure without closing it (or vice versa); similarly, we
have to close braces and brackets in the opposite order of how we
opened them.
Observe that we can make these judgments without knowing what
the datum represents. Thereâs a reason I used names like foo and
bar instead of something meaningful. Thatâs because it doesnât
matter: these XML and JSON data are wrong at a basic level. We
say that they fail to be well-formed. This is our lower house.
However, there may be more complicated rules at play. It may be that
a bar really should not reside within a foo; it may be that
every baz requires one or more quuxes. If youâve ever worked
with an XML or JSON format, you know that there are all kinds of rules
like this (that may or may not have been documented well!) that you
need to follow. For instance, when you read the documentation of a Web
API, it might tell you that a certain kind of request needs to provide
certain fields, but must also not provide certain other
keys. All these are called rules of validity. They build on top
ofâ
what wasnât checked there. This is the upper house.
This leads to a new pipeline, one where some tool performs an
intermediate check for well-formedness, and another for validity. The
last stageâ
produces terms tagged with their âparts of speechââ
parser. But now thereâs one more tool in the middle. I donât believe
thereâs a standard name for it, but there should be; following Lisp
tradition, I call it the reader:
|
Scaner –> Reader –> Parser |
Now, as written, itâs not clear what the output of the reader is. It
could be that the reader just checks the tokens, and then
passes on the same token list or stream to the parser, just as in the
previous pipeline. But this would be a loss! The reader has the
ability to construct an extremely useful intermediate data
structure. The main property the reader looks for is that
âthings match up and nest properlyâ. In other words, the reader is
confirming that the input is a tree. Thus, instead of passing
on a dumb, flat list of tokens to the parser, it constructs trees.
This greatly simplifies the job of the parser for two reasons. At a
more basic level, some of the checking that the parser previously had
to do is already done. More interestingly, tokens are very low-level;
trees are high-level. Itâs much easier to write a recursive function
that checks for validity and produces abstract syntax trees when
already given well-formed trees as an input!For this reason,
PLAI calls this the âPrimus Inter
Parsersâ, a joke that seems lost on most.
Again, complexity theory is a useful guide, and what we have done is
steadily ratchet up the complexity:
-
The scanner is still regular, with all the attendant benefits
and weaknesses. -
The reader is context-free. It does not need to do more
than that. -
The parser is now freed of basic context-free checks, and can
focus on other context-free and, in particular,
context-sensitive checks.
And once again, we can exploit this separation of concerns to swap out
other implementations if we need to.
This, then, is a bicameral syntax. The reader is the âlower
houseâ; it rejects basic mistakes, but still lets through some
invalid things. But it also presents them wrapped up in a cleaner,
simpler structure to work with. The parser is the âupper houseâ; it
receives this clean structure and can look for deeper flaws in it,
only allowing those things that pass its more exacting scrutiny.
The advantages to a bicameral syntax are many:
-
We get a greater separation of concerns, making it easier to
both understand and to swap out components. -
We get to more gradually walk up the complexity hierarchy.
-
The reader sits at a key intermediate level, turning tokens into
trees, thereby greatly simplifying the parserâs task while itself
being quite easy to write.
This simplicity of (correct!) implementation matters! It doesnât
matter as much when you have only one implementation of a
language. However, a lot of our traffic in computing is best
represented as trees: they are much more powerful than strings, but
also simple enough to write and read. It is therefore no surprise that
bicameral syntaxes like JSON have become ubiquitous. Every
language now needs a reader for them, and these readers have to
implement the same language, so simplicity of specification and of
implementation are a virtue. In turn, many systems need to separately
validate a JSON input; again, the simplicity of writing this kind of
(tree-consuming) parser is a big win.
This additional level also has other tooling advantages. Itâs not only
language implementations that need to support syntax; so do
editors. Itâs a lot easier to support matching, indentation, coloring,
and so on at the well-formedness level. And itâs easy to indent
trees! Itâs also easy to traverse them: there are meaningful
generic tree-level operations (âgo to the openingâ, âgo to
the closingâ, âgo up a levelâ, âgo down a levelâ, etc.) that can
be applied to all languages with the same well-formedness
characteristic (e.g., JSON). And again, we have lots of editors, and
they each need to support the same syntaxes; bicameral languages, by
providing the intermediate notion of well-formedness, let tools hit
the trifecta of: correct, useful, and relatively easy.
These advantages are offset by one drawback: some people just donât
like them. It feels constraining to some to always write
programs in terms of trees, rather than more free-form syntax. Even in
this bicameral space, there are trade-offs: XML requires you to repeat
tags at closing, which improves error-checking and catches errors in
refactoring, but makes authoring painful; Lisps donât require tag
repetition but, traditionally, use only one kind of parenthesis,
making structure harder to see; JSON mixes delimiters while avoiding
closing tags, which seems to be some kind of sweet-spot (because it
still provides some checking against editing errors). Still, what
people are willing to embrace for writing data seems to irk them when
writing programs, leading to the long-standing hatred for Lispy
syntaxes.
6Â Back to Lisps
We started with Lisp, so letâs go back there. What is Lisp? Lisp is a
feeling, an emotion, a sentiment; Lisp is a vibe; Lisp is the dew on
morning grass, itâs the scent of pine wafting on a breeze, itâs the
sound of a cricket ball on a bat, itâs theâŠoh, wait, where was
I. Sorry.
Okay, seriously. Note that I keep writing âLispy languagesâ: both
the âyâ and â[language]sâ should have stood out by now. What do I
mean by this?
Iâm not referring to a particular language: Common Lisp, say. Iâm
referring to a family of languages, which includes Common Lisp but
also Scheme and Racket and Clojure and many others. This family is
bound by some cultural antecedents, but the individual languages can
be wildly different. What they share is a philosophy of
syntax. And now that weâve fleshed out that philosophy, if we are
being generous, we could allow that any bicameral language is
at its essence âLispyâ.
The power of this philosophy is that the language platform can
implement its bicameral syntax once, and well, and lots of languages
can inherit it. We noted above that there are many (data) languages
built atop JSON, each of which has its own validity rules. We can
likewise build many data and programming languages atop any bicameral
syntax; they will share some familial traits but differ quite a bit in
the details.
The bicamerality of Lisps, however, leads to some
misconceptions:
-
It does not dictate a particular implementation strategy. For
instance, the argument in Lisp sometimes goes, âcode is data because
code is represented using s-expressions, which are also Lisp dataâ. A
full tutorial of what these terms means is beyond what I want to get
into here. What I will just say is that there are Lispy languages
where this is not true at all.The one I know best is Racket, because I originally built some of
these parts of it. In Racket, code is not represented using
s-expressions. This is because âcodeâ requires many more things: it
needs, for instance, to record the source locations so that it can use
it to report errors;Remember those whitespaces we said we could
ignore? We canât if we want to report errors!
it needs hygiene information to avoid inadvertent
capture; and more. Therefore, Racket has a notion of
syntax object
(which, yes, is a datum) that captures this much richer notion of
âprogram sourceâ. Syntax is a separate type that happens to parallel
(and enrich) s-expressions and can be interconverted with them, but it
is not an s-expression. So, while there is a large family of functions
that operate on (say) lists, and s-expressions include lists, none of
those functions will apply to syntax. -
People will sometimes say that the read primitive
âparsesâ. It does not: it reads. It âparsesâ inasmuch as it
confirms that the input is well-formed, but it is not the parser of
conventionâone that determines validity according to
context-sensitive rules, and identifies âparts of speechââso it is
false to say that Lisps ship with a parser.To make this concrete, here is a well-formed Lispy term that
read has no problem with: (lambda 1). That is, however,
a syntax error in most Lisps: a determination made by the
parser, not the reader. Of course, nothing prevents us from creating a
new language where that term has some meaning. That languageâs parser
would be responsible for making sense out of the pieces. -
People sometimes believe you can only have macros if you have a
bicameral syntax. This isnât true: all you need is a way to represent
code, because macros transform code to code. But as more languages
have realized the benefits of having macros, some have added
procedural macros (an idea that dates back decades in Lisp
lore). Bicameral syntaxes just provide a gateway to gloriously rich
macro systems, because weâve learned that the well-formedness layer is
an especially good layer atop which to build macros.
7Â What About Other Languages?
I want to end with some forward-looking comments.
One of my frequent sources of amusementâ
search for people asking âHow do I build a programming language?â
and reading the advice they get. These conversations invariably get
mired in syntax issues. But by the time the novice gets through
building all the knick-knacks of syntax and learning to build parsers,
theyâve usually exhausted their complexity, cognitive, and time
budget. The resulting language either never gets done or is a semantic
mess.
I would like to suggest another path forward: start with a bicameral
syntax. Quickly get to the point where you can think about
semantics. What does your language do? How does one
reason about programs? How does one run them? Do all the hard and
interesting and important parts first.
But, you argue, âNow I have a bicameral syntax! Nobody will want to
program in it!â And that may be true. But I want you to consider the
following perspective:
Syntax is a view.
One of the enduring lessons weâve learned from software design is the
model-view separation: that is, there are often many useful ways to
view the same thing, some of which we havenât even dreamt up yet, so
itâs wise to separate an object from how we view it (and coordinate
the different views). This principle seems to virtually never be
applied to programs, but it should! We should view the abstract
syntax as the modelâ
syntax as a view on it.
So what you should really do is define an abstract syntax, and layer a
light bicameral concrete syntax on top of it. Then build all the other
parts. And then, think about what views you want to provide. Some
people may be happy with the bicameral syntax. Others may want a more
traditional infix syntax. Some may want braces, some may want
significant whitespace. Some users may even want to use blocksâ
ultimate bicameral syntax. You can (and should) view syntaxes as the fun
desserts that you get to design and deploy once the hard work is
done.This is not to say syntax is not hard; itâs actually
harder than most people think, because it requires a delicate
interplay between mathematics and human factors.
Both
F*dging up a Racket and, in much more depth,
Beautiful Racket
illustrate this principle practically and beautifully.
This perspective also gives you a useful artifact: a bicameral syntax
that is a very nice target for programs that need to generate programs
in your language. This is no longer a new idea, so you donât have to
feel radical: formats like SMT-LIB and WebAssembly text format are
s-expressions for a reason. Our Forge tool supports both an
syntax based on Alloy and one in s-expressions, the former
for (most) people to use, the latter a target to tools that use Forge
as a back-end.
In short, I hope this inspires a new vision of syntax
design. Bicamerality is the best intermediate language weâve come up
with; itâs driven by both theoretical elegance and practical
experience. For some, the bicameral syntax is also a lovely source
language, but it should be just one of many views onto the core
abstract syntax. Tools will become more reusable, and at the same
time, the syntax wars can end. We can find peace, joy, and
happiness. It is not a coincidence that a pair of parentheses looks
like an embrace.