Some Programming Language Ideas
Programming languages seem to have somewhat stagnated to me. A lot of
shuffling around ideas that already exist, not a lot of new ones.
This is not necessarily bad. Shuffling around ideas that already exist
is a natural part of refining them. Shuffling around ideas that
already exist is safer than a radical rewrite of existing
convention. Even taking a language that already exists and giving it a
new standard library can be worthwhile.
However, I occasionally have some ideas that may then trigger other
ideas in people, and in the interests of getting them out of my head,
I’m posting them here.
Disclaimers
None of these ideas are fleshed out to a specification level. Some of
these are little more than goals with no idea how to manifest
them in a real language.
While I mostly don’t know of languages that do these things, that
doesn’t mean I’m claiming they don’t exist, just that I don’t know of
them, because of course I am not intimately familiar with every
language that has ever been written.
Some of these ideas are probably bad, even outright crazy. At least
one of them is something that I would put in the known bad category,
and the idea is mostly just “but what if someone could figure out a
way to fix it?”
Some of these are mutually contradictory and can’t live in one language.
In general, while I can’t control how people react to this list,
should this end up on, say, Hacker News, I’m looking more for replies
of the form “that’s interesting and it makes me think of this
other interesting idea” and less “that’s stupid and could never
work because X, Y, and Z so everyone stop talking about new
ideas” or “why
hasn’t jerf heard of this other obscure language that tried that 30
years ago”. (Because, again, of course I don’t know everything that
has been tried.)
Loosen Up The Functions
This idea comes from Erlang, though it doesn’t quite follow through on
it to the extent I’m talking about here.
A function call is a very strong primitive. There is no possibility a
function call can fail. This is so deeply ingrained into our
understanding of function calls that we can’t even see it. Note this
is not a matter of the “function returning an error” or
“throwing an exception”, this is
“the code reached for strlen
and it wasn’t there”. Or in the case
of dynamic languages, not that it couldn’t “find” a particular
function but that the code to locate it is suddenly gone.
The closest we get is “I ran out of memory trying to make this call”
and most of us, most of the time, just ignore that possibility.
I wrote a comment on Hacker News a while back about how network RPC
failed in the 90s when it tried to pretend to be a native local
function. It isn’t,
and it can’t pretend to be a native local function. RPC functions have
errors that can happen that a normal function can’t have. And it’s not
just a matter of returning an error value; sometimes the error is
“this call should have taken 250 nanoseconds but it froze for a minute
before timing out”. While it’s no problem for a computer to wait for a
minute, trying to build a program on operations that may take
“somewhere in this range of 9 orders of magnitude to execute” is not a
very useful primitive.
Well, what if you loosen up the concept of a function until all
function calls are “looser” and can express all the failure cases
that also arise with RPC? This would include things like:
- Pervasively include a concept of timing out for all functions.
- Pervasively make functions easy to wait on or skip past; in “async”
terms, solve the problem by making all function calls async.
And so on.
Then, once you’ve done that, you will find that it is annoying to call
a single function and do all the error handling you need to do, so you
make that easier, with maybe scoped error handling declarations or
some more defaults or something.
Then, RPC really can be as easy as calling a function in your
program, because you lowered what a function promises.
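
To make that concrete, here’s a rough sketch in Python of what a
“loosened” call site might feel like, using asyncio to fake the
async-everywhere and timeout-everywhere parts. The names loose_call
and remote_strlen are invented for illustration, not a proposal for
real syntax:

```python
import asyncio

async def loose_call(fn, *args, timeout=0.5):
    # Every call admits the RPC-style failure modes up front:
    # the callee may be slow, missing, or simply never answer.
    try:
        return await asyncio.wait_for(fn(*args), timeout)
    except asyncio.TimeoutError:
        raise RuntimeError(f"{fn.__name__} timed out") from None

async def remote_strlen(s: str) -> int:
    # Stands in for a call that might involve a network hop.
    await asyncio.sleep(0)
    return len(s)

async def main():
    n = await loose_call(remote_strlen, "hello", timeout=0.25)
    print(n)  # 5

asyncio.run(main())
```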
Erlang implements this to some degree in its various gen_* services,
but still has a conventional concept of function call overall.
The downside of this is that it is questionable whether a function
call’s simplicity can be recovered to the point that programmers are
willing to use this. Ultimately, strlen
isn’t going to fail to be
available and I probably want to just wait for it regardless, and
somehow the syntax is going to need to make that reasonably easy, even
if it’s as simple as prefixing such calls with a symbol that means
“just pretend this is local” or perhaps embedding
“yeah, this is actually a local call” into the type system.
Or possibly it can all be left alone and the compiler can simply
optimize local function calls when possible, which is a lot of the time.
Capabilities
This is not a new idea, so I won’t go deeply into what it is, and I
have been told some languages are playing with it, but it is something
I’d like to see more of.
There’s a language called E that tried to bring this about, but my
impression is that it was built on top of Java, and that’s probably
too great a change to try to build on Java. It really needs its own
language from top to bottom.
The time was probably not ripe. Consider this not a claim to novelty,
but the observation that maybe the time is right for this now.
A possible hazard here is trying to make capabilities do too much,
e.g., also trying to write Rust-style mutation controls into them. Or
perhaps that would work like gangbusters, I dunno.
We keep trying to half-ass bodge capabilities on the side of existing
programs, and it honestly just keeps not working that well. Maybe
instead of writing conventional code and then trying to work out
exactly what capabilities that capability-oblivious code and
programmers ended up calling for, it’s time to put this into languages
themselves.
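
To show the flavor of what I mean (and only the flavor; nothing here
is enforced the way a real capability language would enforce it),
here’s capability-style code simulated by hand in Python, with
ReadOnlyDir as an invented capability type. In a real capability
language the compiler would guarantee there is no other route to the
filesystem:

```python
from pathlib import Path

class ReadOnlyDir:
    """A capability granting read access to one directory and nothing else."""
    def __init__(self, root: Path):
        self._root = root.resolve()

    def read_text(self, relative: str) -> str:
        p = (self._root / relative).resolve()
        if not p.is_relative_to(self._root):  # refuse to escape the grant
            raise PermissionError("outside the granted capability")
        return p.read_text()

def load_config(cfg_dir: ReadOnlyDir) -> str:
    # This code holds no ambient authority; it can only read what it was handed.
    return cfg_dir.read_text("app.toml")
```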
Production-Level Releases
We’ve learned a lot about production-quality releases in the past
several years. Little of this has made it back into the
languages. This has probably been a good thing, because we’ve needed
the freedom to experiment, but I think it’s time for languages to
start embedding solutions to these problems into themselves so we can
harvest the benefits of those solutions.
We really ought to be able to:
- Have a fully standardized logging interface now, such that external libraries can just provide configurable logging output.
- Build in metrics from day one, so third-party libraries can just provide metrics gathering and processing.
- Build in some sort of “request context” usable for request tracing and stuff.
This is an example of something that isn’t even a
“language feature”. It’s not like we need custom syntax constructs for
metrics. (Although if you’re going for a very “pure” language, maybe
some way of having some ability to ping metrics reliably without that
“counting” as impure would be helpful.) Just getting good-enough
versions of this stuff into the standard library would be enough.
There’s a number of these capabilities that are getting fairly mature
and could be lifted up into a new language directly, e.g. I think
“structured logging” has probably matured to this point.
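
As a sketch of the level of thing I mean, a blessed standard-library
surface along these lines would already be enough for third-party
libraries to build against. The module shape, the function names, and
the JSON-lines output here are all invented for illustration:

```python
import contextvars
import json
import sys
import time

# One request-scoped context every library can agree on.
request_id = contextvars.ContextVar("request_id", default="-")

def log(event: str, **fields) -> None:
    # A single structured-logging call; where it goes is configuration.
    record = {"ts": time.time(), "event": event,
              "request_id": request_id.get(), **fields}
    sys.stderr.write(json.dumps(record) + "\n")

_counters: dict[str, int] = {}

def count(metric: str, n: int = 1) -> None:
    # Libraries can ping metrics without caring who scrapes them.
    _counters[metric] = _counters.get(metric, 0) + n

# A third-party library would simply call these:
def handle_request(payload: bytes) -> None:
    request_id.set("req-123")
    count("requests_total")
    log("request.received", size=len(payload))
```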
The disadvantage of lifting into the language is that anything so
lifted becomes difficult to change once 1.0 hits. The advantage is
that the rest of the standard library and third party libraries can
integrate with it. It’s great that language X has 7 viable logging
libraries but it becomes more difficult for other libraries to build
on an assumption of what “logging” looks like.
This is an example of where a new language that is otherwise “just”
shuffling around older ideas could still get a leg up on its
competition. It takes a lot of maturity to write these interfaces,
though. For each of the things I mentioned, you really need to get a
lot of experienced devs together and make sure you’re pulling in the
best tested versions of the capability. This is, for better or worse,
not a place for some 19-year-old who has never worked with any of
these things to just splat out some off-the-cuff interface
specification that gets cast in stone. We already have that.
Semi-Dynamic Language
Many programmers like the convenience of dynamic languages. I spent
the first ~15 years of my career about 100% dynamic, and I’m not sure
I include myself in that set. Still, they’re pretty popular.
The problem is that they seem to be fundamentally slow. Some people
still discuss their performance as if it’s the year 2000 and maybe
someday we’ll get “sufficiently smart compilers”, but the reality is
that immense effort has been poured into speeding these things up
and it is no longer appropriate to “hope” for what may happen
someday. And the result has been… some success. Mixed success. You
can get dynamic languages to go faster, but it costs you a lot of
RAM and you still tend to cap out at 10x slower than C. It’s a lot of
work for a fairly marginal reward, worth it only because they’re so
popular that a 4x speedup multiplied across
“all the dynamic code in the world” is still well worth fighting for.
Alternatively, you can go the LuaJIT route and just hack out bits of
the language that don’t JIT well. But this seems to be only minimally
popular. Other than that it’s a good idea, though.
But the thing about “dynamicness” is that if you look, the vast, vast
majority of it takes place at startup, or at very defined times such
as “I’m loading in a new user plugin”. Almost no code is constantly
sitting there and dynamically modifying this and that as it runs. Yet
you pay for this dynamicness all the time. Every attribute lookup
needs to run a whole bunch of code to correctly look up an attribute,
in case someone has modified the lookup procedure since the last time
the value was looked up, or, in the case of a JIT, the JIT’s
procedures still need to be correct as if this can happen all the
time, which is inevitably slower than code that can’t do that.
What about a language where for any given bit of code, the dynamicness
is only a phase of compilation? The code can do whatever during
initialization, load database tables to dynamically construct classes
or whatever, but once it’s done, there’s a point where it locks down,
becomes nearly statically-typed (not necessarily fully, you could look
at this as an incremental typing situation), and being dynamic is no
longer possible?
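
Here’s a very loose simulation of that shape in today’s Python:
classes are built dynamically during an initialization phase (from a
stand-in schema), then sealed, after which dynamic modification is
refused. SealedMeta and the _sealed flag are invented; in the
imagined language, the runtime or JIT would be entitled to treat
sealed types as static:

```python
schema = {"User": ["id", "name", "email"]}  # stands in for a database lookup

class SealedMeta(type):
    _sealed = False
    def __setattr__(cls, name, value):
        if cls._sealed and name != "_sealed":
            raise TypeError(f"{cls.__name__} is sealed; no more dynamic changes")
        super().__setattr__(name, value)

def build_classes():
    # The dynamic phase: construct classes from whatever data we like.
    out = {}
    for cls_name, fields in schema.items():
        out[cls_name] = SealedMeta(cls_name, (), {"__slots__": tuple(fields)})
    return out

classes = build_classes()
for cls in classes.values():
    cls._sealed = True  # the point where this code says "I'm done being dynamic"

User = classes["User"]
u = User()
u.id, u.name = 1, "Ada"           # ordinary use still works
# User.greet = lambda self: "hi"  # would now raise TypeError
```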
I’m not sure what all this would look like. It’s a sketch of an
idea. Partially because I’m pretty satisfied in my own programming
world with static languages.
But if there was a phase where everything locked down, then a JIT
would have vastly more power to optimize the code safely. JITs have to
do so much work to deal with “well what if someone passes in
something really pathological to this function later?” and it seems
like they could go a lot faster if they could be rigidly guaranteed by
the types in the final compiled code that couldn’t happen.
You may also be able to create a sort of hybrid compile phase, where
the code is not “compiled”, but you can still run something like a
“check” that verifies the locked-down program is coherent according to
whatever rules the runtime or the user want to implement.
While I’m not aware of anything that works exactly like what I have in
mind, it is clearly a position on a well-explored continuum of
“exactly when does compilation happen?” and not some brand-new idea. I
reiterate that I’m not making a claim that any of this is brand
new. The Common Language Runtime’s IL and subsequent compilation on a
target system is reasonably close to this, but focused on something
different. Some Lisps may be able to do all this, although I don’t
know if they quite do what I’m talking about here; I’m talking about
there being a very distinct point where the programmer says “OK, I’m
done being dynamic” for any given piece of code. Shader compilation
for video games may have a component of this, especially including the
ability to cache compilation outputs.
Another view on this idea is, “Isn’t it about time someone wrote a
dynamic scripting language that was designed from day one to be easy
to JIT?” What we have out in the world right now is either dynamic
scripting languages where the JITs came along literally a decade or
two after the language was created, and the JIT basically had to be
instantly 100% compatible with a language that was never designed for
JIT’ing right out of the gate to be even remotely useful, or we have
LuaJIT where an existing language got bits and pieces sliced out of
it, but we don’t have anything that I know of where the language was
designed from the start to be dynamic, but still easy to JIT.
While you’re at it, you’ll naturally also create a dynamic scripting
language that handles threading properly from the beginning, rather
than trying to retrofit it on to a decades-old code base. A dynamic
scripting language with perhaps a 2-3x slowdown over C (or, to put it
another way, basically the same speed as Go) that is also
natively capable of near-static-language threading speeds could raise
a lot of eyebrows.
Value Database
Smalltalk and another esoteric programming environment I used for a
while called Frontier had an idea of a persistent data store
environment. Basically, you could set global.x = 1, shut your
program down, and start it up again, and it would still be there.
And by that I mean, storing a value persistently was literally that
easy; no opening a file and dumping JSON and loading it later, no
fussing with SQLite and having to interact with a foreign SQL
interface (which, no matter how nice that may be, is not your
language’s native paradigm), none of that. Just
“set this value and keep it forever”.
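
Python’s standard-library shelve module gives a small taste of how
little ceremony this could involve, though it is nowhere near the
always-on environment Frontier or a Smalltalk image provided:

```python
import shelve

# First run of the program:
with shelve.open("store") as db:
    db["x"] = 1            # set it once...

# ...a completely separate run, days later:
with shelve.open("store") as db:
    print(db["x"])         # ...and it is still there: 1
```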
This… is a superficially appealing but bad idea, unfortunately.
It’s on the list of Things I See People Angrily Claim Programming
Needs To Do To Level Up, right there along with Everything Should Be
Visual Programming and the recent “Low Code” burst that seems to have
died down again. It’s an entrant that doesn’t show up often, but I’ve
seen it enough that it’s on my list.
But it carries some significant disadvantages,
most notably that entropy tends to attack this shared store pretty
badly. The developer sets a value in their store, then sends the code
out to production, but whoops, it turns out the code absolutely
depends on that value being set and it fails everywhere else. It
takes a lot of work to set up a scenario in which twiddling a run-time
variable for debugging in your staging environment can propagate
straight into a bug on production because of accidental dependencies
on that value, but persistent stores are up for the challenge!
So, my own experience certainly attests to the fact that this is far
from a magic solution to all our problems.
However, I still wonder if this can’t be fetched from the dustbin of
history somehow, with some sort of better controls on what goes into
these stores. Base it on event streaming? Access controls? Some sort
of structural typing system that ensures that a “table” is of some
shape directly, before trying to use values of it and failing? Just
plain typing these things?
Because, my gosh, what a mess you could make on the one hand… but on
the other, I can’t tell you how nice it is to just say myval.x = 5 and
it’s just there as myval.x tomorrow, with no queries, no mappings, no
ORMs, no files, no failures… just boom, there.
A Truly Relational Language
Although, on the note of “no fussing with SQLite”, how about a
language whose fundamental data type is a relational DB table?
You wouldn’t actually want SQL; you’d want to go back to relational
principles and build something that works as a programming language
rather than banging SQL together. The many, many technologies in the
world like LINQ in the .Net world or SQLAlchemy show at least a
possibility of what that would look like, although being able to sit
down at the language grammar level to integrate this even more deeply
opens up even more interesting possibilities than LINQ.
(Being able to emit SQL from the language for when you really do want
to talk to an SQL database is probably a good idea. This is harder
than it looks at first glance. You definitely want to study LINQ, and
consider how you allow the user to use things like SQL_NO_CACHE or
SQL_CALC_FOUND_ROWS, because you need to be able to do those sorts of
things to SQL even if you don’t need them in every query.)
Relational databases are clearly a tech that is here to stay, yet most
modern languages still treat them as an exotic thing to be dipped into
every once in a while at great cost. Maybe there’s a special “table”
data type like the one data programmers have with their “data frames”, or
you get something nice like LINQ, but the language is still ultimately
either product or sum type data structures as its native
representation and there’s always this foreign conversion step to go
from the relational data to the “real” data.
You know you’d have something like this when you could query across
three different data types in your code, and get the results in the
form of some ad-hoc data type specifically for that query, which could
then be natively passed around and perhaps even have methods added to
it directly.
In the type theory world this is heavily related to row
types. I
am not aware of a language that uses them natively. (Although as is
often the case, if you squint hard enough at a dynamically-typed
language it can “look like” row types, but that’s again because by
punting on types entirely it can “look like” a lot of things, but in
the end if you violate the types you just get an exception thrown.)
Although there is more work to be done to make this idea work, row
types are just where I’d start. You’d still want to examine things like
“can I put methods on some sort of row type in such a way that the
method doesn’t care how the row type is constructed?”
That is, suppose you had a user ID and a username in one table, that
linked to an identity that contained their human name. By querying
across these two things you could end up with a User ID/Username/Human
Name tuple, even though that doesn’t literally exist as a data type in
your system… could you work out a way to put a method on this anyhow
that might do something like a debug dump of those three things? A
method on a datatype that never concretely exists as a declared data
type? There are some interesting possibilities here.
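
Simulated by hand in Python, that example might look like the
following. In the imagined language the join and the row type would
be native and statically known; here namedtuple stands in for the
ad-hoc type that never exists anywhere in the domain model:

```python
from collections import namedtuple

users = [{"user_id": 1, "username": "ada"},
         {"user_id": 2, "username": "grace"}]
identities = [{"user_id": 1, "human_name": "Ada Lovelace"},
              {"user_id": 2, "human_name": "Grace Hopper"}]

# The ad-hoc row type produced by the query across both "tables".
Row = namedtuple("Row", ["user_id", "username", "human_name"])

def debug_dump(row: Row) -> str:
    # A "method" on a type that is never declared in the domain model.
    return f"{row.user_id}: {row.username} ({row.human_name})"

rows = [Row(u["user_id"], u["username"], i["human_name"])
        for u in users for i in identities
        if u["user_id"] == i["user_id"]]

for row in rows:
    print(debug_dump(row))
```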
(This also may harmonize with the JIT idea above. This would tend to
create a proliferation of possible types that some code somewhere
could use; conceivably even an infinite number of them depending on
how you implement it. Conventional generics-based precompilation may
not really be possible. But in practice there would be a finite and
generally relatively small set of those types actually used and a JIT
that could determine what those are and JIT them to native-ish speeds
could potentially recover a lot of performance out of this by not
compiling all of the myriad possible types that could have a
UserID/Username/HumanName in them with their static struct offsets
until the type is actually used.)
A Language To Encourage Modular Monoliths
The modular
monolith
has been a structure that has been flying under the radar lately, but
I feel like it’s coming up more and more often. Personally, everything
I write large enough to need architecture is now a modular monolith,
and I find it a fantastic way to program at least at medium scales. I
have to admit I have not yet tried it on a truly large project. My
guess is that it should continue to scale, albeit possibly
requiring more discipline to maintain, but it doesn’t seem to be
gassing out in my own uses yet.
To have a modular monolith, you need to use dependency injection and
interfaces. You write as much code as possible in terms of “Hey, this
is what I need; I need a way to turn DNS addresses into IP addresses,
and I need a way to turn email addresses into user accounts, and I
need this and that and the other thing”, and then you construct each
component of the modular monolith by providing it with all the
services it needs.
I think “modular monolith” is arguably what should be the “default”
architecture for any non-trivial project.
However, modern languages tend to fight you on that.
Static languages require extensive declaration of interfaces of some
sort to do this, so it requires much more discipline than hard-wiring
everything together with concrete types. As a result, in real
code, lots of things end up hardwired together just due to the sheer
hassle of interfaces, even in the languages where they are the easiest.
Dynamic languages are nominally easier, once again because they pretty
much punt on everything, but the trade off is no compile-time guarantee
that all the services you are getting passed actually do the things
you want them to do. In practice this becomes scarier and scarier to
do as you scale up, because now every time you call a new method on
some passed-in parameter you are changing the interface for all things
passed in to that method, and there is effectively no way to notify
the callers of that method. After all this may even be a library and
you may have no human connection whatsoever to the caller.
I’d be interested in something that strikes a middle ground; a static
language with compile time guarantees, but one where all function
parameters are automatically interfaces, even if they are given an
“exemplar” type in their type signature. If I declare something as a
“string”, and what I do with that string is concatenate it with
another string and iterate on Unicode codepoints, what if the compiler
just automatically was able to take anything that could “concatenate
itself to a string” and “iterate on Unicode codepoints” and accept it
by treating it as if there was an interface declaration right there already?
(It would be interesting to see if you can get type inference to the
point that “exemplar types” are no longer necessary but working out if
that is the case is well beyond the level of work I’m doing here.)
Every parameter coming in to a function could have an interface
automatically extracted out of it just by what the user does to that
value. Integration through a language server could do something like
extract that interface out automatically. I don’t know if it should be
implicit and checked at compile time, or if you might want to do
something like “on save, automatically reify all interfaces into
actual declarations the human can see”. If it is left implicit we
definitely want the language server to have a command that returns
“this is the actual interface for this parameter”.
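
For the string example above, the reified interface might look
roughly like this, sketched here with Python’s typing.Protocol
(structural typing checked by tools like mypy); the protocol name is
invented:

```python
from typing import Iterator, Protocol

class ConcatsAndIteratesCodepoints(Protocol):
    def __add__(self, other: str) -> str: ...   # "concatenate it with another string"
    def __iter__(self) -> Iterator[str]: ...    # "iterate on Unicode codepoints"

def describe(s: ConcatsAndIteratesCodepoints) -> str:
    # Written against the extracted interface rather than the exemplar type str.
    marked = s + "!"              # concatenation
    count = sum(1 for _ in s)     # codepoint iteration
    return f"{marked} ({count} codepoints)"

print(describe("hello"))  # any type satisfying the protocol is accepted
```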
(On that note, not worth its own section, but I think there’s a lot of
interesting work to be done along the lines of “write a static
language that assumes you’re writing it with the language server and
provides very rich querying capabilities”. You see a lot of good
ideas in the best IDEs for that sort of thing but the ideas always end
up detached from the languages and eventually stranded when the IDE
line comes to an end. Collecting those capabilities up into the
language project itself and integrating the language server right into
the design of the language at all phases should probably have some
interesting effects.)
I think I’d want to see something like the Python module system, where
technically, libraries themselves are objects, which means that entire
libraries could be swapped out by providing a different one.
You could in principle merge this with another interesting idea, which
is more extensive use of dynamic
scopes to
do something like provide a built-in service registry of things that
look like a fancy dependency-injection library, so some code can do
something like “fork my current registry, change the UserProvider to
this other object, and run this test code”. Or change the definition
of a “transaction”. Or whatever.
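
Python’s contextvars is a real, if limited, version of that
dynamic-scope mechanism, so the “fork the registry, swap one
provider, run this code” move might be sketched like this; the
registry shape and the UserProvider name are invented:

```python
import contextvars
from contextlib import contextmanager

registry = contextvars.ContextVar("registry", default={})

@contextmanager
def with_service(name, impl):
    forked = {**registry.get(), name: impl}  # fork the scope rather than mutate it
    token = registry.set(forked)
    try:
        yield
    finally:
        registry.reset(token)

def lookup_user(email):
    # Any code, however deep in the call stack, resolves services the same way.
    provider = registry.get()["UserProvider"]
    return provider(email)

with with_service("UserProvider", lambda email: {"email": email, "id": 42}):
    print(lookup_user("a@example.com"))  # uses the swapped-in fake provider
```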
In theory, if you successfully made sure there aren’t any back doors
to this system (like “primitive types like ints are just ints and they
can’t be shimmed”), this would almost automatically make any system
written in it a modular monolith. Of course, it might be a super
messy modular monolith, but in principle any function in the system,
even though the whole thing is statically typed, could be executed in
such a way that everything it depends on, regardless of whether it was
written for being swappable, is swappable, with enough work. You could
do things like have literally any code that reads & writes to a
filesystem be executed in a context that provides a fake file system
for testing, without the code itself having to do any explicit
declaration of that fact.
You’d want to block truly global variables entirely, although if you
combine them with the dynamic scope idea, you can put things in the
dynamic scope for similar uses.
There’s probably also some interesting synergies with structured
concurrency, and having these dynamic scopes attached to execution
contexts. Make sure your dynamic scopes also have the capabilities
that the Go contexts do and you’d end up with some interesting
possibilities.
Modular Linting
This is another place where I absolutely make no claims about what may
be happening in the many dozens of language communities in the
world. I’m just making an observation from one of the ones I’m deeply
into and suggesting it’s a good pattern, not that it’s the only place
it happens. Plus I’m going to say it isn’t happening enough anyhow
even where it is happening.
With that throat clearing, the Go world happened to end up with a lot
of various linters over the years. Eventually they were combined into
a project called
golangci-lint, which has
become the de facto linter for the community. I linked to the list
of linters built in so you can see what’s in there.
What’s interesting about golangci-lint, though, is that the various
linters are largely independent from each other, each an independent
project written by various developers to scratch a particular
itch. They were later merged together technically, but are in
principle still just a big pile of community linters with a nice
modular interface on top.
This isn’t about the language at all, but it would be interesting for
a language project to reify that. golangci-lint eventually shared an
AST view among a lot of its linters; a project could copy that idea
and write it in early. Let linters be fully modular, perhaps even by
mentioning them by github project through a fully standardized
interface that doesn’t require them to even be “integrated” into a
single executable from a third-party project.
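
A minimal sketch of what such a standardized interface could amount
to, using Python’s real ast module as the shared AST view; the Lint
record and the particular check are just for illustration:

```python
import ast
from typing import Iterator, NamedTuple

class Lint(NamedTuple):
    line: int
    message: str

def unused_import_linter(tree: ast.Module) -> Iterator[Lint]:
    # One community linter: flag plain imports whose name is never used.
    imported = {alias.asname or alias.name.split(".")[0]: node.lineno
                for node in ast.walk(tree) if isinstance(node, ast.Import)
                for alias in node.names}
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    for name, lineno in imported.items():
        if name not in used:
            yield Lint(lineno, f"import '{name}' is never used")

source = "import os\nprint('hi')\n"
for lint in unused_import_linter(ast.parse(source)):
    print(f"{lint.line}: {lint.message}")
```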
What’s neat about this approach is that if it was made a part of the
language design process, a lot of things that aren’t necessarily
important could be kicked to optional linting. For instance, Go in
particular was a bit notorious when it first came out for mandating that
all imported packages were used, and all declared variables were
used. Complaints about that over the years have quieted down, but
that’s a good example of something that could have been taken out of
the compiler and shuffled off into a linter provided by the main
project.
There’s definitely some downsides too… you end up with “dialects” of
the language, but, the truth is, you end up with that
anyhow. Generally developers learn pretty quickly not to fire their
own bespoke linting configuration at other people’s libraries.
But it would be interesting to see how much could be kicked out to
linters, like, do you want to insist that all values of an enumeration
(whether a classic int or the later trend towards using that term for
what I think of as “sum types”) are checked in switch statements or
not? Do you want to validate your printf parameters are correct?
Do you want a linter flagging every time you use an external program
to pass your arguments through some check routine? Perhaps things like
HTML template libraries could even ship with their own linters to flag
suspicious constructs for injection, formally as part of the library.
This pairs in an interesting way with the parenthetical about leaning
on a Language Server more; taking the Language Server as a core part
of the project makes the language task bigger, but allowing for
community linters and kicking non-essential aspects of the language
out to the linters shrinks what the core of the language project has
to worry about.
This is also one of the only ideas in this list that doesn’t need to
be in the language from the very, very beginning, and could be added
either by a young language design team or even just a motivated
external developer to an existing project. Things like Python or C#
have too much inertia for a dedicated dev, but you might be able to
get the momentum in something still young like Nim or Zig.