SQLite’s Use Of Tcl
D. Richard Hipp
24th Annual Tcl/Tk Conference
Houston, TX
2017-10-19
1.0 Introduction
SQLite is a TCL extension that has escaped into the wild.
The design of SQLite was inspired by the design of TCL, both in
the way it handles datatypes and in the formatting of its source
code.
The index use case for SQLite was in a Tcl/Tk application for an
industrial company.
From its inception, SQLite has always depended heavily on TCL.
These days, SQLite no longer uses TCL internally and can be run
separately from any TCL interpreter, and yet the SQLite development
process still depends heavily on TCL.
1.1 About SQLite
SQLite is an SQL database engine, and
the most widely used database engine in the world.
SQLite is built into all cellphones as a core component and is the
primary means of data persistence on phones. SQLite is also an
integral part of most web browsers. SQLite is built into MacOS and
is used by most of the default applications on that platform.
Windows10 requires the C:\Windows\System32\winsqlite3.dll file in
order to boot. Countless other popular applications like Skype
and WhatsApp and iTunes depend on SQLite.
Because SQLite is open source and can be freely downloaded and
duplicated, exact usage numbers are unavailable. But reasonable
estimates are that there are more SQLite instances in operation
today than there are people on earth. Most devices that
use the SQLite database engine contain hundreds of separate databases,
and there are billions of such devices. Hence, there are likely
around one trillion SQLite databases in active use.
There are more copies of SQLite in use than there are copies of Linux.
We know this because SQLite is used on almost all Linux systems, but
SQLite is also used on many other non-linux systems such as Macs, iPhones,
and Windows computers.
By similar arguments, there are more copies of SQLite in use than
there are Macs, or Windows PCs.
There are probably more copies of SQLite in use than all other database
engines combined.
It seems likely that SQLite is the most widely used and deployed
software component of any kind, with the possible exception of the
zLib compression library.
1.2 SQLite Is A TCL Extension, Not A TCL Application
SQLite is not written in TCL. Rather, SQLite is intended to be used
by TCL. Like the TCL interpreter, SQLite is written in ANSI C.
The fact that SQLite was primarily intended to be used from TCL
is evident in an number of ways.
All programming languages, other than C/C++, require some kind of
adaptor in order to interface with the SQLite C implementation. Language
adaptors for SQLite are widely available from third party programmers.
The adaptors for PHP and Python are built into those languages, for
example. A Java adaptor is baked into Android. And so forth. Only
the TCL adaptor is included as part of the SQLite core. The source code
file that implements the TCL adaptor for SQLite, “tclsqlite.c”,
was part of the very first check-in on the SQLite source repository
on 2000-05-29. (See
https://sqlite.org/src/ci/6f3655f79f9b6fc9.)
All modern SQL implementations provide a means to do late binding of
parameter values to the SQL statements. Usually a naked “?” character,
or a “?” followed by an integer is used. For example:
SELECT passwd, photo FROM user WHERE uid=?1
The “?1” token in the SQL above would be assigned a value at run-time
in order to look up the password and photograph of a particular user.
SQLite supports this syntax. But because of its TCL heritage, SQLite
also allows the parameter to take the form of a TCL variable. Hence:
SELECT passwd, photo FROM user WHERE uid=$uid
When a statement such as the above is run, the TCL language adaptor
automatically binds the value of the $uid TCL variable to the SQL statement
before it is evaluated, providing an intuitive and seamless interface
between TCL and SQL. SQLite is the only database engine that behaves
this way.
The TCL heritage of SQLite is visible in the type system of
SQLite. Early versions of SQLite (prior to 2004) operated on the classic
TCL principal that “everything is a string”. Beginning with SQLite3
(2004-06-18), SQLite also supports binary data. However, types are still
very flexible in SQLite, just as they are in TCL. SQLite treats the datatypes
on column names in a CREATE TABLE statement as suggestions rather than
hard requirements. SQLite is happy to store a 100KB string value in a
column that is declared “SHORT INT”, just as TCL is happy to store either
a large string or a small integer in the same variable. There are some
differences in how SQLite deals with datatypes, in comparison to TCL, due
to the different nature of the SQL language. SQLite has the concept of
“type affinity”. If a column is declared “INT” and one inserts a string
into that column that looks like an integer and can be safely converted into
an integer without loss of information, then that conversion occurs
automatically. This provides a measure of compatibility with the rigid
type systems of other SQL database engines.
The flexible type system of SQLite seems natural and intuitive to programmers
with prior experience programming in TCL. Curiously, though, it is a source
of frustration and frequent complaining from programmers accustomed to the
rigid and unforgiving type systems of languages like Java.
The similarities in the type systems of TCL and SQLite extends to more than
just the interface. An important part of the C/C++ interface for SQLite
is the “sqlite3_value” object
(https://sqlite.org/c3ref/value.html) which is analogous to the
Tcl_Obj object in TCL. Both TCL and SQLite use a dual-representation approach,
where each value can be represented simultaneously as both a string and
some other type.
2.0 How SQLite Depends On Tcl
SQLite began as a TCL extension, though these days most uses of SQLite
are in applications written in languages other than TCL. Many programmers
who use SQLite in their applications have no knowledge or awareness of TCL.
The SQLite source code used by most developers is a single file of pure
C code named “sqlite3.c” that contains no TCL code.
This is what we mean when we
say that SQLite as “escaped” into the wild. Deployed instances of SQLite
no longer depends on TCL.
Nevertheless, SQLite is still heavily dependent upon TCL and the ongoing
support, maintenance, and enhancement of SQLite would not be possible without
TCL, and would be seriously inconvenienced without Tk.
2.1 SQLite Source Code Is Mostly TCL
The deliverable source code for SQLite is a single file named
“sqlite3.c” and its companion header “sqlite3.h”.
Both files are 100% ANSI-C code. But developers do not edit these
files directly. The sqlite3.c and sqlite3.h source files are build
products, and the source tree used to build those files is over 50%
TCL code. Figure 1 nearby shows the exact ratios.
Figure 1: Breakdown Of SQLite Source Code By Language
Figure 1 is for the main SQLite source repository. Many of the test
cases and much of the documentation is held in separate repositories,
not included in Figure 1. The separate repositories also contain a great
deal of TCL code.
Much of the TCL code in the main SQLite repository consists of test scripts.
At this writing, the core repository contains 1153 separate test scripts
totally about 389 KB of space.
But this is not the only use of TCL in SQLite.
2.2 Machine Generated C Code
A non-trivial amount of the deliverable C code for SQLite is machine
generated. Some of the machine generated code is created by C programs,
such as LEMON which translates the SQL language grammar into C code to
implement a push-down automaton to parse the SQL language. But much of
the automatically generated code is created using TCL scripts. TCL is
well suited for scanning source files to extract information to be
merged with other files and for making mechanized edits. For example,
the byte-code engine used to evaluate SQL statements inside of SQLite
is implemented as a large “switch” statement inside a “for” loop, with a
separate “case” for each opcode, all in the “vdbe.c” source file.
At build-time, TCL scripts scan the vdbe.c source file looking for the
appropriate “case” statements and then build header files that assign
consecutive integers to each symbolic opcode name. (The opcodes used by
the SQLite byte-code engine are not an API as they are in TCL and thus
can change from one build to the next.) This mapping of symbolic opcode
names into integers is not a simple as one might suppose. For reasons
of optimization, there are many constraints on the specific values that
are assigned to opcodes. For example, many opcodes such as OP_Add must
have the same numeric value as the corresponding “+” token in the SQL
language parser. Sometimes a group of related opcodes, such as the
comparison operators OP_Eq, OP_Ne, OP_Lt, OP_Le, OP_Ge, and OP_Gt, need
to be assigned consecutive integers in a specific order. These constraints
are all handled effortlessly in TCL. Accomplishing the
same with AWK would be rather more difficult.
Perhaps the most important task for TCL during the SQLite build process
is constructing the SQLite amalgamation source code file.
Recall that most developers use SQLite in the form of a single big file
of C code named “sqlite3.c” and referred to as “the amalgamation”.
A TCL script named “mksqlite3c.tcl” runs in order to construct the
amalgamation from over one hundred separate input files. Each of these
inputs files must be added to the amalgamation in just the right order.
Furthermore, the source files are edited as part of the amalgamation building
process. When mksqlite3c.tcl encounters a “#include” for an SQLite header,
it replaces the “#include” with a copy of that header file, taking care
to make sure each header file is only included once. The mksqlite3.tcl
script automatically adds the “static” keyword to internal SQLite APIs
to give them file linkage, and makes other similar edits.
2.3 The sqlite3_analyzer.exe Utility
In addition to the core SQLite library, the SQLite source tree also contains
code for several analysis and control programs. One of these programs
is called “sqlite3_analyzer” (or “sqlite3_analyzer.exe” on Windows).
The sqlite3_analyzer program examines an SQLite database and generates
a detailed report on the disk usage by the various tables and indexes
within that database. The sqlite3_analyzer program is very useful in
understanding how an application is using disk space.
It turns out that sqlite3_analyzer, though disguised as an ordinary
executable, is really a TCL application. The main source code file for
this application is
tool/spaceanal.tcl.
During the build process, this script is converted into a C-language
string constant (using another TCL script) and added to a very simple
C-language wrapper than starts a TCL interpreter and then passes the
application script to that interpreter.
The sqlite3_analyzer program could be rewritten in pure C. But that
would be a lot of code. The TCL script that implements sqlite3_analyzer is
less than 1000 lines long. The equivalent C program would surely be at
least ten times larger.
Note that the sqlite3_analyzer utility program statically links a TCL
interpreter and so does not require a TCL installation on the target
computer to use. The sqlite3_analyzer utility program is used by
tens of thousands of developers, most of whom do not realize that they
are really running a TCL application.
2.4 TCL Used For Testing SQLite
One of the key features of SQLite is that it uses aviation-grade testing.
The tests of SQLite, at a minimum, provide 100% modified condition/decision
coverage (MC/DC) of the SQLite code, with independence. 100% MC/DC roughly
means that every branch instruction at the machine code level is exercised
at least once in each direction. The precise definition of
MC/DC is slightly stricter than this, for example when comparing boolean
vectors, but the 100% branch tests coverage definition is very close
approximation. The “with independence” term means that SQLite is tested
in multiple ways with test code being written and maintained by different
individuals.
The amount of testing done on SQLite is fanatical. On the other hand,
that level of testing is necessary for a fundamental low-level component,
such as a database engine, that is used by billions of devices. If an
ordinary application encounters an obscure bug, it can normally be rebooted
to clear the problem. But the job of a database engine is to remember things,
and so databases tend to remember their mistakes across reboots. For these
reasons, it is important that SQLite have a very low bug density.
TCL is used in every aspect of SQLite testing.
The test cases that are part of the primary SQLite source code
repository are written in TCL.
Other test cases such as
TH3 and
SQLLogicTest are written
in C but still depend on TCL for operation.
The TH3 test suite is a set of proprietary tests for SQLite that form the
primary means of achieving 100% MC/DC. TH3 is designed to
run on embedded hardware without the support of a desktop operating system.
TH3 consists of over 1350 test modules together with over 100 control files.
The test modules are written in either C or SQL or a combination of both.
The control files are text formatted in a way that easily parsed by TCL.
To build a TH3 test, a TCL script is run that combines some subset of the
test modules and control files into a single large C program that will
automatically run the required tests. This C program is then linked against
the “sqlite3.c” amalgamation and the resulting binary is moved to the target
computer and executed. TCL scripts automate this entire process on
all major host platforms.
To verify that the TH3 tests really do provide 100% MC/DC, special options
are added to the TCL scripts that run the tests, causing the GCC coverage
analysis tools to be invoked. The output of gcov is then postprocessed to
reveal and report any branch instructions that were missed by the tests.
The TH3 tests themselves are all implemented in C and SQL, but the operation
and management of those tests is all done with TCL.
2.5 TCL Generated Documentation
The extensive documentation for SQLite available on the SQLite website
(https://sqlite.org/) is all generated
by TCL. Many of the documents, such as the API reference documentation and
the descriptions of the byte-code engine opcodes, are created by TCL scripts
that scan C source code and extract the necessary information from the code
and comments. Thus, the API documentation is largely derived from
comments in the source code. Keeping the official documentation (in
comments) and the source code close together helps ensure that they are
in agreement.
Other whitepaper documents are generated from source files that look
mostly like
HTML but which contain additional TCL code embedded inside of
advanced formatting techniques, for automatically creating cross-references
and cross-links, and for constructing complex displays such as the
popular “rail-road” syntax diagrams for SQL.
2.6 Tcl/Tk Used During Development Of SQLite
The text editor used by the primary author of SQLite is a custom editor
with emacs-style key bindings that is built on top of the Tk Text widget.
The “e” editor, as it is called, is cross-platform, which helps in the
development of a cross-platform software library like SQLite. The “e”
editor has been used for the entire 17-year history of SQLite. It has
been enhanced over the years with various customizations created especially
to help manage the SQLite source code.
The Fossil version control system used for the SQLite source code
(and written specifically for that purpose) uses Tcl/Tk to show graphical
side-by-side diffs in a separate window. When the “fossil diff –tk”
command is run, Fossil generates a script to show the diff graphically
and then kicks off a separate “wish” process to run that script and
display the diff in a separate window. This graphical diff window has
a “Save” button which will cause the Tcl/Tk code needed to reproduce itself
to be written to a file. This file can be, in turn, sent to a collaborator
for display. Passing around graphical diffs as ordinary text files is
much simpler and easier than passing around JPEG images or text “context”
diffs.
No two SQLite developers work in the same office. The team is
geographically distributed. To help the team stay in touch, a custom
chatroom has been created using a Tcl/Tk script. The same script works
as both client and server. The chatroom is private and uses a proprietary
protocol, so that developers are free to discuss sensitive matters without
fear of eavesdropping. The chatroom is implemented as just over
1000 lines of Tk code, and is thus accessible and easy to customize.
Among the customizations is the ability to send saved “fossil diff –tk”
graphical diffs to collaborators and have the diff pop automatically on
the collaborators screen. Small features like this seem trivial
by themselves, but together than help the developers
to work much more efficiently. These kinds of productivity-enhancing
features are unavailable to users of commercial business collaboration
packages such as HipChat.
3.0 Summary
SQLite is an indispensable element of most modern computer systems, and
TCL is an indispensable tool used in the production and maintenance of
SQLite. Thus, the computing infrastructure we enjoy today would not exist
except for TCL.
As deployed, SQLite contains no TCL code.
However, the design of SQLite is inspired by TCL.
And TCL is used extensively in the code generation, testing,
analysis, documentation, and development of SQLite. Without
TCL, SQLite would not exist.
Every developer and every team has a finite number of “brain cycles”
available to do their job. The fewer cycles spent messing with tools,
the more cycles are available to devote towards solving the problem.
So for maximum productivity, it is important to use tools that get
the job done with a minimum of fuss and bother.
Our 17-year experience using TCL in the SQLite project has convinced
us that TCL is just such a tool. Tcl provides the most help per
brain cycle of any similar technology.