posted by Matthew Flatt with thanks to collaborators
Kent Dybvig, Andy Keep, Gustavo Massaccesi, Sarah Spall, Sam Tobin-Hochstadt, and Jon Zeppieri
It’s been almost a year since we decided to try running Racket on Chez Scheme, and it has been almost six months since my last status report. As of RacketCon 2017, DrRacket could start up on Chez Scheme (but just barely, and just in time for the demo).
The current implementation is now considerably more complete. DrRacket still doesn’t run well, but Racket-on-Chez can build a full Racket distribution in reasonable time and space. With the current Racket-on-Chez compilation strategy, runtime performance is plausible on traditional benchmarks, but cross-module optimization is needed to bring the results fully in line with our expectations. Startup and load times are much slower than we would like, and compile/build times are a factor of 4 slower than the current version of Racket.
Glossary of implementations:

- Racket — the current Racket release, version 6.x
- racket7 — mostly the same as Racket, but using a new, Racket-implemented macro expander; the source is in the https://github.com/racket/racket7 repo
- Chez Scheme — a completely different implementation of Scheme that we’re trying to use as a replacement for part of Racket
- Racket-on-Chez — Chez Scheme plus additional layers to make it implement the same language as Racket; the source is also in the https://github.com/racket/racket7 repo
Approach
Our overall goal is to improve Racket’s implementation and make it more portable to different runtime systems. To a first approximation, improving Racket and making it more portable means “less C code.” Building on Chez Scheme is a promising means toward that end.
The picture on the right illustrates the current Racket distribution.
The parts in blue are implemented in Racket,
while the part in red is implemented in C. The green
block is Scribble documentation source.
The numbers next to each block are rough lines-of-code counts, and
the blocks are scaled to those counts. The
collects block represents the content of the
main "collects" directory; along with the
core block, it’s roughly the code in a Minimal Racket
distribution. The pkgs block is the part of the
main distribution that is available in separate packages but bundled
with the main distribution. The docs block is
spread across the same packages as the pkgs
code.
Porting Racket to Chez Scheme is all about the core part. The goal is to make no changes to existing Racket and Scribble code, although we’ll inevitably have to make small changes to support multiple Racket implementations.
[figure: the current Racket distribution, blocks scaled by lines of code]
Let’s zoom into the core part. (From now on, all the
pictures on the right are at this scale.)
The picture breaks the C code in Racket’s current
implementation into several major subsystems. In reality, the
implementation is not organized into such tidy layers, but
the picture is conceptually close to the truth.
The large builtins block implements primitive data
structures and their operations, such as numbers (fixnums, bignums,
flonums, exact rationals, complex numbers), strings, characters, byte
strings, lists, vectors, and hash tables. Most other blocks are
self-explanatory.
The expander block implements the macro expander, reader, and module system. More than any other part of the current stack, the expander is better implemented in Racket.
[figure: the core of the current Racket implementation, C subsystems]
That was exactly the work of 2016: building a new expander to replace the C-implemented expander. You can use that racket7 variant of Racket now by building from the https://github.com/racket/racket7 repo.
This new expander hasn’t made it into the
release, because a few interactions haven’t been sorted out: the macro
stepper, the demodularizer, and to some degree the decompiler. The
expander also runs at about half the speed of the C-implemented
expander, and that performance difference was part of the motivation
to investigate a new runtime system.
Although the compiler+JIT block is drawn the same as
before in this picture, it’s a slightly different compiler, because it
doesn’t know about modules or syntax objects. Instead, the compiler
works in terms of linklets, which are lambda-like
blocks of code that import and export variables. That separation of
expansion/modules from compilation helps enable the transition to a
different compiler and runtime system.
[figure: racket7, with the Racket-implemented expander]
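To give a feel for the shape of a linklet (a simplified sketch; the concrete syntax in the racket7 repo may differ), a compiled module fragment might look like:

```racket
;; A linklet is a lambda-like unit: it lists groups of imported
;; variables and its exported variables, with no notion of
;; modules or syntax objects:
(linklet [[display]]          ; imports, in one import group
         [counter bump!]      ; exports
  (define counter 0)
  (define (bump!)
    (set! counter (add1 counter))
    (display counter)))
```

Linking supplies values for the imports and extracts the exports, which is how the expander stitches instantiated modules together.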
And that brings us to Chez Scheme. The most obvious feature in the
picture on the right is that Chez Scheme’s C-implemented
kernel is a fraction of the implementation, while most
of Chez Scheme is more sensibly written in
Scheme.
Chez Scheme’s expander is considerably simpler
than Racket’s. Although Chez Scheme’s expander is hygienic and hugely
influential on Racket’s expander, Racket’s macro and module system
does a lot more.
Chez Scheme’s compiler, meanwhile, does a lot
more than Racket’s compiler+JIT. The
builtins functionality in Chez Scheme is similar
to Racket’s builtins.
Chez Scheme’s GC (at the bottom) is tiny and elegant.
It lacks some functionality of Racket’s GC, but
mostly it lacks Racket’s historical complexity for trying to cooperate
directly with C.
[figure: the Chez Scheme implementation]
That brings us, finally, to the Racket-on-Chez picture. It starts with
the Chez Scheme stack and adds a rumble block,
which extends Chez Scheme to match the current Racket’s
GC through control+structs blocks.
Mostly, rumble is about immutable hash tables,
delimited continuations, applicable structures, structure type
properties, and impersonators/chaperones. We’ve given this layer of
functionality the name Rumble, because it’s useful to have a
name for the language that hosts other layers that are built using
Racket.
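Applicable structures and chaperones, for example, are Racket constructs with no direct Chez Scheme counterpart, so Rumble has to supply them. In plain Racket, an applicable structure looks like this:

```racket
#lang racket/base

;; An applicable structure: instances of `adder` can be called
;; as functions via the prop:procedure property, whose procedure
;; receives the structure instance itself as the first argument:
(struct adder (n)
  #:property prop:procedure
  (lambda (self x) (+ (adder-n self) x)))

(define add10 (adder 10))
(add10 5)  ; evaluates to 15
```

Since Chez Scheme only knows how to apply plain procedures, Rumble must intercede in (or compile specially) every application that might involve such a value.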
The threads, io, and
regexp blocks are entirely new implementations
of the corresponding blocks in the current Racket implementation. The
rktio block is used by io, and
it’s the same as in current Racket: a portability layer over OS
facilities that is similar to (but more extensive than) code that is
in Chez Scheme’s kernel.
[figure: the Racket-on-Chez stack]
The expander block (the same as for
racket7) sits on a small schemify adaptor.
The schemify layer converts Racket linklets into
plain Scheme functions, adjusting expressions as needed to support
applicable structures, to enforce left-to-right evaluation, and to
implement Racket’s allocation model for local functions (i.e., lift to
avoid closure allocation whenever possible).
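As an illustrative sketch (not the exact output of schemify, and with a hypothetical helper name), an application whose subexpressions can have side effects might be adjusted like this:

```racket
;; Original Racket-level expression, where `f` may be an
;; applicable structure and `(g)`/`(h)` may have side effects:
;;   (f (g) (h))
;;
;; Schemified sketch: Chez Scheme does not fix an argument
;; evaluation order, so `let*` forces left-to-right evaluation;
;; `call-with-applicable` stands in for a helper that unwraps
;; applicable structures before calling:
(let* ([arg1 (g)]
       [arg2 (h)])
  (if (procedure? f)
      (f arg1 arg2)
      (call-with-applicable f arg1 arg2)))
```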
Naturally, the Racket-on-Chez implementation is bootstrapped by
applying the expander and
schemify layers to themselves and the other
Racket-implemented parts of the stack, and then compiling the
schemified linklets using the Chez Scheme
compiler. A thin main
driver on top handles command-line arguments to load modules and start
the read–eval–print loop.
Current Functionality
DrRacket starts up in Chez Scheme, and just starting DrRacket
exercises many Racket language constructs and libraries. The largest
Racket-on-Chez demonstration to date is building all of the packages
and documentation of the main Racket distribution. Building packages
and documentation, again, covers more Racket functionality than you
may expect, because our macros do interesting things, such as
typechecking and generating bitmap images. Also, the build process is
too long and complex to tolerate leaks or major inefficiencies.
Still, many pieces are buggy, and closing the gap is mostly a matter
of plowing through the Racket test suites. The main
missing or incomplete pieces are: futures, places, and a racket
executable that starts Racket-on-Chez.
Performance: Benchmark Sanity Check
The overall compilation story is not yet right in Racket-on-Chez,
partly because there’s no cross-module optimization. Functions like
map, memq, and assv need cross-module
optimization to be compiled well in Racket-on-Chez. Benchmarks
sometimes do not rely on cross-module optimization, however, and
preliminary benchmark measurements can be useful as a sanity
check on the ways that schemify mangles
individual modules.
Benchmarks are in the racket-benchmark package
whose source is in the racket7 repo.
The benchmarks run in Chez Scheme as R6RS libraries.
The difference between Racket and racket7 is in the noise.
The table below shows a handful of rough benchmark results. You
should not take the results seriously, but they line up with my
overall impression of where we are.
Rough and unscientific benchmark numbers

  msecs              deriv   dynamic   sboyer   maze    nfa
  Chez Scheme          770       360     1270    930   2100
  Racket-on-Chez      1560      1030     1470   1520   2200
  Racket              2030       560     2230   1330   4100
All benchmarks were run in safe mode and without Gustavo’s
work-in-progress addition to the Chez Scheme compiler that can
eliminate some runtime checks and related dead code.
Performance: Startup
Startup time for Racket-on-Chez is determined by Chez Scheme’s base
startup time (to load boot files) plus time to load the
expander and other parts of the Racket-on-Chez
stack. The load times shown below for racket/base and
racket include the startup time and add the additional
time needed to load the many compiled files that are parts of those
libraries.
For Racket and racket7, bytecode parsing can be delayed at the
granularity of a procedure, so that a procedure’s bytecode isn’t
parsed until the procedure is called. Delayed parsing is enabled by
default, and the -d flag disables it.
Startup/load time

  msecs                startup   -l racket/base   -l racket
  Racket                    16               50         220
  Racket -d                 16               80         500
  racket7                   50              110         340
  racket7 -d                50              190         650
  Chez Scheme                –              170           –
  Racket-on-Chez           440              550        1200
  Racket-on-Chez/JIT       440              550        1600
Compared to the current Racket version, starting up and loading takes
longer in racket7, because racket7 has to load embedded
bytecode for the expander’s implementation, and because the bytecode
encoding of linklets has not been refined as much.
The Chez Scheme row in the table doesn’t actually load
racket/base, but that’s the closest reasonable column.
That row corresponds to just starting Chez Scheme with its
compiler, so it loads a significantly smaller language than
racket/base, yet it takes longer than the Racket
startup rows. It’s also a lower bound on the Racket-on-Chez
startup time.
Loading Racket-on-Chez takes significantly longer, still, due to
larger and slower-to-load machine code for both the Racket-on-Chez
stack and libraries compiled with it. Although individual linklets
within a module (roughly, individual phases) are parsed on demand,
there is no delayed parsing at the granularity of procedures, so the
work performed by loading is closer to the Racket -d and
racket7 -d rows.
The last row above, Racket-on-Chez/JIT, refers to a variant of
Racket-on-Chez that does not compile whole linklets. Instead, it keeps
individual lambdas in source form until called, and it compiles
each called lambda on demand. Load times go up, mostly because
code is compiled as needed, but also because the source-fragment
representation is even larger than machine code right now. The
Racket-on-Chez/JIT approach does not look the most promising, so far,
but it’s also not crazy. With more effort and tuning (especially in
the representation of source fragments), it may be a viable approach
and more in line with the way the current Racket implementation works.
Clearly, load time is a direction for further effort.
Performance: Expander and Compiler
The expander runs at about the same speed in
Racket-on-Chez and racket7, but the linklet-compilation step
takes much longer in Racket-on-Chez. As a result, compilation time for
Racket-on-Chez dominates expansion proper for a task like racket
-cl racket/base.
Reported times in this section use a Racket-on-Chez stack
that is compiled in unsafe mode and run on a 2013 MacBook Pro 2.6GHz
Core i5. Unsafe mode improves expand and read performance by about
10%.
Steps within racket -cl racket/base

  msecs                expand   schemify   compile   eval   read
  racket7                2511          –       662    235    275
  Racket-on-Chez         2398        917      2852    390    227
  Racket-on-Chez/JIT     2441        974      1044    515    229
For Racket-on-Chez/JIT, compile times go down, showing that about 1/3
of the code that is compiled during expansion is also run during
expansion. The fraction would be lower for most programs; the
racket/base library is unusual in building up layers
of macros to implement itself. The schemify step is certainly a
target for improvement, since its job is much simpler than the
compile step’s job.
That factor-of-2 slowdown relative to racket7 is
compounded by the Racket-implemented expander
being twice as slow as the current Racket’s C-implemented
expander:
Loading from source with racket -cl

  msecs                racket/base   racket
  Racket                      1830    21700
  racket7                     3490    38140
  Racket-on-Chez              7060    70093
  Racket-on-Chez/JIT          5400    55050
The gap between the Racket-implemented expander
and the C-implemented expander is the one that we
most hoped to close by moving to Chez Scheme. As part of the
Racket-on-Chez stack, the expander is already
compiled in a reasonable way (i.e., no cross-module optimization
issues), so it’s not clear how to improve. Still, I think the
expander can be made to run faster in Chez
Scheme, and we’ll keep trying.
Performance: Distribution Build
The current version of Racket-on-Chez builds on a 64-bit machine with
a peak memory use around 1.25 GB, which is on a similar scale to the
current Racket release’s peak memory use around 1 GB.
The following plots show, on the same scale, memory use over time when
building a distribution. Each plot has two black lines (easiest to
distinguish in the last plot): the top black line describes peak
memory use, since it’s before a major GC, while the bottom black line
is closer to live-memory use, since it’s after a major GC. The orange
region is for compiling libraries (each vertical line starts a
collection), the blue region is for running documentation, the green
region is for rendering documentation, and the final pink region is
for re-rendering documentation as needed to reach a fixed point for
cross references. Memory use is expected to grow modestly during the
blue region, since the builds collect cross-reference information
for use in the green region. Memory use should grow minimally in the
orange and green regions (as a side-effect of caches).
These graphs report builds on a Linux 3.4GHz Core i7-2600
machine with the Racket parts of the Racket-on-Chez stack compiled in
safe mode.
[plot: Racket (C-implemented expander) — memory use over build time]

[plot: racket7 (Racket-implemented expander) — memory use over build time]

[plot: Racket-on-Chez (same expander as racket7) — memory use over build time]
The plots illustrate that, while memory use is similar, build times
are much longer. Expander and compiler
performance largely explains the difference.
The build time certainly can be improved some. Maybe build times can
be improved significantly, or maybe slower build times will seem
worthwhile for a more maintainable implementation of Racket and/or
for faster generated code.
Outlook
It would be nice if porting to Chez Scheme made every aspect of Racket
magically faster. It hasn’t done that, but we have plenty of room for
improvement; the performance results to date are a lower bound on
performance, not an upper bound.
Keep in mind that the original goal is not to have a faster Racket,
but a better-implemented Racket with acceptable performance. The
Racket-on-Chez implementation is far more maintainable and flexible
than the current Racket implementation, so maybe we’re half-way there
after one year of work.
You can try Racket-on-Chez from
https://github.com/racket/racket7/. See
racket/src/cs/README.txt
in that repo for more information.
If you’re interested in even more implementation details and plans, see
the extended version of this report.