29 Jan 2019

Racket-on-Chez Status: January 2019

posted by Matthew Flatt

For background information about Racket on Chez Scheme (a.k.a. Racket CS), see the original announcement, last January’s report, and the report at Scheme Workshop 2018.

Racket on Chez Scheme is done in a useful sense. All functionality is in place, DrRacket CS works fully, the main Racket CS distribution can build itself, and 99.95% of the core Racket test suite passes.

You can download a build for Windows, Linux, or Mac OS from the Utah snapshot site (look for “Racket CS”):

https://www.cs.utah.edu/plt/snapshots/

While code generally runs as fast as it should, end-to-end performance is not yet good enough to make Racket CS the default implementation of Racket. We’ll let the implementation settle and gradually improve, with the expectation that it will eventually be good enough to switch over— and better in the long run.

Compatibility with the Current Racket Implementation

Racket CS is intended to behave the same as the existing Racket implementation with a few exceptions:

There are still a few internal gaps related to handling large numbers of file descriptors (needed by servers with lots of connections, for example) and support for file-change events. Those should be easy to fill in, but our focus right now is on flushing out the remaining bugs that are exposed by test suites.

Another kind of incompatibility is that the compiled form of Racket code with the current implementation is platform-independent bytecode, while Racket CS’s compiled form is platform-specific machine code. This difference can sometimes affect a development workflow, and it required adjustments to the distribution-build process. Racket CS does not yet support cross compilation.

Here’s an incomplete list of things that are compatible between the current Racket implementation and Racket CS and that required some specific effort:

Macros, modules, threads, futures, places, custodians, events, ports, networking, string encodings, paths, regular expressions, mutable and immutable hash tables, structure properties and applicable structures, procedure arities and names, chaperones and impersonators, delimited continuations, continuation marks, parameters, exceptions, logging, security guards, inspectors, plumbers, reachability-based memory accounting, ephemerons, ordered and unordered finalization, foreign-function interface (mostly), phantom byte strings, source locations, left-to-right evaluation, result-arity checking, left-associative arithmetic, eqv? on NaNs, and eq? and flonums.
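
As one small illustration of the kind of semantic detail in that list, Racket guarantees left-to-right evaluation of application arguments, so the following toy snippet (made up for illustration, not from the test suites) always displays 12, never 21, on both implementations:

    (define (pair-up a b) (list a b))
    (pair-up (begin (display 1) 'left)
             (begin (display 2) 'right))   ; displays 12, returns '(left right)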

Outlook

The rest of this report will provide lots of numbers, but none of them expose the main benefit of Racket CS over the current Racket implementation: it’s more flexible and maintainable.

Putting a number on maintainability is harder than measuring benchmark performance. Anecdotally, as the person who has worked on both systems, I can report that it’s no contest. The current Racket implementation is fundamentally put together in the wrong way (except for the macro expander), while Racket CS is fundamentally put together in the right way. Time and again, correcting a Racket CS bug or adding a feature has turned out to be easier than expected.

To maximize the maintenance benefits of Racket CS, it’s better to make it the default Racket variant sooner rather than later and, ideally, to discard the current Racket implementation eventually. But while Racket CS is highly compatible with the current Racket implementation, it will never be 100% compatible. From here, it’s some combination of patching differences and migrating away from irreconcilable differences, and that will take a little time. Given that both implementations need to exist for a while anyway, we can give some weight to end-to-end performance when deciding on the right point to switch.

Many plots in this report are intended to tease out reasons for the performance difference between Racket CS and current Racket. From the explorations so far, it does not appear that the performance difference is an inevitable trade-off from putting Racket together in a better way. Part of the problem is that some new code on top of Chez Scheme needs to be refined. Perhaps more significantly, there are some trade-offs in the space of compilation timing (ahead-of-time or just-in-time) and code representation (machine code versus bytecode) that we can adjust with more work.

Although the current Racket implementation and Racket CS will both exist for a while, we do not expect the dueling implementations to create problems for the Racket community. The question of which to use will be more analogous to “which browser works best for your application?” than “does this library need Python 2 or Python 3?”

Meanwhile, there’s even more code to maintain, and accommodating multiple Racket variants creates some extra complexity by itself (e.g., in the distribution builds). It still looks like a good deal in the long run.

Performance of Compiled Code

The plots below show timings for Chez Scheme (purple), Racket CS (blue), and current Racket (red) on traditional Scheme benchmarks. Shorter is better. The results are sorted by Chez Scheme’s time over Racket’s time, except that benchmarks that rely on mutable pairs are in a second group with green labels.

  • Note that the break-even point between Chez Scheme and Racket is toward the end of the set of benchmarks with black labels, which reflects that Chez Scheme is usually faster than current Racket.

  • The main result is that the blue bar tracks the purple bar fairly well for the benchmarks without mutable pairs: Racket CS’s layers on top of Chez Scheme are not interfering too much with Chez Scheme’s base performance, even though Racket CS wraps and constrains Chez Scheme in various ways (e.g., enforcing left-to-right evaluation of application arguments).

  • For the benchmarks that use mutable pairs (green labels), Racket CS loses some of Chez Scheme’s performance by redirecting mutable-pair operations away from the built-in pair datatype, since built-in pairs are used only for immutable pairs in Racket CS (see the short example after this list).

  • For the tak variants, where the blue bar is shortest, the difference may be due to an extra layer of function inlining. The collatz test is effectively a test of exact-rational arithmetic on large fractions.
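
For reference, mutable pairs in Racket are a datatype separate from ordinary (immutable) pairs, which is why those benchmarks exercise a different code path on Racket CS. A small example (not taken from the benchmarks):

    (define p (mcons 1 2))   ; mutable pair, from racket/base
    (set-mcar! p 10)
    (mcar p)                 ; => 10
    (pair? p)                ; => #f: not an ordinary, immutable pair
    (mpair? p)               ; => #t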

The next set of plots compare Racket CS and current Racket on the Racket implementations of benchmarks that were written over the years for The Computer Language Benchmarks Game. These rely more heavily on Racket-specific language features. Racket CS’s slowness toward the end of the list is often due to the I/O implementation, which is newly implemented for Racket CS and will take time to refine.

Aside from the fact that I/O needs work in Racket CS, the takeaway here is that there are neither huge problems nor huge performance benefits with the Racket CS implementation. Longer term, the red lines probably aren’t going to move, but because so much new code is involved with the blue lines, there’s reason to think that some blue lines can get shorter.

About the measurements: These benchmarks are in the "racket-benchmark" package in the "common" and "shootout" directories. We used commit f6b6f03401 of the Racket fork of Chez Scheme and commit c9e3788d42 of Racket. The Chez Scheme fork includes Gustavo Massaccesi’s “cptypes” pass, which improves Chez Scheme’s performance on a few benchmarks. The test machine was a Core i7-2600 3.4GHz running 64-bit Linux.

Startup and Load Times

Startup and load time have improved since previous reports, but Racket CS remains slower.

Startup for just the runtime system without any libraries (still on a Core i7-2600 3.4GHz running 64-bit Linux):

The difference here is that the Racket CS startup image has much more Scheme and Racket code that is dynamically loaded and linked, instead of loaded as a read-only code segment like the compiled C code that dominates the current Racket implementation. We can illustrate that effect by building the current Racket implementation in a mode where its Racket-implemented macro expander is compiled to C code instead of bytecode, too, shown below as “R/cify.” We can also compare to Racket v6, which had an expander that was written directly in C:

The gap widens if we load compiled Racket code. Loading the racket/base library:

The additional difference here is that Racket CS’s machine code is bigger than current Racket’s bytecode representation. Furthermore, the current Racket implementation is lazy about parsing some bytecode. We can tease out the latter effect by disabling lazy bytecode loading with the -d flag, shown as “R/all”:

(We could also force bytecode to be JITted immediately— but JITting is more work than just loading, so that timing result would not be useful.)

We get a similar shape and a larger benefit from lazy loading with the racket library, which is what the racket executable loads by default for interactive mode:

About the measurements: These results were gathered by using time in a shell a few times and taking the median. The command was as shown, but using racketcs for the “R/CS” lines and racket -d for the “R/all” lines.

Memory Use

Like load times, differences in memory use between Racket CS and current Racket can be attributed to code-size differences between bytecode and machine code and to lazy bytecode loading.

The following plots show memory use, including both code and data, after loading racket/base or racket, but subtracting memory use at the end of a run that loads no libraries (which reduces noise from different ways of counting code in the initial heap). The “R/jit!” line uses -d to load all bytecode eagerly, and it further forces that bytecode to be compiled to native code by the JIT compiler:

These results show that bytecode is more compact than machine code, as expected. Lazy parsing of bytecode also makes a substantial difference in memory use for the current Racket implementation. Racket’s current machine code takes a similar amount of space as Chez Scheme machine code, but the JIT overhead and other factors make it even larger. (Bytecode is not retained after conversion to machine code by the JIT.)

On a different scale and measuring peak memory use instead of final memory use for DrRacket start up and exit:

This result reflects that DrRacket’s memory use is mostly the code that implements DrRacket, at least if you just start DrRacket and immediately exit.

The gap narrows if you open an earlier version of this document’s source and run it three times before exiting, so that memory use involves more than mostly DrRacket’s own code:

About the measurements: These results were gathered by running racket or racketcs starting with the arguments -l racket/base, -l racket, or -l drracket. The command further included -W "debug@GC" -e (collect-garbage) -e (collect-garbage) and recording the logged memory use before that second collection. For the “R” line, the reported memory use includes the first number that is printed by logging in square brackets, which is the memory occupied by code outside of the garbage collector’s directly managed space. For “R/all,” the -d flag is used in addition, and for “R/jit!,” the PLT_EAGER_JIT environment variable was set in addition to supplying -d.
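
As a rough in-process analogue of that setup (only a sketch: current-memory-use reports the allocator’s view rather than the GC log’s numbers, so it will not match the plots exactly), forcing two major collections and then reading memory use looks like this:

    (collect-garbage)
    (collect-garbage)      ; second major collection, as in the setup above
    (current-memory-use)   ; bytes currently allocated, after the collections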

Expand and Compile Times

Compile times have improved some for Racket CS since the original report, but not dramatically. These plots compare compile times from source for the racket/base module (and all of its dependencies) and the racket module (and dependencies):

Compilation requires first macro-expanding source, and that’s a significant part of the time for loading from source. Racket CS and current Racket use the same expander implementation, and they expand at practically the same speed, so the extra time in Racket CS can be attributed to machine-code compilation. The following plots show how parts of the compile time can be attributed to specific subtasks:

Another way to look at compile times is to start with modules that are already expanded by the macro expander and just compile them. The -M flag alone does not do that, but it’s meant here to represent an installation that was constructed by using the -M flag for all build steps:

The difference in these compile times reflects how Chez Scheme puts much more effort into compilation. Of course, the benefit is the improved run times that you see in so many benchmarks.
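
To make the expansion/compilation split concrete, both steps are available reflectively in Racket; this toy sketch (not how the measurements above were made, which compile whole modules) expands a form and then compiles the already-expanded result:

    #lang racket/base
    (displayln
     (parameterize ([current-namespace (make-base-namespace)])
       (let* ([expanded (expand '(lambda (x) (+ x 1)))]  ; macro expansion only
              [compiled (compile expanded)])             ; compilation only
         ((eval compiled) 41))))                         ; prints 42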

The compile-only bars are also significantly shorter than taking the expansion-plus-compilation bars and removing only the gray part. That’s because the gray part only covers time spent specifically in the macro expander or running macro transformers, but it does not cover the time to compile macro definitions as they are discovered during expansion or to instantiate modules for compile-time use.

Given that the Chez Scheme compiler is so much slower (for good reason) than the current Racket compiler, we might ask how it compares to other, non-Racket compilers. Fortunately, we can make a relatively direct comparison between C and Racket, because the Racket macro expander was formerly written in C, and now it is written in Racket with essentially the same algorithms and architecture (only nicer). The implementations are not so different in lines of code: 45 KLoC in C versus 28.5 KLoC in Racket. The following plot shows compile times for the expander’s implementation:

To further check that we’re comparing similar compilation tasks, we can check the size of the generated machine code. Toward that end, we can compile the Racket code to C code through a cify compiler, which is how the expander is compiled for the current Racket implementation for platforms that are not supported by Racket’s JIT. Below is a summary of machine-code sizes for the various compiled forms of the expander.

The current Racket implementation generates much more code from the same implementation, in part because it inlines functions aggressively and relies on the fact that only called code is normally translated to machine code; the “R/jit!/no” bar shows the code size when inlining is disabled. In any case, while the machine-code sizes vary quite a bit in this test, they’re all on the same general scale.

In summary, because Racket is an extensible language, the question of compile times is more complicated than for a conventional programming language. At the core-compiler level, current Racket manages to be very fast as a compiler by not trying hard. Racket CS, which gets its compile times directly from Chez Scheme, spends more time compiling, but it still has respectable compile times.

About the measurements: The numbers in compile-time plots come from running the shown command (but with racketcs instead of racket for the “R/CS” lines) with the PLT_EXPANDER_TIMES and PLT_LINKLET_TIMES environment variables set. The overall time is as reported by time for user plus system time, and the divisions are extracted from the logging that is enabled by the environment variables.

For measuring compile times on the expander itself, the Chez Scheme measurement is based on the build step that generates "expander.so", the current-Racket measurement is based on the build step that generates "cstartup.inc", and the C measurement is based on subtracting the time to rebuild Racket version 6.12 versus version 7.2.0.3 when the ".o" files in "build/racket/gc2" are deleted.

For measuring machine-code size, the expander’s code size for Chez Scheme was computed by comparing the output of object-counts after loading all expander prerequisites to the result after the expander; to reduce the code that is just from the library wrapper, the expander was compiled as a program instead of as a library. The code size for Racket was determined by setting PLT_EAGER_JIT and PLT_LINKLET_TIMES and running racket -d -n, which causes the expander implementation to be JITted and the total bytes of code generated by the JIT to be reported. The “R/no-inline” variant was the same, but compiling the expander to bytecode with compile-context-preservation-enabled set to #f, which disables inlining. The “R/cify” code size was computed by taking the difference of the sizes of the Racket shared library for a normal build and one with --enable-cify, after stripping the binaries with strip -S, then further subtracting the size of the expander’s bytecode as it is embedded in the normal build’s shared library. The “C” code size was similarly computed by subtracting the size of the Racket shared library for version 7.2.0.3 from the size for the 6.12 release, stripped and with the expander bytecode size subtracted.

Build Time

Since Racket programs rely heavily on metaprogramming facilities, either directly or just by virtue of being Racket programs, the time required to build a Racket program depends on a combination of compile time, run time, and load time. Few Racket programmers may care exactly how long it takes to build the Racket distribution itself, but distribution-build performance is probably indicative of how end-to-end performance will feel to a programmer using Racket.

The following plots are all on the same scale, and they show memory use plotted against time for building the Racket distribution from source. The first two plots are essentially the same as in the January 2018 report. While the graph stretches out horizontally for Racket CS, showing a build that takes about three times as long, it has very much the same shape for memory use.

Racket CS

current Racket

One might assume that the difference in compile time explains the slower Racket CS build. However, this assumption does not hold up if we completely isolate the step of compiling fully expanded modules. To set up that comparison, the following plots show build activity when using current Racket and making “compile” just mean “expand.” It happens to take about the same time as a Racket CS build, but with more of the time in the documentation phases:

current Racket -M

Although current Racket compiles from expanded source relatively quickly, a build requires loading the same modules over and over to compile different sets of libraries and to run different documentation examples. The documentation running and rendering phases, shown as the blue and green regions, are especially slow and use especially much memory, because documentation often uses sandboxes that load libraries to run and render examples. (The big jump at the same point in the blue and green regions merits further investigation. It might be a sandbox bug or a leaky unsafe library.)

Given the result of the expand-only build as an input, we can switch in-place to either Racket CS or normal-mode current Racket and compile each fully expanded module to machine code:

Racket CS finish

current Racket finish

Each module is compiled from expanded form just once, and that compiled form can be used as needed (for cross-module optimization) to compile other modules. Also, documentation doesn’t get re-run and re-rendered in this finishing build, because the build process can tell that the sources did not change. Overall, compilation finishes in under 20 minutes for Racket CS, which is a reasonable amount of time for 1.2 million lines of source Racket code.

These build-finishing plots illustrate how the Racket distribution build server can generate bundles for multiple platforms and variants in hours instead of days. The build server first creates expanded-module builds of the packages and main collections, and it serves those to machine-specific finishing builds.

About the measurements: These plots were generated using the "plt-build-plot" package, which drives a build from source and plots the results. The -M build was created by setting the PLT_COMPILE_ANY environment variable, and then the finishing builds were measured by another run on the result but using the --skip-clean flag for "plt-build-plot".

Implementation Outlook

Based on the data that we’ve collected so far, I see three directions toward improving end-to-end performance for Racket CS:

  • Improvements to the new implementation of Racket’s I/O API.

  • Better support in Chez Scheme to trade performance for faster compilation, combined at the Racket CS level with a bytecode-and-JIT setup that supports lazy decoding of bytecode.

    The January 2018 report mentions an experimental JIT mode for Racket CS, and that alternative remains in place. At the moment, it’s not a good alternative to Racket CS’s default mode, but it still may be a step in the right direction, especially considering that it allows JIT-style compilation and ahead-of-time compilation to coexist.

  • Algorithmic improvements to the way macros and modules work.

    That the full expansion stack takes 10 times as long as core compilation for Racket libraries suggests that there is room for algorithmic improvements that would help both the current Racket implementation and Racket CS.

There are bound to be additional performance factors that we haven’t yet isolated. Whether it turns out to be the factors that we know or others, working in the new implementation of Racket will make it easier to explore the solutions.

more →

26 Oct 2018

Racket v7.1

posted by Vincent St-Amour

Racket version 7.1 is now available from http://racket-lang.org/

  • Although it is still not part of this release, the development of Racket on Chez Scheme continues. We still hope and expect that Racket-on-Chez will be ready for production use later in the v7.x series, perhaps mid-2019.

  • Trackpad scrolling works more reliably in some Windows and Linux/Unix environments.

  • For new users, DrRacket opens files in new tabs by default.

  • The teaching languages support the unicode character for lambda. The teaching unit test framework no longer stops testing when a tested expression signals an error.

  • A refinement to error reporting for compile-time code helps clarify when a syntax error is likely due to an earlier unbound identifier (because the unbound-identifier error otherwise must be delayed, in case a definition appears later).

  • A ++lang <lang> flag for raco exe simplifies the creation of executables that dynamically load #lang <lang> modules at run time.

  • Typed Racket adds types for mutable and immutable vectors: (Mutable-Vectorof T), (Immutable-Vectorof T), (Immutable-Vector T), and (Mutable-Vector T). The new types are subtypes of the existing Vectorof and Vector types. The return types of a few standard vector functions use the new, more specific, types. When an immutable vector flows from untyped code to typed code, Typed Racket may be able to check the vector with a flat contract.

  • The hashing functions sha1-bytes, sha224-bytes, and sha256-bytes are added to racket/base.

  • curry from racket/function supports currying functions with keyword arguments, and procedure-arity and procedure-keywords return the correct result when applied to curried functions.

  • Slideshow supports widescreen mode (finally!). Implement widescreen slides using slideshow/widescreen or provide the --widescreen command-line flag to Slideshow. Combine --widescreen with --save-aspect to make widescreen mode the default in your installation.

  • Racket supports FreeBSD/aarch64.

  • Various improvements and additions were made to the DeinProgramm teaching languages and their documentation.

The following people contributed to this release: Akihide Nano, Alex Harsányi, Alex Knauth, Alexander McLin, Alexis King, Andrew Kent, Ben Greenman, Bruno Cuconato, Chongkai Zhu, Claes Wallin, David Benoit, Gary F. Baumgartner, Gustavo Massaccesi, Jay McCarthy, Jens Axel Søgaard, Jérôme Martin, John Clements, Jordan Johnson, Kimball Germane, Leif Andersen, Matthew Butterick, Matthew Flatt, Matthias Felleisen, Mike Sperber, Milo Turner, myfreeweb, Oling Cat, Paulo Matos, Philip McGrath, Robby Findler, Roman Klochkov, Ryan Culpepper, Sam Caldwell, Sam Tobin-Hochstadt, Shu-Hung You, Stephen Chang, Tong-Kiat Tan, Vincent St-Amour, Winston Weinert, and yjqww6.

Feedback Welcome

more →

27 Jul 2018

Racket v7.0

posted by Vincent St-Amour

Racket version 7.0 is now available from http://racket-lang.org/

Racket version 7.0 includes substantial internal changes toward the long-term goal of replacing Racket’s current runtime system and supporting multiple runtime systems. We do not expect Racket users to see a big difference between Racket v6.12 and Racket v7.0, but since the internals differ significantly, a major-version bump helps track the change.

Version 7.0 replaces about 1/8 of the core v6.12 implementation with a new macro expander that bootstraps itself. The expander turns out to be about 40% of the new code needed to replace Racket’s core with Chez Scheme. Most of the other 60% is also implemented, but it is not included in this release; we hope and expect that Racket-on-Chez will be ready for production use later in the v7.x series.

  • The syntax (#') form supports new template subforms: ~@ for splicing and ~? for choosing between subtemplates based on whether pattern variables have an “absent” value (from an ~optional pattern in syntax-parse, for example). The syntax/parse/experimental/template library, where these features originated, re-exports the new forms under old names for compatibility. (A small example appears after this list of changes.)

  • On Windows, an --embed-dlls flag for raco exe creates a truly standalone, single-file ".exe" that embeds Racket’s DLLs.

  • DrRacket’s “Create Executable” option for the teaching language (Beginner Student, etc.) uses --embed-dlls to create single-file, standalone ".exe"s on Windows.

  • Typed Racket’s support for prefab structs is significantly improved. This supports using prefab structs more polymorphically, and fixes significant bugs in the current implementation. Programs which currently use predicates for prefab structs on unknown data may need to be revised, since previous versions of Typed Racket allowed potentially buggy programs to type check. See Typed Racket RFC 1 and this blog post for more details on this change and on how to fix programs affected by it.

  • Typed Racket supports #:rest-star in the ->* type constructor, which allows function types to specify rest arguments with more complex patterns of types, such as the hash function.

  • Interactive overlays can be added to plots produced by plot-snip. This allows constructing interactive plots or displaying additional information when the mouse hovers over the plot area. Examples of how to use this feature can be found here

  • racket/plot provides procedures for displaying candlestick charts for use in financial time series analysis.

  • Added contract-equivalent?, a way to check whether two contracts are mutually stronger than each other without the exponential slowdown that two calls to contract-stronger? bring.

  • Lazy Racket supports functions with keyword arguments.
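
As a small, hedged sketch of the new template subforms (the macro define-with-default and its pattern variables below are made up for illustration), ~? in an ordinary #' template picks a fallback when an ~optional pattern variable is absent:

    #lang racket/base
    (require (for-syntax racket/base syntax/parse))

    (define-syntax (define-with-default stx)
      (syntax-parse stx
        [(_ name:id (~optional init:expr))
         ;; ~? uses `init` when it was matched, 'unset when it is absent
         #'(define name (~? init 'unset))]))

    (define-with-default x 5)   ; x is 5
    (define-with-default y)     ; y is 'unset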

The following people contributed to this release:

Adam Davis Lee, Alex Harsányi, Alex Knauth, Alexander McLin, Alexander Shopov, Alexis King, Andrew M. Kent, Asumu Takikawa, Ben Greenman, Caner Derici, Daniel Feltey, David Benoit, David Kempe, Don March, Eric Dobson, evdubs, Foo Chuan Wei, Georges Dupéron, Gustavo Massaccesi, Hashim Muqtadir, Jakub Jirutka, James Bornholt, Jasper Pilgrim, Jay McCarthy, Jens Axel Søgaard, John Clements, Juan Francisco Cantero Hurtado, Kashav Madan, Kieron Hardy, Leandro Facchinetti, Leif Andersen, Luke Lau, Matthew Butterick, Matthew Flatt, Matthias Felleisen, Michael Ballantyne, Michael Burge, Michael Myers, Mike Sperber, Milo Turner, NoCheroot, Oling Cat, Paulo Matos, Philip McGrath, Philippe Meunier, Robby Findler, Ryan Culpepper, Sam Tobin-Hochstadt, Sarah Spall, Shu-Hung You, Sorawee Porncharoenwase, Spencer Florence, Stephen Chang, Tony Garnock-Jones, Tucker DiNapoli, UM4NO, Vincent St-Amour, and William J. Bowman.

Feedback Welcome

more →

26 Jan 2018

Racket v6.12

posted by Vincent St-Amour

Racket version 6.12 is now available from http://racket-lang.org/

  • Changed the way some unsafe operations are exposed via ffi/unsafe libraries to help smooth a future transition to a new runtime system.

  • The syntax-parse form supports unwinding side-effects when it backtracks, both explicitly with ~undo patterns and implicitly with the built-in managed state (using syntax-parse-state-ref, etc).

  • The db library supports SCRAM-SHA-256 authentication for backends running PostgreSQL 10 or later. Client support for SCRAM and other SASL mechanisms is provided by the new sasl library.

  • The lazy-require-syntax form supports lazy loading of macro transformers. Note that the macros must obey certain implementation constraints (see the lazy-require-syntax documentation).

  • Typed Racket no longer enforces types like (U String (Boxof String)) with the any/c contract. This fixes a type soundness issue, but may affect performance. Please submit a bug report if you find a program that runs significantly slower on v6.12 than earlier versions.

  • Typed Racket’s type instantiation (inst) uses Any for omitted type arguments, allowing APIs to add additional type variables to functions without breaking existing programs.

  • for/fold users can customize the final result of a loop’s computation using the #:result keyword (see the example after this list).

  • The --deps option to raco test tests the packages the argument packages depends on, in addition to testing the packages themselves. For example, raco test -p --deps pkg1 pkg2 tests all files from pkg1, pkg2, and all of their dependencies.
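
A small sketch of the #:result keyword (made up for illustration, not from the release notes): split numbers into evens and odds, then post-process the two accumulators into the final answer:

    (for/fold ([evens '()]
               [odds '()]
               #:result (list (reverse evens) (reverse odds)))
              ([n (in-range 6)])
      (if (even? n)
          (values (cons n evens) odds)
          (values evens (cons n odds))))
    ;; => '((0 2 4) (1 3 5))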

The following people contributed to this release: Alexander Shopov, Alexis King, Andrew Gwozdziewycz, Andrew Kent, Ben Greenman, Chung-chieh Shan, Conor Finegan, Daniel Feltey, Daniel Mendler, Eric Dobson, Gabriel Ebner, Greg Cooper, Greg Hendershott, Gustavo Massaccesi, Huma Zafar, Jack Firth, James Bornholt, Jay McCarthy, John Clements, Kimball Germane, Leif Andersen, Matias Eyzaguirre, Matthew Butterick, Matthew Flatt, Matthias Felleisen, Michael Ballantyne, Mike Sperber, Milo Turner, Robby Findler, Rommel Martinez, Ryan Culpepper, Sam Tobin-Hochstadt, Shu-Hung You, Soulaïmane Sahmi, Spencer Florence, Stephen De Gabrielle, Vincent St-Amour, Wesley Kerfoot, and William G Hatch.

Feedback Welcome

more →

05 Jan 2018

Racket-on-Chez Status: January 2018

posted by Matthew Flatt with thanks to collaborators Kent Dybvig, Andy Keep, Gustavo Massaccesi, Sarah Spall, Sam Tobin-Hochstadt, and Jon Zeppieri

It’s been almost a year since we decided to try running Racket on Chez Scheme, and it has been almost six months since my last status report. As of RacketCon 2017, DrRacket could start up on Chez Scheme (but just barely, and just in time for the demo).

The current implementation is now considerably more complete. DrRacket still doesn’t run well, but Racket-on-Chez can build a full Racket distribution in reasonable time and space.

With the current Racket-on-Chez compilation strategy, runtime performance is plausible on traditional benchmarks, but cross-module optimization is needed to bring the results fully in line with our expectations. Startup and load times are much slower than we would like. Compile/build times are a factor of 4 slower than the current version of Racket.

Glossary of implementations:
  • Racket: the current Racket release, version 6.x

  • racket7: mostly the same as Racket, but using a new, Racket-implemented macro expander; the source is in the https://github.com/racket/racket7 repo

  • Chez Scheme: a completely different implementation of Scheme that we’re trying to use as a replacement for part of Racket

  • Racket-on-Chez: Chez Scheme plus additional layers to make it implement the same language as Racket; the source is also in the https://github.com/racket/racket7 repo

Approach

Our overall goal is to improve Racket’s implementation and make it more portable to different runtime systems. To a first approximation, improving Racket and making it more portable means “less C code.” Building on Chez Scheme is a promising means toward that end.

The picture on the right illustrates the current Racket distribution. The parts in blue are implemented in Racket, while the part in red is implemented in C. The green block is Scribble documentation source.

The numbers next to each block are rough lines-of-code counts, and the blocks are scaled to those counts. The collects block represents the content of the main "collects" directory; along with the core block, it’s roughly the code in a Minimal Racket distribution. The pkgs block is the part of the main distribution that is available in separate packages but bundled with the main distribution. The docs block is spread across the same packages as the pkgs code.

Porting Racket to Chez Scheme is all about the core part. The goal is to make no changes to existing Racket and Scribble code, although we’ll inevitably have to make small changes to support multiple Racket implementations.

Let’s zoom into the core part. (From now on, all the pictures on the right are at this scale.) The picture breaks the C code in Racket’s current implementation into several major subsystems. In reality, the implementation is not organized into such tidy layers, but the picture is conceptually close to the truth.

The large builtins block implements primitive data structures and their operations, such as numbers (fixnums, bignums, flonums, exact rationals, complex numbers), strings, characters, byte strings, lists, vectors, and hash tables. Most other blocks are self-explanatory.

The expander block implements the macro expander, reader, and module system. More than any other part of the current stack, the expander part is better implemented in Racket.

That was exactly the work of 2016: building a new expander to replace the C-implemented expander. You can use that racket7 variant of Racket now by building from the https://github.com/racket/racket7 repo.

This new expander hasn’t made it into the release, because a few interactions haven’t been sorted out: the macro stepper, the demodularizer, and to some degree the decompiler. The expander also runs at about half the speed of the C-implemented expander, and that performance difference was part of the motivation to investigate a new runtime system.

Although the compiler+JIT block is drawn the same as before in this picture, it’s a slightly different compiler, because it doesn’t know about modules or syntax objects. Instead, the compiler works in terms of linklets, which are lambda-like blocks of code that import and export variables. That separation of expansion/modules from compilation helps enable the transition to a different compiler and runtime system.
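
A rough sketch of linklets through the reflective API (assuming the racket/linklet library, and assuming that a hand-written linklet body may use core forms and primitives such as + directly; the particular linklets here are made up for illustration, not the compiler’s internal workflow):

    #lang racket/base
    (require racket/linklet)

    ;; One linklet exports `answer`; a second imports it and exports `doubled`.
    (define lib
      (instantiate-linklet
       (compile-linklet '(linklet () (answer)
                           (define-values (answer) 42)))
       '()))

    (define use
      (compile-linklet '(linklet ((answer)) (doubled)
                          (define-values (doubled) (+ answer answer)))))

    (instance-variable-value (instantiate-linklet use (list lib)) 'doubled)  ; => 84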

And that brings us to Chez Scheme. The most obvious feature in the picture on the right is that Chez Scheme’s C-implemented kernel is a fraction of the implementation, while most of Chez Scheme is more sensibly written in Scheme.

Chez Scheme’s expander is considerably simpler than Racket’s. Although Chez Scheme’s expander is hygienic and hugely influential on Racket’s expander, Racket’s macro and module system does a lot more.

Chez Scheme’s compiler, meanwhile, does a lot more than Racket’s compiler+JIT. The builtins functionality in Chez Scheme is similar to Racket’s builtins.

Chez Scheme’s GC (at the bottom) is tiny and elegant. It lacks some functionality of Racket’s GC, but mostly it lacks Racket’s historical complexity for trying to cooperate directly with C.

That brings us, finally, to the Racket-on-Chez picture. It starts with the Chez Scheme stack and adds a rumble block, which extends Chez Scheme to match the functionality of current Racket’s GC through control+structs blocks.

Mostly, rumble is about immutable hash tables, delimited continuations, applicable structures, structure-type properties, and impersonators/chaperones. We’ve given this layer of functionality the name Rumble, because it’s useful to have a name for the language that hosts other layers that are built using Racket.
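
Applicable structures are a good example of Racket behavior that the rumble layer has to provide on top of Chez Scheme; a minimal sketch of the feature itself (the adder struct is made up for illustration):

    #lang racket/base
    ;; A structure type with prop:procedure can be applied like a function,
    ;; so call sites must accommodate non-closure "procedures".
    (struct adder (n)
      #:property prop:procedure
      (lambda (self x) (+ (adder-n self) x)))

    (define add10 (adder 10))
    (procedure? add10)   ; => #t
    (add10 5)            ; => 15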

The threads, io, and regexp blocks are entirely new implementations of the corresponding blocks in the current Racket implementation. The rktio block is used by io, and it’s the same as in current Racket: a portability layer over OS facilities that is similar to (but more extensive than) code that is in Chez Scheme’s kernel.

The expander block (the same as for racket7) sits on a small schemify adaptor. The schemify layer converts Racket linklets into plain Scheme functions, adjusting expressions as needed to support applicable structures, to enforce left-to-right evaluation, and to implement Racket’s allocation model for local functions (i.e., lift to avoid closure allocation whenever possible).

Naturally, the Racket-on-Chez implementation is bootstrapped by applying the expander and schemify layers to themselves and the other Racket-implemented parts of the stack, and then compiling the schemified linklets using the Chez Scheme compiler. A thin main driver on top handles command-line arguments to load modules and start the read-eval-print loop.

Current Functionality

DrRacket starts up in Chez Scheme, and just starting DrRacket exercises many Racket language constructs and libraries. The largest Racket-on-Chez demonstration to date is building all of the packages and documentation of the main Racket distribution. Building packages and documentation, again, covers more Racket functionality than you may expect, because our macros do interesting things, such as typechecking and generating bitmap images. Also, the build process is too long and complex to tolerate leaks or major inefficiencies.

Still, many pieces are buggy, and closing the gap is mostly a matter of plowing through the Racket test suites. The main missing or incomplete pieces are: futures, places, and a racket executable that starts Racket-on-Chez.

Performance: Benchmark Sanity Check

The overall compilation story is not yet right in Racket-on-Chez, partly because there’s no cross-module optimization. Functions like map, memq, and assv need cross-module optimization to be compiled well in Racket-on-Chez. Benchmarks sometimes do not rely on cross-module optimization, however, and preliminary benchmark measurements can be useful as a sanity check on the ways that schemify mangles individual modules.

Benchmarks are in the racket-benchmark package whose source is in the racket7 repo. The benchmarks run in Chez Scheme as R6RS libraries. The difference between Racket and racket7 is in the noise.

The table below shows a handful of rough benchmark results. You should not take the results seriously, but they line up with my overall impression of where we are.

Rough and unscientific benchmark numbers

  msecs               deriv   dynamic   sboyer   maze    nfa
  Chez Scheme           770       360     1270    930   2100
  Racket-on-Chez       1560      1030     1470   1520   2200
  Racket               2030       560     2230   1330   4100

All benchmarks were run in safe mode and without Gustavo’s work-in-progress addition to the Chez Scheme compiler that can eliminate some runtime checks and related dead code.

Performance: Startup

Startup time for Racket-on-Chez is determined by Chez Scheme’s base startup time (to load boot files) plus time to load the expander and other parts of the Racket-on-Chez stack. The load times shown below for racket/base and racket include the startup time and add the additional time needed to load the many compiled files that are parts of those libraries.

For Racket and racket7, bytecode parsing can be delayed at the granularity of a procedure, so that a procedure’s bytecode isn’t parsed until the procedure is called. Delayed parsing is enabled by default, and the -d flag disables it.

Startup/load time

  msecs                startup   -l racket/base   -l racket
  Racket                    16               50         220
  Racket -d                 16               80         500
  racket7                   50              110         340
  racket7 -d                50              190         650
  Chez Scheme                               170
  Racket-on-Chez           440              550        1200
  Racket-on-Chez/JIT       440              550        1600

Compared to the current Racket version, starting up and loading takes longer in racket7, because racket7 has to load embedded bytecode for the expander’s implementation, and because the bytecode encoding of linklets has not been refined as much.

The Chez Scheme row in the table doesn’t actually load racket/base, but that’s the closest reasonable column. That row corresponds to just starting Chez Scheme with its compiler, which is a significantly smaller language than racket/base, but its time is larger than the startup times in the Racket rows. It’s also a lower bound on the Racket-on-Chez startup time.

Loading Racket-on-Chez takes significantly longer, still, due to larger and slower-to-load machine code for both the Racket-on-Chez stack and libraries compiled with it. Although individual linklets within a module (roughly, individual phases) are parsed on demand, there is no delayed parsing at the granularity of procedures, so the work performed by loading is closer to the Racket -d and racket7 -d rows.

The last row above, Racket-on-Chez/JIT, refers to a variant of Racket-on-Chez that does not compile whole linklets. Instead, it keeps individual lambdas in source form until called, and it compiles each called lambda on demand. Load times go up, mostly because code is compiled as needed, but also because the source-fragment representation is even larger than machine code right now. The Racket-on-Chez/JIT approach does not look like the most promising approach so far, but it’s also not crazy. With more effort and tuning (especially in the representation of source fragments), it may be a viable approach and more in line with the way the current Racket implementation works.

Clearly, load time is a direction for further effort.

Performance: Expander and Compiler

The expander runs at about the same speed in Racket-on-Chez and racket7, but the linklet-compilation step takes much longer in Racket-on-Chez. As a result, compilation time for Racket-on-Chez dominates expansion proper for a task like racket -cl racket/base.

Reported times in this section use a Racket-on-Chez stack that is compiled in unsafe mode and run on a 2013 MacBook Pro 2.6GHz Core i5. Unsafe mode improves expand and read performance by about 10%.

Steps within racket -cl racket/base

  msecs                expand   schemify   compile   eval   read
  racket7                2511                   662    235    275
  Racket-on-Chez         2398        917       2852    390    227
  Racket-on-Chez/JIT     2441        974       1044    515    229

For Racket-on-Chez/JIT, compile times go down, showing that about 1/3 of the code that is compiled during expansion is also run during expansion. The fraction would be lower for most programs; the racket/base library is unusual in building up layers of macros to implement itself. The schemify step is certainly a target for improvement, since its job is much simpler than the compile step’s job.

That factor-of-2 slowdown relative to racket7 is compounded by the Racket-implemented expander being twice as slow as the current Racket’s C-implemented expander:

Loading from source with racket -cl

  msecs                racket/base   racket
  Racket                      1830    21700
  racket7                     3490    38140
  Racket-on-Chez              7060    70093
  Racket-on-Chez/JIT          5400    55050

The gap between the Racket-implemented expander and the C-implemented expander is the one that we most hoped to close by moving to Chez Scheme. As part of the Racket-on-Chez stack, the expander is already compiled in a reasonable way (i.e., no cross-module optimization issues), so it’s not clear how to improve it. Still, I think the expander can be made to run faster in Chez Scheme, and we’ll keep trying.

Performance: Distribution Build

The current version of Racket-on-Chez builds on a 64-bit machine with a peak memory use around 1.25 GB, which is on a similar scale to the current Racket release’s peak memory use around 1 GB.

The following plots show, on the same scale, memory use over time when building a distribution. Each plot has two black lines (easiest to distinguish in the last plot): the top black line describes peak memory use, since it’s before a major GC, while the bottom black line is closer to live-memory use, since it’s after a major GC. The orange region is for compiling libraries (each vertical line starts a collection), the blue region is for running documentation, the green region is for rendering documentation, and the final pink region is for re-rendering documentation as needed to reach a fixed point for cross-references. Memory use is expected to grow modestly during the blue region, since the build is collecting cross-reference information for use in the green region. Memory use should grow minimally in the orange and green regions (as a side-effect of caches).

These graphs report builds on a Linux 3.4GHz Core i7-2600 machine with the Racket parts of the Racket-on-Chez stack compiled in safe mode.

Racket
C-implemented expander

racket7
Racket-implemented expander

Racket-on-Chez
Same expander as racket7

The plots illustrate that, while memory use is similar, build times are much longer. Expander and compiler performance largely explains the difference.

The build time certainly can be improved some. Maybe build times can be improved significantly, or maybe slower build times will seem worthwhile for a more maintainable implementation of Racket and/or for faster generated code.

Outlook

It would be nice if porting to Chez Scheme made every aspect of Racket magically faster. It hasn’t done that, but we have plenty of room for improvement; the performance results to date are a lower bound on performance, not an upper bound.

Keep in mind that the original goal is not to have a faster Racket, but a better-implemented Racket with acceptable performance. The Racket-on-Chez implementation is far more maintainable and flexible than the current Racket implementation, so maybe we’re half-way there after one year of work.

You can try Racket-on-Chez from https://github.com/racket/racket7/. See racket/src/cs/README.txt in that repo for more information.

If you’re interested in even more implementation details and plans, see the extended version of this report.

more →

11 Nov 2017

Refinement Types in Typed Racket

posted by Andrew Kent

With the Racket v6.11 release, Typed Racket has begun to support basic refinement and dependent function types. This post gives an overview for working with these types while writing some simple vector operations.

Currently these types are documented under Typed Racket’s experimental features. They are experimental in the sense that they are relatively new. We look forward to improving their usefulness by incorporating user feedback.

Note: If you wish to experiment with refinement and/or dependent function types in the near future, you should use a snapshot build, as it will have additional features and bug fixes from user feedback. Some of the examples in this post will require a build from early November 2017 or later.

more →

30 Oct 2017

Racket v6.11

posted by Vincent St-Amour

Racket version 6.11 is now available from http://racket-lang.org/

  • A bug preventing OneDrive users on Windows 10 Fall Creators update from opening files has been fixed.

  • Typed Racket supports refinement types and dependent function types. Previously an experimental feature, refinement types allow types to describe more interesting properties of values, especially integers. For example, this type shows that the max function always produces a number at least as big as its inputs: (-> ([x : Integer] [y : Integer]) (Refine [z : Integer] (and (>= z x) (>= z y))))

  • DrRacket’s Program Contour is significantly more efficient; using it no longer hurts DrRacket’s interactivity.

  • The web-server/formlets library produces formlets that are serializable, facilitating dynamic uses of formlets in stateless #lang web-server servlets. The new web-server/formlets/stateless and web-server/formlets/unsafe libraries provide additional support with the same API.

  • The db library supports the Cassandra database.

The following people contributed to this release: Alex Knauth, Andrew Gwozdziewycz, Andrew Kent, Asumu Takikawa, Ben Greenman, Ben Noordhuis, Carlo Dapor, Daniel Feltey, David Christiansen, David Van Horn, Eric Dobson, Gustavo Massaccesi, Harold Carr, Jack Firth, Jay McCarthy, Jeannie Tran, Jens Axel Søgaard, jgreco, John Clements, Jordan Johnson, Justin Slepak, Leandro Facchinetti, Leif Andersen, Matthew Butterick, Matthew Flatt, Matthias Felleisen, Michael Ballantyne, Milo Turner, Nadeem Abdul Hamid, Philip McGrath, rain, Robby Findler, Royall Spence, Ryan Culpepper, Ryan Davis, Sam Tobin-Hochstadt, Shu-Hung You, Spencer Florence, Stephen Chang, Vincent St-Amour, WarGrey Gyoudmon Ju, and Weng Shiwei.

Feedback Welcome

more →

11 Oct 2017

Tutorial: Creating a Package

posted by Stephen Chang

This post is a summary of a tutorial presented at RacketCon 2017.

It describes how to create a package starting from a single Racket file.

Specifically, this post explains how to:

more →

27 Sep 2017

Tutorial: Contributing to Racket

posted by Ben Greenman, with help from Matthew Butterick and Robby Findler and Jack Firth and Vincent St-Amour and Sam Tobin-Hochstadt

This post describes 3 ways to contribute to Racket: (1) fixing a typo online, (2) submitting a pull request to the racket/racket repository, and (3) submitting a pull request to a repository in the Racket GitHub organization.

more →

12 Sep 2017

Racket v6.10.1

posted by Vincent St-Amour

Racket version 6.10.1 is now available from http://racket-lang.org/

Version 6.10.1 patches the recent v6.10 release in three small ways:

  • On Linux, corrects a mishandling of file-stream ports that can block on input or output (such as pipes between processes), where closing a blocked port may cause a port that’s opened later to be incorrectly reported as blocked.

  • On Windows, corrects a crash in the handling of symbolic links.

  • On all platforms, corrects the peer-side results from tcp-addresses.

more →
