19 Feb 2020

Racket-on-Chez Status: February 2020

posted by Matthew Flatt

For background information about Racket on Chez Scheme (a.k.a. Racket CS), see the original announcement, the January 2018 report, the report at Scheme Workshop 2018, the January 2019 report, the ICFP experience report, and the report at RacketCon 2019.

Racket on Chez Scheme (Racket CS) is ready for production use. Racket CS now passes all of the tests for the main Racket distribution, and differences in compile and run times are much reduced. Overall, Racket CS tends to perform about the same as the traditional Racket implementation (Racket BC, “before Chez”): sometimes better and sometimes worse, but typically using more memory due to larger code sizes.

Racket CS is not yet the default Racket implementation, but it is available as a download option alongside the regular Racket release at https://download.racket-lang.org/ (select CS from the Variant popup).

Run-Time Performance

Run-time performance for Racket CS has continued to improve. Benchmarks show the difference to some degree, but they understate the difference for a typical Racket application. Alex Harsanyi shared his initial experience with Racket CS in December 2018, and he has been kind enough to keep taking measurements. The plots below show his results for Racket CS and Racket BC, where lower is better, and the overall trend here seems typical for applications that I’ve measured.

As the trend lines may suggest, the overall improvement is from many small changes that add up. The plateau around June to October 2019 coincides with a push on correctness and compatibility, as opposed to performance, to make all Racket tests pass.

The plots below show current results for traditional Scheme benchmarks. The top bar is current and unmodified Chez Scheme, while the second bar is Chez Scheme as modified to support Racket. The third bar is Racket CS, and the bottom bar is Racket BC. The last two rows of benchmarks rely on mutable pairs, so they are run in Racket as #lang r5rs programs. There’s not a lot of difference here compared to one year ago, except that a faster path for integer division in the Racket variant of Chez Scheme has eliminated the collatz outlier.

The next set of plots compares Racket CS and Racket BC on the Racket-specific implementations of benchmarks written over the years for The Computer Language Benchmarks Game. Compared to one year ago, the fraction of benchmarks where Racket CS wins over Racket BC is reversed. Much of that improvement happened in the thread and I/O layers that were newly implemented for Racket CS.

About the measurements: Alex Harsanyi’s measurements are for parts of the ActivityLog2 test suite; ActivityLog2 and the CI infrastructure it runs on have both changed over time, so the plots are approximate, but they are generally consistent with fresh runs with the current ActivityLog2 implementation on Racket versions 7.3 through 7.6.0.12. All other measurements use a Core i7-2600 at 3.4GHz running 64-bit Linux. Benchmarks are in the "racket-benchmark" package in the "common" and "shootout" directories. We used commit a20e3f305c of the Racket variant of Chez Scheme and commit cdd0659438 of Racket.

Startup and Load Times

Load times have improved for Racket CS. Loading the racket/base library:

Loading the full racket library:

Load times are, of course, directly related to memory use, and Racket CS load times improved primarily through reduced memory use.

About the measurements: These results were gathered by using time in a shell a few times and taking the median. The command was as shown.
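The "run it a few times and take the median" procedure can also be scripted. The following is a minimal sketch rather than the method used for the plots: the helper names (median, time-command, median-load-time) are hypothetical, it measures wall-clock time from inside Racket instead of using the shell's time, and it assumes a racket executable is on the PATH.

```racket
#lang racket/base
(require racket/system) ; provides system*

;; Median of a non-empty list of real numbers.
(define (median xs)
  (define sorted (sort xs <))
  (define n (length sorted))
  (define mid (quotient n 2))
  (if (odd? n)
      (list-ref sorted mid)
      (/ (+ (list-ref sorted (sub1 mid)) (list-ref sorted mid)) 2)))

;; Wall-clock milliseconds taken to run `exe` with the given string arguments.
(define (time-command exe . args)
  (define start (current-inexact-milliseconds))
  (apply system* exe args)
  (- (current-inexact-milliseconds) start))

;; Median wall-clock time of `racket -l <lib>` over `runs` runs.
;; Assumption: a `racket` executable can be found on the PATH.
(define (median-load-time lib #:runs [runs 5])
  (define racket-exe (find-executable-path "racket"))
  (unless racket-exe
    (error 'median-load-time "no racket executable found on PATH"))
  (median (for/list ([i (in-range runs)])
            (time-command racket-exe "-l" lib))))
```

For example, (median-load-time "racket/base") and (median-load-time "racket") give rough counterparts of the two load-time measurements; absolute numbers depend on the machine and on process-startup overhead.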

Memory Use

Differences in memory use between Racket CS and Racket BC are mostly due to bytecode versus machine code, plus the fact that Racket BC can load bytecode more lazily. Various improvements, including changes to the Chez Scheme compiler to reduce code size, have decreased memory use in Racket CS by around 20% for typical applications.

The following plots show memory use, including both code and data, after loading racket/base or racket, but subtracting memory use at the end of a run that loads no libraries (which reduces noise from different ways of counting code in the initial heap). The gray portion of each bar is an estimate of memory occupied by code, which may be in machine-code form, bytecode form, or not yet unmarshaled.

Racket BC heap sizes here are larger compared to previous reports by about 5 MB. That difference reflects a more accurate measurement of the initial Racket BC heap.

On a different scale and measuring peak memory use instead of final memory use for DrRacket start up and exit:

About the measurements: These results were gathered by running racket with the arguments -l racket/base, -l racket, or -l drracket. The command further included -W "debug@GC" -e (collect-garbage) -e (collect-garbage), and reported sizes are based on the logged memory use before the second collection. For the Racket BC bar, the reported memory use includes the first number that is printed by logging in square brackets, which is the memory occupied by code outside of the garbage collector’s directly managed space. Baseline memory use was measured by setting the PLT_GCS_ON_EXIT environment variable and running with -n, which has the same effect as -e '(collect-garbage)' -e '(collect-garbage)'. DrRacket was initialized with racket/base as the default language; also, background expansion was disabled, because measuring memory use is tricky on Racket BC. Code size was estimated using dump-memory-stats counting bytecode, JIT-generated native code, and marshaled code for Racket BC and machine code, relocations, and bytevectors (which are mostly marshaled code) for Racket CS.
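A rough in-process approximation of the "load a library, then subtract a baseline" idea is sketched below. It is not the procedure described above: it uses current-memory-use instead of the GC log, it measures within one process rather than across separate runs, and the function names are hypothetical. Its numbers should not be expected to match the plots.

```racket
#lang racket/base

;; Bytes reported as in use after two major collections.
(define (memory-after-gc)
  (collect-garbage)
  (collect-garbage)
  (current-memory-use))

;; Approximate growth in retained memory caused by instantiating `lib`
;; (a quoted module path such as 'racket/list) in the current namespace.
;; If the library is already loaded, the growth is close to zero.
(define (memory-growth-from-loading lib)
  (define before (memory-after-gc))
  (dynamic-require lib #f) ; instantiate the module, import nothing
  (- (memory-after-gc) before))

(module+ main
  (printf "racket/list: ~a bytes\n"
          (memory-growth-from-loading 'racket/list)))
```

Because the starting heap already contains whatever the running program has loaded, this sketch is best used to compare libraries against each other within one Racket variant.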

Expand and Compile Times

Compile times have improved substantially for Racket CS. The main change was to use an interpreter for compile-time code within the module currently being compiled, because that’s a better trade-off in (meta) compilation time and (meta) run time. Compile-time code that is exported for use by other modules is compiled and optimized normally.
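To make "compile-time code within the module currently being compiled" concrete, here is a small hypothetical module. The helper numbers-up-to and the macro sum-to are invented for illustration; the sketch only shows which code runs at compile time and does not expose or control the interpreter choice, which is internal to Racket CS.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Compile-time helper: it runs while this module is being expanded.
(begin-for-syntax
  (define (numbers-up-to n)
    (for/list ([i (in-range 1 (add1 n))]) i)))

;; The macro body also runs at compile time. (sum-to 4) expands to (+ 1 2 3 4).
(define-syntax (sum-to stx)
  (syntax-case stx ()
    [(_ n)
     (exact-nonnegative-integer? (syntax-e #'n)) ; guard: literal count only
     #`(+ #,@(numbers-up-to (syntax-e #'n)))]))

;; A use inside the defining module: the compile-time code above runs
;; during this module's own compilation.
(define ten (sum-to 4))

;; Exported for use by other modules.
(provide sum-to)
```

In terms of the paragraph above, the use of sum-to inside this module is the case that benefits from interpretation, while a different module that requires sum-to would run the normally compiled form.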

While the Racket CS interpreter and the one in Racket BC are both intended to be safe for space, the Racket CS one stands a much better chance of achieving that goal.

About the measurements: These results were gathered by using time in a shell a few times and taking the median. The command was as shown.

Build Time

The time and memory used to build a Racket distribution using Racket CS is now much closer to the time and memory used by Racket BC. The following plots are all on the same scale, and they show memory use plotted against time for building the Racket distribution from source:

In each plot, there are two lines, although they are often smashed together. The top line is memory use just before a major garbage collection, and the bottom line is memory use just after a major garbage collection.

[Plot: Racket CS, memory use over build time]

[Plot: Racket BC, memory use over build time]

The Racket CS plot used to be more than twice as wide as the Racket BC plot. About half of the improvement came from fixing a cache that interacted badly with Racket CS’s more frequent minor garbage collections. The rest of the improvement was due to many small improvements.

About the measurements: These plots were generated using the "plt-build-plot" package, which drives a build from source and plots the results.

Run-Time Performance Redux

In some ways, the Racket CS project has validated the Racket BC implementation, because it turns out that Racket BC performs pretty well. With the notable exception of first-class continuations (where Racket BC uses a poor strategy), the traditional, JIT-based Racket engine performs close to Chez Scheme.

Then again, development to date has been aimed at making Racket CS match the performance of Racket BC on a code base that was developed and tuned for Racket BC, and that obviously gives Racket BC an advantage. For example, Racket BC makes dormant code relatively cheap, so Racket libraries generate a lot of code. Future Racket libraries will likely shift to take advantage of Racket CS’s cheaper function calls and dramatically cheaper continuations. One day, probably, Racket BC will no longer be a viable alternative to Racket CS for most programs.

To make that prediction more concrete, consider these three ways of counting to 10 million (if N is 10000000):

(require racket/generator) ; provides `generator` and `yield`

(define N 10000000)

; self-contained loop
(for/fold ([v #f]) ([i (in-range N)])
  i)

; indirect function calls for loop control
(for/fold ([v #f]) ([i N]) ; no `in-range`
  i)

; continuations
(let ([g (generator ()
           (for ([i (in-range N)])
             (yield i))
           #f)])
  (for/fold ([v #f]) ([i (in-producer g #f)])
    i))

Here are run times normalized to the first one, which is about the same in Racket CS and Racket BC:

     

  

  

              R/CS     R/BC
in-range      ×1       ×1
no in-range   ×5       ×25
generator     ×335     ×2672

While no one will start writing trivial loops the slow way, there’s a big difference between a ×5 and ×25 overhead when choosing how to implement a new abstraction and deciding how much to make static versus dynamic. There’s an even bigger difference between ×335 and ×2672— both irrelevant for a toy loop, but ×335 becomes relevant sooner than ×2672 for more interesting calculations. Overall, when implementation choices start to rely on the R/CS column, the R/BC column will sometimes be unacceptable.
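Readers who want to see the ratios on their own machine can wrap the three loops in a small timing harness. This is a sketch under stated assumptions: the function names are hypothetical, N is reduced to one million so that the generator variant finishes quickly, and a single wall-clock sample per loop is much noisier than whatever produced the numbers above.

```racket
#lang racket/base
(require racket/generator)

;; Assumption: smaller than the 10 million used in the post, to keep runs short.
(define N 1000000)

(define (loop/in-range)
  (for/fold ([v #f]) ([i (in-range N)]) i))

(define (loop/no-in-range)
  (for/fold ([v #f]) ([i N]) i)) ; generic sequence dispatch on a number

(define (loop/generator)
  (let ([g (generator ()
             (for ([i (in-range N)]) (yield i))
             #f)])
    (for/fold ([v #f]) ([i (in-producer g #f)]) i)))

;; Wall-clock milliseconds for one call of `thunk`.
(define (elapsed-ms thunk)
  (define start (current-inexact-milliseconds))
  (thunk)
  (- (current-inexact-milliseconds) start))

;; Prints each loop's time normalized to the in-range loop.
(define (report)
  (define times
    (map elapsed-ms (list loop/in-range loop/no-in-range loop/generator)))
  (define base (max (car times) 0.001)) ; guard against a zero baseline
  (for ([name '("in-range" "no in-range" "generator")]
        [t times])
    (printf "~a: ×~a\n" name (/ t base))))

(module+ main
  (report))
```

Running the same file with Racket CS and with Racket BC gives a rough counterpart of the two columns; the exact ratios will vary with the machine and the Racket version.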

Reflection and Outlook

It took three years to get Racket on Chez Scheme running well enough for production use, and it will take yet more time for Racket CS to fully replace Racket BC. But a certain amount of optimism is necessary to take on a large project like this, and if the timeline gets stretched beyond initial and early projections, then that’s only to be expected.

Racket CS will eventually outpace Racket BC for the reason that originally motivated porting Racket to Chez Scheme: it’s put together in a better way, so it’s easier to modify and improve. Maintainability is difficult to capture with the same clarity as performance benchmarks, but after spending one more year modifying both implementations, I remain as convinced as ever that Racket CS is much better.

This report is the last one for Racket CS. Here are a few things that are on the Racket CS roadmap— but, increasingly, we’ll just call it the Racket roadmap:

  • Support for embedding Racket CS in a larger application. Probably the C API here will start with the Chez Scheme C API, which will make it different from Racket BC’s C API: providing similar functionality, but with simpler rules for cooperating with the memory manager.

  • Improved garbage collection, especially for large heap sizes, including support for incremental collection.

  • Unboxed floating-point arithmetic, especially for local compositions of floating-point operations.


13 Feb 2020

Racket v7.6

posted by John Clements

Racket version 7.6 is now available from https://racket-lang.org/.

  • DrRacket’s scrolling has been made more responsive.

  • DrRacket’s dark mode support is improved for Mac OS and Unix.

  • Racket CS is ready for production use. We will work to further improve Racket CS before making it the default implementation, but it now consistently passes all of our integration tests and generally performs well. (Compiled code remains significantly larger compared to the default implementation.)

  • The Web Server provides fine-grained control over various aspects of handling client connections (timeouts, buffer sizes, maximum header counts, etc.) via the new “safety limits” construct.

Using this new construct, we have decreased the web server’s default level of trust in client connections and made it detect additional, maliciously constructed requests. Resource-intensive applications may need to adjust the default limits (for example, to accept large file uploads). In trusted settings, they can be disabled completely by starting the web server with #:safety-limits (make-unlimited-safety-limits).

  • The Web Server’s handling of large files is improved, and its latency for long-running request handlers is reduced.

  • The Macro Stepper has a new macro hiding algorithm that tracks term identity through syntax protection (see syntax-arm), making macro hiding work more reliably. Its UI indicates protected and tainted syntax.

  • The Racket documentation includes a “building and contributing” guide.
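As an illustration of the Web Server “safety limits” item above, the sketch below raises two limits for an application that accepts large uploads. The 50 MB figures, the port, and the names upload-limits, hello, and start-server are assumptions chosen for the example; only the limits being adjusted are passed, and everything else keeps its default.

```racket
#lang racket/base
(require web-server/safety-limits
         web-server/servlet-env
         web-server/http)

;; Assumed limits: allow request bodies and uploaded files up to 50 MB.
(define upload-limits
  (make-safety-limits
   #:max-request-body-length (* 50 1024 1024)
   #:max-form-data-file-length (* 50 1024 1024)))

;; A placeholder request handler.
(define (hello req)
  (response/xexpr '(html (body "hello"))))

;; Starts the server when called; nothing is started at module level.
(define (start-server)
  (serve/servlet hello
                 #:port 8080
                 #:launch-browser? #f
                 #:servlet-regexp #rx""
                 #:safety-limits upload-limits))
```

In a trusted setting, upload-limits could be replaced by (make-unlimited-safety-limits), as the release note describes.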

Contributors: Alex Harsanyi, Alex Knauth, Alex Muscar, Alexis King, Ben Greenman, Bogdan Popa, Brian Wignall, Dan Holtby, David K. Storrs, Dionna Glaze, Dominik Pantůček, Fred Fu, Geoff Shannon, Gustavo Massaccesi, Jack Firth, Jay McCarthy, Jens Axel Søgaard, Jesse Alama, Joel Dueck, John Clements, Jordan Johnson, Julien Delplanque, Leo Uino, Luka Hadži-Đokić, Luke Lau, Matthew Flatt, Matthias Felleisen, Mike Sperber, Paulo Matos, Philip McGrath, Reuben Thomas, Robby Findler, Ross Angle, Ryan Culpepper, Sage Gerard, Sam Tobin-Hochstadt, Shu-Hung You, Sorawee Porncharoenwase, Stephen De Gabrielle, Syntacticlosure, Timo Wilken, Tommy McHugh, Winston Weinert, Zaoqi


19 Nov 2019

Racket v7.5

posted by John Clements

Racket version 7.5 is now available from https://racket-lang.org/.

  • Almost all of Racket version 7.5 is distributed under a new, less-restrictive license: either the Apache 2.0 license or the MIT license. See https://blog.racket-lang.org/2019/11/completing-racket-s-relicensing-effort.html for more details.

  • Racket CS remains “beta” quality for the v7.5 release, but its compatibility and performance continue to improve. We expect that it will be ready for production use by the next release. We encourage you to check how well the v7.5 CS release works for your programs, and help push the project forward by reporting any problems that you find.

  • The Web Server provides a standard JSON MIME type, including a response/jsexpr form for HTTP responses bearing JSON.

  • GNU MPFR operations run about 3x faster.

  • Typed Racket supports definitions of new struct type properties and type checks uses of existing struct type properties in struct definitions. Previously, these uses were ignored by the type checker, so type errors there may have been hidden.

  • The performance bug in v7.4’s big-bang has been repaired.

  • DrRacket supports Dark Mode for interface elements.

  • Plot can display parametric 3d surfaces.

  • Redex supports modeless judgment forms.

  • MacOS 10.15 (Catalina) includes a new requirement that executables be “notarized”, to give Apple the ability to prevent certain kinds of malware. In this release, all of the disk images (.dmg’s) are notarized, along with the applications that they contain (.app’s). Many users may not notice any difference, but two groups of Catalina users will be affected: those that use the “racket” binary directly, and those that download the ".tgz" bundles. In both cases, the operating system is likely to inform you that the given executable is not trusted, or that the developer can’t be verified. Fortunately, both groups of users are probably also running commands in a shell, and the solution for both groups is the same: you can disable the quarantine flag using the xattr command, e.g.

    xattr -d com.apple.quarantine /path/to/racket

    TL;DR: Everything is fine. Read this note again if you run into problems.

The following people contributed to this release:

Alex Knauth, Alexander Shopov, Alexis King, Ayman Osman, Ben Greenman, Bert De Ketelaere, Bogdan Popa, Caleb Allen, Chuan Wei Foo, David Florness, Diego A. Mundo, Dominik Pantůček, Fred Fu, Geoffrey Knauth, Gregory Cooper, Gustavo Massaccesi, James Bornholt, Jay McCarthy, Jens Axel Søgaard, Jesse A. Tov, Jesse Alama, John Clements, Jon Zeppieri, Leo Uino, Luke Nelson, Matthew Flatt, Matthias Felleisen, Max New, Mike Sperber, Nick Thompson, Noah W M, Paulo Matos, Philip McGrath, Robby Findler, Ryan Culpepper, Sam Tobin-Hochstadt, Shu-Hung You, Sorawee Porncharoenwase, Stephen Chang, Thomas Dickerson, and William J. Bowman


15 Nov 2019

Completing Racket’s relicensing effort

posted by Sam Tobin-Hochstadt, with help from Sage Gerard and Joel Dueck and Matthew Flatt and the Software Freedom Conservancy, especially Pamela Chestek

With the upcoming Racket 7.5 release, almost all of Racket, including the core Racket CS binary, the standard library, and the packages provided with the main distribution, are available under a permissive license, either the Apache 2.0 License or the MIT License. You can read the details of the new license in the GitHub repository. This has been a long process, beginning in 2017, and we’re grateful to all the contributors to Racket, including those from very long ago, who gave permission for the re-licensing. More than 350 contributors to Racket responded; many of the responses can be seen in this GitHub issue.


08 Aug 2019

Racket v7.4

posted by John Clements

Racket version 7.4 is now available from https://racket-lang.org/

With this 7.4 release, we are making Racket CS available, a beta version of the Racket on Chez Scheme implementation. Racket CS is “beta” quality for the v7.4 release. It works well enough to be worth trying, but there are likely too many lingering problems for a project to switch to Racket CS for production use at this time. We encourage you to kick the tires of the new CS releases, and to help push this project forward by reporting any problems that you find.

  • Racket CS is available as a download option. To download Racket CS, select “CS” instead of “Regular” from the “Variant” popup menu.

  • Single-precision floating-point literals, a.k.a. single-flonums, are no longer supported by default.

This is a backward-incompatible change, but the use of single-flonums appears to be rare. Since Racket CS does not support single-flonums, disabling single-flonums by default smooths the transition from regular Racket to Racket CS.

The read-single-flonum parameter can be set to #t to enable reading single-flonum literals, but a better strategy in most cases is to use real->single-flonum when single-flonum-available? reports #t or when single-flonum? reports #t for a value (which implies that single-flonums must be supported). Where single-flonums are supported, Racket’s compiler will fold a call of real->single-flonum on a literal number to a constant single-flonum value.

  • New compilation flags including --disable-generations and --enable-ubsan provide better support for alternative architectures.

  • The 2htdp/universe teachpack supports an event log window for big-bang. With this option, students can inspect the events that big-bang handled, plus their payload. The event log includes messages from external sources.
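The single-flonum advice in the list above can be followed with a small conversion helper. This is a minimal sketch: the name real->preferred-flonum is hypothetical, and falling back to a double-precision flonum is one possible policy, not something the release note prescribes.

```racket
#lang racket/base

;; Converts a real number to a single-flonum where the running Racket
;; supports them, and to a double-precision flonum otherwise.
(define (real->preferred-flonum x)
  (if (single-flonum-available?)
      (real->single-flonum x)
      (real->double-flonum x)))

(module+ main
  (define v (real->preferred-flonum 1/4))
  (printf "~a (single-flonum? ~a)\n" v (single-flonum? v)))
```

Checking single-flonum-available? first avoids calling real->single-flonum on an implementation that does not support single-flonums.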

The following people contributed to this release: Alex Knauth, Alexander B. McLin, Alexis King, Andreas Düring, Asumu Takikawa, Atharva Raykar, Ben Greenman, Benjamin Yeung, Dmitry Moskowski, Fred Fu, Gustavo Massaccesi, Ilnar Salimzianov, Jason Hemann, Jay McCarthy, Jesse A. Tov, Jesse Alama, John Clements, Leif Andersen, Lukas Lazarek, Matthew Flatt, Matthias Felleisen, Mike Sperber, Morgan Lemmer-Webber, Noah W M, Paulo Matos, Philip McGrath, Robby Findler, Rodrigo, Roman Klochkov, Ryan Culpepper, Sam Tobin-Hochstadt, Simon ‘Sze’ L. Schlee, Sorawee Porncharoenwase, Spencer Florence, Stephan Renatus, Stephen Chang, Stephen De Gabrielle, Thomas Dickerson, Vincent St-Amour, yjqww6


14 May 2019

Racket v7.3

posted by John Clements

Racket version 7.3 is now available from https://racket-lang.org/

Racket-on-Chez continues to improve. Snapshot builds are currently available at pre.racket-lang.org, and we expect that Racket-on-Chez will be included as a download option in the next release.

In addition, the Racket 7.3 release includes the following improvements:

  • There is a new set of teaching languages for the upcoming German-language textbook “Schreibe Dein Programm!” (https://www.deinprogramm.de/).

  • Racket’s IO system has been refactored to improve performance and simplify internal design.

  • Racket’s JSON reader is dramatically faster.

  • The plot library includes color map support for renderers.

  • The Racket web library has improved support for 307 redirects.

  • The Racket web server provides better response messages by default for common status codes.

  • The pict library includes a shear function.

The following people contributed to this release: Alex Harsányi, Alexander McLin, Alexander Shopov, Alexis King, Alex Knauth, Andrew Kent, Bert De Ketelaere, Ben Greenman, Fred Fu, Georges Dupéron, Greg Hendershott, Gustavo Massaccesi, Jay McCarthy, Jesse Alama, John Clements, Jordan Johnson, Kimball Germane, Lassi Kortela, Leif Andersen, Leo Uino, Marc Kaufmann, Matthew Butterick, Matthew Flatt, Matthias Felleisen, Michael MacLeod, Mike Sperber, Paulo Matos, Philip McGrath, Philippe Meunier, Pierre-Evariste Dagand, Robby Findler, Ron Garcia, Ryan Culpepper, Ryan Kramer, Sam Tobin-Hochstadt, Shu-Hung You, Sorawee Porncharoenwase, Spencer Florence, Spencer Mitchell, Stephen De Gabrielle, Vincent St-Amour, Vladilen Kozin, Winston Weinert, yjqww6, and Wayo Cavazos

Feedback Welcome


30 Jan 2019

Racket v7.2

posted by Vincent St-Amour

Racket version 7.2 is now available from http://racket-lang.org/

Racket-on-Chez is done in a useful sense, but we’ll wait until it gets better before making it the default Racket implementation. For more information, see http://blog.racket-lang.org/2019/01/racket-on-chez-status.html

In addition, the Racket 7.2 release includes the following improvements, which apply to both implementations:

  • The contract system supports collapsible contracts, which avoid repeated wrappers in certain pathological situations. Thanks to Daniel Feltey.

  • Quickscript, a scripting tool for DrRacket, has become part of the standard distribution. Thanks to Laurent Orseau.

  • The web server’s built-in configuration for serving static files recognizes the ".mjs" extension for JavaScript modules.

  • The data/enumerate library supports an additional form of subtraction via but-not/e, following Yorgey and Foner’s ICFP’18 paper. Thanks to Max New.

  • The letrec.rkt example model in Redex has been changed to more closely match Racket, which led to some bug fixes in Racket’s implementation of letrec and set!.

  • The racklog library has seen a number of improvements, including fixes to logic variable binding, logic variables containing predicates being applicable, and the introduction of an %andmap higher-order predicate.

The following people contributed to this release: Akihide Nano, Alex Feldman-Crough, Alexander McLin, Alexander Shopov, Alexis King, Alex Knauth, Andrew Kent, Asumu Takikawa, Ben Greenman, Bogdan Popa, Caner Derici, Chongkai Zhu, Dan Feltey, Darren Newton, Gan Shen, Greg Hendershott, Gustavo Massaccesi, Jay McCarthy, Jens Axel Søgaard, John Clements, Jordan Johnson, Kevin Robert Stravers, Leif Andersen, Leo Uino, Matt Kraai, Matthew Butterick, Matthew Flatt, Matthias Felleisen, Max New, Michael Burge, Mike Sperber, Paul C. Anagnostopoulos, Paulo Matos, Philip McGrath, Robby Findler, Ronald Garcia, Ryan Culpepper, Ryan Kramer, Sam Tobin-Hochstadt, Shu-Hung You, Sorawee Porncharoenwase, Spencer Florence, Stephen Chang, and Vincent St-Amour

Feedback Welcome


29 Jan 2019

Racket-on-Chez Status: January 2019

posted by Matthew Flatt

For background information about Racket on Chez Scheme (a.k.a. Racket CS), see the original announcement, last January’s report, and the report at Scheme Workshop 2018.

Racket on Chez Scheme is done in a useful sense. All functionality is in place, DrRacket CS works fully, the main Racket CS distribution can build itself, and 99.95% of the core Racket test suite passes.

You can download a build for Windows, Linux, or Mac OS from the Utah snapshot site (look for “Racket CS”):

https://www.cs.utah.edu/plt/snapshots/

While code generally runs as fast as it should, end-to-end performance is not yet good enough to make Racket CS the default implementation of Racket. We’ll let the implementation settle and gradually improve, with the expectation that it will eventually be good enough to switch over— and better in the long run.

Compatibility with the Current Racket Implementation

Racket CS is intended to behave the same as the existing Racket implementation with a few exceptions:

There are still a few internal gaps related to handling large numbers of file descriptors (needed by servers with lots of connections, for example) and support for file-change events. Those should be easy to fill in, but our focus right now is on flushing out the remaining bugs that are exposed by test suites.

Another kind of incompatibility is that the compiled form of Racket code with the current implementation is platform-independent bytecode, while Racket CS’s compiled form is platform-specific machine code. This difference can sometimes affect a development workflow, and it required adjustments to the distribution-build process. Racket CS does not yet support cross compilation.

Here’s an incomplete list of things that are compatible between the current Racket implementation and Racket CS and that required some specific effort:

Macros, modules, threads, futures, places, custodians, events, ports, networking, string encodings, paths, regular expressions, mutable and immutable hash tables, structure properties and applicable structures, procedure arities and names, chaperones and impersonators, delimited continuations, continuation marks, parameters, exceptions, logging, security guards, inspectors, plumbers, reachability-based memory accounting, ephemerons, ordered and unordered finalization, foreign-function interface (mostly), phantom byte strings, source locations, left-to-right evaluation, result-arity checking, left-associative arithmetic, eqv? on NaNs, and eq? and flonums.

Outlook

The rest of this report will provide lots of numbers, but none of them expose the main benefit of Racket CS over the current Racket implementation: it’s more flexible and maintainable.

Putting a number on maintainability is less easy than measuring benchmark performance. Anecdotally, as the person who has worked on both systems, I can report that it’s no contest. The current Racket implementation is fundamentally put together in the wrong way (except for the macro expander), while Racket CS is fundamentally put together in the right way. Time and again, correcting a Racket CS bug or adding a feature has turned out to be easier than expected.

To maximize the maintenance benefits of Racket CS, it’s better to make it the default Racket variant sooner rather than later, and, ideally, discard the current Racket implementation. But while Racket CS is compatible with Racket to a high percentage, it’s never going to be 100%. From here, it’s some combination of patching differences and migrating away from irreconcilable differences, and that will take a little time. Given that both implementations need to exist for a while, anyway, we can give some weight to end-to-end performance when deciding on the right point to switch.

Many plots in this report are intended to tease out reasons for the performance difference between Racket CS and current Racket. From the explorations so far, it does not appear that the performance difference is an inevitable trade-off from putting Racket together in a better way. Part of the problem is that some new code on top of Chez Scheme needs to be refined. Perhaps more significantly, there are some trade-offs in the space of compilation timing (ahead-of-time or just-in-time) and code representation (machine code versus bytecode) that we can adjust with more work.

Although the current Racket implementation and Racket CS will both exist for a while, we do not anticipate the dueling implementations to create problems for the Racket community. The question of which to use will be more analogous to “which browser works best for your application?” than “does this library need Python 2 or Python 3?”

Meanwhile, there’s even more code to maintain, and accommodating multiple Racket variants creates some extra complexity by itself (e.g., in the distribution builds). It still looks like a good deal in the long run.

Performance of Compiled Code

The plots below show timings for Chez Scheme (purple), Racket CS (blue), and current Racket (red) on traditional Scheme benchmarks. Shorter is better. The results are sorted by Chez Scheme’s time over Racket’s time, except that benchmarks that rely on mutable pairs are in a second group with green labels.

  • Note that the break-even point between Chez Scheme and Racket is toward the end of the set of benchmarks with black labels, which reflects that Chez Scheme is usually faster than current Racket.

  • The main result is that the blue bar tracks the purple bar fairly well for the benchmarks without mutable pairs: Racket CS’s layers on top of Chez Scheme are not interfering too much with Chez Scheme’s base performance, even though Racket CS wraps and constrains Chez Scheme in various ways (e.g., enforcing left-to-right evaluation of application arguments).

  • For the benchmarks that use mutable pairs (green labels), Racket CS loses some of Chez Scheme’s performance by redirecting mutable-pair operations away from the built-in pair datatype, since built-in pairs are used only for immutable pairs in Racket CS.

  • The tak variants where the blue bar is shortest may be due to an extra layer of function inlining. The collatz test is effectively a test of exact-rational arithmetic on large fractions.

The next set of plots compare Racket CS and current Racket on the Racket implementations of benchmarks that were written over the years for The Computer Language Benchmarks Game. These rely more heavily on Racket-specific language features. Racket CS’s slowness toward the end of the list is often due to the I/O implementation, which is newly implemented for Racket CS and will take time to refine.

Aside from the fact that I/O needs work in Racket CS, the takeaway here is that there are no huge problems nor huge performance benefits with the Racket CS implementation. Longer term, the red lines probably aren’t going to move, but because so much new code is involved with the blue lines, there’s reason to think that some blue lines can get shorter.

About the measurements: These benchmarks are in the "racket-benchmark" package in the "common" and "shootout" directories. We used commit f6b6f03401 of the Racket fork of Chez Scheme and commit c9e3788d42 of Racket. The Chez Scheme fork includes Gustavo Massaccesi’s “cptypes” pass, which improves Chez Scheme’s performance on a few benchmarks. The test machine was a Core i7-2600 3.4GHz running 64-bit Linux.

Startup and Load Times

Startup and load time have improved since previous reports, but Racket CS remains slower.

Startup for just the runtime system without any libraries (still on a Core i7-2600 3.4GHz running 64-bit Linux):

The difference here is that the Racket CS startup image has much more Scheme and Racket code that is dynamically loaded and linked, instead of loaded as a read-only code segment like the compiled C code that dominates the current Racket implementation. We can illustrate that effect by building the current Racket implementation in a mode where its Racket-implemented macro expander is compiled to C code instead of bytecode, too, shown below as “R/cify.” We can also compare to Racket v6, which had an expander that was written directly in C:

The gap widens if we load compiled Racket code. Loading the racket/base library:

The additional difference here is that Racket CS’s machine code is bigger than current Racket’s bytecode representation. Furthermore, the current Racket implementation is lazy about parsing some bytecode. We can tease out the latter effect by disabling lazy bytecode loading with the -d flag, shown as “R/all”:

(We could also force bytecode to be JITted immediately, but JITting is more work than just loading, so that timing result would not be useful.)

We get a similar shape and a larger benefit from lazy loading with the racket library, which is what the racket executable loads by default for interactive mode:

About the measurements: These results were gathered by using time in a shell a few times and taking the median. The command was as shown, but using racketcs for the “R/CS” lines and racket -d for the “R/all” lines.
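The median-of-a-few-runs approach can be sketched in Python as follows. This is only an illustration, not the procedure used for the post (which relied on the shell's time): the helper names are hypothetical, the sketch records wall-clock time only, and the command in the demo is a stand-in that should be replaced with the racket or racketcs invocation being measured.

```python
import statistics
import subprocess
import sys
import time


def median_seconds(samples):
    """Return the median of a list of timing samples (in seconds)."""
    return statistics.median(samples)


def time_command(command, runs=5):
    """Run `command` (an argv list) `runs` times; return the median wall-clock time."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so that terminal I/O does not distort the timing.
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return median_seconds(samples)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; substitute the
    # racket or racketcs command line of interest.
    demo_command = [sys.executable, "-c", "pass"]
    print(f"median startup: {time_command(demo_command):.3f}s")
```

Taking the median rather than the mean keeps one unusually slow run (for example, a cold file cache on the first run) from skewing the reported number.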

Memory Use

Like load times, differences in memory use between Racket CS and current Racket can be attributed to code-size differences between bytecode and machine code and to lazy bytecode loading.

The following plots show memory use, including both code and data, after loading racket/base or racket, but subtracting memory use at the end of a run that loads no libraries (which reduces noise from different ways of counting code in the initial heap). The “R/jit!” line uses -d to load all bytecode eagerly, and it further forces that bytecode to be compiled to native code by the JIT compiler:

These results show that bytecode is more compact than machine code, as expected. Lazy parsing of bytecode also makes a substantial difference in memory use for the current Racket implementation. Racket’s current machine code takes a similar amount of space as Chez Scheme machine code, but the JIT overhead and other factors make it even larger. (Bytecode is not retained after conversion to machine code by the JIT.)

On a different scale and measuring peak memory use instead of final memory use for DrRacket start up and exit:

This result reflects that DrRacket’s memory use is mostly the code that implements DrRacket, at least if you just start DrRacket and immediately exit.

The gap narrows if you open an earlier version of this document’s source and run it three times before exiting, so that memory use involves more than mostly DrRacket’s own code:

About the measurements: These results were gathered by running racket or racketcs starting with the arguments -l racket/base, -l racket, or -l drracket. The command further included -W "debug@GC" -e (collect-garbage) -e (collect-garbage) and recording the logged memory use before that second collection. For the “R” line, the reported memory use includes the first number that is printed by logging in square brackets, which is the memory occupied by code outside of the garbage collector’s directly managed space. For “R/all,” the -d flag is used in addition, and for “R/jit!,” the PLT_EAGER_JIT environment variable was set in addition to supplying -d.
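The variants described above differ only in the executable, the -d flag, and the PLT_EAGER_JIT environment variable. A minimal Python sketch of how those command lines could be assembled follows. The function names are hypothetical, the placement of -d right after the executable is an assumption, and so is the value "1" for PLT_EAGER_JIT (the post only says the variable was set). Parsing the GC log is left out because the post does not describe the log format.

```python
def memory_probe_command(variant, library):
    """Build (argv, extra_env) for one memory measurement.

    `variant` is one of "R", "R/CS", "R/all", or "R/jit!";
    `library` is e.g. "racket/base", "racket", or "drracket".
    """
    executable = "racketcs" if variant == "R/CS" else "racket"
    argv = [executable]
    extra_env = {}
    if variant in ("R/all", "R/jit!"):
        argv.append("-d")  # load all bytecode eagerly (position is an assumption)
    if variant == "R/jit!":
        extra_env["PLT_EAGER_JIT"] = "1"  # value is an assumption; the post says "set"
    argv += ["-l", library,
             "-W", "debug@GC",
             "-e", "(collect-garbage)",
             "-e", "(collect-garbage)"]
    return argv, extra_env


def net_memory_use(measured, baseline):
    """Subtract the no-libraries baseline from a measurement (same units)."""
    return measured - baseline


if __name__ == "__main__":
    for variant in ("R", "R/CS", "R/all", "R/jit!"):
        argv, extra_env = memory_probe_command(variant, "racket/base")
        print(variant, extra_env, " ".join(argv))
```

The baseline subtraction mirrors the way the plots remove memory use from a run that loads no libraries, so that differences in how the initial heap is counted do not dominate the comparison.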

Expand and Compile Times

Compile times have improved some for Racket CS since the original report, but not dramatically. These plots compare compile times from source for the racket/base module (and all of its dependencies) and the racket module (and dependencies):

Compilation requires first macro-expanding source, and that’s a significant part of the time for loading from source. Racket CS and current Racket use the same expander implementation, and they expand at practically the same speed, so the extra time in Racket CS can be attributed to machine-code compilation. The following plots show how parts of the compile time can be attributed to specific subtasks:

Another way to look at compile times is to start with modules that are already expanded by the macro expander and just compile them. The -M flag alone does not do that, but it’s meant here to represent an installation that was constructed by using the -M flag for all build steps:

The difference in these compile times reflects how Chez Scheme puts much more effort into compilation. Of course, the benefit is the improved run times that you see in so many benchmarks.

The compile-only bars are also significantly shorter than taking the expansion-plus-compilation bars and removing only the gray part. That’s because the gray part only covers time spent specifically in the macro expander or running macro transformers, but it does not cover the time to compile macro definitions as they are discovered during expansion or to instantiate modules for compile-time use.

Given that the Chez Scheme compiler is so much slower (for good reason) than the current Racket compiler, we might ask how it compares to other, non-Racket compilers. Fortunately, we can make a relatively direct comparison between C and Racket, because the Racket macro expander was formerly written in C, and now it is written in Racket with essentially the same algorithms and architecture (only nicer). The implementations are not so different in lines of code: 45 KLoC in C versus 28.5 KLoC in Racket. The following plot shows compile times for the expander’s implementation:

To further check that we’re comparing similar compilation tasks, we can check the size of the generated machine code. Toward that end, we can compile the Racket code to C code through a cify compiler, which is how the expander is compiled for the current Racket implementation for platforms that are not supported by Racket’s JIT. Below is a summary of machine-code sizes for the various compiled forms of the expander.

The current Racket implementation generates much more code from the same implementation, in part because it inlines functions aggressively and relies on the fact that only called code is normally translated to machine code; the “R/jit!/no” bar shows the code size when inlining is disabled. In any case, while the machine-code sizes vary quite a bit in this test, they’re all on the same general scale.

In summary, because Racket is an extensible language, the question of compile times is more complicated than for a conventional programming language. At the core-compiler level, current Racket manages to be very fast as a compiler by not trying hard. Racket CS, which gets its compile times directly from Chez Scheme, spends more time compiling, but it still has respectable compile times.

About the measurements: The numbers in compile-time plots come from running the shown command (but with racketcs instead of racket for the “R/CS” lines) with the PLT_EXPANDER_TIMES and PLT_LINKLET_TIMES environment variables set. The overall time is as reported by time for user plus system time, and the divisions are extracted from the logging that is enabled by the environment variables.

For measuring compile times on the expander itself, the Chez Scheme measurement is based on the build step that generates "expander.so", the current-Racket measurement is based on the build step that generates "cstartup.inc", and the C measurement is based on subtracting the time to rebuild Racket version 6.12 versus version 7.2.0.3 when the ".o" files in "build/racket/gc2" are deleted.

For measuring machine-code size, the expander’s code size for Chez Scheme was computed by comparing the output of object-counts after loading all expander prerequisites to the result after the expander; to reduce the code that is just from the library wrapper, the expander was compiled as a program instead of as a library. The code size for Racket was determined by setting PLT_EAGER_JIT and PLT_LINKLET_TIMES and running racket -d -n, which causes the expander implementation to be JITted and total bytes of code generated by the JIT to be reported. The “R/no-inline” variant was the same, but compiling the expander to bytecode with compile-context-preservation-enabled set to #t, which disables inlining. The “R/cify” code size was computed by taking the difference in sizes of the Racket shared library for a normal build and one with --enable-cify, after stripping the binaries with strip -S, then further subtracting the size of the expander’s bytecode as it is embedded in the normal build’s shared library. The “C” code size was similarly computed by subtracting the size of the Racket shared library for version 7.2.0.3 from the size for the 6.12 release, stripped and with the expander bytecode size subtracted.

Build Time

Since Racket programs rely heavily on metaprogramming facilities, either directly or just by virtue of being a Racket program, the time required to build a Racket program depends on a combination of compile time, run time, and load time. Few Racket programmers may care exactly how long it takes to build the Racket distribution itself, but distribution-build performance is probably indicative of how end-to-end performance will feel to a programmer using Racket.

The following plots are all on the same scale, and they show memory use plotted against time for building the Racket distribution from source. The first two plots are essentially the same as in the January 2018 report. While the graph stretches out horizontally for Racket CS, showing a build that takes about three times as long, it has very much the same shape for memory use.

Racket CS

 

 

current Racket

 

One might assume that the difference in compile time explains the slower Racket CS build. However, this assumption does not hold up if we completely isolate the step of compiling fully expanded modules. To set up that comparison, the following plots show build activity when using current Racket and making “compile” just mean “expand.” It happens to take about the same time as a Racket CS build, but with more of the time in the documentation phases:

current Racket -M

 

Although current Racket compiles from expanded source relatively quickly, a build requires loading the same modules over and over for compiling different sets of libraries and running different documentation examples. The documentation running and rendering phases, as shown in the blue and green regions, are especially slow and use an especially large amount of memory, because documentation often uses sandboxes that load libraries to run and render examples. (The big jump at the same point in the blue and green region merits further investigation. It might be a sandbox bug or a leaky unsafe library.)

Given the result of the expand-only build as an input, we can switch in-place to either Racket CS or normal-mode current Racket and compile each fully expanded module to machine code:

Racket CS finish

 

 

current Racket finish

 

Each module is compiled from expanded form just once, and that compiled form can be used as needed (for cross-module optimization) to compile other modules. Also, documentation doesn’t get re-run and re-rendered in this finishing build, because the build process can tell that the sources did not change. Overall, compilation finishes in under 20 minutes for Racket CS, which is a reasonable amount of time for 1.2 million lines of source Racket code.

These build-finishing plots illustrate how the Racket distribution server generates bundles for multiple platforms and variants in hours instead of days. The build server first creates expanded-module builds of the packages and main collections, and it serves those to machine-specific finishing builds.

About the measurements: These plots were generated using the "plt-build-plot" package, which drives a build from source and plots the results. The -M build was created by setting the PLT_COMPILE_ANY environment variable, and then the finishing builds were measured by another run on the result but using the --skip-clean flag for "plt-build-plot".

Implementation Outlook

Based on the data that we’ve collected so far, I see three directions toward improving end-to-end performance for Racket CS:

  • Improvements to the new implementation of Racket’s I/O API.

  • Better support in Chez Scheme to trade performance for faster compilation, combined at the Racket CS level with a bytecode-and-JIT setup that supports lazy decoding of bytecode.

    The January 2018 report mentions an experimental JIT mode for Racket CS, and that alternative remains in place. At the moment, it’s not a good alternative to Racket CS’s default mode, but it still may be a step in the right direction, especially considering that it allows JIT-style compilation and ahead-of-time compilation to coexist.

  • Algorithmic improvements to the way macros and modules work.

    That the full expansion stack takes 10 times as long as core compilation for Racket libraries suggests that there is room for algorithmic improvements that would help both the current Racket implementation and Racket CS.

There are bound to be additional performance factors that we haven’t yet isolated. Whether the important factors turn out to be the ones that we know or others, working in the new implementation of Racket will make it easier to explore the solutions.


26 Oct 2018

Racket v7.1

posted by Vincent St-Amour

Racket version 7.1 is now available from http://racket-lang.org/

  • Although it is still not part of this release, the development of Racket on Chez Scheme continues. We still hope and expect that Racket-on-Chez will be ready for production use later in the v7.x series, perhaps mid-2019.

  • Trackpad scrolling works more reliably in some Windows and Linux/Unix environments.

  • New users of DrRacket will open files into new tabs (by default).

  • The teaching languages support the unicode character for lambda. The teaching unit test framework no longer stops testing when a tested expression signals an error.

  • A refinement to error reporting for compile-time code helps clarify when a syntax error is likely due to an earlier unbound identifier (because the unbound-identifier error otherwise must be delayed, in case a definition appears later).

  • A ++lang <lang> flag for raco exe simplifies the creation of executables that dynamically load #lang <lang> modules at run time.

  • Typed Racket adds types for mutable and immutable vectors: (Mutable-Vectorof T), (Immutable-Vectorof T), (Immutable-Vector T), and (Mutable-Vector T). The new types are subtypes of the existing Vectorof and Vector types. The return types of a few standard vector functions use the new, more specific, types. When an immutable vector flows from untyped code to typed code, Typed Racket may be able to check the vector with a flat contract.

  • The hashing functions sha1-bytes, sha224-bytes, and sha256-bytes are added to racket/base.

  • curry from racket/function supports currying functions with keyword arguments, and procedure-arity and procedure-keywords return the correct result when applied to curried functions.

  • Slideshow supports widescreen mode (finally!). Implement widescreen slides using slideshow/widescreen or provide the --widescreen command-line flag to Slideshow. Combine --widescreen with --save-aspect to make widescreen mode the default in your installation.

  • Racket supports FreeBSD/aarch64.

  • Various improvements and additions were made to the DeinProgramm teaching languages and their documentation.

The following people contributed to this release: Akihide Nano, Alex Harsányi, Alex Knauth, Alexander McLin, Alexis King, Andrew Kent, Ben Greenman, Bruno Cuconato, Chongkai Zhu, Claes Wallin, David Benoit, Gary F. Baumgartner, Gustavo Massaccesi, Jay McCarthy, Jens Axel Søgaard, Jérôme Martin, John Clements, Jordan Johnson, Kimball Germane, Leif Andersen, Matthew Butterick, Matthew Flatt, Matthias Felleisen, Mike Sperber, Milo Turner, myfreeweb, Oling Cat, Paulo Matos, Philip McGrath, Robby Findler, Roman Klochkov, Ryan Culpepper, Sam Caldwell, Sam Tobin-Hochstadt, Shu-Hung You, Stephen Chang, Tong-Kiat Tan, Vincent St-Amour, Winston Weinert, and yjqww6.

Feedback Welcome


27 Jul 2018

Racket v7.0

posted by Vincent St-Amour

Racket version 7.0 is now available from http://racket-lang.org/

Racket version 7.0 includes substantial internal changes toward the long-term goal of replacing Racket’s current runtime system and supporting multiple runtime systems. We do not expect Racket users to see a big difference between Racket v6.12 and Racket v7.0, but since the internals differ significantly, a major-version bump helps track the change.

Version 7.0 replaces about 1/8 of the core v6.12 implementation with a new macro expander that bootstraps itself. The expander turns out to be about 40% of the new code needed to replace Racket’s core with Chez Scheme. Most of the other 60% is also implemented, but it is not included in this release; we hope and expect that Racket-on-Chez will be ready for production use later in the v7.x series.

  • The syntax (#') form supports new template subforms: ~@ for splicing and ~? for choosing between subtemplates based on whether pattern variables have “absent” value (from an ~optional pattern in syntax-parse, for example). The syntax/parse/experimental/template library, where these features originated, re-exports the new forms under old names for compatibility.

  • On Windows, an --embed-dlls flag for raco exe creates a truly standalone, single-file ".exe" that embeds Racket’s DLLs.

  • DrRacket’s “Create Executable” option for the teaching languages (Beginner Student, etc.) uses --embed-dlls to create single-file, standalone ".exe"s on Windows.

  • Typed Racket’s support for prefab structs is significantly improved. This supports using prefab structs more polymorphically, and fixes significant bugs in the current implementation. Programs which currently use predicates for prefab structs on unknown data may need to be revised, since previous versions of Typed Racket allowed potentially buggy programs to type check. See Typed Racket RFC 1 and this blog post for more details on this change and on how to fix programs affected by it.

  • Typed Racket supports #:rest-star in the ->* type constructor, which allows function types to specify rest arguments with more complex patterns of types, such as the hash function.

  • Interactive overlays can be added to plots produced by plot-snip. This allows constructing interactive plots or displaying additional information when the mouse hovers over the plot area. Examples of how to use this feature can be found here

  • racket/plot provides procedures for displaying candlestick charts for use in financial time series analysis.

  • Added contract-equivalent?, a way to check if two contracts are mutually stronger than each other without the exponential slowdown that two calls to contract-stronger? brings.

  • Lazy Racket supports functions with keyword arguments.

The following people contributed to this release:

Adam Davis Lee, Alex Harsányi, Alex Knauth, Alexander McLin, Alexander Shopov, Alexis King, Andrew M. Kent, Asumu Takikawa, Ben Greenman, Caner Derici, Daniel Feltey, David Benoit, David Kempe, Don March, Eric Dobson, evdubs, Foo Chuan Wei, Georges Dupéron, Gustavo Massaccesi, Hashim Muqtadir, Jakub Jirutka, James Bornholt, Jasper Pilgrim, Jay McCarthy, Jens Axel Søgaard, John Clements, Juan Francisco Cantero Hurtado, Kashav Madan, Kieron Hardy, Leandro Facchinetti, Leif Andersen, Luke Lau, Matthew Butterick, Matthew Flatt, Matthias Felleisen, Michael Ballantyne, Michael Burge, Michael Myers, Mike Sperber, Milo Turner, NoCheroot, Oling Cat, Paulo Matos, Philip McGrath, Philippe Meunier, Robby Findler, Ryan Culpepper, Sam Tobin-Hochstadt, Sarah Spall, Shu-Hung You, Sorawee Porncharoenwase, Spencer Florence, Stephen Chang, Tony Garnock-Jones, Tucker DiNapoli, UM4NO, Vincent St-Amour, and William J. Bowman.

Feedback Welcome

