23 Jun 2009

Serializable Closures in PLT Scheme

posted by Jay McCarthy

PLT Scheme supports an extensible serialization system for structures. A structure is serializable if it has a prop:serializable property. There are many properties in PLT Scheme for other extensions, such as applicable structures and custom equality predicates.
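As a minimal sketch of the ordinary structure case (the point structure is a made-up example, not something from the Web framework), define-serializable-struct from scheme/serialize declares a structure type with the prop:serializable property attached, so its instances can make a round trip through serialize and deserialize:

```scheme
#lang scheme
(require scheme/serialize)

;; Hypothetical example structure; define-serializable-struct attaches
;; prop:serializable and provides the matching deserialization info.
(define-serializable-struct point (x y))

(define p (make-point 1 2))

;; Round trip: the serialized form is plain S-expression data.
(define p2 (deserialize (serialize p)))

(printf "~S ~S~n" (point-x p2) (point-y p2))
```

The round trip relies on the defining module being loadable again by path, so this is meant to be saved and run as a module file.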

The PLT Web application development framework uses these features to provide serializable continuations through a number of source transformations and a serializable closure structure.

Warning: The remainder of this post refers to features available only in the latest SVN revision of PLT Scheme.

I’ve recently made these closures more accessible to non-Web programs through web-server/lang/serial-lambda. Here’s a demo:

#lang scheme
(require web-server/lang/serial-lambda
         scheme/serialize)

(define f
  (let ([z 5])
    (serial-lambda
     (x y)
     (+ x y z))))

(define (test-it)
  (printf "~S~n" (f 1 2))
  (let ([fs (serialize f)])
    (printf "~S~n" fs)
    (let ([df (deserialize fs)])
      (printf "~S~n" df)
      (printf "~S~n" (df 1 2)))))

> (test-it)
8
((2) 1 ((#"/Users/jay/Dev/svn/plt/collects/web-server/exp/test-serial.ss" . "lifted.6")) 0 () () (0 5))
#(struct:7a410aca70b31e88b4c2f0fe77fa7ffe:0 #)
8

Now, let’s see how it is implemented. web-server/lang/serial-lambda is a thin wrapper around web-server/lang/closure, which has two syntax transformer functions: define-closure!, which defines the closure structure, and make-closure, which instantiates the closure. (The two tasks are separated so that it is easy to provide a user-facing top-level definition syntax for named closures with different free identifiers, rather than only anonymous lambdas with fixed free identifiers.)

make-closure does the following:

Expands the procedure syntax using local-expand, so it can use free-vars to compute the free identifiers.
Uses define-closure! to define the structure and get the name for the constructor.
Instantiates the closure with the current values of the free identifiers.
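The first step can be sketched in isolation. The macro below is a hypothetical helper (show-free-vars is not part of web-server/lang/closure); it only illustrates how local-expand and free-vars from syntax/free-vars combine to find the free identifiers of a procedure expression:

```scheme
#lang scheme
(require (for-syntax scheme/base syntax/free-vars))

;; Hypothetical macro: expands to a list of the names of the locally
;; bound free identifiers of its argument expression.
(define-syntax (show-free-vars stx)
  (syntax-case stx ()
    [(_ e)
     (let* (;; free-vars only understands fully expanded syntax
            [expanded (local-expand #'e 'expression '())]
            [fvs (free-vars expanded)])
       (with-syntax ([(fv ...) fvs])
         #'(list 'fv ...)))]))

(define (f)
  (let ([z 5] [w 6])
    ;; z is the only local identifier the lambda refers to;
    ;; + is module-bound, so free-vars does not report it.
    (show-free-vars (lambda (x y) (+ x y z)))))

(f)
```

A real make-closure would go on to pass both the expanded procedure and these identifiers to define-closure!; this sketch stops at reporting the names.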

The more interesting work is done by define-closure!. At a high level, it needs to do the following:

Create a deserialization function.
Create a serialization function that references the deserializer.
Define the closure structure type that references the serializer.
Provide the deserializer from the current module so that arbitrary code can deserialize instances of this closure type.

These tasks are complicated in a few ways:

The deserializer needs the closure structure type definition to create instances, and the serializer needs it to access the fields of those instances.
The serializer needs the syntactic identifier of the deserializer so that scheme/serialize can dynamic-require it during deserialization.
The deserializer must be defined at the top-level, so it may be provided.
All this may occur in a syntactic expression context.

Thankfully, the PLT Scheme macro system is powerful enough to support all this.

syntax-local-lift-expression allows a syntax transformer to lift an expression to the top-level of a module and returns the identifier it is bound to.
syntax-local-lift-values-expression (added in 4.2.0.3) provides the same for expressions that return multiple values, such as make-struct-type, which is used to define structures.
syntax-local-lift-provide (added in 4.2.0.4) allows a syntax transformer to lift a provide to the top-level.
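To give a feel for lifting, here is a small sketch that uses only syntax-local-lift-expression. The macro name lifted-counter and the counter itself are made up for illustration; the point is that an expression written inside a function body ends up evaluated once at the module level, and the transformer gets back an identifier it can splice into its output:

```scheme
#lang scheme

;; Hypothetical macro: each use site lifts (box 0) to the module level
;; and expands to code that increments that lifted box.
(define-syntax (lifted-counter stx)
  (syntax-case stx ()
    [(_)
     (with-syntax ([b (syntax-local-lift-expression #'(box 0))])
       #'(let ([n (add1 (unbox b))])
           (set-box! b n)
           n))]))

;; There is a single use site, so a single box is lifted; it is
;; shared by every call to next rather than allocated per call.
(define (next) (lifted-counter))

(list (next) (next) (next))
```

define-closure! has the same need: it is invoked in an expression context but must create module-level definitions.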

The only complicated piece is allowing the deserializer and serializer to refer to the closure structure constructor and accessors. This is easily accomplished by first lifting boxes that will hold these values and then initializing them when the structure type is defined. This is safe because all accesses to the boxes are under lambdas that are guaranteed not to be run before the structure type is defined.
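The box trick can be sketched without any macros. All names here (ctor-box, accessor-box, rebuild, extract, clo) are hypothetical stand-ins, and the two functions only play the roles of the deserializer and serializer rather than implementing the scheme/serialize protocol:

```scheme
#lang scheme

;; Boxes created before the structure type exists.
(define ctor-box (box #f))
(define accessor-box (box #f))

;; Deserializer-like function: reads the box only when called.
(define (rebuild free-values)
  ((unbox ctor-box) free-values))

;; Serializer-like function: also reads its box only when called.
(define (extract c)
  ((unbox accessor-box) c 0))

;; The structure type is defined later, as in the lifted code.
(define-values (struct:clo make-clo clo? clo-ref clo-set!)
  (make-struct-type 'clo #f 1 0))

;; Initialize the boxes once the constructor and accessor exist.
(set-box! ctor-box make-clo)
(set-box! accessor-box clo-ref)

(extract (rebuild (vector 5)))
```

Calling rebuild or extract before the two set-box! lines would fail, which is why it matters that every access sits under a lambda.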

An aside on the closure representation. The closure is represented as a structure with one field: the environment. The environment is represented as a thunk that returns n values, one for each of the free identifiers. This ensures that references that were under lambdas in the original syntax remain under lambdas in the closure construction, so serializable closures work correctly inside letrec. The serializer applies this thunk and stores the free values in a vector. The closure also uses the prop:procedure structure property to provide an application function that invokes the environment thunk, binds the names of the free identifiers, and then applies the original procedure syntax to the arguments.
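Written out by hand for the demo procedure, the representation might look roughly like the following. This is a simplified sketch, not the code that define-closure! generates: the structure name, the fixed argument list, and the free-values helper are all assumptions made for illustration, and the serialization property is left out.

```scheme
#lang scheme

;; Stand-in for the closure type of (lambda (x y) (+ x y z)),
;; whose only free identifier is z.
(define-struct closure (env)
  #:property prop:procedure
  (lambda (self x y)
    ;; Invoke the environment thunk and bind the free identifiers,
    (let-values ([(z) ((closure-env self))])
      ;; then run the original body on the arguments.
      (+ x y z))))

(define f
  (let ([z 5])
    ;; The environment is a thunk, so z is referenced only under a lambda.
    (make-closure (lambda () (values z)))))

;; What a serializer would do with the environment: apply the thunk
;; and collect the free values in a vector.
(define (free-values c)
  (call-with-values (closure-env c) vector))

(f 1 2)          ; => 8
(free-values f)  ; => #(5)
```

Because the environment is a thunk rather than a list of already-computed values, constructing the closure never forces a letrec-bound identifier before it has been initialized.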

An aside on the serializer. The deserializer is bound to a lifted identifier, which is represented in PLT Scheme as an unreadable symbol. Version 4.2.0.5 added support for (de)serializing these.

Awesome. Serialization is one of the things I’m looking for right now, so I’m glad my favorite distro is making it easier.

— Zachary, 3 January 2010

This is very, very, shiny. I might use this in my master’s thesis.

— Alex, 12 February 2010