07 Nov 2012

Racket v5.3.1

posted by Eli Barzilay

Racket version 5.3.1 is now available from http://racket-lang.org/ Racket::

The case form dispatches on characters, fixnums, symbols, and keywords in logarithmic time. (Thanks to Jon Zeppieri.)
The new racket/format library provides new and improved string-formatting functions.
Logging tools include improved filtering support based on the name of a logger. A new define-logger form simplifies the use of named loggers. Forms such as log-debug now support string formatting.
The for forms now support #:break and #:final clauses.
The new PLTCOMPILEDROOTS environment variable configures the search path for compiled bytecode.

DrRacket::

Check Syntax now summarizes the documentation (i.e., the blue boxes) for the identifier at the insertion point in the top-right corner of the definitions window.
Check Syntax now runs continuously for programs that declare their language within the source. This mode has been available for several of the past releases, but now enabled by default.
DrRacket can spell-check string constants (enable this in the Edit menu).

Typed Racket::

Typed Racket interprets the Any type as a different contract. This may signal dynamic errors in some existing mixed typed/untyped programs. The normal fix is to replace a use of Any with a more specific types.
NaN is included in all of Typed Racket’s floating-point types, which makes precise floating-point types easier to use.
Typed Racket supports a cast operation with support for higher-order types.
Typed Racket provides the :query-type/args and :query-type/result utilities to explore types at the REPL.

Miscellaneous::

The compatibility collection provides features from Racket relatives, such as defmacro and mutable lists. These features are provided to ease porting code to Racket. Avoid them in modern Racket code.
Screenshots of the widgets provided by the Racket GUI library are included in the documentation. (Thanks to Diogo F. S. Ramos.)
FrTime was ported to racket #lang. (Thanks to Patrick Mahoney.)

Deprecation:: The following has been deprecated and will be removed in the January 2013 release:

the planet command-line tool; use raco planet instead.

The following has been deprecated and will be removed in the August 2013 release:

the mzlib/class100 library; use racket/class instead.

01 Nov 2012

Generics

posted by Asumu Takikawa

Recently at RacketCon, I gave a talk about the new Generics library that ships with Racket 5.3. In this blog post, I’ll offer a slightly expanded commentary about the new library. For the ten-minute video explanation, see the Youtube video. The accompanying slides are available here.

Introduction

Probably the first question that comes to your mind is: what are generics and what do we mean by generic programming? Let me illustrate with some example code:

> (vector-ref#(123)2)
3
> (list-ref'(123)2)
3

Both of the lines above operate on sequence-like datatypes, but for each of the datatypes we use different functions: vector-ref and list-ref. This is also the case with other datatypes like dictionary-like datatypes: hash-ref, assoc, etc. Or all the different kinds of equality: string=?, boolean=?, and =. These specialized operations may seem redundant. Ideally, we’d have generic functions that don’t care about the specific datatypes that we operate over.

Thankfully, Racket does provide these. For dictionaries, we have functions like dict-ref and dict-set that operate over any kind of dictionary-like type. For sequences, sequence-ref, sequence-append, and so on. These generic interfaces are all built-in to the standard library. You might wonder, however, if you can define your own generic functions.

As of version 5.3, you just need to (require racket/generic) to get all the tools you need to define your own generic interface. In the rest of the article I’ll show you how to define your own generic interface and how to use it. If you’ve seen Rust’s traits or Clojure’s protocols, our design will feel familiar.

Examples

The running example will be the implementation of a simple queue interface. Our queue will contain five operations: queue-enqueue, queue-dequeue, queue-head, queue-empty?, and queue-length.

The first thing to do is to require the library:

> (requireracket/generic)

Then we use define-generics to define a generic interface:

> (define-generics queue
    [queue-enqueue queue elem]
    [queue-dequeue queue]
    [queue-head queue]
    [queue-empty? queue]
    [queue-length queue])

The method headers above define the methods that concrete implementations of the generic interface need to provide. Each header needs to contain at least one argument that matches the name of the interface; this argument will be used for dispatch. The form defines several identifiers: queue?, gen:queue, queue/c, and each of the generic functions. The first is a predicate that checks whether a given value implements the interface. The identifier prefixed with gen: is a binding that’s used to supply methods for the interface. The queue/c specifies a contract combinator for the interface, which I’ll describe later in the article.

We now have the generic functions, but they’re not very useful without concrete implementations. To implement a generic interface, you first need to define a structure type. We’ll define a simple functional queue(Okasaki 1998, p.42) that uses two lists:

> (struct simple-queue (front back)
    #:methods gen:queue
    [; helper function to balance lists
     (define (check-front queue)
       (match queue
         [(simple-queue '() back)
          (simple-queue (reverse back) '())]
         [_ queue]))
     ; enqueue an element
     (define (queue-enqueue queue elem)
       (match queue
         [(simple-queue front back)
          (check-front (simple-queue front (cons elem back)))]))
     ; dequeue an element
     (define (queue-dequeue queue)
       (match queue
         [(simple-queue (cons x xs) back)
          (check-front (simple-queue xs back))]))
     ; get the head of the queue
     (define (queue-head queue)
       (match queue
         [(simple-queue (cons x xs) back) x]))
     ; check if the queue is empty
     (define (queue-empty? queue)
       (empty? (simple-queue-front queue)))
     ; get the queue's length
     (define (queue-length queue)
       (+ (length (simple-queue-front queue))
          (length (simple-queue-back queue))))])
> (define empty-queue
    (simple-queue '() '()))

Using the #:methods keyword and the gen:queue binding, we can specify the methods that a simple-queue implements. Note that a #:methods block may also contain helper functions (like check-front) and definitions that are used to define the methods. Each method has the same method header as the corresponding headers in the interface definition.

We can check that our new queue actually works with the generic functions:

> (queue-head(queue-enqueueempty-queue5))
5
> (queue-empty?empty-queue)
#t
> (queue-length(queue-enqueue(queue-enqueueempty-queue7)5))
2

It works! For any structure type, we can define methods in the same way. For example, we can define an efficient persistent queue(Okasaki 1998, p.64) that implements the same methods. This time, the implementation will use lazy evaluation with streams:

> (struct persistent-queue (front-len front back-len back)
    #:methods gen:queue
    [; helper function to balance lists
     (define (check queue)
       (match queue
         [(persistent-queue front-len front back-len back)
          (if (<= back-len front-len)
              queue
              (persistent-queue
               (+ front-len back-len)
               (stream-append front (stream-reverse back))
               0 stream-null))]))
     ; enqueue an element
     (define (queue-enqueue queue elem)
       (match queue
         [(persistent-queue front-len front back-len back)
          (check (persistent-queue
                  front-len front
                  (+ 1 back-len) (stream-cons elem back)))]))
     ; dequeue an element
     (define (queue-dequeue queue)
       (match queue
         [(persistent-queue front-len front back-len back)
          (check (persistent-queue
                  (- front-len 1) (stream-rest front)
                  back-len back))]))
     ; get the head of the queue
     (define (queue-head queue)
       (match queue
         [(persistent-queue front-len front back-len back)
          (stream-first front)]))
     ; check if the queue is empty
     (define (queue-empty? queue)
       (= 0 (persistent-queue-front-len queue)))
     ; get the queue's length
     (define (queue-length queue)
       (+ (persistent-queue-front-len queue)
          (persistent-queue-back-len queue)))])
> (define empty-persistent-queue
    (persistent-queue 0 stream-null 0 stream-null))

Our operations from before work as expected:

> (queue-head(queue-enqueueempty-persistent-queue5))
5
> (queue-empty?empty-persistent-queue)
#t
> (queue-length(queue-enqueue(queue-enqueueempty-persistent-queue7)5))
2

Contracts

Earlier, I mentioned that the generic interface also comes with a contract form that’s automatically defined. You can use these to attach dynamic checks to your implementations.

For example, we can write a contract that restricts our queues to only accept integers as data elements:

> (define int-queue/c
    (recursive-contract
     (queue/c [queue-enqueue (-> int-queue/c integer? int-queue/c)]
              [queue-dequeue (-> int-queue/c int-queue/c)]
              [queue-head    (-> int-queue/c integer?)]
              [queue-empty?  (-> int-queue/c boolean?)]
              [queue-length  (-> int-queue/c integer?)])))

For the queue interface, the automatically defined queue/c combinator allows us to specify contracts on each of the methods in the interface. We also use a recursive contract here just so that we can reference the int-queue/c contract within itself.

We can apply the contract to a particular queue:

> (define/contract checked-queue
    int-queue/c
    empty-queue)
> (queue-enqueue checked-queue 42)
#<simple-queue>
> (queue-enqueue checked-queue "not an integer")
checked-queue: contract violation
 expected: integer?
 given: "not an integer"
 in: the 2nd argument of
     the queue-enqueue method of
      (recursive-contract
       (queue/c
        (queue-enqueue
         (-> int-queue/c integer? int-queue/c))
        (queue-dequeue
         (-> int-queue/c int-queue/c))
        (queue-head (-> int-queue/c integer?))
        (queue-empty? (-> int-queue/c boolean?))
        (queue-length (-> int-queue/c integer?))))
 contract from: (definition checked-queue)
 blaming: top-level
 at: eval:15.0

The second use of queue-enqueue causes a contract error as expected, since we can’t add a string to an integer queue. You can also provide a constructor for your integer queue that’s contracted to produce int-queue/cs. Any queues created with that constructor will be checked for integers.

Also, you might have noticed that the queues we wrote above don’t protect against dequeueing or taking the head of empty queues. To prevent this, we can write contracts that ensure these operations raise contract errors on empty queues. Since we want to enforce this for all queues instead of just some of them, we apply contracts to the generic functions:

> (define non-empty-queue/c
    (flat-named-contract
     'non-empty-queue
     (λ (q) (and (queue? q) (not (queue-empty? q))))))
> (define/contract (checked-dequeue queue)
    (-> non-empty-queue/c queue?)
    (queue-dequeue queue))
> (define/contract (checked-head queue)
    (-> non-empty-queue/c any/c)
    (queue-head queue))
> (checked-head empty-persistent-queue)
checked-head: contract violation
 expected: non-empty-queue
 given: #<persistent-queue>
 in: the 1st argument of
      (-> non-empty-queue any/c)
 contract from: (function checked-head)
 blaming: top-level
 at: eval:20.0

The checked-head function raises a contract error as expected instead of an exception from a stream function. In a real implementation, you would just export the original generic functions with contracts attached using contract-out instead of defining checked versions like we did here.

Summary

Racket 5.3 has made the process of defining and using generic interfaces much easier. The new library is still under active development and we plan to experiment with additional features and performance improvements. The full code from this article can be found in the following gist: https://gist.github.com/3995200

Bibliography: Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, 1998.

24 Oct 2012

The 3n+1 problem

posted by Danny Yoo

1Introduction: I’m starting to go through [Programming Challenges: The Programming Contest Training Manual](http://www.amazon.com/exec/obidos/ASIN/0387001638/thealgorithmrepo), by Steven S. Skiena and Miguel Revilla. I thought it would be fun to show how to approach the problems using the [Racket](http://racket-lang.org/) programming language. Rather than use a small, toy educational subset of the language, I’ll take off the kid gloves, and use whatever’s available in [#lang racket](http://docs.racket-lang.org/guide/index.html). 2The problem:

The 3n+1 problem is as follows: consider a positive number n. The cycle length of n is the number of times we repeat the following, until we reach n=1:

If n is odd, then n ⇐ 3n+1
If n is even, then n ⇐ n/2

For example, given n=22, we see the following sequence: 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1. The cycle length of 22 is, therefore, 16, since it took 16 repetitions to get to 1.

Given a definition of cycle length of a number, here’s the rest of the problem: given any two numbers i and j, compute the maximum cycle length over all numbers between i and j, inclusive.

2.1The plan: Before we do any real coding, let’s figure out a plan of attack and how to test that plan. * We need a way of computing cycle-length. * We need to run cycle-length over a range of values and pick out the biggest result. It sounds like we may want a function called cycle-length that will compute how long it takes for us to get n to 1. If we have cycle-length as a helper function, then it becomes a fairly direct loop through the range between i and j to pick out which one produces the largest cycle length. Let’s first write up a stub function that computes some nonsense. We’ll correct it in a moment, of course! ;cycle-length: positive-integer -> positive-integer ;Computes the cycle length of n, according to ;the 3n+1 rules. > (define(cycle-lengthn) 42) This is certainly not right, but it’s a start. And it’s something we can test! 3Test cases:

We want

(cycle-length1)==>1

Let’s express this more formally with the rackunit unit testing library in Racket.

;Load up rackunit: > (requirerackunit)

;Let’s express that test: > (check-equal?(cycle-length1)1)

FAILURE name: check-equal? location: (eval 4 0 4 1) expression: (check-equal? (cycle-length 1) 1) actual: 42 expected: 1

Check failure

;A few more tests, according to the problem statement above: > (check-equal?(cycle-length2)2)

FAILURE name: check-equal? location: (eval 5 0 5 1) expression: (check-equal? (cycle-length 2) 2) actual: 42 expected: 2

Check failure

(check-equal?(cycle-length4)3)

FAILURE name: check-equal? location: (eval 6 0 6 1) expression: (check-equal? (cycle-length 4) 3) actual: 42 expected: 3

Check failure

(check-equal?(cycle-length5)6)

FAILURE name: check-equal? location: (eval 7 0 7 1) expression: (check-equal? (cycle-length 5) 6) actual: 42 expected: 6

Check failure

(check-equal?(cycle-length22)16)

FAILURE name: check-equal? location: (eval 8 0 8 1) expression: (check-equal? (cycle-length 22) 16) actual: 42 expected: 16

Check failure

All of our test cases fail. Hurrah!

4A solution: Ok, now that we coded up the tests, let’s write a solution. We can write out a definition for cycle-length almost straight out of the problem statement: > (define(cycle-lengthn) (cond [(=n1) 1] [(odd?n) (add1(cycle-length(add1(*3n))))] [(even?n) (add1(cycle-length(/n2)))])) ;Let us try it out on a few values: > (cycle-length1) 1 > (cycle-length2) 2 > (cycle-length22) 16 If we run this through our test suite, we should be fairly confident that cycle-length is probably doing the right thing. (... modulo crazy inputs into the function such as 0. If we want to guard against such inputs, we can use the features in racket/contract.) 5Optimizing cycle-length: How fast is the performance for cycle-length? Let’s try timing it for a few values, using the time utility. We’ll run cycle-length for a range of numbers, and see how long it takes.

(time(for([i(in-range1100000)]) (cycle-lengthi)))

cpu time: 890 real time: 889 gc time: 0

5.1Introducing an accumulator: There are a few things we might do to improve the performance of this. Having the (add1 ...) in the definition, waiting until the recursion finishes up, seems ok, but I’m curious to see whether writing the definition with an explicit accumulator will help us. > (define(cycle-lengthn) (cycle-length/accn1)) ;Helper function: > (define(cycle-length/accnacc) (cond [(=n1) acc] [(odd?n) (cycle-length/acc(add1(*3n))(add1acc))] [(even?n) (cycle-length/acc(/n2)(add1acc))])) With this reformulation, how does this do now? > (time(for([i(in-range1100000)]) (cycle-lengthi))) cpu time: 790 real time: 790 gc time: 0 This does help. Although we do get an improvement, let’s drop this version for now and go back to our previous definition since it’s simpler—and because the next potential optimization will work better on it! 5.2Adding memoization: Another thing that comes to mind is this: our first good version of cycle-length works recursively. More to the point: repeated use of cycle-length can reuse prior results that we computed earlier. Maybe memoization will help. Let’s try it out: we’ll keep a small table of results, and consult that to see if we’ve already encountered the solution before.

;We’ll maintain a table of known results. > (definetable(make-hash))

(define(cycle-lengthn) (cond ;Consult the table: [(hash-has-key?tablen) (hash-reftablen)] [else ;If we can’t find it, compute it… (defineanswer (cond [(=n1) 1] [(odd?n) (add1(cycle-length(add1(*3n))))] [(even?n) (add1(cycle-length(/n2)))])) ;… and then put it into the table. (hash-set!tablenanswer) ;Don’t forget to return the value back! answer]))

Does the overhead of setting up this table pay for itself? Let’s see:

(time(for([i(in-range1100000)]) (cycle-lengthi)))

cpu time: 217 real time: 217 gc time: 44

Hey, not bad at all! That’s significantly better.

We should make sure, of course, that all our test cases are running on this ok.

(check-equal?(cycle-length1)1)

(check-equal?(cycle-length2)2)

(check-equal?(cycle-length4)3)

(check-equal?(cycle-length5)6)

(check-equal?(cycle-length22)16)

All’s quiet on the cycle-length front. The tests are all passing.

5.3Advanced: abstracting memoization to a helper macro: It turns out that the kind of memoization we’ve done here can be lifted out, so that we can easily perform it at will. That is, what we’re doing is something like the following: Given a definition that we’d like to memoize: * create a table for exclusive use by the definition, and * slightly tweak the definition’s body so it consults the table before going through computation. In terms of Racket, we can say that like this: ;A little helper to centralize the memoization logic ;into a single rewrite rule: > (define-syntax-rule(define/memo(nameid)body...) (begin (definetable(make-hash)) (define(nameid) (cond [(hash-has-key?tableid) (hash-reftableid)] [else (defineanswer(beginbody...)) (hash-set!tableidanswer) answer])))) This defines a small rewrite rule that expresses the idea of memoizing simple, 1-argument function definitions. Once we have this define/memo, we can rewrite cycle-length to use it: > (define/memo(cycle-lengthn) (cond [(=n1) 1] [(odd?n) (add1(cycle-length(add1(*3n))))] [(even?n) (add1(cycle-length(/n2)))])) which is nice because it’s easy to read. 6Cycling back to a loop: Now that we have a fairly robust cycle-length function, we can do the rest of the problem. Given a range of numbers, we want to go through them, compute the cycle lengths, and pick out the biggest one.

We can try to write this directly with a for/list to create a list of all the cycle-lengths, and apply the max across that list. Let’s write this in code:

(define(max-cycle-length-rangeij) (applymax (for/list([n(in-rangei(add1j))]) ;(add1 j) for inclusion … (cycle-lengthi))))

Let’s write a few test cases to make sure that this is computing the right thing:

;From the “Sample Output” section of ;http://acm.uva.es/p/v1/100.html > (check-equal?(max-cycle-length-range110)20)

FAILURE name: check-equal? location: (eval 31 0 31 1) expression: (check-equal? (max-cycle-length-range 1 10) 20) actual: 1 expected: 20

Check failure

What?! Oh, whoops, I wasn’t using the n in the loop. Silly me. Let’s fix that.

(define(max-cycle-length-rangeij) (applymax (for/list([n(in-rangei(add1j))]) (cycle-lengthn))))

Thank goodness for test cases.

Ok, let’s try that again.

(check-equal?(max-cycle-length-range110)20)

(check-equal?(max-cycle-length-range100200)125)

(check-equal?(max-cycle-length-range201210)89)

(check-equal?(max-cycle-length-range9001000)174)

All passing? Much better!

6.1Advanced: maximizing a loop: It would be nice if we could directly express taking the maximum across a for loop. We’re performing the maximum computation by first constructing a list of all the cycle lengths, then applying max over the whole list. Can we avoid that auxiliary list construction, and just compute the max as we’re running through the numbers? In fact, there are several variations of for loops in Racket, so maybe one of those variations will work for us. For example, we could use for/fold, which gives us enough expressive power to take the maximum during iteration. > (for/fold([current-max-inf.0]) ([n'(31415926)]) (if(>ncurrent-max)ncurrent-max)) 9 There are other versions of for loops, such as the one for taking sums (for/sum). But as of this writing, there doesn’t seem to be be a for/max form that lets us take the maximum directly. The question arises: how difficult is it to build for/max? It turns out that it’s not too bad, though it requires a little more macrology: we’ll use for/fold/derived to express our own for/max loop in terms of folding: > (define-syntax(for/maxstx) (syntax-casestx() [(_clauses. defs+exprs) (with-syntax([originalstx]) #'(for/fold/derivedoriginal ([current-max-inf.0]) clauses (definemaybe-new-max (let(). defs+exprs)) (if(>maybe-new-maxcurrent-max) maybe-new-max current-max)))])) Essentially, as we’re looping through numbers, we maintain a current-max, and update that max accordingly as we walk across the iteration. The rest of the code in for/max delegates the rest of the gruntwork to for/fold (technically, for/fold/derived). We must test this, of course: ;Edge case: if we take the maximum of no numbers, ;let's see -inf.0. > (check-equal?(for/max([i'()]) i) -inf.0) > (check-equal? (for/max([i'(31415926)]) i) 9) > (check-equal? (for/max[(i(in-range123))] i) 22) > (check-equal? (for/max([n'(3.141592.718281.61803)] [s'(-111)]) (*ns)) 2.71828) ;... and of course... > (check-equal? (for/max[(i(in-range900(add11000)))] (cycle-lengthi)) 174) Looks good. With this, let’s express max-cycle-length-range in terms of for/max now: > (define(max-cycle-length-rangeij) (for/max([n(in-rangei(add1j))]) (cycle-lengthn))) 7Making a module: Now that we have most of the solution worked out, let’s make a module that encapsulates what we’ve done. Let’s lift up the definitions that we used to make the solution nice and pretty, and place them into “helpers.rkt”:

“helpers.rkt”

langracket

(providefor/maxdefine/memo)

(define-syntax(for/maxstx) (syntax-casestx() [(_clauses.defs+exprs) (with-syntax([originalstx]) #’(for/fold/derivedoriginal ([current-max-inf.0]) clauses (definemaybe-new-max (let().defs+exprs)) (if(>maybe-new-maxcurrent-max) maybe-new-max current-max)))]))

(define-syntax-rule(define/memo(nameid)body…) (begin (definetable(make-hash)) (define(nameid) (cond [(hash-has-key?tableid) (hash-reftableid)] [else (defineanswer(beginbody…)) (hash-set!tableidanswer) answer]))))

(module+test (requirerackunit) (check-equal?(for/max([i’()]) i) -inf.0) (check-equal?(for/max([i’(31415926)]) i) 9) (check-equal?(for/max[(i(in-range123))] i) 22)

(check-equal? (for/max([n’(3.141592.718281.61803)] [s’(–111)]) (*ns)) 2.71828))

Who knows? We might reuse “helpers.rkt” sometime.

(You may note that the bottom of “helpers.rkt” contains a test submodule which collects the unit tests that we’ve written. We can run a module’s test suite by using raco test.)

With our “helpers.rkt” in in hand, let’s put our solution in “three-n-plus-one.rkt”:

“three-n-plus-one.rkt”

langracket

(require"helpers.rkt")

(define/memo(cycle-lengthn) (cond [(=n1) 1] [(odd?n) (add1(cycle-length(add1(*3n))))] [(even?n) (add1(cycle-length(/n2)))]))

(define(max-cycle-length-rangeij) (for/max([n(in-rangei(add1j))]) (cycle-lengthn)))

(module+test (requirerackunit)

(check-equal?(cycle-length1)1) (check-equal?(cycle-length2)2) (check-equal?(cycle-length4)3) (check-equal?(cycle-length5)6) (check-equal?(cycle-length22)16)

(check-equal? (max-cycle-length-range110)20) (check-equal? (max-cycle-length-range100200)125) (check-equal? (max-cycle-length-range201210)89) (check-equal? (max-cycle-length-range9001000)174) (check-equal? (for/max[(i(in-range900(add11000)))] (cycle-lengthi)) 174))

8Integrating with I/O and a main: Finally, all this unit testing is fine and dandy, but we don’t actually read input from standard input. Let’s fix that, and modify "three-n-plus-one.rkt" so it can be run as the main entry point. We can read individual lines as strings by iterating across current-input-port with in-lines: (for([line(in-lines(current-input-port))])...) Once we have a line in hand, we can parse out the individual chunks with read. read doesn’t normally read from strings directly, so we first translate each string into a port-like value using open-input-string. Last of all, let’s add the following to the bottom of "three-n-plus-one.rkt": (module+main (for([line(in-lines(current-input-port))]) (defineline-port(open-input-stringline)) (definei(readline-port)) (definej(readline-port)) (when(and(number?i)(number?j)) (printf"~a ~a ~a\n" ij (max-cycle-length-rangeij))))) This defines a main submodule. When we run "three-n-plus-one.rkt" directly from the command line, it will run main: $ cat sample-data.txt 1 10 100 200 201 210 900 1000 $ racket three-n-plus-one.rkt < sample-data.txt 1 10 20 100 200 125 201 210 89 900 1000 174 9The files:

28 Sep 2012

I Write Funny-Lookin’ Racket Code: An Alternate Style for Delimiters and Indentation

posted by Carl Eastlund

A lot of people are quite surprised when they see the Racket code I write. Let’s say I needed a function to render hash table values as expressions that would produce the same value. A “normal” Racketeer might write something like the following.

(define (hash->expr ht)
  (cons 'hash
        (for/fold ([args '()])
                  ([(k v) (in-hash ht)])
          (cons (list 'quote k)
                (cons (list 'quote v)
                      args)))))

There might be a few variances in style, especially depending on whether one has Racket or Emacs set up to indent for/fold specially. Almost no one, however, would come up with the code I write.

(define (hash->expr ht)
  (cons 'hash
    (for/fold
        {[args '()]}
        {[{k v} (in-hash ht)]}
      (cons (list 'quote k)
        (cons (list 'quote v)
          args)))))

The biggest reaction I get is from the presence of {curly braces}, but those are largely incidental as far as I’m concerned. It’s all about the indentation to me.

A while back I found that my .emacs file was growing in proportion to my Racket code—all of it I had ever written, in fact. Every new macro in my code or in the latest Racket version needed a line like: (put 'for/fold 'scheme-indent-function 2)This would tell Emacs more or less how I wanted it to indent the given form. So long as I followed the use patterns Emacs could cope with. For instance, with for/fold, Emacs could cope with both of the “special” arguments on the same line as the macro name, or both on separate lines. Changing that up got weird results.

Also, function arguments would lead to significant rightward-creep in my indentation. Adding up the lengths of a list of strings, for instance, might look like this:

(foldl +
       0
       (map string-length
            (list "There"
                  "are"
                  "thirty-four"
                  "characters"
                  "here.")))

This wastes a lot of space on the left, and to me it doesn’t do enough for readability to justify it. I don’t need my eyes drawn to 0 and + nearly that much.

I discovered a new style of indentation in the {_Little_, Seasoned, Reasoned} Schemer series of books by Dan Friedman and his many cohorts. These books always start a new indentation level at a fixed distance in from the previous one, regardless of the cause for the indentation. Arguments on the same line as the function or macro name are ignored; they do not “push” indentation over to the right at all.

This indentation style has a lot of appeal to me for a number of reasons. One, it wastes no space on the left. Two, it never needs to “know” what a given macro means. It doesn’t matter if you’re applying + or lambda or for/fold, all lines beyond the first move two (or however many) characters to the right. I saw a light at the end of the tunnel: no more .emacs customization for every new form!

This style leaves two issues. One, how to indent cond? The Little books treat cond differently, indenting each clause only as far as the keyword cond, while other form’s arguments are pushed in slightly farther than the function or macro name. Two, how to “fix” forms like for/fold where a few lines really ought to be indented differently? A straight-up interpretation of this style would generate code like this:

(for/fold
  ([x 0])
  ([str (in-list '("12" "characters"))])
  (define n (string-length str))
  (+ x n))

Now we can’t tell visually where the for/fold iteration clauses leave off and the loop body definitions and expressions begin.

The cond issue is easy enough to resolve. In Racket, unlike in vanilla Scheme, we use [brackets] around cond clauses. The same goes for a number of other repeated clauses, in fact: let, match, syntax-parse, and so forth. So I decided my new, custom indentation style would indent [brackets] differently from (parentheses). Parens indent one farther than brackets. That way,

(let ([x 1]
      [y 2])
  (+ x y))`doesn't become

`(let ([x 1]
       [y 2])
  (+ x y))

Since I already use [brackets] every time I have a repeated, non-expression clause, this rule does exactly what I need it to do.

Once I had differentiated [] from (), resolving the for/fold issue was obvious. I needed a new indentation rule and a new lexical marker: {braces}. Now every time I have a fixed number of special non-expression forms in a macro, I wrap them in braces. Anything in braces is indented slightly farther (four spaces rather than two) than ordinary sub-expressions. So my for/fold example comes out like this.

(for/fold
    {[x 0]}
    {[str (in-list '("12" "characters"))]}
  (define n (string-length str))
  (+ x n))

Suddently it’s quite clear which parts are “special” in the for/fold macro.

So now I write code using (parentheses) for definitions, expressions, and anything else resembling a nestable, expandable term (e.g. match patterns, syntax templates), [brackets] for repeated, non-expandable clauses (e.g. cond clauses, let bindings), and {braces} for non-repeated, non-expandable forms (e.g. lambda formals, groups of let bindings). And I don’t bother to align function arguments; I tend to treat the most significant argument as an “accumulator”, and put everything else on one line if I can.

(foldl + 0
  (map string-length
    (list
      "There"
      "are"
      "thirty-four"
      "characters"
      "here.")))

The way I read this code, the first line tells us we are performing a summation; the second line tells us we want the length of each string; the third line tells us we have a list coming; and the rest give its contents. The result “accumulates” from a list to its lengths to their sum as the indentation cascades out and up from the inside.

With these three rules, I now write my Racket code without bothering to customize my .emacs file as I go. I just use delimiters judiciously to tell Emacs how I want everything indented, and everything comes out pretty much how I want it.

For anyone interested in installing this indentation mode or looking at its source code, I’ve put the file up on GitHub at:

https://github.com/carl-eastlund/simple-sexp

To use it, just put it somewhere your Emacs knows to look for Elisp code and add (require 'simple-sexp) to your .emacs file.

Addendum: Oh, and there’s some structured s-expression editing code in that file as well. It preserves matched parens, brackets, braces, and quotes (for strings). It’s probably a much inferior implementation of things like paredit; this code represents the flailings of an Elisp novice. Use at your own peril.

29 Aug 2012

RacketCon 2012

posted by Sam Tobin-Hochstadt

We’re pleased to announce that (second RacketCon) will take place on October 13, 2012, at Northeastern University in Boston. This year, RacketCon will feature 3 2-hour tutorial sessions, as well as a series of short talks about development in and of Racket over the last year.

Potential tutorial sessions include:

Building a new domain-specific language using syntactic extension
Using contracts in application development
Adding types to an existing application with Typed Racket
Parallelizing Racket applications with futures and places Potential talks include:
submodules
WhaleSong
futures and visualization
distributed places
optimization coaching
Dracula and ACL2
PLT Redex Lunch will be provided.

On Sunday after RacketCon, we plan to hold a hackathon to work as a group on various Racket projects such as documentation improvements and FFI bindings. This will be organized by Asumu Takikawa.

To register, fill out this form. The conference website has more information.

25 Aug 2012

Dynamic Programming versus Memoization

posted by Shriram Krishnamurthi

[Edit on 2012–08–27, 12:31EDT: added code and pictures below. 2012–08–27, 13:10EDT: also incorporated some comments.]

I wrote this on the Racket educators’ mailing list, and Eli Barzilay suggested I post it here as well.

The article is about the difference between memoization and dynamic programming (DP). Before you read on, you should stop and ask yourself: Do I think these two are the same concept?; if you think they are different, How do I think they differ?; and for that matter, Do I even think of them as related?

Did you think? Okay, then read on.

They most certainly are related, because they are both mechanisms for optimizing a computation by replacing repeated sub-computations with the storage and reuse of the result of those sub-computations. (That is, both trade off space for time.) In that description is already implicit an assumption: that the sub-computation will return the same result every time (or else you can’t replace the computation with its value on subsequent invocations). You’ve almost certainly heard of DP from an algorithms class. You’ve probably heard of memoization if you’re a member of this language’s community, but many undergrads simply never see it because algorithms textbooks ignore it; and when they do mention it they demonstrate fundamental misunderstandings (as Algorithms by Dasgupta, Papadimitriou, and Vazirani does).

Therefore, let’s set aside precedent. I’ll tell you how to think about them.

Memoization is fundamentally a top-down computation and DP is fundamentally bottom-up. In memoization, we observe that a computational tree can actually be represented as a computational DAG (a directed acyclic graph: the single most underrated data structure in computer science); we then use a black-box to turn the tree into a DAG. But it allows the top-down description of the problem to remain unchanged. (As I left unstated originally but commenter23 below rightly intuited, the nodes are function calls, edges are call dependencies, and the arrows are directed from caller to callee. See the pictures later in this article.)

In DP, we make the same observation, but construct the DAG from the bottom-up. That means we have to rewrite the computation to express the delta from each computational tree/DAG node to its parents. We also need a means for addressing/naming those parents (which we did not need in the top-down case, since this was implicit in the recursive call stack). This leads to inventions like DP tables, but people often fail to understand why they exist: it’s primarily as a naming mechanism (and while we’re at it, why not make it efficient to find a named element, ergo arrays and matrices).

In both cases, there is the potential for space wastage. In memoization, it is very difficult to get rid of this waste (you could have custom, space-saving memoizers, as Vclav Pech points out in his comment below, but then the programmer risks using the wrong one…which to me destroys the beauty of memoization in the first place). In contrast, in DP it’s easier to save space because you can just look at the delta function to see how far “back” it reaches; beyond there lies garbage, and you can come up with a cleverer representation that stores just the relevant part (the “fringe”). Once you understand this, you realize that the classic textbook linear, iterative computation of the fibonacci is just an extreme example of DP, where the entire “table” has been reduced to two iteration variables. (Did your algorithms textbook tell you that?)

In my class, we work through some of the canonical DP algorithms as memoization problems instead, just so when students later encounter these as “DP problems” in algorithms classes, they (a) realize there is nothing canonical about this presentation, and (b) can be wise-asses about it.

There are many trade-offs between memoization and DP that should drive the choice of which one to use.

Memoization:

leaves computational description unchanged (black-box)
avoids unnecessary sub-computations (i.e., saves time, and some space with it)
hard to save space absent a strategy for what sub-computations to dispose of
must alway check whether a sub-computation has already been done before doing it (which incurs a small cost)
has a time complexity that depends on picking a smart computation name lookup strategy

In direct contrast, DP:

forces change in description of the algorithm, which may introduce errors and certainly introduces some maintenance overhead
cannot avoid unnecessary sub-computations (and may waste the space associated with storing those results)
can more easily save space by disposing of unnecessary sub-computation results
has no need to check whether a computation has been done before doing it—the computation is rewritten to ensure this isn’t necessary
has a space complexity that depends on picking a smart data storage strategy

[NB: Small edits to the above list thanks to an exchange with Prabhakar Ragde.]

I therefore tell my students: first write the computation and observe whether it fits the DAG pattern; if it does, use memoization. Only if the space proves to be a problem and a specialized memo strategy won’t help—or, even less likely, the cost of “has it already been computed” is also a problem—should you think about converting to DP. And when you do, do so in a methodical way, retaining structural similarity to the original. Every subsequent programmer who has to maintain your code will thank you.

I’ll end with a short quiz that I always pose to my class.

Memoization is an optimization of a top-down, depth-first computation for an answer. DP is an optimization of a bottom-up, breadth-first computation for an answer. We should naturally ask, what about

top-down, breadth-first
bottom-up, depth-first Where do they fit into the space of techniques for avoiding recomputation by trading off space for time?
Do we already have names for them? If so, what?, or
Have we been missing one or two important tricks?, or
Is there a reason we don’t have names for these?

Where’s the Code?

I’ve been criticized for not including code, which is a fair complaint. First, please see the comment number 4 below by simli. For another, let me contrast the two versions of computing Levenshtein distance. For the dynamic programming version, see Wikipedia, which provides pseudocode and memo tables as of this date (2012–08–27). Here’s the Racket version:

(define levenshtein
  (lambda (s t)
    (cond
     [(and (empty? s) (empty? t)) 0]
     [(empty? s) (length t)]
     [(empty? t) (length s)]
     [else
      (if (equal? (first s) (first t))
   (levenshtein (rest s) (rest t))
   (min (add1 (levenshtein (rest s) t))
        (add1 (levenshtein s (rest t)))
        (add1 (levenshtein (rest s) (rest t)))))])))

The fact that this is not considered the more straightforward, reference implementation by the Wikipedia author is, I think, symptomatic of the lack of understanding that this post is about.

Now let’s memoize it (assuming a two-argument memoize):

(define levenshtein
  (memoize
    (lambda (s t)
      (cond
       [(and (empty? s) (empty? t)) 0]
       [(empty? s) (length t)]
       [(empty? t) (length s)]
       [else
 (if (equal? (first s) (first t))
     (levenshtein (rest s) (rest t))
     (min (add1 (levenshtein (rest s) t))
   (add1 (levenshtein s (rest t)))
   (add1 (levenshtein (rest s) (rest t)))))]))))

All that changed is the insertion of the second line.

Bring on the Pitchers!

The easiest way to illustrate the tree-to-DAG conversion visually is via the Fibonacci computation. Here’s a picture of the computational tree:

Fibonacci tree

Now let’s see it with memoization. The calls are still the same, but the dashed ovals are the ones that don’t compute but whose values are instead looked up, and their emergent arrows show which computation’s value was returned by the memoizer.

Fibonacci DAG

Important: The above example is misleading because it suggests that memoization linearizes the computation, which in general it does not. If you want to truly understand the process, I suggest hand-tracing the Levenshtein computation with memoization. And to truly understand the relationship to DP, compare that hand-traced Levenshtein computation with the DP version. (Hint: you can save some manual tracing effort by lightly instrumenting your memoizer to print inputs and outputs. Also, make the memo table a global variable so you can observe it grow.)

24 Aug 2012

Fully Inlined Merge Sort

posted by Neil Toronto

While writing the code for the triangular distribution in the upcoming math library, I found that I needed a function that sorts exactly three numbers. This kind of code is annoying to write and to get right. But it comes up rarely enough, and it seems simple enough, that I’ve never felt like making a library function for it.

But what if I wrote a macro that generated code to sort n numbers very quickly, where n is known at expansion time, but the numbers themselves aren’t? I think I could justify putting that in a library.

Here’s code that correctly sorts three numbers a, b and c:

(if (< b c)
    (if (< a b)
        (values a b c)
        (if (< a c)
            (values b a c)
            (values b c a)))
    (if (< a c)
        (values a c b)
        (if (< a b)
            (values c a b)
            (values c b a))))

It’s an if tree. Notice that there are 6 leaf expressions, for the 3! = 6 possible permutations. Also, it never compares more than it has to. It’s optimal.The optimality came from my reasoning about transitivity. For example, only two comparisons are needed before returning (values a b c). I knew that both (< a b) and (< b c), so (< a b c) must be true by transitivity.

It would be nice if the macro generated optimal code by explicitly reasoning about transitivity, or as an emergent property of the sorting algorithm it uses.

We’ll write a macro that does the latter, by generating a fully inlined merge sort.

[Edit: The final inline sort macro is here.]

Runtime Merge Sort

Start with a simple, runtime merge sort:

(define (merge as bs)
  (match* (as bs)
    [((list) bs)  bs]
    [(as (list))  as]
    [((list a as ...) (list b bs ...))
     (if (< a b)
         (cons a (merge as (cons b bs)))
         (cons b (merge (cons a as) bs)))]))
 
(define (merge-sort vs)
  (match vs
    [(list)  vs]
    [(list a)  vs]
    [_  (define-values (lvs rvs)
          (split-at vs (quotient (length vs) 2)))
        (merge (merge-sort lvs) (merge-sort rvs))]))

Example:

> (merge-sort '(5 1 2 5 10))
'(1 2 5 5 10)

To make a macro out of merge sort, we need to change two things. The most obvious is that it has to return syntax for an if instead of evaluating it. That’s easy for a novice macrologist: change the functions to operate on syntax, stick a syntax-quasiquote in front of the if, and unquote the bits inside that get evaluated at expansion time.

But if we did only that, we’d end up with expanded code like this:

(if (< a b)
    (cons a (if ...))
    (cons b (if ...)))

It would be slow because cons allocates. We want the code to be fast.So the other change is to move the conses inside the ifs, and evaluate them at expansion time. We can then construct a values expression out of the resulting list.

Accumulator-Passing Style Won’t Work

Novice functional programmers should know that accumulator-passing style (APS) moves conses inward. For example, this “add 1 to each element” function:

racket (define (list-add1 vs) (match vs [(list) (list)] [(list v vs ...) (cons (add1 v) (list-add1 vs))]))

becomes this after conversion to APS:

(define (list-add1/acc vs [acc (list)])
  (match vs
    [(list)  (reverse acc)]
    [(list v vs ...)
     (list-add1/acc vs (cons (add1 v) acc))]))

Now cons is where we want it: inside the recursive call, instead of in tail position. The problem is that APS doesn’t work on tree-shaped recursion.

Continuation-Passing Style Does Work

In continuation-passing style (CPS), we pass a continuation k—i.e. “what happens next”— instead of an accumulator. The functions call k instead of returning values. For example:

(define (list-add1/k vs [k identity])
  (match vs
    [(list)  (k (list))]
    [(list v vs ...)
     (list-add1/k vs (λ (vs) (k (cons (add1 v) vs))))]))

If we want, we can pass something besides identity as the base-case continuation:

> (list-add1/k '(1 2 3 4) (λ (vs) (apply values vs)))
2
3
4
5

CPS turns every call into a tail call, so it moves conses inward even with tree-shaped recursion. As a demonstration, here’s a CPSed merge sort:

(define (merge/k as bs k)
  (match* (as bs)
    [((list) bs)  (k bs)]
    [(as (list))  (k as)]
    [((list a as ...) (list b bs ...))
     (if (< a b)
         (merge/k as (cons b bs) (λ (vs) (k (cons a vs))))
         (merge/k (cons a as) bs (λ (vs) (k (cons b vs)))))]))
 
(define (merge-sort/k vs k)
  (match vs
    [(list)  (k vs)]
    [(list a)  (k vs)]
    [_  (define-values (lvs rvs)
          (split-at vs (quotient (length vs) 2)))
        (merge-sort/k
         lvs (λ (lvs)
               (merge-sort/k
                rvs (λ (rvs)
                      (merge/k lvs rvs k)))))]))

You can read the last expression in merge-sort/k as, “Sort lvs. Then, with the sorted lvs, sort rvs. Then, with the sorted rvs, merge lvs and rvs.” Example:

> (merge-sort/k '(5 2 3 1) (λ (vs) (apply values vs)))
1
2
3
5

The Inline Sort Macro

When we macro-ize the CPSed merge sort, we’ll turn the continuations into expansion-time functions. So not only will macro-ized CPS move conses inward, it’ll apply them all at expansion time!

We’ll do it in three parts: the user-facing macro, the inline merge function, and the inline sort function.

The User-Facing Macro

Let’s put a nice face on inline sorting:

(define-syntax (inline-sort stx)
  (syntax-case stx ()
    [(_ lst ...)
     (with-syntax ([(vs ...)
                    (generate-temporaries #'(lst ...))])
       #`(let ([vs lst] ...)
           #,(inline-sort/k #'(vs ...)
                            (λ (vs) #`(values #,@vs)))))]))

This macro does two things. First, it names the values to be sorted, so they don’t get re-evaluated every time they’re compared. Second, it calls inline-sort/k with a base-case continuation that converts syntax lists to values expressions.Note that the call #,(inline-sort/k …) happens at expansion time, and that the continuation (λ (vs) …) it passes is an expansion-time value.

The Inline Merge Function

Changing merge/k to operate on syntax at expansion time is as straightforward as possible:

(define-for-syntax (inline-merge/k as bs k)
  (syntax-case (list as bs) ()
    [(() bs)  (k #'bs)]
    [(as ())  (k #'as)]
    [((a as ...) (b bs ...))
     #`(if (< a b)
           #,(inline-merge/k #'(as ...) #'(b bs ...)
                             (λ (vs) (k (cons #'a vs))))
           #,(inline-merge/k #'(a as ...) #'(bs ...)
                             (λ (vs) (k (cons #'b vs)))))]))

The only substantial changes are the quasiquotes and unquotes, and using syntax-case to destructure syntax lists instead of using match to destructure lists. Again, note that the recursive calls #,(inline-merge/k …) in each if branch happen at expansion time, and that their continuations are expansion-time values.We can see what kind of code merge/k returns by applying it at expansion time:

> (begin-for-syntax
    (print
     (syntax->datum
      (inline-merge/k #'(a) #'(b c) (λ (vs) vs)))))
'(if (< a b) (a b c) (if (< a c) (b a c) (b c a)))

The syntax list #’(b c) is assumed already sorted, meaning (< b c) at runtime. Therefore, if (< a b) at runtime, by transitivity, (< a b c) at runtime, so the merge generates #’(a b c). In other words, inline-merge/k’s assumption that its arguments are sorted is equivalent to reasoning about transitivity.

The Inline Sort Function

Lastly, the divide-and-conquer part:

(require (for-syntax racket/list))  ; for list functions
 
(define-for-syntax (inline-sort/k vs k)
  (syntax-case vs ()
    [()  (k vs)]
    [(a)  (k vs)]
    [_  (let ([vs  (if (list? vs) vs (syntax->list vs))])
          (define-values (lvs rvs)
            (split-at vs (quotient (length vs) 2)))
          (inline-sort/k
           lvs (λ (lvs)
                 (inline-sort/k
                  rvs (λ (rvs)
                        (inline-merge/k lvs rvs k))))))]))

This is changed similarly to merge/k. The only new change is using syntax->list to convert syntax to lists so we can use the functions length and split-at. Example:

> (inline-sort 5 2 3 1)
1
2
3
5

Of course, the result of evaluating an inline-sort doesn’t tell the whole story. Let’s fire up the macro stepper and see what (inline-sort 5 2 3) expands to. Copying from the macro stepper window and renaming temp variables, we get

(let ([a 5] [b 2] [c 3])
  (if (< b c)
      (if (< a b)
          (values a b c)
          (if (< a c)
              (values b a c)
              (values b c a)))
      (if (< a c)
          (values a c b)
          (if (< a b)
              (values c a b)
              (values c b a)))))

It’s an if tree. Notice that there are 6 leaf expressions, for the 3! = 6 possible permutations. Also, it never compares more than it has to. It’s optimal.

Inline Sort Properties

Inherited from the merge sort, the inline sort has the following properties (assuming a length-n list):

Time optimality: The depth of the if tree is O(n log(n)).
Size optimality: The number of leaves is exactly n!. The term “size optimality” is misleading, because that’s still a lot of code. I’ve seriously considered requiring any user of this macro to state how many permutations there are for the number of values they’re sorting. They’d have to prove to the macro that they know how much code they’re asking it to generate.Inline sort is wicked fast, as we should expect:

> (define vs '(5 2 3 1))
> (match-define (list a b c d) vs)
> (for ([_  (in-range 5)])
    (time (for ([_  (in-range 1000000)])
            (match-let ([(list a b c d)  (sort vs <)])
              (void)))))
cpu time: 550 real time: 540 gc time: 30
cpu time: 510 real time: 518 gc time: 0
cpu time: 520 real time: 517 gc time: 20
cpu time: 520 real time: 517 gc time: 0
cpu time: 510 real time: 516 gc time: 10
> (for ([_  (in-range 5)])
    (time (for ([_  (in-range 1000000)])
            (let-values ([(a b c d)  (inline-sort a b c d)])
              (void)))))
cpu time: 20 real time: 28 gc time: 0
cpu time: 30 real time: 27 gc time: 0
cpu time: 30 real time: 27 gc time: 0
cpu time: 30 real time: 27 gc time: 0
cpu time: 20 real time: 27 gc time: 0

So about 20 times faster than sort on a length–4 list.I use it in Typed Racket, on floating-point numbers. Typed Racket’s optimizer replaces < with unsafe-fl<. This tells the compiler that the elements are floats, so it can keep them stack-allocated, which reduces allocations further. In all, for my particular use of inline-sort, it’s over 50 times faster than sort.And using it is a heckuvalot easier than writing an if tree and reasoning about transitivity every time I need to sort a handful of floats.

Conclusion

Writing macros in expansion-time CPS to fully inline a recursive function works out beautifully. I suspect that it will work on any recursive, polymorphic function whose well-foundedness follows only from the structure of the input data.Also, it can generate a metric truckload of code.Interesting note: I originally wrote inline-sort using only syntax-rules, passing the names of higher-order macros to other macros as continuations. Sorting a five-element list took almost 19000 expansion steps, which is ridiculously inefficient even for a 120-leaf if tree.

07 Aug 2012

Racket v5.3

posted by Eli Barzilay

Racket version 5.3 is now available from

http://racket-lang.org/ Release Highlights::

Submodules are nested module declarations that can be loaded and run independently from the enclosing module. See also the overview of submodules.
The futures visualizer is a graphical profiling tool for parallel programs using futures. The tool shows a detailed execution timeline depicting the migration of futures between threads, and gives detailed information about each runtime synchronization that occurred during program execution. In addition, would-be-future is a special type of future that always executes sequentially and records all potential barricades a regular future would encounter.
Optimization Coach (formerly Performance Report) reports information about Racket’s inlining optimizations. Optimization Coach can be launched in any language through the View menu.
The new images/flomap library defines floating-point bitmaps and fast image processing operations on them. It is written in Typed Racket, so Typed Racket code may use it without the cost of contract checks.
The new json library supports parsing and generating JSON. (Originally based on Dave Herman’s planet library.)
racket/string is extended with a set of simplified string manipulation functions that are more convenient than using regexps. regexp-match* and friends can now be used with new keyword arguments to return specific matched regexp group/s and gaps between matches.
The new racket/generic library allows generic function definitions, which dispatch to methods added to a structure type via the new #:methods keyword.
The class form supports declaring a method abstract. An abstract method prevents a class from being instantiated unless it is overridden.
The contract library comes with support for interfaces, generics, prompts, continuation-marks, and structs.
Most error messages use a new multi-line format that is more consistent with contract errors and accommodates more information.
Typed Racket supports function definitions with keyword arguments; the startup time of Typed Racket programs has been sharply reduced.
The new ffi/com library replaces MysterX; a compatibility mysterx library remains, but without ActiveX support. The new ffi/unsafe/com library offers a more primitive and direct way to use COM classes and methods.
There is now a very complete completion code for zsh. It is not included in the distribution though; get it at http://goo.gl/DU8JK (This script and the bash completions will be included in the standard installers in future versions.)

Deprecation: Effective this release:

The tex2page and combinator-parser libraries have been moved from the Racket distribution to PLaneT:

(require (planet plt/tex2page)) (require (planet plt/combinator-parser))
The following has been deprecated and will be removed in the January 2013 release:

the planet command-line tool; use raco planet instead.

The following has been deprecated and will be removed in the August 2013 release:

the mzlib/class100 library; use racket/class instead.

03 Jun 2012

Submodules

posted by Matthew Flatt

A Racket submodule is a module that is syntactically nested within another module. Submodules will be supported in the next release of Racket, and they are available in the current pre-release version.

Submodules provide nested namespaces, but that kind of nesting is already available through forms like define-package. The power of submodules is that they can be separately loaded and separately run relative to their enclosing modules, in the same way that top-level modules can be separately load and run. This separation of dependencies means that submodules can be used to add code and information to modules—such as tests, documentation, and parsing information—that is loaded only when specifically requested.

The `main` Submodule

One use of a submodule is to declare a main submodule. A main submodule is instantiated when the enclosing module is run as the main program, but not when the enclosing module is used as a library.

"fish.rkt"

#lang racket/base
(provide fish)
(define fish '(1 2))
(module+ main
  (map displayln fish))

"sum-fish.rkt"

#lang racket/base
(require "fish.rkt")
(apply + fish)

The "fish.rkt" module exports fish as a list of numbers. Running "sum-fish.rkt", which imports "fish.rkt", prints the sum of the numbers. Running "fish.rkt" directly, however, triggers the instantiation of the main submodule within "fish.rkt", which displays each number in fish on its own line.

A (module+ main ....) declaration is similar to the Python if __name__ == "__main__": idiom, but with a significant difference. Importing "fish.rkt" into another module ignores the main submodule completely, so that the main submodule’s code and its dependencies aren’t loaded.

Unit Tests

Another use for submodules—and one where independent loading matters more than for "fish.rkt"’s main—is for test suites. A main submodule could be used for tests, so that running the module runs its tests, but our preferred convention is to declare a test submodule:

"fish2.rkt"

#lang racket/base
(provide fish)
(define fish '(1 2))
(module+ test
  (require rackunit)
  (check andmap number? fish))

The new raco test shell command runs the test submodule of a given module, so that raco test fish2.rkt checks that all the values of the fish list are numbers. The test submodule imports rackunit for its check form, but that import does not create a dependency on rackunit (which is a substantial library) for modules that import "fish2.rkt"; the dependency is only for the test submodule.

The module+ form creates a dependency of the submodule on the enclosing module, since it implicitly imports all bindings of its enclosing module. The implicit import explains why the test submodule in "fish2.rkt" can use fish directly (i.e., it’s not simply because the submodule is syntactically nested). The implicit import includes all bindings from the enclosing module, including bindings that are not exported via provide, which supports unit tests for unexported functions.

Finally, the module+ form splices together multiple declarations of a particular submodule, which is useful for interleaving definitions and tests:

"fish3.rkt"

#lang racket/base
(provide fish feed)
(module+ test (require rackunit))
(define fish '(1 2))
(module+ test (check andmap number? fish))
(define (feed n) (+ n 1))
(module+ test (check-equal? 3 (feed 2)))

Since tests are isolated to a submodule, it might make sense to “strip” tests from a set of modules to prepare them for distribution to end-users. Although we haven’t created the raco strip command, yet, it’s a likely future addition. In that way, submodules act like sections in an object-linking file format such as ELF.

Core Submodule Forms

The module+ form is actually just a macro that expands to a more primitive form for declaring submodules. The primitive submodule forms are module and module*, which reflect the two different directions that module dependencies can run: the module* form allows the submodule to import its enclosing module, while the module form allows the enclosing module to import the submodule.

As a minor feature, submodules can be declared with module and used by a require—essentially the same within a module as interactively:

#lang racket/base
(module zoo racket/base
  (provide tiger)
  (define tiger "Tony"))
(require 'zoo)
tiger

More significantly, module allows a submodule to be free of any dependency on its enclosing module, while the enclosing module similarly has no obligation to import the submodule.

The module* form similarly implies no a priori dependency of the submodule on its enclosing module, except that a #f for the submodule’s initial import means an import of all of the enclosing module’s bindings. The module+ form expands (after collecting all pieces of a submodule’s body) to a module* form with a #f initial import.

In-Source Documentation

A more interesting example is the scribble/srcdoc library, which supports documentation within a library’s source in a JavaDoc-like way:

"fish4.rkt"

#lang racket
(require scribble/srcdoc
         (for-doc racket/base scribble/manual))
(provide
 (thing-doc
  fish (listof number?)
  ("Our fish, each represented as a number.")))
(define fish '(1 2))
(provide
 (proc-doc/names
  feed (number? . -> . number?) (n)
  ("Feed 1 pound of food to the fish " (racket n) ".")))
(define (feed n) (+ n 1))

The scribble/srcdoc library provides thing-doc and proc-doc, which can be used instead of a plain provide form to attach both a contract and documentation to the exported binding. The contract is used at run time to guard uses of the value. The contract is also included in the documentation with hyperlinks to bindings that are used in the contract, such as number?.

In addition to provide forms, the scribble/srcdoc library provides for-doc for use within require. A for-doc imports forms that are used in the implementation of the documentation, as opposed to the implementation of the library. In "fish4.rkt", scribble/manual is imported for the racket form that is used in the documentation of feed.

These forms from scribble/srcdoc work together to construct a srcdoc submodule that contains documentation for the enclosing module without creating any documentation-related run-time overhead for the enclosing module. The module’s documentation is loaded from bytecode only when specifically requested from the srcdoc submodule for inclusion in a documentation build via include-extracted:

"fish4.scrbl"

#lang scribble/manual
@(require scribble/extract)
@title{Fish}
@defmodule["fish.rkt"]
@include-extracted["fish4.rkt"]

Implementing Languages

Top-level modules in Racket intentionally inherit no bindings from the top-level environment, so that (1) a module’s meaning is fixed independent of its load order or other effects, and (2) the initial import of a module can act as a “language” with complete control over the module’s meaning. That is, #lang is in principle the only top-level form in Racket. With only modules at the top level, however, macros cannot abstract over sets of top-level modules.

Submodules provide more flexibility, in that a macro defined within a module can abstract over a set of submodules. As it happens, abstracting over a set of submodules is useful for defining a new language for use with #lang.

A language for use with #lang is implemented by several pieces that live at different times, including the language’s parser, the language’s run-time support library, and the language’s syntax-coloring plug-in for DrRacket. Formerly, a programmer who implements a language with those three pieces was forced to write three different modules (or else tangle the different pieces in a single module, which invariably pulls too many dependencies into any one of them). Those pieces now can be in submodules, which opens the possibility for new abstractions that conveniently generate the various pieces of a language.

For example, if you want to define an ocean language that is racket/base plus fish, it’s enough to install the following module as "main.rkt" in an "ocean" collection (e.g., in an "ocean" directory is that is registered as a collection with the command raco link ocean):

#lang racket/base
(provide (all-from-out racket/base)
         fish)
(define fish '(1 2 3))
(displayln "Fish are swimming")
(module reader syntax/module-reader
  #:language 'ocean)

When Racket sees a module that starts #lang ocean, it does not simply load the "main.rkt" module of the "ocean" collection. Instead, #lang looks for a reader submodule of the "main.rkt" module. The reader module above does not depend on its enclosing module, so that parsing a module in the ocean language does not trigger the “Fish are swimming” printout. Instead, the #:language 'ocean part of the reader submodule indicates that a module parsed from #lang ocean starts by importing the ocean module, so the bindings of ocean are available in the program, and “Fish are swimming” will print when the program is run.

Submodules are Like…

At some level, syntactic nesting of modules is an obvious feature to include in a module system. Nevertheless, Racket’s submodules are not like nested modules in most languages—including Python, Chez, or ML—where nesting is for namespace management and nested modules are always instantiated along with the enclosing module. Racket submodules can be used in a similar way, but the fact that submodules are separately loadable makes them available to solve a larger class of problems.

If I had to pick just one analogy, I’d say that submodules are most like a generalization of annotations in the Java sense. Java annotations allow the decoration of code with metadata, and the annotations are preserved through run time, so that annotations can be inspected in source, in compiled code, or reflectively at run time. Java annotations are limited to data, so that any abstraction or programatic interpretation of the data depends on yet another external tool and language, or else the code part (such as test to run for a @Test annotation) is difficult to separate from the main program. C# attributes are slightly more general, in that the data can have associated methods, but attribute code is still mingled with run-time code. Submodules generalize annotations to make them “live,” so that the language of annotations can include expressions, functions, and even syntactic extensions, while allowing the annotation/submodule code to stay separate from the base code.

For more information on submodules, see the pre-release Guide section.

02 Apr 2012

Scribble Your Blogs

posted by Ryan Culpepper

Scribble is a great language for writing documentation. Now it’s a great language for writing blog posts, too. I’ve just released a tool called Scriblogify that compiles Scribble documents and posts them as blog entries. Scriblogify is a more polished and automated version of the scripts I’ve been using for several months to prepare posts for my own blog.

To get Scriblogify, just download it from PLaneT:

(require(planetryanc/scriblogify:1))

raco planet install ryanc scriblogify.plt 1 0

The package automatically installs a raco subcommand (raco scriblogify) that can be used to configure Scriblogify and process and upload blog posts.

Configure Scriblogify by running

raco scriblogify --setup

That will open a browser window with the Scriblogify configuration servlet. The servlet will prompt you to authorize Scriblogify to access your Blogger and Picasa Web Albums accounts (only the Blogger/Picasa combination is currently supported) and then create one or more profiles—named combinations of blogs and web albums to upload to.

Scriblogify automatically handles images computed in your Scribble documents by uploading them to a web album. For example, here are some images computed with the slideshow/pict library:

> (require slideshow/pict)
> (define rainbow-colors
    '("red" "orange" "yellow" "green" "blue" "purple"))
> (for/list ([c rainbow-colors])
    (colorize (filled-rounded-rectangle 20 20) c))
'(image image image image image image)

> (for/list ([c rainbow-colors]
             [dir (in-cycle '(right left))])
    (standard-fish 25 25 #:color c #:direction dir))
'(image image image image image image)

> (cc-superimpose
   (cc-superimpose (cloud 100 80 "lightblue")
                   (cloud 90 70 "white"))
   (hc-append 10
    (standard-fish 30 30 #:color "red" #:direction 'right)
    (standard-fish 25 20 #:color "blue" #:direction 'left)))
image

By Scribbling your blog entries, you get Scribble’s nice code formatting, colorizing, and documentation links for free—well, once you’ve updated your blog’s CSS (see below). If you’re blogging about bleeding-edge work, there’s an option to make Scriblogify link to the nightly build docs (updated daily) instead of the release docs (updated every 3 months).

Scriblogify’s documentation has more details, including how to update your blog’s CSS for Scribbled content and what bloggable Scribble documents look like.You can see the source for this blog post here. This blog entry was created with the following command:raco scriblogify -p the-racket-blog scribble-your-blogs.scrbl

Now go forth and Scribble your blogs.

Introduction

Examples

Contracts

Summary

Check failure

Check failure

Check failure

Check failure

Check failure

Check failure

langracket

langracket

Where’s the Code?

Bring on the Pitchers!

Runtime Merge Sort

Accumulator-Passing Style Won’t Work

Continuation-Passing Style Does Work

The Inline Sort Macro

The User-Facing Macro

The Inline Merge Function

The Inline Sort Function

Inline Sort Properties

Conclusion

The main Submodule

Unit Tests

Core Submodule Forms

In-Source Documentation

Implementing Languages

Submodules are Like…

The `main` Submodule