28 Sep 2012

I Write Funny-Lookin’ Racket Code: An Alternate Style for Delimiters and Indentation

posted by Carl Eastlund

A lot of people are quite surprised when they see the Racket code I write. Let’s say I needed a function to render hash table values as expressions that would produce the same value. A “normal” Racketeer might write something like the following.

(define (hash->expr ht)
  (cons 'hash
        (for/fold ([args '()])
                  ([(k v) (in-hash ht)])
          (cons (list 'quote k)
                (cons (list 'quote v)
                      args)))))

There might be a few variations in style, especially depending on whether one has Racket or Emacs set up to indent for/fold specially. Almost no one, however, would come up with the code I write.

(define (hash->expr ht)
  (cons 'hash
    (for/fold
        {[args '()]}
        {[{k v} (in-hash ht)]}
      (cons (list 'quote k)
        (cons (list 'quote v)
          args)))))

The biggest reaction I get is from the presence of {curly braces}, but those are largely incidental as far as I’m concerned. It’s all about the indentation to me.

A while back I found that my .emacs file was growing in proportion to my Racket code—all of it I had ever written, in fact. Every new macro in my code or in the latest Racket version needed a line like:

(put 'for/fold 'scheme-indent-function 2)

This would tell Emacs more or less how I wanted it to indent the given form, so long as I followed the usage patterns Emacs could cope with. For instance, with for/fold, Emacs could cope with both of the “special” arguments on the same line as the macro name, or both on separate lines. Changing that up got weird results.

Also, function arguments would lead to significant rightward-creep in my indentation. Adding up the lengths of a list of strings, for instance, might look like this:

(foldl +
       0
       (map string-length
            (list "There"
                  "are"
                  "thirty-four"
                  "characters"
                  "here.")))

This wastes a lot of space on the left, and to me it doesn’t do enough for readability to justify it. I don’t need my eyes drawn to 0 and + nearly that much.

I discovered a new style of indentation in the {Little, Seasoned, Reasoned} Schemer series of books by Dan Friedman and his many cohorts. These books always start a new indentation level at a fixed distance in from the previous one, regardless of the cause for the indentation. Arguments on the same line as the function or macro name are ignored; they do not “push” indentation over to the right at all.

This indentation style has a lot of appeal to me for a number of reasons. One, it wastes no space on the left. Two, it never needs to “know” what a given macro means. It doesn’t matter if you’re applying + or lambda or for/fold, all lines beyond the first move two (or however many) characters to the right. I saw a light at the end of the tunnel: no more .emacs customization for every new form!

This style leaves two issues. One, how to indent cond? The Little books treat cond differently, indenting each clause only as far as the keyword cond, while other forms’ arguments are pushed in slightly farther than the function or macro name. Two, how to “fix” forms like for/fold, where a few lines really ought to be indented differently? A straight-up interpretation of this style would generate code like this:

(for/fold
  ([x 0])
  ([str (in-list '("12" "characters"))])
  (define n (string-length str))
  (+ x n))

Now we can’t tell visually where the for/fold iteration clauses leave off and the loop body definitions and expressions begin.

The cond issue is easy enough to resolve. In Racket, unlike in vanilla Scheme, we use [brackets] around cond clauses. The same goes for a number of other repeated clauses, in fact: let, match, syntax-parse, and so forth. So I decided my new, custom indentation style would indent [brackets] differently from (parentheses). Parens indent one farther than brackets. That way,

(let ([x 1]
      [y 2])
  (+ x y))

doesn’t become

(let ([x 1]
       [y 2])
  (+ x y))

Since I already use [brackets] every time I have a repeated, non-expression clause, this rule does exactly what I need it to do.

Once I had differentiated [] from (), resolving the for/fold issue was obvious. I needed a new indentation rule and a new lexical marker: {braces}. Now every time I have a fixed number of special non-expression forms in a macro, I wrap them in braces. Anything in braces is indented slightly farther (four spaces rather than two) than ordinary sub-expressions. So my for/fold example comes out like this.

(for/fold
    {[x 0]}
    {[str (in-list '("12" "characters"))]}
  (define n (string-length str))
  (+ x n))

Suddenly it’s quite clear which parts are “special” in the for/fold macro.

So now I write code using (parentheses) for definitions, expressions, and anything else resembling a nestable, expandable term (e.g. match patterns, syntax templates), [brackets] for repeated, non-expandable clauses (e.g. cond clauses, let bindings), and {braces} for non-repeated, non-expandable forms (e.g. lambda formals, groups of let bindings). And I don’t bother to align function arguments; I tend to treat the most significant argument as an “accumulator”, and put everything else on one line if I can.

(foldl + 0
  (map string-length
    (list
      "There"
      "are"
      "thirty-four"
      "characters"
      "here.")))

The way I read this code, the first line tells us we are performing a summation; the second line tells us we want the length of each string; the third line tells us we have a list coming; and the rest give its contents. The result “accumulates” from a list to its lengths to their sum as the indentation cascades out and up from the inside.

With these three rules, I now write my Racket code without bothering to customize my .emacs file as I go. I just use delimiters judiciously to tell Emacs how I want everything indented, and everything comes out pretty much how I want it.
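To see all three delimiter roles at once, here is a small, hypothetical example in this style (the function is mine, not from the original post; note that Racket’s reader treats {}, [], and () interchangeably, so the code runs unchanged):

```racket
;; (parens) for expressions, [brackets] for repeated clauses,
;; {braces} for a fixed group of special forms (here, lambda formals).
(define (classify n)
  (cond
    [(negative? n) 'negative]
    [(zero? n) 'zero]
    [else 'positive]))

(map (lambda {n} (classify n))
  (list -2 0 3)) ; => '(negative zero positive)
```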

For anyone interested in installing this indentation mode or looking at its source code, I’ve put the file up on GitHub at:

https://github.com/carl-eastlund/simple-sexp

To use it, just put it somewhere your Emacs knows to look for Elisp code and add (require 'simple-sexp) to your .emacs file.

Addendum: Oh, and there’s some structured s-expression editing code in that file as well. It preserves matched parens, brackets, braces, and quotes (for strings). It’s probably a much inferior implementation of things like paredit; this code represents the flailings of an Elisp novice. Use at your own peril.

more →

29 Aug 2012

RacketCon 2012

posted by Sam Tobin-Hochstadt

We’re pleased to announce that (second RacketCon) will take place on October 13, 2012, at Northeastern University in Boston. This year, RacketCon will feature three 2-hour tutorial sessions, as well as a series of short talks about development in and of Racket over the last year.

Potential tutorial sessions include:

  • Building a new domain-specific language using syntactic extension

  • Using contracts in application development

  • Adding types to an existing application with Typed Racket

  • Parallelizing Racket applications with futures and places

Potential talks include:

  • submodules

  • WhaleSong

  • futures and visualization

  • distributed places

  • optimization coaching

  • Dracula and ACL2

  • PLT Redex

Lunch will be provided.

On Sunday after RacketCon, we plan to hold a hackathon to work as a group on various Racket projects such as documentation improvements and FFI bindings. This will be organized by Asumu Takikawa.

To register, fill out this form. The conference website has more information.

more →

25 Aug 2012

Dynamic Programming versus Memoization

posted by Shriram Krishnamurthi

[Edit on 2012-08-27, 12:31 EDT: added code and pictures below. 2012-08-27, 13:10 EDT: also incorporated some comments.]

I wrote this on the Racket educators’ mailing list, and Eli Barzilay suggested I post it here as well.

The article is about the difference between memoization and dynamic programming (DP). Before you read on, you should stop and ask yourself: Do I think these two are the same concept? If I think they are different, how do I think they differ? And for that matter, do I even think of them as related?

Did you think? Okay, then read on.

They most certainly are related, because they are both mechanisms for optimizing a computation by replacing repeated sub-computations with the storage and reuse of the result of those sub-computations. (That is, both trade off space for time.) In that description is already implicit an assumption: that the sub-computation will return the same result every time (or else you can’t replace the computation with its value on subsequent invocations). You’ve almost certainly heard of DP from an algorithms class. You’ve probably heard of memoization if you’re a member of this language’s community, but many undergrads simply never see it because algorithms textbooks ignore it; and when they do mention it they demonstrate fundamental misunderstandings (as Algorithms by Dasgupta, Papadimitriou, and Vazirani does).

Therefore, let’s set aside precedent. I’ll tell you how to think about them.

Memoization is fundamentally a top-down computation and DP is fundamentally bottom-up. In memoization, we observe that a computational tree can actually be represented as a computational DAG (a directed acyclic graph: the single most underrated data structure in computer science); we then use a black-box to turn the tree into a DAG. But it allows the top-down description of the problem to remain unchanged. (As I left unstated originally but commenter23 below rightly intuited, the nodes are function calls, edges are call dependencies, and the arrows are directed from caller to callee. See the pictures later in this article.)

In DP, we make the same observation, but construct the DAG from the bottom-up. That means we have to rewrite the computation to express the delta from each computational tree/DAG node to its parents. We also need a means for addressing/naming those parents (which we did not need in the top-down case, since this was implicit in the recursive call stack). This leads to inventions like DP tables, but people often fail to understand why they exist: it’s primarily as a naming mechanism (and while we’re at it, why not make it efficient to find a named element, ergo arrays and matrices).

In both cases, there is the potential for space wastage. In memoization, it is very difficult to get rid of this waste (you could have custom, space-saving memoizers, as Václav Pech points out in his comment below, but then the programmer risks using the wrong one…which to me destroys the beauty of memoization in the first place). In contrast, in DP it’s easier to save space because you can just look at the delta function to see how far “back” it reaches; beyond there lies garbage, and you can come up with a cleverer representation that stores just the relevant part (the “fringe”). Once you understand this, you realize that the classic textbook linear, iterative computation of the fibonacci is just an extreme example of DP, where the entire “table” has been reduced to two iteration variables. (Did your algorithms textbook tell you that?)
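To make that last point concrete, here is a minimal sketch (my code, not the author’s) of the iterative fibonacci, where the entire DP “table” has collapsed to two accumulators:

```racket
;; The whole DP "table" is just a and b:
;; at step i, a holds fib(i) and b holds fib(i+1).
(define (fib n)
  (let loop ([i 0] [a 0] [b 1])
    (if (= i n)
        a
        (loop (add1 i) b (+ a b)))))

(fib 10) ; => 55
```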

In my class, we work through some of the canonical DP algorithms as memoization problems instead, just so when students later encounter these as “DP problems” in algorithms classes, they (a) realize there is nothing canonical about this presentation, and (b) can be wise-asses about it.

There are many trade-offs between memoization and DP that should drive the choice of which one to use.

Memoization:

  • leaves computational description unchanged (black-box)

  • avoids unnecessary sub-computations (i.e., saves time, and some space with it)

  • hard to save space absent a strategy for what sub-computations to dispose of

  • must always check whether a sub-computation has already been done before doing it (which incurs a small cost)

  • has a time complexity that depends on picking a smart computation name lookup strategy

In direct contrast, DP:

  • forces change in description of the algorithm, which may introduce errors and certainly introduces some maintenance overhead

  • cannot avoid unnecessary sub-computations (and may waste the space associated with storing those results)

  • can more easily save space by disposing of unnecessary sub-computation results

  • has no need to check whether a computation has been done before doing it—the computation is rewritten to ensure this isn’t necessary

  • has a space complexity that depends on picking a smart data storage strategy

[NB: Small edits to the above list thanks to an exchange with Prabhakar Ragde.]

I therefore tell my students: first write the computation and observe whether it fits the DAG pattern; if it does, use memoization. Only if the space proves to be a problem and a specialized memo strategy won’t help—or, even less likely, the cost of “has it already been computed” is also a problem—should you think about converting to DP. And when you do, do so in a methodical way, retaining structural similarity to the original. Every subsequent programmer who has to maintain your code will thank you.

I’ll end with a short quiz that I always pose to my class.

Memoization is an optimization of a top-down, depth-first computation for an answer. DP is an optimization of a bottom-up, breadth-first computation for an answer. We should naturally ask, what about

  • top-down, breadth-first

  • bottom-up, depth-first

Where do they fit into the space of techniques for avoiding recomputation by trading off space for time?

  • Do we already have names for them? If so, what?, or

  • Have we been missing one or two important tricks?, or

  • Is there a reason we don’t have names for these?

Where’s the Code?

I’ve been criticized for not including code, which is a fair complaint. First, please see comment number 4 below by simli. For another, let me contrast the two versions of computing Levenshtein distance. For the dynamic programming version, see Wikipedia, which provides pseudocode and memo tables as of this date (2012-08-27). Here’s the Racket version:

(define levenshtein
  (lambda (s t)
    (cond
      [(and (empty? s) (empty? t)) 0]
      [(empty? s) (length t)]
      [(empty? t) (length s)]
      [else
       (if (equal? (first s) (first t))
           (levenshtein (rest s) (rest t))
           (min (add1 (levenshtein (rest s) t))
                (add1 (levenshtein s (rest t)))
                (add1 (levenshtein (rest s) (rest t)))))])))

The fact that this is not considered the more straightforward, reference implementation by the Wikipedia author is, I think, symptomatic of the lack of understanding that this post is about.

Now let’s memoize it (assuming a two-argument memoize):

(define levenshtein
  (memoize
    (lambda (s t)
      (cond
        [(and (empty? s) (empty? t)) 0]
        [(empty? s) (length t)]
        [(empty? t) (length s)]
        [else
         (if (equal? (first s) (first t))
             (levenshtein (rest s) (rest t))
             (min (add1 (levenshtein (rest s) t))
                  (add1 (levenshtein s (rest t)))
                  (add1 (levenshtein (rest s) (rest t)))))]))))

All that changed is the insertion of the second line.
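The two-argument memoize assumed here is not shown in the post; a minimal sketch of one possible implementation (the name and interface are my assumptions):

```racket
;; Wrap a two-argument function so each distinct (s, t) pair
;; is computed at most once; results are cached in a hash table.
(define (memoize f)
  (define table (make-hash))
  (lambda (s t)
    (hash-ref! table (list s t)
               (lambda () (f s t)))))
```

With a memoizer like this, recursive calls that revisit the same suffix pair hit the table instead of recomputing, so the memoized levenshtein runs in time roughly proportional to the product of the input lengths rather than exponential time.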

Bring on the Pitchers!

The easiest way to illustrate the tree-to-DAG conversion visually is via the Fibonacci computation. Here’s a picture of the computational tree:

Fibonacci tree

Fibonacci tree

Now let’s see it with memoization. The calls are still the same, but the dashed ovals are the ones that don’t compute but whose values are instead looked up, and their emergent arrows show which computation’s value was returned by the memoizer.

Fibonacci DAG

Fibonacci DAG

Important: The above example is misleading because it suggests that memoization linearizes the computation, which in general it does not. If you want to truly understand the process, I suggest hand-tracing the Levenshtein computation with memoization. And to truly understand the relationship to DP, compare that hand-traced Levenshtein computation with the DP version. (Hint: you can save some manual tracing effort by lightly instrumenting your memoizer to print inputs and outputs. Also, make the memo table a global variable so you can observe it grow.)

more →

24 Aug 2012

Fully Inlined Merge Sort

posted by Neil Toronto

While writing the code for the triangular distribution in the upcoming math library, I found that I needed a function that sorts exactly three numbers. This kind of code is annoying to write and to get right. But it comes up rarely enough, and it seems simple enough, that I’ve never felt like making a library function for it.

But what if I wrote a macro that generated code to sort n numbers very quickly, where n is known at expansion time, but the numbers themselves aren’t? I think I could justify putting that in a library.

Here’s code that correctly sorts three numbers a, b and c:

(if (< b c)
    (if (< a b)
        (values a b c)
        (if (< a c)
            (values b a c)
            (values b c a)))
    (if (< a c)
        (values a c b)
        (if (< a b)
            (values c a b)
            (values c b a))))

It’s an if tree. Notice that there are 6 leaf expressions, for the 3! = 6 possible permutations. Also, it never compares more than it has to. It’s optimal.

The optimality came from my reasoning about transitivity. For example, only two comparisons are needed before returning (values a b c). I knew that both (< a b) and (< b c), so (< a b c) must be true by transitivity.

It would be nice if the macro generated optimal code by explicitly reasoning about transitivity, or as an emergent property of the sorting algorithm it uses.

We’ll write a macro that does the latter, by generating a fully inlined merge sort.

[Edit: The final inline sort macro is here.]

Runtime Merge Sort

Start with a simple, runtime merge sort:

(define (merge as bs)
  (match* (as bs)
    [((list) bs)  bs]
    [(as (list))  as]
    [((list a as ...) (list b bs ...))
     (if (< a b)
         (cons a (merge as (cons b bs)))
         (cons b (merge (cons a as) bs)))]))

(define (merge-sort vs)
  (match vs
    [(list)  vs]
    [(list a)  vs]
    [_  (define-values (lvs rvs)
          (split-at vs (quotient (length vs) 2)))
        (merge (merge-sort lvs) (merge-sort rvs))]))

Example:

> (merge-sort '(5 1 2 5 10))
'(1 2 5 5 10)

To make a macro out of merge sort, we need to change two things. The most obvious is that it has to return syntax for an if instead of evaluating it. That’s easy for a novice macrologist: change the functions to operate on syntax, stick a syntax-quasiquote in front of the if, and unquote the bits inside that get evaluated at expansion time.

But if we did only that, we’d end up with expanded code like this:

(if (< a b)
    (cons a (if ...))
    (cons b (if ...)))

It would be slow because cons allocates. We want the code to be fast.

So the other change is to move the conses inside the ifs, and evaluate them at expansion time. We can then construct a values expression out of the resulting list.

Accumulator-Passing Style Won’t Work

Novice functional programmers should know that accumulator-passing style (APS) moves conses inward. For example, this “add 1 to each element” function:

(define (list-add1 vs)
  (match vs
    [(list)  (list)]
    [(list v vs ...)
     (cons (add1 v) (list-add1 vs))]))

becomes this after conversion to APS:

(define (list-add1/acc vs [acc (list)])
  (match vs
    [(list)  (reverse acc)]
    [(list v vs ...)
     (list-add1/acc vs (cons (add1 v) acc))]))

Now cons is where we want it: inside the recursive call, instead of in tail position. The problem is that APS doesn’t work on tree-shaped recursion.

Continuation-Passing Style Does Work

In continuation-passing style (CPS), we pass a continuation k—i.e. “what happens next”— instead of an accumulator. The functions call k instead of returning values. For example:

(define (list-add1/k vs [k identity])
  (match vs
    [(list)  (k (list))]
    [(list v vs ...)
     (list-add1/k vs (λ (vs) (k (cons (add1 v) vs))))]))

If we want, we can pass something besides identity as the base-case continuation:

> (list-add1/k '(1 2 3 4) (λ (vs) (apply values vs)))
2
3
4
5

CPS turns every call into a tail call, so it moves conses inward even with tree-shaped recursion. As a demonstration, here’s a CPSed merge sort:

(define (merge/k as bs k)
  (match* (as bs)
    [((list) bs)  (k bs)]
    [(as (list))  (k as)]
    [((list a as ...) (list b bs ...))
     (if (< a b)
         (merge/k as (cons b bs) (λ (vs) (k (cons a vs))))
         (merge/k (cons a as) bs (λ (vs) (k (cons b vs)))))]))

(define (merge-sort/k vs k)
  (match vs
    [(list)  (k vs)]
    [(list a)  (k vs)]
    [_  (define-values (lvs rvs)
          (split-at vs (quotient (length vs) 2)))
        (merge-sort/k
         lvs (λ (lvs)
               (merge-sort/k
                rvs (λ (rvs)
                      (merge/k lvs rvs k)))))]))

You can read the last expression in merge-sort/k as, “Sort lvs. Then, with the sorted lvs, sort rvs. Then, with the sorted rvs, merge lvs and rvs.” Example:

> (merge-sort/k '(5 2 3 1) (λ (vs) (apply values vs)))
1
2
3
5

The Inline Sort Macro

When we macro-ize the CPSed merge sort, we’ll turn the continuations into expansion-time functions. So not only will macro-ized CPS move conses inward, it’ll apply them all at expansion time!

We’ll do it in three parts: the user-facing macro, the inline merge function, and the inline sort function.

The User-Facing Macro

Let’s put a nice face on inline sorting:

(define-syntax (inline-sort stx)
  (syntax-case stx ()
    [(_ lst ...)
     (with-syntax ([(vs ...)
                    (generate-temporaries #'(lst ...))])
       #`(let ([vs lst] ...)
           #,(inline-sort/k #'(vs ...)
                            (λ (vs) #`(values #,@vs)))))]))

This macro does two things. First, it names the values to be sorted, so they don’t get re-evaluated every time they’re compared. Second, it calls inline-sort/k with a base-case continuation that converts syntax lists to values expressions.

Note that the call #,(inline-sort/k …) happens at expansion time, and that the continuation (λ (vs) …) it passes is an expansion-time value.

The Inline Merge Function

Changing merge/k to operate on syntax at expansion time is as straightforward as possible:

(define-for-syntax (inline-merge/k as bs k)
  (syntax-case (list as bs) ()
    [(() bs)  (k #'bs)]
    [(as ())  (k #'as)]
    [((a as ...) (b bs ...))
     #`(if (< a b)
           #,(inline-merge/k #'(as ...) #'(b bs ...)
                             (λ (vs) (k (cons #'a vs))))
           #,(inline-merge/k #'(a as ...) #'(bs ...)
                             (λ (vs) (k (cons #'b vs)))))]))

The only substantial changes are the quasiquotes and unquotes, and using syntax-case to destructure syntax lists instead of using match to destructure lists. Again, note that the recursive calls #,(inline-merge/k …) in each if branch happen at expansion time, and that their continuations are expansion-time values.

We can see what kind of code inline-merge/k returns by applying it at expansion time:

> (begin-for-syntax
    (print
     (syntax->datum
      (inline-merge/k #'(a) #'(b c) (λ (vs) vs)))))
'(if (< a b) (a b c) (if (< a c) (b a c) (b c a)))

The syntax list #'(b c) is assumed already sorted, meaning (< b c) at runtime. Therefore, if (< a b) at runtime, by transitivity, (< a b c) at runtime, so the merge generates #'(a b c). In other words, inline-merge/k’s assumption that its arguments are sorted is equivalent to reasoning about transitivity.

The Inline Sort Function

Lastly, the divide-and-conquer part:

(require (for-syntax racket/list))  ; for list functions

(define-for-syntax (inline-sort/k vs k)
  (syntax-case vs ()
    [()  (k vs)]
    [(a)  (k vs)]
    [_  (let ([vs  (if (list? vs) vs (syntax->list vs))])
          (define-values (lvs rvs)
            (split-at vs (quotient (length vs) 2)))
          (inline-sort/k
           lvs (λ (lvs)
                 (inline-sort/k
                  rvs (λ (rvs)
                        (inline-merge/k lvs rvs k))))))]))

This is changed similarly to merge/k. The only new change is using syntax->list to convert syntax to lists so we can use the functions length and split-at. Example:

> (inline-sort 5 2 3 1)
1
2
3
5

Of course, the result of evaluating an inline-sort doesn’t tell the whole story. Let’s fire up the macro stepper and see what (inline-sort 5 2 3) expands to. Copying from the macro stepper window and renaming temp variables, we get

(let ([a 5] [b 2] [c 3])
  (if (< b c)
      (if (< a b)
          (values a b c)
          (if (< a c)
              (values b a c)
              (values b c a)))
      (if (< a c)
          (values a c b)
          (if (< a b)
              (values c a b)
              (values c b a)))))

It’s an if tree. Notice that there are 6 leaf expressions, for the 3! = 6 possible permutations. Also, it never compares more than it has to. It’s optimal.

Inline Sort Properties

Inherited from the merge sort, the inline sort has the following properties (assuming a length-n list):

  • Time optimality: The depth of the if tree is O(n log(n)).

  • Size optimality: The number of leaves is exactly n!.

The term “size optimality” is misleading, because that’s still a lot of code. I’ve seriously considered requiring any user of this macro to state how many permutations there are for the number of values they’re sorting. They’d have to prove to the macro that they know how much code they’re asking it to generate.

Inline sort is wicked fast, as we should expect:

> (define vs '(5 2 3 1))
> (match-define (list a b c d) vs)
> (for ([_  (in-range 5)])
    (time (for ([_  (in-range 1000000)])
            (match-let ([(list a b c d)  (sort vs <)])
              (void)))))
cpu time: 550 real time: 540 gc time: 30
cpu time: 510 real time: 518 gc time: 0
cpu time: 520 real time: 517 gc time: 20
cpu time: 520 real time: 517 gc time: 0
cpu time: 510 real time: 516 gc time: 10
> (for ([_  (in-range 5)])
    (time (for ([_  (in-range 1000000)])
            (let-values ([(a b c d)  (inline-sort a b c d)])
              (void)))))
cpu time: 20 real time: 28 gc time: 0
cpu time: 30 real time: 27 gc time: 0
cpu time: 30 real time: 27 gc time: 0
cpu time: 30 real time: 27 gc time: 0
cpu time: 20 real time: 27 gc time: 0

So about 20 times faster than sort on a length-4 list.

I use it in Typed Racket, on floating-point numbers. Typed Racket’s optimizer replaces < with unsafe-fl<. This tells the compiler that the elements are floats, so it can keep them stack-allocated, which reduces allocations further. In all, for my particular use of inline-sort, it’s over 50 times faster than sort.

And using it is a heckuvalot easier than writing an if tree and reasoning about transitivity every time I need to sort a handful of floats.

Conclusion

Writing macros in expansion-time CPS to fully inline a recursive function works out beautifully. I suspect that it will work on any recursive, polymorphic function whose well-foundedness follows only from the structure of the input data.

Also, it can generate a metric truckload of code.

Interesting note: I originally wrote inline-sort using only syntax-rules, passing the names of higher-order macros to other macros as continuations. Sorting a five-element list took almost 19000 expansion steps, which is ridiculously inefficient even for a 120-leaf if tree.

more →

07 Aug 2012

Racket v5.3

posted by Eli Barzilay

Racket version 5.3 is now available from

http://racket-lang.org/

Release Highlights:

  • Submodules are nested module declarations that can be loaded and run independently from the enclosing module. See also the overview of submodules.

  • The futures visualizer is a graphical profiling tool for parallel programs using futures. The tool shows a detailed execution timeline depicting the migration of futures between threads, and gives detailed information about each runtime synchronization that occurred during program execution. In addition, would-be-future is a special type of future that always executes sequentially and records all potential barricades a regular future would encounter.

  • Optimization Coach (formerly Performance Report) reports information about Racket’s inlining optimizations. Optimization Coach can be launched in any language through the View menu.

  • The new images/flomap library defines floating-point bitmaps and fast image processing operations on them. It is written in Typed Racket, so Typed Racket code may use it without the cost of contract checks.

  • The new json library supports parsing and generating JSON. (Originally based on Dave Herman’s planet library.)

  • racket/string is extended with a set of simplified string manipulation functions that are more convenient than using regexps. regexp-match* and friends can now be used with new keyword arguments to return specific matched regexp groups and the gaps between matches.

  • The new racket/generic library allows generic function definitions, which dispatch to methods added to a structure type via the new #:methods keyword.

  • The class form supports declaring a method abstract. An abstract method prevents a class from being instantiated unless it is overridden.

  • The contract library comes with support for interfaces, generics, prompts, continuation-marks, and structs.

  • Most error messages use a new multi-line format that is more consistent with contract errors and accommodates more information.

  • Typed Racket supports function definitions with keyword arguments; the startup time of Typed Racket programs has been sharply reduced.

  • The new ffi/com library replaces MysterX; a compatibility mysterx library remains, but without ActiveX support. The new ffi/unsafe/com library offers a more primitive and direct way to use COM classes and methods.

  • There is now a very complete completion script for zsh. It is not included in the distribution, though; get it at http://goo.gl/DU8JK. (This script and the bash completions will be included in the standard installers in future versions.)
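
As a sketch of what the new racket/generic library mentioned above enables (this example is mine; only define-generics and the #:methods keyword come from the release notes):

```racket
#lang racket/base
(require racket/generic)

;; Define a generic interface with one method, area.
(define-generics shape
  (area shape))

;; Each struct supplies its own method via #:methods.
(struct circle (r)
  #:methods gen:shape
  [(define (area s) (* 3.141592653589793 (circle-r s) (circle-r s)))])

(struct square (side)
  #:methods gen:shape
  [(define (area s) (* (square-side s) (square-side s)))])

(area (square 3)) ; => 9
```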

Deprecation: Effective this release:

  • The tex2page and combinator-parser libraries have been moved from the Racket distribution to PLaneT:

    (require (planet plt/tex2page))
    (require (planet plt/combinator-parser))

  • The following has been deprecated and will be removed in the January 2013 release:

the planet command-line tool; use raco planet instead.

  • The following has been deprecated and will be removed in the August 2013 release:

the mzlib/class100 library; use racket/class instead.

more →

03 Jun 2012

Submodules

posted by Matthew Flatt

A Racket submodule is a module that is syntactically nested within another module. Submodules will be supported in the next release of Racket, and they are available in the current pre-release version.

Submodules provide nested namespaces, but that kind of nesting is already available through forms like define-package. The power of submodules is that they can be separately loaded and separately run relative to their enclosing modules, in the same way that top-level modules can be separately loaded and run. This separation of dependencies means that submodules can be used to add code and information to modules—such as tests, documentation, and parsing information—that is loaded only when specifically requested.

The main Submodule

One use of a submodule is to declare a main submodule. A main submodule is instantiated when the enclosing module is run as the main program, but not when the enclosing module is used as a library.

"fish.rkt"

#lang racket/base
(provide fish)
(define fish '(1 2))
(module+ main
  (map displayln fish))

"sum-fish.rkt"

#lang racket/base
(require "fish.rkt")
(apply + fish)

The "fish.rkt" module exports fish as a list of numbers. Running "sum-fish.rkt", which imports "fish.rkt", prints the sum of the numbers. Running "fish.rkt" directly, however, triggers the instantiation of the main submodule within "fish.rkt", which displays each number in fish on its own line.

A (module+ main ....) declaration is similar to the Python if __name__ == "__main__": idiom, but with a significant difference. Importing "fish.rkt" into another module ignores the main submodule completely, so that the main submodule’s code and its dependencies aren’t loaded.

Unit Tests

Another use for submodules—and one where independent loading matters more than for "fish.rkt"’s main—is for test suites. A main submodule could be used for tests, so that running the module runs its tests, but our preferred convention is to declare a test submodule:

"fish2.rkt"

#lang racket/base
(provide fish)
(define fish '(1 2))
(module+ test
  (require rackunit)
  (check andmap number? fish))

The new raco test shell command runs the test submodule of a given module, so that raco test fish2.rkt checks that all the values of the fish list are numbers. The test submodule imports rackunit for its check form, but that import does not create a dependency on rackunit (which is a substantial library) for modules that import "fish2.rkt"; the dependency is only for the test submodule.

The module+ form creates a dependency of the submodule on the enclosing module, since it implicitly imports all bindings of its enclosing module. The implicit import explains why the test submodule in "fish2.rkt" can use fish directly (i.e., it’s not simply because the submodule is syntactically nested). The implicit import includes all bindings from the enclosing module, including bindings that are not exported via provide, which supports unit tests for unexported functions.
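For instance (a hypothetical sketch, not from the original post), a test submodule can exercise a helper that the enclosing module never provides:

#lang racket/base
(provide feed)
(define (scale n) (* n 2))        ; private helper, not exported
(define (feed n) (scale (+ n 1)))
(module+ test
  (require rackunit)
  ;; scale is visible here via module+'s implicit import of the
  ;; enclosing module, even though it is not in any provide
  (check-equal? (scale 3) 6)
  (check-equal? (feed 2) 6))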

Finally, the module+ form splices together multiple declarations of a particular submodule, which is useful for interleaving definitions and tests:

"fish3.rkt"

#lang racket/base
(provide fish feed)
(module+ test (require rackunit))
(define fish '(1 2))
(module+ test (check andmap number? fish))
(define (feed n) (+ n 1))
(module+ test (check-equal? 3 (feed 2)))

Since tests are isolated to a submodule, it might make sense to “strip” tests from a set of modules to prepare them for distribution to end-users. Although we haven’t created the raco strip command yet, it’s a likely future addition. In that way, submodules act like sections in an object-linking file format such as ELF.

Core Submodule Forms

The module+ form is actually just a macro that expands to a more primitive form for declaring submodules. The primitive submodule forms are module and module*, which reflect the two different directions that module dependencies can run: the module* form allows the submodule to import its enclosing module, while the module form allows the enclosing module to import the submodule.

As a minor feature, submodules can be declared with module and used by a require—essentially the same within a module as interactively:

#lang racket/base
(module zoo racket/base
  (provide tiger)
  (define tiger "Tony"))
(require 'zoo)
tiger

More significantly, module allows a submodule to be free of any dependency on its enclosing module, while the enclosing module similarly has no obligation to import the submodule.

The module* form similarly implies no a priori dependency of the submodule on its enclosing module, except that a #f for the submodule’s initial import means an import of all of the enclosing module’s bindings. The module+ form expands (after collecting all pieces of a submodule’s body) to a module* form with a #f initial import.
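As a small sketch (not from the original post), a module* submodule declared with #f as its initial import sees all of the enclosing module’s bindings, much as a spliced module+ does:

#lang racket/base
(define fish '(1 2))
(module* main #f
  ;; #f as the initial import means this submodule implicitly
  ;; imports every binding of its enclosing module
  (map displayln fish))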

In-Source Documentation

A more interesting example is the scribble/srcdoc library, which supports documentation within a library’s source in a JavaDoc-like way:

"fish4.rkt"

#lang racket
(require scribble/srcdoc
         (for-doc racket/base scribble/manual))
(provide
 (thing-doc
  fish (listof number?)
  ("Our fish, each represented as a number.")))
(define fish '(1 2))
(provide
 (proc-doc/names
  feed (number? . -> . number?) (n)
  ("Feed 1 pound of food to the fish " (racket n) ".")))
(define (feed n) (+ n 1))

The scribble/srcdoc library provides thing-doc and proc-doc, which can be used instead of a plain provide form to attach both a contract and documentation to the exported binding. The contract is used at run time to guard uses of the value. The contract is also included in the documentation with hyperlinks to bindings that are used in the contract, such as number?.

In addition to provide forms, the scribble/srcdoc library provides for-doc for use within require. A for-doc imports forms that are used in the implementation of the documentation, as opposed to the implementation of the library. In "fish4.rkt", scribble/manual is imported for the racket form that is used in the documentation of feed.

These forms from scribble/srcdoc work together to construct a srcdoc submodule that contains documentation for the enclosing module without creating any documentation-related run-time overhead for the enclosing module. The module’s documentation is loaded from bytecode only when specifically requested from the srcdoc submodule for inclusion in a documentation build via include-extracted:

"fish4.scrbl"

#lang scribble/manual
@(require scribble/extract)
@title{Fish}
@defmodule["fish.rkt"]
@include-extracted["fish4.rkt"]

Implementing Languages

Top-level modules in Racket intentionally inherit no bindings from the top-level environment, so that (1) a module’s meaning is fixed independent of its load order or other effects, and (2) the initial import of a module can act as a “language” with complete control over the module’s meaning. That is, #lang is in principle the only top-level form in Racket. With only modules at the top level, however, macros cannot abstract over sets of top-level modules.

Submodules provide more flexibility, in that a macro defined within a module can abstract over a set of submodules. As it happens, abstracting over a set of submodules is useful for defining a new language for use with #lang.

A language for use with #lang is implemented by several pieces that live at different times, including the language’s parser, the language’s run-time support library, and the language’s syntax-coloring plug-in for DrRacket. Formerly, a programmer who implements a language with those three pieces was forced to write three different modules (or else tangle the different pieces in a single module, which invariably pulls too many dependencies into any one of them). Those pieces now can be in submodules, which opens the possibility for new abstractions that conveniently generate the various pieces of a language.

For example, if you want to define an ocean language that is racket/base plus fish, it’s enough to install the following module as "main.rkt" in an "ocean" collection (e.g., in an "ocean" directory that is registered as a collection with the command raco link ocean):

#lang racket/base
(provide (all-from-out racket/base)
         fish)
(define fish '(1 2 3))
(displayln "Fish are swimming")
(module reader syntax/module-reader
  #:language 'ocean)

When Racket sees a module that starts #lang ocean, it does not simply load the "main.rkt" module of the "ocean" collection. Instead, #lang looks for a reader submodule of the "main.rkt" module. The reader module above does not depend on its enclosing module, so that parsing a module in the ocean language does not trigger the “Fish are swimming” printout. Instead, the #:language 'ocean part of the reader submodule indicates that a module parsed from #lang ocean starts by importing the ocean module, so the bindings of ocean are available in the program, and “Fish are swimming” will print when the program is run.
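To illustrate (assuming the "ocean" collection above has been installed and registered), a program in the new language might look like the following; running it prints “Fish are swimming” from the language module before the program’s own output:

#lang ocean
;; fish is provided by the ocean language itself
(apply + fish)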

Submodules are Like…

At some level, syntactic nesting of modules is an obvious feature to include in a module system. Nevertheless, Racket’s submodules are not like nested modules in most languages—including Python, Chez, or ML—where nesting is for namespace management and nested modules are always instantiated along with the enclosing module. Racket submodules can be used in a similar way, but the fact that submodules are separately loadable makes them available to solve a larger class of problems.

If I had to pick just one analogy, I’d say that submodules are most like a generalization of annotations in the Java sense. Java annotations allow the decoration of code with metadata, and the annotations are preserved through run time, so that annotations can be inspected in source, in compiled code, or reflectively at run time. Java annotations are limited to data, so that any abstraction or programmatic interpretation of the data depends on yet another external tool and language, or else the code part (such as the test to run for a @Test annotation) is difficult to separate from the main program. C# attributes are slightly more general, in that the data can have associated methods, but attribute code is still mingled with run-time code. Submodules generalize annotations to make them “live,” so that the language of annotations can include expressions, functions, and even syntactic extensions, while allowing the annotation/submodule code to stay separate from the base code.

For more information on submodules, see the pre-release Guide section.

more →

02 Apr 2012

Scribble Your Blogs

posted by Ryan Culpepper

Scribble is a great language for writing documentation. Now it’s a great language for writing blog posts, too. I’ve just released a tool called Scriblogify that compiles Scribble documents and posts them as blog entries. Scriblogify is a more polished and automated version of the scripts I’ve been using for several months to prepare posts for my own blog.

To get Scriblogify, just download it from PLaneT:

(require (planet ryanc/scriblogify:1))

or

raco planet install ryanc scriblogify.plt 1 0

The package automatically installs a raco subcommand (raco scriblogify) that can be used to configure Scriblogify and process and upload blog posts.

Configure Scriblogify by running

raco scriblogify --setup

That will open a browser window with the Scriblogify configuration servlet. The servlet will prompt you to authorize Scriblogify to access your Blogger and Picasa Web Albums accounts (only the Blogger/Picasa combination is currently supported) and then create one or more profiles—named combinations of blogs and web albums to upload to.

Scriblogify automatically handles images computed in your Scribble documents by uploading them to a web album. For example, here are some images computed with the slideshow/pict library:

> (require slideshow/pict)
> (define rainbow-colors
    '("red" "orange" "yellow" "green" "blue" "purple"))
> (for/list ([c rainbow-colors])
    (colorize (filled-rounded-rectangle 20 20) c))
'(image image image image image image)

> (for/list ([c rainbow-colors]
             [dir (in-cycle '(right left))])
    (standard-fish 25 25 #:color c #:direction dir))
'(image image image image image image)

> (cc-superimpose
   (cc-superimpose (cloud 100 80 "lightblue")
                   (cloud 90 70 "white"))
   (hc-append 10
    (standard-fish 30 30 #:color "red" #:direction 'right)
    (standard-fish 25 20 #:color "blue" #:direction 'left)))
image

By Scribbling your blog entries, you get Scribble’s nice code formatting, colorizing, and documentation links for free—well, once you’ve updated your blog’s CSS (see below). If you’re blogging about bleeding-edge work, there’s an option to make Scriblogify link to the nightly build docs (updated daily) instead of the release docs (updated every 3 months).

Scriblogify’s documentation has more details, including how to update your blog’s CSS for Scribbled content and what bloggable Scribble documents look like. You can see the source for this blog post here. This blog entry was created with the following command:

raco scriblogify -p the-racket-blog scribble-your-blogs.scrbl

Now go forth and Scribble your blogs.

more →

26 Mar 2012

Fixed Racket v5.2.1 Installers

posted by Eli Barzilay

Dear Racketeers,

We have just released a DrRacket version 5.2.1 that starts fine today. The fixed version has replaced the 5.2.1 installers. This version and the original 5.2.1 differ only in this one fix.

more →

02 Feb 2012

Racket v5.2.1

posted by Eli Barzilay

Racket version 5.2.1 is now available from http://racket-lang.org/

Release Highlights:

  • Performance improvements include the use of epoll()/kqueue() instead of select() for the Racket thread scheduler, cross-module inlining of small functions, and the use of SSE instead of x87 for JIT-compiled floating-point operations on platforms where SSE is always available (including x86_64 platforms). A related change is the interning of literal numbers, strings, byte strings, characters, and regexps that appear in code and syntax objects.

  • DrRacket uses a set of composable ray-traced icons available from the new images library collection.

  • Typed Racket’s typecheck-fail form allows macro creators to customize the error messages that Typed Racket produces. This is especially useful when creating pattern matching macros.

  • The performance of Redex’s matcher has been substantially improved; depending on the model you should see improvements between 2x and 50x in the time it takes to reduce terms.

  • Plots look nicer and are more correct at very small and very large scales. New features include customizable dual axis ticks and transforms (e.g., log axes, date and currency ticks, axis interval collapse and stretch), stacked histograms, and 3D vector fields. The legacy fit function and libfit have been removed.

  • The 2htdp/universe library’s big-bang form supports an experimental game pad key handler.

  • The db library now supports nested transactions and PostgreSQL arrays. Bugs involving MySQL authentication and memory corruption in the SQLite bindings have been fixed.

  • The Macro Stepper tool in DrRacket no longer executes a program after expanding it.

  • In the DMdA teaching languages, infinite recursive signatures (“streams”, for example) with no intervening mixed are now supported, and the signatures of record definitions without fields now have generators for use with property.

  • MysterX’s ActiveX support is deprecated and will be removed in the next release. MysterX’s core COM functionality will become deprecated in the next release, but COM functionality will be supported for the foreseeable future as a compatibility layer over a forthcoming ffi/com library.

more →

01 Feb 2012

Zack Galler’s Experience with Stateful vs Stateless Web Apps

posted by Jay McCarthy

Communication using HTTP between client and server is a simple problem of halted computation. A client computes a request, transmits and halts, waiting for a server response. On receipt, the server computes a response, transmits and halts, waiting for the next client request. This much is well known. Racket’s magnificent stateful Web server does three things on the server side:

  • it reifies a Racket continuation, capturing where the server computation has halted.

  • it externalizes the continuation, creating a URL-representation that uniquely maps to the Racket continuation

  • it disseminates the externalized continuation to interested clients, typically via HTTP response, but alternately via SMTP or any other protocol.

Then, it waits. Later, when presented with an externalized continuation, a quick inverse mapping occurs, the underlying Racket continuation is invoked, and the server processes the new client request. Rinse and repeat. The problem with this approach is twofold:

  • the reified Racket continuations live in server memory. And there’s no safe way to garbage collect them, as the continuations could be invoked at any time. There are strategies to reclaim memory, but some load level will noticeably decrease the performance of your application. And it’s not possible to figure out what that load level is prior to finishing your application. This is a problem.

  • Again, the reified Racket continuations live in server memory and cannot be moved. So there’s no way to scale an application to more than one server. It’s necessarily a one-machine system. This makes problem #1 worse.

Racket’s yet more magnificent stateless Web server does exactly the same three things:

  • to reify, it rewrites the entire call stack into a format known as A-Normal Form (ANF).

  • to externalize, the ANF’d stack is encoded for transmission over HTTP.

  • and then it’s sent over to the client (dissemination).

Later, when presented with the encoded stack, the stateless server performs an inverse transform to reconstruct the call stack, at which point the server keeps going. So we’ve lost the invocation step and substituted a reconstruction. But in exchange, we’ve eliminated continuations from server memory, and solved both enumerated problems above. Neat trick.

I provide a few lessons learned for the archives for the next person to attempt porting #lang racket to #lang web-server code. First, the predicate serializable? from racket/serialize is invaluable. The #lang web-server code will not transform if there are non-serializable constructs in the dynamic extent of the invocation of send/suspend, such as a local binding or argument. Second, invocations of native continuations reified with call/cc frequently throw errors related to continuation prompts, such as “attempt to cross a continuation barrier” or “no corresponding prompt tag in continuation”. In all cases, I was able to remedy the situation by enclosing the invocation in call-with-continuation-prompt. This may be an error in the system, but it is unclear at this time. Third, the transformation does not allow parameters or dynamic-wind, because the internal data-structures representing them are not serializable, but continuation-marks can be used to reimplement the piece of the functionality you need.
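The second remedy can be sketched as follows (a hypothetical illustration in plain Racket, not Galler’s actual code): wrapping the invocation of a continuation reified with call/cc in call-with-continuation-prompt supplies the prompt that the bare invocation would otherwise fail to find:

#lang racket/base
(define saved-k #f)
(define (capture!)
  (call/cc (lambda (k) (set! saved-k k) 'first)))
(capture!)
;; In #lang web-server contexts, invoking saved-k directly can fail
;; with "attempt to cross a continuation barrier"; wrapping the
;; invocation in a fresh prompt remedies this:
(when saved-k
  (call-with-continuation-prompt
   (lambda ()
     (let ([k saved-k])
       (set! saved-k #f)  ; clear first, so the re-entered code stops
       (k 'again)))))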

Finally, thank you to the Racket team. I think the stateless Web language is important technology and must have required an enormous amount of work to implement. Anecdotally, application speed seems at or better than the stateful code. To learn more about the stateless Web application infrastructure, consult the manual or post to the mailing list. (This post was written by Zack Galler with minor edits before posting by Jay McCarthy.)

more →

Made with Frog, a static-blog generator written in Racket.
Source code for this blog.