23 Feb 2008

Dirty Looking Hygiene

posted by Eli Barzilay

With the recent release of Arc, there has been some discussion over hygienic macros. Yes, hygienic macros are usually very convenient, but they can become messy in some ‘corner’ cases. People who learn about macros in Scheme usually start with syntax-rules, and being the limited tool that it is, they often get the impression that for advanced uses (like a macro that captures an identifier) you need to use syntax-case which is this “really obscure thing”.

For example, say that we want to implement an if form that is similar to Arc’s if. This is pretty easy using syntax-rules:

(define-syntax if*
    (syntax-rules ()
      [(if*) (void)]
      [(if* X) X]
      [(if* C X more ...) (if C X (if* more ...))]))

But more important than being easy to write: it is also easy to read. In fact, the nice thing about syntax-rules is that you write more or less the specification of your transformation. Compare this to the specification of Arc’s if, which appears in a comment before the definition of ac-if in "ac.scm":

  ; (if) -> nil
  ; (if x) -> x
  ; (if t a ...) -> a
  ; (if nil a b) -> b
  ; (if nil a b c) -> (if b c)

(Except that the comment mixes up the syntactic specification and the semantic evaluation.)

As a side note, now that we have this definition, it is easy to construct a new language that is just like MzScheme, except for its if that behaves like the above:

(module arc-like mzscheme
    (define-syntax if* ..._the above definition_...)
    (provide (all-from-except mzscheme if)
             (rename if* if)))

You can now write code that uses "arc-like.scm" as its language, using the new if. There is no problem in accommodating two languages with two different if’s: the new form is compiled to the old one, and there is no confusion in which version you use in any module.

Back to the macro issue: as I said above, you run into problems if you want to capture names, right? For example, if you want to implement Arc’s aif. The usual syntax-case solution is to construct an identifier that has the lexical context of the input syntax. It’s easy to abstract over all this — I posted a message on the Arc forum showing how to define a defmac macro that has the simplicity of syntax-rules with the added convenience of specifying keywords and captured names. This works for some cases, but there are still some subtle corner cases.

But there’s a better solution in PLT Scheme, one that follows Paul Graham’s intuition when he says: “captured symbols are kind of freaky.” The basic idea is a change of perspective: instead of (unhygienically) binding individual occurrences of it whenever aif is used, you define it once as a thing in its own right — a special context-dependent piece of syntax. Outside of an aif form, it has no meaning: we simply make it throw a syntax error. Uses of aif provide a meaning for it by locally changing its meaning (its expansion) to something useful: the binding that holds the result of evaluating the condition expression. (“Locally” means within a piece of syntax, so the new meaning is valid in a lexical-scope.)

In PLT Scheme, the “special context-dependent piece of syntax” are syntax parameters, and you change them locally with syntax-parameterize.

To continue the above example, here’s how we make our if* anaphoric:

require the (lib "stxparam.ss" "mzlib") library,
define it as a syntax using define-syntax-parameter, and have it raise an error by default,
bind a temporary variable to the result of evaluating the condition,
wrap the positive branch with syntax-parameterize, using make-rename-transformer, which is a convenient way to make a macro that behaves like the new variable.

The implementation looks like this:

  (require (lib "stxparam.ss" "mzlib"))
  (define-syntax-parameter it
    (lambda (stx)
      (raise-syntax-error #f "can only be used inside `if'" stx)))
  (define-syntax if*
    (syntax-rules ()
      [(if*) (void)]
      [(if* X) X]
      [(if* C X more ...)
       (let ([b C])
         (if b
           (syntax-parameterize ([it (make-rename-transformer #'b)]) X)
           (if* more ...)))]))

The resulting macro does not break hygiene. For example, (let ([it 3]) (if #t it)) evaluates to 3, because it shadows the global it that if changes. This is a change from a real unhygienic macro — but that’s the whole point: we (the macro author) do not interfere with scopes in the user code.

Note that (if 1 (if 2 it)) still evaluates to 2, because the outer if does not really bind it — it is not captured, just changed locally — so the inner if changes it again. Also, (if #f it it) raises the usual context error, since our macro changes it only in the positive branch.

14 Jan 2008

A Privacy Flaw, Thwarted

posted by Shriram Krishnamurthi

My student Brendan Hickey recently identified the following security hole.

A university (let’s call it Orange University) wants to let its graduating students vote on their graduation speaker. They used to do it by paper; catching up with the times, they now do it on the Web.

They used to have a box into which you could type the name of your nominee. But that is surely problematic: people misspell names, you have to argue about how to count ambiguous votes, someone will vote for their pet bonobo, etc. Better (perhaps) to give them the names of all the students and let them choose. [Alert: if Orange U adopts a simple naming scheme for email addresses, a student can immediately screen-scrape a pretty plum list to sell a spammer. Brendan and I noticed this in a femtosecond; I don’t know why this didn’t occur to the university.]

Anyway, now you have a Web page where people are going to choose, and the software that processes the responses must distinguish between these choices. You have to associate a key with each student. You’ve already got a key for each candidate: their student ID number. So you use that as your key. Now anyone viewing the page source can immediately see which student ID number goes with which student name. So much for confidentiality.

Whoops.

I’d like to point out that Pete Hopkins’s send/suspend/dispatch, and the improved version of it by Jay McCarthy, identify and solve just this code structuring problem in a way that the privacy leak can never occur. For the most up-to-date presentation of it, read section 3.2 and section 4 of our paper.

Maybe Orange U should be using Scheme.

29 Dec 2007

The Design of Extended Exercises

posted by Shriram Krishnamurthi

One of the highlights of the TeachScheme! method is to create Extended Exercises. Several of these pepper How to Design Programs, and even more have been created since to deal with a variety of interesting problem scenarios (e.g., illustrating graphics via t-shirt design, explaining networking by having machines play roles in a theatrical play, demonstrating communication with foreign sites by processing data from a microfinance institution, etc). Through an Extended Exercise a student learns about how computer science connects to domains, develops practice building programs incrementally, learns to build earlier assignments that later assignments can depend on, and so forth.

Here is a preliminary articulation of some principles that I think govern a good Extended Exercise, with an emphasis on their “form factor”.

Pick a domain. Whether the domain looks inward (a computing activity such as networking) or outward (such as social networking) doesn’t matter. If it does look inward, try to make it more applicable through the judicious use of data (the same exercise can look very dry or very applied depending on what data you choose). For instance, our networking exercises is presented in terms of Shakespeare’s Hamlet.
If necessary, write a Teachpack. A domain almost certainly needs a Teachpack to reduce the programming burden on students. For instance, the microfinance exercise uses a Teachpack to hide the ugly details of screen-scraping (which in turn need to be constantly updated by a vigilant maintainer).
Try to provide a non-trivial dataset in the Teachpack. Good data can make an assignment more enjoyable—e.g., our networking example provides an excerpt from Act 2, Scene 2 of Hamlet (“What a piece of work is man!”)—and in cases where the exercise depends on connecting to an external site (e.g., the microfinance example), the data may be essential.
Structure the assignment to have five to eight questions: not much fewer (too few steps) nor much more (too much to grasp).
Try to decompose the problem using good principles of stepwise refinement, using your own wisdom in these matters. By showing students several such examples, we hope for them to build up an intuition for the process. Your decomposition may not be strictly linear; that’s okay. But it should be progressive.
Perhaps the most important point: At every step, try to have a full, working application. That means the Teachpack may need to export several interfaces, each one taking more parameters and accepting more functionality than the previous one. Otherwise the student needs to have all the parts working before they can understand whether even one works in context, leading to a frustrating learning experience and encouraging wanton hacking as they try (and invariably fail) to quickly get to a working system.
Design interfaces carefully to make judicious use of first-class functions. It is inevitable that students will need to provide functions (not just flat data) to what your Teachpack exports. Show them the invocations of your Teachpack in terms of named functions (that they define).
Try to provide a few extra-credit routes for ambitious students. Options include letting students peel back even more of the Teachpack, or adding interesting features.

29 Dec 2007

PLT Scheme v372

posted by Eli Barzilay

PLT Scheme version 372 is now available from http://download.plt-scheme.org/

This is mostly a bug-fix release. Changes:

DrScheme now supports name completion via Ctl-/ (Windows and X) or Cmd-/ (Mac OS X). Completion is sensitive to the current language in DrScheme, but it is not sensitive to lexical bindings.
DrScheme’s stepper now supports the “check-expect”, “check-within”, and “check-error” forms of the testing.ss teachpack.
A number of bug fixes and small improvements for ProfessorJ. The grammar for the current release slightly differs from the one in HtDC.Feedback Welcome.

19 Dec 2007

Your security hole is my fun hack, or: computing factorial in DrScheme with a click-powered loop.

posted by Robby Findler

One of the many changes in v4.0 is to close a security hole in DrScheme. Specifically, DrScheme v371 lets the program in the definitions window get a hold of the editor containing said program and manipulate it programmatically. There are lots of bad things one might do with this fact, like circumventing DrScheme’s protections and cause it to crash, or even spontaneously exit.

But, we can do something even more fun. Put the following program into a DrScheme window (in v371) and set the language to the mzscheme/textual language. Change “input” to whatever number you wish to compute the factorial of and then hit the Run button until your program transforms itself into the final result.

(define input 10)
(require (lib "mred.ss" "mred") (lib "class.ss"))
(let* ([ed (let-syntax ([m (λ (stx) (with-syntax ([x (syntax-source stx)]) #'x))])
             (m))]
       [mth (regexp-match 
             #rx"^; ([0-9]+) ([0-9]+)" 
             (send ed get-text 0 
                   (send ed paragraph-end-position 0)))]
       [lckd (send ed is-locked?)])
  (send ed begin-edit-sequence)
  (send ed lock #f)
  (if mth
      (let ([n (string->number (list-ref mth 1))]
            [acc (string->number (list-ref mth 2))])
        (send ed delete 0 (send ed paragraph-end-position 0))
        (if (= n 1)
            (begin (send ed delete 0 (send ed paragraph-end-position 0))
                   (send ed insert (format "~a\n#|" acc) 0)
                   (send ed insert "\n|#" (send ed last-position)))
            (begin (send ed delete 0 (send ed paragraph-end-position 0))
                   (send ed insert (format "; ~a ~a" (- n 1) (* n acc)) 0 0))))
      (send ed insert (format "; ~a 1\n" input) 0))
  (send ed lock lckd)
  (send ed end-edit-sequence))

12 Nov 2007

Getting rid of `set-car!` and `set-cdr!`

posted by Matthew Flatt

Functional is Beautiful

Scheme is a “mostly functional” language. Although Schemers don’t hesitate to use set! when mutation solves a problem best, Scheme programmers prefer to think functionally. Purely functional programs are easier to test, they make better and more reliable APIs, and our environments, compilers, and run-time systems take advantage of functional style.

A Schemer’s functional bias is especially strong when writing programs that process and produce lists. The map function, which does both, is a thing of beauty:

  (define (map f l)
   (cond
     [(null? l) '()]
     [else (cons (f (car l)) (map f (cdr l)))]))

The map function is most beautiful when the given f is functional. If f has side-effects, the the above implementation over-specifies map, which is traditionally allowed to process the list in any order that it wants (though PLT Scheme guarantees left-to-right order, as above). Arguably, when some other Schemer provides a non-functional f, then it’s their problem; they have to deal with the consequences (which may well be minor compared to some benefits of using mutation).

The map function might also receive a non-list, but the map implementor can guard against such misuse of map by wrapping it with a check,

  (define (checked-map f l)
    (if (list? l)
        (map f l)
        (error 'map "not a list")))

and then exporting checked-map instead of the raw map. This kind of checking gives nicer error messages, and it helps hide implementation details of map. We could further also imagine that the raw map is compiled without run-time checks on car and cdr.

The Problem with Mutable Pairs

What if someone calls checked-map like this?:

  (define l (list 1 2 3 4 5))
  (checked-map (lambda (x)
                 (set-cdr! (cddr l) 5))
               l)

The f provided to map in this case is not purely functional. Moreover, it uses mutation in a particularly unfortunate way: the list? test in checked-map succeeds, because the argument is initially a list, and the mutation is ultimately discovered by a call to cdr — but only if checks haven’t been disabled.

If you’re a Schemer, then unless you’ve seen this before, or unless you thought a bit about the title of this section, then you probably didn’t think of the above test case for map. A Schemer’s view of lists is so deeply functional that it’s hard to make this particular leap.

Furthermore, this example is not contrived. If you have either Chez Scheme version 6.1 or a pre–200 MzScheme sitting around, calling map as above leads to a seg fault or an invalid memory access:

  Chez Scheme Version 6.1
  Copyright (c) 1998 Cadence Research Systems

  > (define l (list 1 2 3 4 5))
  > (map (lambda (x) (set-cdr! (cddr l) 5)) l)

  Error: invalid memory reference.
  Some debugging context may have been lost.

The map example illustrates how mutable pairs can break a Schemer’s natural and ingrained model of programming. Of course, if optimizing and providing friendly error messages for map were the only issues with mutable pairs, then it wouldn’t matter; Scheme implementors are smart enough to (eventually) get this right. Unfortunately, the underlying problem is more pervasive.

In the API for a typical Scheme library, lists can be used for many kinds of input and output. Flags for options might be provided in a list. A function might provide information about the current configuration (e.g., the current items in a GUI list box) in a list. Procedures or methods that deal gracefully with list mutation are few and far between. In most cases, the result of unexpected mutation is merely a bad error message; sometimes, however, unexpected mutation of a list can break the library’s internal invariants. In the worst case, the library whose internal invariants are broken plays some role in a system’s overall security.

Mutable lists also interfere with the language’s extensibility. The PLT Scheme contract system, for example, offers a way to wrap an exported function with a contract that constrains its input and outputs, which are optionally (in principle) enforced by run-time checks. Higher-order contracts, such as “a list of functions that consume and produce numbers”, require wrappers on sub-pieces, and these wrappers can be installed only by copying the enclosing list. Copying a mutable list changes the semantics of a program, however, whereas contracts are supposed to enforce invariants without otherwise changing the program. Copying an immutable list creates no such problem.

Finally, mutable lists make the language’s specification messy. The R6RS editors spent considerable energy trying to pin down the exception-raising guarantees of map; the possibility of mutable pairs made it difficult to provide much of a guarantee. The standard says that implementations should check that the lists provided tomap are the same length, but it’s not worth much to require that check, since an argument’s length as a list can change via mutation to the list’s pairs.

Switching to Immutable Pairs

The designers of PLT Scheme long ago recognized the problems of mutable pairs, and we introduced functions like cons-immutable andlist-immutable to support programming with immutable lists. These additions solved some problems — but only in the cases where we were careful to use immutable lists. The R6RS editors also recognized the problems of mutable pairs, so that set-car! and set-cdr! were banished to their own library — but programmers are still free to use that library.

While these are worthwhile steps for many reasons, they do not solve the underlying problem. Library implementors who deal in lists must still either set up elaborate guards against mutation, pretend that the problem doesn’t matter, or require the use of a special immutable-list datatype that is incompatible with libraries whose authors set up elaborate guards or ignore the problem.

Why all this hassle? If most Scheme code really does use and expect pairs in a functional way, can’t we just switch to immutable pair? Most Scheme code will still work, untold security holes will have been closed, specifications will become instantly tighter, and language extensions like contracts will work better.

Schemers have been reluctant to make this leap, because it has never been clear just how much code relies on mutable pairs. We don’t know how much the switch will cost in porting time and long-term incompatibility, and we don’t really know how much we will gain. We won’t know until we try it.

For PLT Scheme v4.0, we’re going to try it. In our main dialects of Scheme (such as the mzscheme language), cons will create immutable pairs, and pair? and list? will recognize only immutable pairs and lists. The set-car! and set-cdr procedures will not exist. A new set of procedure mcons, mcar, mcdr, set-mcar!, and set-mcdr! will support mutable pairs. (A related v4.0 change is that define-struct by default creates immutable structure types.)

Of course, PLT Scheme v4.0 will support an R5RS language where cons is mcons, and so on, so many old programs can still run easily in the new version. The difference is that interoperability between R5RS libraries and PLT Scheme libraries will be less direct than before.

Experience So Far

PLT Scheme v3.99.0.2 exists already in a branch of our SVN repository, and it will soon move to the SVN trunk. That is, we have already ported at least a half million lines of Scheme code to a dialect without set-car! and set-cdr!.

The conversion took about eight hours. Obviously, relatively little code had to change. The following are the typical porting scenarios:

The reverse! and append! functions were frequently used for “linear updates” by performance-conscious implementors. As our underlying Scheme implementation has improved, however, the performance benefits of these functions has become less. All uses could be replaced with reverse and append.
The set-cdr! operation was often used to implement an internal queue. Such internal queues were easily changed to use mcons,mcar, mcdr, and set-mcdr!.
An association-list mapping was sometimes updated with set-cdr! when a mapping was present, otherwise the list was extended. Since the extension case was supported, it was easy to just update the list functionally. (The relevant lists were short; if the lists were long, the right change would be to use a hash table instead of a list.)
A pair was sometime used for an updatable mapping where a distinct structure type is better. The quick solution was to throw in a mutable box in place of the value.

The PLT Scheme code might be better positioned for the switch than arbitrary Scheme code. Most of it was written by a handful of people who understood the problems of mutable pairs, and who might therefore shy away from them. However, the PLT Scheme code base includes a lot of code that was not written specifically for PLT Scheme, including Slatex, Tex2page, and many SRFI reference implementations. With the exception of SRFI–9, which generalizes set! to work with pairs, the SRFI implementations were remarkably trouble free. (Thanks to Olin Shivers for making mutation optional in the “linear update” functions like reverse! from SRFIs 1 and 32.)

In addition, we looked at a number of standard Scheme benchmarks, which can be found here:

http://svn.plt-scheme.org/plt/trunk/collects/tests/mzscheme/benchmarks/common/

Of the 28 benchmarks, eight of them mutate pairs. Four of those are trivially converted to functional programs, along the lines of the scenarios above. One, destruct, is designed specifically to test mutation performance, so it makes no sense to port. Another, sort1, is a sorting algorithm that inherently relies on mutation; a functional sort is obviously possible, but that would be a different benchmark. The conform benchmark uses mutable pairs for tables in a relatively non-local way; as a modern Scheme program, it would probably be written with structures, but it’s not trivial to port. The peval benchmark uses pairs to represent Scheme programs, and it partially evaluates the program by mutating it, so it is not trivial to port. To summarize, out of 28 old, traditional benchmark programs, only two represent interesting programs that are not easily adapted to immutable pairs. (They run in PLT Scheme’s R5RS language, of course.)

Finally, we selected a useful third-party library that is not included with PLT Scheme. We checked the generic SSAX implementation (not the PLT Scheme version), and we found a couple of uses of set-car! and set-cdr!. Again, they fall into the above queue and association-list categories that are easily and locally converted.

Meanwhile, as we start to use v3.99 to run scripts in our day-to-day work, immutable pairs have so far created no difficulty at all. So far, then, our optimism in trying immutable pairs seems to be justified; it just might work.

But It’s Lisp Tradition!

A typical response to news of the demise of mutable pairs is that it will create lot of trouble, because mutable pairs are Scheme tradition, and surely lots of useful old code exploits them in lots of places.

We’re eager to hear whether anyone has such code. Our initial hypothesis is that practically all old code falls into one of two categories:

The code is easily ported to immutable pairs, along the same lines as above (i.e., local queues and small association lists).
The code so old and generic that it can be run as an R5RS program. It won’t call into the large PLT Scheme set of libraries that will expect immutable pairs, and it can easily be used as a library with wrappers that convert mutable pairs back and forth with immutable pairs.

Frankly, we’re not so eager to hear opinions based on guesswork about existing code and how it might get used. Download v3.99 from SVN or as a nightly build when it becomes available; let us know your guesses about how running your old code would go, but then let us know what actually happens.

The immutable-pairs plan for v4.0 is not set in stone, but we won’t make the decision based on guesswork. More libraries (other than R5RS) to aid compatibility may be useful, but so far we don’t have a tangible need for them. In any case, we’ll revert to mutable pairs only if significant experience with the pre-release version demonstrates that it really won’t work.

14 Sep 2007

Don’t say “abstract” (instead say “general”)

posted by John Clements

The word “abstract” is common in computer science. An abstract thing is one where some part of the whole is unspecified. For instance, the expression “3*x + 3” is an abstraction of the expression “3*4+3”, because the “x” is unspecified. Likewise, a function is an abstraction over some set of values, supplied when the function is called.

The word “general” is not at all common in computer science. In non-computer-science use, the word “general” is used to describe things that may be applied to more than one thing or situation. For instance, a “more general solution” is one that applies not just to the problem at hand, but instead to a larger set of problems.

From a computer science perspective, things that are abstract are also general. Things that are general are also abstract. Substituting the word “general” for the word “abstract” would not be a terrible hurdle.

From a non-computer-science perspective, however, “general” and “abstract” have very different implications. Something that is general is better: it is more useful, it applies more frequently. Something that is abstract, though, is worse: it is lacking detail, it is non-concrete.

This is one difference—the major difference?—between computer science (and of course mathematics) and the real world: the abstract is no less concrete. We can abstract over expressions using functions, and we can even abstract over syntactic things, using hygienic macros. The result of such abstraction is a perfectly well-defined element in our universe of expressions.

In computer science, then, the pejorative sense of the word “abstract” is misleading, and the use of the terms “abstract” and “abstraction” merely provides ammunition for those who wish that we could all still be writing assembly language.

I suggest instead the use of the word “general.”

John “purveyor of barbarous neologisms” Clements

09 Sep 2007

Completions in DrScheme (finally)

posted by Robby Findler

DrScheme now supports a language- sensitive (but not lexical- scope sensitive) completion feature. Type -/ and see what names are available to finish off the word you’re typing.

Thanks to Jacob (and do follow that link; we all need a little more love in our lives) and Mike for taking the initiative to actually implement what is probably the most requested feature in DrScheme at the moment.

06 Sep 2007

How many occurrences of car in the PLT source code?

posted by Robby Findler

Lets play a guessing game. See who can guess:

How many occurrences of the identifier car there are in the PLT tree (when using read and just counting the symbols that come out)?
Where does car rank on the list of the most commonly used identifiers?
What is the most common identifier, and how many occurrences of it are there?

UPDATE: The two files raw-hattori and raw-kajitani.ss are generated files containing solutions to Paint by Numbers problems and about 30,000 occurrences of x and o. Discounting them, this is the list of the top ten identifiers and the number of occurrences:

((define 25294)
 (quote 24101)
 (lambda 18883)
 (let 14796)
 (send 14349)
 (x 11877)
 (if 11118)
 (... 8474)
 (car 7610)
 (syntax 6537))

The identifier cdr ranks 21st with 5,259 occurrences, let* has 3,066 which, when combined with let comes out at 17,862, still not enough to pass lambda. Speaking of combining, λ has 2,271 occurrences, which is also not enough to move lambda. Finally map comes in 32nd with 3,853 occurrences and foldl beats out foldr (1168th place with 75 occurrences vs 1451st place with 58 occurrences).

03 Sep 2007

Birthday Easter Eggs in DrScheme

posted by Robby Findler

DrScheme has five birthday easter eggs in it, one for each of the main contributers to the PLT Scheme infrastructure (Matthias, Matthew, Eli, Shriram, and me). I put four of them in there, and mostly concentrated on making them fun. Matthew added mine and the best part of that one is figuring out on earth it shows up (it is quite tricky to find the code that actually makes that one appear).

I don’t want to ruin the fun of searching for the Easter Eggs yourself, but just to get you started, do have a look at plt/collects/framework/private/bday.ss for Matthias, Matthew, Shriram, and Eli’s birthdays. Mine is July 2nd.

Happy Hunting!