29 Dec 2007

PLT Scheme v372

posted by Eli Barzilay

PLT Scheme version 372 is now available from http://download.plt-scheme.org/

This is mostly a bug-fix release. Changes:

  • DrScheme now supports name completion via Ctl-/ (Windows and X) or Cmd-/ (Mac OS X). Completion is sensitive to the current language in DrScheme, but it is not sensitive to lexical bindings.

  • DrScheme’s stepper now supports the “check-expect”, “check-within”, and “check-error” forms of the testing.ss teachpack.

  • A number of bug fixes and small improvements for ProfessorJ. The grammar for the current release differs slightly from the one in HtDC. Feedback welcome.


19 Dec 2007

Your security hole is my fun hack, or: computing factorial in DrScheme with a click-powered loop.

posted by Robby Findler

One of the many changes in v4.0 is to close a security hole in DrScheme. Specifically, DrScheme v371 lets the program in the definitions window get hold of the editor containing said program and manipulate it programmatically. There are lots of bad things one might do with this fact, such as circumventing DrScheme’s protections and causing it to crash, or even to exit spontaneously.

But, we can do something even more fun. Put the following program into a DrScheme window (in v371) and set the language to the mzscheme/textual language. Change “input” to whatever number you wish to compute the factorial of and then hit the Run button until your program transforms itself into the final result.

(define input 10)
(require (lib "mred.ss" "mred") (lib "class.ss"))
;; v371 only: the macro's syntax-source is the editor holding this program.
(let* ([ed (let-syntax ([m (λ (stx) (with-syntax ([x (syntax-source stx)]) #'x))])
             (m))]
       ;; After the first click, the first line has the form "; n acc".
       [mth (regexp-match 
             #rx"^; ([0-9]+) ([0-9]+)" 
             (send ed get-text 0 
                   (send ed paragraph-end-position 0)))]
       [lckd (send ed is-locked?)])
  (send ed begin-edit-sequence)
  (send ed lock #f)                 ; unlock the editor so we can rewrite it
  (if mth
      (let ([n (string->number (list-ref mth 1))]
            [acc (string->number (list-ref mth 2))])
        ;; Erase the old "; n acc" text; its newline stays behind.
        (send ed delete 0 (send ed paragraph-end-position 0))
        (if (= n 1)
            ;; Done: put the result on top and comment out the whole program.
            (begin (send ed insert (format "~a\n#|" acc) 0)
                   (send ed insert "\n|#" (send ed last-position)))
            ;; One more step: n becomes n-1 and acc becomes n*acc.
            (send ed insert (format "; ~a ~a" (- n 1) (* n acc)) 0 0)))
      ;; First click: start the loop at "; input 1".
      (send ed insert (format "; ~a 1\n" input) 0))
  (send ed lock lckd)               ; restore the original lock state
  (send ed end-edit-sequence))
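Each click of Run performs one step of an accumulator-style factorial loop, and the loop's state (n, acc) lives in the program's first line. The sketch below replays that state machine as a pure function on strings, with no editor involved. It is written in modern Racket, PLT Scheme's successor. The names `step` and `run-clicks` are hypothetical, and like the original it assumes `input` is at least 1.

```racket
#lang racket
;; Hypothetical re-creation of what one click of Run does to the first
;; line of the self-modifying program, as a pure string -> string step.
(define (step first-line input)
  (match (regexp-match #rx"^; ([0-9]+) ([0-9]+)" first-line)
    [(list _ n-str acc-str)
     (define n (string->number n-str))
     (define acc (string->number acc-str))
     (if (= n 1)
         (number->string acc)                    ; finished: the factorial
         (format "; ~a ~a" (- n 1) (* n acc)))]  ; one more iteration
    [#f (format "; ~a 1" input)]))               ; first click: start at (input, 1)

;; Keep "clicking" until the header disappears; return result and click count.
;; Assumes input >= 1, as the original program does.
(define (run-clicks input)
  (let loop ([line ""] [clicks 1])
    (define next (step line input))
    (if (string-prefix? next ";")
        (loop next (add1 clicks))
        (values (string->number next) clicks))))

(run-clicks 10)   ; returns 3628800 and 11
```

For input 10 there are eleven clicks: one to write the "; 10 1" header, nine multiplication steps, and one to replace the header with the result.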

12 Nov 2007

Getting rid of set-car! and set-cdr!

posted by Matthew Flatt

Functional is Beautiful

Scheme is a “mostly functional” language. Although Schemers don’t hesitate to use set! when mutation solves a problem best, Scheme programmers prefer to think functionally. Purely functional programs are easier to test, they make better and more reliable APIs, and our environments, compilers, and run-time systems take advantage of functional style.

A Schemer’s functional bias is especially strong when writing programs that process and produce lists. The map function, which does both, is a thing of beauty:

  (define (map f l)
    (cond
      [(null? l) '()]
      [else (cons (f (car l)) (map f (cdr l)))]))

The map function is most beautiful when the given f is functional. If f has side effects, then the above implementation over-specifies map, which is traditionally allowed to process the list in any order it wants (though PLT Scheme guarantees left-to-right order, as above). Arguably, when some other Schemer provides a non-functional f, it’s their problem; they have to deal with the consequences (which may well be minor compared to the benefits of using mutation).

The map function might also receive a non-list, but the map implementor can guard against such misuse of map by wrapping it with a check,

  (define (checked-map f l)
    (if (list? l)
        (map f l)
        (error 'map "not a list")))

and then exporting checked-map instead of the raw map. This kind of checking gives nicer error messages, and it helps hide the implementation details of map. We could also imagine that the raw map is compiled without run-time checks on car and cdr.

The Problem with Mutable Pairs

What if someone calls checked-map like this?

  (define l (list 1 2 3 4 5))
  (checked-map (lambda (x)
                 (set-cdr! (cddr l) 5))
               l)

The f provided to map in this case is not purely functional. Moreover, it uses mutation in a particularly unfortunate way: the list? test in checked-map succeeds, because the argument is initially a list. The mutation is discovered only later, when map calls car or cdr on a non-pair, and then only if those checks haven’t been disabled.
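Racket, PLT Scheme's successor, keeps mutable pairs as a separate `mcons` datatype, so the scenario can be replayed safely there. The sketch below is hypothetical: `mlist?`, `raw-mmap`, and `checked-mmap` are illustrative names, not library functions. The up-front check passes. Then f cuts the list to (1 2 3 . 5) while map is running, and the failure surfaces inside the unchecked traversal rather than at the guard. Because Racket's `mcar` is always checked, the failure is a contract error instead of a crash.

```racket
#lang racket
;; Is v a proper list built from mutable pairs?
(define (mlist? v)
  (or (null? v)
      (and (mpair? v) (mlist? (mcdr v)))))

;; The "raw" map over mutable lists, processing left to right.
(define (raw-mmap f l)
  (if (null? l)
      '()
      (let ([head (f (mcar l))])                 ; call f first...
        (mcons head (raw-mmap f (mcdr l))))))    ; ...then follow the mcdr

;; The guarded version: the check runs once, before f is ever called.
(define (checked-mmap f l)
  (if (mlist? l)
      (raw-mmap f l)
      (error 'map "not a list")))

(define ml (mcons 1 (mcons 2 (mcons 3 (mcons 4 (mcons 5 '()))))))

;; f ignores its argument and turns ml into (1 2 3 . 5) mid-traversal,
;; so raw-mmap eventually calls mcar on 5.
(define outcome
  (with-handlers ([exn:fail:contract? (λ (e) 'failed-inside-raw-mmap)])
    (checked-mmap (λ (x) (set-mcdr! (mcdr (mcdr ml)) 5)) ml)))
```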

If you’re a Schemer, then unless you’ve seen this before or thought a bit about the title of this section, you probably didn’t think of the above test case for map. A Schemer’s view of lists is so deeply functional that it’s hard to make this particular leap.

Furthermore, this example is not contrived. If you have either Chez Scheme version 6.1 or a pre-200 MzScheme sitting around, calling map as above leads to a seg fault or an invalid memory access:

  Chez Scheme Version 6.1
  Copyright (c) 1998 Cadence Research Systems

  > (define l (list 1 2 3 4 5))
  > (map (lambda (x) (set-cdr! (cddr l) 5)) l)

  Error: invalid memory reference.
  Some debugging context may have been lost.

The map example illustrates how mutable pairs can break a Schemer’s natural and ingrained model of programming. Of course, if optimizing and providing friendly error messages for map were the only issues with mutable pairs, then it wouldn’t matter; Scheme implementors are smart enough to (eventually) get this right. Unfortunately, the underlying problem is more pervasive.

In the API for a typical Scheme library, lists can be used for many kinds of input and output. Flags for options might be provided in a list. A function might provide information about the current configuration (e.g., the current items in a GUI list box) in a list. Procedures or methods that deal gracefully with list mutation are few and far between. In most cases, the result of unexpected mutation is merely a bad error message; sometimes, however, unexpected mutation of a list can break the library’s internal invariants. In the worst case, the library whose internal invariants are broken plays some role in a system’s overall security.

Mutable lists also interfere with the language’s extensibility. The PLT Scheme contract system, for example, offers a way to wrap an exported function with a contract that constrains its inputs and outputs, which are (optionally, in principle) enforced by run-time checks. Higher-order contracts, such as “a list of functions that consume and produce numbers”, require wrappers on sub-pieces, and these wrappers can be installed only by copying the enclosing list. Copying a mutable list changes the semantics of a program, however, whereas contracts are supposed to enforce invariants without otherwise changing the program. Copying an immutable list creates no such problem.
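As an illustration of such a higher-order contract, here is a small sketch in modern Racket, where lists are immutable. The function `apply-all` is hypothetical. Because the element contract is itself a function contract, every element has to be wrapped before `apply-all` sees it, which in practice means the contract system hands the body a new list of wrapped functions. With immutable pairs, nobody can observe that the list was copied.

```racket
#lang racket
;; Hypothetical example: the first argument is "a list of functions
;; that consume and produce numbers".
(define/contract (apply-all fs x)
  (-> (listof (-> number? number?)) number? (listof number?))
  ;; Each f seen here is a contract-wrapped version of the caller's function.
  (map (λ (f) (f x)) fs))

(apply-all (list add1 (λ (n) (* n 2))) 5)   ; returns '(6 10)

;; A function that breaks its promise is caught when it is called.
(define outcome
  (with-handlers ([exn:fail:contract? (λ (e) 'contract-violation)])
    (apply-all (list add1 (λ (n) "not a number")) 5)))
```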

Finally, mutable lists make the language’s specification messy. The R6RS editors spent considerable energy trying to pin down the exception-raising guarantees of map; the possibility of mutable pairs made it difficult to provide much of a guarantee. The standard says that implementations should check that the lists provided to map are the same length, but it’s not worth much to require that check, since an argument’s length as a list can change via mutation to the list’s pairs.

Switching to Immutable Pairs

The designers of PLT Scheme long ago recognized the problems of mutable pairs, and we introduced functions like cons-immutable and list-immutable to support programming with immutable lists. These additions solved some problems — but only in the cases where we were careful to use immutable lists. The R6RS editors also recognized the problems of mutable pairs, so that set-car! and set-cdr! were banished to their own library — but programmers are still free to use that library.

While these are worthwhile steps for many reasons, they do not solve the underlying problem. Library implementors who deal in lists must still either set up elaborate guards against mutation, pretend that the problem doesn’t matter, or require the use of a special immutable-list datatype that is incompatible with libraries whose authors set up elaborate guards or ignore the problem.

Why all this hassle? If most Scheme code really does use and expect pairs in a functional way, can’t we just switch to immutable pairs? Most Scheme code will still work, untold security holes will have been closed, specifications will become instantly tighter, and language extensions like contracts will work better.

Schemers have been reluctant to make this leap, because it has never been clear just how much code relies on mutable pairs. We don’t know how much the switch will cost in porting time and long-term incompatibility, and we don’t really know how much we will gain. We won’t know until we try it.

For PLT Scheme v4.0, we’re going to try it. In our main dialects of Scheme (such as the mzscheme language), cons will create immutable pairs, and pair? and list? will recognize only immutable pairs and lists. The set-car! and set-cdr! procedures will not exist. A new set of procedures, mcons, mcar, mcdr, set-mcar!, and set-mcdr!, will support mutable pairs. (A related v4.0 change is that define-struct by default creates immutable structure types.)
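Today's Racket, the successor of PLT Scheme, works the way this plan describes, so a short sketch can show the split. It is an illustration in modern Racket rather than v3.99 code, and details of the v4.0 release itself may differ. The names `p`, `mp`, `point`, and `mpoint` are made up for the example.

```racket
#lang racket
;; cons builds immutable pairs; pair? and list? recognize only those.
(define p (cons 1 (cons 2 '())))
(define mp (mcons 1 (mcons 2 '())))

(list? p)      ; #t
(pair? mp)     ; #f: a mutable pair is not a pair? in this sense
(list? mp)     ; #f
(mpair? mp)    ; #t

;; Mutation is available only through the m-prefixed operations.
(set-mcar! mp 10)
(mcar mp)      ; 10

;; define-struct makes immutable structure types unless asked otherwise.
(define-struct point (x y))              ; no set-point-x! is defined
(define-struct mpoint (x y) #:mutable)   ; set-mpoint-x! and friends exist
(define q (make-mpoint 1 2))
(set-mpoint-x! q 5)
(mpoint-x q)   ; 5
```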

Of course, PLT Scheme v4.0 will support an R5RS language where cons is mcons, and so on, so many old programs can still run easily in the new version. The difference is that interoperability between R5RS libraries and PLT Scheme libraries will be less direct than before.

Experience So Far

PLT Scheme v3.99.0.2 exists already in a branch of our SVN repository, and it will soon move to the SVN trunk. That is, we have already ported at least a half million lines of Scheme code to a dialect without set-car! and set-cdr!.

The conversion took about eight hours. Obviously, relatively little code had to change. The following are the typical porting scenarios:

  • The reverse! and append! functions were frequently used for “linear updates” by performance-conscious implementors. As our underlying Scheme implementation has improved, however, the performance benefits of these functions have diminished. All uses could be replaced with reverse and append.

  • The set-cdr! operation was often used to implement an internal queue. Such internal queues were easily changed to use mcons, mcar, mcdr, and set-mcdr!.

  • An association-list mapping was sometimes updated with set-cdr! when a mapping was present, otherwise the list was extended. Since the extension case was supported, it was easy to just update the list functionally. (The relevant lists were short; if the lists were long, the right change would be to use a hash table instead of a list.)

  • A pair was sometimes used for an updatable mapping where a distinct structure type would be better. The quick solution was to put a mutable box in place of the value.
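The porting patterns above can be sketched in modern Racket. Everything here is hypothetical illustration: `make-queue`, `enqueue!`, `dequeue!`, `alist-set`, and `evens-up-to` are not names from the PLT code base, and the queue assumes `dequeue!` is never called on an empty queue.

```racket
#lang racket
;; 1. reverse! -> reverse: accumulate with cons, then reverse once.
(define (evens-up-to n)
  (let loop ([i 0] [acc '()])
    (if (> i n)
        (reverse acc)                  ; previously (reverse! acc)
        (loop (+ i 2) (cons i acc)))))

;; 2. An internal FIFO queue ported from set-cdr! to mutable pairs.
;; The header is an mpair: mcar = first cell, mcdr = last cell.
(define (make-queue) (mcons '() '()))
(define (enqueue! q v)
  (define cell (mcons v '()))
  (if (null? (mcar q))
      (set-mcar! q cell)
      (set-mcdr! (mcdr q) cell))       ; was (set-cdr! tail cell)
  (set-mcdr! q cell))
(define (dequeue! q)                   ; assumes q is non-empty
  (define cell (mcar q))
  (set-mcar! q (mcdr cell))
  (when (null? (mcar q)) (set-mcdr! q '()))
  (mcar cell))

;; 3. Functional association-list update instead of set-cdr!.
(define (alist-set al key val)
  (cond
    [(null? al) (list (cons key val))]                       ; extend
    [(equal? (caar al) key) (cons (cons key val) (cdr al))]  ; replace
    [else (cons (car al) (alist-set (cdr al) key val))]))

;; 4. A box in place of the value when a pair must stay updatable.
(define entry (cons 'count (box 0)))
(set-box! (cdr entry) (add1 (unbox (cdr entry))))
```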

The PLT Scheme code might be better positioned for the switch than arbitrary Scheme code. Most of it was written by a handful of people who understood the problems of mutable pairs, and who might therefore shy away from them. However, the PLT Scheme code base includes a lot of code that was not written specifically for PLT Scheme, including Slatex, Tex2page, and many SRFI reference implementations. With the exception of SRFI-17, which generalizes set! to work with pairs, the SRFI implementations were remarkably trouble-free. (Thanks to Olin Shivers for making mutation optional in the “linear update” functions like reverse! from SRFIs 1 and 32.)

In addition, we looked at a number of standard Scheme benchmarks, which can be found here:

http://svn.plt-scheme.org/plt/trunk/collects/tests/mzscheme/benchmarks/common/

Of the 28 benchmarks, eight of them mutate pairs. Four of those are trivially converted to functional programs, along the lines of the scenarios above. One, destruct, is designed specifically to test mutation performance, so it makes no sense to port. Another, sort1, is a sorting algorithm that inherently relies on mutation; a functional sort is obviously possible, but that would be a different benchmark. The conform benchmark uses mutable pairs for tables in a relatively non-local way; as a modern Scheme program, it would probably be written with structures, but it’s not trivial to port. The peval benchmark uses pairs to represent Scheme programs, and it partially evaluates the program by mutating it, so it is not trivial to port. To summarize, out of 28 old, traditional benchmark programs, only two represent interesting programs that are not easily adapted to immutable pairs. (They run in PLT Scheme’s R5RS language, of course.)

Finally, we selected a useful third-party library that is not included with PLT Scheme. We checked the generic SSAX implementation (not the PLT Scheme version), and we found a couple of uses of set-car! and set-cdr!. Again, they fall into the above queue and association-list categories that are easily and locally converted.

Meanwhile, as we start to use v3.99 to run scripts in our day-to-day work, immutable pairs have so far created no difficulty at all. So far, then, our optimism in trying immutable pairs seems to be justified; it just might work.

But It’s Lisp Tradition!

A typical response to news of the demise of mutable pairs is that it will create a lot of trouble, because mutable pairs are Scheme tradition, and surely lots of useful old code exploits them in lots of places.

We’re eager to hear whether anyone has such code. Our initial hypothesis is that practically all old code falls into one of two categories:

  • The code is easily ported to immutable pairs, along the same lines as above (i.e., local queues and small association lists).

  • The code is so old and generic that it can be run as an R5RS program. It won’t call into the large set of PLT Scheme libraries that will expect immutable pairs, and it can easily be used as a library with wrappers that convert between mutable and immutable pairs.
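Here is a minimal sketch of such a wrapper in modern Racket. It assumes the old code takes and returns flat mutable lists; nested lists would need a deep conversion. The helper names are hypothetical, and `old-reverse!` only stands in for an R5RS-era routine that destructively reverses its argument.

```racket
#lang racket
;; Shallow conversions between immutable and mutable lists (spine only).
(define (list->mlist l)
  (foldr mcons '() l))
(define (mlist->list ml)
  (if (null? ml)
      '()
      (cons (mcar ml) (mlist->list (mcdr ml)))))

;; Stand-in for old code: destructive in-place reversal of a mutable list.
(define (old-reverse! ml)
  (let loop ([ml ml] [acc '()])
    (if (null? ml)
        acc
        (let ([next (mcdr ml)])
          (set-mcdr! ml acc)          ; relink this cell onto the result
          (loop next ml)))))

;; Wrapper: callers pass and receive ordinary immutable lists, and the
;; mutable copy never escapes.
(define (wrapped-reverse l)
  (mlist->list (old-reverse! (list->mlist l))))

(wrapped-reverse '(1 2 3))   ; returns '(3 2 1)
```

Since the wrapper copies on the way in, the old routine can mutate freely without affecting the caller's immutable list.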

Frankly, we’re not so eager to hear opinions based on guesswork about existing code and how it might get used. Download v3.99 from SVN or as a nightly build when it becomes available; let us know your guesses about how running your old code would go, but then let us know what actually happens.

The immutable-pairs plan for v4.0 is not set in stone, but we won’t make the decision based on guesswork. More libraries (other than R5RS) to aid compatibility may be useful, but so far we don’t have a tangible need for them. In any case, we’ll revert to mutable pairs only if significant experience with the pre-release version demonstrates that it really won’t work.


14 Sep 2007

Don’t say “abstract” (instead say “general”)

posted by John Clements

The word “abstract” is common in computer science. An abstract thing is one where some part of the whole is unspecified. For instance, the expression “3*x + 3” is an abstraction of the expression “3*4 + 3”, because the “x” is unspecified. Likewise, a function is an abstraction over some set of values, supplied when the function is called.

The word “general” is not at all common in computer science. In non-computer-science use, the word “general” is used to describe things that may be applied to more than one thing or situation. For instance, a “more general solution” is one that applies not just to the problem at hand, but instead to a larger set of problems.

From a computer science perspective, things that are abstract are also general. Things that are general are also abstract. Substituting the word “general” for the word “abstract” would not be a terrible hurdle.

From a non-computer-science perspective, however, “general” and “abstract” have very different implications. Something that is general is better: it is more useful, it applies more frequently. Something that is abstract, though, is worse: it is lacking detail, it is non-concrete.

This is one difference—the major difference?—between computer science (and of course mathematics) and the real world: the abstract is no less concrete. We can abstract over expressions using functions, and we can even abstract over syntactic things, using hygienic macros. The result of such abstraction is a perfectly well-defined element in our universe of expressions.
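To make the two kinds of abstraction concrete, here is a small illustration in modern Racket. The names are made up for the example. A function generalizes 3*4 + 3 over the 4, and a hygienic macro generalizes a code pattern over the variables it mentions, even when a user's variable happens to share a name with the macro's internal temporary.

```racket
#lang racket
;; 3*4 + 3, generalized over the 4 with a function.
(define (three-x-plus-three x) (+ (* 3 x) 3))
(three-x-plus-three 4)   ; 15

;; A code pattern generalized with a hygienic macro. The macro's own `tmp`
;; cannot capture a user variable that is also named tmp.
(define-syntax-rule (swap! a b)
  (let ([tmp a]) (set! a b) (set! b tmp)))

(define tmp 1)
(define other 2)
(swap! tmp other)        ; now tmp is 2 and other is 1
```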

In computer science, then, the pejorative sense of the word “abstract” is misleading, and the use of the terms “abstract” and “abstraction” merely provides ammunition for those who wish that we could all still be writing assembly language.

I suggest instead the use of the word “general.”

John “purveyor of barbarous neologisms” Clements


09 Sep 2007

Completions in DrScheme (finally)

posted by Robby Findler

DrScheme now supports a language-sensitive (but not lexical-scope-sensitive) completion feature. Type Ctl-/ (Cmd-/ on Mac OS X) and see what names are available to finish off the word you’re typing.

Thanks to Jacob (and do follow that link; we all need a little more love in our lives) and Mike for taking the initiative to actually implement what is probably the most requested feature in DrScheme at the moment.


06 Sep 2007

How many occurrences of car in the PLT source code?

posted by Robby Findler

Let’s play a guessing game. See who can guess:

  • How many occurrences of the identifier car there are in the PLT tree (when using read and just counting the symbols that come out)?

  • Where does car rank on the list of the most commonly used identifiers?

  • What is the most common identifier, and how many occurrences of it are there?

UPDATE: The two files raw-hattori and raw-kajitani.ss are generated files containing solutions to Paint by Numbers problems and about 30,000 occurrences of x and o. Discounting them, this is the list of the top ten identifiers and the number of occurrences:

((define 25294)
 (quote 24101)
 (lambda 18883)
 (let 14796)
 (send 14349)
 (x 11877)
 (if 11118)
 (... 8474)
 (car 7610)
 (syntax 6537))

The identifier cdr ranks 21st with 5,259 occurrences. let* has 3,066, which, when combined with let, comes out at 17,862: still not enough to pass lambda. Speaking of combining, λ has 2,271 occurrences, which is also not enough to move lambda up. Finally, map comes in 32nd with 3,853 occurrences, and foldl beats out foldr (1168th place with 75 occurrences vs. 1451st place with 58 occurrences).
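The counting method from the first question, which is to read each datum and count the symbols that come out, can be sketched as follows. This is a hypothetical sketch in modern Racket. `count-symbols` and `top-n` are illustrative names, and the sketch reads from a single port instead of walking the whole PLT tree. Files that begin with a `#lang` line would need reader settings beyond plain `read`. Because `'x` reads as `(quote x)`, every quoted datum counts toward quote, which explains part of that identifier's high ranking.

```racket
#lang racket
;; Tally every symbol inside every datum read from the port `in`.
(define (count-symbols in [counts (make-hash)])
  (let loop ()
    (define datum (read in))
    (unless (eof-object? datum)
      (tally! datum counts)
      (loop)))
  counts)

;; Walk pairs and vectors; count symbols; ignore other atoms.
(define (tally! v counts)
  (cond
    [(symbol? v) (hash-update! counts v add1 0)]
    [(pair? v) (tally! (car v) counts) (tally! (cdr v) counts)]
    [(vector? v) (for ([x (in-vector v)]) (tally! x counts))]
    [else (void)]))

;; The n most frequent (symbol . count) pairs, most frequent first.
(define (top-n counts n)
  (take (sort (hash->list counts) > #:key cdr)
        (min n (hash-count counts))))

(top-n (count-symbols (open-input-string "(define (f x) (car x)) (define y 'x)")) 2)
;; returns '((x . 3) (define . 2))
```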


03 Sep 2007

Birthday Easter Eggs in DrScheme

posted by Robby Findler

DrScheme has five birthday easter eggs in it, one for each of the main contributors to the PLT Scheme infrastructure (Matthias, Matthew, Eli, Shriram, and me). I put four of them in there, and mostly concentrated on making them fun. Matthew added mine, and the best part of that one is figuring out when on earth it shows up (it is quite tricky to find the code that actually makes that one appear).

I don’t want to ruin the fun of searching for the Easter Eggs yourself, but just to get you started, do have a look at plt/collects/framework/private/bday.ss for Matthias, Matthew, Shriram, and Eli’s birthdays. Mine is July 2nd.

Happy Hunting!


22 Aug 2007

New Debugger Features

posted by Greg Cooper

As Eli mentioned, v371 introduces support for debugging several files at a time, as well as new buttons for stepping Over and Out of expressions in the debugger.

Debugging across multiple files is easy. Start by opening the “main” file that you want to debug and all of the files it requires (directly or indirectly) that you want to debug along with it. Then click Debug in the main file’s frame. For example, if I wanted to see what the FrTime dataflow engine (in frp-core.ss) does when a particular program (say demo-module.ss) runs, I would open these two files and click Debug in the frame for demo-module.ss.

As each required file loads, DrScheme offers the option of debugging it. If you choose “yes”, then the file is included in the debugging session, so you can set breakpoints and step into it. (Note that this will make the code in the file run more slowly, and single-stepping at calls to its functions will bring you into it.) A file can only participate in one debugging session at a time, so if you’re already debugging it with some other program, DrScheme will tell you so (instead of asking whether to debug it). For best results, all of the files you debug should be modules. Once a file is included in the debugging session, you can set breakpoints and step into it as if you were debugging it by itself.

As soon as you can debug programs that span several files, it’s particularly valuable to be able to do more than set breakpoints and single-step. This is the motivation for the new Over and Out buttons, which are also quite simple. If the execution marker is at the start of an expression that’s not in tail position, then you can step over the entire expression, which is equivalent to setting a one-shot breakpoint at the end of the expression and continuing. (If you’ve set breakpoints inside the expression, or inside any functions it calls, then execution may suspend before reaching the end.) Likewise, if execution is suspended and the current expression is evaluating within a debugging-enabled context, then you can step out to the innermost such context. This would be difficult to simulate by hand, since you’d need to keep track of recent callers.

At any given point, either or both of the Over and Out buttons may be disabled, but over the course of a session they can eliminate a lot of tedium.

The screenshot above shows a session debugging frp-core.ss as used by demo-module.ss. Execution is suspended on a right paren, so stepping Over is disabled, but we see the expression’s value at the upper left, we’ve moused over b to see its value at the upper right, and it’s possible to step Out.


18 Aug 2007

PLT Scheme v371

posted by Eli Barzilay

PLT Scheme version 371 is now available from

http://download.plt-scheme.org/

This is mostly a bug-fix release.

Changes:

  • The debugger now works across multiple files and supports “step over” and “step out” operations.
  • HtDP teachpacks: the world.ss teachpack now exports two add-line functions: one from image.ss and one for adding lines to scenes.
  • ProfessorJ now includes a language level between Intermediate and Advanced, Intermediate + access, that includes all of Intermediate and introduces access modifiers and overloading. The language manuals contain the complete details. Feedback welcome.

07 Aug 2007

PLT Modules and Separate Compilation

posted by Richard Cobbe

For my summer job this year, I’m programming in Common Lisp; this is the first time I’ve used the language for anything more than toy examples. The experience has given me new appreciation for the PLT module system and how it enables separate compilation.

Lisp has a package system, of course, but it’s not the same thing. It’s primarily a tool to make sure that the symbols in one part of the program don’t collide with the symbols in another part (unless you ask them to). Packages aren’t about abstraction: while you can specify which symbols are exported from the package and which aren’t, that’s just a suggestion that’s not enforced by the language.

(You’ll notice, by the way, that in the previous paragraph I used the word “symbol” and not “identifier,” which is the more common term in the study of programming languages. That’s deliberate: the Lisp package system works on symbols, not identifiers, so it also affects quoted, literal symbols. In my experience, this is sometimes helpful, sometimes a real pain, and usually completely unexpected. But that’s a topic for another post.)

Also, there’s no real relationship between Lisp packages and files. One package can be spread across multiple files, and one file can contain code in several different packages.

All this means that separate compilation in Lisp is a real problem. There is a system, ASDF, that attempts to address this need. (For more details, consult the closest thing to a homepage that I could find for ASDF.) I’m no expert on ASDF, but essentially the programmer specifies the dependencies between source files, in a set of files that exist parallel to the Lisp source. (ASDF does support grouping source files into larger chunks and specifying dependencies between those chunks, but as far as I can tell that’s largely a convenience thing.)

The key thing for separate compilation, of course, is the dependencies. With ASDF, the programmer specifies those manually, and then ASDF basically does a topological sort such that if file a depends on file b, then ASDF ensures that b is compiled and loaded before a is compiled, and again before a is loaded. (This should start sounding a little familiar to folks who’ve worked in the area where PLT’s modules and macros intersect.)
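The ordering that ASDF computes from those declarations is a topological sort of the dependency graph. Here is a minimal sketch in Racket. The dependency table and file names are made up, and the code assumes the graph has no cycles, whereas a real build tool would detect and report them.

```racket
#lang racket
;; deps maps each file to the list of files it depends on (hypothetical data).
;; Returns every file after all of the files it depends on.
(define (compile-order deps)
  (define visited (mutable-set))
  (define order '())
  (define (visit f)
    (unless (set-member? visited f)
      (set-add! visited f)
      (for-each visit (hash-ref deps f '()))  ; dependencies first
      (set! order (cons f order))))           ; then f itself
  (for-each visit (hash-keys deps))
  (reverse order))

(compile-order (hash "a.lisp" '("b.lisp") "b.lisp" '("c.lisp") "c.lisp" '()))
;; returns '("c.lisp" "b.lisp" "a.lisp")
```

The result is only as good as the table. If a dependency is missing from it, the sort may still happen to produce a workable order, which is exactly the silent-failure problem described next.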

So far, so good. Unfortunately, there are a couple of problems with this setup. First, the dependencies between files are specified outside the language. This means that, if you happen to forget one, the results are not well-defined. If ASDF happens to choose an order that’s consistent with the dependency you left out, everything will just work, and you won’t have any indication that there’s a problem. If, however, it doesn’t, then you’ll get random “undefined function” and “undefined symbol” errors—if you’re lucky (at least in SBCL, the implementation of Common Lisp that I use at my job). In PLT, by contrast, inter-module dependencies are part of the language, so the compiler will always give you an undefined-identifier error when it tries to compile a module in which you’ve forgotten a require form. Big win, in my opinion (although we could argue about whether this should be an error or a warning, and whether the compiler should report lots of errors or just one before giving up completely).
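For contrast, here is a sketch of how a PLT-style module states its dependency in the source itself. It uses modern Racket submodules, which are a later addition, so that the example fits in one file. With separate files, module `a` would instead contain a require of b's file (for example, `(require "b.ss")`). The module and function names are hypothetical. If you delete the `require` line, the compiler rejects module `a` with an unbound-identifier error for `double`, and load order never gets a chance to hide the mistake.

```racket
#lang racket
(module b racket
  (provide double)
  (define (double x) (* 2 x)))

(module a racket
  (require (submod ".." b))   ; the dependency lives in the code itself
  (provide quadruple)
  (define (quadruple x) (double (double x))))

(require 'a)
(quadruple 5)   ; 20
```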

Second, because ASDF lives outside the compiler, it can’t be very smart about how macros affect separate compilation. I don’t fully understand this, perhaps because the folks who’ve been mentoring me at my job haven’t thought it worth the time to explain it to me fully. But it appears that, if you change a macro that’s used in other files, or change a function that’s called by a macro at expansion time, you have to do the equivalent of a make clean in a distressingly large number of cases. This is a real problem when you’ve got a large source base (~200K LOC, I think) and you’re trying to speed up builds, as we are, and it’s especially problematic if you’re trying to run unrelated parts of the build in parallel.

I’ve certainly griped about the complexity of the interaction between PLT’s modules and macros in the past. But after this summer, I have to say it’s awfully nice to have a module system that Just Works for separate compilation. Nicely done, Matthew.

(I’ve pointed the folks at work at Matthew’s ICFP 02 paper, but as that technique requires a lot of support from the compiler, and we don’t have the resources to add the necessary support to SBCL ourselves, I don’t know that it’ll be more than a “wouldn’t it be nice if we could do that?”)

(Answer to rhetorical question in preceding paragraph: Yes. Yes it would.)


Made with Frog, a static-blog generator written in Racket.