Type Tailoring
posted by Ben Greenman
Type tailoring is a technique for adding domain-specific type checkers to a  typed host language. Using the Type Systems as Macros approach to building typed languages,  implementing type tailoring in Typed Racket is straightforward. Any library can apply the core idea, and you can try programming with  type tailorings by downloading the trivial package  (requires Racket v6.4 or later).
Background: Programming the Type Elaborator
Many typed languages include both a type checker and a type elaborator. The type elaborator translates source code to an explicitly typed representation for the type checker to validate. Normally, programmers cannot extend the behavior of the type elaborator without modifying the compiler.
In Typed Racket, the Racket macro expander serves as a type elaborator. A fully-expanded Typed Racket program is valid input to the type checker. This design makes Typed Racket macro-extensible; programmers can write their own macros to grow the syntax of the language. More on this in a minute.
Recently, Stephen Chang observed that individual macros can implement the same pipeline as Typed Racket — macros can expand their sub-terms and perform static checks on the results. Such macros are essentially type rules, and a well-designed library of such macros is essentially a type system. Hence Type Systems as Macros: a library for building typed, domain-specific languages (DSLs) using the Racket macro system.
Type tailoring is programming a language’s type elaborator to implement domain-specific typing rules on top of an existing type system. Any language that lets programmers extend the behavior of its type elaborator supports type tailoring. In particular, Typed Racket programmers can use the Type Systems as Macros technique to implement tailored APIs as macros. These macros can statically check properties of source code, communicate the results of their checks to other macros, and expand to type annotated code for the Typed Racket type checker.
As we will see, type tailoring can even extend a “conventional” typed language with static checks that would otherwise require dependent types.
Case Study: Regular Expressions
regexp-match in Racket
Racket’s regular expression library will match a string, path, byte string, or input port against a given pattern.
$ racket > (regexp-match "r" "racket") ; '("r") > (regexp-match "z" "racket") ; #f > (regexp-match "r|z" "racket") ; '("r") > (with-input-from-string "hello world" (λ () (regexp-match "hello (.*)$" (current-input-port)))) ; '(#"hello world" #"world")
The result of matching a pattern against an input source is #f if  no part of the input matched the pattern. Otherwise, the result is a pair containing:
- the subsequence of the input that matched the pattern, and
- a list of subsequences that were captured by parenthesized sub-patterns in the pattern (capture groups).
Note that a capture group may fail to capture any subsequence. If this happens, the list of subsequences contains #f in the  corresponding position.
$ racket > (regexp-match "(r)|(z)" "racket") ; '("r" "r" #f)
regexp-match in Typed Racket
Typed Racket includes a type signature for regexp-match that accepts  the same variety of input sources, but conservatively rejects some “correct”  uses of the output.
$ racket -I typed/racket > (regexp-match "(a*)b" "aaabbb") ; - : (U False (Pairof String (Listof (U False String)))) ; '("aaab" "aaa") > (let ([m (regexp-match "(a*)b" "aaabbb")]) (when m (string-length (second m)))) ; Type Checker: Polymorphic function `second' could not be applied to arguments: ; .... ; Arguments: (Pairof String (Listof (U False String))) ; Expected result: String ; in: (second m)
The issue is that string-length expects a string, but Typed Racket  reasons that the call to second might return a string or false. One solution is to guard the value (second m) with a dynamic check. Another is to apply type tailoring to regexp-match!
Type Tailored regexp-match
We can implement a type-tailored interface to regexp-match  with a macro that:
- checks whether its first argument is a string literal
- counts the number of matched parentheses in the string literal
- expands to code designed to convince Typed Racket that the result is a list with the proper number of arguments.
The third point is crucial, and demonstrates how type tailoring can translate program properties that are implicit in the source code to facts that the type checker can understand.
Here is the macro:
(define-syntax (smart-match stx) (syntax-parse stx [(_ pattern:str other-args ...) (define num-groups (count-groups (syntax-e #`pattern))) #`(let ([m (regexp-match pattern other-args ...)]) (if m (list (car m) #,@(for/list ([i (in-range num-groups)]) #`(or (list-ref m (+ 1 #,i)) (error 'smart-match)))) m))] [(_ args ...) #`(regexp-match args ...)] [_ #`regexp-match]))
Scroll to the bottom of this post for an implementation of
count-groups.
Quick guide to reading the macro:
- syntax-parseis a pattern-matcher, the three clauses match possible uses of- smart-match
- #` is a constructor for syntax objects
- the #,@(for/list ....)term escapes to Racket’sfor/listcombinator to build a list of syntax objects, then splices the generated syntax into the program — as if the programmer had written out a list withnum-groupselements
And here is a “client side” use of the macro:
(let ([m (smart-match "(a*)b" "aaabbb")]) (when m (string-length (second m)))) ; 3
It’s exactly the result we want, and one renaming away from the code we want to write.
Improving smart-match
Our smart-match has a few obvious limitations. First, it reports an error if a capture group fails to capture a string. Second, it fails to tailor calls where the pattern is a byte string  or regular expression literal. In the cases, smart-match defaults to Typed Racket’s  regexp-match. Third, it fails to tailor calls where the pattern is a variable.
The first problem is easy to fix by changing count-groups to return  a list of num-groups booleans encoding whether each capture group  definitely or maybe captures a string when the overall match is successful. The second problem is also just a matter of engineering.
For the third problem, we can use the Type Systems as Macros technique of using the macro expander to replace bound occurrences of a variable with type-annotated variables. The domain-specific type we attach describes the capture groups; for example, given the variable declaration:
(define x #rx"([a-z]+)@gmail\\.com")
 the type annotation is '(#t) because it contains one capture group  that is certain to capture a string when used in a successful  regexp-match. This annotation can be attached to occurrences of x at expansion-time  as a syntax property  for smart-match to extract.
The
trivialpackage includes a tailoredregexp-matchthat addresses all three issues. Note: the fix for the third issue changes the expansion-time meaning ofdefineto infer and propagate domain-specific types.
Practical Benefits
Now that we’ve thoroughly criticized smart-match, it’s worth pointing  out its strengths.
- smart-matchdoes not re-implement regular expression matching. It re-uses Racket’s existing, correct, and performant- regexp-match.
- smart-matchsupports the existing syntax for regular expressions.
- smart-matchvalidates some programs that Typed Racket conservatively rejects.
- smart-matchrejects some programs that Typed Racket would accept, yet evaluate to runtime errors — for instance,- (second (regexp-match "r" "racket")).
Lastly, the use of list-ref exploits its conservative type to improve  the conservative type of regexp-match. Whether you find this use clever or brittle, it is a proof to Typed Racket that  does not resort to type casts.
Implementing general type checker plugins that can convince the host type system of their correctness is a hard problem. I know of three high-quality type checker plugins for Haskell, and all three assert their correctness to GHC (
type-nat-solver,natnormalize, anduom-plugin).
Comparison: Dependently Typed Haskell
At this year’s POPL, Stephanie Weirich gave a keynote about The Influence of Dependent Types on the Haskell type system. After a brief introduction, her fifth slide gave motivation for dependently typed Haskell: domain-specific type checkers.

The rest of the talk was a code walk of a dependently typed regular expression matcher! Stephanie’s regular expression syntax supports Python-style named capture groups, and a successful match returns a record with group names as keys. These keys are reflected in the return type. Of course the point of the talk was not regular expressions, but rather how some impressive GHC plugins compose to solve a practical problem.
What I want to point out, however, is that if the goal is “domain-specific  type checkers”, type tailoring is a more direct solution. So if you are a programmer using Haskell or OCaml or Racket or Scala or  Clojure or Java or any other language with a decent syntax extension system,  you don’t need to wait for “dependently typed X” to add more static checking  to your library. Write a macro!
More Type Tailoring
There is a full implementation of type-tailored regexp-match on the  Racket package server. Try it out by installing the trivial-pkg package:
$ raco pkg install trivial
The package also includes tailorings for:
- format strings
- fixed-length lists and vectors
- N-ary functions
- constant-folding arithmetic
as well as syntax for defining new tailorings.
For further reading on type tailoring, and a sketch of how to prove the soundness of a tailoring, we have a draft paper:
The draft also reports on a small evaluation of our tailored  regexp-match on existing code. We searched the Racket distribution and libraries for uses of regexp-match  and found over 300 in untyped code. We then converted these uses to Typed Racket. After converting, most uses did not compile using Typed Racket’s regexp-match,  but swapping in the tailored regexp-match (a 1-line change) removed  the type errors in over 250 cases.
Appendix: count-groups
Here is a simple implementation for count-groups. This file contains a more robust implementation (look for parse-groups).
(define-for-syntax (count-groups str) (define last-open-paren (box #f)) (define num-groups (for/fold ([num-groups 0]) ([c (in-string str)] [i (in-naturals)]) (case c [(#\() (set-box! last-open-paren i) num-groups] [(#\)) (if (unbox last-open-paren) (begin (set-box! last-open-paren #f) (+ 1 num-groups)) (error 'smart-match "unmatched ')' at position ~a in '~a'" i str))] [else num-groups]))) (if (unbox last-open-paren) (error 'smart-match "unmatched '(' at position ~a in '~a'" (unbox last-open-paren) str) num-groups))