The Macrological Fascicle

Chapter 4

The syntax-case system

The syntax-case system provides support for writing low-level macros in a high-level style.

Pattern variables

Pattern variables are the unifying concept of both the syntax-case system and the closely related syntax-rules system, which is defined in section 5. They provide support for accessing the terminal symbols of a basic parser which operates on Scheme forms.

Pattern variables are a type of binding, just as variables and syntax keywords are. They occupy the same namespace as variables and syntax keywords and can shadow, and be shadowed by, bindings of them; the same name cannot refer to both a pattern variable and another type of binding within the same scope. The values of pattern variables cannot be changed after they have been bound.

Unlike normal variables, pattern variables can be bound to a sequence of multiple values, or any nesting of sequences of multiple values. The number of levels of nesting is determined statically by the pattern which names the pattern variable for binding. When the values are actually assigned to such a pattern variable at run time, each sequence may ultimately be empty or contain only one value.

Parsing input

The centrepieces of the syntax-case macro system are the eponymous syntax-case form, a pattern-based parser which is the fundamental form for parsing macro uses, and the syntax form, the fundamental form for constructing syntax objects. Syntax-case binds pattern variables after parsing a form, and syntax is used to access their values.

(syntax-case ⟨expression⟩ (⟨pattern literal⟩ ...) ⟨syntax-case clause⟩ ...)    syntax
(syntax-case ⟨custom ellipsis clause⟩ ⟨expression⟩ (⟨pattern literal⟩ ...) ⟨syntax-case clause⟩ ...)    syntax
_    auxiliary syntax
...    auxiliary syntax

Syntax: Each pattern literal must be an identifier. Each syntax-case clause must take one of the following two forms:

(pattern output expression)

(pattern fender output expression)

Fender and output expression must be expressions.

A pattern is an identifier, a constant, or one of the following.

(pattern ...)

(pattern pattern ... . pattern)

(pattern ... pattern ellipsis pattern ...)

(pattern ... pattern ellipsis pattern ... . pattern)

#(pattern ...)

#(pattern ... pattern ellipsis pattern ...)

Custom ellipsis clause, if present, is an instance of custom-ellipsis (section 4.5); ellipsis within a pattern refers to the auxiliary syntax keyword ... unless overridden by such a clause.

Semantics: A syntax-case expression first evaluates expression to obtain a syntax object. This input syntax object is matched against the patterns contained in the syntax-case clauses from left to right.

An identifier appearing within a pattern can be an underscore (_), a literal identifier listed in the list of pattern literals, or the ellipsis. All other identifiers appearing within a pattern are pattern variables.

Pattern variables match arbitrary input elements and are used to refer to elements of the input in the template. It is a syntax violation if the same pattern variable (in the sense of bound-identifier=?) appears more than once in a pattern.

Underscores also match arbitrary input elements but are not pattern variables and so cannot be used to refer to those elements. If an underscore appears in the pattern literals list, then that takes precedence and underscores in the patterns match as literals. Multiple underscores can appear in a pattern.

Identifiers that appear in (pattern literal ...) are interpreted as literal identifiers to be matched against corresponding elements of the input. An identifier within a pattern is treated as a literal identifier if and only if it is bound-identifier=? to an identifier within (pattern literal ...). An element in the input matches a literal identifier in the pattern if and only if the two identifiers are the same in the sense of free-identifier=?.

A subpattern followed by ellipsis can match zero or more elements of the input, unless ellipsis appears in the pattern literals, in which case it is matched as a literal.

More formally, an input expression E matches a pattern P if and only if:

  • P is an underscore (_); or

  • P is a non-literal identifier; or

  • P is a literal identifier and E is free-identifier=? to it; or

  • P is a list $(P_1 \ldots P_n)$ and E is a list of $n$ elements that match $P_1$ through $P_n$, respectively; or

  • P is an improper list $(P_1\ P_2 \ldots P_n\ .\ P_{n+1})$ and E is a list or improper list of $n$ or more elements that match $P_1$ through $P_n$, respectively, and whose $n$th tail matches $P_{n+1}$; or

  • P is of the form $(P_1 \ldots P_k\ P_e\ \mathit{ellipsis}\ P_{m+1} \ldots P_n)$ where E is a proper list of $n$ elements, the first $k$ of which match $P_1$ through $P_k$, respectively, whose next $m - k$ elements each match $P_e$, and whose remaining $n - m$ elements match $P_{m+1}$ through $P_n$; or

  • P is of the form $(P_1 \ldots P_k\ P_e\ \mathit{ellipsis}\ P_{m+1} \ldots P_n\ .\ P_x)$ where E is a list or improper list of $n$ elements, the first $k$ of which match $P_1$ through $P_k$, whose next $m - k$ elements each match $P_e$, whose remaining $n - m$ elements match $P_{m+1}$ through $P_n$, and whose $n$th and final cdr matches $P_x$; or

  • P is a vector of the form $\#(P_1 \ldots P_n)$ and E is a vector of $n$ elements that match $P_1$ through $P_n$; or

  • P is of the form $\#(P_1 \ldots P_k\ P_e\ \mathit{ellipsis}\ P_{m+1} \ldots P_n)$ where E is a vector of $n$ elements, the first $k$ of which match $P_1$ through $P_k$, whose next $m - k$ elements each match $P_e$, and whose remaining $n - m$ elements match $P_{m+1}$ through $P_n$; or

  • P is a constant and E is equal to P in the sense of the equal? procedure.
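Because syntax-case accepts any syntax object, not only the input to a transformer, the rules above can be tried out directly on syntax objects created with #'. The following is a minimal sketch, assuming an R6RS implementation such as Chez Scheme; the procedure classify is hypothetical and has one clause per kind of pattern, tried from left to right.

```scheme
(import (rnrs))

;; classify (hypothetical) reports which kind of pattern matched first.
(define (classify stx)
  (syntax-case stx (else)
    ((else e)                         ; else is a pattern literal
     (list 'else-clause (syntax->datum #'e)))
    ((_ x ... last)                   ; ellipsis followed by a further pattern
     (list 'list (syntax->datum #'(x ...)) (syntax->datum #'last)))
    ((a . rest)                       ; improper tail
     (list 'pair (syntax->datum #'a) (syntax->datum #'rest)))
    (#(v ...)                         ; vector pattern
     (list 'vector (syntax->datum #'(v ...))))
    (42 'forty-two)                   ; constant, compared with equal?
    (_ 'other)))                      ; underscore matches anything

;; (classify #'(else 5))    => (else-clause 5)
;; (classify #'(f 1 2 3))   => (list (1 2) 3)
;; (classify #'(1 . 2))     => (pair 1 2)
;; (classify #'#(1 2))      => (vector (1 2))
;; (classify #'42)          => forty-two
;; (classify #'foo)         => other
```

Note that (else 5) would also match the second clause; it is classified as an else clause only because that clause is tried first.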

When the pattern of a given syntax-case clause matches the input syntax object, and the syntax-case clause contains a fender expression, the expression is evaluated to act as an additional constraint on acceptance of a clause. If the result of the evaluation is #f, the clause as a whole does not match, and pattern matching resumes on the next clause to the right. It is a syntax violation if the input syntax object does not match any of the clauses.

If the pattern of the clause matches and either there is no fender expression or the fender expression evaluates to a true value, the output expression is evaluated and its value is returned as the value of the syntax-case expression. If the syntax-case form is in tail context, each output expression is also in tail position.
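A fender that returns #f rejects a clause even though its pattern matched, and matching continues with the next clause. The sketch below assumes an R6RS implementation such as Chez Scheme; describe is a hypothetical name.

```scheme
(import (rnrs))

;; describe (hypothetical) has two clauses with the same pattern; only the
;; first has a fender.
(define (describe stx)
  (syntax-case stx ()
    ((a b) (identifier? #'a)          ; fender: a must be an identifier
     'identifier-head)
    ((a b)                            ; tried when the fender returned #f
     'two-elements)
    (_ 'something-else)))

;; (describe #'(f 1))     => identifier-head
;; (describe #'(1 2))     => two-elements
;; (describe #'(1 2 3))   => something-else
```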

Pattern variables contained within a clause’s pattern are bound within the clause’s fender (if present) and output expression to the corresponding pieces of the input form which they matched. Pattern variables contained within subpatterns followed by ellipsis are marked as holding sequences of values, nested to as many levels as there are such subpatterns enclosing them; the results of destructuring the input form according to the pattern become the values of those pattern variables.
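To illustrate nesting depth, the following sketch (assuming an R6RS implementation such as Chez Scheme; split-table is a hypothetical name) binds k at one level of nesting and v at two. The middle row shows that an inner sequence may be empty.

```scheme
(import (rnrs))

;; In ((k v ...) ...), k holds a sequence and v a sequence of sequences.
(define (split-table stx)
  (syntax-case stx ()
    (((k v ...) ...)
     (list (syntax->datum #'(k ...))           ; one value per row
           (syntax->datum #'((v ...) ...))     ; both levels kept
           (syntax->datum #'(v ... ...))))))   ; both levels flattened

;; (split-table #'((a 1 2) (b) (c 3)))
;;   => ((a b c) ((1 2) () (3)) (1 2 3))
```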

Note: R6RS made any attempt to use the ellipsis or underscore as literals a syntax violation, and did not provide any means of renaming the ellipsis.

Generating expansions

(syntax ⟨template⟩)    syntax
(syntax ⟨custom ellipsis clause⟩ ⟨template⟩)    syntax
#'⟨template⟩    syntax
...    auxiliary syntax

Syntax: (syntax template) can be abbreviated as #'template. The two notations are equivalent in all respects.

A template is an identifier, a pattern datum, or one of the following.

(subtemplate ...)

(subtemplate ... . template)

#(subtemplate ...)

(ellipsis template)

A subtemplate is a template followed by zero or more instances of ellipsis.

Custom ellipsis clause, if present, is an instance of custom-ellipsis (section 4.5); ellipsis within a template refers to the auxiliary syntax keyword ... unless overridden by such a clause.

It is a syntax violation if the template contains circular references.

Semantics: A syntax expression is similar to a quote-syntax expression, except that the values of pattern variables appearing within template are inserted into the template by copying the template, and the result of evaluating a syntax expression is a syntax object which is only partially wrapped, as described below.

A subtemplate followed by an ellipsis expands into zero or more occurrences of the subtemplate. Pattern variables that occur in subpatterns followed by one or more ellipses may occur only in subtemplates that are followed by (at least) as many ellipses. These pattern variables are replaced in the output by the input subforms to which they are bound, distributed as specified. If a pattern variable is followed by more ellipses in the subtemplate than in the associated subpattern, the input form is replicated for the outermost excess ellipses as necessary. [Editorial note: Can the meaning of ‘replicated for the outermost excess ellipses’ be made clearer? ] The subtemplate must contain at least one pattern variable from a subpattern followed by an ellipsis, and for at least one such pattern variable, the subtemplate must be followed by exactly as many ellipses as the subpattern in which the pattern variable appears; otherwise, it is a syntax violation.
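Replication of a pattern variable under excess ellipses can be seen in the following sketch, which assumes an R6RS implementation such as Chez Scheme (pair-with-each and repeat-whole are hypothetical names). In the second procedure x is bound at depth one but used under two ellipses: the inner ellipsis iterates over x, and the outer, excess ellipsis repeats the whole of x once for each element of y.

```scheme
(import (rnrs))

;; x has depth 0 but is used under one ellipsis: its value is repeated.
(define (pair-with-each stx)
  (syntax-case stx ()
    ((x (y ...))
     (syntax->datum #'((x y) ...)))))

;; x has depth 1 but is used under two ellipses: the outer one is excess.
(define (repeat-whole stx)
  (syntax-case stx ()
    (((x ...) (y ...))
     (syntax->datum #'(((x ...) y) ...)))))

;; (pair-with-each #'(a (1 2 3)))   => ((a 1) (a 2) (a 3))
;; (repeat-whole #'((1 2) (a b)))    => (((1 2) a) ((1 2) b))
```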

A template of the form (ellipsis template) is equivalent to template, except that the effect of the ellipsis within the template is suppressed and it is treated like any other ordinary identifier. In particular, the template (ellipsis ellipsis) produces a single ellipsis. This allows macro uses to expand into forms containing ellipses.
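A typical use of (... ...) is a macro whose expansion defines another macro with its own ellipsis patterns. The sketch below assumes an R6RS implementation such as Chez Scheme and the default ellipsis; define-list-macro and my-list are hypothetical names.

```scheme
(import (rnrs))

;; The outer template turns each (... ...) into a plain ellipsis, so the
;; generated macro sees (_ x ...) and (list x ...).
(define-syntax define-list-macro
  (lambda (stx)
    (syntax-case stx ()
      ((_ name)
       #'(define-syntax name
           (lambda (inner)
             (syntax-case inner ()
               ((_ x (... ...))
                #'(list x (... ...))))))))))

(define-list-macro my-list)

;; (my-list 1 2 3)  => (1 2 3)
;; (my-list)        => ()
```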

The result of evaluating a syntax expression is a copy of the template which is wrapped or unwrapped according to the following rules.

  • The copy of a template which is a proper or improper list consists of unwrapped pairs as far as the rightmost subtemplate which contains a pattern variable. The cars of the pairs in the copy of the list are wrapped if they would be wrapped by applying these rules to the cars in the subtemplates recursively. If the last subtemplate in a proper list contains a pattern variable, then all pairs which form part of the list and the empty list in the final cdr are unwrapped. If the template is an improper list and the final cdr is a pattern variable, then all pairs which form part of the improper list are unwrapped and the final cdr is replaced by the value of the pattern variable in the copy.

  • The copy of a template which is a vector is unwrapped if any of its subtemplates contains at least one pattern variable.

  • The copy of any other template may be wrapped.

The values of the pattern variables are not copied when substituted into the template, and are thus wrapped or unwrapped to the same degree as when they were bound. Other datums and identifiers that are not pattern variables or ellipses are copied directly into the output, maintaining the contextual information associated with them.
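The rules above guarantee that some results of syntax can be taken apart with ordinary list and vector procedures. This sketch, assuming an R6RS implementation such as Chez Scheme, checks only the guaranteed-unwrapped cases; unwrap-demo is a hypothetical name.

```scheme
(import (rnrs))

(define (unwrap-demo stx)
  (syntax-case stx ()
    ((x y)
     (let ((lst #'(x y))       ; last subtemplate contains a pattern variable
           (vec #'#(x 1)))     ; a vector subtemplate contains a pattern variable
       (list (list? lst)                   ; pairs and final () are unwrapped
             (length lst)                  ; so list procedures apply directly
             (vector? vec)                 ; the vector itself is unwrapped
             (identifier? (car lst)))))))  ; x's value is inserted unchanged

;; (unwrap-demo #'(a b))  => (#t 2 #t #t)
```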

(quasisyntax ⟨quasi-template⟩)    syntax
#`⟨quasi-template⟩    syntax
(quasisyntax ⟨custom ellipsis clause⟩ ⟨quasi-template⟩)    syntax
(unsyntax ⟨expression⟩ ...)    auxiliary syntax
#,⟨expression⟩    auxiliary syntax
(unsyntax-splicing ⟨expression⟩ ...)    auxiliary syntax
#,@⟨expression⟩    auxiliary syntax
...    auxiliary syntax

Syntax: (quasisyntax quasi-template) can be abbreviated as #`quasi-template, (unsyntax expression) as #,expression, and (unsyntax-splicing expression) as #,@expression. The notations are equivalent in all respects.

A quasi-template is either a template, an instance of quasisyntax, unsyntax, or unsyntax-splicing, or a list or vector containing further quasi-templates. Uses of unsyntax and unsyntax-splicing are valid only within quasi-templates.

Custom ellipsis clause, if present, is an instance of custom-ellipsis (section 4.5); ellipsis within a quasi-template refers to the auxiliary syntax keyword ... unless overridden by such a clause.

The behaviour is undefined if the quasi-template contains circular references outside of a context within an expression where they are allowed.

Semantics: The quasisyntax form is similar to syntax, but it allows parts of its template to be evaluated, in a manner similar to the operation of quasiquote. Unsyntax and unsyntax-splicing are the quasisyntax analogues of unquote and unquote-splicing.

Rationale: While unquote and unquote-splicing could be re-used in quasisyntax for the purpose of escaping out of the quoted environment, that would make generating macro output including a quasiquote expression unnecessarily tricky.

Within the quasi-template, the expressions of unsyntax and unsyntax-splicing forms are evaluated; everything else is treated as ordinary template material, as with syntax. The value of each unsyntax subform is inserted into the output in place of the unsyntax form, while the value of each unsyntax-splicing subform is spliced into the surrounding list or vector structure.
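As a minimal sketch, assuming an R6RS implementation such as Chez Scheme (count-and-reverse is a hypothetical name): #, inserts a value computed at expand time, and #,@ splices a list of syntax objects into the surrounding list.

```scheme
(import (rnrs))

(define-syntax count-and-reverse
  (lambda (stx)
    (syntax-case stx ()
      ((_ e ...)
       #`(list #,(length #'(e ...))           ; number of arguments
               #,@(reverse #'(e ...)))))))    ; arguments in reverse order

;; (count-and-reverse 'a 'b 'c)  => (3 c b a)
;; (count-and-reverse)           => (0)
```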

A quasisyntax expression may be nested, with each quasisyntax introducing a new level of syntax quotation and each unsyntax or unsyntax-splicing taking away a level of syntax quotation. An expression nested within n quasisyntax expressions must be within n unsyntax or unsyntax-splicing expressions to be evaluated.

All uses of unsyntax-splicing, and uses of unsyntax or unsyntax-splicing with zero or more than one subform, are valid only within lists or vectors. Each use of unsyntax or unsyntax-splicing with zero subforms results in no elements being inserted into the list or vector: the unsyntax or unsyntax-splicing is treated as if it were not there. Each use of unsyntax or unsyntax-splicing with more than one subform is equivalent to the same number of individual unsyntax or unsyntax-splicing forms, each with one of the subforms, in the same order.
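The multi-subform and zero-subform cases can be checked with quasisyntax outside any macro. This sketch assumes an R6RS implementation such as Chez Scheme whose quasisyntax supports these forms as R6RS specifies; parts, extra, spliced and nothing-inserted are hypothetical names.

```scheme
(import (rnrs))

(define parts (list #'a #'b))
(define extra (list #'c))

;; One unsyntax-splicing with two subforms acts like two in a row.
(define spliced
  (syntax->datum #`(f (unsyntax-splicing parts extra))))

;; An unsyntax with zero subforms inserts no elements.
(define nothing-inserted
  (syntax->datum #`(g 1 (unsyntax) 2)))

;; spliced           => (f a b c)
;; nothing-inserted  => (g 1 2)
```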

Rationale: Uses of unsyntax and unsyntax-splicing with zero or more than one subform enable certain idioms, such as #,@#,@. This has the effect of a doubly indirect splicing when used within a doubly nested and doubly evaluated quasisyntax expression.

Binding other pattern variables within procedural macros

(with-syntax ((⟨pattern⟩ ⟨expression⟩) ...) ⟨body⟩)    syntax
(with-syntax ⟨custom ellipsis clause⟩ ((⟨pattern⟩ ⟨expression⟩) ...) ⟨body⟩)    syntax

The with-syntax form is the fundamental pattern variable binding form.

Syntax: Each pattern is identical in form to a syntax-case pattern.

Custom ellipsis clause, if present, is an instance of custom-ellipsis (section 4.5); ellipsis within a pattern refers to the auxiliary syntax keyword ... unless overridden by such a clause.

Semantics: The value of each expression is computed and destructured according to the corresponding pattern, and pattern variables within the pattern are bound as if by syntax-case to the corresponding portions of the value within body. It is a syntax violation if the result of evaluating an expression does not match the corresponding pattern.

Implementation:

(define-syntax with-syntax
  (lambda (stx)
    (syntax-case stx ()
      ((_ ((pattern expression) ...) body_0 body_1 ...)
       #'(syntax-case (list expression ...) ()
           ((pattern ...) (let () body_0 body_1 ...)))))))
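A common use of with-syntax is binding a pattern variable to a list computed by ordinary code at expand time. The sketch below assumes an R6RS implementation such as Chez Scheme, whose (rnrs) library already provides with-syntax; the macro name numbered is hypothetical.

```scheme
(import (rnrs))

;; The indices 0, 1, ... are bound to (i ...), so i gets the same ellipsis
;; depth as e and both can be used under one ellipsis in the template.
(define-syntax numbered
  (lambda (stx)
    (syntax-case stx ()
      ((_ e ...)
       (with-syntax (((i ...)
                      (let loop ((n 0) (es #'(e ...)))
                        (if (null? es)
                            '()
                            (cons n (loop (+ n 1) (cdr es)))))))
         #'(list (cons i e) ...))))))

;; (numbered 'a 'b 'c)  => ((0 . a) (1 . b) (2 . c))
```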

Writing macros which generate other macros

(custom-ellipsis ⟨custom ellipsis⟩)    auxiliary syntax

Syntax: Custom ellipsis must be an identifier.

Semantics: When a custom-ellipsis form immediately follows the keyword of a syntax-case, syntax, quasisyntax, or with-syntax form, instances of ellipsis within the pattern, template, or quasi-template of that form refer not to the auxiliary syntax keyword ..., but to any identifier which is bound-identifier=? to the custom ellipsis identifier.

Examples

Many simpler macros can be written using syntax-rules (see section 5) and trivially converted into syntax-case. This is useful, for example, when extending a macro originally defined with syntax-rules with functionality or error checking that requires syntax-case. The following example shows how the swap! example of syntax-rules (section 5) can first be rewritten to use syntax-case.

;; Original definition using syntax-rules:
(define-syntax swap!
  (syntax-rules ()
    ((_ a b)
     (let ((temp a))
       (set! a b)
       (set! b temp)))))
;; The same macro written with syntax-case:
(define-syntax swap!
  (lambda (stx)
    (syntax-case stx ()
      ((_ a b)
       #'(let ((temp a))
           (set! a b)
           (set! b temp))))))

The definition can then be refined with a fender to improve error reporting when either argument to swap! is not an identifier. With the above definition, (swap! (car x) (car y)) would result in a syntax violation claiming that set! had been used incorrectly, even though set! does not appear anywhere in the code as written.

(define-syntax swap!
  (lambda (stx)
    (syntax-case stx ()
      ((_ a b)
       (and (identifier? #'a)
            (identifier? #'b))
       #'(let ((temp a))
           (set! a b)
           (set! b temp))))))

With this definition, the syntax violation signalled by (swap! (car x) (car y)) will correctly report that swap! was used incorrectly.

The following example also shows how syntax-case can be used to improve error reporting from macros by writing explicit error checking code. It defines a variant of case which checks that all datums in a clause belong to types that can portably be used in case: that is, their behaviour under eqv? never depends on their location in the store, which for other types is dependent on the Scheme implementation. This kind of error checking is not possible in syntax-rules, which cannot in general detect the type of any subform as a datum. (This version of case also does not provide an else clause, instead signalling an error if no specific clause matches.)

(define-syntax my-case
  (let ((eqv-undefined?
         (lambda (x-stx)
           ;; True for datums whose behaviour under eqv? is not portable.
           (let ((x (syntax->datum x-stx)))
             (not (or (boolean? x) (symbol? x) (number? x)
                      (char? x) (null? x)))))))
    (lambda (stx)
      (syntax-case stx ()
        ((_ key ((datum ...) expr_0 expr_1 ...) ...)
         (cond ((find eqv-undefined? #'(datum ... ...))
                => (lambda (bad-datum)
                     (syntax-violation
                      'my-case
                      "use of datum in my-case is not portable"
                      stx bad-datum)))
               (else
                ;; Bind the key to a temporary so that it is evaluated
                ;; only once, even when it is passed on to error.
                #'(let ((k key))
                    (case k
                      ((datum ...) expr_0 expr_1 ...) ...
                      (else
                       (error "key did not match any my-case datum"
                              k)))))))))))

Macros written using syntax-case can also bind an implicit identifier, which cannot be done with syntax-rules. The with-return example from section 3.2 can be reformulated in terms of syntax-case as follows. The two definitions are equivalent except that the first one uses quasisyntax and the second with-syntax.

;; Using quasisyntax:
(define-syntax with-return
  (lambda (stx)
    (syntax-case stx ()
      ((k body_0 body_1 ...)
       (let ((return-id (datum->syntax #'k 'return)))
         #`(call-with-current-continuation
            (lambda (#,return-id)
              body_0 body_1 ...)))))))
;; Using with-syntax:
(define-syntax with-return
  (lambda (stx)
    (syntax-case stx ()
      ((k body_0 body_1 ...)
       (with-syntax ((return (datum->syntax #'k 'return)))
         #'(call-with-current-continuation
            (lambda (return)
              body_0 body_1 ...)))))))
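As a runnable sketch, assuming an R6RS implementation such as Chez Scheme, here is the quasisyntax formulation as a complete program, with the transformer written as a lambda over the input form stx. The procedure first-negative is hypothetical; it uses the implicitly bound return to escape from for-each.

```scheme
(import (rnrs))

(define-syntax with-return
  (lambda (stx)
    (syntax-case stx ()
      ((k body_0 body_1 ...)
       ;; return gets the lexical context of the keyword k, so the body
       ;; can refer to it even though the macro introduced the binding.
       (let ((return-id (datum->syntax #'k 'return)))
         #`(call-with-current-continuation
            (lambda (#,return-id)
              body_0 body_1 ...)))))))

;; first-negative (hypothetical): the first negative element, or #f.
(define (first-negative lst)
  (with-return
    (for-each (lambda (x) (when (negative? x) (return x))) lst)
    #f))

;; (first-negative '(1 -2 3 -4))  => -2
;; (first-negative '(1 2 3))      => #f
```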

Syntax-case can also be used in the definition of identifier macros. The used-as example from section 2.4 can be reformulated in terms of syntax-case as follows.

(define-syntax used-as
  (make-variable-transformer
   (lambda (stx)
     (syntax-case stx (set!)
       (id
        (identifier? #'id)
        #'(quote reference))
       ((set! _ value)
        #'(quote (assignment value)))
       ((_ . operands)
        #'(quote (combination . operands)))))))
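To see what each kind of use expands into, the following sketch repeats the definition as a complete program, assuming an R6RS implementation such as Chez Scheme, where make-variable-transformer is provided by (rnrs). Each use evaluates to a quoted description of the context it appeared in.

```scheme
(import (rnrs))

(define-syntax used-as
  (make-variable-transformer
   (lambda (stx)
     (syntax-case stx (set!)
       (id
        (identifier? #'id)
        #'(quote reference))
       ((set! _ value)
        #'(quote (assignment value)))
       ((_ . operands)
        #'(quote (combination . operands)))))))

;; used-as                  => reference
;; (set! used-as (+ 1 2))   => (assignment (+ 1 2))
;; (used-as 1 2)            => (combination 1 2)
```

The first clause's pattern id matches any input, so its fender is what restricts it to bare identifier uses.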

Identifier macros written using syntax-case can be used to optimize expensive procedure calls at expand time, while still providing the functionality of a first-class procedure. The following wrapper around concatenate turns uses into the more efficient append-map when its argument is known to be a call to the map procedure.

(define-syntax fast-concatenate
  (lambda (stx)
    (syntax-case stx (map)
      ((_ (map f ls_0 ls_1 ...))
       #'(append-map f ls_0 ls_1 ...))
      ((_ ls)
       #'(concatenate ls))
      (id
       (identifier? #'id)
       #'concatenate))))
(fast-concatenate (map make-list '(1 2 3) '(a b c)))  ⟹  (a b b c c c)
(fast-concatenate (list '(bh b p) '(dh d t)))  ⟹  (bh b p dh d t)
(apply fast-concatenate '(((gh g k) (g*h g* k*))))  ⟹  (gh g k g*h g* k*)

Users should note, however, that many implementations of Scheme include sophisticated compilers which are able to recognize procedure calls which can be safely evaluated before run time, and which can usually optimize such cases more effectively than any macro definition. Explicit use of macros like this should usually be limited to instances where optimization cannot be done by a compiler. This typically includes cases in which the procedure uses side effects within its definition, or (as in the above example) where an optimization is possible when some information about arguments’ values is known at expand time, but the values are otherwise not known until run time. Note also that the above example does not prevent the compiler from later additionally performing this optimization on the resulting append-map call when all its arguments are known at compile time.