Chapter 4
The syntax-case system
The syntax-case
system provides support for writing
low-level macros in a high-level style.
Pattern variables
Pattern variables are the unifying concept of both the
syntax-case
system and the closely related
syntax-rules
system, which is defined in section 5. They provide
support for accessing the terminal symbols of a basic parser which
operates on Scheme forms.
Pattern variables are a type of binding exactly like variables and
syntax keywords. They occupy the same namespace as variables and syntax
keywords and can shadow, and be shadowed by, bindings of them; the same
name cannot refer to both a pattern variable and another type of binding
within the same scope. The value of pattern variables cannot be changed
after they have been bound.
Unlike normal variables, pattern variables can be bound to a sequence
of multiple values, or any nesting of sequences of multiple values. The
number of levels of nesting is determined statically by the pattern
which names the pattern variable for binding. When the values are
actually assigned to such a pattern variable at run time, each sequence
may ultimately be empty or contain only one value.
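The following is a minimal sketch of how nesting depth is fixed by the pattern. It assumes an implementation that provides syntax-case in the R6RS style; the library name (rnrs) and the macro name depth-demo are assumptions made for illustration and are not part of this chapter.

```scheme
(import (rnrs))

;; x is bound at depth 0, y at depth 1 and z at depth 2. The depth is
;; fixed by the number of ellipses that follow each variable in the
;; pattern, not by the input that is eventually matched.
(define-syntax depth-demo
  (lambda (stx)
    (syntax-case stx ()
      ((_ x (y ...) ((z ...) ...))
       #'(list 'x '(y ...) '((z ...) ...))))))

;; The middle sequence bound to z is empty and the last one holds a
;; single value; both are permitted.
(define depth-result (depth-demo a (b c) ((d e) () (f))))
;; depth-result is (a (b c) ((d e) () (f)))
```

Each pattern variable must then be used in a template under at least as many ellipses as it had in the pattern.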
The centrepieces of the syntax-case macro system are the eponymous
pattern-based parser, the fundamental form for parsing macro uses, and
the syntax form, the fundamental form for constructing syntax objects.
Syntax-case binds pattern variables after parsing a form, and syntax is
used to access their values.
(syntax-case expression (pattern literal ...)
syntax-case clause ...)
- syntax
(syntax-case custom ellipsis clause
expression (pattern literal ...)
syntax-case clause ...)
- syntax
_
- auxiliary syntax
...
- auxiliary syntax
Syntax: Each pattern literal must be an identifier. Each
syntax-case clause must take one of the following two forms:
(pattern output expression)
(pattern fender output expression)
Fender and output expression must be expressions.
A pattern is an identifier, a constant, or one of the following.
(pattern ...)
(pattern pattern ... . pattern)
(pattern ... pattern ellipsis pattern ...)
(pattern ... pattern ellipsis pattern ... . pattern)
#(pattern ...)
#(pattern ... pattern ellipsis pattern ...)
Custom ellipsis clause, if present, is an instance of
custom-ellipsis
(section 4.5); ellipsis
within a pattern refers to the auxiliary syntax keyword ...
unless overridden by such a clause.
Semantics: A syntax-case
expression first evaluates expression
to obtain a syntax object. This input syntax object is matched against
the patterns contained in the syntax-case clauses from left
to right.
An identifier appearing within a pattern can be an underscore
(_
), a literal identifier listed in the list of pattern
literals, or the ellipsis. All other identifiers appearing
within a pattern are pattern variables.
Pattern variables match arbitrary input elements and are used to refer
to elements of the input in the template. It is a syntax violation if
the same pattern variable (in the sense of bound-identifier=?
)
appears more than once in a pattern.
Underscores also match arbitrary input elements but are not pattern
variables and so cannot be used to refer to those elements. If an
underscore appears in the pattern literals list, then that takes
precedence and underscores in the patterns match as literals.
Multiple underscores can appear in a pattern.
Identifiers that appear in (pattern literal ...)
are interpreted
as literal identifiers to be matched against corresponding elements of
the input. An identifier within a pattern is treated as a literal
identifier if and only if it is bound-identifier=?
to an identifier
within (pattern literal ...)
. An element in the input matches a
literal identifier in the pattern if and only if the two identifiers
are the same in the sense of free-identifier=?
.
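As an illustration of literal matching, the sketch below uses => as a pattern literal. It assumes an R6RS-style implementation in which (rnrs) exports => as auxiliary syntax; the macro name arrow? is hypothetical.

```scheme
(import (rnrs))

;; => is listed as a pattern literal, so the first clause matches only
;; an input identifier that is free-identifier=? to the => visible
;; where the macro is defined.
(define-syntax arrow?
  (lambda (stx)
    (syntax-case stx (=>)
      ((_ =>) #'#t)
      ((_ other) #'#f))))

(define at-top (arrow? =>))                   ; same binding: #t
(define not-arrow (arrow? foo))               ; a different identifier: #f
(define shadowed (let ((=> 1)) (arrow? =>)))  ; locally rebound: #f
```

In the last use the operand has the same name as the literal but a different binding, so it falls through to the second clause.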
A subpattern followed by ellipsis can match zero or more elements
of the input, unless ellipsis appears in the pattern literals,
in which case it is matched as a literal.
More formally, an input expression E matches a pattern P if
and only if:
P is an underscore (_); or
P is a non-literal identifier; or
P is a literal identifier and E is free-identifier=? to it; or
P is a list (P_1 ... P_n) and E is a list of n elements that match
P_1 through P_n, respectively; or
P is an improper list (P_1 ... P_n . P_x) and E is a list or improper
list of n or more elements that match P_1 through P_n, respectively,
and whose nth tail matches P_x; or
P is of the form (P_1 ... P_k P_e ellipsis P_{m+1} ... P_n) where E is
a proper list of n elements, the first k of which match P_1 through
P_k, respectively, whose next m - k elements each match P_e, and
whose remaining n - m elements match P_{m+1} through P_n; or
P is of the form (P_1 ... P_k P_e ellipsis P_{m+1} ... P_n . P_x)
where E is a list or improper list of n elements, the first k of which
match P_1 through P_k, whose next m - k elements each match P_e,
whose remaining n - m elements match P_{m+1} through P_n, and whose
nth and final cdr matches P_x; or
P is a vector of the form #(P_1 ... P_n) and E is a vector of n
elements that match P_1 through P_n; or
P is of the form #(P_1 ... P_k P_e ellipsis P_{m+1} ... P_n) where E
is a vector of n elements, the first k of which match P_1 through P_k,
whose next m - k elements each match P_e, and whose remaining n - m
elements match P_{m+1} through P_n; or
P is a constant and E is equal to P in the sense of the equal?
procedure.
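The rule for a subpattern followed by an ellipsis and then by further subpatterns can be sketched as follows. The example assumes an R6RS-style implementation reachable through (rnrs); the macro name split is hypothetical.

```scheme
(import (rnrs))

;; The pattern has one subpattern before the ellipsis subpattern and
;; one after it, so the input needs at least two operands; mid takes
;; whatever lies between the first and the last.
(define-syntax split
  (lambda (stx)
    (syntax-case stx ()
      ((_ head mid ... tail)
       #'(list 'head '(mid ...) 'tail)))))

(define split-long (split a b c d))   ; mid matches b and c
(define split-short (split a b))      ; mid matches zero elements
;; split-long is (a (b c) d) and split-short is (a () b)
```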
When the pattern of a given syntax-case clause matches the
input syntax object, and the syntax-case clause contains a
fender expression, the expression is evaluated to act as an
additional constraint on acceptance of a clause. If the result of the
evaluation is #f
, the clause as a whole does not match, and pattern
matching resumes on the next clause to the right. It is a syntax
violation if the input syntax object does not match any of the
clauses.
If the pattern of the clause matches and there is no fender
expression, or the evaluation of the fender expression returned a
true value, the output expression is evaluated and its value
returned as the value of the syntax-case
expression. If the
syntax-case
form is in tail context, each output expression is
also in tail position.
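The interaction between clause order and fenders can be sketched as follows, assuming an R6RS-style implementation reachable through (rnrs). The macro name classify is hypothetical.

```scheme
(import (rnrs))

;; Clauses are tried in order. The pattern of the first clause matches
;; any single operand, but its fender rejects everything that is not an
;; identifier, so matching resumes with the later clauses.
(define-syntax classify
  (lambda (stx)
    (syntax-case stx ()
      ((_ x)
       (identifier? #'x)
       #'(quote identifier))
      ((_ (a . b))
       #'(quote pair))
      ((_ other)
       #'(quote other)))))

(define c1 (classify foo))     ; fender returns a true value
(define c2 (classify (f x)))   ; fender returns #f, second clause matches
(define c3 (classify 42))      ; only the third clause matches
```

A use such as (classify) matches none of the clauses and is therefore a syntax violation.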
Pattern variables contained within a clause’s pattern are bound
within the clause’s fender (if present) and output expression
to the corresponding pieces of the input form which they matched.
Pattern variables contained within subpatterns followed by
ellipsis are marked as holding sequences of multiple values
according to the number of nested levels of such subpatterns
they are within; the results of destructuring the input form
according to the pattern become the values of those pattern
variables.
Note: R6RS made any attempt to use the ellipsis or underscore as
literals a syntax violation, and did not provide any means of renaming
the ellipsis.
Generating expansions
(syntax template)
- syntax
(syntax custom ellipsis clause template)
- syntax
#'template
- syntax
...
- auxiliary syntax
Syntax: (syntax template)
can be abbreviated as
#'template
. The two notations are equivalent in all respects.
A template is an identifier, a pattern datum, or one of the
following.
(subtemplate ...)
(subtemplate ... . template)
#(subtemplate ...)
(ellipsis template)
A subtemplate is a template followed by zero or more instances
of ellipsis.
Custom ellipsis clause, if present, is an instance of
custom-ellipsis
(section 4.5); ellipsis
within a template refers to the auxiliary syntax keyword ...
unless overridden by such a clause.
It is a syntax violation if the template contains circular
references.
Semantics: A syntax
expression is similar to a quote-syntax
expression, except that the values of pattern variables appearing
within template are inserted into the template by copying the
template, and the result of evaluating a syntax
expression is a
syntax object which is only partially wrapped, as described below.
A subtemplate followed by an ellipsis expands into zero or more
occurrences of the subtemplate. Pattern variables that occur in
subpatterns followed by one or more ellipses may occur only in
subtemplates that are followed by (at least) as many ellipses. These
pattern variables are replaced in the output by the input subforms to
which they are bound, distributed as specified. If a pattern variable
is followed by more ellipses in the subtemplate than in the associated
subpattern, the input form is replicated for the outermost excess
ellipses as necessary. [Editorial note: Can the meaning of ‘replicated for the
outermost excess ellipses’ be made clearer? ] The subtemplate must
contain at least one pattern variable from a subpattern followed by an
ellipsis, and for at least one such pattern variable, the subtemplate
must be followed by exactly as many ellipses as the subpattern in
which the pattern variable appears; otherwise, it is a syntax
violation.
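The sketch below shows one reading of the two cases described above: a pattern variable that is replicated because the subtemplate has more ellipses than its subpattern, and a variable followed directly by two ellipses. It assumes an R6RS-style implementation reachable through (rnrs); the macro names pair-with and flatten-quoted are hypothetical.

```scheme
(import (rnrs))

;; a has depth 0 in the pattern and b has depth 1. In the subtemplate
;; (cons a b) ..., b drives the iteration and a is replicated for each
;; element of b.
(define-syntax pair-with
  (lambda (stx)
    (syntax-case stx ()
      ((_ a (b ...))
       #'(list (cons a b) ...)))))

;; x has depth 2; following it with two ellipses splices both levels
;; into a single flat sequence.
(define-syntax flatten-quoted
  (lambda (stx)
    (syntax-case stx ()
      ((_ (x ...) ...)
       #'(quote (x ... ...))))))

(define paired (pair-with 0 (1 2 3)))
(define flat (flatten-quoted (a b) () (c)))
;; paired is ((0 . 1) (0 . 2) (0 . 3)) and flat is (a b c)
```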
A template of the form (ellipsis template)
is equivalent to
template
, except that the effect of the ellipsis within the
template is suppressed and it is treated like any other ordinary
identifier. In particular, the template (ellipsis ellipsis)
produces a single ellipsis. This allows macro uses to expand into
forms containing ellipses.
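A minimal sketch of the escape in use, assuming an R6RS-style implementation reachable through (rnrs) and the default ellipsis identifier. The names define-lister and my-list are hypothetical.

```scheme
(import (rnrs))

;; The outer template writes (... ...) wherever the generated macro
;; needs a literal ellipsis. Without the escape, the outer syntax form
;; would try to interpret those ellipses itself.
(define-syntax define-lister
  (lambda (stx)
    (syntax-case stx ()
      ((_ name)
       #'(define-syntax name
           (lambda (stx2)
             (syntax-case stx2 ()
               ((_ arg (... ...))
                #'(list arg (... ...))))))))))

;; The generated definition behaves as if it had been written with the
;; pattern (_ arg ...) and the template (list arg ...).
(define-lister my-list)
(define listed (my-list 1 2 3))
;; listed is (1 2 3)
```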
The result of evaluating a syntax
expression is a copy of the
template which is wrapped or unwrapped according to the following
rules.
The copy of a template which is a proper or improper list consists
of unwrapped pairs as far as the rightmost subtemplate which
contains a pattern variable. The cars of the pairs in the copy of
the list are wrapped if they would be wrapped by applying these
rules to the cars in the subtemplates recursively. If the last
subtemplate in a proper list contains a pattern variable, then all
pairs which form part of the list and the empty list in the final
cdr are unwrapped. If the template is an improper list and the final
cdr is a pattern variable, then all pairs which form part of the
improper list are unwrapped and the final cdr is replaced by the
value of the pattern variable in the copy.
The copy of a template which is a vector is unwrapped if any of its
subtemplates contains at least one pattern variable.
The copy of any other template may be wrapped.
The values of the pattern variables are not copied when substituted
into the template, and are thus wrapped or unwrapped to the same
degree as when they were bound. Other datums and identifiers that are
not pattern variables or ellipses are copied directly into the output,
maintaining the contextual information associated with them.
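One practical consequence of these rules is that a list template ending in a pattern variable yields an ordinary list, which list procedures can process directly. The sketch below assumes an R6RS-style implementation reachable through (rnrs); the macro name count-args is hypothetical.

```scheme
(import (rnrs))

;; The last subtemplate of (arg ...) contains a pattern variable, so the
;; value of the syntax expression is an unwrapped list whose elements
;; are the syntax objects bound to arg. length can be applied directly.
(define-syntax count-args
  (lambda (stx)
    (syntax-case stx ()
      ((k arg ...)
       (let ((args #'(arg ...)))
         (datum->syntax #'k (length args)))))))

(define three (count-args a b (c d)))   ; the operands are not evaluated
(define zero (count-args))
;; three is 3 and zero is 0
```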
(quasisyntax quasi-template)
- syntax
#`quasi-template
- syntax
(quasisyntax custom ellipsis clause
quasi-template)
- syntax
(unsyntax expression ...)
- auxiliary syntax
#,expression
- auxiliary syntax
(unsyntax-splicing expression ...)
- auxiliary syntax
#,@expression
- auxiliary syntax
...
- auxiliary syntax
Syntax: (quasisyntax quasi-template)
can be abbreviated as
#`quasi-template
, (unsyntax expression)
as
#,expression
, and (unsyntax-splicing expression)
as
#,@expression
. The notations are equivalent in all respects.
A quasi-template is either a template, an instance of
quasisyntax
, unsyntax
, or unsyntax-splicing
, or a list or vector
containing further quasi-templates. Uses of unsyntax
and
unsyntax-splicing
are valid only within quasi-templates.
Custom ellipsis clause, if present, is an instance of
custom-ellipsis
(section 4.5); ellipsis
within a template refers to the auxiliary syntax keyword ...
unless overridden by such a clause.
The behaviour is undefined if the quasi-template contains circular
references outside of a context within an expression where they
are allowed.
Semantics: The quasisyntax
form is similar to syntax
, but it
allows parts of its template to be evaluated, in a manner similar to
the operation of quasiquote
. Unsyntax
and
unsyntax-splicing
are the quasisyntax
analogues of unquote
and
unquote-splicing
.
Rationale: While unquote
and unquote-splicing
could be re-used in
quasisyntax
for the purpose of escaping out of the quoted
environment, that would make generating macro output including a
quasiquote
expression unnecessarily tricky.
Within the quasi-template, the expressions of
unsyntax
and unsyntax-splicing
forms are evaluated; everything
else is treated as ordinary template material, as with syntax
. The
value of each unsyntax
subform is inserted into the output in place
of the unsyntax form, while the value of each unsyntax-splicing
subform is spliced into the surrounding list or vector structure.
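A minimal sketch of both forms, assuming an R6RS-style implementation reachable through (rnrs). The macro name sum-squares is hypothetical.

```scheme
(import (rnrs))

;; #, inserts one computed syntax object into the template; #,@ splices
;; a list of them into the surrounding list.
(define-syntax sum-squares
  (lambda (stx)
    (syntax-case stx ()
      ((_ x ...)
       (let ((squares (map (lambda (x-stx) #`(* #,x-stx #,x-stx))
                           #'(x ...))))
         #`(+ #,@squares))))))

(define total (sum-squares 1 2 3))   ; expands to (+ (* 1 1) (* 2 2) (* 3 3))
(define none (sum-squares))          ; expands to (+)
;; total is 14 and none is 0
```

Because each operand is duplicated in the expansion, an operand with side effects would be evaluated twice; a more careful version would bind each operand to a temporary first.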
A quasisyntax
expression may be nested, with each quasisyntax
introducing a new level of syntax quotation and each unsyntax
or
unsyntax-splicing
taking away a level of syntax quotation. An
expression nested within n quasisyntax
expressions must be
within n unsyntax
or unsyntax-splicing
expressions to be
evaluated.
All uses of unsyntax-splicing
, and uses of unsyntax
or
unsyntax-splicing
with zero or more than one subform, are valid only
within lists or vectors. Each use of unsyntax
or unsyntax-splicing
with zero subforms results in no elements being inserted into the list
or vector: the unsyntax
or unsyntax-splicing
is treated as if it
were not there. Each use of unsyntax
or unsyntax-splicing
with
more than one subform is equivalent to the same number of individual
unsyntax
or unsyntax-splicing
forms, each with one of the
subforms, in the same order.
Rationale: Uses of unsyntax
and unsyntax-splicing
with zero or
more than one subform enable certain idioms, such as #,@#,@
. This
has the effect of a doubly indirect splicing when used within a doubly
nested and doubly evaluated quasisyntax
expression.
Binding other pattern variables within procedural macros
(with-syntax ((pattern expression) ...)
body)
- syntax
(with-syntax custom ellipsis clause
((pattern expression) ...)
body)
- syntax
The with-syntax
form is the fundamental pattern variable binding
form.
Syntax: Each pattern is identical in form to a syntax-case
pattern.
Custom ellipsis clause, if present, is an instance of
custom-ellipsis
(section 4.5); ellipsis
within a pattern refers to the auxiliary syntax keyword ...
unless overridden by such a clause.
Semantics: The value of each expression is computed and
destructured according to the corresponding pattern, and pattern
variables within the pattern are bound as if by syntax-case
to
the corresponding portions of the value within body. It is a
syntax violation if the result of evaluating an expression does
not match the corresponding pattern.
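The sketch below shows with-syntax destructuring a computed value with an ellipsis pattern. It assumes an R6RS-style implementation reachable through (rnrs); the macro name reversed-list is hypothetical.

```scheme
(import (rnrs))

;; The operands are reversed at expand time with an ordinary list
;; procedure, and with-syntax binds r, at depth 1, to the elements of
;; the reversed list so that a plain syntax template can use them.
(define-syntax reversed-list
  (lambda (stx)
    (syntax-case stx ()
      ((_ e ...)
       (with-syntax (((r ...) (reverse #'(e ...))))
         #'(list r ...))))))

(define backwards (reversed-list 1 2 3))
;; backwards is (3 2 1)
```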
Implementation:
(define-syntax with-syntax
  (lambda (stx)
    (syntax-case stx ()
      ((_ ((pattern expression) ...) body_0 body_1 ...)
       #'(syntax-case (list expression ...) ()
           ((pattern ...) (let () body_0 body_1 ...)))))))
Writing macros which generate other macros
(custom-ellipsis custom ellipsis)
- auxiliary syntax
Syntax: Custom ellipsis must be an identifier.
Semantics: When a custom-ellipsis
form is the first subform of a
syntax-case
, syntax
, quasisyntax
, or with-syntax
form,
instances of ellipsis within the syntax of the pattern,
template, or quasi-template of the respective form refer not
to the auxiliary syntax keyword ...
, but to any identifier which is
bound-identifier=?
to the custom ellipsis identifier.
Examples
Many simpler macros can be written using syntax-rules
(see
section 5) and trivially converted into
syntax-case
. This is useful, for example, when using syntax-case
to add functionality or error checking
to a macro that was originally defined with syntax-rules
. The
following example shows how the swap!
example of syntax-rules
(section 5) can first be rewritten to use
syntax-case
.
(define-syntax swap!
  (lambda (stx)
    (syntax-case stx ()
      ((_ a b)
       #'(let ((temp a))
           (set! a b)
           (set! b temp))))))
A fender can then be added to improve
error reporting when either of the arguments to swap!
is
not an identifier. With the above definition, (swap! (car x) (car
y))
would result in a syntax violation being signalled which claims
that set!
had been used incorrectly, even though there is no set!
explicitly used in the code.
(define-syntax swap!
  (lambda (stx)
    (syntax-case stx ()
      ((_ a b)
       (and (identifier? #'a)
            (identifier? #'b))
       #'(let ((temp a))
           (set! a b)
           (set! b temp))))))
With this definition, the syntax violation signalled by (swap! (car
x) (car y))
will correctly report that swap!
was used incorrectly.
The following example also shows how syntax-case
can be used to
improve error reporting from macros by writing explicit error checking
code. It defines a variant of case
which checks that all datums in a
clause belong to types that can portably be used in case
: that is,
their behaviour under eqv?
never depends on their location in the
store, which for other types is dependent on the Scheme
implementation. This kind of error checking is not possible in
syntax-rules
, which cannot in general detect the type of any subform
as a datum. (This version of case
also does not provide an else
clause, instead signalling an error if no specific clause matches.)
(define-syntax my-case
  (let ((eqv-undefined?
         (lambda (x-stx)
           (let ((x (syntax->datum x-stx)))
             (not (or (boolean? x) (symbol? x) (number? x)
                      (char? x) (null? x)))))))
    (lambda (stx)
      (syntax-case stx ()
        ((_ key ((datum ...) expr_0 expr_1 ...) ...)
         (cond ((find eqv-undefined? #'(datum ... ...))
                => (lambda (bad-datum)
                     (syntax-violation
                      'my-case
                      "use of datum in my-case is not portable"
                      stx bad-datum)))
               (else
                #'(case key
                    ((datum ...) expr_0 expr_1 ...) ...
                    (else
                     (error "key did not match any my-case datum"
                            key))))))))))
Macros written using syntax-case
can also bind an implicit
identifier, which cannot be done with syntax-rules
. The
with-return
example from section 3.2 can be
reformulated in terms of syntax-case
as follows. The two definitions
are equivalent except that the first one uses quasisyntax
and the
second with-syntax
.
(define-syntax with-return
  (lambda (stx)
    (syntax-case stx ()
      ((k body_0 body_1 ...)
       (let ((return-id (datum->syntax #'k 'return)))
         #`(call-with-current-continuation
            (lambda (#,return-id)
              body_0 body_1 ...)))))))
(define-syntax with-return
  (lambda (stx)
    (syntax-case stx ()
      ((k body_0 body_1 ...)
       (with-syntax ((return (datum->syntax #'k 'return)))
         #'(call-with-current-continuation
            (lambda (return)
              body_0 body_1 ...)))))))
Syntax-case
can also be used in the definition of identifier macros.
The used-as
example from section 2.4 can be
reformulated in terms of syntax-case
as follows.
(define-syntax used-as
  (make-variable-transformer
   (lambda (stx)
     (syntax-case stx (set!)
       (id
        (identifier? #'id)
        #'(quote reference))
       ((set! _ value)
        #'(quote (assignment value)))
       ((_ . operands)
        #'(quote (combination . operands)))))))
Identifier macros written using syntax-case
can be used to optimize
expensive procedure calls at expand time, while still providing the
functionality of a first-class procedure. The following wrapper around
concatenate
turns uses into the more efficient append-map
when its
argument is known to be a call to the map
procedure.
(define-syntax fast-concatenate
  (lambda (stx)
    (syntax-case stx (map)
      ((_ (map f ls_0 ls_1 ...))
       #'(append-map f ls_0 ls_1 ...))
      ((_ ls)
       #'(concatenate ls))
      (id
       (identifier? #'id)
       #'concatenate))))
(fast-concatenate (map make-list '(1 2 3) '(a b c)))
⇒ (a b b c c c)
(fast-concatenate (list '(bh b p) '(dh d t)))
⇒ (bh b p dh d t)
(apply fast-concatenate '(((gh g k) (g*h g* k*))))
⇒ (gh g k g*h g* k*)
Users should note, however, that many implementations of Scheme
include sophisticated compilers which are able to recognize procedure
calls which can be safely evaluated before run time, and which can
usually optimize such cases more effectively than any macro
definition. Explicit use of macros like this should usually be limited
to instances where optimization cannot be done by a compiler. This
typically includes cases in which the procedure uses side effects
within its definition, or (as in the above example) where an
optimization is possible when some information about arguments’ values
is known at expand time, but the values are otherwise not known until
run time. Note also that the above example does not prevent the
compiler from later additionally performing this optimization on the
resulting append-map
call when all its arguments are known at
compile time.