The Macrological Fascicle

Chapter 2

Syntax transformation

In order to evaluate a Scheme program, the macro uses within the program must be expanded into core forms, which the implementation can process directly. The set of core forms is implementation-dependent: an implementation may consider any syntax form defined by this report to be either a core form or a macro; the difference is not observable to users. Uses of both macros and core forms are represented as syntax objects.

Within the forms to be processed, macro uses and core forms are both identified by syntax keywords. These keywords occupy the same namespace as variables. That is, within the same scope, an identifier can be bound as a variable or as a keyword, or neither, but not both, and local bindings of either kind may shadow other bindings of either kind. In order to use a macro or a core form, the corresponding keyword must be imported into a program or library, or introduced within it by means of special macro definition and binding forms.

Many uses of macros defined by the user have the form:

(keyword datum ...)

where keyword is an identifier which lexically refers to the binding established for the macro or core form. Macro uses can also take the form of improper lists, bare identifiers, or set! forms, where the second subform of the set! is the keyword:

(keyword datum ... . datum)

keyword

(set! keyword datum)

In the last case, the set! keyword must lexically refer to the same binding as the set! form defined in this report, and the binding of keyword must explicitly allow this kind of macro use (see section 2.4).

Macros whose uses can take the form of bare identifiers are referred to as identifier macros.
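As a minimal sketch of an identifier macro, the following uses the identifier-syntax form (which also appears in the syntax parameter example later in this chapter) so that a bare keyword expands into an expression. The names counter and next-id are illustrative only and are not defined by this report.

```scheme
(define counter 0)

;; Each bare reference to next-id expands into the begin expression,
;; so every reference increments counter and yields its new value.
(define-syntax next-id
  (identifier-syntax
   (begin (set! counter (+ counter 1))
          counter)))

;; let* fixes the order in which the two references are evaluated.
(let* ((a next-id)
       (b next-id))
  (list a b))                ; ⇒ (1 2)
```

A keyword bound in this way cannot be the target of set!, because its transformer is not a variable transformer.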

Transformers

Syntax keywords are bound by user code to transformers.

Most transformers are ordinary Scheme procedures, called transformer procedures, which receive exactly one argument, a syntax object (see section 3) representing the form of a macro use, and return exactly one value, a syntax object representing the result of expanding the input macro use. The result of expanding the input form using the transformer procedure replaces the macro use in the place where it occurred.

Variable transformers (section 2.4) are another kind of transformer.

It is undefined behaviour to re-enter the dynamic extent of a call to a transformer by the expander after it has returned once.
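To illustrate a transformer procedure, the following sketch binds a keyword to a one-argument lambda expression that receives the macro use as a syntax object and returns its expansion. It assumes the syntax-case pattern matcher defined elsewhere in this report; the name swap! is hypothetical.

```scheme
(define-syntax swap!
  ;; The transformer procedure: one syntax object in, one out.
  (lambda (stx)
    (syntax-case stx ()
      ((_ a b)
       ;; Fender: accept only uses where both operands are identifiers.
       (and (identifier? #'a) (identifier? #'b))
       #'(let ((tmp a))
           (set! a b)
           (set! b tmp))))))

(let ((x 1) (y 2))
  (swap! x y)
  (list x y))                ; ⇒ (2 1)
```

The expansion replaces the (swap! x y) form in place, and the introduced tmp cannot capture a user variable of the same name.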

Syntax definition and binding forms

Note: The examples in this section use the syntax-rules system to create transformers, which is defined in section 5.

(define-syntax keyword transformer expression): syntax

Define-syntax binds syntax keywords in a manner analogous to how define binds variables. Transformer expression must be an expression that evaluates at expand time to a transformer. The keyword is then bound as a syntax keyword to this transformer during the process described in section 2.6. The created binding is visible throughout the body where define-syntax is used, unless shadowed by another binding construct within the body.

Examples:

(let ()
  (define even?
    (lambda (x)
      (or (= x 0) (odd? (- x 1)))))
  (define-syntax odd?
    (syntax-rules ()
      ((odd? x) (not (even? x)))))
  (even? 10))

⇒

#t

An implication of the left-to-right processing order (section 2.6) is that one definition can affect whether a subsequent form is also a definition.

(let ()
  (define-syntax bind-to-zero
    (syntax-rules ()
      ((bind-to-zero id) (define id 0))))
  (bind-to-zero x)
  x)

⇒

0

(splicing-let-syntax syntax bindings definition or expression ...): syntax

Syntax: Syntax bindings has the form:

((keyword transformer expression) ...)

Transformer expression is as for define-syntax. It is a syntax violation if the same identifier (in the sense of bound-identifier=?) appears as the keyword of more than one of the syntax bindings.

Semantics: The definition or expression forms are expanded in a syntactic environment containing the bindings of the syntactic environment of the splicing-let-syntax form with additional bindings created by associating each of the keywords as syntax keywords to transformers obtained by evaluating the corresponding transformer expressions. The evaluation of the transformer expressions takes place within the lexical environment where the splicing-let-syntax form appears.

The definition or expression forms are treated as if wrapped in an implicit begin; thus definitions created as a result of expanding the forms have the same extent as a definition appearing in the place of the splicing-let-syntax form would have.

Example:

(let ((x 21))
  (splicing-let-syntax
      ((def (syntax-rules ()
              ((def stuff ...) (define stuff ...)))))
    (def foo 42))
  foo)

⇒

42

Note: This form was called let-syntax in R6RS and had the additional restriction that the forms had to be either definitions or expressions, but not both.

(splicing-letrec-syntax syntax bindings definition or expression ...): syntax

Syntax: Same as for splicing-let-syntax.

Semantics: The definition or expression forms are expanded in a syntactic environment containing the bindings of the syntactic environment of the splicing-letrec-syntax form with additional bindings created by associating each of the keywords as syntax keywords to transformers obtained by evaluating the corresponding transformer expressions. The evaluation of the transformer expressions takes place within a lexical environment which contains the bindings of the keywords themselves, so the transformers can transcribe forms into uses of the macros introduced by the splicing-letrec-syntax form. It is undefined behaviour if the evaluation of any of the transformer expressions requires knowledge of the actual transformer bound to one of the keywords.

As for splicing-let-syntax, the definition or expression forms are treated as if wrapped in an implicit begin and can expand into definitions visible outside of the splicing-letrec-syntax form itself.
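The following is a sketch of how these semantics might be exercised, written against the description above rather than against any particular implementation. The names my-or and found are hypothetical. The third rule of my-or transcribes into a further use of my-or, which requires the recursive scope that splicing-letrec-syntax provides.

```scheme
(let ()
  (splicing-letrec-syntax
      ((my-or (syntax-rules ()
                ((_) #f)
                ((_ e) e)
                ((_ e_1 e_2 ...)
                 (let ((t e_1))
                   (if t t (my-or e_2 ...)))))))
    ;; This definition is spliced into the enclosing body.
    (define found (my-or #f 7)))
  ;; found is visible here; my-or is not.
  found)                     ; ⇒ 7
```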

Note: This form was called letrec-syntax in R6RS and had similar restrictions on its contents to that report’s let-syntax, as described above.

(let-syntax syntax bindings body): syntax

Syntax: The syntax bindings are the same as for splicing-let-syntax and splicing-letrec-syntax.

Semantics: The syntactic environment in the location of the let-syntax expression is extended by new syntax keyword bindings in the manner of splicing-let-syntax and the body expanded within that environment. Let-syntax differs from splicing-let-syntax in that it creates a new lexical body which is not spliced into a surrounding body: definitions within the body are not visible outside of the extent of the body itself.

Example: Compare this example with the example under splicing-let-syntax.

(let ((foo 21))
  (let-syntax
      ((def (syntax-rules ()
              ((def stuff ...) (define stuff ...)))))
    (def foo 42))
  foo)

⇒

21

Implementation:

(define-syntax let-syntax
  (syntax-rules ()
    ((_ bindings body_0 body_1 ...)
     (splicing-let-syntax bindings
       (let () body_0 body_1 ...)))))

Note: This form is the same as the let-syntax in the small language report, but not the same as let-syntax from the R6RS (see the remark under splicing-let-syntax). The (scheme base) library must export the same binding [Editorial note: as whatever large language library this ends up in. ]

(letrec-syntax syntax bindings body): syntax

Syntax: The syntax bindings are the same as for splicing-let-syntax and splicing-letrec-syntax.

Semantics: The syntactic environment in the location of the letrec-syntax expression is extended by new syntax keyword bindings in the manner of splicing-letrec-syntax and the body expanded within that environment. Letrec-syntax differs from splicing-letrec-syntax in that, like let-syntax, it creates a new lexical body which is not spliced into a surrounding body.

Example:

(letrec-syntax
    ((xor
      (syntax-rules ()
        ((_) #f)
        ((_ e)
         (if e #t #f))
        ((_ e_1 e_2 ...)
         (let ((temp e_1))
           (if temp
               (not (or e_2 ...))
               (xor e_2 ...)))))))
  (values (xor #t #f #f)
          (xor #t #t #f)))

⇒

#t #f

Implementation:

(define-syntax letrec-syntax
  (syntax-rules ()
    ((_ bindings body_0 body_1 ...)
     (splicing-letrec-syntax bindings
       (let () body_0 body_1 ...)))))

Note: This form is the same as the letrec-syntax in the small language report, but not the same as letrec-syntax from the R6RS (see the remark under splicing-let-syntax). The (scheme base) library must export the same binding [Editorial note: as whatever large language library this ends up in. ]

Syntax parameters

Syntax parameters are a minor variation on ordinary syntax keyword bindings. They provide a mechanism for rebinding a macro definition within the dynamic extent of a macro expansion.

Among other uses, this provides a convenient solution to one of the most common types of unhygienic macro: those that reintroduce the same unhygienic binding each time the macro is used. With syntax parameters, instead of introducing the binding unhygienically each time, one instead creates a single binding for the keyword, which is adjusted when the keyword is supposed to have a different meaning. As no new bindings are introduced, hygiene is preserved. Using a syntax parameter also provides the advantage that the identifier for the binding can be renamed when it is imported, if the macro user so wishes.

(define-syntax-parameter keyword transformer expression): syntax

Binds keyword as a parameterizable syntax keyword, using the transformer created by evaluating the transformer expression at expand time as the default transformer. When keyword is used outside the context of a syntax-parameterize body, the result is the same as if that keyword had been defined using define-syntax.

Define-syntax-parameter is similar to define-syntax, except the created binding is marked as parameterizable.

(syntax-parameterize ((keyword transformer expression) ...) body): syntax

Adjusts the keywords to use the transformers obtained by evaluating the corresponding transformer expressions when the keywords are used within the expansion of the body. It is a syntax violation if any of the keywords refer to bindings that are not parameterizable syntax keyword bindings.

Syntax-parameterize differs from let-syntax in that the binding is not shadowed, but adjusted, and so uses of the keywords in the expansion of body use the new transformers.

Example: The following example defines a form lambda^ which automatically makes an early-return procedure called return available within its body.

(define-syntax-parameter return
  (erroneous-syntax "return used outside of lambda^"))

(define-syntax lambda^
  (syntax-rules ()
    ((lambda^ formals body_0 body_1 ...)
     (lambda formals
       (call-with-current-continuation
        (lambda (escape)
          (syntax-parameterize
              ((return (identifier-syntax escape)))
            body_0 body_1 ...)))))))
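The same technique applies to other macros that would traditionally capture an identifier unhygienically. As a further sketch, an anaphoric conditional can make the value of its test available under the keyword it. The names aif and it are illustrative and not defined by this report; erroneous-syntax and identifier-syntax are used in the same way as in the example above.

```scheme
(define-syntax-parameter it
  (erroneous-syntax "it used outside of aif"))

(define-syntax aif
  (syntax-rules ()
    ((_ test consequent alternate)
     (let ((tmp test))
       (if tmp
           ;; Within the consequent only, it is adjusted to refer to tmp.
           (syntax-parameterize ((it (identifier-syntax tmp)))
             consequent)
           alternate)))))

(aif (assv 2 '((1 . one) (2 . two)))
     (cdr it)
     'none)                  ; ⇒ two
```

Because it is an ordinary keyword binding, a library importing aif could rename it on import without affecting the macro.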

Todo: This example will probably need changing to use delimited control operators, once it is decided what form those will take in the Foundations.

Variable transformers

Variable transformers are another kind of transformer besides transformer procedures. A variable transformer is a simple container for a procedure, created by calling make-variable-transformer on that procedure. Binding a syntax keyword to a variable transformer declares to the expander that the procedure contained within it also expects to process macro uses of the form (set! keyword datum). An attempt to expand a macro use of this form whose transformer is not a variable transformer is a syntax violation.

(make-variable-transformer proc): procedure

Wraps the procedure proc in a variable transformer and returns it.

When a syntax keyword is bound to the result of invoking make-variable-transformer on a transformer procedure, that transformer procedure is invoked for all macro uses with that keyword, including when the keyword is the left-hand side of a set! expression, which would otherwise be a syntax violation.

Rationale: If set! worked as described for all macro transformer procedures, many macros would mistakenly process set! forms as if they were macro uses with the keyword in the operator position, and could actually produce a result if their usual syntax happened to be of the approximate form (keyword identifier expression). The result of that expansion might then turn out to be valid Scheme code, creating unexpected behaviour in a program whose cause might be difficult to discover. All macros would have to check explicitly for the comparatively rare set! case to guard against it. By centralizing this check within the macro expander, requiring transformers which actually expect to process set! forms to explicitly declare this fact, this kind of programming error becomes impossible.

Example:

(define-syntax used-as
  (make-variable-transformer
   (lambda (stx)
     (cond ((identifier? stx)
            (quote-syntax (quote reference)))
           ((free-identifier=? (car (unwrap-syntax stx)) #'set!)
            `(,(quote-syntax cons)
              ,(quote-syntax (quote assignment))
              (,(quote-syntax quote)
               ,(cdr (unwrap-syntax
                      (cdr (unwrap-syntax stx)))))))
           (else
            `(,(quote-syntax cons)
              ,(quote-syntax (quote combination))
              (,(quote-syntax quote)
               ,(cdr (unwrap-syntax stx)))))))))

used-as

⇒

reference

(set! used-as x)

⇒

(assignment x)

(used-as y)

⇒

(combination y)
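A more typical use of a variable transformer makes a keyword behave like an assignable variable. The sketch below, modelled on the corresponding example in the R6RS and assuming the syntax-case matcher defined elsewhere in this report, gives the car of a pair a variable-like name. The names p and p.car are illustrative.

```scheme
(define p (cons 4 5))

(define-syntax p.car
  (make-variable-transformer
   (lambda (stx)
     (syntax-case stx (set!)
       ;; (set! p.car e): permitted only for variable transformers.
       ((set! _ e) #'(set-car! p e))
       ;; (p.car arg ...): call whatever the car currently holds.
       ((_ . rest) #'((car p) . rest))
       ;; Bare reference.
       (_ #'(car p))))))

(set! p.car 15)
p.car                        ; ⇒ 15
p                            ; ⇒ (15 . 5)
```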

Identifier properties

During expansion, a set of properties can be associated with each identifier in a Scheme program. This allows arbitrary information to be associated with identifiers, which can be used by macro transformers to inform their treatment of particular identifiers. For example, the sample implementation of the syntax-case pattern matcher included with this report uses identifier properties to implement pattern variables.

Each property defined on an identifier associates a key (which must also be an identifier) with a value (which may be any object). When an identifier binding is created by definition or by a local binding construct, it is associated with a new, empty set of identifier properties. If the identifier bound shadows one from a containing lexical context, the identifier properties on the shadowed identifier effectively become hidden within the lexical extent of the new binding, in the same way its binding is hidden.

When an identifier property is defined on an identifier, the property belongs only to the lexical scope in which that property is defined. The property itself may shadow properties created on the same identifier and with the same key in containing lexical contexts.

When an identifier is imported from a library, it brings with it a copy of the set of identifier properties that were defined on it in that library. Additional identifier properties may be defined on it, and properties from the original library may be redefined within the context in which the identifier was imported, without those definitions or redefinitions being visible in the original library. If the identifier was imported into a library which subsequently re-exports it, the re-exported version has the identifier properties as they were (re-)defined in the library which re-exports it. If the same binding is then imported into another context from both the original and the re-exporting library, or from multiple re-exporting libraries which each defined their own properties on the identifier, the identifier in that context has a set of properties which is the union of the properties from all the libraries it is imported from. If two properties with the same key are imported on the same identifier, and the values of the properties are not the same in the sense of eqv?, it is an import error.

Note: Though identifier properties are superficially similar to a classical Lisp feature known as symbol property lists, the two are quite different, even though they can sometimes be used for the same purposes. A symbol property list is typically held globally, unlike identifier properties, which are lexically scoped to where they were defined.

Todo: Define the interaction of identifier properties with phasing.

(define-property identifier key expression): syntax

Syntax: Both identifier and key must be bound identifiers.

Semantics: The expression is evaluated at expand time to produce a single value, and an identifier property is defined on the identifier associating the key with this value.

Operationally, when the expander encounters a define-property form, it creates a new lexical address within the lexical environment for a tuple of the identifier and the lexical address for the binding of the key. It then stores the result of evaluating the expression in its global binding store under the new address.

(identifier-property id key): procedure

(identifier-property id key default): procedure

Returns the identifier property associated with the identifier id whose key has the same binding as key. If there is no such property, it returns default, or #f if no default argument was provided. If either id or key is not bound, a syntax violation is signalled.

The identifier-property procedure can only be called within the dynamic extent of a call by the expander to a transformer. If it is called in other situations, it is unspecified whether the procedure will work as intended, or act as if id or key or the property requested is unbound, or will signal an error [Editorial note: an assertion violation ].

Operationally, identifier-property first finds the lexical addresses $a_{\mathit{id}}$ and $a_{\mathit{key}}$ of id and key respectively, then finds the lexical address $a_{\mathit{prop}}$ in the lexical environment of id for the tuple of id and these lexical addresses. Finally, it looks up the address $a_{\mathit{prop}}$ in the global binding store and returns the value associated with it.

Note: Two identifiers which share the same binding will not necessarily have the same identifier properties: free-identifier=? is used to match identifier keys but not the identifiers themselves in the binding store when looking up identifiers. This can occur when an identifier property’s value is shadowed, or when a binding is imported into multiple libraries or under multiple names, as in the following example.

(import (scheme base)
        (rename (only (scheme base) cons) (cons make-pair)))

(define-syntax renamed?
  (erroneous-syntax "only an identifier property key"))
(define-property make-pair renamed? #t)

(define-syntax both-renamed?
  (lambda (stx)
    (and (identifier-property #'cons #'renamed?)
         (identifier-property #'make-pair #'renamed?))))

(values (free-identifier=? #'cons #'make-pair)
        (both-renamed?))

⇒

#t #f
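As a further sketch of how a transformer might consult a property, the following attaches a unit annotation to a variable and reads it back at expand time. The names units, speed, and units-of are hypothetical, and the code assumes that syntax-case, with-syntax, and datum->syntax are available as in the R6RS.

```scheme
;; A keyword whose only purpose is to serve as a property key.
(define-syntax units
  (erroneous-syntax "only an identifier property key"))

(define speed 12)
(define-property speed units 'metres-per-second)

(define-syntax units-of
  (lambda (stx)
    (syntax-case stx ()
      ((_ id)
       (identifier? #'id)
       ;; Look the property up at expand time, defaulting to unknown.
       (with-syntax ((u (datum->syntax
                         #'id
                         (identifier-property #'id #'units 'unknown))))
         #'(quote u))))))

(units-of speed)             ; ⇒ metres-per-second

;; The inner binding of speed starts with an empty set of properties.
(let ((speed 3))
  (units-of speed))          ; ⇒ unknown
```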

Expansion process

In order to expand a body (whether library, program, or other body), the expander processes the initial forms within from left to right. How the expander processes each form encountered depends upon the kind of form.

[Editorial note: The following has been formulated based on the equivalent expansion process defined by the R6RS, but assuming that R7RS will relax the restriction on the order of definitions and expressions in all bodies. R7RS already relaxed the restriction in library bodies compared to R6RS. If the restriction is not relaxed within regular bodies, only a small adjustment to the text, reverting to R6RS semantics for those bodies, is required. ]

[Editorial note: This process does not define semantics compatible with those prescribed for program bodies by the small language. Those semantics will be specified in a future fascicle. ]

macro use: The expander invokes the associated transformer to transform the macro use, then recursively performs whichever of these actions are appropriate for the resulting form.
define-syntax or define-syntax-parameter form: The expander expands and evaluates the right-hand-side expression and binds the keyword to the resulting transformer.
define form: The expander records the fact that the defined identifier is a variable but defers expansion of the right-hand-side expression until after all of the forms in the body have been processed.
define-property form: The expander expands and evaluates the value expression and creates or replaces a property for the key on the given identifier, associating it with the resulting value.
begin form: The expander splices the subforms into the list of body forms it is processing.
splicing-let-syntax or splicing-letrec-syntax form: The expander splices the inner body forms into the list of (outer) body forms it is processing, arranging for the keywords bound by the splicing-let-syntax and splicing-letrec-syntax to be visible only in the inner body forms.
expression, i.e., nondefinition: The expander defers the expansion of the expression until after all the forms in the body have been processed.

Once the rightmost form in the body has been processed, the expander makes a second pass over the forms deferred as the right-hand sides of variable definitions or as nondefinitions.

Note that this algorithm does not directly reprocess any form. It requires a single left-to-right pass over the definitions followed by a single pass (in any order) over the body expressions and deferred right-hand sides.
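The following sketch annotates a small body with the action the expander would take for each form under the algorithm above. The names twice-succ, succ, and base are illustrative only.

```scheme
(let ()
  ;; define form: twice-succ is recorded as a variable, and its
  ;; right-hand side is deferred, so succ need not be known yet.
  (define (twice-succ n) (succ (succ n)))
  ;; begin form: its subforms are spliced into the body.
  (begin
    ;; define-syntax form: the transformer is evaluated immediately.
    (define-syntax succ
      (syntax-rules ()
        ((_ e) (+ e 1))))
    ;; define form: right-hand side deferred.
    (define base 40))
  ;; expression: deferred until every form has been processed.
  (twice-succ base))         ; ⇒ 42
```

By the time the deferred right-hand side of twice-succ is expanded in the second pass, succ is already bound as a syntax keyword, so both of its uses are expanded as macro uses.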

The behaviour is undefined if any definition in the sequence of forms defines an identifier whose binding is used to determine the meaning of the undeferred portions of that definition, or of any definition that precedes it in the sequence of forms. Similarly, the behaviour is undefined if the evaluation of any form in the sequence of forms uses or assigns the value of a defined variable whose definition is to the right of that form. For example, the behaviour of each of the following examples is undefined:

(define define 3)

(begin (define begin list))

(display (+ x 16))
(define x 32)

[Editorial note: Last example should be within a (let () ...) if the relaxation is accepted for all bodies. ]

The behaviour of the following example is not undefined, because the body of the internal increase procedure will not be evaluated by a call to it until after the value variable it closes over has been defined:

(define (make-counter)
  (define (increase)
    (set! value (+ value 1))
    value)
  (define value 0)
  increase)

Phases of evaluation and macro expansion

The algorithm for processing forms in bodies outlined above requires the expressions creating macro transformers to be evaluated before evaluation of the Scheme program as a whole can proceed. The environment in which such evaluation takes place is defined by dividing the evaluation of Scheme programs into phases. Each phase is identified by a non-negative integer, and the number of the phase in which evaluation is currently taking place at any time is denoted $ϕ$. If a macro definition appears in phase $n$ code, then its right-hand-side expression is evaluated in phase $n + 1$. The expansion and evaluation of Scheme forms after all syntax keywords have been defined takes place at phase $0$; thus, at the top level of a body, the expansion and evaluation of the right-hand sides of all define-syntax forms and the transformer expressions of splicing-let-syntax and splicing-letrec-syntax bindings takes place at phase $1$.

The environment at each phase is defined as follows. The environment at the earliest phase of evaluation contains all bindings which the program or library has imported from other libraries. All of these bindings, whether they are variables or syntax keywords, are available at all phases of evaluation. All syntactic bindings created in the course of expansion are likewise available at all phases of evaluation within the scopes in which they are visible. Variable bindings are available only in the phase in which they are created. It is undefined behaviour to attempt either to access the binding of, or to rebind, an identifier which is a variable defined in a different phase.
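As a sketch of these rules, the transformer expression below is evaluated at phase $1$ and uses only imported bindings (apply and +), which are available at every phase. The names static-sum, offset, and static-offset are hypothetical, and syntax-case, syntax->datum, and datum->syntax are assumed to be available as in the R6RS.

```scheme
(define-syntax static-sum
  (lambda (stx)
    (syntax-case stx ()
      ((k n ...)
       ;; The addition happens at expand time, in phase 1.
       (datum->syntax #'k (apply + (syntax->datum #'(n ...))))))))

(static-sum 1 2 3)           ; ⇒ 6

;; By contrast, the following would be undefined behaviour: offset is
;; a variable created in phase 0, but the transformer runs in phase 1.
;;
;; (define offset 10)
;; (define-syntax static-offset
;;   (lambda (stx)
;;     (syntax-case stx ()
;;       ((k) (datum->syntax #'k (+ offset 1))))))
```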

Note: The possibility provided by the R6RS for explicit control of the availability of imported bindings at particular phases in import specs has been removed, because it proved unpopular with implementers and users.