The Macrological Fascicle

Chapter 1

Macros and hygiene

Scheme programs and libraries can define and use new derived expression types, called macros. Each instance of a macro is called a use of the macro. Macro uses can take similar forms to any expression type defined in this report which is associated with a binding to an identifier.

Implementations of Scheme can evaluate such expression types because the definition of a macro binds an identifier, which is called the keyword or syntax keyword and which uniquely determines the expression type, to a procedure written in ordinary Scheme code which, when called with an argument representing the form of a macro use, can analyse that macro use and transcribe it into a more primitive expression. The Scheme implementation can then evaluate this new expression, possibly after transcribing additional macro uses within it into yet more primitive expressions. The procedures associated with macro definitions are called transformers. The process of transcribing macro uses using transformers within a program is called expansion, and the more primitive expression resulting from calling the transformer on the form of a macro use is called the expansion of that macro. The part of a Scheme implementation responsible for the process of expansion is thus called the expander.

This report defines two closely-related high-level systems for writing transformers (syntax-case and syntax-rules). Within these high-level systems, macros defined by the users are by default ‘hygienic’ and ‘referentially transparent’ and thus preserve Scheme’s lexical scoping. (Kohlbecker 1986, Kohlbecker et al. 1986, Bawden and Rees 1988, Clinger and Rees 1991, Dybvig, Hieb and Bruggeman 1992)

If a macro transformer inserts a binding for an identifier, the identifier will in effect be renamed throughout its scope to avoid conflicts with other identifiers.
If a macro transformer inserts a free reference to an identifier, the reference refers to the binding that was visible where the transformer was specified, regardless of any local bindings that surround the use of the macro.

Besides these high-level systems, this report also defines a low-level mechanism by which these properties are maintained, and which allows them to be deliberately broken in a controlled way. It is also possible to use these lower-level primitives within transformers which use the high-level syntax-case system, in order to write macros which deliberately break these properties.

In order to both maintain these properties by default and allow them to be broken when required, an implementation of Scheme contains a concrete implementation of a theoretical model of hygiene, which allows it to keep track of when and where an identifier was introduced, even if it is inserted elsewhere in a program by a macro expansion.

Defining hygiene

Barendregt (1984)’s hygiene condition for the lambda calculus is an informal notion that requires the free variables of an expression $N$ that is to be substituted into another expression $M$ not to be captured by bindings in $M$ when such capture is not intended. Kohlbecker et al. (1986) propose a corresponding hygiene condition for macro expansion that applies in all situations where capturing is not explicit: ‘Generated identifiers that become binding instances in the completely expanded program must only bind variables that are generated at the same transcription step’. In the terminology of this report’s model of hygiene, the generated identifiers are those introduced by a transformer rather than those present in the form passed to the transformer, and a macro transcription step corresponds to a single call by the expander to a transformer. Also, the hygiene condition applies to all introduced bindings rather than to introduced variable bindings alone.

This leaves open what happens to an introduced identifier that appears outside the scope of a binding introduced by the same call. Such an identifier refers to the lexical binding in effect where it appears inside the transformer body or one of the helpers it calls. This is essentially the referential transparency property described by Clinger and Rees (1991). Thus, the hygiene condition can be restated as follows:

A binding for an identifier introduced into the output of a transformer call from the expander must capture only references to the identifier introduced into the output of the same transformer call. A reference to an identifier introduced into the output of a transformer refers to the closest enclosing binding for the introduced identifier or, if it appears outside of any enclosing binding for the introduced identifier, the closest enclosing lexical binding where the identifier appears inside the transformer body or one of the helpers it calls.

Within the context of hygienic macro expansion, it is sometimes useful to write macros which deliberately violate this condition, either by introducing bindings for identifiers which capture the references of identifiers introduced outside of the same transformer call, or by introducing identifiers which refer to bindings present at the macro use site. Since uses of such non-hygienic macros may occur within the output of transformer calls for other macros, the transformers for non-hygienic macros must limit the extent of this capture to identifiers that were either introduced into the output of a specific transformer call, or which where introduced by and present in the original forms before expansion began. In other words, identifiers introduced in this way must be treated as if they had been implicitly present in a form generated by some specific macro transcription step, or in a form in the original source.

Modelling hygiene

Todo: Express the hygiene model in notation as well as in prose. (Needs a better markup system for the spec than the current mess.)

Note: This section describes a possible operational semantics of one theoretical model of hygiene. Implementations are not required to adopt this exact model, as long as whichever model they use repects the hygiene condition defined above. The order in which implementations actually analyse macro uses within a library or program is defined in section 2.6.

In the process of macro expansion, Scheme code is represented as datums (as if a quote expression surrounding the code had been evaluated) which are wrapped in a hygienic context. This context propagates down to each individual symbol in the datum tree, each of which is ultimately wrapped. A single symbol wrapped with context is an identifier. However, at the beginning of expansion, an entire datum is wrapped in one context. As the analysis and expansion of macro uses progresses, the hygienic context is pushed down to new wraps on the datums contained within the original datum. This allows the expander to efficiently maintain and update the same hygienic context for a large tree of Scheme code at once, copying it only when required.

The hygienic context contained in a wrap consists of a history, which is a set of time-stamps, and a lexical environment, which keeps track of identifiers’ lexical addresses. The history is used in the expansion process to keep track of when an identifier was introduced: the time-stamps within the history are of two types, recalling respectively the entry and exit of identifiers into and out of a transformer when it is called in a single macro transcription step. The lexical environment keeps track of where in the code identifiers were introduced and bound, including which binding each identifier has.

Note: In the R6RS, the history was referred to as the marks, and each time-stamp as one mark (for a time-stamp recording the completion of a macro transcription step) or an antimark (for a time-stamp recording the beginning of a macro transcription step). The lexical environment was referred to as a set of substitutions, and a lexical address as the label on one substitution. The model defined by this report is semantically identical, but the terminology has been changed (in the case of the term ‘time-stamp’, reverted to the original terminology of Kohlbecker) in the hope of improving the clarity of the definition to readers. Other authors have also referred to time-stamps as colours or renames, and to lexical addresses as bindings or binding names.

The expander can create a new wrap on a datum with an existing history and lexical environment. The expander can also unwrap a layer of the syntax tree, a process which consists of creating a new instance of the same type of datum contained within a wrap and filling it with new wraps, each with the contents of the corresponding parts of the original datum and a copy of the hygiene information within the original wrap. Unwrapping successive layers of the syntax tree allows macro transformers to parse their input.

When the expander encounters a core binding construct such as lambda or splicing-let-syntax, it extends the hygiene context in its wrap by adding new lexical addresses for each of the identifiers which the construct binds. The expander also maintains a global binding store mapping lexical addresses to their expand-time values: if the core binding construct binds syntax, it also updates this binding store mapping the new lexical address to the evaluated transformer for the new syntax.

When the expander encounters a macro use, it looks up the transformer associated with the macro use’s syntax keyword in the global binding store. It then adds a new time-stamp to the history of the wrap for the macro use, which records the beginning of a new macro transcription step, and calls the transformer with the macro use in its new wrap.

The transformer procedure is allowed to return a datum which is not wrapped in hygienic context, as long as all identifiers carry hygienic context — i.e., as long as no symbols occur in any non-wrapped subtree of the datum. In order to correctly record the end of the macro transcription step, therefore, the expander must add a time-stamp recording the end of the same macro transcription step to every wrap within the returned syntax object.

The effect of this process is that identifiers which were present anywhere in the input form of the macro use are recalled with both the time-stamp for the beginning of a transcription step and the time-stamp for the end of the step in their histories, while identifiers which were introduced by the transformer are recalled only with the time-stamp for the end of the transcription step. When histories are compared, time-stamps for the beginning and end of the same transcription step are both ignored, and both may be discarded. This allows identifiers introduced by the transformer call to be distinguished from identifiers in the input form, and more generally allows identifiers introduced by macros to be distinguished from macros within the original source code.

An identifier therefore has two kinds of name: the symbolic name, which is the symbol datum wrapped by the identifier, and the time-stamped name, which is the symbolic name plus the history. In most contexts, identifiers in Scheme treated as if their name were the time-stamped name, not the symbolic name.