The e-TeX Short Reference Manual

NTS team
October 1996

Derived from a paper originally presented as:

Philip Taylor, "e-TeX: a 100%-compatible successor to TeX"
(Following humbly in the footsteps of the Grand Wizard)

in: Proceedings of the Ninth European TeX Conference EuroTeX'95, September 4-8, 1995, Arnhem, The Netherlands, pp. 359-370.

# 1 Introduction

e-TeX is the first concrete result of an international research & development project, the NTS Project, which was established under the ægis of DANTE e.V. during 1992. The aims of the project are to perpetuate and develop the spirit and philosophy of TeX, whilst respecting Knuth's wish that TeX should remain frozen.

The group were very concerned that unless there existed some evolutionary flexibility within which TeX could react to changing needs and environments, it might all too soon become eclipsed by more modern yet less sophisticated systems. Accordingly they agreed to investigate a possible successor or successors to TeX, successors which would enshrine and encapsulate all that was best in TeX whilst being freed from the evolutionary constraints which Knuth had placed on TeX itself. To avoid any suggestion that it was TeX which the group sought to develop against Knuth's wishes, a working title of NTS (for New Typesetting System) was chosen for the project.

During the initial meetings of the NTS group, it became clear that there were two possible approaches to developments based on TeX: an evolutionary path which would simply continue where Knuth had left off, and which would use as its basis the source code of TeX itself (i.e. TeX.Web); the other a revolutionary path which would be based on a completely new implementation of TeX, using a modern rapid-prototyping language which could allow individual components of the system to be modified or replaced in a simple and straightforward manner. The group agreed that the latter (revolutionary) approach had much greater potential, but were aware that the re-implementation would be non-trivial, and would require external funding to bring it to fruition in finite time; accordingly they agreed to concentrate their initial efforts on the former (evolutionary) path, and set to work to specify and implement a direct derivative of TeX which became known as e-TeX (the e of e-TeX may be read as extended, enhanced, evolutionary or European at will(!), and is also an acknowledgement of the parallel developments which have led the LaTeX 3 team to modify their initial goal and to release an interim LaTeX, LaTeX2e, which is directly derived from the LaTeX sources).

The group took as the starting point for the development of e-TeX the many contributions which had been made on NTS-L (the open mailing list on which discussions pertinent to e-TeX & NTS take place), together with the extremely interesting list of ideas which Knuth gives at the end of TeX82.Bug, and which he describes as 'Possibly nice ideas that will not be implemented' (and which he contrasts with 'Bad ideas that will not be implemented'!). Individual members of the group also contributed ideas of their own which had not necessarily been discussed publicly. All proposals were then subjected to a rigorous vetting procedure to ensure that they conformed to the e-TeX philosophy, which may be summarised as follows:

e-TeX will in all ways demonstrate its affinity to, and derivation from, Knuth's TeX; it will be implemented as a change-file to TeX.Web, and will not exploit features which could only be achieved by using a particular implementation, operating system or language; it will be capable of being used successfully on a machine as small as an 80286-based PC or similar.

At format-generation time, a user will have the option of generating either a TeX-compatible format or an e-TeX format; if the TeX-compatible format is subsequently used in conjunction with e-TeX, the result will be Trip-compatible (i.e. indistinguishable from TeX proper). If an e-TeX format is generated and used in conjunction with e-TeX, then provided that none of the new e-TeX primitives are used, the results will be identical to those which would be produced using TeX proper. If an e-TeX format is used in conjunction with e-TeX and if one or more of the new e-TeX primitives are used, then those portions of the document which are affected by the new primitive(s) may be processed in a manner unique to e-TeX; other portions of the document will be processed in a manner identical to that of TeX proper. Only if an e-TeX format is used in conjunction with e-TeX and if an explicit assignment is made to one of the enhanced-mode variables to enable that particular enhanced mode will e-TeX behave in a manner which may be distinguishable from that of TeX even if no other reference to an e-TeX primitive occurs anywhere in the document. (These modes of operation are referred to as compatibility-mode, extended-mode and enhanced-mode respectively.)

All new e-TeX primitives will be syntactically identical to existing TeX primitives: that is, they will be either control-words or control-symbols within a normal category code régime. Where an analogous primitive exists within TeX, the corresponding e-TeX primitive(s) will occupy the same syntactic niche. Every effort will be made to ensure that new e-TeX primitives fit into the existing set of TeX datatypes; no new datatype will be introduced unless it is absolutely essential.

In brief, this implies that e-TeX will follow the principle of least surprise: an existing TeX user, on using e-TeX for the first time, should not be surprised by e-TeX's behaviour, and should be able to take advantage of new e-TeX features without having either to unlearn some aspects of TeX or to learn some new e-TeX philosophy.

# 2 Installation

It is intended that e-TeX be available ready-compiled for those systems for which pre-compiled binaries are the norm (e.g. MS-DOS, VMS, ...); for other systems such as Unix(TM), e-TeX is supplied as a change-file which will need to be applied to TeX.Web in the normal way. However, since there will already be an implementation-specific change-file for the system of interest, some means will be required of merging TeX.Web with not one but (at least) two change-files; possibilities include PatchWeb, Tie, etc., but if none of these are available then WebMerge, a TeX script, is supplied and can be used as a slower but satisfactory alternative. In practice, two or three change-files will be needed: the e-TeX system-independent change-file, the TeX system-dependent change-file, and perhaps a small e-TeX system-dependent change-file. The system-independent e-TeX change-file is supplied as part of the e-TeX kit, and sample system-dependent e-TeX change-files are also supplied which may be used as a guide to those places at which system-dependent interactions are to be expected: an experienced implementor should have little difficulty in modifying one of these to produce an e-TeX system-dependent change-file for the system of interest. Once e-TeX has been tangled and woven, it should be compiled and linked in the normal way.

Once a working binary (or binaries, for those systems which have separate executables for IniTeX and VirTeX) has been acquired or produced, the next step will be to generate a suitable format file or files. Whilst e-TeX can be used in conjunction with Plain.TeX to produce a Plain e-format, it is better to use the supplied etex.src file which supplements the e-TeX primitives with additional useful control sequences.

When generating the format file, and regardless of the format source used, one fundamental decision must be made: is e-TeX to generate a compatibility mode format, or an extended mode format? If the former, all e-TeX extensions and enhancements will be disabled, the format will contain only the TeX-defined set of primitives, and any subsequent use of the format in conjunction with e-TeX will result in completely TeX-compatible behaviour and semantics, including compatibility at the level of the Trip test. If the latter option, however, is selected, then all extensions present in e-TeX will automatically be activated, and the format file will contain not only the TeX-defined set of primitives but also those defined by e-TeX itself; any subsequent use of such a format in conjunction with e-TeX will result in e-TeX operating in extended mode; documents which contain no references to any of the e-TeX-defined primitives will continue to generate results identical to those which would have been produced were the document processed by TeX, but compatibility at the Trip-test level can no longer be accomplished, and of course any document which makes reference to an e-TeX primitive will generate results which could not have been accomplished using TeX. It should be noted that neither a compatibility mode format nor an extended mode format may be used in conjunction with TeX itself; they are only suitable for use in conjunction with e-TeX, since formats are not in general portable.
Finally it should be emphasised that even if an extended mode format is generated, any document processed using such a format but not referencing any e-TeX-defined primitive will produce results identical to those which would have been produced had the same document been processed using TeX; only if the document makes an explicit assignment to one of the enhanced mode state variables (\TeXXeTstate is the only instance of these in V1 of e-TeX) will compatibility with TeX be compromised: e-TeX is then said to be operating in enhanced mode rather than extended mode.

The choice between generating a compatibility mode format and an extended mode format is made at the point of specifying the format source file: assuming that the operating system supports command-line entry with parameters, then a normal TeX format-generation command would probably resemble:

        initex plain \dump


or if the more verbose interactive form is preferred:

        initex
        **plain
        *\dump


With e-TeX, exactly the same command will achieve exactly the same effect, and the format generated will be a compatibility-mode format; thus assuming that the Ini-version of e-TeX is invoked with the command einitex, the following will both generate compatibility-mode formats:

        einitex plain \dump


and

        einitex
        **plain
        *\dump


In order to generate an extended mode format, the file-specification for the format source file must be preceded by an asterisk (*); whilst this may seem an inelegant mechanism, it has the great advantage that it avoids almost all system dependencies (graphical user interface (GUI) systems excepted, of course), and the asterisk as a component element of a filename is a very remote possibility (most filing systems reserve the asterisk as a 'wild card' character, which can therefore not form a part of a real file name per se). Thus to generate an extended mode Plain format, the following dialogue may be used:

        einitex *plain \dump


or

        einitex
        ***plain
        *\dump


and to generate an extended mode etex.src format, the following instead:

        einitex *etex.src \dump


or

        einitex
        ***etex.src
        *\dump


Once suitable formats have been generated, they can then be used in conjunction both with e-IniTeX and e-VirTeX without further formality: in particular, no asterisk is needed (nor should be used!) if a format is specified, since the format implicitly defines (depending on its mode of generation) in which mode (compatibility or extended) e-TeX will operate. Thus, for example, if a plain format had been generated in compatibility mode, and an etex format had been generated in extended mode, then both:

        einitex &plain


and

        evirtex &plain


will cause e-TeX to process any subsequent commands in compatibility mode. On the other hand, both

        einitex &etex


and

        evirtex &etex


will cause e-TeX to process any subsequent commands in extended mode, but only because the etex format was generated in extended mode: it is not the name of the format, nor is it the contents of the source of the format, which determine the mode of operation -- it is the mode of operation which was used when the format was generated. Any format generated in compatibility mode will cause e-TeX to operate in compatibility mode whenever it is used, whilst the equivalent format, built from the same source but generated in extended mode, will cause e-TeX to operate in extended mode whenever it is used.
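
Because a compatibility-mode format contains only the TeX-defined primitives, a document can discover which kind of format it was loaded with by testing for one of the e-TeX primitives. The following is a minimal sketch rather than an official recipe: it assumes that \undefined really is undefined (as is conventional in Plain TeX), and the message texts are purely illustrative.

```tex
% Uses only TeX primitives in the test itself, so it is safe under TeX proper
% and under e-TeX in either mode.
\ifx\eTeXversion\undefined
  \message{No e-TeX primitives: TeX itself, or e-TeX in compatibility mode}
\else
  \message{e-TeX in extended mode, version \the\eTeXversion\eTeXrevision}
\fi
```

The second branch is merely skipped, not expanded, when the test succeeds, so the references to \eTeXversion and \eTeXrevision cause no error when those primitives are absent.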

Although e-TeX is completely TeX-compatible, and there is therefore no real reason why any system should need both TeX and e-TeX, it is anticipated that until complete confidence exists in the compatibility of e-TeX many sites and users will prefer to retain instances of each. For this reason it is intended that change-files and binaries should ensure that both TeX and e-TeX can happily co-exist on any system by a careful choice of name-spaces. In the case of the reference VMS implementation, for example, this is accomplished by using the prefix "etex_" for each logical name which defines the e-TeX environment, in contrast to the prefix "tex_" which defines the analogous TeX environment; the "etex_*" logical names are defined as search lists which first reference an e-TeX specific location followed by the analogous location for TeX.

# 3 The new features

Bearing in mind the constraints outlined in the introduction, the group identified 35 new primitives which they believed would give added functionality to e-TeX without compromising its compatibility with TeX; of the 35 new primitives, 29 are extensions (which by definition do not affect the semantics of existing TeX documents), whilst just six (all concerned with the implementation of TeX--XeT) are associated with an enhancement. In addition to the new primitives, additional functionality was added to some existing primitives, and TeX's behaviour in some unusual boundary conditions was made more robust (this last has been subsumed in the most recent version of TeX, so this is no longer e-TeX-specific).

The new features are listed and briefly described below, clustered together to indicate related functionality. The technical terms used below to describe syntax entities are as defined in The TeXbook.

## 3.1 Additional control over expansion

\protected
is a prefix, analogous to \long, \outer, and \global; it associates with the macro being defined an attribute which inhibits expansion of the macro in expansion-only contexts (for example, within the parameter text of a \write or \edef); if, however, the parser or command processor (TeX's 'oesophagus' and 'stomach', in Knuth's alimentary paradigm) is currently demanding a command, then the \protected macro will expand in the normal way. This behaviour is identical to that displayed by the explicit expansion of a token-list register through the use of \the; the same model is used elsewhere in e-TeX to achieve a consistent paradigm for partial expansion.
\detokenize,
when followed by a <general text>, expands to yield a sequence of character tokens of \catcode 10 (space) or 12 (other) corresponding to a decomposition of the tokens of the <balanced text> of the unexpanded <general text>; cf. \showtokens. The effect is rather as if \scantokens (q.v.) were applied to the <general text> within a régime in which only \catcodes 10 and 12 existed. Note that in order to preserve the boundaries between control words and any following letter, a space is yielded after each control word including the last.
\unexpanded,
when followed by a <general text>, expands to yield the <balanced text> of the unexpanded <general text>. No further expansion will occur if e-TeX is currently performing a \write, \edef, etc., but further expansion will occur if the parser or command processor is currently demanding a command. The effect is as if the <general text> were assigned to a token list register, and the latter were then partially expanded using \the, but no assignment actually takes place; thus \unexpanded can be used in expansion-only contexts.
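
The three primitives above can be compared side by side in a short sketch. It assumes an extended-mode format; the macro names \plainmacro, \guarded and \resulta to \resultc are invented for the illustration and are not part of e-TeX.

```tex
% Illustrative names only; requires an extended-mode e-TeX format.
\def\plainmacro{abc}
\protected\def\guarded{xyz}

% Inside \edef, \plainmacro is expanded but the protected macro is not.
\edef\resulta{\plainmacro/\guarded}
\show\resulta   % meaning: abc/\guarded

% \unexpanded shields its argument without needing a token-list register.
\edef\resultb{\plainmacro/\unexpanded{\plainmacro}}
\show\resultb   % meaning: abc/\plainmacro

% \detokenize yields character tokens (catcode 12, or 10 for spaces).
\edef\resultc{\detokenize{\plainmacro}}
\show\resultc

% When a command is demanded, the protected macro expands in the normal way.
\guarded
```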

## 3.2 Provision for re-scanning already read text

\readline
is analogous to \read, but treats each character as if it were currently of \catcode 10 (space) or 12 (other); the text thus read is therefore suitable for being scanned and re-scanned (using \scantokens, q.v.) under different \catcode régimes.
\scantokens,
when followed by a <general text>, decomposes the <balanced text> of the <general text> into the corresponding sequence of characters as if the <balanced text> were written unexpanded to a file; it then uses TeX's \input mechanism to re-process these characters under the current \catcode régime. As the \input mechanism is used, even hex notation (^^xy) will be re-interpreted. Parentheses and a single space representing the pseudo-file will be displayed if \tracingscantokens (q.v.) is positive and non-zero.
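
A brief sketch of both primitives follows. It assumes Plain TeX macros (\newread) and an extended-mode format; the file name sample.txt and the names \datafile, \rawline and \raw are hypothetical.

```tex
% Read one line with every character forced to catcode 12 (10 for spaces).
\newread\datafile
\openin\datafile=sample.txt
\ifeof\datafile
  \message{sample.txt could not be opened}
\else
  \readline\datafile to \rawline
  \message{First line: \rawline}
\fi
\closein\datafile

% Store an underscore as an ordinary character ...
\begingroup
  \catcode`\_=12
  \gdef\raw{a_b}
\endgroup
% ... then re-scan it under the current catcodes, where _ is a subscript again.
$\scantokens\expandafter{\raw}$
```

The \expandafter is needed because \scantokens does not expand its <general text>; without it the pseudo-file would contain the control sequence \raw rather than its characters.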

## 3.3 Environmental enquiries

\eTeXrevision:
a primitive which expands to yield a sequence of character tokens of \catcode 12 (other); these represent the minor component of the combined version/revision number. Pre-release versions will be characterised by an initial minus sign (-), whilst post-release versions will be implicitly positive; both will contain an explicit leading decimal point, which will follow any minus sign present.
\eTeXversion:
an internal read-only integer representing the major component of the combined version/revision number.
\currentgrouplevel:
an internal read-only integer which returns the current group level (i.e. depth of nesting).
\currentgrouptype:
an internal read-only integer which returns the type of the innermost group as an integer in the range 0..16. Textual definitions of these types may be provided through an associated macro library, but it is intended that these definitions shall be easily replaceable by national language versions in environments within which English language texts are sub-optimal.
\ifcsname:
similar in effect to the sequence \unless \expandafter \ifx \expandafter \relax \csname but avoids the side-effect of the cs-name being ascribed the value \relax, and also does not rely on \relax having its canonical meaning. No hash-table entry is used if cs-name does not exist. (\unless is explained below.)
\ifdefined:
similar in effect to \unless \ifx \undefined, but does not require \undefined to actually be undefined, since no explicit comparison is made with any particular control sequence.
\lastnodetype:
an internal read-only integer which returns the type of the last node on the current list as an integer in the range -1..15+ (only values -1..15 are defined in the first release, but future releases may define additional values). Textual definitions of these types may be provided through an associated macro library.
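
The enquiries above might be used along the following lines. This is a sketch only: it assumes an extended-mode format built on Plain TeX (for \space), and \mymacro is an illustrative name that is taken to be undefined.

```tex
% Report the combined version/revision number.
\message{This is e-TeX \the\eTeXversion\eTeXrevision}

% Test for definedness without relying on \undefined.
\ifdefined\mymacro
  \message{\string\mymacro\space is defined}
\else
  \message{\string\mymacro\space is not defined}
\fi

% Test by name; unlike \csname...\endcsname on its own, this does not
% leave \mymacro with the meaning \relax.
\ifcsname mymacro\endcsname
  \message{mymacro exists}
\else
  \message{mymacro does not exist}
\fi

% Report the depth and kind of the innermost group.
\begingroup
  \message{Group level \the\currentgrouplevel, group type \the\currentgrouptype}
\endgroup
```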

## 3.4 Generalisation of the \mark concept: a class of \marks

\marks:
this is one of Knuth's 'possibly good ideas', listed at the end of TeX82.Bug; whereas TeX has only one \mark, which has to be over-loaded if more than one class of information is to be saved (e.g. over-loading is necessary if separate information for recto and verso pages is to be maintained), e-TeX has a whole class of \marks (256, in the first release); thus rather than writing \mark <general text> as in TeX, in e-TeX one writes \marks <8-bit number> <general text>. For example, \marks 0 could be used to retain information for the verso page, whilst \marks 1 could retain information for the recto. There are equivalent classes for the five \marks variables \botmarks, \firstmarks, \topmarks, \splitfirstmarks and \splitbotmarks. It should be noted that \marks 0 and \mark are in fact identical, as are \topmarks 0 and \topmark, \botmarks 0 and \botmark and so on.
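
The verso/recto convention suggested above could be sketched as follows, assuming Plain TeX's \headline and \pageno; the heading texts are placeholders.

```tex
% Class 0 carries the verso information, class 1 the recto information.
\marks 0 {Verso running head}
\marks 1 {Recto running head}

% Choose the mark class according to the parity of the page number.
\headline={\ifodd\pageno \hfil\botmarks 1 \else \botmarks 0 \hfil\fi}
```

Since \marks 0 is identical to \mark, a document that also uses macros built on \mark may prefer two other class numbers.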

## 3.5 Bi-directional typesetting: the TeX--XeT primitives

TeX--XeT was developed by Peter Breitenlohner based on the original TeX-XeT of Donald Knuth and Pierre MacKay; whereas TeX-XeT generated non-standard DVI files, TeX--XeT generates perfectly normal DVI files which can therefore be processed by standard DVI drivers (assuming, of course, that the necessary fonts are available). Both systems permit the direction of typesetting (conventionally left-to-right in Western documents) to be reversed for part or all of a document, which is particularly useful when setting languages such as Hebrew or Arabic.

\beginL:
indicates the start of a region (e.g. a section of text, or a pre-constructed box) which should be set left-to-right;
\beginR:
indicates the start of a region which should be set right-to-left;
\endL:
indicates the end of a region which should be set left-to-right;
\endR:
indicates the end of a region which should be set right-to-left;
\TeXXeTstate:
an internal read/write integer; a value of zero or less indicates that TeX--XeT features are not to be used, whilst a positive value indicates that they may be used. As the internal data structures built by TeX--XeT differ from those built by TeX, and as the typesetting of a document by TeX--XeT may therefore differ from that performed by TeX, \TeXXeTstate defaults to zero, and even if set positive during format creation will be re-set to zero before the format is dumped. Explicit user action therefore is required to enable TeX--XeT semantics, and TeX--XeT is therefore classed as an enhancement, not simply an extension.
\predisplaydirection:
an internal read/write integer, initialised by e-TeX to indicate the direction of the last partial paragraph before a display; it is used to control the placement of elements such as equation numbers, and can be explicitly set to affect this placement.
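
A minimal sketch of enabling TeX--XeT and reversing part of a line is given below; it assumes an extended-mode format, and the sample text is arbitrary (with a Latin font the reversed phrase simply appears mirrored letter by letter).

```tex
\TeXXeTstate=1   % explicit assignment: e-TeX now operates in enhanced mode

This runs left to right, \beginR this phrase runs right to left\endR{} and
this runs left to right again.

\TeXXeTstate=0   % switch the TeX--XeT features off again
```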

## 3.6 Additional debugging features

\interactionmode:
whereas in TeX there exist only explicit commands such as \scrollmode, \errorstopmode, etc., in e-TeX read/write access is provided via \interactionmode (an internal integer); assigning a numeric value sets the associated mode, whilst the current mode may be ascertained by interrogating its value. Symbolic definitions of these values may be provided through an associated macro library.
\showgroups:
(e-)TeX has many different classes of group, which should normally be properly balanced and nested; if a nesting or imbalance error occurs, it can be very difficult to track down the source of the problem. \showgroups causes e-TeX to display the level and type of all active groups from the point within which it was called.
\showtokens,
when followed by a <general text>, displays a sequence of characters corresponding to the decomposition of the <balanced text> of the unexpanded <general text>; cf. \detokenize.
\tracinggroups:
a further aid to debugging runaway-group problems, \tracinggroups (an internal read/write integer) causes e-TeX to trace entry and exit to every group while set to a positive non-zero value.
\tracingscantokens:
an internal read/write integer; assigning it a positive non-zero value will cause an open-parenthesis and space to be displayed whenever \scantokens is invoked; the matching close-parenthesis will be recorded when the scan is complete. If a traceback occurs during the expansion of \scantokens, the first displayed line number will reflect the logical line number of the pseudo-file created from the parameter to \scantokens; thus enabling \tracingscantokens can assist in identifying why a seemingly irrational line number is shewn as the source of error (the traceback always continues until the line number of the actual source file is displayed).
If \tracingcommands is greater than 2, additional information is displayed. [More detail needed here!]
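
The debugging aids above might be combined as in the following sketch, which assumes an extended-mode format; \savedmode is an illustrative name, and \alpha and \beta merely stand for arbitrary tokens.

```tex
% Save the interaction mode, work quietly, then restore the saved mode.
\edef\savedmode{\the\interactionmode}
\batchmode
% ... material to be processed without terminal output ...
\interactionmode=\savedmode\relax

% Trace entry to and exit from every group, and list the active groups.
\tracinggroups=1
\begingroup
  {\showgroups}
\endgroup
\tracinggroups=0

% Display tokens without expanding them.
\showtokens{\alpha\beta}
```

Saving the mode as a number avoids having to know which of \batchmode, \nonstopmode, \scrollmode or \errorstopmode was in force beforehand.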

## 3.7 Miscellaneous primitives

\everyeof:
this is another of Knuth's 'possibly good ideas', listed at the end of TeX82.Bug; analogous to the other \every... primitives, it takes as parameter a <balanced text>, the tokens of which are inserted when the end of a file (either real or virtual, if \scantokens is used) is reached. This allows \input statements to be used within the replacement text of \edefs, and allows totally arbitrary files to be \input within an e-TeX conditional, since the necessary \fi can be inserted before e-TeX complains that it has fallen off the end of the file. It should be noted that the \everyeof tokens are not inserted if the end-of-file is forced through the use of \endinput.
\middle:
analogous to TeX's \left and \right, \middle specifies that the following delimiter is to serve both as a right and left delimiter; it will be set with spacing appropriate to a right delimiter w.r.t. the preceding atom(s), and with spacing appropriate to a left delimiter w.r.t. the succeeding atom(s).
\unless:
TeX has, by design, a rather sparse set of conditional primitives: \ifeof, \ifodd, \ifvoid, etc., have no complementary counterparts. Whilst this normally poses no problems since each accepts both a \then (implicit) and an \else (explicit) part, they fall down when used as the final \if... of a \loop ... \if ... \repeat construct, since no \else is allowed after the final \if.... \unless allows the sense of all Boolean conditionals to be inverted, and thus (for example) \unless \ifeof yields true iff end-of-file has not yet been reached.
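
The three primitives of this section can be sketched as follows. The example assumes Plain TeX (\newread, \loop ... \repeat) and an extended-mode format; sample.txt, \datafile, \nextline and \filecontents are hypothetical names, and the \everyeof part shows one commonly used idiom rather than the only possible use.

```tex
% \unless: loop for as long as end-of-file has NOT been reached.
\newread\datafile
\openin\datafile=sample.txt
\loop
  \unless\ifeof\datafile
  \readline\datafile to \nextline
  \message{[\nextline]}
\repeat
\closein\datafile

% \middle: a delimiter between \left and \right that grows with them.
$$ \left\{ x \,\middle|\, {x \over 2} > 1 \right\} $$

% \everyeof: let an \input finish cleanly inside an \edef.
% The file must exist, and its contents are fully expanded.
\begingroup
  \everyeof{\noexpand}
  \xdef\filecontents{\input sample.txt }
\endgroup
```

In the last fragment the inserted \noexpand is what allows the end of the file to pass unnoticed while the replacement text is still being absorbed.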

## References:

TeX.Web
CTAN: tex-archive/systems/knuth/tex/tex.web
TeX82.Bug
CTAN: tex-archive/systems/knuth/errata/tex82.bug
Trip test
CTAN: tex-archive/systems/knuth/tex/tripman.tex
Plain.TeX
CTAN: tex-archive/systems/knuth/lib/plain.tex
TeX--XeT
CTAN: tex-archive/systems/knuth/tex--xet
etex.src
etex.src
Discussion List NTS-L
Subscribe with e-mail to the Listserver program listserv@vm.urz.uni-heidelberg.de.
Tie (written in C)
CTAN: tex-archive/web/tie
WebMerge (written in TeX)
webmerge.tex
PatchWeb (for PC, bundled with "dos-tp")
CTAN: tex-archive/systems/msdos/dos-tp/

The NTS team

(Put on the WWW by Bernd Raichle, Member of the NTS group; subsequently updated by Philip Taylor, with corrections by Peter Breitenlohner.)