C00001 00001
C00006 00002	% This program is copyright 1982 by D. E. Knuth all rights are reserved.
C00010 00003	@* \[1] Introduction.
C00042 00004	@* \[2] The character set.
C00056 00005	@* \[3] Input and output.
C00079 00006	@* \[4] String handling.
C00095 00007	@* \[5] On-line and off-line printing.
C00111 00008	@* \[6] Reporting errors.
C00133 00009	@* \[7] Arithmetic with scaled dimensions.
C00148 00010	@* \[8] Packed data.
C00158 00011	@* \[9] Dynamic memory allocation.
C00175 00012	@* \[10] Data structures for boxes and their friends.
C00211 00013	@* \[11] Memory layout.
C00224 00014	@* \[12] Displaying boxes.
C00243 00015	@* \[13] Destroying boxes.
C00247 00016	@* \[14] Copying boxes.
C00253 00017	@* \[15] The command codes.
C00267 00018	@* \[16] The semantic nest.
C00280 00019	@* \[17] The table of equivalents.
C00332 00020	@* \[18] The hash table.
C00351 00021	@* \[19] Saving and restoring equivalents.
C00372 00022	@* \[20] Token lists.
C00384 00023	@* \[21] Introduction to the syntactic routines.
C00391 00024	@* \[22] Input stacks and states.
C00423 00025	@* \[23] Maintaining the input stacks.
C00430 00026	@* \[24] Getting the next token.
C00459 00027	@* \[25] Expanding user macros.
C00481 00028	@* \[26] Basic scanning subroutines.
C00531 00029	@* \[27] Building token lists.
C00544 00030	@* \[28] File names.
C00572 00031	@* \[29] Font metric data.
C00628 00032	@* \[30] Device-independent file format.
C00665 00033	@* \[31] Shipping pages out.
C00721 00034	@* \[32] Packaging.
C00749 00035	@* \[33] Data structures for math mode.
C00778 00036	@* \[34] Subroutines for math mode.
C00799 00037	@* \[35] Typesetting math formulas.
C00853 00038	@* \[36] Alignment.
C00902 00039	@* \[37] Breaking paragraphs into lines.
C00963 00040	@* \[38] Breaking paragraphs into lines, continued.
C00992 00041	@* \[39] Pre-hyphenation.
C01004 00042	@* \[40] Post-hyphenation.
C01020 00043	@* \[41] Hyphenation.
C01038 00044	@* \[42] Initializing the hyphenation tables.
C01067 00045	@* \[43] Breaking vertical lists into pages.
C01084 00046	@* \[44] The page builder.
C01131 00047	@* \[45] The chief executive.
C01157 00048	@* \[46] Building boxes and lists.
C01218 00049	@* \[47] Building math lists.
C01267 00050	@* \[48] Conditional processing.
C01281 00051	@* \[49] Mode-independent processing.
C01325 00052	@* \[50] Dumping and undumping the tables.
C01350 00053	@* \[51] The main program.
C01362 00054	@* \[52] Debugging.
C01367 00055	@* \[53] Extensions.
C01389 00056	@* \[54] System-dependent changes.
C01390 00057	@* \[55] Index.
C01391 ENDMK
% This program is copyright 1982 by D. E. Knuth; all rights are reserved.
% Please don't make any changes to this file unless you are D. E. Knuth!
% Version 0 is fully implemented but not yet fully tested, so beware of bugs.

% Here is TeX material that gets inserted after \input webhdr
\def\hang{\hangindent 3em\ \unskip\!}
\def\textindent#1{\hangindent 2.5em\noindent\hbox to 2.5em{\hss#1 }\!}
\def\at{@@} % use for an at sign
\chcode@@=13 \def@@{\penalty999\ } % ties words together
\def\TeX{T\hbox{\hskip-.1667em\lower.424ex\hbox{E}\hskip-.125em X}}
\font b=cmr9 \def\mc{\:b} % medium caps for names like PASCAL
\def\PASCAL{{\mc PASCAL}}
\def\ph{{\mc PASCAL-H}}
\font L=manfnt % font used for the METAFONT logo
\def\MF{{\:L META}\-{\:L FONT}}

\def\(#1){} % this is used to make module names sort themselves better
\def\9#1{} % this is used for sort keys in the index via @:sort key}{entry@>

\outer\def\N#1. \[#2]#3.{\par\mark{#1}\vfill\eject % beginning of starred module
	\gdef\position{\:a#2\:ux\:a\topmark} % for part numbers

\def\topofcontents{\hsize 5.5in
	\topspace 0pt plus 1fil minus 1fil
	\def\]##1]{\hbox to 1in{\hfil##1.\ }}
\def\botofcontents{\vskip 0pt plus 1fil minus 1fil\setpage\let\]=\let}
\def\lheader{\hbox to1.5em{\:a\hss\count0}\:m\qquad\rhead\hfill\title\qquad
	\position} % top line on left-hand pages
	\hbox to1.5em{\:a\hss\count0}} % top line on right-hand pages
\setcount0 \contentspagenumber
\ctrline{(replace this page by the contents page printed later)}
@* \[1] Introduction.
\def\glob{13} % this should be the section number of "<Globals...>"
\def\gglob{20, 26} % this should be the next two sections of "<Globals...>"
This is \TeX, a document compiler intended to produce high-quality typesetting.
The \PASCAL\ program that follows is the definition of \TeX82, a standard
version of \TeX\ that is designed to be highly portable so that identical output
will be obtainable on a great variety of different computers.

The main purpose of the following program is to explain the algorithms of \TeX\
as clearly as possible. As a result, the program will not necessarily be very
efficient when a particular \PASCAL\ compiler has translated it into a
particular machine language. However, the program has been written so that it
can be tuned to run efficiently in a wide variety of operating environments
by making comparatively few changes. Such flexibility is possible because
the documentation that follows is written in the \.{WEB} language, which is
at a higher level than \PASCAL; the preprocessing step that converts \.{WEB}
to \PASCAL\ is able to introduce most of the necessary refinements.
Semi-automatic translation to other languages is also feasible, because the
program below does not make extensive use of features that are peculiar to

A large piece of software like \TeX\ has inherent complexity that cannot
be reduced below a certain level of difficulty, although each individual
part is fairly simple by itself. The \.{WEB} language is intended to make
the algorithms as readable as possible, by reflecting the way the
individual program pieces fit together and by providing the
cross-references that connect different parts. Detailed comments about
what is going on, and about why things were done in certain ways, have
been liberally sprinkled throughout the program.  These comments explain
features of the implementation, but they rarely attempt to explain the
\TeX\ language itself, since the reader is supposed to be familiar with
the \TeX\ user's manual.

@ The present implementation has a long ancestry, beginning in the summer
of@@1977, when Michael@@F. Plass and Frank@@M. Liang designed and coded
a prototype
@^Plass, Michael Frederick@>
@^Liang, Franklin Mark@>
@^Knuth, Donald Ervin@>
based on some specifications that the author had made in April of that year.
This original proto\TeX\ included macro definitions and elementary
manipulations on boxes and glue, but it did not have line-breaking,
page-breaking, mathematical formulas, alignment routines, error recovery,
or the present semantic nest; furthermore,
it used character lists instead of token lists, so that a control sequence
like \.{\\halign} was represented by a list of seven characters. A
complete version of \TeX\ was designed and coded by the author in late
1977 and early 1978; that program, like its prototype, was written in the
{\mc SAIL} language, for which an excellent debugging system was
available. Preliminary plans to convert the {\mc SAIL} code into a form
somewhat like the present ``web'' were developed by Luis Trabb@@Pardo and
the author at the beginning of 1979, and a complete implementation was
created by Ignacio@@A. Zabala in 1979 and 1980. The \TeX82 program, which
@^Zabala Salelles, Ignacio Andr\'es@>
was written by the author during the latter part of 1981 and the early
part of 1982, also incorporates ideas from the 1979 implementation of
@^Guibas, Leonidas Ioannis@>
@^Sedgewick, Robert@>
@^Wyatt, Douglas Kirk@>
\TeX\ in {\mc MESA} that was written by Leonidas Guibas, Robert Sedgewick,
and Douglas Wyatt at the Xerox Palo Alto Research Center.  Several hundred
refinements were introduced into \TeX82 based on the experiences gained with
the original implementations, so that essentially every part of the system
has been substantially improved.

No doubt there still is plenty of room for enhancements, but the author
is firmly committed to keeping \TeX82 ``frozen'' from now on; stability
and reliabil\-ity are to be its main virtues.

On the other hand, the \.{WEB} description can be extended without changing
the core of \TeX82 itself, and the program has been designed so that such
extensions are not extremely difficult to make.
The |banner| string defined here should be changed whenever \TeX\
undergoes any modifications, so that it will be clear which version of
\TeX\ might be the guilty party when a problem arises.
@^extensions to \TeX@>

@d banner=='This is TeX, Version -0.25' {printed when \TeX\ starts}

@ Different \PASCAL s have slightly different conventions, and the present
\def\ph{{\mc PASCAL-H}}
@!@:PASCAL H}{\ph@>
program expresses \TeX\ in terms of the \PASCAL\ that was
available to the author in 1982. The methods used here to work with
this particular compiler, which we shall call \ph, should help the
reader to see how to make an appropriate interface for other systems
if necessary. (\ph\ is Charles Hedrick's mod\-ifi\-ca\-tion of a compiler
@^Hedrick, Charles Locke@>
for the DECsystem-10 that was originally developed at the University of
Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976),
29--42. The \TeX\ program below is intended to be adaptable, without
extensive changes, to most other versions of \PASCAL, so it does not fully
use the admirable features of \ph. Indeed, a conscious effort has been
made here to avoid using several idiosyncratic features of standard
\PASCAL\ itself, so that most of the code can be mechanically translated
into other high-level languages.)

The portions of this program that involve system-dependent code, where
changes might be necessary because of differences between \PASCAL\ compilers
and/or differences between
operating systems, can be identified by looking at the sections whose
numbers are listed under `system dependencies' in the index. Furthermore,
the index entries for `dirty \PASCAL' list all places where the restrictions
of \PASCAL\ have not been followed perfectly, for one reason or another.
@!@^system dependencies@>
@!@^dirty \PASCAL@>

@ The program begins with a normal \PASCAL\ program heading, whose
components will be filled in later, using the conventions of \.{WEB}.
For example, the portion of the program called `\X\glob:Globals in the outer
block\X' here will be replaced by a sequence of variable declarations
that starts in $\section\glob$ of this documentation. In this way, we are able
to define each individual global variable when we are prepared to
understand what it means; we do not have to define all of the globals at
once.  Cross references in $\section\glob$, where it says ``See also
sections \gglob, $\ldots$,'' also make it possible to look at the set of
all global variables, if desired.  Similar remarks apply to the other
portions of the program heading.

Actually the heading shown here is not quite normal: The |program| line
does not mention any |output| file, because \ph\ would ask the \TeX\ user
to specify a file name if |output| were specified here.
@^system dependencies@>

@d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:}
@f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'}
@f type==true {but `|type|' will not be treated as a reserved word}

@p @t\4@>@<Compiler directives@>@/
program TEX; {all file names are defined dynamically}
label @<Labels in the outer block@>@/
const @<Constants in the outer block@>@/
mtype @<Types in the outer block@>@/
var@?@<Globals in the outer block@>@/
procedure initialize; {this procedure gets things started properly}
	var@?@<Local variables for initialization@>@/
	begin @<Initialize whatever \TeX\ might access@>@;
@t\4@>@<Basic printing procedures@>@/
@t\4@>@<Error handling procedures@>@/

@ The overall \TeX\ program begins with the heading just shown, after which
comes a bunch of procedure declarations and function declarations.
Finally we will get to the main program, which begins with the
comment `@!|start_here|'. If you want to skip down to the
main program now, you can look up `|start_here|' in the index.
But the author suggests that the best way to understand this program
is to follow pretty much the order of \TeX's components as they appear in the
\.{WEB} description you are now reading, since the present ordering is
intended to combine the advantages of the ``bottom up'' and ``top down''
approaches to the problem of understanding a somewhat complicated system.

@ Three labels must be declared in the main program, so we give them
symbolic names.

@d start_of_TEX=1 {go here when \TeX's variables are initialized}
@d end_of_TEX=9998 {go here to close files and terminate gracefully}
@d final_end=9999 {this label marks the ending of the program}

@<Labels in the out...@>=
start_of_TEX@t\hskip-2pt@>, end_of_TEX@t\hskip-2pt@>,@,final_end;
	{key control points}

@ Some of the code below is intended to be used only when diagnosing the
strange behavior that sometimes occurs when \TeX\ is being installed or
when system wizards are fooling around with \TeX\ without quite knowing
what they are doing. Such code will not normally be compiled; it is
delimited by the codewords `$|debug|\ldotsm|gubed|$', with apologies
to people who wish to preserve the purity of English. Similarly, there
is some conditional code delimited by `$|stat|\ldotsm|tats|$'
that is intended only for use when statistics
are to be kept about \TeX's memory usage.

@d debug==@{ {change this to `$\\{debug}\eqv\null$' when debugging}
@d gubed==@} {change this to `$\\{gubed}\eqv\null$' when debugging}
@f debug==begin
@f gubed==end
@d stat==@{ {change this to `$\\{stat}\eqv\null$' when gathering
	usage statistics}
@d tats==@} {change this to `$\\{tats}\eqv\null$' when gathering
	usage statistics}
@f stat==begin
@f tats==end

@ This program has two important variations: (1) There is a long and slow
version called \.{INITEX}, which does the extra calculations need to
initialize \TeX's internal tables; and (2)@@there is a shorter and faster
production version, which cuts the initialization to a bare minimum.
Parts of the program that are needed in (1) but not in (2) are delimited by
the codewords `$|init|\ldotsm|tini|$'.

@d init== {change this to `$\\{init}\eqv\.{@@\{}$' in the production version}
@d tini== {change this to `$\\{tini}\eqv\.{@@\}}$' in the production version}
@f init==begin
@f tini==end

@<Initialize whatever...@>=
@<Set initial values of key variables@>@/
init @<Initialize table entries (done by \.{INITEX} only)@>@;@+tini

@ If the first character of a \PASCAL\ comment is a dollar sign,
@^system dependencies@>
\ph\ treats the comment as a list of ``compiler directives'' that will
affect the translation of this program into machine language.  The
directives shown below specify full checking and inclusion of the \PASCAL\
debugger when \TeX\ is being debugged, but they cause range checking and other
redundant code to be eliminated when the production system is being generated.
Arithmetic overflow will be detected in all cases.
@^Overflow in arithmetic@>

@<Compiler directives@>=
@{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead}
debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging}

@ This \TeX\ implementation conforms to the rules of the {\sl PASCAL User
@^system dependencies@>
Manual} published by Jensen and Wirth in 1975, except where system-dependent
@^Wirth, Niklaus@>
@^Jensen, Kathleen@>
code is necessary to make a useful system program, and except in another
respect where such conformity would unnecessarily obscure the meaning
and clutter up the code: We assume that |case| statements may include a
default case that applies if no matching label is found. Thus, we shall use
constructions like
|case x of|\cr
1: $\langle\,$code for $x=1\,\rangle$;\cr
3: $\langle\,$code for $x=3\,\rangle$;\cr
|othercases| $\langle\,$code for |x≠1| and |x≠3|$\,\rangle$\cr
since most \PASCAL\ compilers have plugged this hole in the language by
incorporating some sort of default mechanism. For example, the \ph\
compiler allows `|others|:' as a default label, and other \PASCAL s allow
syntaxes like `\!|else|\unskip' or `\\{otherwise}' or `\\{otherwise}:',
etc. The definitions of |othercases| and |endcases| should be changed to
agree with local conventions.  Note that no semicolon appears before
|endcases| in this program, so the definition of |endcases| should include
a semicolon if the compiler wants one. (Of course, if no default mechanism
is available, the |case| statements of \TeX\ will have to be laboriously
extended by listing all remaining cases. People who are stuck with such
\PASCAL s have in fact done this, successfully but not happily!)

@d othercases == others: {default for cases not listed explicitly}
@d endcases == @+end {follows the default case in an extended |case| statement}
@f othercases == else
@f endcases == end

@ The following parameters can be changed at compile time to extend or
reduce \TeX's capacity. They may have different values in \.{INITEX} and
in production versions of \TeX.
@^system dependencies@>

@!mem_max=30000; {greatest index in \TeX's internal |mem| array,
	must be strictly less than |max_halfword|}
@!buf_size=500; {maximum number of characters simultaneously present in
	current lines of open files; must not exceed |max_halfword|}
@!error_line=64; {width of context lines on terminal error messages}
@!half_error_line=32; {width of first lines of contexts in terminal
	error messages, should be between 30 and |error_line-15|}
@!max_print_line=72; {width of longest text lines output, should be at least 60}
@!stack_size=80; {maximum number of simultaneous input sources}
@!max_in_open=6; {maximum number of input files and error insertions that
	can be going on simultaneously}
@!font_max=75; {maximum internal font number, must not exceed |max_quarterword|}
@!bad_font_code=300; {user font codes must be less than this}
@!font_mem_size=15000; {number of words of |font_info| for all fonts}
@!param_size=30; {maximum number of simultaneous macro parameters}
@!nest_size=40; {maximum number of semantic levels simultaneously active}
@!max_strings=3000; {maximum number of strings; must not exceed |max_halfword|}
@!string_vacancies=8000; {the minimum number of characters that should be
	available for the user's control sequences and font names,
	after \TeX's own error messages are stored}
@!pool_size=30000; {maximum number of characters in strings, including all
	error messages and help texts, and the names of all fonts and
	control sequences; must be at least 22000 more than |string_vacancies|}
@!align_size=4; {maximum number of simultaneous alignments}
@!save_size=300; {space for saving values outside of current group, must be
	at most |max_halfword|}
@!trie_size=7000; {space for hyphenation patterns, should be larger for
	\.{INITEX} than it is in production versions of \TeX}
@!dvi_buf_size=800; {size of the output buffer, must be a multiple of 8}
@!file_name_size=40; {file names shouldn't be longer than this}
@!pool_name='<TeX.sources>TEX.POOL                   ';
	{string of length |file_name_size|, tells where string pool appears}
@^system dependencies@>

@ Like the preceding parameters, the following quantities can be changed
at compile time to extend or reduce \TeX's capacity. But if they are changed,
it is necessary to rerun the initialization program \.{INITEX}
to generate new tables for the production \TeX\ program.
One can't simply make helter-skelter changes to the following constants,
since certain rather complex initialization
numbers are computed from them. They are defined here using
\.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to
emphasize this distinction.

@d mem_base=0 {smallest index in the |mem| array, must not be less
	than |min_halfword|}
@d hi_mem_base=12000 {smallest index in the single-word area of |mem|,
	must be substantially larger than |mem_base| and smaller than |mem_max|}
@d font_base=0 {smallest internal font number, must not be less
	than |min_quarterword|}
@d hash_size=2100 {maximum number of control sequences; it should be at most
	about |(mem_max-hi_mem_base)/6|, but 2100 is already quite generous}
@d hash_prime=1777 {a prime number equal to about 85\%\ of |hash_size|}
@d hyph_size=307 {another prime; the number of \.{\\hyphenation} exceptions}
@^system dependencies@>

@ In case somebody has inadvertently made bad settings of the ``constants,''
\TeX\ checks them using a global variable called |bad|.

This is the first of many sections of \TeX\ where global variables are

@!bad:integer; {is some ``constant'' wrong?}

@ Later on we will say `\!|if mem_max≥max_halfword then bad←10|', or
something similar. (We can't do that until |max_halfword| has been defined.)

@<Check the ``constant'' values for consistency@>=
if (half_error_line<30)∨(half_error_line>error_line-15) then bad←1;
if max_print_line<60 then bad←2;
if dvi_buf_size mod 8≠0 then bad←3;
if (hi_mem_base<mem_base+100)∨(hi_mem_base+100>mem_max) then bad←4;
if hash_prime>hash_size then bad←5;

@ Labels are given symbolic names by the following definitions, so that
occasional |goto| statements will be meaningful. We insert
the label `|exit|:' just before the `\!|end|\unskip' of a procedure in
which we have used the `|return|' statement defined below;
the label `|restart|' is occasionally used at the very beginning of a
procedure; and the label `|reswitch|' is occasionally used just prior to
a |case| statement in which some cases change the conditions and we wish to
branch to the newly applicable case.
Loops that are set up with the |loop| construction defined below are
commonly exited by going to `|done|' or to `|found|' or to `|not_found|',
and they are sometimes repeated by going to `|continue|'.
If two or more parts of a subroutine start differently but end up the same,
the shared code may be gathered together at `|common_ending|'.

Incidentally, this program never declares a label that isn't actually used,
because some fussy \PASCAL\ compilers will complain about redundant labels.

@d exit=10 {go here to leave a procedure}
@d restart=20 {go here to start a procedure again}
@d reswitch=21 {go here to start a case statement again}
@d continue=22 {go here to resume a loop}
@d done=30 {go here to exit a loop}
@d done1=31 {like |done|, when there is more than one loop}
@d done2=32 {for exiting the second loop in a long block}
@d done3=33 {for exiting the third loop in a very long block}
@d done4=34 {for exiting the fourth loop in an extremely long block}
@d done5=35 {for exiting the fifth loop in an immense block}
@d done6=36 {for exiting the sixth loop in a block}
@d found=40 {go here when you've found it}
@d found1=41 {like |found|, when there's more than one per routine}
@d found2=42 {like |found|, when there's more than two per routine}
@d not_found=45 {go here when you've found nothing}
@d common_ending=50 {go here when you want to merge with another branch}

@ Here are some macros for common programming idioms.

@d incr(#) == #←#+1 {increase a variable by unity}
@d decr(#) == #←#-1 {decrease a variable by unity}
@d loop == @+ while true do@+ {repeat over and over until a |goto| happens}
@f loop == xclause {\.{WEB}'s |xclause| acts like `\!|while true do|\unskip'}
@d do_nothing == {empty statement}
@d return == goto exit {terminate a procedure call}
@f return == nil
@* \[2] The character set.
In order to make \TeX\ readily portable between a wide variety of
computers, all of its input text is converted to an internal seven-bit
code that is essentially standard ascii, the ``American Standard Code for
Information Interchange.''  This conversion is done immediately when each
character is read in. Conversely, characters are converted from ascii to
the user's external representation just before they are output to a
text file. 

Such an internal code is relevant to users of \TeX\ primarily because it
governs the positions of characters in the fonts. For example, the
character `\.A' has ascii code $65=@'101$, and when \TeX\ typesets
this letter it specifies character number 65 in the current font.
If that font actually has `\.A' in a different position, \TeX\ doesn't
know what the real position is; the program that does the actual printing from
\TeX's device-independent files is responsible for converting from ascii to
a particular font encoding.
@^ascii code@>

\TeX's internal code is relevant also with respect to constants
that begin with a reverse apostrophe; and it provides an index to the
\.{\\chcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode}

@ Characters of text that have been converted to \TeX's internal form 
are said to be of type |ascii_code|, which is a subrange of the integers.

@!ascii_code=0..127; {seven-bit numbers}

@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit
character sets were common, so it did not make provision for lower case
letters. Nowadays, of course, we need to deal with both upper and lower case
alphabets in a convenient way, especially in a program for typesetting;
so the present specification of \TeX\ has been written under the assumption
that the \PASCAL\ compiler and run-time system permit the use of text files
with more than 64 distinguishable characters. More precisely, we assume that
the character set contains at least the letters and symbols associated
with ascii codes @'40 through @'176; all of these characters are now
available on most computer terminals.

Since we are dealing with more characters than were present in the first
\PASCAL\ compilers, we have to decide what to call the associated data
type. Some \PASCAL s use the original name |char| for the
characters in text files, even though there now are more than 64 such
characters, while other \PASCAL s consider |char| to be a 64-element
subrange of a larger data type that has some other name.

In order to accommodate this difference, we shall use the name |text_char|
to stand for the data type of the characters that are converted to and
from |ascii_code| when they are input and output. We shall also assume
that |text_char| consists of the elements |chr(first_text_char)| through
|chr(last_text_char)|, in\-clu\-sive. The following definitions should be
adjusted if necessary.
@^system dependencies@>

@d text_char == char {the data type of characters in text files}
@d first_text_char=0 {ordinal number of the smallest element of |text_char|}
@d last_text_char=127 {ordinal number of the largest element of |text_char|}

@<Local variables for init...@>=

@ The \TeX\ processor converts between ascii code and
the user's external character set by means of arrays |xord| and |xchr|
that are analogous to \PASCAL's |ord| and |chr| functions.

@!xord: array [text_char] of ascii_code;
	{specifies conversion of input characters}
@!xchr: array [ascii_code] of text_char;
	{specifies conversion of output characters}

@ Since we are assuming that our \PASCAL\ system is able to read and write the
visible characters of standard ascii (although not necessarily using the
ascii codes to represent them), the following assignment statements initialize
most of the |xchr| array properly, without needing any system-dependent

@<Set init...@>=
xchr[@'40]←' ';
xchr[0]←' '; xchr[@'177]←' ';
	{ascii codes 0 and |@'177| do not appear in text}

@ Some of the ascii codes without visible characters have been given symbolic
names in this program because they are used with a special meaning.

@d null_code=@'0 {ascii code that might disappear}
@d carriage_return=@'15 {ascii code used at end of line}
@d invalid_code=@'177 {ascii code that should not appear}

@ The ascii code is ``standard'' only to a certain extent, since many
computer installations have found it advantageous to have ready access
to more than 94 printing characters. Appendix@@C of the \TeX\ manual
gives a complete specification of the intended correspondence between
characters and \TeX's internal representation.

If \TeX\ is being used
on a garden-variety \PASCAL\ for which only standard ascii
codes will appear in the input and output files, it doesn't really matter
what codes are specified in |xchr[1..@'37]|, but the safest policy is to
blank everything out by using the code shown below.

However, other settings of |xchr| will make \TeX\ more friendly on
computers that have an extended character set, so that users can type things
like `\.\NE' instead of `\.{\\ne}'. At MIT, for example, it would be more
appropriate to substitute the code
$$\hbox{|for i←1 to @'37 do xchr[i]←chr(i);|}$$
\TeX's character set is essentially the same as MIT's, even with respect to
characters less than@@@'40. People with extended character sets can
assign codes arbitrarily, giving an |xchr| equivalent to whatever
characters the users of \TeX\ are allowed to have in their input files,
provided that unsuitable characters do not correspond to the special
codes like |carriage_return| that are listed above. It is best
to make the codes correspond to the intended interpretations as shown
in Appendix@@C whenever possible; but this is not necessary. For example,
in countries with an alphabet of more than 26 letters, it is usually best
to map the additional letters into codes less than@@@'40.
@^character set dependencies@>
@^system dependencies@>

@<Set init...@>=
for i←1 to @'37 do xchr[i]←' ';

@ The following system-independent code makes the |xord| array contain a
suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]|
where |i<j<@'177|, the value of |xord[xchr[i]]| will turn out to be
|j| or more; hence, standard ascii code numbers will be used instead of
codes below @'40 in case there is a coincidence.

@<Set init...@>=
for i←first_text_char to last_text_char do xord[chr(i)]←invalid_code;
for i←1 to @'176 do xord[xchr[i]]←i;
@* \[3] Input and output.
The bane of portability is the fact that different operating systems treat
input and output quite differently, perhaps because computer scientists
have not given sufficient attention to this problem. People have felt somehow
that input and output are not a part of ``real'' programming. Well, it is true
that some kinds of programming are more fun than others. With existing
input/output conventions being so diverse and so messy, the only sources of
joy in such parts of the code are the rare occasions when one can find a
way to make the program a little less bad than it might have been. We have
two choices: whether to attack I/O now and get it over with, or to postpone
it until near the end. Neither prospect is very attractive, so let's
get it over with.

The basic operations we need to do are (1)@@inputting and outputting of
text, to or from a file or the user's terminal; (2)@@inputting and
outputting of eight-bit bytes, to or from a file; (3)@@instructing the
operating system to initiate (``open'') or to terminate (``close'') input or
output from a specified file; (4)@@testing whether the end of an input
file has been reached.

Note that \TeX\ needs to deal with only two kinds of files.
We shall use the term |alpha_file| for a file that contains textual data,
and the term |byte_file| for a file that contains eight-bit binary information.
These two types turn out to be the same on many computers, but
sometimes there is a significant distinction, so we shall be careful to
distinguish between them. Standard protocols for transferring
such files from computer to computer, via high-speed networks, are
now becoming available to more and more communities of users.

\TeX\ actually makes use also of a third kind of file, called a |word_file|,
when dumping and reloading format information for its own initialization.
We shall define a word file later; but it will be possible for us to
do a few simple things with them in this section before they are defined.

@!eight_bits=0..255; {unsigned one-byte quantity}
@!alpha_file=packed file of text_char; {files that contain textual data}
@!byte_file=packed file of eight_bits; {files that contain binary data}

@ Most of what we need to do with respect to input and output can be handled
by the I/O facilities that are standard in \PASCAL, i.e., the routines
called |get|, |put|, |eof|, and so on. But
standard \PASCAL\ does not allow file variables to be associated with file
names that are determined at run time, so it cannot be used to implement
\TeX; some sort of extension to \PASCAL's ordinary |reset| and |rewrite|
is crucial for our purposes. We shall assume that |name_of_file| is a variable
of an appropriate type such that the \PASCAL\ run-time system being used to
implement \TeX\ can open a file whose external name is specified by
@^system dependencies@>

@!name_of_file:packed array[1..file_name_size] of char;
	{on some systems this may be a \&{record} variable}
@!name_length:0..file_name_size; {this many characters are actually
	relevant in |name_of_file| (the rest are blank)}

@ The \ph\ compiler with which the present version of \TeX\ was pre\-pared has
extended the rules of \PASCAL\ in a very convenient way. To open file@@|f|,
we can write
|reset(f,@t\\{name}@>,'/O')|ⓧfor input;\cr
|rewrite(f,@t\\{name}@>,'/O')|ⓧfor output.\cr}}$$
The `\\{name}' parameter, which is of type `\!|packed
array[@t\<\\{any}>@>] of text_char|', stands for the name of
the external file that is being opened for input or output. 
Blank spaces that might appear in \\{name} are ignored.

The `\.{/O}' parameter tells the operating system not to issue its own
error messages if something goes wrong. If a file of the specified name
cannot be found, or if such a file cannot be opened for some other reason
(e.g., it might already be in use), we will have |eof(f)=true| after an
unsuccessful |reset|, and |eof(f)=false| after an unsuccessful |rewrite|.
This allows \TeX\ to undertake appropriate corrective action.
@:PASCAL H}{\ph@>

We can now implement the file-opening procedures in the following simple way,
where the functions return |false| if no file identified by |name_of_file|
could be opened:

@p function a_open_in(var f:alpha_file):boolean;
	{open a text file for input}
begin reset(f,name_of_file,'/O'); a_open_in←not eof(f);
function a_open_out(var f:alpha_file):boolean;
	{open a text file for output}
begin rewrite(f,name_of_file,'/O'); a_open_out←eof(f);
function b_open_in(var f:byte_file):boolean;
	{open a binary file for input}
begin reset(f,name_of_file,'/O'); b_open_in←not eof(f);
function b_open_out(var f:byte_file):boolean;
	{open a binary file for output}
begin rewrite(f,name_of_file,'/O'); b_open_out←eof(f);
function w_open_in(var f:word_file):boolean;
	{open a word file for input}
begin reset(f,name_of_file,'/O'); w_open_in←not eof(f);
function w_open_out(var f:word_file):boolean;
	{open a word file for output}
begin rewrite(f,name_of_file,'/O'); w_open_out←eof(f);

@ Files can be closed with the \ph\ routine `|close(f)|', which
@^system dependencies@>
should be used when all input or output with respect to |f| has been completed.
This makes |f| available to be opened again, if desired; and if |f| was used for
output, the |close| operation makes the corresponding external file appear
on the user's area, ready to be read.

@p procedure a_close(var f:alpha_file); {close a text file}
begin close(f);
procedure b_close(var f:byte_file); {close a binary file}
begin close(f);
procedure w_close(var f:word_file); {close a word file}
begin close(f);

@ Binary input and output are done with \PASCAL's ordinary |get| and |put|
procedures, so we don't have to make any other special arrangements for
binary@@I/O. Text output is also easy to do with standard \PASCAL\ routines.
The treatment of text input is more difficult, however, because
of the necessary translation to |ascii_code| values, and because
\TeX's conventions should be efficient and they should
blend nicely with the user's operating environment.

@ Input from text files is read one line at a time, using a routine called
|input_ln|. This function is defined in terms of global variables called
|buffer|, |first|, and |last| that will be described in detail later; for
now, it suffices for us to know that |buffer| is an array of |ascii_code|
values, and that |first| and |last| are indices into this array
representing the beginning and ending of a line of text.

@!buffer:array[0..buf_size] of ascii_code; {lines of characters being read}
@!first:0..buf_size; {the first unused position in |buffer|}
@!last:0..buf_size; {end of the line just input to |buffer|}
@!max_buf_stack:0..buf_size; {largest index used in |buffer|}

@ The |input_ln| function brings the next line of input from the specified
field into available positions of the buffer array and returns the value |true|,
unless the file has already been entirely read, in which case it returns
|false| (and does nothing else).  The |ascii_code| numbers that represent
the next line of the file are input into |buffer[first]|, |buffer[first+1]|,
$\ldotss$, |buffer[last-1]|; and the global variable |last| is set equal
to |first| plus the length of the line.

An overflow error is given, however, if the normal actions of |input_ln|
would make |last≥buf_size|; this is done so that other parts of \TeX\
can safely look at the contents of |buffer[last+1]| without overstepping
the bounds of the |buffer| array. Upon entry to |input_ln|, the condition
|first<buf_size| will always hold, so there is always room for an ``empty''

The variable |max_buf_stack|, which is used to keep track of how large
the |buf_size| parameter must be to accommodate the present job, is
also kept up to date by |input_ln|.

This procedure does a |get| before looking at the first character of the
line, and it does not do a |get| when it reaches the end of the line.
Therefore it can be used to acquire input from the user's terminal as well
as from ordinary text files. Other parts of \TeX\ take care of inputting
the first line of a file, so that the first character is not lost.

Since the inner loop of |input_ln| is part of \TeX's ``inner loop''---each
character of input comes in at this place---it is wise to reduce system
overhead by making use of special routines that read in an entire array
of characters at once, if such routines are available. The following
code uses standard \PASCAL\ to illustrate what needs to be done, but
finer tuning is often possible at well-developed \PASCAL\ sites.
@^inner loop@>

@p function input_ln(var f:alpha_file):boolean; {inputs the next line
	or returns |false|}
begin get(f); {input the first character of the line into |f^|}
if eof(f) then input_ln←false
else	begin last←first; {cf.\ Matthew 19:30}
	while not eoln(f) do
		begin if last≥max_buf_stack then
			begin max_buf_stack←last+1;
			if max_buf_stack=buf_size then
				overflow("buffer size",buf_size);
		buffer[last]←xord[f^]; get(f); incr(last);

@ The user's terminal acts essentially like other files of text, except
that it is used both for input and for output. When the terminal is
considered an input file, the file variable is called |term_in|, and when it
is considered an output file the file variable is |term_out|.
@^system dependencies@>

@!term_in:alpha_file; {the terminal as an input file}
@!term_out:alpha_file; {the terminal as an output file}

@ Here is how to open the terminal files in \ph:
@^system dependencies@>

@d t_open_in==reset(term_in,'TTY:','/O') {open the terminal for text input}
@d t_open_out==rewrite(term_out,'TTY:','/O') {open the terminal for text output}

@ Sometimes it is necessary to synchronize the input/output mixture that
happens on the user's terminal, and two procedures are used for this
purpose. The first of these, |update_terminal|, is called when we want
to make sure that everything we have output to the terminal so far has
actually left the computer's internal buffers and been sent.
The other, |clear_terminal|, is called when we wish to cancel any
input that the user may have typed ahead (since we are about to
issue an unexpected error message). The following macros show how these
two operations can be specified in \ph:
@^system dependencies@>

@d update_terminal == break(term_out) {empty the terminal output buffer}
@d clear_terminal == break_in(term_in,true) {clear the terminal input buffer}

@ We need a special routine to read the first line of \TeX\ input from
the user's terminal. This line is special because it is read before we
have opened the error transcript file; there is sort of a ``chicken and
egg'' problem here. If the user types `\.{\\input paper}' on the first
line, or if some macro invoked by that line does such an \.{\\input},
the transcript file will be named `\.{paper.err}'; but if no \.{\\input}
commands are performed during the first line of terminal input, the transcript
file will acquire its default name `\.{texput.err}'. (The transcript file
will not contain error messages generated by the first line before the
first \.{\\input} command.)

The first line is even more special if we are lucky enough to have an operating
system that treats \TeX\ differently from a run-of-the-mill \PASCAL\ object
program. It's nice to let the user start running a \TeX\ job by typing
a command line like `\.{tex paper}'; in such a case, \TeX\ will operate
as if the first line of input were `\.{paper}', i.e., the first line will
consist of the remainder of the command line, after the part that invoked

@ Different systems have different ways to get started, but regardless of
what conventions are adopted the routine that initializes the terminal
should satisfy the following specifications:

\yskip\textindent{1)}It should open file |term_in| for input from the
	terminal. (The file |term_out| will already be open for output to the

\textindent{2)}If the user has given a command line, this line should be
	considered the first line of terminal input. Otherwise the
	user should be prompted with `\.{**}', and the first line of input
	should be whatever is typed in response.

\textindent{3)}The first line of input, which might or might not be a
	command line, should appear in locations 0 to |last-1| of the
	|buffer| array.

\textindent{4)}The global variable |loc| should be set so that the
	character that \TeX\ reads next is in |buffer[loc]|. This
	character should not be blank, and we should have |loc<last|.

\yskip\noindent(It may be necessary to prompt the user several times
before a non-blank line comes in. The prompt is `\.\#' instead of the
later `\.*' because the meaning is slightly different: `\\input' need
not be typed immediately after `\.\#'.)

@d loc==cur_input.loc_field {location of first unread character in |buffer|}

@ The following program does the required initialization
with\-out retrieving a possible command line.
It should be clear how to modify this routine to deal with command lines,
if the system permits them.
@^system dependencies@>

@p function init_terminal:boolean; {gets the terminal input started}
label exit;
begin t_open_in;
loop@+begin write(term_out,'**'); update_terminal;
	if not input_ln(term_in) then {this shouldn't happen}
		begin write_ln(term_out);
		write(term_out,'! End of file on the terminal... why?');
@.End of file on the terminal...@>
		init_terminal←false; return;
	while (loc<last)∧(buffer[loc]=" ") do incr(loc);
	if loc<last then
		begin init_terminal←true;
		return; {return unless the line was all blank}
	write_ln(term_out,'Please type the name of your input file.');
@* \[4] String handling.
Control sequence names and diagnostic messages are variable-length strings
of seven-bit characters. Since \PASCAL\ does not have a well-developed string
mechanism, \TeX\ does all of its string processing by homegrown methods.

Elaborate facilities for dynamic strings are not needed, so all of the
necessary operations can be handled with a simple data structure.
The array |str_pool| contains all of the (seven-bit) ascii codes in all
of the strings, and the array |str_start| contains indices of the starting
points of each string. Strings are referred to by integer numbers, so that
string number |s| comprises the characters |str_pool[j]| for 
|str_start[s]≤j<str_start[s+1]|. Additional integer variables
|pool_ptr| and |str_ptr| indicate the number of entries used so far
in |str_pool| and |str_start|, respectively; locations
|str_pool[pool_ptr]| and |str_start[str_ptr]| are
ready for the next string to be allocated.

String numbers 0 to 127 are reserved for strings that correspond to single
ascii characters. This is in accordance with the conventions of \.{WEB},
which converts single-character strings into the ascii code number of the
single character involved, while it converts other strings into integers
and builds a string pool file. Thus, when the string constant \.{"."} appears
in the program below, \.{WEB} converts it into the integer 46, which is the
ascii code for a period, while \.{WEB} will convert a string like \.{"hello"}
into some integer greater than@@127. String number 46 will presumably be the
single character `\..'; but some ascii codes have no standard visible
representation, and \TeX\ sometimes needs to be able to print an arbitrary
ascii character, so the first 128 strings are used to specify exactly what
should be printed for each of the 128 possibilities.

Elements of the |str_pool| array must be ascii codes that can actually be
printed; i.e., they must have an |xchr| equivalent in the local
character set.

@!pool_pointer = 0..pool_size; {for variables that point into |str_pool|}
@!str_number = 0..max_strings; {for variables that point into |str_start|}

@ @<Globals...@>=
@!str_pool:packed array[pool_pointer] of ascii_code; {the characters}
@!str_start : array[str_number] of pool_pointer; {the starting pointers}
@!pool_ptr : pool_pointer; {first unused position in |str_pool|}
@!str_ptr : str_number; {start of the current string being created}

@ Several of the elementary string operations are performed using \.{WEB}
macros instead of using \PASCAL\ procedures, because many of the
operations are done quite frequently and we want to avoid the
overhead of procedure calls. For example, here is
a simple macro that computes the length of a string.

@d length(#)==(str_start[#+1]-str_start[#]) {the number of characters
	in string number \#}

@ The length of the current string is called |cur_length|:

@d cur_length == (pool_ptr - str_start[str_ptr])

@ Strings are created by appending character codes to |str_pool|.
The macro called |append_char|, defined here, does not check to see if the
value of |pool_ptr| has gotten too high; this test is supposed to be
made before |append_char| is used. There is also a |flush_char|
macro, which erases the last character appended.

To test if there is room to append |l| more characters to |str_pool|,
we shall write |str_room(l)|, which aborts \TeX\ and gives an
apologetic error message if there isn't enough room.

@d append_char(#) == {put |ascii_code| \#\ at the end of |str_pool|}
begin str_pool[pool_ptr]←#; incr(pool_ptr);
@d flush_char == decr(pool_ptr) {forget the last character in the pool}
@d str_room(#) == begin if pool_ptr+# > pool_size then
	overflow("pool size",pool_size);

@ Once a sequence of characters has been appended to |str_pool|, it
officially becomes a string when the function |make_string| is called.
This function returns the identification number of the new string as its

@p function make_string : str_number; {current string enters the pool}
begin if str_ptr=max_strings then
	overflow("number of strings",max_strings);
incr(str_ptr); str_start[str_ptr]←pool_ptr;

@ To destroy the most recently made string, we say |flush_string|.

@d flush_string==begin decr(str_ptr); pool_ptr←str_start[str_ptr];

@ The following subroutine compares string |s| with another string of the
same length that appears in |buffer| starting at position |k|;
the result is |true| if and only if the strings are equal.

@p function str_eq_buf(@!s:str_number;@!k:integer):boolean;
	{test equality of strings}
label not_found; {loop exit}
var j: pool_pointer; {running index}
@!result: boolean; {result of comparison}
begin j←str_start[s]; result←false;
while j<str_start[s+1] do
	begin if str_pool[j]≠buffer[k] then goto not_found;
	incr(j); incr(k);
not_found: str_eq_buf←result;

@ Here is a similar routine, but it compares two strings in the string pool,
and it does not assume that they have the same length.

@p function str_eq_str(@!s,@!t:str_number):boolean;
	{test equality of strings}
label not_found; {loop exit}
var j,@!k: pool_pointer; {running indices}
@!result: boolean; {result of comparison}
begin result←false;
if length(s)≠length(t) then goto not_found;
j←str_start[s]; k←str_start[t];
while j<str_start[s+1] do
	begin if str_pool[j]≠str_pool[k] then goto not_found;
	incr(j); incr(k);
not_found: str_eq_str←result;

@ The initial values of |str_pool|, |str_start|, |pool_ptr|,
and |str_ptr| are computed by the \.{INITEX} program, based in part
on the information that \.{WEB} has output while processing \TeX.
@^string pool@>

@p init function init_strings:boolean; {initializes the string pool, but
	returns |false| if something goes wrong}
label done,exit;
var k,@!l:0..127; {small indices or counters}
@!m,@!n:text_char; {characters input from |pool_file|}
@!g:str_number; {garbage}
@!a:integer; {accumulator for check sum}
@!c:boolean; {check sum has checked}
begin pool_ptr←0; str_ptr←0;
@<Make the first 128 strings@>;
@<Read the other strings from the \.{TEX.POOL} file and return |true|,
	or give an error message and return |false|@>;

@ @<Make the first 128...@>=
for k←0 to 127 do
	begin if (k<" ")∧(@<Character |k| cannot be printed@>) then
		begin append_char("^"); append_char("^");
	else if k=127 then
		begin append_char("^"); append_char("^");
	else append_char(k);

@ The first 128 strings will contain 95 standard ascii characters, and the
other 33 characters will be printed in three-symbol form like `\.{\^\^A}'
unless a system-dependent change is made here. Installations that have
an extended character set, where for example |xchr[@'32]=@t\.{\'\NE\'}@>|,
would like string @'32 to be the single character @'32 instead of the
three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand,
even people with an extended character set will want to represent string
@'15 by \.{\^\^M}, since @'15 is |carriage_return|; the idea is to
produce visible strings instead of tabs or line-feeds or carriage-returns
or bell-rings or characters that are treated anomalously in text files.

The boolean expression defined here should be |true| unless \TeX\ internal code
number@@$k$ corresponds to a non-troublesome visible symbol in the local
char\-ac\-ter set, given that |k<@'40|.
At MIT, for example, the appropriate formula would be
`|k in [0,@'10..@'12,@'14,@'15,@'33]|'.
@^character set dependencies@>
@^system dependencies@>

@<Character |k| cannot be printed@>=

@ When the \.{WEB} system program called \.{TANGLE} processes the \.{TEX.WEB}
de\-scription that you are now reading, it outputs the \PASCAL\ program
\.{TEX.PAS} and also a string pool file called \.{TEX.POOL}. The \.{INITEX}
program reads the latter file, where each string appears as a two-digit decimal
length followed by the string itself, and the information is recorded in
\TeX's string memory.

@!pool_file:alpha_file; {the string-pool file output by \.{TANGLE}}

@ @d bad_pool(#)==begin write_ln(term_out,#); init_strings←false; return;
@<Read the other strings...@>=
name_of_file←pool_name; {we needn't set |name_length|}
if a_open_in(pool_file) then
	begin c←false;
	repeat @<Read one string, but return |false| if the
		string memory space is getting too tight for comfort@>;
	until c;
else	bad_pool('! I can''t read TEX.POOL.')
@.I can't read TEX.POOL@>

@ @<Read one string...@>=
begin if eof(pool_file) then bad_pool('! TEX.POOL has no check sum.');
@.TEX.POOL has no check sum@>
read(pool_file,m,n); {read two digits of string length}
if m='*' then @<Check the pool check sum@>
else	begin if (xord[m]<"0")∨(xord[m]>"9")∨@|
			(xord[n]<"0")∨(xord[n]>"9") then
		bad_pool('! TEX.POOL line doesn''t begin with two digits.');
@.TEX.POOL line doesn't...@>
	l←xord[m]*10+xord[n]-"0"*11; {compute the length}
	if pool_ptr+l+string_vacancies>pool_size then
		bad_pool('! You have to increase POOLSIZE.');
@.You have to increase POOLSIZE@>
	for k←1 to l do
		begin if eoln(pool_file) then bad_pool('! TEX.POOL line too short.');
@.TEX.POOL line too short@>
		read(pool_file,m); append_char(xord[m]);
	read_ln(pool_file); g←make_string;

@ The \.{WEB} operation \.{\at\$} denotes the value that should be at the
end of this \.{TEX.POOL} file; any other value means that the wrong pool
file has been loaded.
@^check sum@>

@<Check the pool check sum@>=
begin a←0; k←1;
loop@+	begin if (xord[n]<"0")∨(xord[n]>"9") then
	bad_pool('! TEX.POOL check sum doesn''t have nine digits.');
@.TEX.POOL check sum...@>
	if k=9 then goto done;
	incr(k); read(pool_file,n);
done: if a≠@$ then bad_pool('! TEX.POOL doesn''t match; TANGLE me again.');
@.TEX.POOL doesn't match@>
@* \[5] On-line and off-line printing.
Messages that are sent to a user's terminal and to the error-transcript file
are produced by several `|print|' procedures. These procedures will
direct their output to a variety of places, based on the setting of
the global variable |selector|, which has the following possible

\hang |term_and_err|, the normal setting, prints on the terminal and on the
	error-transcript file.

\hang |err_only|, prints only on the error-transcript file.

\hang |term_only|, prints only on the terminal.

\hang |no_print|, doesn't print at all. This is used only in rare cases
	before the error-transcript file is open.

\hang |pseudo|, puts output into a cyclic buffer that is used 
	by the |show_context| routine; see that routine for the
	reasoning behind this curious mode.

\hang |new_string|, appends the output to the current string in the
	string pool.

\hang 0 to 15, prints on one of the sixteen files for \.{\\send} output.

\noindent The symbolic names `|term_and_err|', etc., have been assigned
numeric codes that satisfy the convenient relations |no_print+1=term_only|,
|no_print+2=err_only|, |term_only+2=err_only+1=term_and_err|.

Two additional global variables, |tally| and |offset|, record
the number of characters that have been printed since they were most
recently cleared to zero. We use |tally| to record the length of
(possibly very long) stretches of printing; |offset|, on the other
hand, keeps track of how many characters have appeared so far on the current
line of text output.

@d no_print=16 {|selector| setting that makes data disappear}
@d term_only=17 {printing is destined for the terminal only}
@d err_only=18 {printing is destined for the transcript file only}
@d term_and_err=19 {normal |selector| setting}
@d pseudo=20 {special |selector| setting for |show_context|}
@d new_string=21 {printing is deflected to the string pool}
@d max_selector=21 {highest selector setting}

@!err_file : alpha_file; {transcript of \TeX\ session}
@!selector : 0..max_selector; {where to print a message}
@!dig : array[0..22] of 0..9; {digits in a number being output}
@!tally : integer; {the number of characters recently printed}
@!offset : 0..max_print_line; {the number of characters on the current line}
@!trick_buf:array[0..error_line] of ascii_code; {circular buffer for
@!trick_count: integer; {threshold for pseudoprinting, explained later}
@!first_count: integer; {another variable for pseudoprinting}

@ @<Initialize the output routines@>=
selector←term_only; tally←0; offset←0;

@ To end a line of text output, we call |print_ln|.

@<Basic print...@>=
procedure print_ln; {prints an end-of-line}
label exit;
begin case selector of
term_and_err: begin write_ln(term_out); write_ln(err_file);
err_only: write_ln(err_file);
term_only: write_ln(term_out);
no_print: do_nothing;
pseudo,new_string: return;
othercases write_ln(send_file[selector])
offset←0; {|tally| is not affected}

@ The |print_char| procedure sends one character to the desired destination,
using the |xchr| array to map it into an external character compatible with
|input_ln|. All printing comes through |print_ln| or |print_char|.

@d preserve_offset=90 {go here when |offset| is not to be changed}

@<Basic printing...@>=
procedure print_char(@!c:ascii_code); {prints a single character}
label preserve_offset;
begin case selector of
term_and_err: begin write(term_out,xchr[c]); write(err_file,xchr[c]); end;
err_only: write(err_file,xchr[c]);
term_only: write(term_out,xchr[c]);
no_print: do_nothing;
pseudo: begin if tally<trick_count then
		trick_buf[tally mod error_line]←c;
	goto preserve_offset;
new_string: begin if pool_ptr<pool_size then append_char(c);
	goto preserve_offset; {drop characters if the string space is full}
othercases begin write(send_file[selector],xchr[c]); goto preserve_offset;
if offset=max_print_line then print_ln;

@ An entire string is output by calling |print|. Note that if we are outputting
the single standard ascii character \.c, we could call |print("c")|, since
|"c"=99| is the number of a single-character string, as explained above. But
|print_char("c")| is quicker, so \TeX\ goes directly to the |print_char|
routine when it knows that this is safe.

@<Basic print...@>=
procedure print(@!s:integer); {prints string |s|}
var j:pool_pointer; {current character code position}
begin if (s<0)∨(s≥str_ptr) then s←0; {this can't happen}
while j<str_start[s+1] do
	begin print_char(str_pool[j]); incr(j);

@ Here is the very first thing that \TeX\ prints: a headline that identifies
the version number and format package. The |offset| variable is temporarily
incorrect, but the discrepancy is not serious since we assume that the banner
and format identifier together will occupy at most |max_print_line|
character positions.

@<Initialize the output...@>=
if format_ident=0 then write_ln(term_out,' (no format preloaded)')
else	begin print(format_ident); print_ln;

@ The procedure |print_nl| is like |print|, but it makes sure that the
string appears at the beginning of a new line.

@<Basic print...@>=
procedure print_nl(@!s:str_number); {prints string |s| at beginning of line}
begin if offset>0 then print_ln;

@ The procedure |print_esc| prints a string that is preceded by
a backslash.

@<Basic print...@>=
procedure print_esc(@!s:str_number); {prints escape character, then |s|}
begin print_char("\"); print(s);

@ An array of digits is printed by |print_digs|.

@<Basic print...@>=
procedure print_digs(@!k:eight_bits); {prints |dig[k-1]|$\ldotsm$|dig[0]|}
begin while k>0 do
	begin decr(k); print_char("0"+dig[k]);

@ The following procedure, which prints out the decimal representation of a
given integer |n|, has been written carefully so that it works properly
if |n=0| or if |(-n)| would cause overflow. It does not apply |mod| or |div|
to negative arguments, since such operations are not implemented consistently
by all \PASCAL\ compilers.

@<Basic print...@>=
procedure print_int(@!n:integer); {prints an integer in decimal form}
var k:0..20; {index to current digit; we assume that $|n|<10^{20}$}
@!m:nonnegative_integer; {used to negate |n| in possibly dangerous cases}
begin k←0;
if n<0 then
	begin print_char("-");
	if n>-100000000 then n←-n
	else	begin m←-1-n; n←m div 10; m←(m mod 10)+1; k←1;
		if m<10 then dig[0]←m
		else	begin dig[0]←0; n←n+1;
repeat dig[k]←n mod 10; n←n div 10; incr(k);
until n=0;

@ Octal printing of nonnegative integers is accomplished by |print_octal|.

@p procedure print_octal(@!n:integer); {prints a positive integer in octal form}
var k:0..22; {index to current digit; we assume that $0\L n<8^{22}$}
begin k←0; print_char("'");
repeat dig[k]←n mod 8; n←n div 8; incr(k);
until n=0;

@ In certain situations, \TeX\ prints either a standard visible ascii
character or its octal ascii code.

@p procedure print_ascii(@!c:integer); {prints a character or its code}
begin if (c>" ") and (c≤"~") then print_char(c)
else	begin if c<0 then print_char("-");
	print_octal(abs(c)); print_char(" ");

@ Roman numerals are produced by the |print_roman_int| routine, whose
argument should be a positive integer. Readers who like puzzles might enjoy
trying to figure out how this tricky code works; therefore no explanation
will be given.

@p procedure print_roman_int(@!n:nonnegative_integer);
label exit;
var j,@!k: pool_pointer; {mysterious indices into |str_pool|}
@!u,@!v: nonnegative_integer; {mysterious numbers}
begin j←str_start["m2d5c2l5x2v5i"]; v←1000;
loop@+	begin while n≥v do
		begin print_char(str_pool[j]); n←n-v;
	if n=0 then return;
	k←j+2; u←v div (str_pool[k-1]-"0");
	if str_pool[k-1]="2" then
		begin k←k+2; u←u div (str_pool[k-1]-"0");
	if n+u≥v then
		begin print_char(str_pool[k]); n←n+u;
	else	begin j←j+2; v←v div (str_pool[j-1]-"0");

@ The |print| subroutine will not print a string that is still being
created. The following procedure will.

@p procedure print_current_string; {prints a yet-unmade string}
var j:pool_pointer; {points to current character code}
begin j←str_start[str_ptr];
while j<pool_ptr do
	begin print_char(str_pool[j]); incr(j);

@ Here is a procedure that asks the user to type a line of input,
assuming that the |selector| setting is either |term_only| or |term_and_err|.
The input is placed into locations |first| through |last-1| of the
|buffer| array, and echoed on the transcript file if appropriate.

This procedure is never called when |interaction<scroll_mode|.

@d prompt_input(#)==begin print(#); term_input;
		end {prints a string and gets a line of input}

@p procedure term_input; {gets a line from the terminal}
var k:0..buf_size; {index into |buffer|}
begin update_terminal; {Now the user sees the prompt for sure}
if not input_ln(term_in) then fatal_error("! End of file on the terminal");
@.End of file on the terminal...@>
decr(selector); {prepare to echo the input}
if last≠first then for k←first to last-1 do print(buffer[k]);
print_ln; incr(selector); {restore previous status, with |offset=0|}
@* \[6] Reporting errors.
When something anomalous is detected, \TeX\ typically does something like this:
|print_nl("! Something anomalous has been detected");|\cr
|help3("This is the first line of my offer to help.")|\cr
|("This is the second line. I'm trying to")|\cr
|("explain the best way for you to proceed.");|\cr
A two-line help message would be given using |help2|, etc.; these informal
helps should use simple vocabulary that complements the words used in the
official error message that was printed. (Outside of the U.S.A., the help
messages should preferably be translated into the local vernacular. Each
line of help is at most 60 characters long, in the present implementation,
so that |max_print_line| will not be exceeded.)

The |error| procedure supplies a `\..' after the official message, then it
shows the location of the error; and if |interaction=error_stop_mode|,
it also enters into a dialog with the user, during which time the help
message may be printed.
@^system dependencies@>

@ The global variable |interaction| has four settings, representing increasing
amounts of user interaction:

@d batch_mode=0 {omits all stops and omits terminal output}
@d nonstop_mode=1 {omits all stops}
@d scroll_mode=2 {omits error stops}
@d error_stop_mode=3 {stops at every opportunity to interact}

@!interaction:batch_mode..error_stop_mode; {current level of interaction}

@ @<Set init...@>=interaction←error_stop_mode;

@ \TeX\ is careful not to call |error| when the print |selector| setting
might be unusual. The only possible values of |selector| at the time of
error messages are

\yskip\hang|no_print| (when |interaction=batch_mode|
	and |err_file| not yet open);

\hang|term_only| (when |interaction>batch_mode| and |err_file| not yet open);

\hang|err_only| (when |interaction=batch_mode| and |err_file| is open);

\hang|term_and_err| (when |interaction>batch_mode| and |err_file| is open).

@<Initialize the print |selector| based on |interaction|@>=
if interaction=batch_mode then selector←no_print@+else selector←term_only

@ A global variable |deletions_allowed| is set |false| if the |get_next|
routine is active when |error| is called; this ensures that |get_next|
and related routines like |get_token| will never be called recursively.

The global variable |spotless| is set to false when the first error
has been detected.

Another global variable, |error_count|, is increased by one when an
|error| occurs without an interactive dialog, and it is reset to zero at
the end of every paragraph.  If |error_count| reaches 100, \TeX\ decides
that there is no point in continuing further.

@!deletions_allowed:boolean; {is it safe for |error| to call |get_token|?}
@!spotless:boolean; {has the source input been clean so far?}
@!error_count:-1..100; {the number of scrolled errors since the
	last paragraph ended}

@ @<Set init...@>=
deletions_allowed←true; spotless←true; error_count←0;

@ Since errors can be detected almost anywhere in \TeX, we want to declare the
error procedures near the beginning of the program. But the error procedures
in turn use some other procedures, which need to be declared |forward|
before we get to |error| itself.

It is possible for |error| to be called recursively if some error arises
when |get_token| is being used to delete a token, or if some fatal error
occurs while \TeX\ is trying to fix a non-fatal one. But such recursion
is never more than one level deep.

@<Error handling...@>=
procedure@?get_token; forward;@t\2@>@/
procedure@?term_input; forward;@t\2@>@/
procedure@?show_context; forward;@t\2@>@/
procedure@?begin_file_reading; forward;@t\2@>@/
procedure@?close_files_and_terminate; forward;@t\2@>@/
procedure@?clear_for_error_prompt; forward;@t\2@>@/
@t\4@>@;@+debug@+procedure@?debug_help; forward;@;@+gubed

@ Individual lines of help are recorded in the array |help_line|, which
contains entries in positions |0..(help_ptr-1)|. They should be printed
in reverse order, i.e., with |help_line[0]| last.

@d hlp1(#)==help_line[0]←#;@+end
@d hlp2(#)==help_line[1]←#; hlp1
@d hlp3(#)==help_line[2]←#; hlp2
@d hlp4(#)==help_line[3]←#; hlp3
@d hlp5(#)==help_line[4]←#; hlp4
@d hlp6(#)==help_line[5]←#; hlp5
@d help0==help_ptr←0 {sometimes there might be no help}
@d help1==@+begin help_ptr←1; hlp1 {use this with one help line}
@d help2==@+begin help_ptr←2; hlp2 {use this with two help lines}
@d help3==@+begin help_ptr←3; hlp3 {use this with three help lines}
@d help4==@+begin help_ptr←4; hlp4 {use this with four help lines}
@d help5==@+begin help_ptr←5; hlp5 {use this with five help lines}
@d help6==@+begin help_ptr←6; hlp6 {use this with six help lines}

@!help_line:array[0..5] of str_number; {helps for the next |error|}
@!help_ptr:0..6; {the number of help lines present}

@ The |quit| procedure just cuts across all active procedure levels and jumps
out to |end_of_TEX|. This is the only nonlocal |@!goto| statement in the
whole program. It is used when there is no recovery from a particular error.

Some \PASCAL\ compilers do not implement non-local |goto| statements.
@^system dependencies@>
In such cases the body of |quit| should simply be `|close_files_and_terminate|;'
followed by a call on some system procedure that quietly terminates the program.

@<Error hand...@>=
procedure quit;
begin goto end_of_TEX;

@ Here now is the general |error| routine.

@<Error hand...@>=
procedure error; {completes the job of error reporting}
label continue, exit;
var c:ascii_code; {what the user types}
@!s1,@!s2,@!s3:integer; {used to save global variables when deleting tokens}
begin print_char("."); show_context;
if interaction=error_stop_mode then @<Get user's advice and |return|@>;
if error_count=100 then
	begin print_nl("(That makes 100 errors; please try again.)");
@.That makes 100 errors...@>
@<Put help message on the transcript file@>;

@ @<Get user's advice...@>=
loop@+begin continue: clear_for_error_prompt; prompt_input("? ");
	if last=first then return;
	if c≥"a" then c←c+"A"-"a"; {convert to upper case}
	@<Interpret code |c| and |return| if done@>;

@ It is desirable to provide an `\.E' option here that gives the user
an easy way to return from \TeX\ to the system editor, with the offending
line ready to be edited. But such an extension requires some system
wizardry, so it is not standard in \TeX\ and not included here except
as a recommendation. (The name of the file to edit, if an `\.E' option is
added, is |input_stack[base_ptr].name_field|, provided that |base_ptr>0|;
the value of |base_ptr| has been set by |show_context|.
No input files are open when |base_ptr=0|.)
@^system dependencies@>

There is a secret `\.D' option available when the debugging routines have
not been commented out.

@<Interpret code |c| and |return| if done@>=
case c of
"1","2","3","4","5","6","7","8","9": if deletions_allowed then
	@<Delete |c-"0"| tokens, |goto continue|@>;
@t\4\4@>@;@+debug "D": begin debug_help; goto continue;@+end;@+gubed@/
"H": @<Print the help information, |goto continue|@>;
"I":@<Introduce new material from the terminal and |return|@>;
"Q","R","S":@<Change the interaction level and |return|@>;
"X":begin prompt_input("Type X again to exit:");
	if (last>first)∧((buffer[first]="x")∨(buffer[first]="X")) then
		begin interaction←scroll_mode; quit;
othercases do_nothing
@<Print the menu of available options@>

@ The `\.E' option in the following menu
should be taken out of braces if it is implemented.
@^system dependencies@>

@<Print the menu...@>=
print("Type <return> to proceed, S to scroll future error messages,");@/
print_nl("R to run without stopping, Q to run quietly,");@/
print_nl("I to insert something, ");@/
@{@,@,@+if base_ptr>0 then print("E to edit your file,");@;@+@}@/
if deletions_allowed then
	print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,");
print_nl("H for help, X to quit.")

@ Here the author of \TeX\ apologizes for making use of the numerical
relation between |"Q"|, |"R"|, |"S"|, and the desired interaction settings
|batch_mode|, |nonstop_mode|, |scroll_mode|.
@^Knuth, Donald Ervin@>

@<Change the interaction...@>=
begin error_count←0; interaction←batch_mode+c-"Q";
if c="Q" then decr(selector);
print_ln; return;

@ When the following code is executed, the material inserted by the user
may appear in |buffer[(first+1)..(last-1)]|; otherwise another prompt
will be given. In order to fully understand this part of the program,
you need to be familiar with \TeX's input stacks.

@<Introduce new material...@>=
begin begin_file_reading; {enter a new syntactic level for terminal input}
{now |state←mid_line|, so an initial blank space will count as a blank}
if last>first+1 then
	begin loc←first+1; buffer[first]←" ";
else	begin prompt_input("**"); loc←first;
cur_input.limit_field←last-1; {no |carriage_return| ends this line}

@ We allow deletion of up to 99 tokens at a time.

@<Delete |c-"0"| tokens...@>=
begin s1←cur_tok; s2←cur_cmd; s3←cur_chr; OK_to_interrupt←false;
if (last>first+1) and (buffer[first+1]≥"0")∧(buffer[first+1]≤"9") then
else c←c-"0";
while c>0 do
	begin get_token; {one-level recursive call of |error| is possible}
cur_tok←s1; cur_cmd←s2; cur_chr←s3; OK_to_interrupt←true;
help2("I have just deleted some text, as you asked.")@/
("You can now delete more, or insert, or whatever.");
show_context; goto continue;

@ @<Print the help info...@>=
begin if help_ptr=0 then
	help2("Sorry, I don't know how to help in this situation.")@/
		("Maybe you should try asking a human?");
repeat decr(help_ptr); print(help_line[help_ptr]); print_ln;
until help_ptr=0;
help4("Sorry, I already gave what help I could...")@/
	("Maybe you should try asking a human?")@/
	("An error might have occurred before I noticed any problems.")@/
	("``If all else fails, read the instructions.''");
goto continue;

@ @<Put help message on the transcript file@>=
if interaction>batch_mode then decr(selector); {avoid terminal output}
while help_ptr>0 do
	begin decr(help_ptr); print_nl(help_line[help_ptr]);
if interaction>batch_mode then incr(selector); {enable terminal output}

@ A dozen or so error messages end with a parenthesized integer, so we
save a teeny bit of program space by declaring the following procedure:

@p procedure int_error(@!n:integer);
begin print(" ("); print_int(n); print_char(")"); error;

@ The following procedure prints \TeX's last words before dying.

@d succumb==begin if interaction=error_stop_mode then
		interaction←scroll_mode; {no more interaction}
	error; quit; {irrecoverable error}

@<Error hand...@>=
procedure fatal_error(@!s:str_number); {prints |s|, and that's it}
begin print_nl("! Emergency stop"); help1(s); succumb;
@.Emergency stop@>

@ Here is the most dreaded error message.

@<Error hand...@>=
procedure overflow(@!s:str_number;@!n:integer); {stop due to finiteness}
begin print_nl("! TeX capacity exceeded, sorry [");
@.TeX capacity exceeded...@>
print(s); print_char("="); print_int(n); print_char("]"); 
help2("If you really absolutely need more capacity,")@/
	("you can ask a wizard to enlarge me.");

@ The program might sometime run completely amok and print out the
following message, which is really intended for the \TeX\ maintenance
person instead of the user.
@^dry rot@>

@<Error hand...@>=
procedure confusion(@!s:str_number);
	{consistency check violated; |s| tells where}
begin if spotless then
	begin print_nl("! This can't happen ("); print(s); print_char(")");
@.This can't happen@>
	help1("I'm broken. Please show this to someone who can fix can fix");
else	begin print_nl("! I can't go on meeting you like this");
@.I can't go on...@>
	help2("One of your earlier faux pas has wounded me deeply,")@/
		("so I'm barely conscious. Please fix it and try again.");

@ Users occasionally want to interrupt \TeX\ while it's running.
If the \PASCAL\ runtime system allows this, one can implement
a routine that sets the global variable |interrupt| to some nonzero value
when such an interrupt is signalled. Otherwise there is probably at least
a way to make |interrupt| nonzero using the \PASCAL\ debugger.
@^system dependencies@>

@d check_interrupt==begin if interrupt≠0 then pause_for_instructions;

@!interrupt:integer; {should \TeX\ pause for instruction?}
@!OK_to_interrupt:boolean; {should interrupts be observed?}

@ @<Set init...@>=
interrupt←0; OK_to_interrupt←true;

@ When an interrupt has been detected, the program comes goes into its
highest interaction level and lets the user have the full flexibility of
the |error| routine.  \TeX\ checks for interrupts only at times when it is
safe to do this.

@p procedure pause_for_instructions;
begin if OK_to_interrupt then
	begin interaction←error_stop_mode;
	if (selector=err_only)∨(selector=no_print) then
	print_nl("! Interruption");
	help3("You rang?")@/
	("Try to insert some instructions for me (e.g.,`I\showlists'),")@/
	("unless you just want to quit by typing `X'.");
	deletions_allowed←false; error; deletions_allowed←true;
@* \[7] Arithmetic with scaled dimensions.
The principal computations performed by \TeX\ are done entirely in terms of
integers less than $2^{31}$ in magnitude; and divisions are done only when both
dividend and divisor are nonnegative. Thus, the arithmetic specified in this
program can be carried out in exactly the same way on a wide variety of
computers, including some small ones. Why? Because the arithmetic
calculations need to be spelled out precisely in order to guarantee that
\TeX\ will produce identical output on different machines. If some
quantities were rounded differently in different implementations, we would
find that line breaks and even page breaks might occur in different places.
Hence the arithmetic of \TeX\ has been designed with care, and systems that
claim to be implementations of \TeX82 should follow precisely the
calculations as they appear in the present program.

(Actually there are three places where \TeX\ uses |div| with a possibly negative
numerator. These are harmless; see |div| in the index.)

@ Here is a routine that calculates half of an integer, using an
unambiguous convention with respect to signed odd numbers.

@p function half(@!x:integer):integer;
begin if odd(x) then half←(x+1) div 2
else half←x @!div 2;

@ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples
of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit
positions from the right end of a binary computer word.

@d unity == @'200000 {$2^{16}$, represents 1.00000}
@d two == @'400000 {$2^{17}$, represents 2.00000}

@!scaled = integer; {this type is used for scaled integers}
@!nonnegative_integer=0..@'17777777777; {$0\L x<2^{31}$}
@!small_number=0..63; {this type is self-explanatory}

@ The following function is used to create a scaled integer from a decimal
fraction $(.d↓0d↓1\ldotsm d↓{k-1})$, where |0≤k≤16|. The digit $d↓i$ is
given in |dig[i]|, and the calculation produces a correctly rounded result.

@p function round_decimals(@!k:small_number) : scaled;
	{converts a decimal fraction}
var a:integer; {the accumulator}
begin a←0;
while k>0 do
	begin decr(k); a←(a+dig[k]*two) div 10;
round_decimals←(a+1) div 2;

@ Conversely, here is a procedure analogous to |print_int|. If the output
of this procedure is subsequently read by \TeX\ and converted by the
|round_decimals| routine above, it turns out that the original value will
be reproduced exactly. [{\sl Proof:\/} If round$(x)=\lfloor
x+{1\over2}\rfloor$ and if $\alpha<1$, it is not difficult to verify that
round$(\alpha\,\hbox{round}( \alpha^{-1}n))=n$ for all integers $n$. In
our case $\alpha=2^{16}/10^5$.]

@p procedure print_scaled(@!s:scaled); {prints scaled real, rounded to five
var k:0..4; {index to current digit of the fraction part}
begin if s<0 then
	begin print_char("-"); s←-s; {print the sign, if negative}
print_int(s div unity); {print the integer part}
s←((s mod unity) * 3125 + 1024) div 2048;
	{now |0≤s<100000| is the fraction part}
repeat print_char("0"+(s div 10000)); s←10*(s mod 10000);
until s=0;

@ Physical sizes that a \TeX\ user specifies for portions of documents are
represented internally as scaled points. Thus, if we define an `sp' (scaled
point) as a unit equal to $2^{-16}$ printer's points, every dimension
inside of \TeX\ is an integer number of sp. There are exactly
4,736,286.72 sp per inch.  Users are not allowed to specify dimensions
larger than $2^{30}-1$ sp, which is a distance of about 18.892 feet (5.7583
meters); two such quantities can be added without overflow on a 32-bit

The present implementation of \TeX\ does not check for overflow when
@^Overflow in arithmetic@>
dimensions are added or subtracted. This could be done by inserting a
few dozen tests of the form `\!|if x≥@'10000000000 then
@t\\{report\_overflow}@>|', but the chance of overflow is so remote that
such tests do not seem worthwhile.

\TeX\ needs to do only a few arithmetic operations on scaled quantities,
other than addition and subtraction, and the following subroutines do most of
the work. A single computation might use several subroutine calls, and it is
desirable to avoid producing multiple error messages in case of arithmetic
overflow; so the routines set the global variable |arith_error| to |true|
instead of reporting errors directly to the user. Another global variable,
|remainder|, holds the remainder after a division.

@!arith_error:boolean; {has arithmetic overflow occurred recently?}
@!remainder:scaled; {amount subtracted to get an exact division}

@ The first arithmetical subroutine we need computes $nx+y$, where |x|
and@@|y| are |scaled| and |n| is an integer.

@p function nx_plus_y(@!n:integer;@!x,@!y:scaled):scaled;
begin if n<0 then
	begin x←-x; n←-n;
if n=0 then nx_plus_y←y
else if ((x≤(@'7777777777-y) div n)∧(-x≤(@'7777777777+y) div n)) then
else	begin arith_error←true; nx_plus_y←0;

@ We also need to divide scaled dimensions by integers.

@p function x_over_n(@!x:scaled;@!n:integer):scaled;
var negate:boolean; {should |remainder| be negated?}
begin negate←false;
if n=0 then
	begin arith_error←true; x_over_n←0; remainder←x;
else	begin if n<0 then
		begin x←-x; n←-n; negate←true;
	if x≥0 then
		begin x_over_n←x div n; remainder←x mod n;
	else	begin x_over_n←-((-x) div n); remainder←-((-x) mod n);
if negate then remainder←-remainder;

@ Then comes the multiplication of a scaled number by a fraction |n/d|,
where |n| and |d| are nonnegative integers |≤@t$2^{16}$@>| and |d| is
positive. It would be too dangerous to multiply by@@|n| and then divide
by@@|d|, in separate operations, since overflow might well occur; and it
would be too inaccurate to divide by |d| and then multiply by |n|. Hence
this subroutine simulates 1.5-precision arithmetic.

@p function xn_over_d(@!x:scaled; @!n,@!d:integer):scaled;
var positive:boolean; {was |x≥0|?}
@!t,@!u,@!v:nonnegative_integer; {intermediate quantities}
begin if x≥0 then positive←true
else	begin x←-x; positive←false;
t←(x mod @'100000)*n;
u←(x div @'100000)*n+(t div @'100000);
v←(u mod d)*@'100000 + (t mod @'100000);
if u div d≥@'100000 then arith_error←true
else u←@'100000*(u div d) + (v div d);
if positive then
	begin xn_over_d←u; remainder←v mod d;
else	begin xn_over_d←-u; remainder←-(v mod d);

@ The next subroutine is used to compute the ``badness'' of glue, when a
total@@|t| is supposed to be made from amounts that sum to@@|s|.  According
to the \TeX\ manual, the badness of this situation is $100(t/s)^3$;
however, badness is simply a heuristic, so we need not squeeze out the
last drop of accuracy when computing it. All we really want is an
approximation that has similar properties.

The actual method used to compute the badness is easier to read from the
program than to describe in words. It produces an integer value that is a
reasonably close approximation to $100(t/s)^3$, and all implementations
of \TeX\ should use precisely this method. Any badness of $2^{13}$ or more is
treated as infinitely bad, and represented by 10000.

It is not difficult to prove that $$\hbox{|badness(t+1,s)≥badness(t,s)
≥badness(t,s+1)|}$$. The badness function defined here is capable of
computing at most 1095 distinct values, but that is plenty.

@d inf_bad = 10000 {infinitely bad value}

@p function badness(@!t,@!s:scaled):halfword; {compute badness, given |t≥0|}
var r:integer; {approximation to $\alpha t/s$, where $\alpha^3\approx
begin if t=0 then badness←0
else if s≤0 then badness←inf_bad
else	begin if t≤7230584 then	r←(t*297) div s {$297^3=99.94\times2^{18}$}
	else if s≥1663497 then r←t div (s div 297)
	else r←t;
	if r>1290 then badness←inf_bad {$1290^3<2^{31}<1291^3$}
	else badness←(r*r*r+@'400000) div @'1000000;
	end; {that was $r^3/2^{18}$, rounded to the nearest integer}

@ When \TeX\ ``packages'' a list into a box, it needs to calculate the
proportionality ratio by which the glue inside the box should stretch
or shrink. This calculation does not affect \TeX's decision making,
so the precise details of rounding, etc., in the glue calculation are not
of critical importance for the consistency of results on different computers.

We shall use the type |glue_ratio| for such proportionality ratios.
An unsigned glue ratio should take the same amount of memory as an
|integer| (usually 32 bits) if it is to blend smoothly with \TeX's
other data structures. Thus |glue_ratio| should be equivalent to
|short_real| in some implementations of \PASCAL. Alter\-natively,
it is possible to deal with glue ratios using nothing but fixed-point
arithmetic; see {\sl TUGboat \bf3} (1982), 10--27.
@^system dependencies@>

@!glue_ratio=real; {one-word representation of a glue expansion factor}
@* \[8] Packed data.
In order to make efficient use of storage space, \TeX\ bases its major data
structures on a |memory_word|, which contains either a (signed) integer,
possibly scaled, or an (unsigned) |glue_ratio|, or a small number of
fields that are one half or one quarter of the size used for storing

If |x| is a variable of type |memory_word|, it contains up to four
fields that can be referred to as follows:
|x|&.|int|&(an |integer|)\cr
|x|&.|sc|\qquad&(a |scaled| integer)\cr
|x|&.|gr|&(a |glue_ratio|)\cr
|x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr
|x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword
|x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt
	&\qquad\qquad\qquad(four quarterword fields)\cr}}$$
This is somewhat cumbersome to write, and not very readable either, but
macros will be used to make the notation shorter and more transparent.
The \PASCAL\ code below gives a formal definition of |memory_word| and
its subsidiary types, using packed variant records. \TeX\ makes no
assumptions about the relative positions of the fields within a word.

Since we are assuming 32-bit integers, a halfword must contain at least
16 bits, and a quarterword must contain at least 8 bits.
@^system dependencies@>
But it doesn't hurt to have more bits; for example, with enough 36-bit
words you might be able to have |mem_max| as large as 262142, which is
eight times as much memory as anybody had during the first four years of
\TeX's existence.

N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled
by a \PASCAL\ that packs all of the |memory_word| variants into
the space of a single integer. This means, for example, that |glue_ratio|
words should be |short_real| instead of |real| on some computers. Some
\PASCAL\ compilers will pack an integer whose subrange is `|0..255|' into
an eight-bit field, but others insist on allocating space for an additional
sign bit; on such systems you can get 256 values into a quarterword only
if the subrange is `|-128..127|'.

The present implementation tries to accommodate as many variations as possible,
so it makes rather general assumptions. If integers having the subrange
`|min_quarterword..max_quarterword|' can be packed into a quarterword,
and if integers having the subrange `|min_halfword..max_halfword|'
can be packed into a halfword, everything should work satisfactorily.

It is usually most efficient to have |min_quarterword=min_halfword=0|,
so one should try to achieve this unless it causes a severe problem.
The values defined here are recommended for most 32-bit computers.

@d min_quarterword=0 {smallest allowable value in a |quarterword|}
@d max_quarterword=255 {largest allowable value in a |quarterword|}
@d min_halfword==0 {smallest allowable value in a |halfword|}
@d max_halfword==65535 {largest allowable value in a |halfword|}

@ Here are the inequalities that the quarterword and halfword values
must satisfy (or rather, the inequalities that they mustn't satisfy):

@<Check the ``constant''...@>=
if (min_quarterword>0)∨(max_quarterword<127) then bad←11;
if (min_halfword>0)∨(max_halfword<32767) then bad←12;
if (min_quarterword<min_halfword)∨@|
	(max_quarterword>max_halfword) then bad←13;
if (mem_base<min_halfword)∨(mem_max≥max_halfword) then bad←14;
if (font_base<min_quarterword)∨(font_max>max_quarterword) then bad←15;
if (save_size>max_halfword)∨(max_strings>max_halfword) then bad←16;
if (buf_size>max_halfword) then bad←17;

@ The operation of adding or subtracting |min_quarterword| occurs quite
frequently in \TeX, so it is convenient to abbreviate this operation
by using the macros |qi| and |qo| for input and output to and from
quarterword format.

The inner loop of \TeX\ will run faster with respect to compilers
that don't optimize expressions like `|x+0|' and `|x-0|', if these
macros are simplified in the obvious way when |min_quarterword=0|.
@^inner loop@>

@d qi(#)==(#+min_quarterword)
	{to put an |eight_bits| item into a quarterword}
@d qo(#)==(#-min_quarterword)
	{to take an |eight_bits| item out of a quarterword}

@ The reader should study the following definitions closely:
@^system dependencies@>

@d sc==int {|scaled| data is equivalent to |integer|}

@!quarterword = min_quarterword..max_quarterword; {1/4 of a word}
@!halfword=min_halfword..max_halfword; {1/2 of a word}
@!two_choices = 1..2; {used when there are two variants in a record}
@!four_choices = 1..4; {used when there are four variants in a record}
@!two_halves = packed record@;@/
	case two_choices of
	1: (@!lh:halfword);
	2: (@!b0:quarterword; @!b1:quarterword);
@!four_quarters = packed record@;@/
@!memory_word = packed record@;@/
	case four_choices of
	1: (@!int:integer);
	2: (@!gr:glue_ratio);
	3: (@!hh:two_halves);
	4: (@!qqqq:four_quarters);
@!word_file = file of memory_word;

@ When debugging, we may want to print a |memory_word| without knowing
what type it is; so we print it in all modes.
@^dirty \PASCAL@>@^debugging@>

@p debug procedure print_memory_word(@!w:memory_word); {prints |w| in all ways}
begin print_int(; print_char(" ");@/
print_scaled(; print_char(" ");@/
print_scaled(round(unity*; print_ln;@/
print_int(w.hh.lh); print_char("="); print_int(w.hh.b0); print_char(":");
print_int(w.hh.b1); print_char(";"); print_int(w.hh.rh); print_char(" ");@/
print_int(w.qqqq.b0); print_char(":"); print_int(w.qqqq.b1); print_char(":");
print_int(w.qqqq.b2); print_char(":"); print_int(w.qqqq.b3);
@* \[9] Dynamic memory allocation.
The \TeX\ system does nearly all of its own mem\-ory allocation, so that it
can readily be transported into environments that do not have automatic
facilities for strings, garbage collection, etc., and so that it can be in
control of what error messages the user receives. The dynamic storage
requirements of \TeX\ are handled by providing a large array |mem| in
which consecutive blocks of words are used as nodes by the \TeX\ routines.

Pointer variables are indices into this array, or into another array
called |eqtb| that will be explained below. A pointer variable might
also be a special flag that lies outside the bounds of |mem|, so we
allow pointers to assume any |halfword| value. The minimum halfword
value represents a null pointer.

@d pointer==halfword {a flag or a location in |mem| or |eqtb|}
@d null==min_halfword {the null pointer}

@!temp_ptr:pointer; {a pointer variable for occasional emergency use}

@ The |mem| array is divided once and for all into two regions that are
allocated separately. Locations less than |hi_mem_base| are used for storing
variable-length records consisting of two or more words each. This region
is maintained using an algorithm similar to the one described in exercise
2.5--19 of {\sl The Art of Computer Programming}. However, no size field
appears in the allocated nodes: the program is responsible for knowing the
relevant size when a node is freed. The remaining region of |mem| is
allocated in single words using a conventional \.{AVAIL} stack.

Incidentally, it would be feasible to construct implementations of \TeX\ that
are based on 16-bit words instead of 32-bit words, for machines having
comparatively small memories. In such cases it would be desirable to have
two parallel arrays for the upper part of memory, called say \\{mem\_link}|[p]|
and \\{mem\_info}|[p]|,
since the single-word region in the present implementation
consists entirely of |memory_word| items of type |two_halves|.
@^small computers@>

@!mem : array[mem_base..mem_max] of memory_word; {the big dynamic storage area}

@ In order to study the memory requirements of particular applications, it
is possible to prepare a version of \TeX\ that keeps track of current and
maximum memory usage. When code between the delimiters |stat| $\ldots$
|tats| is not ``commented out,'' \TeX\ will run a bit slower but it will
report these statistics whenever a page is output and |tracing_stats| is

@!var_used, @!dyn_used : integer; {how much memory is in use}
@!max_var_used : integer; {how much memory was in use}

@ Let's consider the one-word memory region first, since it's the
simplest. The pointer variable |mem_end| holds the highest-numbered location
of |mem| that has ever been used. The free locations of |mem| that
occur between |hi_mem_base| and |mem_end|, inclusive, are of type
|two_halves|, and we write |info(p)| and |link(p)| for the |lh|
and |rh| fields of |mem[p]| when it is of this type. The single-word
free locations form a linked list
terminated by |null|.

@d link(#) == mem[#].hh.rh {the |link| field of a memory word}
@d info(#) == mem[#].hh.lh {the |info| field of a memory word}

@!avail : pointer; {head of the list of available one-word nodes}
@!mem_end : pointer; {the last one-word node used in |mem|}

@ If one-word memory is exhausted, it might mean that the user has forgotten
a right brace. We will define some procedures later that try to help
pinpoint the trouble.

@p @<Declare the procedure called |show_token_list|@>@/
@<Declare the procedure called |runaway|@>

@ The function |get_avail| returns a pointer to a new one-word node whose
|link| field is null. However, \TeX\ will halt if there is no more room left.

@p function get_avail : pointer; {single-word node allocation}
var p:pointer; {the new node being got}
begin p←avail; {get top location in the |avail| stack}
if p≠null then avail←link(avail) {and pop it off}
else if mem_end<mem_max then {or go into virgin territory}
	begin incr(mem_end); p←mem_end;
else	begin runaway; {if memory is exhausted, display possible runaway text}
	overflow("memory size",mem_max-mem_base); {quit; all one-word nodes are busy}
link(p)←null; {provide an oft-desired initialization of the new node}
stat incr(dyn_used);@+tats@;{maintain statistics}

@ Conversely, a one-word node is recycled by calling |free_avail|.
This routine is part of \TeX's ``inner loop,'' so we want it to be fast.
@^inner loop@>

@d free_avail(#)== {single-word node liberation}
	begin link(#)←avail; avail←#;
	stat decr(dyn_used);@+tats@/

@ The procedure |flush_list(p)| frees an entire linked list of
one-word nodes that starts at position |p|.

@p procedure flush_list(@!p:pointer); {makes list of single-word nodes
var q:pointer; {the successor of node |p|}
begin while p≠null do
	begin q←link(p); free_avail(p); p←q;

@ The available-space list that keeps track of the variable-size portion
of |mem| is a nonempty, doubly-linked circular list of empty nodes,
pointed to by the roving pointer |rover|.

Each empty node has size 2 or more; the first word contains the special
value |max_halfword| in its |link| field and the size in its |info| field;
the second word contains the two pointers for double linking.

Each nonempty node also has size 2 or more. Its first word is of type
|two_halves|\kern-1pt, and its |link| field is never equal to |max_halfword|.
Otherwise there is complete flexibility with respect to the contents
of its other fields and its other words.

@d empty_flag == max_halfword {the |link| of an empty variable-size node}
@d is_empty(#) == (link(#)=empty_flag) {tests for empty node}
@d node_size == info {the size field in empty variable-size nodes}
@d llink(#) == info(#+1) {left link in doubly-linked list of empty nodes}
@d rlink(#) == link(#+1) {right link in doubly-linked list of empty nodes}

@!rover : pointer; {points to some node in the list of empties}

@ A call to |get_node| with argument |s| returns a pointer to a new node
of size@@|s|, which must be 2@@or more. The |link| field of the first word
of this new node is set to null. An overflow stop occurs if no suitable
space exists.

If |get_node| is called with $s=2^{30}$, it simply merges adjacent free
areas and returns the value |max_halfword|.

@p function get_node(@!s:integer):pointer; {variable-size node liberation}
label found,exit;
var p:pointer; {the node currently under inspection}
@!q:pointer; {the node physically after node |p|}
@!r:integer; {the newly allocated node, or a candidate for this honor}
begin p←rover; {start at some free node in the ring}
repeat @<Try to allocate within node |p| and its physical successors,
	and |goto found| if allocation was possible@>;
p←rlink(p); {move to the next node in the ring}
until p=rover; {repeat until the whole list has been traversed}
if s=@'10000000000 then
	begin get_node←max_halfword; return;
overflow("var size",hi_mem_base-mem_base); {sorry, nothing satisfactory is left}
found: link(r)←null; {this node is now nonempty}
stat var_used←var_used+s; {maintain usage statistics}
if var_used>max_var_used then max_var_used←var_used;

@ The following operations remove an empty node from the doubly-linked
list, knowing that it is not the only item in the list.

@d remove_node(#) ==@;
	if #=rover then rover←rlink(#);

@ @<Try to allocate...@>=
q←p+node_size(p); {find the physical successor}
while is_empty(q) do {merge |p| with |q|}
	begin remove_node(q); q←q+node_size(q);
if r>p+1 then @<Allocate from top of node |p| and |goto found|@>;
if (r=p) and ((rlink(p)≠rover) or (llink(p)≠rover)) then
	@<Allocate entire node |p| and |goto found|@>;
node_size(p)←q-p {reset the size in case it grew}

@ @<Allocate from top...@>=
begin node_size(p)←r-p; {store the remaining size}
rover←p; {start searching here next time}
goto found;

@ @<Allocate entire...@>=
begin remove_node(p); {delete node |p| from the ring}
rover←rlink(p); {let |rover| rove around}
goto found;

@ Conversely, when some variable-size node |p| of size |s| is no longer needed,
the operation |free_node(p,s)| will make its words available, by inserting
|p| as a new empty node just before where |rover| now points.

@p procedure free_node(@!p:pointer; @!s:halfword); {variable-size node
var q:pointer; {|llink(rover)|}
begin if @<Node |p| isn't in the variable-size |mem|@> then
@:confusion free_node}{\quad freenode@>
node_size(p)←s; link(p)←empty_flag;
q←llink(rover); llink(p)←q; rlink(p)←rover; {set both links}
llink(rover)←p; rlink(q)←p; {insert |p| into the ring}
stat var_used←var_used-s;@+tats@;{maintain statistics}

@ Just before \.{INITEX} writes out the memory, it sorts the doubly linked
available space list. The list is probably very short at such times, so a
simple insertion sort is used. The smallest available location will be
pointed to by |rover|, the next-smallest by |rlink(rover)|, etc.

@p init procedure sort_avail; {sorts the available variable-size nodes
	by location}
var p,@!q,@!r: pointer; {indices into |mem|}
@!old_rover:pointer; {initial |rover| setting}
begin p←get_node(@'10000000000); {merge adjacent free areas}
p←rlink(rover); rlink(rover)←max_halfword; old_rover←rover;
while p≠old_rover do @<Sort |p| into the list starting at |rover|
	and advance |p| to |rlink(p)|@>;
while rlink(p)≠max_halfword do
	begin llink(rlink(p))←p; p←rlink(p);
rlink(p)←rover; llink(rover)←p;

@ The following |while| loop terminates, since the list that starts at
|rover| ends with |max_halfword| during the sorting procedure.

@<Sort |p|...@>=
if p<rover then
	begin q←p; p←rlink(q); rlink(q)←rover; rover←q;
else	begin q←rover;
	while rlink(q)<p do q←rlink(q);
	r←rlink(p); rlink(p)←rlink(q); rlink(q)←p; p←r;
@* \[10] Data structures for boxes and their friends.
From the computer's standpoint, \TeX's chief mission is to create
horizontal and vertical lists. We shall now investigate how the elements
of these lists are represented internally as nodes in the dynamic memory.

A horizontal or vertical list is linked together by |link| fields in
the first word of each node. Individual nodes represent boxes, glue,
penalties, or special things like discretionary hyphens; because of this
variety, some nodes are longer than others, and we must distinguish different
kinds of nodes. We do this by putting a `|type|' field in the first word,
together with the link and an optional `|subtype|'.

@d type(#) == mem[#].hh.b0 {identifies what kind of node this is}
@d subtype(#) == mem[#].hh.b1 {secondary identification in some cases}

@ A |@!char_node|, which represents a single character, is the most important
kind of node because it accounts for the vast majority of all boxes.
Special precautions are therefore taken to ensure that a |char_node| does
not take up much memory space. Every such node is one word long, and in fact
it is identifiable by this property, since other kinds of nodes have at least
two words, and they appear in |mem| locations less than |hi_mem_base|.
This makes it possible to omit the |type| field in a |char_node|, leaving
us room for two bytes that identify a |font| and a |character| within
that font.

Note that the format of a |char_node| allows for up to 256 different
fonts and up to 256 characters per font; but most implementations will
probably limit the total number of fonts to fewer than 75 per job,
and most fonts will stick to characters less than 128 (since higher codes
are accessed outside of math mode only via ligatures and the \.{\\char}

Extensions of \TeX\ intended for oriental languages will need even more
than $256\times256$ possible characters, when we consider different sizes
@^oriental characters@>@^Chinese characters@>@^Japanese characters@>
and styles of type.  It is suggested that Chinese and Japanese fonts be
handled by representing such characters in two consecutive |char_node|
entries: The first of these would identify the font and the character
dimensions, and it would also link to the second, where the full halfword
|info| field would address the desired character. Such an extension of
\TeX\ would not be difficult; further details are left to the reader.
The saving feature about oriental characters is that most of them have
the same box dimensions.

In order to make sure that the |character| code fits in a quarterword,
\TeX\ adds the quantity |min_quarterword| to the actual code.

Character nodes appear only in horizontal lists, never in vertical lists.

@d is_char_node(#) == (#>hi_mem_base)
	{does the argument point to a |char_node|?}
@d font == type {the font code in a |char_node|}
@d character == subtype {the character code in a |char_node|}

@ An |hlist_node| stands for a box that was made from a horizontal list.
Each |hlist_node| is seven words long, and contains the following fields
(in addition to the mandatory |type| and |link|, which we shall not
mention explicitly when discussing the other node types): The |height| and
|width| and |depth| are scaled integers denoting the dimensions of the
box.  There is also a |shift_amount| field, a scaled integer indicating
how much this box should be lowered (if it appears in a horizontal list),
or how much it should be moved to the right (if it appears in a vertical
list). There is a |list_ptr| field, which points to the beginning of the
list from which this box was fabricated; if |list_ptr| is |null|, the box
is empty. Finally, there are three fields that represent the setting of
the glue:  |glue_set(p)| is a word of type |glue_ratio| that represents
the proportionality constant for glue setting; |glue_sign(p)| is
|stretching| or |shrinking| or |normal| depending on whether or not the
glue should stretch or shrink or remain rigid; and |glue_order(p)|
specifies the order of infinity to which glue setting applies (|normal|,
|fil|, |fill|, or |filll|). The |subtype| field is not used.

@d hlist_node=0 {|type| of hlist nodes}
@d box_node_size=7 {number of words to allocate for a box node}
@d width_offset=1 {position of |width| field in a box node}
@d depth_offset=2 {position of |depth| field in a box node}
@d height_offset=3 {position of |height| field in a box node}
@d width(#) == mem[#+width_offset].sc {width of the box, in sp}
@d depth(#) == mem[#+depth_offset].sc {depth of the box, in sp}
@d height(#) == mem[#+height_offset].sc {height of the box, in sp}
@d shift_amount(#) == mem[#+4].sc {repositioning distance, in sp}
@d list_offset=5 {position of |list_ptr| field in a box node}
@d list_ptr(#) == link(#+list_offset) {beginning of the list inside the box}
@d glue_order(#) == subtype(#+list_offset) {applicable order of infinity}
@d glue_sign(#) == type(#+list_offset) {stretching or shrinking}
@d normal=0 {the most common case when several cases are named}
@d stretching = 1 {glue setting applies to the stretch components}
@d shrinking = 2 {glue setting applies to the shrink components}
@d glue_set(#) == mem[#+6].gr {a word of type |glue_ratio| for glue setting}

@ The |new_null_box| function returns a pointer to an |hlist_node| in
which all subfields have the values corresponding to `\.{\\hbox\{\}}'.
The |subtype| field is set to |min_quarterword|, since that is the |span_count|
value that is desired in case this |hlist_node| is changed to an |unset_node|.

@p function new_null_box:pointer; {creates a new box node}
var p:pointer; {the new node}
begin p←get_node(box_node_size); type(p)←hlist_node;
width(p)←0; depth(p)←0; height(p)←0; shift_amount(p)←0; list_ptr(p)←null;
glue_sign(p)←normal; glue_order(p)←normal; glue_set(p)←0.0;

@ A |vlist_node| is like an |hlist_node| in all respects except that it
was made from a vertical list.

@d vlist_node=1 {|type| of vlist nodes}

@ A |rule_node| stands for a solid black rectangle; it has |width|,
|depth|, and |height| fields just as in an |hlist_node|. However, if
any of these dimensions is $-2^{30}$, the actual value will be determined
by running the rule up to the boundary of the innermost enclosing box.
This is called a ``running dimension.'' The |width| is never running in
an hlist; the |height| and |depth| are never running in a vlist.

@d rule_node=2 {|type| of rule nodes}
@d rule_node_size=4 {number of words to allocate for a rule node}
@d null_flag==-@'10000000000 {$-2^{30}$, signifies a missing item}
@d is_running(#) == (#=null_flag) {tests for a running dimension}

@ A new rule node is delivered by the |new_rule| function. It
makes all the dimensions ``running,'' so you have to change the
ones that are not allowed to run.

@p function new_rule:pointer;
var p:pointer; {the new node}
begin p←get_node(rule_node_size); type(p)←rule_node;
subtype(p)←0; {the |subtype| is not used}
width(p)←null_flag; depth(p)←null_flag; height(p)←null_flag;

@ Insertions are represented by |ins_node| records, where the |subtype|
indicates the corresponding box number. For example, `\.{\\insert 250}'
leads to an |ins_node| whose |subtype| is |250+min_quarterword|.
The |width| field of an |ins_node| is slightly misnamed; it actually holds
the natural height plus depth of the vertical list being inserted.
There is one more field, the |ins_ptr|, which points to the beginning of
the vlist for the insertion. In the present implementation, |ins_ptr|
occupies a full word, although half a word would be enough.

@d ins_node=3 {|type| of insertion nodes}
@d ins_node_size=3 {number of words to allocate for an insertion}
@d ins_ptr(#)==mem[#+2].int {the vertical list to be inserted}

@ A |mark_node| has a |mark_ptr| field that points to the reference count
of a token list that contains the user's \.{\\mark} text. Like the |ins_ptr|
field, this one occupies a full word instead of a halfword just because
there's nothing to put in the other halfword; it is easier in \PASCAL\ to
use the full word than to risk leaving garbage in the unused half.

@d mark_node=4 {|type| of a mark node}
@d small_node_size=2 {number of words to allocate for most node types}
@d mark_ptr(#)==mem[#+1].int {head of the token list for a mark}

@ An |adjust_node|, which occurs only in horizontal lists, specifies
material that will be moved out into the surrounding vertical list;
i.e., it is used to implement \TeX's `\.{\\vadjust}' operation.
The |adjust_ptr| field points to the vlist containing this material.

@d adjust_node=5 {|type| of an adjust node}
@d adjust_ptr==mark_ptr {vertical list to be moved out of horizontal list}

@ A |ligature_node|, which occurs only in horizontal lists, specifies a
composite character that was formed from two or more actual characters.
The second word of the node, which is called the |lig_char| word, contains
|font| and |character| fields just as in a |char_node|. The characters
that generated the ligature have not been forgotten, since they are needed
for diagnostic messages and for hyphenation; the |lig_ptr| field points to
a linked list of character nodes for those characters.

@d ligature_node=6 {|type| of a ligature node}
@d lig_char(#)==#+1 {the word where the ligature is to be found}
@d lig_ptr(#)==link(lig_char(#)) {the list of characters}

@ The |new_ligature| function creates a ligature node having given
contents of the |font|, |character|, and |lig_ptr| fields.

@p function new_ligature(@!f,@!c:quarterword; @!q:pointer):pointer;
var p:pointer; {the new node}
begin p←get_node(small_node_size); type(p)←ligature_node;
subtype(p)←0; {the |subtype| is not used}
font(lig_char(p))←f; character(lig_char(p))←c; lig_ptr(p)←q;

@ A |disc_node|, which occurs only in horizontal lists, specifies a
``dis\-cretion\-ary'' line break. If such a break occurs at node |p|, the text
that starts at |pre_break(p)| will precede the break, the text that starts at
|post_break(p)| will follow the break, and text that appears in the next
|replace_count(p)| nodes will be ignored. For example, an ordinary
discretionary hyphen, indicated by `\.{\\-}', yields a |disc_node| with
|pre_break| pointing to a |char_node| containing a hyphen, |post_break=null|,
and |replace_count=0|. All three of the discretionary texts must be
lists that consist entirely of character, kern, and ligature nodes.

If |pre_break(p)=null|, the |ex_hyphen_penalty| will be charged for this
break.  Otherwise the |hyphen_penalty| will be charged.  The texts will
actually be substituted into the list by the line-breaking algorithm if it
decides to make the break, and the discretionary node will disappear at
that time; thus, the output routine sees only discretionaries that were
not chosen.

@d disc_node=7 {|type| of a discretionary node}
@d replace_count==subtype {how many subsequent nodes to replace}
@d pre_break==llink {text that precedes a discretionary break}
@d post_break==rlink {text that follows a discretionary break}

@p function new_disc:pointer; {creates an empty |disc_node|}
var p:pointer; {the new node}
begin p←get_node(small_node_size); type(p)←disc_node;
replace_count(p)←0; pre_break(p)←null; post_break(p)←null;

@ A |whatsit_node| is a wild card reserved for extensions to \TeX. The
|subtype| field in its first word says what `\\{whatsit}' it is, and
implicitly determines the node size (which must be 2 or more) and the
format of the remaining words. When a |whatsit_node| is encountered
in a list, special actions are invoked; knowledgeable people who are
careful not to mess up the rest of \TeX\ are able to make \TeX\ do new
things by adding code to these extension modules. For example, there
might be a `\TeX nicolor' extension to specify different colors of ink,
@^extensions to \TeX@>
and the whatsit node might contain the desired parameters.

The present implementation of \TeX\ treats the features associated with
`\.{\\send}' and `\.{\\xsend}' as if they were extensions, in order to
illustrate how such routines might be coded. We shall defer further
discussion of extensions until the end of this program.

@d whatsit_node=8 {|type| of special extension nodes}

@ A |math_node|, which occurs only in horizontal lists, appears before and
after mathematical formulas. The |subtype| field is |before| before the
formula and |after| after it. There is a |width| field, which represents
the amount of surrounding space inserted by \.{\\mathsurround}.

@d math_node=9 {|type| of a math node}
@d before=0 {|subtype| for math node that introduces a formula}
@d after=1 {|subtype| for math node that winds up a formula}

@p function new_math(@!w:scaled;@!s:small_number):pointer;
var p:pointer; {the new node}
begin p←get_node(small_node_size); type(p)←math_node;
subtype(p)←s; width(p)←w; new_math←p;

@ \TeX\ makes use of the fact that |hlist_node|, |vlist_node|,
|rule_node|, |ins_node|, |mark_node|, |adjust_node|, |ligature_node|,
|disc_node|, |whatsit_node|, and |math_node| are at the low end of the
type codes, by permitting a break at glue in a list if and only if the
|type| of the previous node is |math_node| or less. Furthermore, a
node is discarded after a break if its type is |math_node| or more.

@d precedes_break(#)==(type(#)≤math_node)
@d non_discardable(#)==(type(#)<math_node)

@ A |glue_node| represents glue in a list. However, it is really only
a pointer to a separate glue specification, since \TeX\ makes use of the
fact that many essentially identical nodes of glue are usually present.
If |p| points to a |glue_node|, |glue_ptr(p)| points to
another packet of words that specify the stretch and shrink components, etc.

Glue nodes also serve to represent leaders; the |subtype| is used to
distinguish between ordinary glue (which is called |normal|) and the three
kinds of leaders (which are called |a_leaders|, |c_leaders|, and |x_leaders|).
The |leader_ptr| field points to a rule node or to a box node containing the
leaders; it is set to |null| in ordinary glue nodes.

Many kinds of glue are computed from \TeX's ``skip'' parameters, and
it is helpful to know which parameter has led to a particular glue node.
Therefore the |subtype| is set to indicate the source of glue, whenever
it originated as a parameter. We will be defining symbolic names for the
parameter numbers later (e.g., |line_skip_code=0|, |baseline_skip_code=1|,
etc.); it suffices for now to say that the |subtype| of parametric glue
will be the same as the parameter number, plus one.

In math formulas there are two more possibilities for the |subtype| in a
glue node: |mu_glue| denotes an \.{\\mskip} (where the units are scaled \.{mu}
instead of scaled \.{pt}); and |cond_math_glue| denotes the `\.{\\nonscript}'
feature that cancels the glue node immediately following if it appears
in a subscript.
@d glue_node=10 {|type| of node that points to a glue specification}
@d mu_glue=98 {|subtype| for mskip glue}
@d cond_math_glue=99 {special |subtype| to suppress glue in the next node}
@d a_leaders=100 {|subtype| for aligned leaders}
@d c_leaders=101 {|subtype| for centered leaders}
@d x_leaders=102 {|subtype| for expanded leaders}
@d glue_ptr==llink {pointer to a glue specification}
@d leader_ptr==rlink {pointer to box or rule node for leaders}

@ A glue specification has a halfword reference count in its first word,
@^reference counts@>
representing |null| plus the number of glue nodes that point to it (less one).
Note that the reference count appears in the same position as
the |link| field in list nodes; this is the field that is initialized
to |null| when a node is allocated, and it is also the field that is flagged
by |empty_flag| in empty nodes.

Glue specifications also contain three |scaled| fields, for the |width|,
|stretch|, and |shrink| dimensions. Finally, there are two one-byte
fields called |stretch_order| and |shrink_order|; these contain the
orders of infinity (|normal|, |fil|, |fill|, or |filll|)
corresponding to the stretch and shrink values.

@d glue_spec_size=4 {number of words to allocate for a glue specification}
@d glue_ref_count(#) == link(#) {reference count of a glue specification}
@d stretch(#) == mem[#+2].sc {the stretchability of this glob of glue}
@d shrink(#) == mem[#+3].sc {the shrinkability of this glob of glue}
@d stretch_order == type {order of infinity for stretching}
@d shrink_order == subtype {order of infinity for shrinking}
@d fil=1 {first-order infinity}
@d fill=2 {second-order infinity}
@d filll=3 {third-order infinity}

@!glue_ord=normal..filll; {infinity to the 0, 1, 2, or 3 power}

@ Here is a function that returns a pointer to a copy of a glue spec.

@p function new_spec(@!p:pointer):pointer; {duplicates a glue specification}
var q:pointer; {the new spec}
begin q←get_node(glue_spec_size);@/
mem[q]←mem[p]; glue_ref_count(q)←null;@/
width(q)←width(p); stretch(q)←stretch(p); shrink(q)←shrink(p);

@ And here's a function that creates a glue node for a given parameter;
for example, |new_param_glue(disp_skip_code)| returns a pointer to a glue
node for the current \.{\\dispskip}.

@p function new_param_glue(@!n:small_number):pointer;
var p:pointer; {the new node}
@!q:pointer; {the glue specification}
begin p←get_node(small_node_size); type(p)←glue_node; subtype(p)←n+1;
q←@<Current |mem| equivalent of glue parameter number |n|@>@t@>;
glue_ptr(p)←q; incr(glue_ref_count(q));

@ Glue nodes that are more or less anonymous are created by |new_glue|,
whose argument points to a glue specification.

@p function new_glue(@!q:pointer):pointer;
var p:pointer; {the new node}
begin p←get_node(small_node_size); type(p)←glue_node; subtype(p)←normal;
leader_ptr(p)←null; glue_ptr(p)←q; incr(glue_ref_count(q));

@ Still another subroutine is needed: this one is sort of a combination
of |new_param_glue| and |new_glue|. It creates a glue node for one of
the current glue parameters, but it makes a fresh copy of the glue
specification, since that specification will probably be subject to change,
while the parameter will stay put. The global variable |temp_ptr| is
set to the address of the new spec.

@p function new_skip_param(@!n:small_number):pointer;
var p:pointer; {the new node}
begin temp_ptr←new_spec(@<Current |mem| equivalent of glue parameter...@>);
p←new_glue(temp_ptr); glue_ref_count(temp_ptr)←null; subtype(p)←n+1;

@ A |kern_node| has a |width| field to specify a (normally negative)
amount of spacing. This spacing correction appears in horizontal lists
between letters like A and V when the font designer said that it looks
better to move them closer together or further apart. A kern node can
also appear in a vertical list, when its `|width|' denotes additional
spacing in the vertical direction. The |subtype| is either |normal| or
|mu_glue|, where the latter is used for \.{\\mkern} specifications
in math formulas.

@d kern_node=11 {|type| of a kern node}

@ The |new_kern| function creates a kern node having a given width.

@p function new_kern(@!w:scaled):pointer;
var p:pointer; {the new node}
begin p←get_node(small_node_size); type(p)←kern_node;

@ A |penalty_node| specifies the penalty associated with line or page
breaking, in its |penalty| field. This field is a fullword integer, but
the full range of integer values is not used: Any penalty |≥10000| is
treated as infinity, and no break will be allowed for such high values.
Similarly, any penalty |≤-10000| is treated as negative infinity, and a
break will be forced.

@d penalty_node=12 {|type| of a penalty node}
@d inf_penalty=inf_bad {``infinite'' penalty value}
@d eject_penalty=-inf_penalty {``negatively infinite'' penalty value}
@d penalty(#) == mem[#+1].int {the added cost of breaking a list here}

@ Anyone who has been reading the last few sections of the program will
be able to guess what comes next.

@p function new_penalty(@!m:integer):pointer;
var p:pointer; {the new node}
begin p←get_node(small_node_size); type(p)←penalty_node;
subtype(p)←0; {the |subtype| is not used}
penalty(p)←m; new_penalty←p;

@ You might think that we have introduced enough node types by now. Well,
almost, but there is one more: An |unset_node| has nearly the same format
as an |hlist_node| or |vlist_node|; it is used for entries in \.{\\halign}
or \.{\\valign} that are not yet in their final form, since the box
dimensions are their ``natural'' sizes before any glue adjustment has been
made. The |glue_set| word is not present; instead, we have a |glue_stretch|
field, which contains the total stretch of order |glue_order| that is
present in the hlist or vlist being boxed.
Similarly, the |shift_amount| field is replaced by a |glue_shrink| field,
containing the total shrink of order |glue_sign| that is present.
The |subtype| field is called |span_count|; an unset box typically
contains the data for |qo(span_count)+1| columns.
Unset nodes will be changed to box nodes when alignment is completed.

@d unset_node=13 {|type| for an unset node}
@d glue_stretch(#)==mem[#+6].sc {total stretch in an unset node}
@d glue_shrink==shift_amount {total shrink in an unset node}
@d span_count==subtype {indicates the number of spanned columns}

@ In fact, there are still more types coming. When we get to math formula
processing we will see that a |style_node| has |type=14|; and a number
of larger type codes will also be defined, for use in math mode only.

@ Warning: If any changes are made to these data structure layouts, such as
changing any of the node sizes or even reordering the words of nodes,
the |copy_node_list| procedure and the memory initialization code
below may have to be changed. However, other references to the nodes are made
symbolically in terms of the \.{WEB} macro definitions above, so format changes
will leave \TeX's other algorithms intact.
@^system dependencies@>
@* \[11] Memory layout.
Some areas of |mem| are dedicated to fixed usage, since static allocation is
more efficient than dynamic allocation when we can get away with it. For
example, locations |mem_base| to |mem_base+3| are always used to store the
specification for glue that is `\.{0pt plus 0pt minus 0pt}'. The
following macro definitions accomplish the static allocation by giving
symbolic names to the fixed positions. Dynamic allocation of variable-size
nodes is restricted to locations |first_mem| through |(hi_mem_base-1)|,
and single-word nodes are dynamically allocated in locations |second_mem|
through |mem_max|, inclusive.

@d zero_glue==mem_base {specification for \.{0pt plus 0pt minus 0pt}}
@d fil_glue==zero_glue+glue_spec_size {\.{0pt plus 1fil minus 0pt}}
@d fill_glue==fil_glue+glue_spec_size {\.{0pt plus 1fill minus 0pt}}
@d ss_glue==fill_glue+glue_spec_size {\.{0pt plus 1fil minus 1fil}}
@d fil_neg_glue==ss_glue+glue_spec_size {\.{0pt plus -1fil minus 0pt}}
@d first_mem==fil_neg_glue+glue_spec_size {first dynamically allocatable word
	in the variable-size |mem|}
@d page_ins_head==hi_mem_base {list of insertion data for current page}
@d contrib_head==hi_mem_base+1 {vlist of items not yet on current page}
@d page_head==hi_mem_base+2 {vlist for current page}
@d temp_head==hi_mem_base+3 {head of a temporary list of some kind}
@d hold_head==hi_mem_base+4 {head of a temporary list of another kind}
@d scan_head==hi_mem_base+5 {head of token list built by |scan_keyword|}
@d adjust_head==hi_mem_base+6 {head of adjustment list returned by |hpack|}
@d active==hi_mem_base+7 {head of active list in |line_break|, needs two words}
@d align_head==hi_mem_base+9 {head of preamble list for alignments}
@d end_span==hi_mem_base+10 {tail of spanned-width lists}
@d omit_template==hi_mem_base+11 {a constant token list}
@d lig_trick==hi_mem_base+12 {a ligature masquerading as a |char_node|}
@d garbage==hi_mem_base+13 {used for scrap information}
@d second_mem==hi_mem_base+14 {first dynamically allocatable word in
	the one-word |mem|}

@<Node |p| isn't in the variable-size |mem|@>=(p<first_mem)∨(p≥hi_mem_base)

@ The following code gets |mem| off to a good start, when \TeX\ is
initializing itself the slow way.

@<Local variables for init...@>=
@!k:integer; {index into |mem|, |eqtb|, etc.}

@ @<Initialize table entries...@>=
for k←mem_base+1 to first_mem-1 do mem[k].sc←0;
	{all glue dimensions are zeroed}
k←mem_base;@+while k<first_mem do {set first words of glue specifications}
	begin glue_ref_count(k)←null+1;
	stretch_order(k)←normal; shrink_order(k)←normal;
stretch(fil_glue)←unity; stretch_order(fil_glue)←fil;@/
stretch(fill_glue)←unity; stretch_order(fill_glue)←fill;@/
stretch(ss_glue)←unity; stretch_order(ss_glue)←fil;@/
shrink(ss_glue)←unity; shrink_order(ss_glue)←fil;@/
stretch(fil_neg_glue)←-unity; stretch_order(fil_neg_glue)←fil;@/
rover←first_mem; link(rover)←empty_flag; {now initialize the dynamic memory}
node_size(rover)←hi_mem_base-rover; {which is one big available node}
llink(rover)←rover; rlink(rover)←rover;@/
link(hi_mem_base)←null; info(hi_mem_base)←null;
for k←hi_mem_base+1 to second_mem-1 do
	mem[k]←mem[hi_mem_base];{clear list heads}
@<Initialize the special list heads and constant nodes@>;
avail←null; mem_end←second_mem-1; {initialize the one-word memory}
var_used←first_mem-mem_base; dyn_used←second_mem-hi_mem_base;
max_var_used←var_used; {initialize statistics}

@ If \TeX\ is extended improperly, the |mem| array might get screwed up.
For example, some pointers might be wrong, or some ``dead'' nodes might not
have been freed when the last reference to them disappeared. Procedures
|check_mem| and |search_mem| are available to help diagnose such
problems. These procedures make use of two arrays called |free| and
|was_free| that are present only if \TeX's debugging routines have
been included. (You may want to decrease the size of |mem| while you
are debugging; decreasing |mem_max| saves space and time,
and decreasing |hi_mem_base| saves time.)

debug @!free: packed array [mem_base..mem_max] of boolean; {free cells}
@t\hskip1em@>@!was_free: packed array [mem_base..mem_max] of boolean;
	{previously free cells}
@t\hskip1em@>@!was_mem_end: pointer; {previous |mem_end|}
@t\hskip1em@>@!panicking:boolean; {do we want to check memory constantly?}

@ @<Set initial...@>=
debug was_mem_end←mem_base; {indicate that everything was previously free}

@ Procedure |check_mem| makes sure that the available space lists of
|mem| are well formed, and it optionally prints out all locations
that are reserved now but were free the last time this procedure was called.

@p debug procedure check_mem(@!print_locs : boolean);
label done1,done2; {loop exits}
var p,@!q:pointer; {current locations of interest in |mem|}
@!clobbered:boolean; {is something amiss?}
begin for p←mem_base to mem_end do free[p]←false; {you can probably
	do this faster}
@<Check single-word |avail| list@>;
@<Check variable-size |avail| list@>;
@<Check flags of unavailable nodes@>;
if print_locs then @<Print newly busy locations@>;
for p←0 to mem_end do was_free[p]←free[p]; {|was_free←free| might be faster}

@ @<Check single-word...@>=
p←avail; q←null; clobbered←false;
while p≠null do
	begin if (p>mem_end)∨(p<second_mem) then clobbered←true
	else if free[p] then clobbered←true;
	if clobbered then
		begin print_nl("AVAIL list clobbered at ");
		print_int(q); goto done1;
	free[p]←true; q←p; p←link(q);

@ @<Check variable-size...@>=
p←rover; q←null; clobbered←false;
repeat if (p≥hi_mem_base)∨(p<first_mem) then clobbered←true
	else if (rlink(p)≥hi_mem_base)∨(rlink(p)<first_mem) then clobbered←true
	else if  ¬(is_empty(p))∨(node_size(p)<2)∨@|
	 (p+node_size(p)>hi_mem_base)∨@| (llink(rlink(p))≠p) then clobbered←true;
	if clobbered then
	begin print_nl("Double-AVAIL list clobbered at ");
	print_int(q); goto done2;
for q←p to p+node_size(p)-1 do {mark all locations free}
	begin if free[q] then
		begin print_nl("Doubly free location at ");
		print_int(q); goto done2;
q←p; p←rlink(p);
until p=rover;

@ @<Check flags...@>=
while p≤hi_mem_base do {node |p| should not be empty}
	begin if is_empty(p) then
		begin print_nl("Bad flag at "); print_int(p);
	while (p≤hi_mem_base) and not free[p] do incr(p);
	while (p≤hi_mem_base) and free[p] do incr(p);

@ @<Print newly busy...@>=
begin print_nl("New busy locs:");
for p←mem_base to mem_end do
	if not free[p] and ((p>was_mem_end) or was_free[p]) then
		begin print_char(" "); print_int(p);

@ The |search_mem| procedure attempts to answer the question ``Who points
to node@@|p|?'' In doing so, it fetches |link| and |info| fields of |mem|
that might not be of type |two_halves|. Strictly speaking, this is
@^dirty \PASCAL@>
undefined in \PASCAL, and it can lead to ``false drops'' (words that seem to
point to |p| purely by coincidence). But for debugging purposes, we want
to rule out the places that do {\sl not\/} point to |p|, so a few false
drops are tolerable.

@p debug procedure search_mem(@!p:pointer); {look for pointers to |p|}
var q:pointer; {current position being searched}
begin for q←mem_base to mem_end do
	begin if link(q)=p then
		begin print_nl("LINK("); print_int(q); print_char(")");
	if info(q)=p then
		begin print_nl("INFO("); print_int(q); print_char(")");
@<Search |eqtb| for equivalents equal to |p|@>; 
@* \[12] Displaying boxes.
We can reinforce our knowledge of the data structures just introduced
by considering two procedures that display a list in symbolic form.
The first of these, called |short_display|, is used in ``overfull box''
messages to give the top-level description of a list. The other one,
called |show_node_list|, prints a detailed description of exactly what
is in the data structure.

The philosophy of |short_display| is to ignore the fine points about exactly
what is inside boxes, except that ligatures and discretionary breaks are
expanded. As a result, |short_display| is a recursive procedure, but the
recursion is never more than one level deep.

A global variable |font_in_short_display| keeps track of the font code that
is assumed to be present when |short_display| begins; deviations from this
font will be printed.

@!font_in_short_display:integer; {an internal font number}

@ Boxes, rules, inserts, whatsits, marks, and things in general that are
sort of ``complicated'' are indicated only by printing `\.{[]}'.

@p procedure short_display(@!p:integer); {prints highlights of list |p|}
var n:integer; {for replacement counts}
begin while p>null do
	begin if is_char_node(p) then
		begin if p≤mem_end then
			begin if font(p)≠font_in_short_display then
				begin print_esc(":");
				if (font(p)<0)∨(font(p)>font_max) then
				else print_int(font_code[font(p)]);
				print_char(" "); font_in_short_display←font(p);
	else @<Print a short indication of the contents of node |p|@>;
@ @<Print a short indication of the contents of node |p|@>=
case type(p) of
	unset_node: print("[]");
rule_node: print_char("|");
glue_node: if glue_ptr(p)≠zero_glue then print_char(" ");
math_node: print_char("$");
ligature_node: short_display(lig_ptr(p));
disc_node: begin short_display(pre_break(p));
	while n>0 do
		begin p←link(p); decr(n);
othercases do_nothing

@ The |show_node_list| routine requires some auxiliary subroutines: one to
print a font-and-character combination, one to print a token list without
its reference count, and one to print a rule dimension.

@p procedure print_font_and_char(@!p:integer); {prints |char_node| data}
begin if p>mem_end then print_esc("CLOBBERED.")
else	begin print_esc(":");
	if (font(p)<0)∨(font(p)>font_max) then print_char("*")
	else print_int(font_code[font(p)]);
	print_char(" "); print_ascii(qo(character(p)));
procedure print_mark(@!p:integer); {prints token list data in braces}
begin print_char("{");
if (p<hi_mem_base)∨(p>mem_end) then print_esc("CLOBBERED.")
else show_token_list(link(p),null,max_print_line-10);
procedure print_rule_dimen(@!d:scaled); {prints dimension in rule node}
begin if is_running(d) then print_char("*") else print_scaled(d);

@ Then there is a subroutine that prints glue stretch and shrink, possibly
followed by the name of finite units:

@p procedure print_glue(@!d:scaled;@!order:integer;@!s:str_number);
	{prints a glue component}
begin print_scaled(d);
if (order<normal)∨(order>filll) then print("foul")
else if order>normal then
	begin print("fil");
	while order>fil do
		begin print_char("l"); decr(order);
else if s≠0 then print(s);

@ The next subroutine prints a whole glue specification.

@p procedure print_spec(@!p:pointer;@!s:str_number);
	{prints a glue specification}
begin if (p<mem_base)∨(p≥hi_mem_base) then print_char("*")
else	begin print_scaled(width(p));
	if s≠0 then print(s);
	if stretch(p)≠0 then
		begin print(" plus "); print_glue(stretch(p),stretch_order(p),s);
	if shrink(p)≠0 then
		begin print(" minus "); print_glue(shrink(p),shrink_order(p),s);

@ We also need to declare some procedures that appear later in this

@p @<Declare procedures needed for displaying the elements of mlists@>@;
@<Declare the procedure called |print_skip_param|@>

@ Since boxes can be inside of boxes, |show_node_list| is inherently recursive,
up to a given maximum number of levels.  The history of nesting is indicated
by the current string, which will be printed at the beginning of each line;
the length of this string, namely |cur_length|, is the depth of nesting.

Recursive calls on |show_node_list| therefore use the following pattern:

@d node_list_display(#)==
	begin append_char("."); show_node_list(#); flush_char;
	end {|str_room| need not be checked; see |show_box| below}

@ A global variable called |depth_threshold| is used to record the maximum
depth of nesting for which |show_node_list| will show information.  If we
have |depth_threshold=0|, for example, only the top level information will
be given and no sublists will be traversed. Another global variable, called
|breadth_max|, tells the maximum number of items to show at each level;
|breadth_max| had better be positive, or you won't see anything.

@!depth_threshold : integer; {maximum nesting depth in box displays}
@!breadth_max : integer; {maximum number of items shown at the same list level}

@ Now we are ready for |show_node_list| itself. This procedure has been
written to be ``extra robust'' in the sense that it should not crash or get
into a loop even if the data structures have been messed up by bugs in
the rest of the program. You can safely call its parent routine
|show_box(p)| for arbitrary values of |p| when you are debugging \TeX.
However, in the presence of bad data, the procedure may
@^dirty \PASCAL@>@^debugging@>
fetch a |memory_word| whose variant is different from the way it was stored;
for example, it might try to read |mem[p].hh| when |mem[p]|
contains a scaled integer, if |p| is a pointer that has been
clobbered or chosen at random.

@p procedure show_node_list(@!p:pointer); {prints a node list symbolically}
label exit;
var n:integer; {the number of items already printed at this level}
begin if cur_length>depth_threshold then
	begin if p>null then print("[]");
		{indicate that there's been some truncation}
while p>null do
	begin print_ln; print_current_string; {display the nesting history}
	if p>mem_end then {pointer out of range}
		begin print("Bad link, display aborted."); return;
	incr(n); if n>breadth_max then {time to stop}
		begin print("etc."); return;
	@<Display node |p|@>;

@ @<Display node |p|@>=
if is_char_node(p) then print_font_and_char(p)
else	case type(p) of
	hlist_node,vlist_node,unset_node: @<Display box |p|@>;
	rule_node: @<Display rule |p|@>;
	ins_node: @<Display insertion |p|@>;
	whatsit_node: @<Display the whatsit node |p|@>;
	glue_node: @<Display glue |p|@>;
	kern_node: @<Display kern |p|@>;
	math_node: @<Display math node |p|@>;
	ligature_node: @<Display ligature |p|@>;
	penalty_node: @<Display penalty |p|@>;
	disc_node: @<Display discretionary |p|@>;
	mark_node: @<Display mark |p|@>;
	adjust_node: @<Display adjustment |p|@>;
	@t\4@>@<Cases of |show_node_list| that arise in mlists only@>@;
	othercases print("Unknown node type!")

@ @<Display box |p|@>=
begin if type(p)=hlist_node then print_esc("h")
else if type(p)=vlist_node then print_esc("v")
else print_esc("unset");
print("box("); print_scaled(height(p)); print_char("+");
print_scaled(depth(p)); print(")x"); print_scaled(width(p));
if type(p)=unset_node then
	@<Display special fields of the unset node |p|@>
else	begin @<Display the value of |glue_set(p)|@>;
	if shift_amount(p)≠0 then
		begin print(", shifted "); print_scaled(shift_amount(p));
node_list_display(list_ptr(p)); {recursive call}

@ @<Display special fields of the unset node |p|@>=
begin if span_count(p)≠min_quarterword then
	begin print(" ("); print_int(qo(span_count(p))+1);
	print(" columns)");
print(", stretch "); print_glue(glue_stretch(p),glue_order(p),0);
print(", shrink "); print_glue(glue_shrink(p),glue_sign(p),0);

@ The code will have to change in this place if |glue_ratio| is
a structured type instead of an ordinary |real|.

@<Display the value of |glue_set(p)|@>=
if glue_set(p)≠0 then
	begin print(", glue set ");
	if glue_sign(p)=shrinking then print("- ");
	if abs(glue_set(p))>20000.0 then
		begin if glue_set(p)>0 then print_char(">")
		else print("< -");
	else print_glue(round(glue_set(p)*unity),glue_order(p),0);

@ @<Display rule |p|@>=
begin print_esc("rule("); print_rule_dimen(height(p)); print_char("+");
print_rule_dimen(depth(p)); print(")x"); print_rule_dimen(width(p));

@ @<Display insertion |p|@>=
begin print_esc("insert"); print_int(qo(subtype(p)));
print(", natural size "); print_scaled(width(p));
node_list_display(ins_ptr(p)); {recursive call}

@ @<Display glue |p|@>=
if subtype(p)≥a_leaders then @<Display leaders |p|@>
else	begin print_esc("glue");
	if subtype(p)≠normal then
		begin print_char("(");
		if subtype(p)<cond_math_glue then
		else if subtype(p)=cond_math_glue then print_esc("non_script")
		else print_esc("mskip");
	if subtype(p)≠cond_math_glue then
		begin print_char(" ");
		if subtype(p)<cond_math_glue then print_spec(glue_ptr(p),0)
		else print_spec(glue_ptr(p),"mu");

@ @<Display leaders |p|@>=
begin print_char("\"); 
if subtype(p)=c_leaders then print_char("c")
else if subtype(p)=x_leaders then print_char("x");
print("leaders "); print_spec(glue_ptr(p),0);
node_list_display(leader_ptr(p)); {recursive call}

@ @<Display kern |p|@>=
if subtype(p)=normal then
	begin print_esc("kern"); print_scaled(width(p));
else	begin print_esc("mkern"); print_scaled(width(p)); print("mu");

@ @<Display math node |p|@>=
begin print_esc("math");
if subtype(p)=before then print("on")
else print("off");
if width(p)≠0 then
	begin print(", surrounded "); print_scaled(width(p));

@ @<Display ligature |p|@>=
begin print_font_and_char(lig_char(p)); print(" (ligature ");
short_display(lig_ptr(p)); print_char(")");

@ @<Display penalty |p|@>=
begin print_esc("penalty "); print_int(penalty(p));

@ The |post_break| list of a discretionary node is indicated by a prefixed
`\.{\char'174}' instead of the `\..' before the |pre_break| list.

@<Display discretionary |p|@>=
begin print_esc("discretionary");
if replace_count(p)>0 then
	begin print(" replacing "); print_int(replace_count(p));
if pre_break(p)=null then print(" (exhyphen)")
else node_list_display(pre_break(p)); {recursive call}
append_char("|"); show_node_list(post_break(p)); flush_char; {recursive call}

@ @<Display mark |p|@>=
begin print_esc("mark"); print_mark(mark_ptr(p));

@ @<Display adjustment |p|@>=
begin print_esc("vadjust"); node_list_display(adjust_ptr(p)); {recursive call}

@ The recursive machinery is started by calling |show_box|.

@p procedure show_box(@!p:pointer);
begin @<Assign the values |depth_threshold←show_box_depth| and
if breadth_max≤0 then breadth_max←5;
if pool_ptr+depth_threshold≥pool_size then
	{now there's enough room for prefix string}
show_node_list(p); {the show starts at |p|}
@* \[13] Destroying boxes.
When we are done with a node list, we are obliged to return it to free
storage, including all of its sublists. The recursive procedure
|flush_node_list| does this for us.

@ First, however, we shall consider two non-recursive procedures that do
simpler tasks. The first of these, |delete_token_ref|, is called when
a pointer to a token list's reference count is being removed. This means
that the token list should disappear if the reference count was |null|,
otherwise the count should be decreased by one.
@^reference counts@>

@d token_ref_count(#) == info(#) {reference count preceding a token list}

@p procedure delete_token_ref(@!p:pointer); {|p| points to the reference count
	of a token list that is losing one reference}
begin if token_ref_count(p)=null then flush_list(p)
else decr(token_ref_count(p));

@ Similarly, |delete_glue_ref| is called when a pointer to a glue
specification is being withdrawn.
@^reference counts@>

@p procedure delete_glue_ref(@!p:pointer); {|p| points to a glue specification}
begin if glue_ref_count(p)=null then free_node(p,glue_spec_size)
else decr(glue_ref_count(p));

@ Now we are ready to delete any node list, recursively.

@p procedure flush_node_list(@!p:pointer); {erase list of nodes starting at |p|}
label done; {go here when node |p| has been freed}
var q:pointer; {successor to node |p|}
begin while p≠null do
	begin q←link(p);
	if is_char_node(p) then free_avail(p)
	else	begin case type(p) of
		hlist_node,vlist_node,unset_node: begin flush_node_list(list_ptr(p));
			free_node(p,box_node_size); goto done;
		rule_node: begin free_node(p,rule_node_size); goto done;
		ins_node: begin flush_node_list(ins_ptr(p));
			free_node(p,ins_node_size); goto done;
		whatsit_node: @<Wipe out the whatsit node |p| and |goto done|@>;
		glue_node: begin delete_glue_ref(glue_ptr(p));
		kern_node,math_node,penalty_node: do_nothing;
		ligature_node: flush_node_list(lig_ptr(p));
		mark_node: delete_token_ref(mark_ptr(p));
		disc_node: begin flush_node_list(pre_break(p));
		adjust_node: flush_node_list(adjust_ptr(p));
		othercases confusion("flushing")
@:confusion flushing}{\quad flushing@>
@* \[14] Copying boxes.
Another recursive operation on boxes is sometimes needed: The procedure
|copy_node_list| returns a pointer to another node list that has the
same structure and meaning as the original. Note that since glue
specifications and token lists have reference counts, we need not
make copies of them. Reference counts can never get too large to fit in a
halfword, since each pointer to a node is in a different memory address,
and the total number of memory addresses fits in a halfword.
@^reference counts@>

(Well, there actually are also references from outside |mem|; if the
|save_stack| is made arbitrarily large, it would theoretically be possible
to break \TeX\ by overflowing a reference count. But who would want to do that?)

@d add_token_ref(#)==incr(token_ref_count(#)) {new reference to a token list}
@d add_glue_ref(#)==incr(glue_ref_count(#)) {new reference to a glue spec}

@ The copying procedure copies words en masse without bothering
to look at their individual fields. If the node format changes---for
example, if the size is altered, or if some link field is moved to another
relative position---then this code may need to be changed too.

@p function copy_node_list(@!p:pointer):pointer; {makes a duplicate of the
	node list that starts at |p| and returns a pointer to the new list}
var h:pointer; {temporary head of copied list}
@!q:pointer; {previous position in new list}
@!r:pointer; {current node being fabricated for new list}
@!words:0..5; {number of words remaining to be copied}
begin h←get_avail; q←h;
while p≠null do
	begin @<Make a copy of node |p| in node |r|@>;
	link(q)←r; q←r; p←link(p);
link(q)←null; q←link(h); free_avail(h);

@ @<Make a copy of node |p|...@>=
words←1; {this setting occurs in more branches than any other}
if is_char_node(p) then r←get_avail
else @<Case statement to copy different types and set |words| to the number
	of initial words not yet copied@>;
while words>0 do
	begin decr(words); mem[r+words]←mem[p+words];

@ @<Case statement to copy...@>=
case type(p) of
hlist_node,vlist_node,unset_node: begin r←get_node(box_node_size);
	mem[r+6]←mem[p+6]; mem[r+5]←mem[p+5]; {copy the last two words}
	list_ptr(r)←copy_node_list(list_ptr(p)); {this affects |mem[r+5]|}
rule_node: begin r←get_node(rule_node_size); words←rule_node_size;
ins_node: begin r←get_node(ins_node_size);
	ins_ptr(r)←copy_node_list(ins_ptr(p)); words←ins_node_size-1;
whatsit_node:@<Make a partial copy of the whatsit node |p| and make |r|
	point to it; set |words| to the number of initial words not yet copied@>;
glue_node: begin r←get_node(small_node_size); add_glue_ref(glue_ptr(p));
	glue_ptr(r)←glue_ptr(p); leader_ptr(r)←copy_node_list(leader_ptr(p));
kern_node,math_node,penalty_node: begin r←get_node(small_node_size);
ligature_node: begin r←get_node(small_node_size);
	mem[lig_char(r)]←mem[lig_char(p)]; {copy |font| and |character|}
disc_node: begin r←get_node(small_node_size);
mark_node: begin r←get_node(small_node_size); add_token_ref(mark_ptr(p));
adjust_node: begin r←get_node(small_node_size);
othercases confusion("copying")
@:confusion copying}{\quad copying@>
@* \[15] The command codes.
Before we can go any further, we need to define symbolic names for the internal
code numbers that represent the various commands obeyed by \TeX. These codes
are somewhat arbitrary, but not completely so. For example, the command
codes for character types are fixed by the language, since a user says,
e.g., `\.{\\chcode \`\\\${} = 3}' to make \.{\char'44} a math delimiter,
and the command code |math_delim| is equal to@@3. Some other codes have
been made adjacent so that |case| statements in the program need not consider
cases that are widely spaced, or so that |case| statements can be replaced
by |if| statements.

At any rate, here is the list, for future reference. First come the
``chcode'' commands, several of which are used also as ordinary commands
when the chcode cannot emerge from \TeX's scanning routine.

@d escape=0 {escape delimiter (called \.{\\} in the \TeX\ manual)}
@d relax=0 { do nothing ( \.{\\relax} )}
@d left_brace=1 {beginning of a group ( \.{\{} )}
@d right_brace=2 {ending of a group ( \.{\}} )}
@d math_delim=3 {mathematics delimiter ( \.{\$} )}
@d tab_mark=4 {alignment delimiter ( \.{\&}, \.{\\span} )}
@d car_ret=5 {carriage return ( |carriage_return|, also \.{\\cr} )}
@d out_param=5 {output a macro parameter}
@d mac_param=6 {macro parameter symbol ( \.{\#} )}
@d sup_mark=7 {superscript ( \.{\char'136} )}
@d sub_mark=8 {subscript ( \.{\char'176} )}
@d ignore=9 {characters to ignore ( \.{\^\^J} )}
@d endv=9 {end of \<v↓j> list in alignment template}
@d spacer=10 {characters equivalent to blank space ( \.{\ } )}
@d letter=11 {characters regarded as letters ( \.{A..Z}, \.{a..z} )}
@d other_char=12 {none of the special character types}
@d active_char=13 {characters that invoke macros ( \.{\^\^[} )}
@d par_end=13 {end of paragraph ( \.{\\par} )}
@d match=13 {match a macro parameter}
@d end_line=14 {characters that introduce comments ( \.{\char'45} )}
@d end_match=14 {end of parameters to macro}
@d stop=14 {end of input ( \.{\\end}, \.{\\dump} )}
@d invalid_char=15 {characters that shouldn't appear ( \.{\^\^?} )}
@d delim_num=15 {specify delimiter numerically ( \.{\\delimiter} )}
@d max_char_code=15 {largest chcode for individual characters}

@ Next are the ordinary run-of-the-mill command codes.

@d char_num=16 {character specified numerically ( \.{\\char} )}
@d math_char_num=17 {explicit math code ( \.{\\mathchar} )}
@d mark=18 {mark definition ( \.{\\mark} )}
@d input=19 {input a source file ( \.{\\input} )}
@d xray=20 {peek inside of \TeX\ ( \.{\\show}, \.{\\showbox}, etc.@@)}
@d make_box=21 {make a box ( \.{\\box}, \.{\\copy}, \.{\\hbox}, etc.@@)}
@d hmove=22 {horizontal motion ( \.{\\moveleft}, \.{\\moveright} )}
@d vmove=23 {vertical motion ( \.{\\raise}, \.{\\lower} )}
@d unbox=24 {unglue a box ( \.{\\unbox} )}
@d unskip=25 {nullify glue ( \.{\\unskip} )}
@d hskip=26 {horizontal glue ( \.{\\hskip}, \.{\\hfil}, etc.@@)}
@d vskip=27 {vertical glue ( \.{\\vskip}, \.{\\vfil}, etc.@@)}
@d mskip=28 {math glue ( \.{\\mskip} )}
@d kern=29 {fixed space ( \.{kern})}
@d mkern=30 {math kern ( \.{\\mkern} )}
@d leader_ship=31 {use a box ( \.{\\shipout}, \.{\\leaders}, etc.@@)}
@d halign=32 {horizontal table alignment ( \.{\\halign} )}
@d valign=33 {vertical table alignment ( \.{\\valign} )}
@d no_align=34 {temporary escape from alignment ( \.{\\noalign} )}
@d vrule=35 {vertical rule ( \.{\\vrule} )}
@d hrule=36 {horizontal rule ( \.{\\hrule} )}
@d insert=37 {vlist inserted in box ( \.{\\insert} )}
@d vadjust=38 {vlist inserted in enclosing paragraph ( \.{\\vadjust} )}
@d ignore_space=39 {gobble |spacer| tokens ( \.{\\ignorespace} )}
@d break_penalty=40 {additional badness ( \.{\\penalty}, \.{\\dpenalty}@@)}
@d start_par=41 {begin paragraph ( \.{\\indent}, \.{\\noindent} )}
@d ital_corr=42 {italic correction ( \.{\\/} )}
@d accent=43 {attach accent in text ( \.{\\accent} )}
@d math_accent=44 {attach accent in math ( \.{\\mathaccent} )}
@d discretionary=45 {discretionary texts ( \.{\\discretionary} )}
@d eq_no=46 {equation number ( \.{\\eqno}, \.{\\leqno} )}
@d left_right=47 {variable delimiter ( \.{\\left}, \.{\\right} )}
@d math_comp=48 {component of formula ( \.{\\mathbin}, etc.@@)}
@d limit_switch=49 {diddle limit conventions ( \.{\\limitswitch}@@)}
@d above=50 {generalized fraction ( \.{\\above}, \.{\\atop}, etc.@@)}
@d math_style=51 {style specification ( \.{\\displaystyle}, etc.@@)}
@d non_script=52 {conditional math glue ( \.{\\nonscript} )}
@d vcenter=53 {vertically center a vbox ( \.{\\vcenter} )}
@d case_shift=54 {force specific case ( \.{\\lowercase}, \.{\\uppercase}@@)}
@d if_test=55 {select conditional text ( \.{\\if}, \.{\\ifeven}, etc.@@)}
@d case_branch=56 {choose numbered text ( \.{\\case} )}
@d else_code=57 {delimiter for conditionals ( \.{\\else} )}
@d message=58 {send to user ( \.{\\message}, \.{\\errmessage} )}
@d extension=59 {extensions to \TeX\ ( \.{\\send}, \.{\\open}, etc.@@)}
@d group_begin=60 {begin local grouping ( \.{\\groupbegin} )}
@d group_end=61 {end local grouping ( \.{\\groupend} )}
@d omit=62 {omit alignment template ( \.{\\omit} )}
@d ex_space=63 {explicit space ( \.{\\\ } )}
@d radical=64 {square root and similar signs ( \.{\\radical} )}
@d the=66 {convert to arabic or roman numerals ( \.{\\number} )}
@d number=65 {read a parameter or register ( \.{\\the}, \.{\\minus} )}
@d register=67 {read a register ( \.{\\count}, \.{\\dimen}, \.{\\skip} )}
@d last_skip=68 {most recent glue ( \.{\\lastskip} )}
@d set_aux=69 {specify state info ( \.{\\spacefactor}, \.{\\prevdepth} )}
@d max_non_prefixed_command=69 {largest command code that can't be \.{\\global}}

@ The next codes are special; they all relate to mode-independent
assignment of values to \TeX's internal registers or tables.

@d assign_toks=70 {special token list ( \.{\\output}, \.{\\everypar} )}
@d assign_int=71 {user-defined integer ( \.{\\tolerance}, \.{\\day}, etc.@@)}
@d assign_dimen=72 {user-defined length ( \.{\\hsize}, etc.@@)}
@d assign_glue=73 {user-defined glue ( \.{\\baselineskip}, etc.@@)}
@d assign_tex_info=74 {user-defined font parameter ( \.{\\texinfo} )}
@d hang_indent=75 {specify hanging indentation ( \.{\\hangindent} )}
@d def_code=76 {define a character code ( \.{\\chcode}, etc.@@)}
@d def_family=77 {declare math fonts ( \.{\\textfont}, etc.@@)}
@d set_font=78 {set current font ( \.{\\:} )}
@d set_family=79 {set current family ( \.{\\fam} )}
@d prefix=80 {qualify a definition ( \.{\\global}, \.{\\long}, \.{\\outer} )}
@d let=81 {alternative name for a control sequence ( \.{\\let} )}
@d def=82 {macro definition ( \.{\\def}, \.{\\gdef}, \.{\\xdef}, \.{\\edef} )}
@d def_font=83 {define a font file ( \.{\\font} )}
@d set_register=84 {set a register ( \.{\\setcount}, \.{\\setdimen},
	\.{\\setskip} )}
@d adv_register=85 {advance a register ( \.{\\advcount}, \.{\\advdimen},
	\.{\\advskip} )}
@d mult_register=86 {multiply a register ( \.{\\multcount}, \.{\\multdimen},
	\.{\\multskip} )}
@d div_register=87 {divide a register ( \.{\\divcount}, \.{\\divdimen},
	\.{\\divskip} )}
@d set_box=88 {set a box ( \.{\\setbox} )}
@d set_shape=89 {specify fancy paragraph shape ( \.{\\parshape} )}
@d hyph_data=90 {hyphenation data ( \.{\\hyphenation}, \.{\\patterns} )}
@d set_interaction=91 {define level of interaction ( \.{\\batchmode}, etc.@@)}
@d max_command=91 {the largest command code seen at |big_switch|}

@ The remaining command codes are extra special, since they cannot get through
\TeX's scanner to the main control routine. They have been given values higher
than |max_command| so that their special nature is easily discernable.

@d undefined_cs=max_command+1 {initial state of most |eq_type| fields}
@d top_bot_mark=max_command+2 {inserted mark ( \.{\\topmark}, etc.\ }
@d call=max_command+3 {non-long, non-outer control sequence}
@d long_call=max_command+4 {long, non-outer control sequence}
@d outer_call=max_command+5 {non-long, outer control sequence}
@d long_outer_call=max_command+6 {long, outer control sequence}
@d glue_ref=max_command+7 {pointer to a glue specification}
@d shape_ref=max_command+8 {pointer to a parshape specification}
@d box_ref=max_command+9 {pointer to a box node, or |null|}
@d data=max_command+10 {the equivalent is simply a halfword number}
@* \[16] The semantic nest.
\TeX\ is typically in the midst of building many lists at once. For example,
when a math formula is being processed, \TeX\ is in math mode and
working on an mlist; this formula has temporarily interrupted \TeX\ from
being in horizontal mode and building the hlist of a paragraph; and this
paragraph has temporarily interrupted \TeX\ from being in vertical mode
and building the vlist for the next page of a document. Similarly, when a
\.{\\vbox} occurs inside of an \.{\\hbox}, \TeX\ is temporarily
interrupted from working in restricted horizontal mode, and it enters
internal vertical mode.  The ``semantic nest'' is a stack that
keeps track of what lists and modes are currently suspended.

At each level of processing we are in one of six modes:

\yskip\hang|vmode| stands for vertical mode (the page builder);

\hang|hmode| stands for horizontal mode (the paragraph builder);

\hang|mmode| stands for displayed formula mode;

\hang|-vmode| stands for internal vertical mode (e.g., in a \.{\\vbox});

\hang|-hmode| stands for restricted horizontal mode (e.g., in an \.{\\hbox});

\hang|-mmode| stands for math formula mode (not displayed).

\yskip\noindent The mode is temporarily set to zero while processing \.{\\send}
texts in the |ship_out| routine. 

Numeric values are assigned to |vmode|, |hmode|, and |mmode| so that
\TeX's ``big semantic switch'' can select the appropriate thing to
do by computing the value |abs(mode)+cur_cmd|, where |mode| is the current
mode and |cur_cmd| is the current command code.

@d vmode=1 {vertical mode}
@d hmode=vmode+max_command+1 {horizontal mode}
@d mmode=hmode+max_command+1 {math mode}

@p procedure print_mode(@!m:integer); {prints the current mode}
begin if m>0 then
	case m div (max_command+1) of
	2:print("displayed math");
else	case (-m) div (max_command+1) of
	0:print("internal vertical");
	1:print("restricted horizontal");
print(" mode");

@ The state of affairs at any semantic level can be represented by
five values:

\yskip\hang|mode| is the number representing the semantic mode, as
just explained.

\yskip\hang|head| is a |pointer| to a list head for the list being built;
|link(head)| therefore points to the first element of the list, or
to |null| if the list is empty.

\yskip\hang|tail| is a |pointer| to the final node of the list being
built; thus, |tail=head| if and only if the list is empty.

\yskip\hang|already| is the number of lines of the current paragraph that
have already been put into the present vertical list.

\yskip\hang|aux| is an auxiliary integer that gives further information
that is needed to characterize the situation.

In vertical mode, |aux| is also known as |prev_depth|; it is the scaled
value representing the depth of the previous box, for use in baseline
calculations, or it is |≤1000|pt if the next box on the vertical list is to
be exempt from baseline calculations.  In horizontal mode, |aux| is also
known as |space_factor|; it holds the current space factor use in spacing
calculations. In math mode, |aux| is also known as |incompleat_noad|; if
not |null|, it points to a record that represents the numerator of a
generalized fraction for which the denominator is currently being formed
in the current list.

There is also a sixth quantity, |mode_line|, which is used to correlate
the semantic nest with the user's input; |mode_line| contains the source
line number at which the current level of nesting was entered. The negative
of this line number is used as the |mode_line| at the level of the
user's output routine.

The semantic nest is an array called |nest| that holds the |mode|, |head|,
|tail|, |already|, |aux|, and |mode_line| values for all semantic levels
below the currently active one. Information about the currently active
level is kept in the global quantities |mode|, |head|, |tail|, |already|,
|aux|, and |mode_line|, which live in a \PASCAL\ record that is ready to
be pushed onto |nest| if necessary.

@d ignore_depth==-65536000 {|prev_depth| value that is ignored}

	@!head_field,@!tail_field: pointer;
	@!already_field,@!aux_field,@!ml_field: integer;

@ @d mode==cur_list.mode_field {current mode}
@d head==cur_list.head_field {header node of current list}
@d tail==cur_list.tail_field {final node on current list}
@d already==cur_list.already_field {number of paragraph lines accumulated}
@d aux==cur_list.aux_field {auxiliary data about the current list}
@d prev_depth==aux {the name of |aux| in vertical mode}
@d space_factor==aux {the name of |aux| in horizontal mode}
@d incompleat_noad==aux {the name of |aux| in math mode}
@d mode_line==cur_list.ml_field {source file line number at beginning of list}

@!nest:array[0..nest_size] of list_state_record;
@!nest_ptr:0..nest_size; {first unused location of |nest|}
@!max_nest_stack:0..nest_size; {maximum of |nest_ptr| when pushing}
@!cur_list:list_state_record; {the ``top'' semantic state}
@!shown_mode:-mmode..mmode; {most recent mode shown by \.{\\tracingcommands}}

@ Here is a common way to make the current list grow:

@d tail_append(#)==begin link(tail)←#; tail←link(tail);

@ We will see later that the vertical list at the bottom semantic level is split
into two parts; the ``current page'' runs from |page_head| to |page_tail|,
and the ``contribution list'' runs from |contrib_head| to |tail| of
semantic level zero. The idea is that contributions are first formed in
vertical mode, then ``contributed'' to the current page (during which time
the page-breaking decisions are made). For now, we don't need to know
any more details about the page-building process.

@<Set init...@>=
nest_ptr←0; max_nest_stack←0;
mode←vmode; head←contrib_head; tail←contrib_head;
prev_depth←ignore_depth; mode_line←0; already←0; shown_mode←0;@/
@<Start a new current page@>;

@ When \TeX's work on one level is interrupted, the state is saved by
calling |push_nest|. This routine changes |head| and |tail| so that
a new (empty) list is begun; it does not change |mode| or |aux|.

@p procedure push_nest; {enter a new semantic level, save the old}
begin if nest_ptr>max_nest_stack then
	begin max_nest_stack←nest_ptr;
	if nest_ptr=nest_size then overflow("semantic nest size",nest_size);
nest[nest_ptr]←cur_list; {stack the record}
incr(nest_ptr); head←get_avail; tail←head; already←0; mode_line←line;

@ Conversely, when \TeX\ is finished on the current level, the former
state is restored by calling |pop_nest|. This routine will never be
called at the lowest semantic level, nor will it be called unless |head|
is a node that should be returned to free memory.

@p procedure pop_nest; {leave a semantic level, re-enter the old}
begin free_avail(head); decr(nest_ptr); cur_list←nest[nest_ptr];

@ Here is a procedure that displays what \TeX\ is working on, at all levels.

@p procedure show_activities;
var p:0..nest_size; {index into |nest|}
@!m:-mmode..mmode; {mode}
@!a:integer; {auxiliary}
@!q,@!r:pointer; {for showing the current page}
begin nest[nest_ptr]←cur_list; {put the top level into the array}
for p←nest_ptr downto 0 do
	begin m←nest[p].mode_field; a←nest[p].aux_field;
	print_nl("### "); print_mode(m);
	print(" entered at line "); print_int(abs(nest[p].ml_field));
	if nest[p].ml_field<0 then print(" (\output routine)");
	if p=0 then
		begin print_nl("### current page:");
		@<Show the status of the current page@>;
		print_nl("### recent contributions:");
	case abs(m) div (max_command+1) of
	0: begin print_nl("prevdepth ");
		if a≤ignore_depth then print("ignored")
		else print_scaled(a);
		if nest[p].already_field≠0 then
			begin print(", already ");
	1: begin print_nl("spacefactor "); print_int(a);
	2: if a≠null then
		begin print("this will be denominator of:"); show_box(a);
	end; {there are no other cases}
@* \[17] The table of equivalents.
Now that we have studied the data structures for \TeX's semantic routines,
let us consider the data structures used by its syntactic routines. In
other words, we turn now to the tables that \TeX\ looks at when it is scanning
what the user has written.

The biggest and most important such table is called |eqtb|. It holds the
current ``equiv\-a\-lents'' of things; i.e., it explains what things mean
or what their current values are, for all quantities that are subject to
the nesting structure provided by \TeX's grouping mechanism. There are six
parts to |eqtb|:

\yskip\hang 1) |eqtb[single_base..(hash_base-1)]| holds the current
equivalents of single-character control sequences.

\yskip\hang 2) |eqtb[hash_base..(glue_base-1)]| holds the current
equivalents of multi-letter control sequences.

\yskip\hang 3) |eqtb[glue_base..(local_base-1)]| holds the current
equivalents of glue parameters like the current baselineskip.

\yskip\hang 4) |eqtb[local_base..(int_base-1)]| holds the current
equivalents of local halfword quantities like the current box registers,
the current ``chcodes,'' the current font, and a pointer to the current
paragraph shape.

\yskip\hang 5) |eqtb[int_base..(dimen_base-1)]| holds the current
equivalents of fullword integer parameters like the current hyphenation

\yskip\hang 6) |eqtb[dimen_base..eqtb_size]| holds the current equivalents
of fullword di\-men\-sion parameters like the current hsize or amount of
hanging indentation.

\yskip\noindent Note that, for example, the current amount of
baselineskip glue is determined by the setting of a particular location
in region@@3 of |eqtb|, while the current meaning of the control sequence
`\.{\\baselineskip}' (which might have been changed by \.{\\def} or
\.{\\let}) appears in region@@2.

@ Each entry in |eqtb| is a |memory_word|. Most of these words are of type
|two_halves|, and subdivided into three fields:

\yskip\hang 1) The |eq_level| (a quarterword) is the level of grouping at
which this equivalent was defined. If the level is |level_zero|, the
equivalent has never been defined; |level_one| refers to the outer level
(outside of all groups), and this level is also used for global
definitions that never go away. Higher levels are for equivalents that
will disappear at the end of their group.  @^global definitions@>

\yskip\hang 2) The |eq_type| (another quarterword) specifies what kind of
entry this is. There are many types, since each \TeX\ primitive like
\.{\\hbox}, \.{\\def}, etc., has its own special code. The list of
command codes above includes all possible settings of the |eq_type| field.

\yskip\hang 3) The |equiv| (a halfword) is the current equivalent value.
This may be a font number, a pointer into |mem|, or a variety of other

@d eq_level_field(#)==#.hh.b1
@d eq_type_field(#)==#.hh.b0
@d equiv_field(#)==#.hh.rh
@d eq_level(#)==eq_level_field(eqtb[#]) {level of definition}
@d eq_type(#)==eq_type_field(eqtb[#]) {command code for equivalent}
@d equiv(#)==equiv_field(eqtb[#]) {equivalent value}
@d level_zero=min_quarterword {level for undefined quantities}
@d level_one=level_zero+1 {outermost level for defined quantities}

@ Many locations in |eqtb| have symbolic names. The purpose of the next
paragraphs is to define these names, and to set up the initial values of the

In the first region we have 128 equivalents for single-character control
sequences, followed by 128 equivalents for ``active characters'' that
act as control sequences.

Then comes region@@2, which corresponds to the hash table that we will
define later.  The maximum address in this region is used for a dummy
control sequence that is perpetually undefined. There also are three
locations for special control sequences that are perpetually defined
(since they are used in error recovery).

@d single_base=1 {beginning of region 1, for single-letter control sequences}
@d active_base=single_base+128 {equivalents of active characters}
@d hash_base=active_base+128 {beginning of region 2, for the hash table}
@d special_control_sequence=hash_base+hash_size {for error recovery}
@d undefined_control_sequence=hash_base+hash_size+3 {dummy location}
@d glue_base=undefined_control_sequence+1 {beginning of region 3}

@<Initialize table entries...@>=
for k←single_base to undefined_control_sequence-1 do

@ Region 3 of |eqtb| contains the 256 \.{\\skip} registers, as well as the
glue parameters defined here. It is important that the ``mskip''
parameters have larger numbers than the others.

@d line_skip_code=0 {interline glue if |baseline_skip| is unfeasible}
@d baseline_skip_code=1 {desired glue between baselines}
@d par_skip_code=2 {extra glue just above a paragraph}
@d disp_skip_code=3 {extra glue just above and below displayed math}
@d disp_a_skip_code=4 {glue above displayed math following short lines}
@d disp_b_skip_code=5 {glue below displayed math following short lines}
@d left_skip_code=6 {glue at left of justified lines}
@d right_skip_code=7 {glue at right of justified lines}
@d top_skip_code=8 {glue at top of main pages}
@d split_top_skip_code=9 {glue at top of split pages}
@d tab_skip_code=10 {glue between aligned entries}
@d space_skip_code=11 {glue between words (if not |zero_glue|)}
@d xspace_skip_code=12 {glue after sentences (if not |zero_glue|)}
@d par_fill_skip_code=13 {glue on last line of paragraph}
@d thin_mskip_code=14 {thin space in math formula}
@d med_mskip_code=15 {medium space in math formula}
@d thick_mskip_code=16 {thick space in math formula}
@d glue_pars=17 {total number of glue parameters}
@d skip_base=glue_base+glue_pars {table of 256 ``skip'' registers}
@d local_base=skip_base+256 {beginning of region 6}
@d skip(#)==equiv(skip_base+#) {|mem| location of glue specification}
@d glue_par(#)==equiv(glue_base+#) {|mem| location of glue specification}
@d line_skip==glue_par(line_skip_code)
@d baseline_skip==glue_par(baseline_skip_code)
@d par_skip==glue_par(par_skip_code)
@d disp_skip==glue_par(disp_skip_code)
@d disp_a_skip==glue_par(disp_a_skip_code)
@d disp_b_skip==glue_par(disp_b_skip_code)
@d left_skip==glue_par(left_skip_code)
@d right_skip==glue_par(right_skip_code)
@d top_skip==glue_par(top_skip_code)
@d split_top_skip==glue_par(split_top_skip_code)
@d tab_skip==glue_par(tab_skip_code)
@d space_skip==glue_par(space_skip_code)
@d xspace_skip==glue_par(xspace_skip_code)
@d par_fill_skip==glue_par(par_fill_skip_code)
@d thin_mskip==glue_par(thin_mskip_code)
@d med_mskip==glue_par(med_mskip_code)
@d thick_mskip==glue_par(thick_mskip_code)

@<Current |mem| equivalent of glue parameter number |n|@>=glue_par(n)

@ Sometimes we need to convert \TeX's internal code numbers into symbolic
form. The |print_skip_param| routine gives the symbolic name of a glue

@<Declare the procedure called |print_skip_param|@>=
procedure print_skip_param(@!n:integer);
begin case n of
line_skip_code: print_esc("lineskip");
baseline_skip_code: print_esc("baselineskip");
par_skip_code: print_esc("parskip");
disp_skip_code: print_esc("dispskip");
disp_a_skip_code: print_esc("dispaskip");
disp_b_skip_code: print_esc("dispbskip");
left_skip_code: print_esc("leftskip");
right_skip_code: print_esc("rightskip");
top_skip_code: print_esc("topskip");
split_top_skip_code: print_esc("splittopskip");
tab_skip_code: print_esc("tabskip");
space_skip_code: print_esc("spaceskip");
xspace_skip_code: print_esc("xspaceskip");
par_fill_skip_code: print_esc("parfillskip");
thin_mskip_code: print_esc("thinmskip");
med_mskip_code: print_esc("medmskip");
thick_mskip_code: print_esc("thickmskip");
othercases print("[unknown glue parameter!]")

@ The symbolic names for glue parameters are put into \TeX's hash table
by using the routine called |primitive|, defined below. Let us enter them
now, so that we don't have to list all those parameter names anywhere else.

@<Put each of \TeX's primitives into the hash table@>=
@!@:line_skip_}{\.{\\lineskip} primitive@>
@!@:baseline_skip_}{\.{\\baselineskip} primitive@>
@!@:par_skip_}{\.{\\parskip} primitive@>
@!@:disp_skip_}{\.{\\dispskip} primitive@>
@!@:disp_a_skip_}{\.{\\dispaskip} primitive@>
@!@:disp_b_skip_}{\.{\\dispbskip} primitive@>
@!@:left_skip_}{\.{\\leftskip} primitive@>
@!@:right_skip_}{\.{\\rightskip} primitive@>
@!@:top_skip_}{\.{\\topskip} primitive@>
@!@:split_top_skip_}{\.{\\splittopskip} primitive@>
@!@:tab_skip_}{\.{\\tabskip} primitive@>
@!@:space_skip_}{\.{\\spaceskip} primitive@>
@!@:xspace_skip_}{\.{\\xspaceskip} primitive@>
@!@:par_fill_skip_}{\.{\\parfillskip} primitive@>
@!@:thin_mskip_}{\.{\\thinskip} primitive@>
@!@:med_mskip_}{\.{\\medmskip} primitive@>
@!@:thick_mskip_}{\.{\\thickmskip} primitive@>

@ @<Cases of |print_cmd_chr| for symbolic printing of primitives@>=

@ All glue parameters and registers are initially `\.{0pt plus0pt minus0pt}'.

@<Initialize table entries...@>=
equiv(glue_base)←zero_glue; eq_level(glue_base)←level_one;
for k←glue_base+1 to local_base-1 do eqtb[k]←eqtb[glue_base];

@ Region 4 of |eqtb| contains the local quantities defined here. The
bulk of this region is taken up by five tables that are indexed by seven-bit
characters; these tables are important to both the syntactic and semantic
portions of \TeX. There are also a bunch of special things like font and
token parameters, as well as the table of 256 \.{\\box} registers.

@d par_shape_loc=local_base {specifies paragraph shape}
@d output_routine_loc=local_base+1 {points to token list for \.{\\output}}
@d every_par_loc=local_base+2 {points to token list for \.{\\everypar}}
@d box_base=local_base+3 {table of 256 box registers}
@d cur_font_loc=box_base+256 {internal font number outside math mode}
@d math_font_base=cur_font_loc+1 {table of 48 math font numbers}
@d cur_fam_loc=math_font_base+48 {current font family}
@d ch_code_base=cur_fam_loc+1 {table of 128 character command codes}
@d math_code_base=ch_code_base+128 {table of 128 math mode mappings}
@d lc_code_base=math_code_base+128 {table of 128 lower case mappings}
@d uc_code_base=lc_code_base+128 {table of 128 upper case mappings}
@d sf_code_base=uc_code_base+128 {table of 128 space factor mappings}
@d int_base=sf_code_base+128 {beginning of region 5}
@d par_shape_ptr==equiv(par_shape_loc)
@d output_routine==equiv(output_routine_loc)
@d every_par==equiv(every_par_loc)
@d box(#)==equiv(box_base+#)
@d cur_font==equiv(cur_font_loc)
@d cur_fam==equiv(cur_fam_loc)
@d fam_fnt(#)==equiv(math_font_base+#)
@d ch_code(#)==equiv(ch_code_base+#)
@d math_code(#)==equiv(math_code_base+#)
@d lc_code(#)==equiv(lc_code_base+#)
@d uc_code(#)==equiv(uc_code_base+#)
@d sf_code(#)==equiv(sf_code_base+#)

@<Put each...@>=
@!@:output_}{\.{\\output} primitive@>
@!@:every_par_}{\.{\\everypar} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
assign_toks: if chr_code=output_routine_loc then print_esc("output")
	else print_esc("everypar");

@ We initialize most things to null or undefined values. An undefined font
is represented by the internal code |font_base|.

However, the character code tables are given initial values based on the
conventional interpretation of ascii code.

@d undefined_font==font_base

@<Initialize table entries...@>=
par_shape_ptr←null; eq_type(par_shape_loc)←shape_ref;
box(0)←null; eq_type(box_base)←box_ref; eq_level(box_base)←level_one;
for k←box_base+1 to box_base+255 do eqtb[k]←eqtb[box_base];
cur_font←undefined_font; eq_type(cur_font_loc)←data;
for k←math_font_base to math_font_base+47 do eqtb[k]←eqtb[cur_font_loc];
cur_fam←0; eq_type(cur_fam_loc)←data; eq_level(cur_fam_loc)←level_one;@/
for k←ch_code_base to int_base-1 do eqtb[k]←eqtb[cur_fam_loc];
for k←0 to 127 do
	begin ch_code(k)←other_char; math_code(k)←k; sf_code(k)←1000;
ch_code(carriage_return)←car_ret; ch_code(" ")←spacer; ch_code("\")←escape;
ch_code(invalid_code)←invalid_char; ch_code(null_code)←ignore;
for k←"A" to "Z" do
	begin ch_code(k)←letter; ch_code(k+"a"-"A")←letter;@/
	math_code(k)←k+@'70400; math_code(k+"a"-"A")←k+"a"-"A"+@'70400;@/
	lc_code(k)←k+"a"-"A"; lc_code(k+"a"-"A")←k+"a"-"A";@/
	uc_code(k)←k; uc_code(k+"a"-"A")←k;@/

@ Region 5 of |eqtb| contains the integer parameters and registers defined
here, as well as the |del_code| table. The latter table differs from the
|ch_code..sf_code| tables that precede it, since delimiter codes are
fullword integers while the other kinds of codes occupy at most a
halfword. This is what makes region@@5 different from region 4.

@d pretolerance_code=0 {badness tolerance before hyphenation}
@d tolerance_code=1 {badness tolerance after hyphenation}
@d line_penalty_code=2 {added to the badness of every line}
@d hyphen_penalty_code=3 {penalty for break after discretionary hyphen}
@d ex_hyphen_penalty_code=4 {penalty for break after explicit hyphen}
@d widow_penalty_code=5 {penalty for creating a widow line}
@d display_widow_penalty_code=6 {ditto, just before a display}
@d broken_penalty_code=7 {penalty for breaking a page at a broken line}
@d bin_op_penalty_code=8 {penalty for breaking after a binary operation}
@d rel_penalty_code=9 {penalty for breaking after a relation}
@d pre_display_penalty_code=10
	{penalty for breaking just before a displayed formula}
@d post_display_penalty_code=11
	{penalty for breaking just after a displayed formula}
@d inter_line_penalty_code=12 {additional penalty between lines}
@d double_hyphen_demerits_code=13 {demerits for double hyphen break}
@d final_hyphen_demerits_code=14 {demerits for final hyphen break}
@d adj_demerits_code=15 {demerits for adjacent incompatible lines}
@d mag_code=16 {magnification ratio}
@d delimiter_factor_code=17 {ratio for variable-size delimiters}
@d looseness_code=18 {change in number of lines for a paragraph}
@d time_code=19 {current time of day}
@d day_code=20 {current day of the month}
@d month_code=21 {current month of the year}
@d year_code=22 {current year of our Lord}
@d show_box_breadth_code=23 {nodes per level in |show_box|}
@d show_box_depth_code=24 {maximum level in |show_box|}
@d hbadness_code=25 {show hboxes exceeding this badness}
@d vbadness_code=26 {show vboxes exceeding this badness}
@d pause_code=27 {pause after each line is read from a file}
@d tracing_online_code=28 {show diagnostic output on terminal}
@d tracing_macros_code=29 {show macros as they are being expanded}
@d tracing_stats_code=30 {show memory usage if \TeX\ knows it}
@d tracing_output_code=31 {show boxes when they are shipped out}
@d tracing_lost_chars_code=32 {show characters that aren't in the font}
@d tracing_commands_code=33 {show command codes at |big_switch|}
@d uc_hyph_code=34 {hyphenate words beginning with a capital letter}
@d output_penalty_code=35 {penalty found at current page break}
@d hang_after_code=36 {hanging indentation changes after this many lines}
@d int_pars=37 {total number of integer parameters}
@d count_base=int_base+int_pars {256 user \.{\\count} registers}
@d del_code_base=count_base+256 {128 delimiter code mappings}
@d dimen_base=del_code_base+128 {beginning of region 6}
@d del_code(#)==eqtb[del_code_base+#].int
@d count(#)==eqtb[count_base+#].int
@d int_par(#)==eqtb[int_base+#].int {an integer parameter}
@d pretolerance==int_par(pretolerance_code)
@d tolerance==int_par(tolerance_code)
@d line_penalty==int_par(line_penalty_code)
@d hyphen_penalty==int_par(hyphen_penalty_code)
@d ex_hyphen_penalty==int_par(ex_hyphen_penalty_code)
@d widow_penalty==int_par(widow_penalty_code)
@d display_widow_penalty==int_par(display_widow_penalty_code)
@d broken_penalty==int_par(broken_penalty_code)
@d bin_op_penalty==int_par(bin_op_penalty_code)
@d rel_penalty==int_par(rel_penalty_code)
@d pre_display_penalty==int_par(pre_display_penalty_code)
@d post_display_penalty==int_par(post_display_penalty_code)
@d inter_line_penalty==int_par(inter_line_penalty_code)
@d double_hyphen_demerits==int_par(double_hyphen_demerits_code)
@d final_hyphen_demerits==int_par(final_hyphen_demerits_code)
@d adj_demerits==int_par(adj_demerits_code)
@d mag==int_par(mag_code)
@d delimiter_factor==int_par(delimiter_factor_code)
@d looseness==int_par(looseness_code)
@d time==int_par(time_code)
@d day==int_par(day_code)
@d month==int_par(month_code)
@d year==int_par(year_code)
@d show_box_breadth==int_par(show_box_breadth_code)
@d show_box_depth==int_par(show_box_depth_code)
@d hbadness==int_par(hbadness_code)
@d vbadness==int_par(vbadness_code)
@d pause==int_par(pause_code)
@d tracing_online==int_par(tracing_online_code)
@d tracing_macros==int_par(tracing_macros_code)
@d tracing_stats==int_par(tracing_stats_code)
@d tracing_output==int_par(tracing_output_code)
@d tracing_lost_chars==int_par(tracing_lost_chars_code)
@d tracing_commands==int_par(tracing_commands_code)
@d uc_hyph==int_par(uc_hyph_code)
@d output_penalty==int_par(output_penalty_code)
@d hang_after==int_par(hang_after_code)

@<Assign the values |depth_threshold←show_box_depth|...@>=

@ We can print the symbolic name of an integer parameter as follows.

@p procedure print_param(@!n:integer);
begin case n of
othercases print("[unknown integer parameter!]")

@ The integer parameter names must be entered into the hash table.

@<Put each...@>=
@!@:pretolerance_}{\.{\\pretolerance} primitive@>
@!@:tolerance_}{\.{\\tolerance} primitive@>
@!@:line_penalty_}{\.{\\linepenalty} primitive@>
@!@:hyphen_penalty_}{\.{\\hyphenpenalty} primitive@>
@!@:ex_hyphen_penalty_}{\.{\\exhyphenpenalty} primitive@>
@!@:widow_penalty_}{\.{\\widowpenalty} primitive@>
@!@:display_widow_penalty_}{\.{\\displaywidowpenalty} primitive@>
@!@:broken_penalty_}{\.{\\brokenpenalty} primitive@>
@!@:bin_op_penalty_}{\.{\\binoppenalty} primitive@>
@!@:rel_penalty_}{\.{\\relpenalty} primitive@>
@!@:pre_display_penalty_}{\.{\\predisplaypenalty} primitive@>
@!@:post_display_penalty_}{\.{\\postdisplaypenalty} primitive@>
@!@:inter_line_penalty_}{\.{\\interlinepenalty} primitive@>
@!@:double_hyphen_demerits_}{\.{\\doublehyphendemerits} primitive@>
@!@:final_hyphen_demerits_}{\.{\\finalhyphendemerits} primitive@>
@!@:adj_demerits_}{\.{\\adjdemerits} primitive@>
@!@:mag_}{\.{\\mag} primitive@>
@!@:delimiter_factor_}{\.{\\delimiterfactor} primitive@>
@!@:looseness_}{\.{\\looseness} primitive@>
@!@:time_}{\.{\\time} primitive@>
@!@:day_}{\.{\\day} primitive@>
@!@:month_}{\.{\\month} primitive@>
@!@:year_}{\.{\\year} primitive@>
@!@:show_box_breadth_}{\.{\\showboxbreadth} primitive@>
@!@:show_box_depth_}{\.{\\showboxdepth} primitive@>
@!@:hbadness_}{\.{\\hbadness} primitive@>
@!@:vbadness_}{\.{\\vbadness} primitive@>
@!@:pause_}{\.{\\pause} primitive@>
@!@:tracing_online_}{\.{\\tracingonline} primitive@>
@!@:tracing_macros_}{\.{\\tracingmacros} primitive@>
@!@:tracing_stats_}{\.{\\tracingstats} primitive@>
@!@:tracing_output_}{\.{\\tracingoutput} primitive@>
@!@:tracing_lost_chars_}{\.{\\tracinglostchars} primitive@>
@!@:tracing_commands_}{\.{\\tracingcommands} primitive@>
@!@:uc_hyph_}{\.{\\uchyph} primitive@>
@!@:output_penalty_}{\.{\\outputpenalty} primitive@>
@!@:hang_after_}{\.{\\hangafter} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
assign_int: print_param(chr_code);

@ The integer parameters should really be initialized by a macro package;
the following initialization does the minimum to keep \TeX\ from
complete failure.

@<Initialize table entries...@>=
for k←int_base to del_code_base-1 do eqtb[k].int←0;
mag←1000; tolerance←10000;
for k←0 to 127 do del_code(k)←-1;

@ The following procedure, which is called just before \TeX\ initializes its
input and output, establishes the initial values of the date and time.
@^system dependencies@>
Since standard \PASCAL\ cannot provide such information, something special
is needed. The program here simply specifies July 4, 1776, at noon; but
users probably want a better approximation to the truth.

@p procedure fix_date_and_time;
begin time←12*60; {minutes since midnight}
day←4; {fourth day of the month}
month←7; {seventh month of the year}
year←1776; {Anno Domini}

@ The final region of |eqtb| contains the dimension parameters defined
here, and the 256 \.{\\dimen} registers.  It also contains the current
amount of hanging indentation, which is slightly special.

@d par_indent_code=0 {indentation of paragraphs}
@d math_surround_code=1 {space around math in text}
@d var_unit_code=2 {variable unit for dimensions}
@d line_skip_limit_code=3 {threshold for |line_skip| instead of |baseline_skip|}
@d hsize_code=4 {line width in horizontal mode}
@d vsize_code=5 {page height in vertical mode}
@d max_depth_code=6 {maximum depth of boxes on main pages}
@d split_max_depth_code=7 {maximum depth of boxes on split pages}
@d hfuzz_code=8 {tolerance for overfull hbox messages}
@d vfuzz_code=9 {tolerance for overfull vbox messages}
@d delimiter_limit_code=10 {maximum amount uncovered by variable delimiters}
@d null_delimiter_space_code=11 {blank space in null delimiters}
@d script_space_code=12 {extra space after subscript or superscript}
@d pre_display_size_code=13 {length of text preceding a display}
@d display_width_code=14 {length of line for displayed equation}
@d display_indent_code=15 {indentation of line for displayed equation}
@d overfull_rule_code=16 {width of rule that identifies overfull hboxes}
@d dimen_pars=17 {total number of dimension parameters}
@d hanging_indent_code=dimen_pars {amount of hanging indentation}
@d scaled_base=dimen_base+dimen_pars+1
	{table of 256 user-defined \.{\\dimen} registers}
@d eqtb_size=scaled_base+255 {largest subscript of |eqtb|}
@d dimen(#)==eqtb[scaled_base+#].sc
@d dimen_par(#)==eqtb[dimen_base+#].sc {a scaled quantity}
@d par_indent==dimen_par(par_indent_code)
@d math_surround==dimen_par(math_surround_code)
@d var_unit==dimen_par(var_unit_code)
@d line_skip_limit==dimen_par(line_skip_limit_code)
@d hsize==dimen_par(hsize_code)
@d vsize==dimen_par(vsize_code)
@d max_depth==dimen_par(max_depth_code)
@d split_max_depth==dimen_par(split_max_depth_code)
@d hfuzz==dimen_par(hfuzz_code)
@d vfuzz==dimen_par(vfuzz_code)
@d delimiter_limit==dimen_par(delimiter_limit_code)
@d null_delimiter_space==dimen_par(null_delimiter_space_code)
@d script_space==dimen_par(script_space_code)
@d pre_display_size==dimen_par(pre_display_size_code)
@d display_width==dimen_par(display_width_code)
@d display_indent==dimen_par(display_indent_code)
@d overfull_rule==dimen_par(overfull_rule_code)
@d hanging_indent==dimen_par(hanging_indent_code)

@p procedure print_length_param(@!n:integer);
begin case n of
othercases print("[unknown dimen parameter!]")

@ @<Put each...@>=
@!@:par_indent_}{\.{\\parindent} primitive@>
@!@:math_surround_}{\.{\\mathsurround} primitive@>
@!@:var_unit_}{\.{\\varunit} primitive@>
@!@:line_skip_limit_}{\.{\\lineskiplimit} primitive@>
@!@:hsize_}{\.{\\hsize} primitive@>
@!@:vsize_}{\.{\\vsize} primitive@>
@!@:max_depth_}{\.{\\maxdepth} primitive@>
@!@:split_max_depth_}{\.{\\splitmaxdepth} primitive@>
@!@:hfuzz_}{\.{\\hfuzz} primitive@>
@!@:vfuzz_}{\.{\\vfuzz} primitive@>
@!@:delimiter_limit_}{\.{\\delimiterlimit} primitive@>
@!@:null_delimiter_space_}{\.{\\nulldelimiterspace} primitive@>
@!@:script_space_}{\.{\\scriptspace} primitive@>
@!@:pre_display_size_}{\.{\\predisplaysize} primitive@>
@!@:display_width_}{\.{\\displaywidth} primitive@>
@!@:display_indent_}{\.{\\displayindent} primitive@>
@!@:overfull_rule_}{\.{\\overfullrule} primitive@>

@ @<Cases of |print_cmd_chr|...@>=

@ @<Initialize table entries...@>=
for k←dimen_base to eqtb_size do eqtb[k].sc←0;

@ The last two regions of |eqtb| have fullword values instead of the
three fields |eq_level|, |eq_type|, and |equiv|. An |eq_type| is unnecessary,
but \TeX\ needs to store the |eq_level| information in another array
called |xeq_level|.

@!eqtb:array[single_base..eqtb_size] of memory_word;
@!xeq_level:array[int_base..eqtb_size] of quarterword;

@ @<Set init...@>=
for k←int_base to eqtb_size do xeq_level[k]←level_one;

@ When the debugging routine |search_mem| is looking for pointers having a
given value, it is interested only in regions 1 to@@3 of@@|eqtb|, and in the
first part of region@@4.

@<Search |eqtb| for equivalents equal to |p|@>=
for q←single_base to box_base+255 do
	begin if equiv(q)=p then
		begin print_nl("EQUIV("); print_int(q); print_char(")");

@ \TeX\ is occasionally supposed to print diagnostic information that
goes only into the transcript file, unless |tracing_online| is nonzero.
Here are two routines that adjust the destination of print commands:

@p procedure begin_diagnostic; {prepare to do some tracing}
begin old_setting←selector; save_offset←offset;
if (tracing_online=0)∧(selector=term_and_err) then decr(selector);
procedure end_diagnostic; {restore proper conditions after tracing}
begin if offset>0 then print_ln;
if (tracing_online=0)∧(old_setting=term_and_err) then offset←save_offset;

@ Of course we had better declare two more global variables, if the previous
routines are going to work.

@* \[18] The hash table.
Control sequences are stored and retrieved by means of a fairly standard hash
table algorithm called the method of ``coalescing lists'' (cf.\ Algorithm 6.4C
in {\sl The Art of Computer Programming\/}). Once a control sequence enters the
table, it is never removed again, because there are complicated situations
involving \.{\\gdef} where the removal of a control sequence at the end of
a group would be a mistake preventable only by the introduction of a
complicated reference-count mechanism.

The actual sequence of letters forming a control sequence identifier is
stored in the |str_pool| array together with all the other strings. An
auxiliary array |hash| consists of items with two halfword fields per
word. The first of these, called |next(p)|, points to the next identifier
belonging to the same coalesced list as the identifier corresponding to@@|p|;
and the other, called |text(p)|, points to the |str_start| entry for
|p|'s identifier. If position@@|p| of the hash table is empty, we have
|text(p)=0|; if position |p| is either empty or the end of a coalesced
hash list, we have |next(p)=0|. An auxiliary pointer variable called
|hash_used| is maintained in such a way that all locations |p≥hash_used|
are nonempty. The global variable |cs_count| tells how many multi-letter
control sequences have been defined, if statistics are being kept.

A global boolean variable called |no_new_control_sequence| is set to
|true| during the time that new hash table entries are forbidden. Another
global variable, |cs_ptr|, records the position in |eqtb| of the most recently
found control sequence.

@d next(#) == hash[#].lh {link for coalesced lists}
@d text(#) == hash[#].rh {string number for control sequence name}
@d hash_is_full == (hash_used=hash_base) {test if all positions are occupied}

@!hash: array[hash_base..undefined_control_sequence-1] of two_halves;
	{the hash table}
@!hash_used:pointer; {allocation pointer for |hash|}
@!no_new_control_sequence:boolean; {are new identifiers legal?}
@!cs_ptr:pointer; {control sequence found here, zero if none found}
@!cs_count:integer; {total number of known identifiers}

@ @<Set init...@>=
no_new_control_sequence←true; {new identifiers are usually forbidden}
next(hash_base)←0; text(hash_base)←0;
for k←hash_base+1 to undefined_control_sequence-1 do hash[k]←hash[hash_base];

@ @<Initialize table entries...@>=
hash_used←special_control_sequence; {nothing is used}

@ Here is the subroutine that searches the hash table for an identifier
that matches a given string of length |l>1| appearing in |buffer[j..
(j+l-1)]|. If the identifier is found, the corresponding hash table address
is returned. Otherwise, if the global variable |no_new_control_sequence|
is |true|, the dummy address |undefined_control_sequence| is returned.
Otherwise the identifier is inserted into the hash table and its location
is returned.

@p function id_lookup(@!j,@!l:integer):pointer; {search the hash table}
label found; {go here if you found it}
var h:integer; {hash code}
@!p:pointer; {index in |hash| array}
@!k:pointer; {index in |buffer| array}
begin @<Compute the hash code |h|@>;
p←h+hash_base; {we start searching here; note that |0≤h<hash_prime|}
loop@+begin if (text(p)>0) and (length(text(p))=l) then
		if str_eq_buf(text(p),j) then goto found;
	if next(p)=0 then
		begin if no_new_control_sequence then
		else @<Insert a new control sequence after |p|, then make
			|p| point to it@>;
		goto found;
found: id_lookup←p;

@ @<Insert a new control...@>=
begin if text(p)>0 then
	begin repeat if hash_is_full then overflow("hash size",hash_size);
	until text(hash_used)=0; {search for an empty location in |hash|}
	next(p)←hash_used; p←hash_used;
for k←j to j+l-1 do append_char(buffer[k]);
stat incr(cs_count);@+tats@;@/

@ The value of |hash_prime| should be roughly 85\%\ of |hash_size|, and it
should be a prime number.  The theory of hashing tells us to expect fewer
than two table probes, on the average, when the search is successful.

@<Compute the hash code |h|@>=
for k←j+1 to j+l-1 do
	begin h←h+h+buffer[k];
	while h≥hash_prime do h←h-hash_prime;

@ Single-character control sequences do not need to be looked up in a hash
table, since we can use the character code itself as a direct address.
The procedure |print_cs| prints the name of a control sequence, given
a pointer to its address in |eqtb|. A space is printed after the name
if it consists entirely of letters. This procedure might be invoked with
invalid data, so it is ``extra robust''.

@<Basic printing...@>=
procedure print_cs(@!p:pointer); {prints a purported control sequence}
begin if p<hash_base then {single character}
	if p≥active_base then print(p-active_base)
	else if p<single_base then print_esc("IMPOSSIBLE.")
	else	begin print_esc(p-single_base);
		if ch_code(p)=letter then print_char(" ");
else if p≥undefined_control_sequence then print_esc("IMPOSSIBLE.")
else if text(p)=0 then print_esc("NONEXISTENT.")
else	begin print_esc(text(p)); print_char(" ");

@ Here is a similar procedure; it avoids the error checks, and it never
prints a space after the control sequence.

@p procedure sprint_cs(@!p:pointer); {prints a control sequence}
begin if p<hash_base then
	if p<active_base then print_esc(p-single_base)
	else print(p-active_base)
else print_esc(text(p));

@ We need to put \TeX's ``primitive'' control sequences into the hash
table, together with their command code (which will be the |eq_type|)
and an operand (which will be the |equiv|). The |primitive| procedure
does this, in a way that no \TeX\ user can. The global value |cur_val|
contains the new |eqtb| pointer after |primitive| has acted.

@p init procedure primitive(@!s:str_number;@!c:quarterword;@!o:halfword);
var k:pool_pointer; {index into |str_pool|}
@!j:small_number; {index into |buffer|}
@!l:small_number; {length of the string}
begin if s<128 then cur_val←s+single_base
else	begin k←str_start[s]; l←str_start[s+1]-k;
		{we will move |s| into the (empty) |buffer|}
	for j←0 to l-1 do buffer[j]←str_pool[k+j];
	cur_val←id_lookup(0,l); {|no_new_control_sequence| is |false|}
	flush_string; text(cur_val)←s; {we don't want to have the string twice}
eq_level(cur_val)←level_one; eq_type(cur_val)←c; equiv(cur_val)←o;

@ Many of \TeX's primitives need no |equiv|, since they are identifiable
by their |eq_type| alone. These primitives are loaded into the hash table
as follows:

@<Put each of \TeX's primitives into the hash table@>=
@!@:relax_}{\.{\\relax} primitive@>
@!@:let_}{\.{\\let} primitive@>
@!@:char_}{\.{\\char} primitive@>
@!@:math_char_}{\.{\\mathchar} primitive@>
@!@:mark_}{\.{\\mark} primitive@>
@!@:input_}{\.{\\input} primitive@>
@!@:penalty_}{\.{\\penalty} primitive@>
@!@:font_}{\.{\\font} primitive@>
@!@:Single-character primitives :}{\quad\.{\\:}@>
@!@:fam_}{\.{\\fam} primitive@>
@!@:number_}{\.{\\number} primitive@>
@!@:set_box_}{\.{\\setbox} primitive@>
@!@:unbox_}{\.{\\unbox} primitive@>
@!@:unskip_}{\.{\\unskip} primitive@>
@!@:last_skip_}{\.{\\lastskip} primitive@>
@!@:halign_}{\.{\\halign} primitive@>
@!@:valign_}{\.{\\valign} primitive@>
@!@:no_align_}{\.{\\noalign} primitive@>
@!@:vrule_}{\.{\\vrule} primitive@>
@!@:hrule_}{\.{\\hrule} primitive@>
@!@:insert_}{\.{\\insert} primitive@>
@!@:vadjust_}{\.{\\vadjust} primitive@>
@!@:ignore_space_}{\.{\\ignorespace} primitive@>
@!@:par_shape_}{\.{\\parshape} primitive@>
@!@:Single-character primitives /}{\quad\.{\\/}@>
@!@:accent_}{\.{\\accent} primitive@>
@!@:math_accent_}{\.{\\mathaccent} primitive@>
@!@:tex_info_}{\.{\\texinfo} primitive@>
@!@:delimiter_}{\.{\\delimiter} primitive@>
@!@:limit_switch_}{\.{\\limitswitch} primitive@>
@!@:non_script_}{\.{\\nonscript} primitive@>
@!@:vcenter_}{\.{\\vcenter} primitive@>
@!@:case_}{\.{\\case} primitive@>
@!@:else_}{\.{\\else} primitive@>
@!@:omit_}{\.{\\omit} primitive@>
@!@:group_begin_}{\.{\\groupbegin} primitive@>
@!@:group_end_}{\.{\\groupend} primitive@>
primitive(" ",ex_space,0);@/
@!@:Single-character primitives /}{\quad\.{\\\ }@>
@!@:radical_}{\.{\\radical} primitive@>

@ Each primitive has a corresponding inverse, so that it is possible to
display the cryptic numeric contents of |eqtb| in symbolic form.
Every call of |primitive| in this program is therefore accompanied by some
straightforward code that forms part of the |print_cmd_chr| routine
explained below.

@<Cases of |print_cmd_chr|...@>=
relax: print_esc("relax");
let: print_esc("let");
char_num: print_esc("char");
math_char_num: print_esc("mathchar");
mark: print_esc("mark");
input: print_esc("input");
break_penalty: print_esc("penalty");
def_font: print_esc("font");
set_font: print_esc(":");
set_family: print_esc("fam");
number: print_esc("number");
set_box: print_esc("setbox");
unbox: print_esc("unbox");
unskip: print_esc("unskip");
last_skip: print_esc("lastskip");
halign: print_esc("halign");
valign: print_esc("valign");
no_align: print_esc("noalign");
vrule: print_esc("vrule");
hrule: print_esc("hrule");
insert: print_esc("insert");
vadjust: print_esc("vadjust");
ignore_space: print_esc("ignorespace");
set_shape: print_esc("parshape");
ital_corr: print_esc("/");
accent: print_esc("accent");
math_accent: print_esc("mathaccent");
assign_tex_info: print_esc("texinfo");
delim_num: print_esc("delimiter");
limit_switch: print_esc("limitswitch");
non_script: print_esc("nonscript");
vcenter: print_esc("vcenter");
case_branch: print_esc("case");
else_code: print_esc("else");
omit: print_esc("omit");
group_begin: print_esc("groupbegin");
group_end: print_esc("groupend");
ex_space: print_esc(" ");
radical: print_esc("radical");

@ We will deal with the other primitives later, at some point in the program
where their |eq_type| and |equiv| values are more meaningful.  For example,
the primitives for math mode will be loaded when we consider the routines
that deal with formulas. It is easy to find where each particular
primitive was treated by looking in the index at the end; for example, the
module where |"radical"| entered |eqtb| is listed under `\.{\\radical}
primitive'. (Primitives consisting of a single nonalphabetic character,
@!like `\.{\\/}', are listed under `Single-character primitives'.)
@!@^Single-character primitives@>
@* \[19] Saving and restoring equivalents.
The nested structure provided by `$\.{\char'173}\ldotsm\.{\char'175}$' groups
in \TeX\ means that |eqtb| entries valid in outer groups should be saved
and restored later if they are overridden inside the braces. When a new |eqtb|
value is being assigned, the program therefore checks to see if the previous
entry belongs to an outer level. In such a case, the old value is placed
on the |save_stack| just before the new value enters |eqtb|. At the
end of a grouping level, i.e., when the right brace is sensed, the
|save_stack| is used to restore the outer values, and the inner ones are

Entries on the |save_stack| are of type |memory_word|. The top item on
this stack is |save_stack[p]|, where |p=save_ptr-1|; it contains three
fields called |save_type|, |save_level|, and |save_index|, and it is
interpreted in one of three ways:

\yskip\hang 1) If |save_type(p)=restore_old_value|, then
|save_index(p)| is a location in |eqtb| whose current value should
be destroyed at the end of the current group and replaced by |save_stack[p-1]|.
Furthermore if |save_index(p)≥int_base|, then |save_level(p)|
should replace the corresponding entry in |xeq_level|.

\yskip\hang 2) If |save_type(p)=restore_zero|, then |save_index(p)|
is a location in |eqtb| whose current value should be destroyed at the end
of the current group and replaced by |eqtb[undefined_control_sequence]|.

\yskip\hang 3) If |save_type(p)=level_boundary|, then |save_level(p)|
is a code explaining what kind of group we were previously in, and
|save_index(p)| points to the level boundary word at the bottom of
the entries for that group.

@d save_type(#)==save_stack[#].hh.b0 {classifies a |save_stack| entry}
@d save_level(#)==save_stack[#].hh.b1
	{saved level for regions 6 and 7, or group code}
@d save_index(#)==save_stack[#].hh.rh
	{|eqtb| location or |save_stack| location}
@d restore_old_value=0 {|save_type| when a value should be restored later}
@d restore_zero=1 {|save_type| when an undefined entry should be restored}
@d level_boundary=2 {|save_type| corresponding to beginning of group}

@ Here are the group codes that are used to discriminate between different
kinds of groups. They allow \TeX\ to decide what special actions, if any,
should be performed when a group ends.

Some groups are not supposed to be ended by right braces. For example,
the `\.\$' that begins a math formula causes a |math_delim_group| to
be started, and this should be terminated by a matching `\.\$'. Similarly,
a group that starts with \.{\\left} should end with \.{\\right}, and
one that starts with \.{\\groupbegin} should end with \.{\\groupend}.

@d bottom_level=0 {group code for the outside world}
@d simple_group=1 {group code for local structure only}
@d hbox_group=2 {code for `\.{\\hbox}\grp'}
@d vbox_group=3 {code for `\.{\\vbox}\grp'}
@d vtop_group=4 {code for `\.{\\vtop}\grp'}
@d true_group=5 {code for `\.{\\if...}\grp'}
@d false_group=6 {code for `\.{\\else}\grp'}
@d case_group=7 {code for `\.{\\case}\grp'}
@d align_group=8 {code for `\.{\\halign}\grp', `\.{\\valign}\grp'}
@d no_align_group=9 {code for `\.{\\noalign}\grp'}
@d output_group=10 {code for output routine}
@d math_group=11 {code for, e.g, `\.{\char'176}\grp'}
@d disc_group=12 {code for `\.{\\discretionary}\grp\grp\grp'}
@d insert_group=13 {code for `\.{\\insert}\grp', `\.{\\vadjust}\grp'}
@d vcenter_group=14 {code for `\.{\\vcenter}\grp'}
@d semi_simple_group=15 {code for `\.{\\groupbegin...\\groupend}'}
@d math_delim_group=16 {code for `\.{\$...\$}'}
@d math_left_group=17 {code for `\.{\\left...\\right}'}
@d max_group_code=17

@!group_code=0..max_group_code; {|save_index| for a level boundary}

@ The global variable |cur_group| keeps track of what sort of group we are
currently in. Another global variable, |cur_boundary|, points to the
topmost |level_boundary| word.  And |cur_level| is the current depth
of nesting, not counting |true_group| and |false_group| and |case_group|
levels. The routines are designed to preserve the condition that no entry
in the |save_stack| or in |eqtb| ever has a level greater than |cur_level|.

@ @<Globals...@>=
@!save_stack : array[0..save_size] of memory_word;
@!save_ptr : 0..save_size; {first unused entry on |save_stack|}
@!max_save_stack:0..save_size; {maximum usage of save stack}
@!cur_level: quarterword; {current nesting level for groups}
@!cur_group: group_code; {current group type}
@!cur_boundary: 0..save_size; {where the current level begins}

@ At this time it might be a good idea for the reader to review the introduction
to |eqtb| that was given above just before the long lists of parameter names.
Recall that the ``outer level'' of the program is |level_one|, since
undefined control sequences are assumed to be ``defined'' at |level_zero|.

@<Set init...@>=
save_ptr←0; cur_level←level_one; cur_group←bottom_level; cur_boundary←0;

@ The following macro is used to test if there is room for up to six more
entries on |save_stack|. By making a conservative test like this, we can
get by with testing for overflow in only a few places.

@d check_full_save_stack==if save_ptr>max_save_stack then
	begin max_save_stack←save_ptr;
	if max_save_stack>save_size-6 then overflow("save size",save_size);

@ Procedure |new_save_level| is called when a group begins. The
argument is a group identification code like `|hbox_group|'. After
calling this routine, it is safe to put five more entries on |save_stack|.

In some cases a few integer-valued items are placed onto the
|save_stack| just below a |level_boundary| word, because this is a
convenient place to keep information that is supposed to ``pop up'' just
when the group has finished.
For example, when `\.{\\hbox to 100pt}\grp' is being treated, the 100pt
dimension is stored on |save_stack| just before |new_save_level| is

We use the notation |saved(k)| to stand for an integer item that
appears in location |save_ptr+k| of the save stack.

@d saved(#)==save_stack[save_ptr+#].int

@p procedure new_save_level(@!c:group_code); {begin a new level of grouping}
begin check_full_save_stack;
save_type(save_ptr)←level_boundary; save_level(save_ptr)←cur_group;
if cur_level=max_quarterword then overflow("grouping levels",
	{quit if |(cur_level+1)| is too big to be stored in |eqtb|}
cur_boundary←save_ptr; incr(cur_level); incr(save_ptr); cur_group←c;

@ Just before an entry of |eqtb| is changed, the following procedure should
be called to update the other data structures properly. It is important
to keep in mind that reference counts in |m| include references from
within |save_stack|, so these counts must be handled carefully.

@p procedure eq_destroy(@!w:memory_word); {gets ready to forget |w|}
var q:pointer; {|equiv| field of |w|}
begin case eq_type_field(w) of
call,long_call,outer_call,long_outer_call: delete_token_ref(equiv_field(w));
glue_ref: delete_glue_ref(equiv_field(w));
shape_ref: begin q←equiv_field(w); {we need to free a \.{\\parshape} block}
	if q≠null then free_node(q,link(q)+link(q)+1);
box_ref: flush_node_list(equiv_field(w));
othercases do_nothing

@ To save a value of |eqtb[p]| that was defined at level |l|, we
can use the following subroutine.

@p procedure eq_save(@!p:pointer;@!l:quarterword); {saves |eqtb[p]|}
begin check_full_save_stack;
if l=level_zero then save_type(save_ptr)←restore_zero
else	begin save_stack[save_ptr]←eqtb[p]; incr(save_ptr);
save_level(save_ptr)←l; save_index(save_ptr)←p; incr(save_ptr);

@ The procedure |eq_define| defines an |eqtb| entry having specified
|eq_type| and |equiv| fields, and saves the former value if appropriate.
This procedure is used only for entries in the first four regions of |eqtb|,
i.e., only for entries that have |eq_type| and |equiv| fields.
After calling this routine, it is safe to put four more entries on
|save_stack|, provided that there was room for four more entries before
the call, since |eq_save| makes the necessary test.

@p procedure eq_define(@!p:pointer;@!t:quarterword;@!e:halfword);
	{new data for |eqtb|}
begin if eq_level(p)=cur_level then eq_destroy(eqtb[p])
else if cur_level>level_one then eq_save(p,eq_level(p));
eq_level(p)←cur_level; eq_type(p)←t; equiv(p)←e;

@ The counterpart of |eq_define| for the remaining (fullword) positions in
|eqtb| is called |eq_word_define|. Since |xeq_level[p]≥level_one| for all
|p|, a `|restore_zero|' will never be used in this case.

@p procedure eq_word_define(@!p:pointer;@!w:integer);
begin if xeq_level[p]≠cur_level then
	begin eq_save(p,xeq_level[p]); xeq_level[p]←cur_level;

@ The |eq_define| and |eq_word_define| routines take care of local definitions.
@^global definitions@>
Global defi\-ni\-tions are done in almost the same way, but there is no need
to save old values, and the new value is associated with |level_one|.

@p procedure geq_define(@!p:pointer;@!t:quarterword;@!e:halfword);
	{global |eq_define|}
begin eq_destroy(eqtb[p]);
eq_level(p)←level_one; eq_type(p)←t; equiv(p)←e;
procedure geq_word_define(@!p:pointer;@!w:integer); {global |eq_word_define|}
begin eqtb[p].int←w; xeq_level[p]←level_one;

@ The |unsave| routine goes the other way, taking items off of |save_stack|;
it is not used when |cur_group| is |true_group|, |false_group|, or |case_group|,
since those groups are ``transparent'' with respect to their environment.

This routine takes care of restoration when a level ends; everything
belonging to the topmost group is cleared off of the save stack.

@p procedure unsave; {pops the top level off the save stack}
label done;
var p:pointer; {position to be restored}
@!l:quarterword; {saved level, if in fullword regions of |eqtb|}
begin if cur_level>level_one then
	begin decr(cur_level);
	@<Clear off top level from |save_stack|@>;
else confusion("curlevel"); {|unsave| is not used when |cur_group=bottom_level|}
@:confusion curlevel}{\quad curlevel@>

@ @<Clear off...@>=
loop@+begin decr(save_ptr);
	if save_type(save_ptr)=level_boundary then goto done;
	if save_type(save_ptr)=restore_old_value then
		begin l←save_level(save_ptr); decr(save_ptr);
	else save_stack[save_ptr]←eqtb[undefined_control_sequence];
	@<Store \(s)|save_stack[save_ptr]| in |eqtb[p]|, unless
		|eqtb[p]| holds a global value@>;
done: cur_group←save_level(save_ptr); cur_boundary←save_index(save_ptr)

@ A global definition, which sets the level to |level_one|,
@^global definitions@>
will not be undone by |unsave|. If at least one global definition of
|eqtb[p]| has been carried out within the group that just ended, the
last such definition will therefore survive.

@<Store \(s)|save...@>=
if p<int_base then
	if eq_level(p)=level_one then
		eq_destroy(save_stack[save_ptr]) {destroy the saved value}
	else	begin eq_destroy(eqtb[p]); {destroy the current value}
		eqtb[p]←save_stack[save_ptr]; {restore the saved value}
else if xeq_level[p]≠level_one then
	begin eqtb[p]←save_stack[save_ptr]; xeq_level[p]←l;

@ When a |true_group| or one of its relatives comes to an end, we merely
want to delete the top |level_boundary| word, restoring the former
values of |cur_group| and |cur_boundary|. The |leave_transparent_group|
routine does this.

@p procedure leave_transparent_group;
var j,@!k:0..save_size; {indices into |save_stack|}
begin j←save_index(cur_boundary); cur_group←save_level(cur_boundary);
if save_ptr>1 then
	for k←cur_boundary to save_ptr-2 do save_stack[k]←save_stack[k+1];
decr(save_ptr); cur_boundary←j;

@ The magnification parameter should not be varied during a \TeX\ job, since
a single magnification is applied to an entire run. The global variable
|mag_set| is set to the current magnification whenever it becomes necessary
to ``freeze'' the magnification at a particular value.

@!mag_set:integer; {if nonzero, this magnification should be used henceforth}

@ @<Set init...@>=

@ The |prepare_mag| subroutine is called whenever \TeX\ wants to use |mag|
for magnification.

@p procedure prepare_mag;
begin if (mag_set>0)∧(mag≠mag_set) then
	begin print_nl("! Incompatible magnification ("); print_int(mag);
@.Incompatible magnification@>
	print("); the previous value will be retained");
	help2("I can handle only one magnification ratio per job. So I've")@/
	("reverted to the magnification you used earlier on this run.");@/
	geq_word_define(int_base+mag_code,mag_set); {|mag←mag_set|}
if (mag≤0)∨(mag>32768) then
	begin print_nl("! Illegal magnification has been changed to 1000");@/
@.Illegal magnification...@>
	help1("The magnification ratio must be between 1 and 32768.");
	int_error(mag); geq_word_define(int_base+mag_code,1000);
@* \[20] Token lists.
A \TeX\ token is either a character or a control sequence, and it is
represented internally in one of two ways: (1)@@A character whose ascii code
number is |c| and whose command code is |m| is represented as the
number $2^8m+c$; the command code is in the range
|1≤m≤14|. (2)@@A control sequence whose |eqtb| address is |p| is
represented as the number |cs_token_flag+p|. Here |cs_token_flag=@t$2^{12}$@>|
is larger than $2^8m+c$, yet it is small enough that |cs_token_flag+p<
max_halfword|; thus, a token fits comfortably in a halfword.

A token |t| represents a |left_brace| command if and only if
|t<left_brace_limit|; it represents a |right_brace| command if and only if
we have |left_brace_limit≤t<right_brace_limit|; and it represents a |match| or
|end_match| command if and only if |match_token≤t≤end_match_token|.
The following definitions take care of these token-oriented constants
and a few others.

@d cs_token_flag==@'10000 {amount added to the |eqtb| location in a
	token that stands for a control sequence}
@d left_brace_token=@'0400 {$2^8\cdot|left_brace|$}
@d left_brace_limit=@'1000 {$2^8\cdot(|left_brace|+1)$}
@d right_brace_token=@'1000 {$2^8\cdot|right_brace|$}
@d right_brace_limit=@'1400 {$2^8\cdot(|right_brace|+1)$}
@d math_delim_token=@'1400 {$2^8\cdot|math_delim|$}
@d out_param_token=@'2400 {$2^8\cdot|out_param|$}
@d endv_token=@'4400 {$2^8\cdot|endv|$}
@d space_token=@'5040 {$2^8\cdot|spacer|+|" "|$}
@d letter_token=@'5400 {$2^8\cdot|letter|$}
@d other_token=@'6000 {$2^8\cdot|other_char|$}
@d match_token=@'6400 {$2^8\cdot|match|$}
@d end_match_token=@'7000 {$2^8\cdot|end_match|$}

@ @<Check the ``constant''...@>=
if cs_token_flag+undefined_control_sequence>max_halfword then bad←21;

@ A token list is a singly linked list of one-word nodes in |mem|, where
each word contains a token and a link. Macro definitions, output-routine
definitions, marks, and \.{\\send} texts are kept in \TeX's memory in the
form of token lists, preceded by a node that has a reference count in its
|token_ref_count| field. The token stored in location |p| is called

Three special commands appear in the token lists of macro definitions.
When |m=match|, it means that \TeX\ should scan a parameter
for the current macro; when |m=end_match|, it means that parameter
matching should end and \TeX\ should start reading the macro text; and
when |m=out_param|, it means that \TeX\ should insert parameter
number |c| into the text at this point.

The enclosing \.{\char'173} and \.{\char'175} characters of a macro
definition are omitted, but the final right brace of an output routine
is included at the end of its token list.

Here is an example macro definition that illustrates these conventions.
After \TeX\ processes the text
$$\.{\\def\\mac a\#1\#2 \\b \{\#1\\-a \#\#1\#2 \#2\}}$$
the definition of \.{\\mac} is represented as a token list containing
(reference count), |letter|\,\.a, |match|\,\#, |match|\,\#, |spacer|\,\.\ ,
|letter|\,\.{\\b}, |end_match|,\cr
|out_param|\,1, \.{\\-}, |letter|\,\.a, |spacer|\,\.\ , |mac_param|\,\#,
|out_param|\,2, |spacer|\,\.\ , |out_param|\,2.\cr}}$$
The procedure |scan_toks| builds such token lists, and |macro_call|
does the param\-eter matching.
@^reference counts@>

Examples such as
$$\.{\\def\\m\{\\def\\m\{a\}\ b\}}$$
explain why reference counts would be needed even if \TeX\ had no \.{\\let}
operation: When the token list for \.{\\m} is being read, the redefinition of
\.{\\m} changes the |eqtb| entry before the token list has been fully
consumed, so we dare not simply destroy a token list when its
control sequence is being redefined.

If the parameter-matching part of a definition ends with `\.{\#\{}',
the corresponding token list will have `\.\{' just before the `|end_match|'
and also at the very end. The first `\.\{' is used to delimit the parameter; the
second one keeps the first from disappearing.

@ The procedure |show_token_list|, which prints a symbolic form of
the token list that starts at a given node |p|, illustrates these
conventions. The token list being displayed should not begin with a reference
count. However, the procedure is intended to be robust, so that if the
memory links are awry or if |p| is not really a pointer to a token list,
nothing catastrophic will happen.

An additional parameter |q| is also given; this parameter is either zero
or it points to a node in the token list where a certain magic computation
takes place that will be explained later. (Basically, |q| is nonzero when
we are printing the two-line context information at the time of an error
message; |q| marks the place corresponding to where the second line
should begin.)

For example, if |p| points to the node containing the first \.a in the
token list above, then |show_token_list| will print the string
$$\hbox{`\.{a\#1\#2\ \\b\ ->\#1\\-a\ \#\#1\#2\ \#2}';}$$
and if |q| points to the node containing the second \.a,
the magic computation will be performed just before the second \.a is printed.

The generation will stop, and `\.{\\ETC.}' will be printed, if the length
of printing exceeds a given limit@@|l|. Anomalous entries are printed in the
form of control sequences that are not followed by a blank space, e.g.,
`\.{\\BAD.}'; this cannot be confused with actual control sequences because
a real control sequence named \.{BAD} would come out `\.{\\BAD\ }'.

@<Declare the procedure called |show_token_list|@>=
procedure show_token_list(@!p,@!q:integer;@!l:integer);
label exit;
var m,@!c:integer; {pieces of a token}
@!match_chr:ascii_code; {character used in a `|match|'}
@!n:ascii_code; {the highest parameter number, as an ascii digit}
begin match_chr←"#"; n←"0"; tally←0;
while (p≠null) and (tally<l) do
	begin if p=q then @<Do magic computation@>;
	@<Display token |p|, and |return| if there are problems@>;
if p≠null then print_esc("ETC.");

@ @<Display token |p|...@>=
if (p<hi_mem_base) or (p>mem_end) then
	begin print_esc("CLOBBERED."); return;
if info(p)≥cs_token_flag then print_cs(info(p)-cs_token_flag)
else	begin m←info(p) div @'400; c←info(p) mod @'400;
	if (info(p)<0)∨(c>127) then print_esc("BAD.")
	else @<Display the token $(|m|,|c|)$@>;

@ The procedure usually ``learns'' the character code used for macro
param\-eters by seeing one in a |match| command before it runs into any
|out_param| commands.

@<Display the token ...@>=
case m of
	letter,other_char: print(c);
left_brace:begin if (link(p)≥hi_mem_base)∧(link(p)≤mem_end) then
		if info(link(p)) div @'400=end_match then
mac_param: begin print(c); print(c);
out_param: begin print(match_chr);
	if c≤9 then print_char(c+"0")
	else	begin print_char("!"); return;
match: begin match_chr←c; print(c); incr(n); print_char(n);
	if n>"9" then return;
end_match: print("->");
othercases print_esc("BAD.")

@ Here's the way we sometimes want to display a token list, given a pointer
to its reference count; the pointer may be null.

@p procedure token_show(@!p:pointer);
begin if p=null then print("(null)")
else show_token_list(link(p),null,1000);
@* \[21] Introduction to the syntactic routines.
Let's pause a moment now and try to look at the Big Picture.
The \TeX\ program consists of three main parts: syntactic routines,
semantic routines, and output routines. The chief purpose of the
syntactic routines is to deliver the user's input to the semantic routines,
one token at a time. The semantic routines act as an interpreter
responding to these tokens, which may be regarded as commands. And the
output routines are periodically called on to convert box-and-glue
lists into a compact set of instructions that will be sent
to a typesetter. We have discussed the basic data structures and utility
routines of \TeX, so we are good and ready to plunge into the real activity by
considering the syntactic routines.

Our current goal is to come to grips with the |get_next| procedure,
which is the keystone of \TeX's input mechanism. Each call of |get_next|
sets the value of two variables |cur_cmd| and |cur_chr|, representing
the next input token.
$$\lpile{\hbox{|cur_cmd| denotes a command code from the long list of codes
given above;}\cr
\hbox{|cur_chr| denotes a character code or other modifier of the command
Underlying this external behavior of |get_next| is all the machinery
necessary to convert from character files to tokens. At a given time we
may be only partially finished with the reading of several files (for
which \.{\\input} was specified), and partially finished with the expansion
of some user-defined macros and/or some macro parameters, and partially
finished with the generation of some text in a template for \.{\\halign},
and so on. When reading a character file, special characters must be
classified as math delimiters, etc.; comments and extra blank spaces must
be removed, paragraphs must be recognized, and control sequences must be
found in the hash table. Furthermore there are occasions in which the
scanning routines have looked ahead for a word like `\.{plus}' but only
part of that word was found, hence a few characters must be put back
into the input and scanned again.

To handle these situations, which might all be present simultaneously,
\TeX\ uses various stacks that hold information about the incomplete
activities, and there is a finite state control for each level of the
input mechanism. These stacks record the current state of an implicitly
recursive process, but the |get_next| procedure is not recursive.
Therefore it will not be difficult to translate these algorithms into
low-level languages that do not support recursion.

@!cur_cmd : eight_bits; {current command set by |get_next|}
@!cur_chr: halfword; {operand of current command}
@!cur_tok:halfword; {packed representative of |cur_cmd| and |cur_chr|}

@ The |print_cmd_chr| routine prints a symbolic interpretation of a
command code and its modifier. This is used in certain `\.{You can\'t}'
error messages, and in the implementation of diagnostic routines like

The body of |print_cmd_chr| is a rather tedious listing of print
commands, and most of it is essentially an inverse to the |primitive|
routine that enters a \TeX\ primitive into |eqtb|. Therefore much of
this procedure appears elsewhere in the program,
together with the corresponding |primitive| calls.

@d chr_cmd(#)==begin print(#); print_ascii(chr_code);

@p procedure print_cmd_chr(@!cmd:quarterword;@!chr_code:halfword);
begin case cmd of
left_brace: chr_cmd("begin-group character ");
right_brace: chr_cmd("end-group character ");
math_delim: chr_cmd("math mode character ");
tab_mark: if chr_code=128 then print_esc("span") {|span_code=128|}
	else chr_cmd("alignment tab character ");
car_ret: print_esc("cr");
mac_param: chr_cmd("macro parameter character ");
sup_mark: chr_cmd("superscript character ");
sub_mark: chr_cmd("subscript character ");
endv: print("end of alignment template");
spacer: print("blank space");
letter: chr_cmd("the letter ");
other_char: chr_cmd("the character ");
@t\4@>@<Cases of |print_cmd_chr| for symbolic printing of primitives@>@/
othercases print("[unknown command code!]")
@* \[22] Input stacks and states.
\TeX\ uses two different conventions for representing sequential stacks.
@^stack conventions@>@^conventions for representing stacks@>

\yskip\hang 1) If there is frequent access to the top entry, and if the
stack is essentially never empty, then the top entry is kept in a global
variable (even better would be a machine register), and the other entries
appear in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the
semantic stack described above is handled this way, and so is the input
stack that we are about to study.

\yskip\hang 2) If there is infrequent top access, the entire stack contents
are in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the |save_stack|
is treated this way, as we have seen.

The state of \TeX's input mechanism appears in the input stack, whose
entries are records with six fields, called |state|, |index|, |start|, |loc|,
|limit|, and |name|. This stack is maintained with
convention@@(1), so it is declared in the following way:

@!in_state_record = record 
	@!state_field, @!index_field: quarterword;
	@!start_field,@!loc_field, @!limit_field, @!name_field: halfword;

@ @<Globals...@>=
@!input_stack : array[0..stack_size] of in_state_record;
@!input_ptr : 0..stack_size; {first unused location of |input_stack|}
@!max_in_stack: 0..stack_size; {largest value of |input_ptr| when pushing}
@!cur_input : in_state_record;
	{the ``top'' input state, according to convention (1)}

@ We've already defined the special variables |@!loc==cur_input.loc_field|
in our discussion of basic input-output routines. The other components of
the |cur_input| are defined in the same way:

@d state==cur_input.state_field {current scanner state}
@d index==cur_input.index_field {reference for buffer information}
@d start==cur_input.start_field {starting position in |buffer|}
@d limit==cur_input.limit_field {end of current line in |buffer|}
@d name==cur_input.name_field {name of the current file}

@ Let's look more closely now at the six control variables
assuming that \TeX\ is reading a line of characters that have been input
from some file or from the user's terminal. There is an array called
|buffer| that acts as a stack of all lines of characters that are
currently being read from files, including all lines on subsidiary
levels of the input stack that are not yet completed. We will return to
the other lines when we are finished with the present input file.

(Incidentally, on a machine with byte-oriented addressing, it would be
appropriate to combine |buffer| with the |str_pool| array,
letting the buffer entries grow downward from the top of the string pool
and checking that these two tables don't bump into each other.)

The line we are currently working on begins in position |start| of the
buffer; the next character we are about to read is |buffer[loc]|; and
|limit| is the location of the last character present.  If |loc>limit|,
the line has been completely read. Usually |buffer[limit]| is a
|carriage_return| character, denoting the end of a line, but this is not
true if the current line is an insertion that was entered on the user's
terminal in response to an error message. 

The |name| variable is a string number that designates the name of
the current file, if we are reading a text file; it is zero if we
are reading from the terminal.

The |state| variable has one of three values, when we are scanning such
$$\baselineskip 15pt\vbox{\halign{#\hfil\cr
1) |state=mid_line| is the normal state.\cr
2) |state=skip_blanks| is like |mid_line|, but blanks are ignored.\cr
3) |state=new_line| is the state at the beginning of a line.\cr}}$$
These state values are assigned numeric codes so that if we add the state
code to the next character's command code, we get distinct values. For
example, `|mid_line+spacer|' stands for the case that a blank
space character occurs in the middle of a line when it is not being
ignored; after this case is processed, the next value of |state| will
be |skip_blanks|.

@d mid_line=1 {|state| code when scanning a line of characters}
@d skip_blanks=2+max_char_code {|state| code when ignoring blanks}
@d new_line=3+max_char_code+max_char_code {|state| code at start of line}

@ Additional information about the current line is available via the
|index| variable, which counts how many lines of characters are present
in the buffer below the current level. We have |index=0| when reading
from the terminal and prompting the user for each line; then if the user types,
e.g., `\.{\\input paper}', we will have |index=1| while reading
the file \.{paper.tex}. However, it does not follow that |index| is the
same as the input stack pointer, since many of the levels on the input
stack come from token lists. For example, the instruction `\.{\\input paper}'
might occur in a token list.

The variable |in_open| is equal to the |index|
value of the highest non-token-list level. Thus, the number of partially read
lines in the buffer is |in_open+1|, and we have |in_open=index|
when we are not reading a token list.

If we are not currently reading from the terminal, we are reading from the
file variable |input_file[index]|. We use the notation |terminal_input| as
a convenient abbreviation for |name=0|, and |cur_file| as an abbreviation
for |input_file[index]|.

The global variable |line| contains the line number in the topmost
open file, for use in error messages. If we are not reading from
the terminal, |line_stack[index]| holds the line number for the
enclosing level, so that |line| can be restored when the current
file has been read.

If more information about the input state is needed, it can be
included in small arrays like those shown here. For example,
the current page or segment number in the input file might be
put into a variable |@!page|, maintained for enclosing levels in
`\unskip|@!page_stack:array[0..max_in_open] of integer|'
by analogy with |line_stack|.
@^system dependencies@>

@d terminal_input==(name=0) {are we reading from the terminal?}
@d cur_file==input_file[index] {the current |alpha_file| variable}

@!in_open : 0..max_in_open; {the number of lines in the buffer, less one}
@!input_file : array[1..max_in_open] of alpha_file;
@!line : integer; {current line number in the current source file}
@!line_stack : array[0..max_in_open] of integer;

@ Users of \TeX\ sometimes forget to balance left and right braces properly,
and one of the ways \TeX\ tries to spot such errors is by considering an
input file as broken into subfiles by control sequences that
are declared to be \.{\\outer}.

A variable called |scanner_status| tells \TeX\ whether or not to complain
when a subfile ends. This variable has five possible values:

\yskip\hang|normal|, means that a subfile can safely end here without incident.

\yskip\hang|skipping|, means that a subfile can safely end here, but not a file,
because we're reading past some conditional text that was not selected.

\yskip\hang|defining|, means that a subfile shouldn't end now because a
macro is being defined.

\yskip\hang|matching|, means that a subfile shouldn't end now because a
macro is being used and we are searching for the end of its arguments.

\yskip\hang|aligning|, means that a subfile shouldn't end now because we are
not finished with the preamble of an \.{\\halign} or \.{\\valign}.

If the |scanner_status| is not |normal|, the variable |warning_index| points
to the |eqtb| location for the relevant control sequence name to print
in an error message.

@d skipping=1 {|scanner_status| when passing conditional text}
@d defining=2 {|scanner_status| when reading a macro definition}
@d matching=3 {|scanner_status| when reading macro arguments}
@d aligning=4 {|scanner_status| when reading an alignment preamble}

@!scanner_status : normal..aligning; {can a subfile end now?}
@!warning_index : pointer; {identifier relevant to non-|normal| scanner status}

@ Here is a procedure that uses |scanner_status| to print a warning message
when a subfile has ended, and at certain other crucial times:

@<Declare the procedure called |runaway|@>=
procedure runaway;
var p:pointer; {head of runaway list}
begin if scanner_status>skipping then
	begin print_nl("Runaway ");
	case scanner_status of
	defining: begin print("definition?"); p←temp_head;
	matching: begin print("argument?"); p←temp_head;
	aligning: begin print("preamble?"); p←hold_head;
	end; {there are no other cases}
	print_ln; show_token_list(link(p),null,error_line-10);

@ However, all this discussion about input state really applies only to the
case that we are inputting from a file. There is another important case,
namely when we are currently getting input from a token list. In this case
|state=token_list|, and the conventions about the other state variables
are different:

\yskip\hang|loc| is a pointer to the current node in the token list, i.e.,
the node that will be read next. If |loc=null|, the token list has been
fully read.

\yskip\hang|start| points to the first node of the token list; this node
may or may not contain a reference count, depending on the type of token
list involved.

\yskip\hang|token_type|, which takes the place of |index| in the
discussion above, is a code number that explains what kind of token list
is being scanned.

\yskip\hang|name| points to the |eqtb| address of the control sequence
being expanded, if the current token list is a macro.

\yskip\hang|param_start|, which takes the place of |limit|, tells where
the parameters of the current macro begin in the |param_stack|, if the
current token list is a macro.

\yskip\noindent The |token_type| can take several values, depending on
where the current token list came from:

\yskip\hang|parameter|, if a parameter is being scanned;

\hang|u_template|, if the \<u↓j> part of an alignment
template is being scanned;

\hang|v_template|, if the \<v↓j> part of an alignment
template is being scanned;

\hang|inserted|, if the token list being scanned has been inserted as
`to be read again' or as the text expansion of a \.{\\count} or similar

\hang|macro|, if a user-defined control sequence is being scanned;

\hang|output_text|, if an \.{\\output} routine is being scanned;

\hang|every_par_text|, if the text of \.{\\everypar} is being scanned;

\hang|mark_text|, if the text of a \.{\\mark} is being scanned.

\hang|send_text|, if the text of a \.{\\send} is being scanned.

The token list begins with a reference count if and only if |token_type≥
@^reference counts@>

@d token_list=0 {|state| code when scanning a token list}
@d token_type==index {type of current token list}
@d param_start==limit {base of macro parameters in |param_stack|}
@d parameter=0 {|token_type| code for parameter}
@d u_template=1 {|token_type| code for \<u↓j> template}
@d v_template=2 {|token_type| code for \<v↓j> template}
@d inserted=3 {|token_type| code for short inserted texts}
@d macro=4 {|token_type| code for defined control sequences}
@d output_text=5 {|token_type| code for output routines}
@d every_par_text=6 {|token_type| code for \.{\\everypar}}
@d mark_text=7 {|token_type| code for \.{\\topmark}, etc.}
@d send_text=8 {|token_type| code for \.{\\send}}

@ The |param_stack| is an auxiliary array used to hold pointers to the token
lists for parameters at the current level and subsidiary levels of input.
This stack is maintained with convention (2), and it grows at a different
rate from the others.

@!param_stack:array [0..param_size] of pointer;
	{token list pointers for parameters}
@!param_ptr:0..param_size; {first unused entry in |param_stack|}
	{largest value of |param_ptr|, will be |≤param_size+9|}

@ The input routines must also interact with the processing of
\.{\\halign} and \.{\\valign}, since the appearance of tab marks and
\.{\\cr} in certain places is supposed to trigger the beginning of special
\<v↓j> template text in the scanner. This magic is accomplished by an
|align_state| variable that is increased by@@1 when a `\.{\char'173}' is
scanned and decreased by@@1 when a `\.{\char'175}' is scanned. The |align_state|
is nonzero during the \<u↓j> template, after which it is set to zero; the
\<v↓j> template begins when a tab mark or \.{\\cr} occurs at a time that

@!align_state:integer; {group level with respect to current alignment}

@ Thus, the ``current input state'' can be very complicated indeed, since there
can be many levels and each level can arise in a variety of ways. The
|show_context| procedure, which is used by \TeX's error-reporting routine to
print out the current input state on all levels down to the most recent
line of characters from an input file, illustrates most of these conventions.
The global variable |base_ptr| contains the lowest level that was
displayed by this procedure.

@!base_ptr:0..stack_size; {shallowest level shown by |show_context|}

@ The status at each level is indicated by printing two lines, where the first
line indicates what was read so far and the second line shows what remains
to be read. The context is cropped, if necessary, so that the first line
contains at most |half_error_line| characters, and the second contains
at most |error_line|. Input levels whose |token_type| is `|inserted|'
are shown only if they have not been fully read.

@p procedure show_context; {prints where the scanner is}
label done;
var old_setting:0..max_selector; {saved |selector| setting}
@<Local variables for formatting calculations@>@/
begin base_ptr←input_ptr; input_stack[base_ptr]←cur_input;
	{store current state}
loop@+begin cur_input←input_stack[base_ptr]; {enter into the context}
	@<Display the current context@>;
	if (state≠token_list) then
		if (not terminal_input) or (base_ptr=0) then goto done;
done: cur_input←input_stack[input_ptr]; {restore original state}

@ @<Display the current context@>=
if (state≠token_list) or (token_type≠inserted) or (loc≠null) then
{we omit inserted token lists that have already been read}
	begin tally←0; {get ready to count characters}
	if state≠token_list then
		begin @<Print location of current line@>;
		@<Pseudoprint the line@>;
	else	begin @<Print type of token list@>;
		@<Pseudoprint the token list@>;
	selector←old_setting; {stop pseudoprinting}
	@<Print two lines using the tricky pseudoprinted information@>;

@ This routine should be changed, if necessary, to give the best possible
indication of where the current line resides in the input file.
For example, on some systems it is best to print both a page and line number.
@^system dependencies@>

@<Print location of current line@>=
if terminal_input then
	if base_ptr=0 then print_nl("<*>") else print_nl("<**>")
else	begin print_nl("l."); print_int(line);
print_char(" ")

@ @<Print type of token list@>=
case token_type of
parameter: print_nl("<argument> ");
u_template,v_template: print_nl("<template> ");
inserted: print_nl("<to be read again> ");
macro: begin print_ln; print_cs(name);
output_text: print_nl("<output> ");
every_par_text: print_nl("<everypar> ");
mark_text: print_nl("<mark> ");
send_text: print_nl("<send> ");
othercases print_nl("?") {this should never happen}

@ Here it is necessary to explain a little trick. We don't want to store a long
string that corresponds to a token list, because that string might take up
lots of memory; and we are printing during a time when an error message is
being given, so we dare not do anything that might overflow one of \TeX's
tables. So `pseudoprinting' is the answer: We enter a mode of printing
that stores char\-acters into a buffer of length |error_line|, where character
$k+1$ is placed into \hbox{|trick_buf[k mod error_line]|} if
|k<trick_count|, otherwise character |k| is dropped. Initially we set
|tally←0| and |trick_count←1000000|; then when we reach the
point where transition from line 1 to line 2 should occur, we
set |first_count←tally| and |trick_count←@tmax@>(error_line,
tally+1+error_line-half_error_line)|. At the end of the
pseudoprinting, the values of |first_count|, |tally|, and
|trick_count| give us all the information we need to print the two lines,
and all of the necessary text is in |trick_buf|.

Namely, let |l| be the length of the descriptive information that appears
on the first line. The length of the context information gathered for that
line is |k=first_count|, and the length of the context information
gathered for line@@2 is $m=\min(|tally|, |trick_count|)-k$. If |l+k≤h|,
where |h=half_error_line|, we print |trick_buf[0..k-1]| after the
descriptive information on line@@1, and set |n←l+k|; here |n| is the
length of line@@1. If $l+k>h$, some cropping is necessary, so we set |n←h|
and print `\.{...}' followed by
where subscripts of |trick_buf| are circular modulo |error_line|. The
second line consists of |n|@@spaces followed by |trick_buf[k..(k+m-1)]|,
unless |n+m>error_line|; in the latter case, further cropping is done.
This is easier to program than to explain.

@<Local variables for formatting...@>=
@!i:0..buf_size; {index into |buffer|}
@!j:0..buf_size; {end of current line in |buffer|}
@!l:0..half_error_line; {length of descriptive information on line 1}
@!m:integer; {context information gathered for line 2}
@!n:0..error_line; {length of line 1}
@!p: integer; {starting or ending place in |trick_buf|}
@!q: integer; {temporary index}

@ The following code sets up the print routines so that they will gather
the desired information.

@d begin_pseudoprint==
	begin l←tally; tally←0; selector←pseudo;
@d set_trick_count==
	begin first_count←tally;
	if trick_count<error_line then trick_count←error_line;

@ And the following code uses the information after it has been gathered.

@<Print two lines using the tricky pseudoprinted information@>=
if trick_count=1000000 then set_trick_count;
	{|set_trick_count| must be performed}
if tally<trick_count then m←tally-first_count
else m←trick_count-first_count; {context on line 2}
if l+first_count≤half_error_line then
	begin p←0; n←l+first_count;
else	begin print("..."); p←l+first_count-half_error_line+3;
for q←p to first_count-1 do print_char(trick_buf[q mod error_line]);
for q←1 to n do print_char(" "); {print |n| spaces to begin line@@2}
if m+n≤error_line then p←first_count+m else p←first_count+(error_line-n-3);
for q←first_count to p-1 do print_char(trick_buf[q mod error_line]);
if m+n>error_line then print("...")

@ But the trick is distracting us from our current goal, which is to
understand the input state. So let's concentrate on the data structures that
are being pseudoprinted as we finish up the |show_context| procedure.

@<Pseudoprint the line@>=
if buffer[limit]=carriage_return then j←limit
else j←limit+1; {determine the effective end of the line}
if j>0 then for i←start to j-1 do
	begin if i=loc then set_trick_count;

@ @<Pseudoprint the token list@>=
if token_type<macro then show_token_list(start,loc,100000)
else show_token_list(link(start),loc,100000) {avoid reference count}

@ Here is the missing piece of |show_token_list| that is activated when the
token beginning line@@2 is about to be shown:

@<Do magic computation@>=set_trick_count
@* \[23] Maintaining the input stacks.
The following subroutines change the input status in commonly needed ways.

First comes |push_input|, which stores the current state and creates a
new level (having, initially, the same properties as the old).

@p procedure push_input; {enter a new input level, save the old}
begin if input_ptr>max_in_stack then
	begin max_in_stack←input_ptr;
	if input_ptr=stack_size then overflow("input stack size",stack_size);
input_stack[input_ptr]←cur_input; {stack the record}

@ And of course what goes up must come down.

@p procedure pop_input; {leave an input level, re-enter the old}
begin decr(input_ptr); cur_input←input_stack[input_ptr];

@ Here is a procedure that starts a new level of token-list input, given
a token list |p| and its type |t|. If |t=macro|, the calling routine should
set |name| and |loc|.

@d ins_list(#)==begin_token_list(#,inserted) {inserts a simple token list}

@p procedure begin_token_list(@!p:pointer;@!t:quarterword);
begin push_input; state←token_list; start←p; token_type←t;
if t≥macro then {the token list starts with a reference count}
	begin add_token_ref(p);
	if t=macro then param_start←param_ptr@+else loc←link(p);
else loc←p;

@ When a token list has been fully scanned, the following computations
should be done as we leave that level of input.

@p procedure end_token_list; {leave a token-list input level}
begin if token_type≥inserted then {token list to be deleted}
	begin if token_type=inserted then flush_list(start)
	else	begin delete_token_ref(start); {update reference count}
		if token_type=macro then {parameters must be flushed}
			while param_ptr>param_start do
				begin decr(param_ptr);
else if token_type=u_template then align_state←0;

@ Sometimes \TeX\ has read too far and it wants to ``unscan'' what it has
seen. The |back_input| procedure takes care of this by putting the token
just scanned back into the input stream, ready to be read again. This
procedure can be used only if |cur_tok| represents the token to be

@p procedure back_input; {undoes one token of input}
var p:pointer; {a token list of length one}
begin p←get_avail; info(p)←cur_tok;
if cur_tok<right_brace_limit then
	if cur_tok<left_brace_limit then decr(align_state)
	else incr(align_state);

@ The |back_error| routine is used when we want to replace an offending token
just before issuing an error message. This routine, like |back_input|,
requires that |cur_tok| has been set.

@p procedure back_error; {back up one token and call |error|}
begin back_input; error;

@ The |begin_file_reading| procedure starts a new level of input for lines
of characters to be read from a file, or as an insertion from the
terminal. It does not take care of opening the file, nor does it set |loc|
or |limit| or |line|.
@^system dependencies@>

@p procedure begin_file_reading;
begin if in_open=max_in_open then overflow("text input levels",max_in_open);
if first=buf_size then overflow("buffer size",buf_size);
incr(in_open); push_input; index←in_open;
line_stack[index]←line; start←first; state←mid_line;
name←0; {|terminal_input| is now |true|}

@ Conversely, the variables must be downdated when such a level of input
is finished:

@p procedure end_file_reading;
begin first←start; line←line_stack[index];
if not terminal_input then a_close(cur_file); {forget it}
pop_input; decr(in_open);

@ In order to keep the stack from overflowing during a long sequence of
inserted `\.{\\show}' commands, the following routine removes completed
error-inserted lines from memory.

@p procedure clear_for_error_prompt;
begin while (state≠token_list)∧ terminal_input ∧@|
	(input_ptr>0)∧(loc>limit) do end_file_reading;
print_ln; clear_terminal;

@ To get \TeX's whole input mechanism going, we perform the following

@<Initialize the input routines@>=
begin input_ptr←0; max_in_stack←0;
in_open←0; max_buf_stack←0;
param_ptr←0; max_param_stack←0;
scanner_status←normal; first←0;
state←new_line; start←0; index←0; line←0; name←0;
if not init_terminal then goto final_end;
	{|init_terminal| sets |loc| and |last|}
limit←last; first←last+1;
@* \[24] Getting the next token.
The heart of \TeX's input mechanism is the |get_next| procedure, which
we shall develop in the next few paragraphs. Perhaps we shouldn't actually
call it the ``heart,'' however, because it really acts as \TeX's eyes and
mouth, reading the source files and gobbling them up. And it also helps
\TeX\ to regurgitate stored token lists that are to be processed again.
@^eyes and mouth@>

The main duty of |get_next| is to input one token and to set |cur_cmd|
and |cur_chr| to that token's command code and modifier. Furthermore, if
the input token is a control sequence, the |eqtb| location of that control
sequence is stored in |cs_ptr|; otherwise |cs_ptr| is set to zero.

Underlying this simple description is a certain amount of complexity
because of all the cases that need to be handled, as explained above.
However, the inner loop of |get_next| is reasonably short and fast.

@ The value of |par_loc| is the |eqtb| address of `\.{\\par}'. This quantity
is needed because a blank line of input is supposed to be exactly equivalent
to the appearance of \.{\\par}; we must set |cs_ptr←par_loc|
when detecting a blank line.

@!par_loc:pointer; {location of `\.{\\par}' in |eqtb|}
@!par_token:halfword; {token representing '\.{\\par}'}

@ @<Put each...@>=
primitive("par",par_end,0); par_loc←cur_val; par_token←cs_token_flag+par_loc;
@!@:par_}{\.{\\par} primitive@>

@ @<Cases of |print_cmd_chr|...@>=

@ Before getting into |get_next|, let's consider the subroutine that
is called when an `\.{\\outer}' control sequence has been scanned or
when the end of a file has been reached. These two cases are distinguished
by |cs_ptr|, which is zero at the end of a file.

@p procedure check_outer_validity;
var p:pointer; {points to inserted token list}
begin if scanner_status≠normal then
	if scanner_status>skipping then @<Tell the user what has run away@>
	else if cs_ptr=0 then
		begin print_nl("! File ended in skipped conditional text");
@.File ended in skipped...text@>
		help3("This kind of error happens when you say something like")@/
		("`\if...{...' or `\else{...' and forget the matching `}'.")@/
		("You probably should type `I}' now."); error;

@ @<Tell the user what has run away@>=
begin runaway; {print a definition, argument, or preamble}
if cs_ptr=0 then print_nl("! File ended")
@.File ended while scanning...@>
else	begin p←get_avail; info(p)←cs_token_flag+cs_ptr;
	ins_list(p); {prepare to read the control sequence again}
	cs_ptr←0; cur_cmd←spacer; cur_chr←" "; {replace it by a space}
	print_nl("! Forbidden control sequence found");
@.Forbidden control sequence...@>
print(" while scanning ");
case scanner_status of
end; {there are no other cases}
print(" of "); sprint_cs(warning_index);
help4("I suspect you have forgotten a `}', causing me")@/
("to read past where you wanted me to stop.")@/
("Either type `I}' to try recovering,")@/
("or type 'X' and fix your file.");@/
deletions_allowed←false; error; deletions_allowed←true;

@ We need to mention two procedures here that may be called by |get_next|.

@p procedure@?open_err_file; forward;@t\2@>
procedure@?firm_up_the_line; forward;

@ Now we're ready to take the plunge into |get_next| itself.

@d switch=25 {a label in |get_next|}
@d start_cs=26 {another}

@p procedure get_next; {sets |cur_cmd|, |cur_chr|, |cs_ptr| to next token}
label restart, {go here to get the next input token}
	switch, {go here to eat the next character from a file}
	reswitch, {go here to digest it again}
	start_cs, {go here to start looking for a control sequence}
	found; {go here when a control sequence has been found}
var k:0..buf_size; {an index into |buffer|}
@!t:halfword; {a token}
begin restart: cs_ptr←0;
if state≠token_list then
@<Input from external file, |goto restart| if no input found@>
else @<Input from token list, |goto restart| if end of list or
	if a parameter needs to be expanded@>;
if @<End of alignment entry sensed@> then
	@<Insert the \(v)\<v↓j> template and |goto restart|@>;

@ An alignment entry ends when a tab or \.{\\cr} occurs, provided that the
current level of braces is the same as the level that was present at the
beginning of that alignment entry; i.e., provided that |align_state| has
returned to the value it had after the \<u↓j> template for that entry.

@<End of alignment entry sensed@>=

@ @<Input from external file, |goto restart| if no input found@>=
begin switch: if loc≤limit then {current line not yet finished}
	begin cur_chr←buffer[loc]; incr(loc); 
	reswitch: cur_cmd←ch_code(cur_chr);
	@<Change state if necessary, and |goto switch| if the
		current character should be ignored,
		or |goto reswitch| if the current character
		changes to another@>;
else	begin @<Move to next line of file, 
		or |goto restart| if there is no next line@>;
	goto switch;

@ The following 48-way switch accomplishes the scanning quickly, assuming
that a decent \PASCAL\ compiler has translated the code. Note that the numeric
values for |mid_line|, |skip_blanks|, and |new_line| are spaced
|(max_char_code+1)| apart from each other, so we can add a character's
command code to the state to get a single number that characterizes both.

@d any_state_plus(#) == mid_line+#,skip_blanks+#,new_line+#

@<Change state if necessary...@>=
case state+cur_cmd of
@<Cases where character is ignored@>: goto switch;
any_state_plus(escape): @<Scan a control sequence
	and set |state←skip_blanks|@>;
any_state_plus(active_char): @<Process an active-character control sequence
	and set |state←mid_line|@>;
any_state_plus(sup_mark): @<If this |sup_mark| starts a control character
	like@@\.{\^\^A}, then |goto reswitch|, otherwise set |state←mid_line|@>;
any_state_plus(invalid_char): @<Decry the invalid character and
	|goto switch|@>;
@t\4@>@<Handle situations involving spaces, braces, changes of state@>@;
othercases do_nothing

@ @<Cases where character is ignored@>=

@ @<Decry the invalid...@>=
begin print_nl("! Text line contains an invalid character");
@.Text line contains...@>
help2("A funny symbol that I can't read has just been input.")@/
("Continue, and I'll forget that it ever happened.");
goto switch;

@ @d add_delims_to(#)==#+math_delim,#+tab_mark,#+mac_param,

@<Handle situations involving spaces, braces, changes of state@>=
mid_line+spacer:@<Enter |skip_blanks| state, emit a space@>;
mid_line+car_ret:@<Enter |new_line| state, finish line, emit a space@>;
	@<Enter |new_line| state, finish line, |goto switch|@>;
new_line+car_ret:@<Finish line, emit a \.{\\par}@>;
mid_line+left_brace: incr(align_state);
skip_blanks+left_brace,new_line+left_brace: begin
	state←mid_line; incr(align_state);
mid_line+right_brace: decr(align_state);
skip_blanks+right_brace,new_line+right_brace: begin
	state←mid_line; decr(align_state);
add_delims_to(skip_blanks),add_delims_to(new_line): state←mid_line;

@ When a character of type |spacer| gets through, its character code is
changed to $\.{"\ "}=@'40$. This means that the ascii codes for tab and space,
and for the space inserted at the end of a line, will
be treated alike when macro parameters are being matched. We do this
since such characters are indistinguishable on most computer terminal displays.

@<Enter |new_line| state, finish line, emit a space@>=
begin state←new_line; loc←limit+1; cur_cmd←spacer; cur_chr←" ";

@ The following code is performed only when |cur_cmd=spacer|.

@<Enter |skip_blanks| state, emit a space@>=
begin state←skip_blanks; cur_chr←" ";

@ @<Enter |new_line| state, finish line, |goto switch|@>=
begin state←new_line; loc←limit+1; goto switch;

@ @<Finish line, emit a \.{\\par}@>=
begin loc←limit+1; cs_ptr←par_loc; cur_cmd←eq_type(cs_ptr);
if cur_cmd≥outer_call then check_outer_validity;

@ @<If this |sup_mark| starts a control character...@>=
begin if (cur_chr=buffer[loc])∧(loc<limit) then
	if (buffer[loc+1]≤"_")∧(buffer[loc+1]≥"?") then
		begin if buffer[loc+1]="?" then cur_chr←@'177
		else cur_chr←buffer[loc+1]-"@@";
		loc←loc+2; goto reswitch;
@ @<Process an active-character...@>=
begin cs_ptr←cur_chr+active_base;
cur_cmd←eq_type(cs_ptr); cur_chr←equiv(cs_ptr); state←mid_line;
if cur_cmd≥outer_call then check_outer_validity;

@ Control sequence names are scanned only when they appear in some line of
a file; once they have been scanned the first time, their |eqtb| location
serves as a unique identification, so \TeX\ doesn't need to refer to the
original name any more except when it prints the equivalent in symbolic form.

The program that scans a control sequence has been written carefully
in order to avoid the blowups that might otherwise occur if a malicious
user tried something like `\.{\\chcode\'15=0}'. The algorithm might
look at |buffer[limit+1]|, but it never looks at |buffer[limit+2]|.

If expanded control characters like `\.{\^\^A}' appear in or just following
a control sequence name, they are converted to single characters in the
buffer and the process is repeated, slowly but surely.

@<Scan a control...@>=
begin if loc>limit then cs_ptr←single_base+null_code
else	begin start_cs: k←loc; cur_chr←buffer[k]; incr(k);
	if (ch_code(cur_chr)=letter)∧(k≤limit) then
		@<Scan ahead in the buffer until finding a nonletter;
		if an expanded control code is encountered, reduce it
		and |goto start_cs|; otherwise if a multiletter control
		sequence is found, adjust |cs_ptr| and |loc|, and
		|goto found|@>
	else @<If an expanded control code is present, reduce it
		and |goto start_cs|@>;
	cs_ptr←single_base+buffer[loc]; incr(loc);
found: cur_cmd←eq_type(cs_ptr); cur_chr←equiv(cs_ptr); state←skip_blanks;
if cur_cmd≥outer_call then check_outer_validity;

@ Whenever we reach the following piece of code, we will have
|cur_chr=buffer[k-1]| and |k≤limit+1|. If an expanded control code like
\.{\^\^A} appears in |buffer[(k-1)..(k+1)]|, we will store the corresponding
code in |buffer[k-1]| and shift the rest of the buffer left two places.

@<If an expanded...@>=
begin if (buffer[k]=cur_chr)∧(ch_code(cur_chr)=sup_mark)∧@|
		(k<limit) then
	begin cur_chr←buffer[k+1];
	if (cur_chr≤"_")∧(cur_chr≥"?") then
		begin if cur_chr>"?" then buffer[k-1]←cur_chr-"@@"
		else buffer[k-1]←@'177;
		limit←limit-2; first←first-2;
		while k≤limit do
			begin buffer[k]←buffer[k+2]; incr(k);
		goto start_cs;

@ @<Scan ahead in the buffer...@>=
begin repeat cur_chr←buffer[k]; incr(k);
until (ch_code(cur_chr)≠letter)∨(k>limit);
@<If an expanded...@>;
if ch_code(buffer[k-1])≠letter then decr(k);
	{now |k| points to first nonletter}
if k>loc+1 then {multi-letter control sequence has been scanned}
	begin cs_ptr←id_lookup(loc,k-loc); loc←k; goto found;

@ Let's consider now what happens when |get_next| is looking at a token list.

@<Input from token list, |goto restart| if end of list or
	if a parameter needs to be expanded@>=
if loc≠null then {list not exhausted}
	begin t←info(loc); loc←link(loc); {move to next}
	if t≥cs_token_flag then {a control sequence token}
		begin cs_ptr←t-cs_token_flag;
		cur_cmd←eq_type(cs_ptr); cur_chr←equiv(cs_ptr);
		if cur_cmd≥outer_call then check_outer_validity;
	else	begin cur_cmd←t div @'400; cur_chr←t mod @'400;
		case cur_cmd of
		left_brace: incr(align_state);
		right_brace: decr(align_state);
		out_param: @<Insert macro parameter and |goto restart|@>;
		othercases do_nothing
else	begin {we are done with this token list}
	end_token_list; goto restart; {resume previous level}

@ @<Insert macro parameter...@>=
begin begin_token_list(param_stack[param_start+cur_chr-1],parameter);
goto restart;

@ All of the easy branches of |get_next| have now been taken care of.
There is one more branch.

@<Move to next line of file, or |goto restart| if there is no next line@>=
if not terminal_input then @<Read next line of file into |buffer|, or
	|goto restart| if the file has ended@>
else	begin if input_ptr>0 then {text was inserted during error recovery}
		begin end_file_reading; goto restart; {resume previous level}
	if selector<err_only then open_err_file;
	if interaction>nonstop_mode then
		begin if limit=start then {previous line was empty}
			print_nl("(Type a command or say `\end')");
		print_ln; first←start;
		prompt_input("*"); {input on-line into |buffer|}
		limit←last; buffer[limit]←carriage_return; first←last+1;
	else fatal_error("*** (job aborted, no legal \end found)");
@.job aborted@>
		{nonstop mode, which is intended for overnight batch processing,
		never waits for on-line input}

@ @<Read next line of file into |buffer|, or
	|goto restart| if the file has ended@>=
begin incr(line); first←start;
if input_ln(cur_file) then {not end of file}
	firm_up_the_line {this sets |limit|}
else if limit≠start then limit←start {insert empty line at end of file}
else	begin check_outer_validity;
	print_char(")"); update_terminal; {show user that file has been read}
	end_file_reading; goto restart; {resume previous level}
buffer[limit]←carriage_return; first←limit+1; loc←start; {ready to read}

@ If the user has set the |pause| parameter to some nonzero value,
and if nonstop mode has not been selected, each line of input is displayed
on the terminal and the transcript file, followed by `\.{=>}'.
\TeX\ waits for a response. If the response is simply |carriage_return|, the
line is accepted as it stands, otherwise the line typed is
used instead of the line in the file.

@p procedure firm_up_the_line;
var k:0..buf_size; {an index into |buffer|}
begin limit←last;
if (pause≠0)∧(interaction>nonstop_mode) then
	begin print_ln;
	if start<limit then for k←start to limit-1 do print(buffer[k]); 
	first←limit; prompt_input("=>"); {wait for user response}
	if last>first then
		begin for k←first to last-1 do {move line down in buffer}

@ Since |get_next| is used so frequently in \TeX, it is convenient
to define three related procedures that do a little more:

\yskip\hang|get_token| not only sets |cur_cmd| and |cur_chr|, it
also sets |cur_tok|, a packed halfword version of the current token.

\yskip\hang|get_nc_token|, meaning ``get the non-|call| token,'' is like
|get_token|, but if the current token turns out to be a user-defined
control sequence (i.e., a macro call), or something like \.{\\topmark},
it is eliminated from the input by beginning the expansion of the macro.

\yskip\hang|nc_token| is like |get_nc_token| except that it assumes that
|get_next| has already been called.

In fact, these three procedures account for {\sl all\/} uses of |get_next|,
except for one place in the ``inner loop'' when |cur_tok| need not be set.

@ The |get_token| routine also has a special test built into it to make
sure that the token found is not `|endv|', i.e., the end of a template,
since this would be a bad case of misalignment. We wouldn't want an |endv|
to be deleted or to infiltrate another token list, and |get_token| is used
only in the |error| routine or when building token lists, or in a few
other very special cases.  The |endv| problem
hardly ever arises in a form where recovery is feasible, so no recovery
mechanism has been provided.

No new control sequences will be defined except during a call of
|get_token|, because |no_new_control_sequence| is always |true| at
other times.

@p procedure get_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|}
begin no_new_control_sequence←false; get_next; no_new_control_sequence←true;
if cs_ptr=0 then
	begin if cur_cmd=endv then
		fatal_error("(alignment is fouled up somehow)");
@.alignment is fouled up@>
else cur_tok←cs_token_flag+cs_ptr;

@ The only command codes |>max_command| that can be returned by the
|get_next| routine are |undefined_cs|, |top_bot_mark|, |call|, |long_call|,
|outer_call|, and |long_outer_call|, in increasing numerical order.

@p@t\4@>@<Declare the procedure called |macro_call|@>
procedure get_nc_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|,
	and expands macros}
label done;
begin loop begin get_next;
	if cur_cmd≤max_command then goto done;
	if cur_cmd≥call then macro_call
	else if cur_cmd=top_bot_mark then
		@<Insert the \(a)appropriate mark text into the scanner@>
	else @<Complain about an undefined macro@>;
done: if cs_ptr=0 then cur_tok←(cur_cmd*@'400)+cur_chr
else cur_tok←cs_token_flag+cs_ptr;

@ @<Complain about an undefined macro@>=
begin print_nl("! Undefined control sequence");
@.Undefined control sequence@>
help5("The control sequence at the end of the top line")@/
("of your error message was never \def'ed. If you have")@/
("misspelled it (e.g., `\hobx'), type `I' and the correct")@/
("spelling (e.g., `I\hbox'). Otherwise just continue,")@/
("and I'll forget about whatever was undefined.");

@ The |get_nc_token| procedure is equivalent to two consecutive
procedure calls: |get_next; nc_token|.

@p procedure nc_token; {|get_nc_token| without the initial |get_next|}
label done;
begin loop begin if cur_cmd≤max_command then goto done;
	if cur_cmd≥call then macro_call
	else if cur_cmd=top_bot_mark then
		@<Insert the \(a)appropriate mark text into the scanner@>
	else @<Complain about an undefined macro@>;
done: if cs_ptr=0 then cur_tok←(cur_cmd*@'400)+cur_chr
else cur_tok←cs_token_flag+cs_ptr;
@* \[25] Expanding user macros.
A control sequence that has been \.{\\def}'ed by the user is expanded by
\TeX's |macro_call| procedure.

Before we get into the details of |macro_call|, however, let's consider the
treatment of primitives like \.{\\topmark}, since they are essentially
macros without parameters. The token lists for such marks are kept in a
global array of five pointers; we refer to the individual entries of this
array by symbolic names |top_mark|, etc. The value of |top_mark| is either
|null| or a pointer to the reference count of a token list.

@d top_mark_code=0 {the mark in effect at the previous page break}
@d first_mark_code=1 {the first mark between |top_mark| and |bot_mark|}
@d bot_mark_code=2 {the mark in effect at the current page break}
@d split_first_mark_code=3 {the first mark found by \.{\\vsplit}}
@d split_bot_mark_code=4 {the last mark found by \.{\\vsplit}}
@d top_mark==cur_mark[top_mark_code]
@d first_mark==cur_mark[first_mark_code]
@d bot_mark==cur_mark[bot_mark_code]
@d split_first_mark==cur_mark[split_first_mark_code]
@d split_bot_mark==cur_mark[split_bot_mark_code]

@!cur_mark:array[top_mark_code..split_bot_mark_code] of pointer;
	{token lists for marks}

@ @<Set init...@>=
top_mark←null; first_mark←null; bot_mark←null;
split_first_mark←null; split_bot_mark←null;

@ @<Put each...@>=
@!@:top_mark_}{\.{\\topmark} primitive@>
@!@:first_mark_}{\.{\\firstmark} primitive@>
@!@:bot_mark_}{\.{\\botmark} primitive@>
@!@:split_first_mark_}{\.{\\splitfirstmark} primitive@>
@!@:split_bot_mark_}{\.{\\splitbotmark} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
top_bot_mark: case chr_code of
	first_mark_code: print_esc("firstmark");
	bot_mark_code: print_esc("botmark");
	split_first_mark_code: print_esc("splitfirstmark");
	split_bot_mark_code: print_esc("splitbotmark");
	othercases print_esc("topmark")

@ The following code is activated when |cur_cmd=top_bot_mark| and
when |cur_chr| is a code like |top_mark_code|.

@<Insert the \(a)appropriate mark text into the scanner@>=
begin if cur_mark[cur_chr]≠null then

@ Now let's consider |macro_call| itself, which is invoked when |cur_cmd|
is either |call|, |long_call|, |outer_call|, or |long_outer_call|.  The
control sequence definition appears in the token list whose reference count
is in location |cur_chr| of |mem|.

The parameters, if any, must be scanned before the macro is expanded.
Pa\-ram\-eters are token lists without reference counts. They are placed on
an auxiliary stack called |pstack| while they are being scanned, since
the |param_stack| may be losing entries during the matching process.
(Note that |param_stack| can't be gaining entries, since |macro_call| is
the only routine that puts anything onto |param_stack|, and it
is not recursive.)

@!pstack:array[0..8] of pointer; {arguments supplied to a macro}

@ After parameter scanning is complete, the parameters are moved to the
|param_stack|. Then the macro body is fed to the scanner; in other words,
|macro_call| places the defined text of the control sequence at the
top of the \TeX's input stack, so that |get_next| will proceed to read it

The global variable |cs_ptr| contains the |eqtb| address of the control sequence
being expanded, when |macro_call| begins. If this control sequence has not been
declared \.{\\long}, i.e., if its command code in the |eq_type| field is
not |long_call| or |long_outer_call|, its parameters are not allowed to contain
the control sequence \.{\\par}. If an illegal \.{\\par} appears, the macro
call is aborted, and the \.{\\par} will be rescanned.

@<Declare the procedure called |macro_call|@>=
procedure macro_call; {invokes a user-defined control sequence}
label exit, continue, done, done1, found;
var r:pointer; {current node in the macro's token list}
@!p:pointer; {current node in parameter token list being built}
@!q:pointer; {new node being put into the token list}
@!s:pointer; {backup pointer for parameter matching}
@!t:pointer; {cycle pointer for backup recovery}
@!u,@!v:pointer; {auxiliary pointers for backup recovery}
@!rbrace_ptr:pointer; {one step before the last |right_brace| token}
@!n:small_number; {the number of parameters scanned}
@!unbalance:halfword; {unmatched left braces in current parameter}
@!m:halfword; {the number of tokens or groups (usually)}
@!ref_count:pointer; {start of the token list}
@!non_long:boolean; {is \.{\\par} forbidden in parameters?}
@!save_scanner_status:small_number; {|scanner_status| upon entry}
@!save_warning_index:pointer; {|warning_index| upon entry}
begin save_scanner_status←scanner_status; save_warning_index←warning_index;
warning_index←cs_ptr; ref_count←cur_chr; r←link(ref_count); n←0;
if tracing_macros≠0 then @<Show the text of the macro being expanded@>;
if info(r)≠end_match_token then
	@<Scan the parameters and make |link(r)| point to the macro body; but
		|return| if an illegal \.{\\par} is detected@>;
@<Feed the macro body and its parameters to the scanner@>;
exit:scanner_status←save_scanner_status; warning_index←save_warning_index;

@ Before we put a new token list on the input stack, it is wise to clean off
all token lists that have recently been depleted. Then a user macro that ends
with a call to itself will not require unbounded stack space.

@<Feed the macro body and its parameters to the scanner@>=
while (state=token_list)∧(loc=null) do end_token_list; {conserve stack space}
begin_token_list(ref_count,macro); name←warning_index; loc←link(r);
if n>0 then
	begin if param_ptr+n>max_param_stack then
		begin max_param_stack←param_ptr+n;
		if max_param_stack>param_size then
			overflow("parameter stack size",param_size);
	for m←0 to n-1 do param_stack[param_ptr+m]←pstack[m];

@ The |macro_call| procedure and some other routines that construct token lists
find it convenient to use the following macro, which is valid only if
the variables |p| and |q| are reserved for token-list building.

@d store_new_token(#)==begin q←get_avail; link(p)←q; info(q)←#;
	p←q; {|link(p)| is |null|}

@ At this point, the reader will find it advisable to review the explanation
of token list format that was presented earlier, since many aspects of that
format are of importance chiefly in the |macro_call| routine.

The token list might begin with a string of compulsory tokens before the
first |match| or |end_match|. In that case the macro name is supposed to be
followed by those tokens; the following program will set |s=null| to
represent this restriction. Otherwise |s| will be set to the first token of
a string that will delimit the next parameter.

@<Scan the parameters and make |link(r)| point to the macro body...@>=
begin scanner_status←matching; unbalance←0;
if (eq_type(cs_ptr)=call)∨(eq_type(cs_ptr)=outer_call) then non_long←true
else non_long←false;
repeat if (info(r)>match_token+255)∨(info(r)<match_token) then s←null
else	begin s←link(r); r←s; p←temp_head; link(p)←null; m←0;
@<Scan a parameter until its delimiter string has been found; or, if |s=null|,
	simply scan the delimiter string@>;@/
{now |info(r)| is a token whose command code is either |match| or |end_match|}
until info(r)=end_match_token;

@ If |info(r)| is a |match| or |end_match| command, it cannot be equal to
any token found by |get_token|. Therefore an undelimiter parameter---i.e.,
a |match| that is immediately followed by |match| or |end_match|---will
always fail the test `|cur_tok=info(r)|' in the following algorithm.

@<Scan a parameter until its delimiter string has been found; or, ...@>=
continue: get_token; {set |cur_tok| to the next token of input}
if cur_tok=info(r) then
	@<Advance \(r)|r|; |goto found| if the parameter delimiter has been
		fully matched, otherwise |goto continue|@>;
@<Contribute the recently matched tokens to the current parameter, and
	|goto continue| if a partial match is still in effect;
	but abort if |s=null|@>;
if (cur_tok=par_token)∧non_long then @<Report a runaway argument and abort@>;
if cur_tok<right_brace_limit then
	if cur_tok<left_brace_limit then
		@<Contribute an entire group to the current parameter@>
	else @<Report an extra right brace and |goto continue|@>
else store_new_token(cur_tok);
if (info(r)<match_token)∨(info(r)>end_match_token) then goto continue;
found: if s≠null then @<Tidy up the parameter just scanned, and tuck it away@>

@ A slightly subtle point arises here: When the parameter delimiter ends
with `\.{\#\{}', the token list will have a left brace both before and
after the |end_match|\kern-.4pt. Only one of these should affect the
|align_state|, but both will be scanned, so we must make a correction.

@<Advance \(r)|r|; |goto found| if the parameter delimiter has been fully...@>=
begin r←link(r);
if (info(r)≥match_token)∧(info(r)≤end_match_token) then
	begin if cur_tok<left_brace_limit then decr(align_state);
	goto found;
else goto continue;

@ @<Report an extra right brace and |goto continue|@>=
begin print_nl("Argument of "); sprint_cs(warning_index);
print(" has an extra }");
help5("I've deleted a `}' that doesn't seem to match anything.")@/
	("For example, `\def\a#1{...}' and `\a}' would produce this")@/
	("error. If your `}' was spurious, just proceed. Otherwise,")@/
	("type `I\par}' (including the `}') and I'll tell you about")@/
	("a runaway argument that might be the root of the problem.");
incr(align_state); non_long←true; error;

@ @<Report a runaway argument and abort@>=
begin runaway; print_nl("! Paragraph ended before "); sprint_cs(warning_index);
@.Paragraph ended before...@>
print(" was complete");
help3("I suspect you've forgotten a `}', causing me to apply this")@/
	("control sequence to too much text. How can we recover?")@/
	("My plan is to forget the whole thing and hope for the best.");
pstack[n]←link(temp_head); align_state←align_state-unbalance;
for m←0 to n do flush_list(pstack[m]);
back_error; return;

@ When the following code becomes active, we have matched tokens from |s| to
the predecessor of |r|, and we have found that |cur_tok≠info(r)|. An
interesting situation now presents itself: If the parameter is to be
delimited by a string such as `\.{ab}', and if we have scanned `\.{aa}',
we want to contribute one `\.a' to the current parameter and resume
looking for a `\.b'. The program must account for such partial matches and
for others that can be quite complex.  But most of the time we have |s=r|
and nothing needs to be done.

Incidentally, it is possible for \.{\\par} tokens to sneak in to certain
parameters of non-\.{\\long} macros. For example, consider a case like
`\.{\\def\\a\#1\\par!\{...\}}' where the first \.{\\par} is not followed
by an exclamation point. In such situations it does not seem appropriate
to prohibit the \.{\\par}, so \TeX\ keeps quiet about this bending of
the rules.

@<Contribute the recently matched tokens to the current parameter...@>=
if s≠r then
	if s=null then @<Report an improper use of the macro and abort@>
	else	begin t←s;
		repeat store_new_token(info(t)); incr(m); u←link(t); v←s;
		loop@+	begin if u=r then
				if cur_tok≠info(v) then goto done
				else	begin r←link(v); goto continue;
			if info(u)≠info(v) then goto done;
			u←link(u); v←link(v);
		done: t←link(t);
		until t=r;

@ @<Report an improper use...@>=
begin print_nl("! Use of "); sprint_cs(warning_index);
@.Use of x doesn't match...@>
print(" doesn't match its definition");
help4("If you say, e.g., `\def\a1{...}', then you must always")@/
	("put `1' after `\a', since control sequence names are")@/
	("made up of letters only. The macro here has not been")@/
	("followed by the required stuff, so I'm ignoring it.");
error; return;

@ @<Contribute an entire group to the current parameter@>=
begin unbalance←1;
loop@+	begin store_new_token(cur_tok); get_token;
	if (cur_tok=par_token)∧ non_long then
		@<Report a runaway argument and abort@>;
	if cur_tok<right_brace_limit then
		if cur_tok<left_brace_limit then incr(unbalance)
		else	begin decr(unbalance);
			if unbalance=0 then goto done1;
done1: rbrace_ptr←p; store_new_token(cur_tok);

@ If the parameter consists of a single group enclosed in braces, we must
strip off the enclosing braces. That's why |rbrace_ptr| was introduced.

@<Tidy up the parameter just scanned, and tuck it away@>=
begin if (m=1)∧(info(p)<right_brace_limit)∧(p≠temp_head) then
	begin link(rbrace_ptr)←null; free_avail(p);
	p←link(temp_head); pstack[n]←link(p); free_avail(p);
else pstack[n]←link(temp_head);
if tracing_macros≠0 then
	begin begin_diagnostic; print_nl("#"); print_int(n);
	print("<-"); show_token_list(pstack[n-1],null,1000);

@ @<Show the text of the macro being expanded@>=
begin begin_diagnostic; print_ln; print_cs(warning_index);
token_show(ref_count); end_diagnostic;
@* \[26] Basic scanning subroutines.
Let's turn now to some procedures that \TeX\ calls upon frequently to digest
certain kinds of patterns in the input. Most of these are quite simple;
some are quite elaborate.

@ The |scan_left_brace| routine is called when a left brace is supposed to be
the next non-blank token. (The term ``left brace'' means, more precisely,
a character whose chcode is |left_brace|.)

@p procedure scan_left_brace; {reads a mandatory |left_brace|}
begin @<Get the next non-blank non-call token@>;
if cur_cmd≠left_brace then
	begin print_nl("! Missing { inserted");
@.Missing {\{} inserted@>
	help4("A left brace was mandatory here, so I've put one in.")@/
		("You might want to delete and/or insert some corrections")@/
		("so that I will find a matching right brace soon.")@/
		("If you're confused by all this, try typing `I}' now.");
	back_error; cur_tok←left_brace_token+"{"; cur_cmd←left_brace;
	cur_chr←"{"; incr(align_state);

@ @<Get the next non-blank non-call...@>=
repeat get_nc_token;
until cur_cmd≠spacer

@ The |scan_optional_equals| routine looks for an optional `\.=' sign preceded
by optional spaces.

@p procedure scan_optional_equals;
begin  @<Get the next non-blank non-call...@>;
if cur_tok≠other_token+"=" then back_input;

@ Here is a procedure that ignores text until coming to a right brace at
level zero, assuming that we start at a given level |l| of nested braces.
The closing right brace is followed by an optional space.

@p procedure pass_block(@!l:integer);
label done;
begin scanner_status←skipping;
loop@+	begin get_token;
	if cur_cmd=right_brace then
		begin l←l-1;
		if l≤0 then goto done;
	else if cur_cmd=left_brace then l←l+1;
done: if l<0 then
	begin incr(align_state); print_nl("! Missing { inserted");
@.Missing {\{} inserted@>
	help2("There should have been a `{' before the `}' that")@/
		("I just looked at. So I put one in."); error;
scanner_status←normal; @<Scan an optional space@>;

@ @<Scan an optional space@>=
begin get_nc_token; if cur_cmd≠spacer then back_input;

@ In case you are getting bored, here is a slightly less trivial routine:
Given a string of lower case letters, like `\.{pt}' or `\.{plus}' or
`\.{after}', the |scan_keyword| routine checks to see whether the next
tokens of input match this string. The match must be exact, except that
upper case letters will match their lower case counterparts; upper case
equivalents are determined by subtracting |"a"-"A"|, rather than using the
|uc_code| table, since \TeX\ uses this routine only for its own limited
set of keywords.

If a match is found, the characters are effectively removed from the input
and |true| is returned. Otherwise |false| is returned, and the input
is left essentially unchanged (except for the fact that some macros
may have been expanded, etc.).

@p function scan_keyword(@!s:str_number):boolean; {look for a given string}
label exit;
var p:pointer; {tail of the backup list}
@!q:pointer; {new node being added to the token list via |store_new_token|}
@!k:pool_pointer; {index into |str_pool|}
begin p←scan_head; link(p)←null; k←str_start[s];
while k<str_start[s+1] do
	begin get_nc_token;
	if (cur_cmd=letter)∧@|
	 ((cur_chr=str_pool[k])∨(cur_chr=str_pool[k]-"a"+"A")) then
		begin store_new_token(cur_tok); incr(k)
	else	begin back_input;
		if p≠scan_head then ins_list(link(scan_head));
		scan_keyword←false; return;
flush_list(link(scan_head)); scan_keyword←true;

@ The next routine `|scan_the|' is used to handle the `\.{\\the}' in
constructions like `\.{\\the\\month}' and `\.{\\the\\hsize}' and
`\.{\\the\\baselineskip}'. Soon we will be considering the |scan_int|
procedure, which calls |scan_the|; on the other hand, |scan_the| also
calls |scan_int|, for constructions like `\.{\\the\\chcode\`\\\$}' or
`\.{\\the\\texinfo 20 3}'. So we have to declare |scan_int| as a
|forward| procedure.

@p procedure@?scan_int; forward; {scans an integer value}
@t\4\4@>@<Declare procedures that scan restricted classes of integers@>

@ A single word `\.{\\the}' is used for integers, dimensions, and glue
speci\-fi\-ca\-tions, so \TeX\ doesn't know exactly what to expect when
|scan_the| begins. For example, any of the three types could occur
immediately after `\.{\\hskip\\the}'; and one can even use \.{\\the} with
respect to token lists in constructions like
`\.{\\xdef\\o\{\\the\\output\}}'.  On the other hand, only integers are
allowed after a construction like `\.{\\count\\the}'. To handle the
various possibilities, |scan_the| has a |level| parameter, which tells the
``highest'' kind of quantity that |scan_the| is allowed to produce. Five
levels are distinguished, namely |int_val|, |dimen_val|, |glue_val|,
|mu_val|, and |tok_val|.

The output of |scan_the| (and of the other routines |scan_int|, |scan_dimen|,
and |scan_glue| below) is put into the global variable |cur_val|, and its
level is put into |cur_val_level|. The highest values of |cur_val_level| are
special: |mu_val| is used only when |cur_val| points to one of the three
parameters \.{\\thinmskip}, \.{\\midmskip}, \.{\\thickmskip}, and |tok_val|
is used only in cases that `\.{\\the\\output}' and `\.{\\the\\everypar}'
are legitimate.

If the output is glue, |cur_val| will point to a glue specification, and
the reference count of that glue will have been updated to reflect this
reference; if the output is a token list, |cur_val| will point to its
reference count, but in this case the count will not have been updated.
Otherwise |cur_val| will contain the integer or scaled value in question.

@d int_val=0 {integer values}
@d dimen_val=1 {dimension values}
@d glue_val=2 {glue specifications}
@d mu_val=3 {math glue specifications}
@d tok_val=4 {token lists}

@!cur_val:integer; {value returned by numeric scanners}
@!cur_val_level:int_val..tok_val; {the ``level'' of this value}

@ The hash table is initialized with `\.{\\count}', `\.{\\dimen}', and
`\.{\\skip}' all having |register| as their command code; they are distinguished
by the |chr_code|, which is either |int_val|, |dimen_val|, or |glue_val|.
The `\.{\\hangindent}' primitive is constructed so that |scan_the| can
treat it exactly like a dimension parameter. The two primitives whose
command code is `|the|' are different solely because the |equiv| is
zero for `\.{\\the}' and nonzero for `\.{\\minus}'.

@<Put each...@>=
@!@:count_}{\.{\\count} primitive@>
@!@:dimen_}{\.{\\dimen} primitive@>
@!@:skip_}{\.{\\skip} primitive@>
@!@:hangindent_}{\.{\\hangindent} primitive@>
@!@:the_}{\.{\\the} primitive@>
@!@:minus_}{\.{\\minus} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
register: if chr_code=int_val then print_esc("count")
	else if chr_code=dimen_val then print_esc("dimen")
	else print_esc("skip");
the: if chr_code=0 then print_esc("the")@+else print_esc("minus");

@ OK, we're ready for |scan_the| itself. A second parameter, |negative|,
is set |true| if we want the value to be negated (e.g., if `\.{\\minus}'
has appeared instead of `\.{\\the}').

@d scanned_result_end(#)==cur_val_level←#; end
@d scanned_result(#)==@+begin cur_val←#;scanned_result_end

@p procedure scan_the(@!level:small_number;@!negative:boolean);
	{fetch an internal parameter}
label restart;
var m:halfword; {|chr_code| part of the operand token}
begin restart: get_nc_token; m←cur_chr;
case cur_cmd of
def_code: @<Fetch a character code from some table@>;
assign_toks: @<Fetch a token list, provided that |level=tok_val|@>;
assign_int: scanned_result(int_par(m))(int_val);
assign_dimen,hang_indent: scanned_result(dimen_par(m))(dimen_val);
assign_glue: if m≥thin_mskip then scanned_result(glue_par(m))(mu_val)
	else scanned_result(glue_par(m))(glue_val);
set_aux: @<Fetch the |space_factor| or the |prev_depth|@>;
set_family: scanned_result(cur_fam)(int_val);
assign_tex_info: @<Fetch a font parameter@>;
register: @<Fetch a register@>;
last_skip: @<Fetch the glue in the current node, if any@>;
set_font: scanned_result(font_code[cur_font])(int_val);
def_family: @<Fetch a math font code@>;
the: begin if m>0 then negative←¬ negative;
	goto restart;
othercases @<Complain that \.{\\the} can't do this; give zero result@>
while cur_val_level>level do @<Convert \(c)|cur_val| to a lower level@>;
@<Fix the reference count, if any, and negate |cur_val| if |negative|@>;

@ @<Fetch a character code from some table@>=
begin scan_seven_bit_int;
if m<del_code_base then scanned_result(equiv(m+cur_val))(int_val)
else scanned_result(eqtb[m+cur_val].int)(int_val);

@ @<Fetch a token list, provided that |level=tok_val|@>=
if (level≠tok_val)∨ negative then
	begin print_nl("! Improper use of \the");
@.Improper use of \\the@>
	help1("I'm forgetting what you said and using zero for this \the.");
	back_error; scanned_result(0)(dimen_val);
else scanned_result(equiv(m))(tok_val)

@ A user is allowed to refer to `\.{\\the\\spacefactor}' only in horizontal
mode, and to `\.{\\the\\prevdepth}' only in vertical mode. 

@<Put each...@>=
@!@:space_factor_}{\.{\\spacefactor} primitive@>
@!@:prev_depth_}{\.{\\prevdepth} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
set_aux: if chr_code=vmode then print_esc("prevdepth")
else print_esc("spacefactor");

@ @<Fetch the |space_factor| or the |prev_depth|@>=
if abs(mode)≠m then
	begin print_nl("! Improper use of \the");
@.Improper use of \\the@>
	help4("You can say \the\spacefactor only in horizontal mode,")@/
		("and \the\prevdepth only in vertical mode; and")@/
		("neither of these is meaningful inside \send. So")@/
		("I'm forgetting what you said and using zero for this \the.");
	back_error; scanned_result(0)(dimen_val);
else	begin cur_val←aux;
	if m=vmode then cur_val_level←dimen_val@+else cur_val_level←int_val;

@ Here is where \.{\\lastskip} is implemented. The reference count will be
updated later.
@:last_skip_}{\.{\\lastskip} primitive@>

@<Fetch the glue in the current node, if any@>=
begin cur_val←zero_glue;
if not is_char_node(tail)∧(mode≠0) then
	begin if type(tail)=glue_node then cur_val←glue_ptr(tail);
else if (mode=vmode)∧(tail=head)∧(last_page_glue≠max_halfword) then

@ @<Fetch a font parameter@>=
begin scan_tex_info(false); font_info[fmem_ptr].sc←0;

@ @<Fetch a register@>=
begin scan_eight_bit_int;
case m of
end; {there are no other cases}

@ @<Fetch a math font code@>=
begin scan_four_bit_int; scanned_result(font_code[equiv(m+cur_val)])(int_val);

@ @<Complain that \.{\\the} can't do this; give zero result@>=
begin print_nl("! You can't use "); print_cmd_chr(cur_cmd,cur_chr);
@.You can't use x after \\the@>
print(" after \the");
help1("I'm forgetting what you said and using zero for this \the.");
back_error; scanned_result(0)(dimen_val);

@ When a |glue_val| changes to a |dimen_val|, we use the width component
of the glue; there is no need to decrease the reference count, since it
has not yet been increased.  When a |dimen_val| changes to an |int_val|,
we use scaled points so that the value doesn't actually change. And when a
|mu_val| changes to a |glue_val|, the value doesn't change either.

@<Convert \(c)|cur_val| to a lower level@>=
begin if cur_val_level=glue_val then cur_val←width(cur_val);

@ If |cur_val| points to a glue specification at this point, the reference
count for the glue does not yet include the reference by |cur_val|.

@<Fix the reference count, if any, ...@>=
if negative then
	if cur_val_level≥glue_val then
		begin cur_val←new_spec(cur_val);
		@<Negate all three glue components of |cur_val|@>;
	else cur_val←-cur_val
else if (cur_val_level≥glue_val)∧(cur_val_level≠tok_val) then

@ Our next goal is to write the |scan_int| procedure, which scans anything that
\TeX\ treats as an integer. But first we might as well look at some simple
applications of |scan_int| that have already been made inside of |scan_the|:

@<Declare procedures that scan restricted classes of integers@>=
procedure scan_seven_bit_int;
begin scan_int;
if (cur_val<0)∨(cur_val>127) then
	begin print_nl("! Bad character code");
@.Bad character code@>
	help2("The numeric code for a character must be between 0 and 127.")@/
		("I changed this one to zero."); int_error(cur_val); cur_val←0;

@ @<Declare procedures that scan restricted classes of integers@>=
procedure scan_eight_bit_int;
begin scan_int;
if (cur_val<0)∨(cur_val>255) then
	begin print_nl("! Bad register code");
@.Bad register code@>
	help2("Boxes, counts, dimens, and skips must be between 0 and 255.")@/
		("I changed this one to zero."); int_error(cur_val); cur_val←0;

@ @<Declare procedures that scan restricted classes of integers@>=
procedure scan_four_bit_int;
begin scan_int;
if (cur_val<0)∨(cur_val>15) then
	begin print_nl("! Bad number");
@.Bad number@>
	help2("Since I expected to read a number between 0 and 15,")@/
		("I changed this one to zero."); int_error(cur_val); cur_val←0;

@ While we're at it, we might as well deal with similar routines that
will be needed later.

@<Declare procedures that scan restricted classes of integers@>=
procedure scan_char_num;
begin scan_int;
if (cur_val<0)∨(cur_val>255) then
	begin print_nl("! Bad \char code");
@.Bad \\char code@>
	help2("The numeric code following \char must be between 0 and 255.")@/
		("I changed this one to zero."); int_error(cur_val); cur_val←0;

@ @<Declare procedures that scan restricted classes of integers@>=
procedure scan_fifteen_bit_int;
begin scan_int;
if (cur_val<0)∨(cur_val>@'77777) then
	begin print_nl("! Bad \mathchar code");
@.Bad \\mathchar code@>
	help2("A numeric \mathchar code must be between 0 and 32767.")@/
		("I changed this one to zero."); int_error(cur_val); cur_val←0;

@ @<Declare procedures that scan restricted classes of integers@>=
procedure scan_twenty_seven_bit_int;
begin scan_int;
if (cur_val<0)∨(cur_val>@'777777777) then
	begin print_nl("! Bad delimiter code");
@.Bad delimiter code@>
	help2("A numeric delimiter code must be between 0 and 2^{27}-1.")@/
		("I changed this one to zero."); int_error(cur_val); cur_val←0;

@ An integer number can be preceded by any number of spaces and `\.+' or
`\.-' signs. Then comes either a decimal constant (i.e., radix 10), an
octal constant (i.e., radix 8, preceded by@@\.\'), a hexadecimal constant
(radix 16, preceded by@@\."), an alphabetic constant (preceded by@@\.\`), or
an internal variable obtained by prefixing \.{\\the}.  (The `\.{\\the}'
can be omitted before `\.{\\count}'.) After scanning is complete,
|cur_val| will contain the answer, which must be at most
$2^{31}-1=2147483647$ in absolute value. The value of |radix| is set to
10, 8, or 16 in the cases of decimal, octal, or hexadecimal constants,
otherwise |radix| is set to zero. An optional space follows the number.

@d plus_token=other_token+"+" {plus sign}
@d minus_token=other_token+"-" {minus sign}
@d zero_token=other_token+"0" {zero, the smallest digit}
@d A_token=letter_token+"A" {the smallest special hex digit}
@d octal_token=other_token+"'" {apostrophe, indicates an octal constant}
@d hex_token=other_token+"""" {double quote, indicates a hex constant}
@d alpha_token=other_token+"`" {reverse apostrophe, precedes alpha constants}
@d point_token=other_token+"." {decimal point}

@!radix:small_number; {|scan_int| sets this to 8, 10, 16, or zero}

@ The |scan_int| routine is used also to scan the integer part of a
fraction; for example, the `\.3' in `\.{3.14159}' will be found by
|scan_int|. The |scan_dimen| routine assumes that |cur_tok=point_token|
after the integer part of such a fraction has been scanned by |scan_int|,
and that the decimal point has been backed up to be scanned again.

@p procedure scan_int; {sets |cur_val| to an integer}
label done;
var negative:boolean; {should the answer be negated?}
@!m:integer; {|@t$2^{31}$@> div radix|, the threshold of danger}
@!d:small_number; {the digit just scanned}
@!vacuous:boolean; {have no digits appeared?}
@!OK_so_far:boolean; {has an error message been issued?}
begin radix←0; OK_so_far←true;@/
@<Get the next non-blank non-sign token; set |negative| appropriately@>;
if cur_tok=alpha_token then @<Scan an alphabetic character code into |cur_val|@>
else if (cur_cmd=the)∨(cur_cmd=register) then @<Fetch an internal integer@>
else @<Scan a numeric constant@>;
if negative then cur_val←-cur_val;

@ @<Get the next non-blank non-sign token...@>=
repeat @<Get the next non-blank non-call token@>;
if cur_tok=minus_token then
	begin negative ← ¬ negative; cur_tok←plus_token;
until cur_tok≠plus_token

@ A space is ignored after an alphabetic character constant, so that
such constants behave like numeric ones.

@<Scan an alphabetic character code into |cur_val|@>=
begin get_token; {suppress macro expansion}
if cur_tok<cs_token_flag then cur_val←cur_chr
else if cur_tok<cs_token_flag+active_base then
else cur_val←cur_tok-cs_token_flag-active_base;
if cur_val>127 then
	begin print_nl("! Improper alphabetic constant");
@.Improper alphabetic constant@>
	help2("A one-character control sequence belongs after a ` mark.")@/
		("So I'm essentially inserting \0 here.");
	cur_val←"0"; back_error;
else @<Scan an optional space@>;

@ @<Fetch an internal integer@>=
begin if cur_cmd=register then back_input {implied `\.{\\the}'}
else if cur_chr≠0 then negative ← ¬ negative; {handle `\.{\\minus}'}

@ @<Scan a numeric constant@>=
begin radix←10; m←214748364;
if cur_tok=octal_token then
	begin radix←8; m←@'2000000000; get_nc_token;
else if cur_tok=hex_token then
	begin radix←16; m←@'1000000000; get_nc_token;
vacuous←true; cur_val←0;@/
@<Accumulate the constant until |cur_tok| is not a suitable digit@>;
if vacuous then @<Express astonishment that no number was here@>;
if cur_cmd≠spacer then back_input;

@ @d infinity==@'17777777777 {the largest positive value that \TeX\ knows}

@<Accumulate the constant...@>=
loop@+	begin if (cur_tok<zero_token+radix)∧(cur_tok≥zero_token)∧
		(cur_tok≤zero_token+9) then d←cur_tok-zero_token
	else if (radix=16)∧(cur_tok≤A_token+5)∧(cur_tok≥A_token) then
	else goto done;
	if (cur_val≥m)∧((cur_val>m)∨(d>7)∨(radix≠10)) then
		begin if OK_so_far then
			begin print_nl("! Number too big");
@.Number too big@>
			help2("I can only go up to 2147483647='17777777777=""7FFFFFFF,")@/
				("so I'm using that number instead of yours.");
			error; cur_val←infinity; OK_so_far←false;
	else cur_val←cur_val*radix+d;

@ @<Express astonishment...@>=
begin print_nl("! Missing number");
@.Missing number@>
help3("A number should have been here; I inserted `0'.")@/
	("(If you can't figure out why I needed to see a number,")@/
	("look up `weird error' in the TeX manual index.)");

@ The |scan_dimen| routine is similar to |scan_int|, but it sets |cur_val| to
a |scaled| value, i.e., an integral number of sp. One of its main tasks
is therefore to interpret the abbreviations for various kinds of units and
to convert measurements to scaled points.

There are three parameters: |mu| is |true| if the finite units must be
`\.{mu}', while |mu| is |false| if `\.{mu}' units are disallowed;
|inf| is |true| if the infinite units `\.{fil}', `\.{fill}', `\.{filll}'
are permitted; and |shortcut| is |true| if |cur_val| already contains
an integer and only the units need to be considered.

The order of infinity that was found in the case of infinite glue is returned
in the global variable |cur_order|.

@!cur_order:glue_ord; {order of infinity found by |scan_dimen|}

@ Constructions like `\.{-\'77 pt}' are legal dimensions, so |scan_dimen|
may begin with |scan_int|. This explains why it is convenient to use
|scan_int| also for the integer part of a decimal fraction.

Several branches of |scan_dimen| work with |cur_val| as an integer and
with an auxiliary fraction |f|, so that the actual quantity of interest is
$|cur_val|+|f|/2^{16}$. At the end of the routine, this ``unpacked''
representation is put into the single word |cur_val|, which suddenly
switches significance from |integer| to |scaled|.

@d attach_fraction=88 {go here to pack |cur_val| and |f| into |cur_val|}
@d attach_sign=89 {go here when |cur_val| is correct except perhaps for sign}

@p procedure scan_dimen(@!mu,@!inf,@!shortcut:boolean);
	{sets |cur_val| to a dimension}
label done, found, not_found, attach_fraction, attach_sign;
var negative:boolean; {should the answer be negated?}
@!f:integer; {numerator of a fraction whose denominator is $2^{16}$}
@<Local variables for dimension calculations@>@;
begin f←0; arith_error←false; cur_order←normal; negative←false;
if not shortcut then
	begin @<Get the next non-blank non-sign...@>;
	if (cur_cmd=the)∨(cur_cmd=register) then
		@<Fetch an internal dimension and |goto attach_sign|,
			or fetch an internal integer@>
	else	begin back_input;
		if cur_tok≠point_token then scan_int
		else	begin radix←10; cur_val←0;
		if (radix=10)∧(cur_tok=point_token) then @<Scan decimal fraction@>;
if cur_val<0 then {in this case |f<0|}
	begin negative ← ¬ negative; cur_val←-cur_val;
@<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$, where there
	are $x$ units per sp@>;
attach_sign: if negative then cur_val←-cur_val;
if arith_error ∨(abs(cur_val)≥@'10000000000) then
	@<Report that this dimension is out of range@>;
@<Scan an optional space@>;

@ @<Fetch an internal dimension and |goto attach_sign|...@>=
begin if cur_cmd=register then back_input {implied `\.{\\the}'}
else if cur_chr≠0 then negative ← ¬ negative; {handle `\.{\\minus}'}
if cur_val_level=dimen_val then goto attach_sign;

@ @<Local variables for dimension calculations@>=
@!num,@!denom:1..65536; {conversion ratio for the scanned units}
@!k:small_number; {number of digits in a decimal fraction}
@!j:small_number; {index into a box node}
@!v:scaled; {an internal dimension}
@!save_cur_val:scaled; {temporary storage of |cur_val|}

@ When the following code is executed, we have |cur_tok=point_token|, but this
token has been backed up using |back_input|; we must first discard it.

It turns out that a decimal point all by itself is equivalent to `\.{0.0}'.
Let's hope people don't use that fact.

@<Scan decimal fraction@>=
begin k←0; get_token; {|point_token| is being re-scanned}
loop@+	begin get_nc_token;
	if (cur_tok>zero_token+9)∨(cur_tok<zero_token) then goto done;
	if k<16 then {digits for |k≥16| cannot affect the result}
		begin dig[k]←cur_tok-zero_token; incr(k);
done: f←round_decimals(k); back_input;

@ Now comes the harder part: At this point in the program, |cur_val| is a
nonnegative integer and $f/2^{16}$ is a nonnegative fraction less than 1;
we want to multiply the sum of these two quantities by the appropriate
factor, based on the specified units, in order to produce a |scaled|
result, and we want to do the calculation with fixed point arithmetic that
does not overflow. This section of the program is followed by the label

@<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$...@>=
if inf then @<Scan for \.{fil} units; |goto attach_fraction| if found@>;
if mu then @<Scan for \.{mu} units and |goto attach_fraction|@>;
@<Scan for units that are internal dimensions,
	and |goto attach_sign| if found@>;
if scan_keyword("true") then @<Adjust \(f)for the magnification ratio@>;
if scan_keyword("pt") then goto attach_fraction; {the easy case}
@<Scan for all other units and adjust |cur_val| and |f| accordingly@>;
attach_fraction: if cur_val≥@'40000 then arith_error←true
else cur_val←cur_val*unity+f

@ @<Scan for \.{fil} units...@>=
if scan_keyword("fil") then
	begin cur_order←fil;
	while scan_keyword("l") do
		begin if cur_order=filll then
			begin help1("I dddon't go any higher than filll.");
			print_nl("! Illegal unit of measure (");
@.Illegal unit of measure@>
			print("replaced by filll)"); error;
		else incr(cur_order);
	goto attach_fraction;

@ @<Scan for \.{mu} units and |goto attach_fraction|@>=
if scan_keyword("mu") then goto attach_fraction
else	begin print_nl("! Illegal unit of measure ("); print("mu inserted)");
@.Illegal unit of measure@>
	help4("The unit of measurement in \mskip glue must be mu.")@/
		("To recover gracefully from this error, it's best to")@/
		("delete the erroneous units; e.g., type `2' to delete")@/
		("two letters. (See Chapter 27 of the manual.)");
	error; goto attach_fraction;

@ @d set_internal_dimen(#)==begin v←#; goto found;

@<Scan for units that are internal dimensions...@>=
if scan_keyword("em") then set_internal_dimen(@<The em width for |cur_font|@>);
if scan_keyword("ex") then set_internal_dimen(@<The x-height for |cur_font|@>);
if scan_keyword("vu") then set_internal_dimen(var_unit);
if scan_keyword("wd") then j←width_offset
else if scan_keyword("dp") then j←depth_offset
else if scan_keyword("ht") then j←height_offset
else goto not_found;
save_cur_val←cur_val; scan_eight_bit_int;
if box(cur_val)=null then v←0@+else v←mem[box(cur_val)+j].sc;
goto attach_sign;

@ @<Adjust \(f)for the magnification ratio@>=
begin @<Scan an optional space@>;
if mag≠1000 then
	begin cur_val←xn_over_d(cur_val,1000,mag);
	f←(1000*f+@'200000*remainder) div mag;
	cur_val←cur_val+(f div @'200000); f←f mod @'200000;

@ All of the necessary conversion factors can be specified exactly as a
fraction whose numerator and denominator are 65536 or less.

@d set_conversion_end(#)== denom←#; end
@d set_conversion(#)==@+begin num←#; set_conversion_end

@<Scan for all other units and adjust |cur_val| and |f| accordingly@>=
if scan_keyword("in") then set_conversion(7227)(100)
else if scan_keyword("pc") then set_conversion(12)(1)
else if scan_keyword("cm") then set_conversion(7227)(254)
else if scan_keyword("mm") then set_conversion(7227)(2540)
else if scan_keyword("bp") then set_conversion(7227)(7200)
else if scan_keyword("dd") then set_conversion(215)(201)
else if scan_keyword("cc") then set_conversion(2580)(201)
else if scan_keyword("sp") then goto attach_sign
else @<Complain about unknown unit and |goto attach_fraction|@>;
f←(num*f+@'200000*remainder) div denom;@/
cur_val←cur_val+(f div @'200000); f←f mod @'200000

@ @<Complain about unknown unit...@>=
begin print_nl("! Illegal unit of measure ("); print("pt inserted)");
@.Illegal unit of measure@>
help6("Dimensions can be in units of em, ex, vu, wd, dp, ht, in,")@/
	("pt, pc, cm, mm, bp, dd, cc, or sp, but yours is a new one.")@/
	("I'll assume that you meant to say pt, for printers' points.")@/
	("To recover gracefully from this error, it's best to")@/
	("delete the erroneous units; e.g., type `2' to delete")@/
	("two letters. (See Chapter 27 of the manual.)");
error; goto attach_fraction;

@ @d max_dimen==@'7777777777 {$2^{30}-1$}

@<Report that this dimension is out of range@>=
begin print_nl("! Dimension too large");
@.Dimension too large@>
help2("I can't work with sizes bigger than about 19 feet.")@/
	("Continue and I'll use the largest value I can.");@/
error; cur_val←max_dimen;

@ The final member of \TeX's value-scanning trio is |scan_glue|, which
makes |cur_val| point to a glue specification. The reference count of that
glue spec will take account of the fact that |cur_val| is pointing to it.

As before, the |mu| parameter is |true| if the glue is supposed to be
an \.{\\mskip}.

Since |scan_dimen| was so much more complex than |scan_int|, we might expect
|scan_glue| to be even worse. But fortunately, it is very simple, since
most of the work has already been done for us.

@p procedure scan_glue(@!mu:boolean); {sets |cur_val| to a glue spec pointer}
label exit;
var negative:boolean; {should the answer be negated?}
@!q:pointer; {new glue specification}
begin @<Get the next non-blank non-sign...@>;
if (cur_cmd=the)∨(cur_cmd=register) then
	begin if cur_cmd=register then back_input {implied `\.{\\the}'}
	else if cur_chr≠0 then negative ← ¬ negative; {handle `\.{\\minus}'}
	scan_the(glue_val,negative); negative←false;
	if cur_val_level=glue_val then return;
	if cur_val_level=int_val then scan_dimen(mu,false,true);
else	begin back_input; scan_dimen(mu,false,false);
@<Create a new glue specification whose width is |cur_val|; scan for its
	stretch and shrink components@>;
if negative then @<Negate all three glue components of |cur_val|@>;

@ @<Create a new glue specification whose width is |cur_val|...@>=
q←new_spec(zero_glue); width(q)←cur_val;
if scan_keyword("plus") then
	begin scan_dimen(mu,true,false);
	stretch(q)←cur_val; stretch_order(q)←cur_order;
if scan_keyword("minus") then
	begin scan_dimen(mu,true,false);
	shrink(q)←cur_val; shrink_order(q)←cur_order;

@ @<Negate all three...@>=
begin width(cur_val)←-width(cur_val);

@ Here's a similar procedure that returns a pointer to a rule node. This
routine is called just after \TeX\ has seen \.{\\hrule} or \.{\\vrule};
therefore |cur_cmd| will be either |hrule| or |vrule|. The idea is to store
the default rule dimensions in the node, then to override them if
`\.{height}' or `\.{width}' or `\.{depth}' specifications are
found (in any order).

@d default_rule=26215 {.40001 pt}

@p function scan_rule_spec:pointer;
label reswitch;
var q:pointer; {the rule node being created}
begin q←new_rule; {|width|, |depth|, and |height| all equal |null_flag| now}
if cur_cmd=vrule then width(q)←default_rule
else	begin height(q)←default_rule; depth(q)←0;
reswitch: if scan_keyword("width") then
	begin scan_dimen(false,false,false); width(q)←cur_val; goto reswitch;
if scan_keyword("height") then
	begin scan_dimen(false,false,false); height(q)←cur_val; goto reswitch;
if scan_keyword("depth") then
	begin scan_dimen(false,false,false); depth(q)←cur_val; goto reswitch;
@* \[27] Building token lists.
The token lists for macros and for other things like \.{\\mark} and \.{\\output}
and \.{\\send} are produced by a procedure called |scan_toks|.

Before we get into the details of |scan_toks|, let's consider a much simpler
task, that of converting the current string into a token list. The |str_toks|
function does this; it classifies spaces as type |spacer|, characters |≥"a"|
as type |letter|, and everything else as type |other_char|. (These
three categories are sufficient, since |str_toks| is used only with the
special strings that can occur when \.{\\the} or \.{\\number} constructions
are being expanded.)

The token list created by |str_toks| begins at |link(temp_head)| and ends
at the value |p| that is returned. (If |p=temp_head|, the list is empty.)

@p function str_toks:pointer; {changes the current string to a token list}
var p:pointer; {tail of the token list}
@!q:pointer; {new node being added to the token list via |store_new_token|}
@!t:halfword; {token being appended}
@!k:pool_pointer; {index into |str_pool|}
begin str_room(1);
p←temp_head; link(p)←null; k←str_start[str_ptr];
while k<pool_ptr do
	begin t←str_pool[k];
	if t=" " then t←space_token
	else if t≥"a" then t←letter_token+t
	else t←other_token+t;
pool_ptr←str_start[str_ptr]; str_toks←p;

@ The main reason for wanting |str_toks| is the next function,
|the_toks|, which has similar input/output characteristics.
This procedure is supposed to scan something like `\.{\\skip\\count12}',
i.e., whatever can follow `\.{\\the}' or `\.{\\minus}', and it
constructs a token list containing something like `\.{-3.0pt
minus 0.5fill}'.

@p function the_toks(@!negative:boolean):pointer;
var old_setting:0..max_selector; {holds |selector| setting}
@!p,@!q,@!r:pointer; {used for copying a token list}
begin scan_the(tok_val,negative);
if cur_val_level=tok_val then @<Copy the token list@>
else begin old_setting←selector; selector←new_string;
	case cur_val_level of
	dimen_val:begin print_scaled(cur_val); print("pt");
	glue_val: begin print_spec(cur_val,"pt"); delete_glue_ref(cur_val);
	mu_val: begin print_spec(cur_val,"mu"); delete_glue_ref(cur_val);
	end; {there are no other cases}
	selector←old_setting; the_toks←str_toks;

@ @<Copy the token list@>=
begin r←link(cur_val); p←temp_head; link(p)←null;
while r≠null do
	begin store_new_token(info(r)); r←link(r);

@ Similarly, we have |num_toks|, which prepares the token list
following `\.{\\number}':

@p function num_toks:pointer;
var old_setting:0..max_selector; {holds |selector| setting}
begin old_setting←selector;
scan_the(int_val,false); selector←new_string;
if cur_val≥0 then print_int(cur_val)@+else print_roman_int(-cur_val);
selector←old_setting; num_toks←str_toks;

@ Now we can't postpone the difficulties any longer; we must bravely tackle
|scan_toks|. This function returns a pointer to the tail of a new token
list, and it also makes |cur_val| point to the reference count at the head
of that list.

There are two boolean parameters, |macro_def| and |xpand|. If |macro_def|
is true, the goal is to create the token list for a macro definition;
otherwise the goal is to create the token list for some other \TeX\
primitive: \.{\\mark}, \.{\\output}, \.{\\everypar}, \.{\\lowercase},
\.{\\uppercase}, \.{\\message}, \.{\\errmessage}, \.{\\send}, or
\.{\\xsend}. In the latter cases a left brace must be scanned next; this
left brace will not be part of the token list, nor will the matching right
brace that comes at the end. If |xpand| is false, the token list will
simply be copied from the input using |get_token|. Otherwise all macros
and occurrences of `\.{\\the}' and `\.{\\minus}' and `\.{\\number}' will
be expanded, unless they are preceded by some control sequence like
`\.{\\def}' whose command code is |def|. If both |macro_def| and |xpand|
are true, the expansion applies only to the macro body (i.e., to the
material following the first |left_brace| character).

The value of |cs_ptr| when |scan_toks| begins should be the |eqtb|
address of the control sequence to display in ``runaway'' error

After the closing right brace, this routine will remove a space. % (rhymes)

@p function scan_toks(@!macro_def,@!xpand:boolean):pointer;
label found,done,done1,done2;
var r: pointer; {reference count location}
@!t:halfword; {token representing the highest parameter number}
@!s:halfword; {saved token}
@!p:pointer; {tail of the token list being built}
@!q:pointer; {new node being added to the token list via |store_new_token|}
@!unbalance:halfword; {number of unmatched left braces}
@!hash_brace:halfword; {possible `\.{\#\{}' token}
begin scanner_status←defining; warning_index←cs_ptr; r←get_avail; info(r)←0;
p←r; hash_brace←0;
if macro_def then @<Scan and build the parameter part of the macro definition@>
else scan_left_brace; {remove the compulsory left brace}
@<Scan and build the body of the token list; |goto found| when finished@>;
found: scanner_status←normal; @<Scan an optional space@>;
if hash_brace≠0 then store_new_token(hash_brace);
cur_val←r; scan_toks←p;

@ @<Scan and build the parameter part...@>=
begin t←zero_token; {this represents |"0"|}
loop@+	begin get_token; {set |cur_cmd|, |cur_chr|, |cur_tok|}
	if cur_cmd≤right_brace then goto done1;
	if cur_cmd=mac_param then
		@<If the next character is a parameter number, make |cur_tok|
			a |match| token; but if it is a left brace, store
			`|left_brace|, |end_match|' and |goto done|@>;
done1: store_new_token(end_match_token);
if cur_cmd=right_brace then
	@<Express shock at the missing left brace; |goto found|@>;
done: end

@ @<Express shock...@>=
begin print_nl("! Missing { inserted"); incr(align_state);
@.Missing {\{} inserted@>
help2("Where was the left brace? You said something like `\def\a}',")@/
	("which I'm going to interpret as `\def\a{}'."); error; goto found;

@ @<If the next character is a parameter number...@>=
begin s←match_token+cur_chr; get_token;
if cur_cmd=left_brace then
	begin store_new_token(cur_tok); store_new_token(end_match_token);
	goto done;
if t=zero_token+9 then
	begin print_nl("! You already have nine parameters");
@.You already have nine...@>
	help1("I'm going to ignore the # sign you just used."); error;
else	begin incr(t);
	if cur_tok≠t then
		begin print_nl("! Parameters must be numbered consecutively");
		help2("I've inserted the digit you should have used after the #.")@/
			("Type `1' to delete what you did use."); back_error;

@ @<Scan and build the body of the token list; |goto found| when finished@>=
loop@+	begin if xpand ∧(cur_cmd≠def) then
		@<Expand the next part of the input@>
	else get_token;
	if cur_cmd≤right_brace then
		if cur_cmd<right_brace then incr(unbalance)
		else	begin decr(unbalance);
			if unbalance=0 then goto found;
	else if cur_cmd=mac_param then @<Look for parameter number or \.{\#\#}@>;

@ @<Expand the next part of the input@>=
begin loop begin get_nc_token;
	if cur_cmd=the then q←the_toks(cur_chr>0)
	else if cur_cmd=number then q←num_toks
	else goto done2;
	link(p)←link(temp_head); p←q;

@ @<Look for parameter number...@>=
begin s←cur_tok; get_token;
if cur_cmd≠mac_param then
	if (cur_chr<"1")∨(cur_chr>t-zero_token+"0")∨@|(cur_cmd≠other_char) then
		begin print_nl("! Illegal parameter number in definition of ");
@.Illegal parameter number...@>
		help3("You meant to type ## instead of #, right?")@/
		("Or maybe a } was forgotten somewhere earlier, and things")@/
		("are all screwed up? I'm going to assume you meant ##.");
		back_error; cur_tok←s;
	else cur_tok←out_param_token-"0"+cur_chr;
@* \[28] File names.
Besides the fact that different operating systems treat files in different ways,
we must cope with the fact that completely different naming conventions
are used. The following programs show what is required for one particular
operating system; similar routines for other systems are not difficult 
to devise.
@^system dependencies@>

\TeX\ assumes that a file name has three parts: the name proper, its
``exten\-sion'', and a ``file area'' where it is found in an external file
system.  The extension of an input file or a send file is assumed to be
`\.{.tex}' unless otherwise specified; it is `\.{.err}' on the error
transcript file that records each run of \TeX; it is `\.{.tfm}' on the font
metric files that describe characters in the fonts \TeX\ uses; it is
`\.{.dvi}' on the output files that specify typesetting information; and it
is `\.{.fmt}' on the format files written by \.{INITEX} to initialize \TeX.
The file area can be arbitrary on input files, but it is usually the
user's current area when a file is output.  If an input file cannot be
found on the specified area, \TeX\ will look for it on a special system
area; this special area is intended for commonly used input files like

Simple uses of \TeX\ refer only to file names that have no explicit extension
or area. For example, one usually says `\.{\\input paper}' or
`\.{\\font100=helvetica}' instead of `\.{\\input}' or
`\.{\\font100=<csd.knuth>test}'. Simple file names are best, because they
make the \TeX\ source files portable; whenever a file name consists entirely
of letters and digits, it should be treated in the same way by all
implementations of \TeX. However, users need the ability to refer to other
files in their environment, especially when responding to error messages
concerning unopenable files; therefore we want to let them use the syntax
that appears in their favorite operating system.

@ In order to isolate the system-dependent aspects of file names, the
@^system dependencies@>
system-independent parts of \TeX\ make use of three system-dependent
procedures that are called |begin_name|, |more_name|, and |end_name|. In
essence, if the user-specified characters of the file name are $c↓1\ldotsm c↓n$,
the system-independent driver program does the operations
These three procedures communicate with each other via global variables.
After\-wards the file name will appear in the string pool as three strings
called |cur_name|\penalty10000\hskip-.05em,
|cur_area|, and |cur_ext|; the latter two are null (i.e.,
|""|), unless they were explicitly specified by the user.

Actually the situation is slightly more complicated, because \TeX\ needs
to know when the file name ends. The |more_name| routine is a function
(with side effects) that returns |true| on the calls |more_name|$(c↓1)$,
$\ldotss$, |more_name|$(c↓{n-1})$. The final call |more_name|$(c↓n)$
returns |false|; or, it returns |true| and the token following $c↓n$ is
something like `\.{\\hbox}' (i.e., not a character). In other words,
|more_name| is supposed to return |true| unless it is sure that the
file name has been completely scanned; and |end_name| is supposed to be able
to finish the assembly of |cur_name|, |cur_area|, and |cur_ext| regardless of
whether $|more_name|(c↓n)$ returned |true| or |false|.

@!cur_name:str_number; {name of file just scanned}
@!cur_area:str_number; {file area just scanned, or \.{""}}
@!cur_ext:str_number; {file extension just scanned, or \.{""}}

@ The file names we shall deal with have the following structure:
If the name contains `\.>' or `\.:', the file area consists of all characters
up to and including the final such character; otherwise the file area is null.
If the remaining file name contains `\..', the file extension consists of all
such characters from the first `\..' to the end, otherwise the file extension
is null.
@^system dependencies@>

We can scan such file names easily by using two global variables that keep track
of the occurrences of area and extension delimiters:

@!area_delimiter:pool_pointer; {the most recent `\.>' or `\.:', if any}
@!ext_delimiter:pool_pointer; {the relevant `\..', if any}

@ Input files that can't be found in the user's area may appear in a standard
system area called |TEX_area|. Font files whose areas are not given explicitly
are assumed to appear in a standard system area called |TEX_font_area|.
These system area names will, of course, vary from place to place.
@^system dependencies@>

@d TEX_area=="<TeX>"
@d TEX_font_area=="<TeX.fonts>"

@ Here now is the first of the system-dependent routines for file name scanning.
@^system dependencies@>

@p procedure begin_name;
begin area_delimiter←0; ext_delimiter←0;

@ And here's the second.
@^system dependencies@>

@p function more_name(@!c:ascii_code):boolean;
begin if c=" " then more_name←false
else	begin if (c=">")∨(c=":") then
		begin area_delimiter←pool_ptr; ext_delimiter←0;
	else if (c=".")∧(ext_delimiter=0) then ext_delimiter←pool_ptr;
	str_room(1); append_char(c); {contribute |c| to the current string}

@ The third.
@^system dependencies@>

@p procedure end_name;
begin if str_ptr+3>max_strings then overflow("number of strings",max_strings);
if area_delimiter=0 then cur_area←""
else	begin cur_area←str_ptr; incr(str_ptr);
if ext_delimiter=0 then
	begin cur_ext←""; cur_name←make_string;
else	begin cur_name←str_ptr; incr(str_ptr);
	str_start[str_ptr]←ext_delimiter; cur_ext←make_string;

@ Conversely, here is a routine that takes three strings and prints a file
name that might have produced them. (The routine is system dependent, because
some operating systems put the file area last instead of first.)
@^system dependencies@>

@<Basic printing...@>=
procedure print_file_name(@!n,@!a,@!e:integer);
begin print(a); print(n); print(e);

@ Another system-dependent routine is needed to convert three \TeX\ strings
into the |name_of_file| value that is used to open files.
@^system dependencies@>

@d append_to_name(#)==begin c←#; incr(k);
	if (c≥"a")∧(c≤"z") then c←c-@'40; {convert to upper case}

@p procedure pack_file_name(@!n,@!a,@!e:str_number);
var k:integer; {number of positions filled in |name_of_file|}
@!c: ascii_code; {character being packed}
@!j:pool_pointer; {index into |string_pool|}
begin if length(a)+length(n)+length(e)>file_name_size then
	overflow("file name size",file_name_size);
for j←str_start[a] to str_start[a+1]-1 do append_to_name(str_pool[j]);
for j←str_start[n] to str_start[n+1]-1 do append_to_name(str_pool[j]);
for j←str_start[e] to str_start[e+1]-1 do append_to_name(str_pool[j]);
for k←name_length+1 to file_name_size do name_of_file[k]←' ';

@ A messier routine is also needed, since format file names must be scanned
before \TeX's string mechanism has been initialized. We shall use the
global variable |TEX_format_default| to supply the text for default system areas
and extensions related to format files.
@^system dependencies@>

@d format_default_length=22 {length of the |TEX_format_default| string}
@d format_area_length=13 {length of its area part}

@!TEX_format_default:packed array[1..format_default_length] of char;

@ @<Set init...@>=

@ @<Check the ``constant'' values for consistency@>=
if format_default_length>file_name_size then bad←31;

@ Here is the messy routine that was just mentioned. It sets |name_of_file|
from the first |n| characters of |TEX_format_default|, followed by
|buffer[a..b]|, followed by the last four characters of |format_default|.

We dare not give error messages here, since \TeX\ calls this routine before
the |error| routine is ready to roll. Instead, we simply drop excess characters,
since the error will be detected in another way when a strange file name
isn't found.
@^system dependencies@>

@p procedure pack_buffered_name(@!n:small_number;@!a,@!b:integer);
var k:integer; {number of positions filled in |name_of_file|}
@!c: ascii_code; {character being packed}
@!j:integer; {index into |buffer| or |TEX_format_default|}
begin if n+b-a+5>file_name_size then b←a+file_name_size-n-5;
for j←1 to n do append_to_name(xord[TEX_format_default[j]]);
for j←a to b do append_to_name(buffer[j]);
for j←format_default_length-3 to format_default_length do
for k←name_length+1 to file_name_size do name_of_file[k]←' ';

@ Here is the only place we use |pack_buffered_name|. This part of the program
becomes active when a ``virgin'' \TeX\ is trying to get going, just after
the preliminary initialization. The buffer contains the first line of input
in |buffer[loc..(last-1)]|, where |loc<last| and |buffer[loc]≠" "|.

@<Declare the function called |open_fmt_file|@>=
function open_fmt_file:boolean;
label found,exit;
var j:0..buf_size; {the first space after the file name}
begin if buffer[loc]≠"\" then
	begin j←loc+1; buffer[last]←" ";
	while buffer[j]≠" " do incr(j);
	pack_buffered_name(0,loc,j-1); {try first without the system file area}
	if w_open_in(fmt_file) then
		begin loc←j; goto found;
	{now try the system format file area}
	if w_open_in(fmt_file) then
		begin loc←j; goto found;
	{now pull out all the stops: try for the system \.{BASIC} file}
if ¬ w_open_in(fmt_file) then
	begin write_ln(term_out,'I can''t find the BASIC format file!');
	open_fmt_file←false; return;

@ Operating systems often make it possible to determine the exact name (and
possible version number) of a file that has been opened. The following routine,
which simply makes a \TeX\ string from the value of |name_of_file|, should
ideally be changed to deduce the full name of file@@|f|, if it is
possible to do this in a \PASCAL\ program.
@^system dependencies@>

@p function make_name_string:str_number;
var k:1..file_name_size; {index into |name_of_file|}
begin str_room(name_length);
for k←1 to name_length do append_char(xord[name_of_file[k]]);
function a_make_name_string(var f:alpha_file):str_number;
begin a_make_name_string←make_name_string;
function b_make_name_string(var f:byte_file):str_number;
begin b_make_name_string←make_name_string;
function w_make_name_string(var f:word_file):str_number;
begin w_make_name_string←make_name_string;

@ Similarly, the ideal |write_name_string| would write the full
external name of |fmt_file|, assuming that |fmt_file| has been opened.
The procedure shown here merely copies |name_of_file|. Output in this
case goes to the user's terminal.

@p procedure write_name_string;
var k:1..file_name_size; {index into |name_of_file|}
begin for k←1 to name_length do write(term_out,name_of_file[k]);

@ Now let's consider the routines by which \TeX\ deals with file names
in a (hopefully) system-independent manner.
First comes a procedure that looks for a file name in the input by
calling |get_nc_token| for the information.

@p procedure scan_file_name;
label done;
begin begin_name;
@<Get the next non-blank non-call...@>;
loop@+begin if (cur_cmd>other_char)∨(cur_chr>127) then {not a character}
		begin back_input; goto done;
	if ¬ more_name(cur_chr) then goto done;
done: end_name;

@ The global variable |job_name| contains the file name that was first
\.{\\input} by the user. This name is extended by `\.{err}' and `\.{dvi}'
and `\.{fmt}' in order to make the names of \TeX's output files.

@!job_name:str_number; {principal file name}

@ Initially |job_name=0|; it becomes nonzero as soon as the true name is known.
We have |job_name=0| if and only the `\.{err}' file has not been opened.

@<Initialize the output...@>=job_name←0;

@ Here is a routine that manufactures the output file names, assuming that
|job_name≠0|. It ignores and changes the current settings of |cur_area|
and |cur_ext|.

@d pack_cur_name==pack_file_name(cur_name,cur_area,cur_ext)

@p procedure pack_job_name(@!s:str_number); {|s = ".err"|, |".dvi"|, or
begin cur_area←""; cur_ext←s;
cur_name←job_name; pack_cur_name;

@ If some trouble arises when \TeX\ tries to open a file, the following
routine calls upon the user to supply another file name. Parameter@@|s|
is used in the error message to identify the type of file; parameter@@|e|
is the default extension if none is given. Upon exit from the routine,
variables |cur_name|, |cur_area|, |cur_ext|, and |name_of_file| are
ready for another attempt at file opening.

@p procedure prompt_file_name(@!s,@!e:str_number);
label done;
var k:0..buf_size; {index into |buffer|}
begin if s="input file name" then print_nl("! I can't find file `")
@.I can't find file x@>
else print_nl("! I can't write on file `");
@.I can't write on file x@>
print_file_name(cur_name,cur_area,cur_ext); print("'.");
if e=".tex" then show_context;
print_nl("Please type another "); print(s);
if interaction<scroll_mode then
	fatal_error("*** (job aborted, file error in nonstop mode)");
clear_terminal; prompt_input(": "); @<Scan file name in the buffer@>;
if cur_ext="" then cur_ext←e;

@ @<Scan file name in the buffer@>=
begin begin_name; k←first;
while (buffer[k]=" ")∧(k<last) do incr(k);
loop@+	begin if k=last then goto done;
	if ¬ more_name(buffer[k]) then goto done;

@ Here's an example of how these conventions are used. We shall use the
macro |ensure_dvi_open| when it is time to ship out a box of stuff.

@d ensure_dvi_open==if output_file_name=0 then
	begin if job_name=0 then open_err_file;
	while ¬ b_open_out(dvi_file) do
		prompt_file_name("file name for output",".dvi");

@!dvi_file: byte_file; {the device-independent output goes here}
@!output_file_name: str_number; {full name of the output file}

@ @<Initialize the output...@>=output_file_name←0;

@ The |open_err_file| routine is used to open the transcript file and to help
it catch up to what has previously been printed on the terminal.

@p procedure open_err_file;
var old_setting:0..max_selector; {previous |selector| setting}
@!k:0..buf_size; {index into |months| and |buffer|}
@!months:packed array [1..36] of char; {abbreviations of month names}
begin old_setting←selector; print_nl("");
if job_name=0 then job_name←"texput";
while ¬ a_open_out(err_file) do
	prompt_file_name("`err' file name for the transcript",".err");
@<Print the banner line, including the date and time@>;
input_stack[input_ptr]←cur_input; {make sure bottom level is in memory}
for k←1 to input_stack[0].limit_field do print(buffer[k-1]);
print_ln; {now the transcript file contains the first line of input}

@ @<Print the banner...@>=
begin write(err_file,banner);
print(format_ident); print("  ");
print_int(day); print_char(" ");
for k←3*month-2 to 3*month do write(err_file,months[k]);
print_char(" "); print_int(year); print_char(" ");
print_int(time div 60); print_char(":");
print_char("0"+((time mod 60) div 10));
print_char("0"+(time mod 10)); print_ln;

@ Let's turn now to the procedure that is used to initiate file reading
when an `\.{\\input}' command is being processed.

@p procedure start_input; {\TeX\ will \.{\\input} something}
label done;
begin scan_file_name; {set |cur_name| to desired file name}
if cur_ext="" then cur_ext←".tex";
loop@+	begin begin_file_reading; {set up |cur_file| and new level of input}
	if a_open_in(cur_file) then goto done;
	if a_open_in(cur_file) then goto done;
	end_file_reading; {remove the level that didn't work}
	prompt_file_name("input file name",".tex");
done: name←a_make_name_string(cur_file); state←new_line;
if job_name=0 then
	begin job_name←cur_name; open_err_file;
	end; {|open_err_file| doesn't |show_context|, so |limit|
		and |loc| needn't be set to meaningful values yet}
print(" ("); print(name); update_terminal;
@<Read the first line of the new file@>;

@ Here we have to remember that the |input_ln| routine
starts with a |get|, so we have to read the first character of the file
before it is lost. 

@<Read the first line...@>=
begin if eoln(cur_file) then last←start
else	begin buffer[start]←xord[cur_file^]; first←start+1;
	if ¬ input_ln(cur_file) then confusion("input");
@:confusion input}{\quad input@>
buffer[limit]←carriage_return; first←limit+1; loc←start; line←1;
@* \[29] Font metric data.
\TeX\ gets its knowledge about fonts from font metric files, also called
\.{TFM} files; the `\.T' in `\.{TFM} stands for \TeX,
but other programs know about them too.
@:TFM files}{\.{TFM} files@>
@^font metric files@>

The information in a \.{TFM} file appears in a sequence of 8-bit bytes.
Since the number of bytes is always a multiple of 4, we could
also regard the file as a sequence of 32-bit words, but \TeX\ uses the
byte interpretation. The format of \.{TFM} files was designed by
Lyle Ramshaw in 1980. The intent is to convey a lot of different kinds
@^Ramshaw, Lyle Harold@>
of information in a compact but useful form.


@ The first 24 bytes (6 words) of a \.{TFM} file contain twelve 16-bit
integers that give the lengths of the various subsequent portions
of the file. These twelve integers are, in order:
|lf|&length of the entire file, in words;\cr
|lh|&length of the header data, in words;\cr
|bc|&smallest character code in the font;\cr
|ec|&largest character code in the font;\cr
|nw|&number of words in the width table;\cr
|nh|&number of words in the height table;\cr
|nd|&number of words in the depth table;\cr
|ni|&number of words in the italic correction table;\cr
|nl|&number of words in the lig/kern table;\cr
|nk|&number of words in the kern table;\cr
|ne|&number of words in the extensible character table;\cr
|np|&number of font parameter words.\cr}}$$
They are all nonnegative and less than $2^{15}$. We must have |bc-1≤ec≤255|,
Note that a font may contain as many as 256 characters (if |bc=0| and |ec=255|),
and as few as 0 characters (if |bc=ec+1|). An exception to these rules is
planned for oriental fonts, which will be identified by the condition |ec=256|;
such fonts are not allowed except in extensions to \TeX82.
@^oriental characters@>@^Chinese characters@>@^Japanese characters@>

Incidentally, when two or more 8-bit bytes are combined to form an integer of
16 or more bits, the most significant bytes appear first in the file.
This is called BigEndian order.
@!@^BigEndian order@>

@ The rest of the \.{TFM} file may be regarded as a sequence of ten data
arrays having the informal specification
$$\def\arr$[#1]#2${\&{array} $[#1]$ \&{of} #2}
The most important data type used here is a |@!fix_word|, which is
a 32-bit representation of a binary fraction. A |fix_word| is a signed
quantity, with the two's complement of the entire word used to represent
negation. Of the 32 bits in a |fix_word|, exactly 12 are to the left of the
binary point; thus, the largest |fix_word| value is $2048-2^{-20}$, and
the smallest is $-2048$. We will see below, however, that all but two of
the |fix_word| values must lie between $-16$ and $+16$.

@ The first data array is a block of header information, which contains
general facts about the font. The header must contain at least two words,
|header[0]| and |header[1]|, whose meaning is explained below.
Additional header information of use to other software routines might
also be included, but \TeX82 does not need to know about such details.
For example, 16 more words of header information are in use at the Xerox
Palo Alto Research Center; the first ten specify the character coding
scheme used (e.g., `\.{XEROX TEXT}' or `\.{TeX MATHSY}'), the next five
give the font family name (e.g., `\.{HELVETICA}' or `\.{CMSY}'), and the
last gives the ``face byte.'' The program that converts \.{DVI} files
to Xerox printing format gets this information by looking at the \.{TFM}
file, which it needs to read anyway because of other information that
is not explicitly repeated in \.{DVI} format.

\yskip\hang|header[0]| is a 32-bit check sum that \TeX\ will copy into
the \.{DVI} output file. Later on when the \.{DVI} file is printed,
possibly on another computer, the actual font that gets used is supposed
to have a check sum that agrees with the one in the \.{TFM} file used by
\TeX. In this way, users will be warned about potential incompatibilities.
(However, if the check sum is zero in either the font file or the \.{TFM}
file, no check is made.)  The actual relation between this check sum and
the rest of the \.{TFM} file is not important; the check sum is simply an
identification number with the property that incompatible fonts almost
always have distinct check sums.
@^check sum@>

\yskip\hang|header[1]| is a |fix_word| containing the design size of
the font, in units of points. This number must be at least 1.0; it is
fairly arbitrary, but usually the design size is 10.0 for a ``10 point''
font, i.e., a font that was designed to look best at a 10-point size,
whatever that really means. When a \TeX\ user asks for a font
`\.{at} $\delta$ \.{pt}', the effect is to override the design size
and replace it by $\delta$, and to multiply the $x$ and@@$y$ coordinates
of the points in the font image by a factor of $\delta$ divided by the
design size.  {\sl All other dimensions in the\/\ \.{TFM} file are
|fix_word| numbers in design-size units.} Thus, for example, the value
of |param[6]|, one \.{em} or \.{\\quad}, is often the |fix_word| value
$2^{20}=1.0$, since many fonts have a design size equal to one em.
The other dimensions must be less than 16 design-size units in absolute
value; thus, |header[1]| and |param[1]| are the only |fix_word|
entries in the whole \.{TFM} file whose first byte might be something
besides 0 or 255.

@ Next comes the |char_info| array, which contains one |char_info_word|
per character. Each |@!char_info_word| contains six fields packed into
four bytes as follows.

\yskip\hang first byte: |@!width_index| (8 bits)\par
\hang second byte: |@!height_index| (4 bits) times 16, plus |@!depth_index|
\hang third byte: |@!italic_index| (6 bits) times 4, plus |@!tag|
\hang fourth byte: |@!remainder| (8 bits)\par
The actual width of a character is \\{width}|[width_index]|, in design-size
units; this is a device for compressing information, since many characters
have the same width. Since it is quite common for many characters
to have the same height, depth, or italic correction, the \.{TFM} format
imposes a limit of 16 different heights, 16 different depths, and
64 different italic corrections.

@!@^italic correction@>
The italic correction of a character has three different uses.
(a)@@In ordinary text, the italic correction is added to the width only if
the \TeX\ user specifies `\.{\\/}' after the character.
(b)@@In math formulas, the italic correction is always added to the width,
unless the character has a subscript but no superscript.
(c)@@In math formulas for which the character is a mathop in display style,
a nonzero italic correction also affects the ``limit switch.'' For example,
an integral sign $\int$ has a nonzero italic correction, but a summation
sign $\sum$ does not.

Incidentally, the relation $\\{width}[0]=\\{height}[0]=\\{depth}[0]=
\\{italic}[0]=0$ should always hold, so that an index of zero implies a
value of zero.  The |width_index| should never be zero unless the
character does not exist in the font, since a character is valid if and
only if it lies between |bc| and |ec| and has a nonzero |width_index|.

@ The |tag| field in a |char_info_word| has four values that explain how to
interpret the |remainder| field.

\yskip\hang|tag=0| (|no_tag|) means that |remainder| is unused.\par
\hang|tag=1| (|lig_tag|) means that this character has a ligature/kerning
program starting at |lig_kern[remainder]|.\par
\hang|tag=2| (|list_tag|) means that this character is part of a chain of
characters of ascending sizes, and not the largest in the chain.  The
|remainder| field gives the character code of the next larger character.\par
\hang|tag=3| (|ext_tag|) means that this character code represents an
extensible character, i.e., a character that is built up of smaller pieces
so that it can be made arbitrarily large. The pieces are specified in
Characters with |tag=2| and |tag=3| are treated as characters with |tag=0|
unless they are used in special circumstances in math formulas. For example,
the \.{\\sum} operation looks for a |list_tag|, and the \.{\\left}
operation looks for both |list_tag| and |ext_tag|.

@d no_tag=0 {vanilla character}
@d lig_tag=1 {character has a ligature/kerning program}
@d list_tag=2 {character has a successor in a charlist}
@d ext_tag=3 {character is extensible}

@ The |lig_kern| array contains instructions in a simple programming language
that explains what to do for special letter pairs. Each word in this array is a
|@!lig_kern_command| of four bytes.

\yskip\hang first byte: |stop_bit|, indicates the final program step
	if the byte is 128 or more.\par
\hang second byte: |next_char|, ``if |next_char| follows the current character,
	then perform the operation and stop, otherwise continue.''\par
\hang third byte: |op_bit|, indicates a ligature step if less than@@128,
	a kern step otherwise.\par
\hang fourth byte: |remainder|.\par
In a ligature step the current character and |next_char| are replaced by
the single character whose code is |remainder|. In a kern step, an
additional space equal to |@!kern[remainder]| is inserted between the
current character and |next_char|. (The value of |kern[remainder]| is
often negative, so that the characters are brought closer together
by kerning, but it might be positive.)

@d stop_flag=128+min_quarterword
	{value indicating `\.{STOP}' in a lig/kern program}
@d kern_flag=128+min_quarterword {op code for a kern step}
@d stop_bit(#)==#.b0
@d next_char(#)==#.b1
@d op_bit(#)==#.b2
@d rem_byte(#)==#.b3

@ Extensible characters are specified by an |@!extensible_recipe|, which
consists of four bytes called |@!top|, |@!mid|, |@!bot|, and |@!rep| (in this
order). These bytes are the character codes of individual pieces used to
build up a large symbol.  If |top|, |mid|, or |bot| are zero, they are not
present in the built-up result. For example, an extensible vertical line is
like an extensible bracket, except that the top and bottom pieces are missing.

Let $T$, $M$, $B$, and $R$ denote the respective pieces, or an empty box
if the piece isn't present. Then the extensible characters have the form
$TR^kMR^kB$ from top to bottom, for some |k≥0|, unless $M$ is absent;
in the latter case we can have $TR^kB$ for both even and odd values of@@|k|.
The width of the extensible character is the width of $R$; and the
height-plus-depth is the sum of the individual height-plus-depths of the
components used, since the pieces are butted together in a vertical list.

@d ext_top(#)==#.b0 {|top| piece in a recipe}
@d ext_mid(#)==#.b1 {|mid| piece in a recipe}
@d ext_bot(#)==#.b2 {|bot| piece in a recipe}
@d ext_rep(#)==#.b3 {|rep| piece in a recipe}

@ The final portion of a \.{TFM} file is the |param| array, which is another
sequence of |fix_word| values.

\yskip\hang|param[1]=slant| is the amount of italic slant, which is used
to help position accents. For example, |slant=.25| means that when you go
up one unit, you also go .25 units to the right. The |slant| is a pure
number; it is the only |fix_word| other than the design size itself that is
not scaled by the design size.

\hang|param[2]=space| is the normal spacing between words in text.
Note that character @'40 in the font need not have anything to do with
blank spaces.

\hang|param[3]=space_stretch| is the amount of glue stretching between words.

\hang|param[4]=space_shrink| is the amount of glue shrinking between words.

\hang|param[5]=x_height| is the size of one ex in the font; it is also
the height of letters for which accents don't have to be raised or lowered.

\hang|param[6]=quad| is the size of one em in the font.

\hang|param[7]=extra_space| is the amount added to |param[2]| at the
ends of sentences.

If fewer than seven parameters are present, \TeX\ sets the missing parameters
to zero. Fonts used for math symbols are required to have
additional parameter information, which is explained later.

@d slant_code=1
@d space_code=2
@d space_stretch_code=3
@d space_shrink_code=4
@d x_height_code=5
@d quad_code=6
@d extra_space_code=7

@ So that is what \.{TFM} files hold. Since \TeX\ has to absorb such information
about lots of fonts, it stores most of the data in a large array called
|font_info|. Each item of |font_info| is a |memory_word|; the |fix_word|
data gets converted into |scaled| entries, while everything else goes into
words of type |four_quarters|.

When the user defines \.{\\font 100}, say, \TeX\ assigns an
internal number to the user's font 100. For example, if this internal number
is 13, we will have |font_code[13]=100| and |font_number[100]=13|.

@!user_font_code=0..bad_font_code; {font identification in \TeX\ input}
@!internal_font_number=font_base..font_max; {|font| in a |char_node|}

@ Here now is the (rather formidable) array of font arrays.

@!font_info:array[0..font_mem_size] of memory_word;
	{the big collection of font data}
@!fmem_ptr:0..font_mem_size; {first unused word of |font_info|}
@!font_ptr:internal_font_number; {largest internal font number in use}
@!font_code:array[internal_font_number] of user_font_code; {the first user
	font code corresponding to an internal font number}
@!font_number:array[user_font_code] of internal_font_number; {the internal
	font number corresponding to a user's font code}
@!font_check:array[internal_font_number] of four_quarters; {check sum}
@!font_size:array[internal_font_number] of scaled; {``at'' size}
@!font_scaled:array[internal_font_number] of boolean; {is the ``at'' size
	different from the design size?}
@!font_params:array[internal_font_number] of halfword; {how many font
	parameters are present}
@!font_name:array[internal_font_number] of str_number; {name of the font}
@!font_area:array[internal_font_number] of str_number; {area of the font}
@!font_bc:array[internal_font_number] of eight_bits; {beginning (smallest)
	character code}
@!font_ec:array[internal_font_number] of eight_bits; {ending (largest)
	character code}
@!font_glue:array[internal_font_number] of pointer; {glue specification
	for interword space, |null| if not allocated}
@!font_used:array[internal_font_number] of boolean; {has a character from
	this font actually appeared in the output?}

@ Besides the arrays just enumerated, we have directory arrays that make it
easy to get at the individual entries in |font_info|. For example, the
|char_info| data for character |c| in font |f| will be in
|font_info[char_base[f]+c].qqqq|; and if |w| is the |width_index|
part of this word (the |b0| field), |font_info[width_base[f]+w].sc|
is the width of the character. (These formulas assume that |min_quarterword|
has already been added to |c| and to |w|, since \TeX\ stores its
quarterwords that way.)

@!char_base:array[internal_font_number] of integer;
	{base addresses for |char_info|}
@!width_base:array[internal_font_number] of integer;
	{base addresses for widths}
@!height_base:array[internal_font_number] of integer;
	{base addresses for heights}
@!depth_base:array[internal_font_number] of integer;
	{base addresses for depths}
@!italic_base:array[internal_font_number] of integer;
	{base addresses for italic corrections}
@!lig_kern_base:array[internal_font_number] of integer;
	{base addresses for ligature/kerning programs}
@!kern_base:array[internal_font_number] of integer;
	{base addresses for kerns}
@!exten_base:array[internal_font_number] of integer;
	{base addresses for extensible recipes}
@!param_base:array[internal_font_number] of integer;
	{base addresses for font parameters}

@ In order to cope more easily with erroneous situations, the undefined font
is assumed to have seven valid parameters, all equal to zero.

@<Set init...@>=
for k←0 to bad_font_code do font_number[k]←undefined_font;
font_ptr←font_base; fmem_ptr←7; font_code[undefined_font]←bad_font_code;
font_bc[undefined_font]←1; font_ec[undefined_font]←0;
font_glue[undefined_font]←zero_glue; font_params[undefined_font]←7;
for k←0 to 6 do font_info[k].sc←0;
for k←font_base to font_max do font_used[k]←false;

@ Of course we want to define macros that suppress the detail of how font
information is actually packed, so that we don't have to write things like
too often. The \.{WEB} definitions here make |char_info(f)(c)| the
|four_quarters| word of font information corresponding to character
|c| of font |f|. If |q| is such a word, |char_width(f)(q)| will be
the character's width, so that the long formula above is at least
abbreviated to
Usually, of course, we will fetch |q| first and look at several of its
fields at the same time.

The italic correction of a character will be denoted by
|char_italic(f)(q)|, so it is analogous to |char_width|.  But we will get
at the height and depth in a slightly different way, since we usually want
to compute both height and depth if we want either one.  The value of
|height_depth(q)| will be the 8-bit quantity
$$b=|height_index|\times16+|depth_index|,$$ and if |b| is such a byte we
will write |char_height(f)(b)| and |char_depth(f)(b)| for the height and
depth of the character |c| for which |q=char_info(f)(c)|. Got that?

The tag field will be called |char_tag(q)|; the remainder byte will be
called |rem_byte(q)|, using a macro that we have already defined above.

Access to a character's |height|, |depth|, and |tag| fields is part of
\TeX's inner loop, so we want these macros to produce code that is as fast
as possible under the circumstances.
@^inner loop@>

@d char_info_end(#)==#].qqqq
@d char_info(#)==font_info[char_base[#]+char_info_end
@d char_width_end(#)==#.b0].sc
@d char_width(#)==font_info[width_base[#]+char_width_end
@d char_exists(#)==(#.b0>min_quarterword)
@d char_italic_end(#)==qo(#.b2) div 4].sc
@d char_italic(#)==font_info[italic_base[#]+char_italic_end
@d height_depth(#)==qo(#.b1)
@d char_height_end(#)==# div 16].sc
@d char_height(#)==font_info[height_base[#]+char_height_end
@d char_depth_end(#)==# mod 16].sc
@d char_depth(#)==font_info[depth_base[#]+char_depth_end
@d char_tag(#)==(qo(#.b2) mod 4)

@ The global variable |null_character| is set up to be a word of
|char_info| for a character that doesn't exist. Such a word provides a
convenient way to deal with erroneous situations.

@!null_character:four_quarters; {nonexistent character information}

@ @<Set init...@>=
null_character.b0←min_quarterword; null_character.b1←min_quarterword;
null_character.b2←min_quarterword; null_character.b3←min_quarterword;

@ Here are some macros that help process ligatures and kerns.
We write |char_kern(f)(j)| to find the amount of kerning specified by
kerning command |j|.

@d lig_kern_start(#)==lig_kern_base[#]+rem_byte {beginning of lig/kern program}
@d char_kern_end(#)==rem_byte(#)].sc
@d char_kern(#)==font_info[kern_base[#]+char_kern_end

@ Font parameters are referred to as |slant(f)|, |space(f)|, etc.

@d param_end(#)==param_base[#]].sc
@d param(#)==font_info[#+param_end
@d slant==param(slant_code) {slant to the right, per unit distance upward}
@d space==param(space_code) {normal space between words}
@d space_stretch==param(space_stretch_code) {stretch between words}
@d space_shrink==param(space_shrink_code) {shrink between words}
@d x_height==param(x_height_code) {one ex}
@d quad==param(quad_code) {one em}
@d extra_space==param(extra_space_code) {additional space at end of sentence}

@<The em width for |cur_font|@>=quad(cur_font)

@ @<The x-height for |cur_font|@>=x_height(cur_font)

@ \TeX\ checks the information of a \.{TFM} file for validity as the
file is being read in, so that no further checks will be needed when
typesetting is going on. The somewhat tedious subroutine that does this
is called |read_font_info|. It has four parameters: the user font code@@|u|,
the file name and area strings |nom| and |aire|, and the ``at'' size@@|s|;
if |s|@@is nonzero, |read_font_info| substitutes |s| for the design size.
In the latter case, |s| must be positive and less than 2048 (i.e.,
it is less than $2^{27}$ when considered as an integer).

The subroutine opens and closes a global file variable called |tfm_file|.
If an error is detected, an error message is issued and no font
information is stored.

@d bad_tfm=11 {label for |read_font_info|}
@d abort==goto bad_tfm {do this when the \.{TFM} data is wrong}

@p procedure read_font_info(@!u:user_font_code;
	@!nom,@!aire:str_number; @!s:scaled); {input a \.{TFM} file}
label done,bad_tfm,not_found;
var k:0..font_mem_size; {index into |font_info|}
@!file_opened:boolean; {was |tfm_file| successfully opened?}
	{sizes of subfiles}
@!f:internal_font_number; {the new font's number}
@!a,@!b,@!c,@!d:eight_bits; {byte variables}
@!qw:four_quarters;@!sw:scaled; {accumulators}
@!z:scaled; {the design size or the ``at'' size}
	{auxiliary quantities used in fixed-point multiplication}
begin @<Read and check the font data; |abort| if the \.{TFM} file is
	malformed; if there's no room for this font, say so and |goto
	done|; otherwise |incr(font_ptr)| and |goto done|@>;
bad_tfm: @<Report that the font won't be loaded@>;
done: b_close(tfm_file);

@ There are programs called \.{TFtoPL} and \.{PLtoTF} that convert
between the \.{TFM} format and a symbolic property-list format
that can be easily edited. These programs contain extensive
diagnostic information, so \TeX\ does not have to bother giving
precise details about why it rejects a particular \.{TFM} file.
@.TFtoPL@> @.PLtoTF@>

@d start_font_error_message==print_nl("! Font "); print_int(u);
	print_char("="); print_file_name(nom,aire,"");
	if s≠0 then
		begin print(" at "); print_scaled(s); print("pt");

@<Report that the font won't be loaded@>=
@.Font x=xx not loadable...@>
if file_opened then print(" not loadable: Bad metric (TFM) file")
else print(" not loadable: Metric (TFM) file not found");
help5("I wasn't able to read the size data for this font,")@/
("so I will ignore the font specification.")@/
("[Wizards can fix TFM files using TFtoPL/PLtoTF.]")@/
("You might try inserting a different font spec;")@/
("e.g., type `I\font<same number>=<new name>'.");

@ @<Read and check...@>=
@<Open |tfm_file| for input@>;
@<Read the {\.{TFM}} size fields@>;
@<Use size fields to allocate font information@>;
@<Read the {\.{TFM}} header@>;
@<Read character data@>;
@<Read box dimensions@>;
@<Read ligature/kern program@>;
@<Read extensible character recipes@>;
@<Read font parameters@>;
@<Make final adjustments and |goto done|@>

@ @<Open |tfm_file| for input@>=
if aire="" then pack_file_name(nom,TEX_font_area,".tfm")
else pack_file_name(nom,aire,".tfm");
if not b_open_in(tfm_file) then goto bad_tfm;

@ Note: A malformed \.{TFM} file might be shorter than it claims to be;
thus |eof(tfm_file)| might be true when |read_font_info| refers to
|tfm_file^| or when it says |get(tfm_file)|. If such circumstances
cause system error messages, you will have to defeat them somehow,
for example by defining |fget| to be `\!|begin get(tfm_file);|
|if eof(tfm_file) then abort; end|\unskip'.
@^system dependencies@>

@d fget==get(tfm_file)
@d fbyte==tfm_file^
@d read_sixteen(#)==begin #←fbyte;
	if #>127 then abort;
	fget; #←#*@'400+fbyte;
@d store_four_quarters(#)==begin fget; a←fbyte; qw.b0←qi(a);
	fget; b←fbyte; qw.b1←qi(b);
	fget; c←fbyte; qw.b2←qi(c);
	fget; d←fbyte; qw.b3←qi(d);

@ @<Read the {\.{TFM}} size fields@>=
begin read_sixteen(lf);
fget; read_sixteen(lh);
fget; read_sixteen(bc);
fget; read_sixteen(ec);
if (bc>ec+1)∨(ec>255) then abort;
fget; read_sixteen(nw);
fget; read_sixteen(nh);
fget; read_sixteen(nd);
fget; read_sixteen(ni);
fget; read_sixteen(nl);
fget; read_sixteen(nk);
fget; read_sixteen(ne);
fget; read_sixteen(np);
if lf≠6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np then abort;

@ The preliminary settings of the index variables |char_base|,
|width_base|, |lig_kern_base|, |kern_base|, and |exten_base| will be
corrected later by subtracting |min_quarterword| from them; and we will
subtract 1 from |param_base| too. It's best to forget about such anomalies
until later.

@<Use size fields to allocate font information@>=
lf←lf-6-lh; {|lf| words should be loaded into |font_info|}
if np<7 then lf←lf+7-np; {at least seven parameters will appear}
if (font_ptr=font_max)∨(fmem_ptr+lf>font_mem_size) then
	@<Apologize for not loading the font, |goto done|@>;

@ @<Apologize for not loading...@>=
begin start_font_error_message;
print(" not loaded: No more room.");
@.Font x=xx not loaded...@>
help4("I'm afraid I won't be able to make use of this font,")@/
("because my memory for character-size data is too small.")@/
("If you're really stuck, ask a wizard to enlarge me.")@/
("Or maybe try `I\font<same number>=<name of loaded font>'.");
error; goto done;

@ Only the first two words of the header are needed by \TeX82.

@<Read the {\.{TFM}} header@>=
begin if lh<2 then abort;
fget; read_sixteen(z); {this rejects a negative design size}
fget; z←z*@'400+fbyte; fget; z←(z*@'20)+(fbyte div@'20);
if z<unity then abort;
while lh>2 do
	begin fget;fget;fget;fget;decr(lh); {ignore the rest of the header}
if s=0 then font_scaled[f]←false
else if z=s then font_scaled[f]←false
else	begin font_scaled[f]←true; z←s;

@ @<Read character data@>=
for k←fmem_ptr to width_base[f]-1 do
	begin store_four_quarters(font_info[k].qqqq);
	if (a≥nw)∨(b div @'20≥nh)∨(b mod @'20≥nd)∨
		(c div 4≥ni) then abort;
	case c mod 4 of
	lig_tag: if d≥nl then abort;
	ext_tag: if d≥ne then abort;
	list_tag: @<Check for charlist cycle@>;
	othercases do_nothing {|no_tag|}

@ We want to make sure that there is no cycle of characters linked together
by |list_tag| entries, since such a cycle would get \TeX\ into an endless
loop. If such a cycle exists, the routine here detects it when processing
the largest character code in the cycle.

@d check_byte_range(#)==begin if (#<bc)∨(#>ec) then abort@+end
@d current_character_being_worked_on==k+bc-fmem_ptr

@<Check for charlist cycle@>=
begin check_byte_range(d);
while d<current_character_being_worked_on do
	begin qw←char_info(f)(d);
	{N.B.: not |qi(d)|, since |char_base[f]| hasn't been adjusted yet}
	if char_tag(qw)≠list_tag then goto not_found;
	d←qo(rem_byte(qw)); {next character on the list}
if d=current_character_being_worked_on then abort; {yes, there's a cycle}

@ A |fix_word| whose four bytes are $(a,b,c,d)$ from left to right represents
the number
$$x=\left\{\vcenter{\halign{\lft{$#$,}\qquad&if \lft{$#$}\cr
(No other choices of $a$ are allowed, since the magnitude of a number in
design-size units must be less than 16.)  We want to multiply this
quantity by the integer@@|z|, which is known to be less than $2^{27}$. Let
$\alpha=16z$.  If $|z|<2^{23}$, the individual multiplications $b\cdot z$,
$c\cdot z$, $d\cdot z$ cannot overflow; otherwise we will divide |z| by 2,
4, 8, or 16, to obtain a multiplier less than $2^{23}$, and we can
compensate for this later. If |z| has thereby been replaced by
$|z|^\prime=|z|/2^e$, let $\beta=2^{4-e}$; we shall compute
if $a=0$, or the same quantity minus $\alpha$ if $a=255$.
This calculation must be done exactly, in order to guarantee portability
of \TeX\ between computers.

@d store_scaled(#)==begin fget; a←fbyte; fget; b←fbyte;
	fget; c←fbyte; fget; d←fbyte;@/
	sw←(((((d*z)div@'400)+(c*z))div@'400)+(b*z))div beta;
	if a=0 then #←sw@+else if a=255 then #←sw-alpha@+else abort;

@<Read box dimensions@>=
begin @<Replace |z| by $|z|^\prime$ and compute $\alpha,\beta$@>;
for k←width_base[f] to lig_kern_base[f]-1 do
if font_info[width_base[f]].sc≠0 then abort; {\\{width}[0] must be zero}
if font_info[height_base[f]].sc≠0 then abort; {\\{height}[0] must be zero}
if font_info[depth_base[f]].sc≠0 then abort; {\\{depth}[0] must be zero}
if font_info[italic_base[f]].sc≠0 then abort; {\\{italic}[0] must be zero}

@ @<Replace |z|...@>=
begin alpha←16*z; beta←16;
while z≥@'40000000 do
	begin z←z div 2; beta←beta div 2;

@ @<Read ligature/kern program@>=
begin for k←lig_kern_base[f] to kern_base[f]-1 do
	begin store_four_quarters(font_info[k].qqqq);
	if c<kern_flag then check_byte_range(d)
	else if d≥nk then abort;
if (nl>0)∧(a<stop_flag) then abort; {check for stop bit on last command}
for k←kern_base[f] to exten_base[f]-1 do

@ @<Read extensible character recipes@>=
for k←exten_base[f] to param_base[f]-1 do
	begin store_four_quarters(font_info[k].qqqq);
	if a≠0 then check_byte_range(a);
	if b≠0 then check_byte_range(b);
	if c≠0 then check_byte_range(c);

@ We check to see that the \.{TFM} file doesn't end prematurely; but
no error message is given for files having more than |lf| words.

@<Read font parameters@>=
begin for k←1 to np do
	if k=1 then {the |slant| parameter is a pure number}
		begin fget; sw←fbyte; if sw>127 then sw←sw-256;
		fget; sw←sw*@'400+fbyte; fget; sw←sw*@'400+fbyte;
		fget; font_info[param_base[f]].sc←
			(sw*@'20)+(fbyte div@'20);
	else store_scaled(font_info[param_base[f]+k-1].sc);
if eof(tfm_file) then abort;
for k←np+1 to 7 do font_info[param_base[f]+k-1].sc←0;

@ Now to wrap it up, we have checked all the necessary things about the \.{TFM}
file, and all we need to do is put the finishing touches on the data for
the new font.

@d adjust(#)==#[f]←qo(#[f])
	{correct for the excess |min_quarterword| that was added}

@<Make final adjustments...@>=
font_code[f]←u; font_number[u]←f;
if np≥7 then font_params[f]←np@+else font_params[f]←7;
font_bc[f]←bc; font_ec[f]←ec; font_glue[f]←null;
adjust(char_base); adjust(width_base); adjust(lig_kern_base);
adjust(kern_base); adjust(exten_base);
fmem_ptr←fmem_ptr+lf; font_ptr←f; goto done

@ Before we forget about the format of these tables, let's deal with two
of \TeX's basic scanning routines related to font information.

@<Declare procedures that scan restricted classes of integers@>=
procedure scan_font_number;
var f:internal_font_number;
begin scan_int;
if (cur_val<0)∨(cur_val≥bad_font_code) then f←undefined_font
else f←font_number[cur_val];
if f=undefined_font then
	begin print_nl("! Undefined font code");
@.Undefined font code@>
	help3("Watch out---you must define a font code before you")@/
		("try to use the font, so you may have to start over.")@/
		("(Chapter 27 of the manual explains how to survive this.)");

@ The following routine is used to implement `\.{\\texinfo} |m| |n|'. If
font |m| is undefined, no error message is needed since |scan_font_number|
will have already issued one.

The boolean parameter |writing| is set |true| if the calling program
intends to change the parameter value.

@<Declare procedures that scan restricted classes of integers@>=
procedure scan_tex_info(@!writing:boolean);
	{sets |cur_val| to |font_info| location}
var f:internal_font_number;
begin scan_font_number; f←cur_val; scan_int;
if f=undefined_font then cur_val←fmem_ptr {a harmless location}
else	begin if cur_val≤0 then cur_val←fmem_ptr
	else	begin if writing ∧(cur_val≤space_shrink_code)∧@|
			(cur_val≥space_code)∧(font_glue[f]≠null) then
			begin delete_glue_ref(font_glue[f]);
		if cur_val>font_params[f] then
			if f<font_ptr then cur_val←fmem_ptr
			else @<Increase the number of parameters in the last font@>
		else cur_val←cur_val+param_base[f];
	@<Issue an error message if |cur_val=fmem_ptr|@>;

@ @<Issue an error message if |cur_val=fmem_ptr|@>=
if cur_val=fmem_ptr then
	begin help2("To increase the number of font parameters, you must")@/
	  ("use \texinfo immediately after the font code is defined.");
	print_nl("! Font "); print_int(font_code[f]);
@.Font x has n texinfo...@>
	print(" has "); print_int(font_params[f]);
	print(" texinfo parameters"); error;

@ @<Increase the number of parameters...@>=
repeat if fmem_ptr=font_mem_size then overflow("font memory",font_mem_size);
font_info[fmem_ptr].sc←0; incr(fmem_ptr); incr(font_params[f]);
until cur_val=font_params[f]

@ When \TeX\ wants to typeset a character that doesn't exist, the
character node is not created; thus the output routine can assume
that characters exist when it sees them. The following procedure 
prints a warning message unless the user has suppressed it.

@p procedure char_warning(@!f:internal_font_number;@!c:eight_bits);
begin if tracing_lost_chars≠0 then
	begin begin_diagnostic;
	print_nl("Missing character: There is no ");
@.Missing character@>
	print_ascii(c); print(" in font ");
	print(font_name[f]); print_char("!"); end_diagnostic;

@ Here is a function that returns a pointer to a character node for a
given character in a given font. If that character doesn't exist,
|null| is returned instead.

@p function new_character(@!f:internal_font_number;@!c:eight_bits):pointer;
label exit;
var p:pointer; {newly allocated node}
begin if (font_bc[f]≤c)∧(font_ec[f]≥c) then
	if char_exists(char_info(f)(qi(c))) then
		begin p←get_avail; font(p)←f; character(p)←qi(c);
		new_character←p; return;
@* \[30] Device-independent file format.
The most important output pro\-duced by a run of \TeX\ is the ``device
independent'' (\.{DVI}) file that specifies where characters and rules
are to appear on printed pages. The form of these files was designed by
David R. Fuchs in 1979. Almost any reasonable device can be driven by
@^Fuchs, David Raymond@>
@:DVI_files}{\.{DVI} files@>
a program that takes \.{DVI} files as input, and dozens of such
\.{DVI}-to-whatever programs have been written. Thus, it is possible to
print the output of \TeX\ on many different kinds of equipment, using \TeX\
as a device-independent ``front end.''

A \.{DVI} file is a stream of 8-bit bytes, which may be regarded as a
series of commands in a machine-like language. The first byte of each command
is the operation code, and this code is followed by zero or more bytes
that provide parameters to the command. The parameters themselves may consist
of several consecutive bytes; for example, the `|set_rule|' command has two
parameters, each of which is four bytes long. Parameters are usually
regarded as nonnegative integers; but four-byte-long parameters,
and shorter parameters that denote distances, can be
either positive or negative. Such parameters are given in two's complement
notation. For example, a two-byte-long distance parameter has a value between
$-2^{15}$ and $2^{15}-1$.

A \.{DVI} file consists of a sequence of one or more ``pages,'' followed by
a ``postamble.'' A ``page'' consists of a |bop| command, followed by any number
of other commands that tell where characters are to be placed on a physical
page, followed by an |eop| command. The pages appear in the order that \TeX\
generated them. If we ignore |nop| commands (which are allowed between
any two commands in the file), each |eop| command is immediately followed by
a |bop| command, or by a |pst| command; in the latter case, there are no
more pages in the file, and the remaining bytes form the postamble.
Further details about the postamble will be explained later.

Some parameters in \.{DVI} commands are ``pointers.'' These are four-byte
quantities that give the location number of some other byte in the file;
the first byte is number@@0, then comes number@@1, and so on. For example,
one of the parameters of a |bop| command points to the previous |bop|;
this makes it feasible to read the pages in backwards order, in case you
are producing output on devices that stack their output face up. If the
first page on a \.{DVI} file occupies bytes 0 to 99, and if the second
page occupies bytes 100 to 299, then the |bop| that starts in byte 100
points to 0 and the |bop| thats starts in byte 300 points to 100. (The
first |bop|, i.e., the one that starts in byte 0, has a pointer of $-1$.)

@ The \.{DVI} format is intended to be both compact and easily interpreted
by a machine. Compactness is achieved by making most of the information
implicit instead of explicit; when a \.{DVI}-reading program reads the
commands for a page, it keeps track of several quantities: (a)@@The current
font |f| is an integer; this value is changed only
by \\{fnt} and \\{fnt\_num} commands. (b)@@The current position on the page
is given by two numbers called the horizontal and vertical coordinates,
|h| and |v|. Both coordinates are zero at the upper left corner of the page;
moving to the right corresponds to increasing the horizontal coordinate, and
moving down corresponds to increasing the vertical coordinate. Thus, the
coordinates are essentially Cartesian, except that vertical directions are
flipped; the Cartesian version of |(h,v)| would be |(h,-v)|.  (c)@@The
current spacing amounts are given by four numbers |w|, |x|, |y|, and |z|,
where |w| and@@|x| are used for horizontal spacing and where |y| and@@|z|
are used for vertical spacing. (d)@@There is a stack containing
|(h,v,w,x,y,z)| values; the \.{DVI} commands |push| and |pop| are used to
change the current level of operation. Note that the current font@@|f| is
not pushed and popped; the stack contains only information about

The values of |h|, |v|, |w|, |x|, |y|, and |z| are signed integers having up
to 32 bits, including the sign. Since they represent physical distances,
there is a small unit of measurement such that increasing |h| by@@1 means
moving a certain tiny distance to the right. The actual unit of
measurement is variable, as explained below; \TeX\ sets things up so that
its \.{DVI} output is in sp units, i.e., scaled points, in agreement with
all the |scaled| dimensions in \TeX's data structures.

@ Here is list of all the commands that may appear in a \.{DVI} file. With
each command we give its symbolic name (e.g., |bop|), its opcode byte
(e.g., 129), and its parameters (if any). The parameters are followed
by a bracketed number telling how many bytes they occupy; for example,
`|p[4]|' means that parameter |p| is four bytes long.

\yskip\hang|set_char_0| 0. Typeset character number@@0 from font@@|f|
such that the reference point of the character is at |(h,v)|. Then
increase |h| by the width of that character. Note that a character may
have zero or negative width, so one cannot be sure that |h| will advance
after this command; but |h| usually does increase.

\yskip\hang|set_char_1| through |set_char_127| (opcodes 1 to 127).
Do the operations of |set_char_0|, but use the appropriate character number
instead of char\-acter@@0.

\yskip\hang|set1| 128 |c[1]|. Same as |set_char_0|, except that character
number@@|c| is typeset. \TeX82 uses this command for characters in the
range |128≤c<256|.

\yskip\hang|set2| 129 |c[2]|. Same as |set1|, except that@@|c| is two
bytes long, so it is in the range |0≤c<65536|. \TeX82 never uses this
command, but it should come in handy for extensions of \TeX\ that deal
with oriental languages.
@^oriental characters@>@^Chinese characters@>@^Japanese characters@>

\yskip\hang|set3| 130 |c[3]|. Same as |set1|, except that@@|c| is three
bytes long, so it can be as large as $2^{24}-1$. Not even the Chinese
language has this many characters, but this command might prove useful
in some yet unforeseen extension.

\yskip\hang|set4| 131 |c[4]|. Same as |set1|, except that@@|c| is four
bytes long. Imagine that.

\yskip\hang|set_rule| 132 |a[4]| |b[4]|. Typeset a solid black rectangle
of height@@|a| and width@@|b|, with its bottom left corner at |(h,v)|. Then
set |h←h+b|. If either |a≤0| or |b≤0|, nothing should be typeset. Note
that if |b<0|, the value of |h| will decrease even though nothing else happens.
See below for details about how to typeset rules so that consistency with
\MF\ is guaranteed.

\yskip\hang|put1| 133 |c[1]|. Typeset character number@@|c| from font@@|f|
such that the reference point of the character is at |(h,v)|. (The `put'
commands are exactly like the `set' commands, except that they simply put out a
character or a rule without moving the reference point afterwards.)

\yskip\hang|put2| 134 |c[2]|. Same as |set2|, except that |h| is not changed.

\yskip\hang|put3| 135 |c[3]|. Same as |set3|, except that |h| is not changed.

\yskip\hang|put4| 136 |c[4]|. Same as |set4|, except that |h| is not changed.

\yskip\hang|put_rule| 137 |a[4]| |b[4]|. Same as |set_rule|, except that
|h| is not changed.

\yskip\hang|nop| 138. No operation, do nothing. Any number of |nop|'s
may occur between \.{DVI} commands, but a |nop| cannot be inserted between
a command and its parameters or between two parameters.

\yskip\hang|bop| 139 $c↓0[4]$ $c↓1[4]$ $\ldots$ $c↓9[4]$ $p[4]$. Beginning
of a page: Set |(h,v,w,x,y,z)←(0,0,0,0,0,0)| and set the stack empty. Set
the current font |f| to an undefined value.  The ten $c↓i$ parameters hold
the values of \.{\\count0} $\ldots$ \.{\\count9} in \TeX\ at the time
\.{\\shipout} was invoked for this page; they can be used to identify
pages, if a user wants to print only part of a \.{DVI} file. The parameter
|p| points to the previous |bop| command in the file, where the first
|bop| has $p=-1$.

\yskip\hang|eop| 140.  End of page: Print what you have read since the
previous |bop|. At this point the stack should be empty. (The \.{DVI}-reading
programs that drive most output devices will have kept a buffer of the
material that appears on the page that has just ended. This material is
largely, but not entirely, in order by |v| coordinate and (for fixed |v|) by
|h|@@coordinate; so it usually needs to be sorted into some order that is
appropriate for the device in question.)

\yskip\hang|push| 141. Push the current values of |(h,v,w,x,y,z)| onto the
top of the stack; do not change any of these values. Note that |f| is
not pushed.

\yskip\hang|pop| 142. Pop the top six values off of the stack and assign
them respectively to |(h,v,w,x,y,z)|. The number of pops should never
exceed the number of pushes, since it would be highly embarrassing if the
stack were empty at the time of a |pop| command.

\yskip\hang|right1| 143 |b[1]|. Set |h←h+b|, i.e., move right |b| units.
The parameter is a signed number in two's complement notation, |-128≤b<128|;
if |b<0|, the reference point actually moves left.

\yskip\hang|right2| 144 |b[2]|. Same as |right1|, except that |b| is a
two-byte quantity in the range |-32768≤b<32768|.

\yskip\hang|right3| 145 |b[3]|. Same as |right1|, except that |b| is a
three-byte quantity in the range |@t$-2^{23}$@>≤b<@t$2^{23}$@>|.

\yskip\hang|right4| 146 |b[4]|. Same as |right1|, except that |b| is a
four-byte quantity in the range |@t$-2^{31}$@>≤b<@t$2^{31}$@>|.

\yskip\hang|w0| 147. Set |h←h+w|; i.e., move right |w| units. With luck,
this parameterless command will usually suffice, because the same kind of motion
will occur several times in succession; the following commands explain how
|w| gets particular values.

\yskip\hang|w1| 148 |b[1]|. Set |w←b| and |h←h+b|. The value of |b| is a
signed quantity in two's complement notation, |-128≤b<128|. This command
changes the current |w|@@spacing and moves right by |b|.

\yskip\hang|w2| 149 |b[2]|. Same as |w1|, but |b| is two bytes long,

\yskip\hang|w3| 150 |b[3]|. Same as |w1|, but |b| is three bytes long,

\yskip\hang|w4| 151 |b[4]|. Same as |w1|, but |b| is four bytes long,

\yskip\hang|x0| 152. Set |h←h+x|; i.e., move right |x| units. The `|x|'
commands are like the `|w|' commands except that they involve |x| instead
of |w|.

\yskip\hang|x1| 153 |b[1]|. Set |x←b| and |h←h+b|. The value of |b| is a
signed quantity in two's complement notation, |-128≤b<128|. This command
changes the current |x|@@spacing and moves right by |b|.

\yskip\hang|x2| 154 |b[2]|. Same as |x1|, but |b| is two bytes long,

\yskip\hang|x3| 155 |b[3]|. Same as |x1|, but |b| is three bytes long,

\yskip\hang|x4| 156 |b[4]|. Same as |x1|, but |b| is four bytes long,

\yskip\hang|down1| 157 |a[1]|. Set |v←v+a|, i.e., move down |a| units.
The parameter is a signed number in two's complement notation, |-128≤a<128|;
if |a<0|, the reference point actually moves up.

\yskip\hang|down2| 158 |a[2]|. Same as |down1|, except that |a| is a
two-byte quantity in the range |-32768≤a<32768|.

\yskip\hang|down3| 159 |a[3]|. Same as |down1|, except that |a| is a
three-byte quantity in the range |@t$-2^{23}$@>≤a<@t$2^{23}$@>|.

\yskip\hang|down4| 160 |a[4]|. Same as |down1|, except that |a| is a
four-byte quantity in the range |@t$-2^{31}$@>≤a<@t$2^{31}$@>|.

\yskip\hang|y0| 161. Set |v←v+y|; i.e., move down |y| units. With luck,
this parameterless command will usually suffice, because the same kind of motion
will occur several times in succession; the following commands explain how
|y| gets particular values.

\yskip\hang|y1| 162 |a[1]|. Set |y←a| and |v←v+a|. The value of |a| is a
signed quantity in two's complement notation, |-128≤a<128|. This command
changes the current |y|@@spacing and moves down by |a|.

\yskip\hang|y2| 163 |a[2]|. Same as |y1|, but |a| is two bytes long,

\yskip\hang|y3| 164 |a[3]|. Same as |y1|, but |a| is three bytes long,

\yskip\hang|y4| 165 |a[4]|. Same as |y1|, but |a| is four bytes long,

\yskip\hang|z0| 166. Set |v←v+z|; i.e., move down |z| units. The `|z|' commands
are like the `|y|' commands except that they involve |z| instead of |y|.

\yskip\hang|z1| 167 |a[1]|. Set |z←a| and |v←v+a|. The value of |a| is a
signed quantity in two's complement notation, |-128≤a<128|. This command
changes the current |z|@@spacing and moves down by |a|.

\yskip\hang|z2| 168 |a[2]|. Same as |z1|, but |a| is two bytes long,

\yskip\hang|z3| 169 |a[3]|. Same as |z1|, but |a| is three bytes long,

\yskip\hang|z4| 170 |a[4]|. Same as |z1|, but |a| is four bytes long,

\yskip\hang|fnt_num_0| 171. Set |f←0|.

\yskip\hang|fnt_num_1| through |fnt_num_63| (opcodes 172 to 234). Set
|f←1|, $\ldotss$, \hbox{|f←63|}, respectively.

\yskip\hang|fnt1| 235 |n[1]|. Set |f←n|. \TeX82 uses this command for font
numbers in the range |64≤n<256|.

\yskip\hang|fnt2| 236 |n[2]|. Same as |fnt1|, except that@@|n| is two
bytes long, so it is in the range |0≤n<65536|. \TeX82 never generates this
command, but large font numbers may prove useful for specifications of
color or texture, or they may be used for special fonts that have fixed
numbers in some external coding scheme.

\yskip\hang|fnt3| 237 |n[3]|. Same as |fnt1|, except that@@|n| is three
bytes long, so it can be as large as $2^{24}-1$.

\yskip\hang|fnt4| 238 |n[4]|. Same as |fnt1|, except that@@|n| is four
bytes long; this is for the really big font numbers. The value $-1$
is forbidden, so the legal values of@@|f| are $-2^{31}\L f<-1$ and
$0\L f<2^{31}-1$.

\yskip\hang|xxx| 239 |m[1]| |x[m]|. This command is undefined in
general; it functions as an $(m+2)$-byte |nop| unless special \.{DVI}-reading
programs are being used. \TeX82 generates this command when an \.{\\xsend}
appears, setting |m| to the number of bytes being sent. It is recommended that
|x| be a string having the form of a keyword followed by possible parameters
relevant to that keyword.

\yskip\hang|pst| 240. Beginning of the postamble, see below.

\yskip\noindent Commands 241--255 are undefined at the present time.

@ @d set_char_0=0 {typeset character 0 and move right}
@d set1=128 {typeset a character and move right}
@d set_rule=132 {typeset a rule and move right}
@d put1=133 {typeset a character}
@d put_rule=137 {typeset a rule}
@d nop=138 {no operation}
@d bop=139 {beginning of page}
@d eop=140 {ending of page}
@d push=141 {save the current positions}
@d pop=142 {restore previous positions}
@d right1=143 {move right}
@d w0=147 {move right by |w|}
@d w1=148 {move right and set |w|}
@d x0=152 {move right by |x|}
@d x1=153 {move right and set |x|}
@d down1=157 {move down}
@d y0=161 {move down by |y|}
@d y1=162 {move down and set |y|}
@d z0=166 {move down by |z|}
@d z1=167 {move down and set |z|}
@d fnt_num_0=171 {set current font to 0}
@d fnt1=235 {set current font}
@d xxx=239 {extension to \.{DVI} primitives}
@d pst=240 {postamble}

@ Sometimes it is desirable to make horizontal or vertical rules line up
precisely with certain features in characters of a font. It is possible to
guarantee the correct matching between \.{DVI} output and the characters
generated by \MF\ by adhering to the following principles: (1)@@The \MF\
characters should be positioned so that a bottom edge or left edge that is
supposed to line up with the bottom or left edge of a rule appears at the
reference point, i.e., in row@@0 and column@@0 of the \MF\ raster. This
ensures that the position of the rule will not be rounded differently when
the pixel size is not a perfect multiple of the units of measurement in
the \.{DVI} file. (2)@@A typeset rule of height $a>0$ and width $b>0$
should be equivalent to a \MF-generated character having black pixels in
precisely those raster positions whose \MF\ coordinates satisfy
|0≤x<@t$\alpha$@>b| and |0≤y<@t$\alpha$@>a|, where $\alpha$ is the number
of pixels per \.{DVI} unit.
@^alignment of rules with characters@>
@^rules aligning with characters@>

@ The last page in a \.{DVI} file is followed by `|pst|'; this command
introduces the postamble, which summarizes important facts that \TeX\ has
accumulated about the file. The postamble has the form
$$\hbox{|p[4]| |n[4]| |d[4]| |m[4]| |l[4]| |u[4]| |s[2]| |t[2]|
  $\langle\,$font definitions$\,\rangle$
  $(-1)[4]$ |q[4]| |i[1]| 223's|[≥4]|}$$
Here |p| is a pointer to the final |bop| in the file. The next two parameters,
|n| and |d|, are positive integers that define the units of measurement;
they are the numerator and denominator of a fraction by which all dimensions
in the \.{DVI} file could be multiplied in order to get lengths in units
of $10^{-7}$ meters. (Since there are 72.27 points per inch and 2.54 centimeters
per inch, and since \TeX82 works with scaled points where there are $2^{16}$
sp in a point, \TeX82 sets |n=25400000| and $d=7227\cdot2^{16}=473628672$.)

The next parameter, |m|, is \TeX's \.{\\mag} parameter, i.e., 1000 times the
desired magnification. The actual fraction by which dimensions are multiplied
is therefore $mn/1000d$. Note that if a \TeX\ source document does not call
for any `\.{true}' dimensions, and if you change it only by specifying a
different \.{\\mag} setting, the \.{DVI} file that \TeX\ creates will be
completely unchanged except for the value of |m| in the postamble. (Fancy
\.{DVI}-reading programs allow users to override the |m|@@setting when a
\.{DVI} file is being printed.)

Parameters |l| and |u| give respectively the height-plus-depth of the tallest
page and the width of the widest page, in the same units as other dimensions
of the file. These numbers might be used by a \.{DVI}-reading program to
position individual ``pages'' on large sheets of film or paper.
Parameter |s| is the maximum stack depth (i.e., the excess of |push| commands
over |pop| commands) needed to process this file. Then comes |t|, the total
number of pages (|bop| commands) present.

The postamble continues with font definitions, which are any number of
speci\-fi\-ca\-tions having the form
$$\hbox{|f[4]| $c\f[4]$ $s\f[4]$ $a\f[1]$ $l\f[1]$ $n\f[a\f+l\f]$.}$$
The first parameter in a font definition is the font number, $f$; this
must be different from $-1$ and distinct from the font numbers in other
definitions. (Note that the font definitions are followed by the four-byte
value $-1$, so it will be clear when the definitions have ended.) The
next parameter, $c\f$, is the check sum that \TeX\ found in the \.{TFM}
file for this font; it should match the check sum of the font found by
@^check sum@>
programs that read this \.{DVI} file.
Parameter $s\f$ contains a fixed-point scale factor that is applied to the
character widths in font |f|; font dimensions in \.{TFM} files and other font
files are relative to this quantity, which is called the ``at size'' elsewhere
in this documentation. The value of $s\f$ is always positive and less than
$2^{27}$. It is given in the same units as the other dimensions of the
file, i.e., in sp.

The remaining part of a font definition gives the external name of the font,
which is an ascii string of length $a\f+l\f$. The number $a\f$ is the length
of the ``area'' or directory, and $l\f$ is the length of the font name itself;
the standard local system font area is supposed to be used when $a\f=0$.
The $n\f$ field contains the area in its first $a\f$ bytes.

@ The last part of the postamble, following the phony font number
$-1$, contains |q|, a pointer to the |pst| command that started the
postamble.  An identification byte, |i|, comes next; currently this byte
is always set to@@2. (Some day we will set |i=3|, when \.{DVI} format
makes another incompatible change---perhaps in 1992.)

Following the |i| byte there are four or more bytes that are all equal to
the decimal number 223 (i.e., @'337 in octal). \TeX\ puts out four to seven of
these trailing bytes, until the total length of the file is a multiple of
four bytes, since this works out best on machines that pack four bytes per
word; but any number of 223's is allowed, as long as there are at least four
of them. In effect, 223 is a sort of signature that is added at the very end.
@^Fuchs, David Raymond@>

This curious way to finish off a \.{DVI} file makes it feasible for
\.{DVI}-reading programs to find the postamble first, on most computers,
even though \TeX\ wants to write the postamble last. Most operating
systems permit random access to individual words or bytes of a file, so
the \.{DVI} reader starts at the end and skips backwards over the 223's
until finding the identification byte. Then it backs up four bytes, reads
|q|, and goes to byte |q| of the file. This byte should, of course,
contain the value 240 (|pst|); now the postamble can be read, so the
\.{DVI} reader discovers all the information needed for typesetting the
pages. Note that it is also possible to skip through the \.{DVI} file at
reasonably high speed to locate a particular page, if that proves

The reason for reading the postamble first is that the \.{DVI} reader must
know the widths of characters, in order to find out where things go on a page;
and it needs to know the names of the fonts, so that it can get their widths
from a \.{TFM} file or from some other kind of font-information file.
The reason for writing the postamble last is that \TeX\ can't put out all
the font names until it has finished generating the pages of the \.{DVI}
file, since new fonts can occur anywhere in a \TeX\ job; and the alternative
of sprinkling font definitions throughout a \.{DVI} file is unattractive,
since that would make it necessary to read the whole file even when
printing only one page. Furthermore, we wouldn't want to copy the
information in the first part of a \.{DVI} file to the end of another file
that begins with the postamble information, since the first part
is typically quite long.

Unfortunately, however, standard \PASCAL\ does not include the ability to
@^system dependencies@>
access a random position in a file, or even to determine the length of a file.
Almost all systems nowadays provide the necessary capabilities, so \.{DVI}
format has been designed to work most efficiently with modern operating systems.
But if \.{DVI} files have to be processed under the restrictions of standard
\PASCAL, one can simply read them twice, first skipping to the postamble
and then doing the pages.

@d id_byte=2 {identifies the kind of \.{DVI} files described here}
@* \[31] Shipping pages out.
After considering \TeX's eyes and stomach, we come now to the bowels.

The |ship_out| procedure is given a pointer to a box; its mission is
to describe that box in \.{DVI} form, outputting a ``page'' to |dvi_file|.
The \.{DVI} coordinates $(h,v)=(0,0)$ should correspond to the upper left
corner of the box being shipped.

Since boxes can be inside of boxes inside of boxes, the main work of
|ship_out| is done by two mutually recursive routines, |hlist_out|
and |vlist_out|, which traverse the hlists and vlists inside of horizontal
and vertical boxes.

As individual pages are being processed, we need to accumulate
information about the entire set of pages, since such statistics must be
reported in the postamble. The global variables |total_pages|, |max_v|,
|max_h|, |max_push|, and |last_bop| are used to record this information.

The variable |doing_leaders| is |true| while leaders are being output.

A few additional global variables are also defined here for use in
|vlist_out| and |hlist_out|. They could have been local variables, but
that would waste stack space when boxes are deeply nested, since the
values of these variables are not needed during recursive calls.

@!total_pages:integer; {the number of pages that have been shipped out}
@!max_v:scaled; {maximum height-plus-depth of pages shipped so far}
@!max_h:scaled; {maximum width of pages shipped so far}
@!max_push:integer; {deepest nesting of |push| commands encountered so far}
@!last_bop:integer; {location of previous |bop| in the \.{DVI} output}
@!doing_leaders:boolean; {are we inside a leader box?}
@!c,@!f:quarterword; {character and font in current |char_node|}
@!hd:quarterword; {height and depth indices for a character}
@!rule_ht,@!rule_dp,@!rule_wd:scaled; {size of current rule being output}
@!g:pointer; {current glue specification}
@!lq,@!lr,@!lx:integer; {quantities used in calculations for leaders}

@ @<Set init...@>=
total_pages←0; max_v←0; max_h←0; max_push←0; last_bop←-1;

@ The \.{DVI} bytes are output to a buffer instead of being written directly
to the output file. This makes it possible to reduce the overhead of
subroutine calls, thereby measurably speeding up the computation, since
output of \.{DVI} bytes is part of \TeX's inner loop. And it has another
advantage as well, since we can change instructions in the buffer in order to
make the output more compact. For example, a `|down2|' command can be
changed to a `|y2|', thereby making a subsequent `|y0|' command possible,
saving two bytes.

The output buffer is divided into two parts of equal size; the bytes found
in |dvi_buf[0..half_buf-1]| constitute the first half, and those in
|dvi_buf[half_buf..dvi_buf_size-1]| constitute the second. The global
variable |dvi_ptr| points to the position that will receive the next
output byte. When |dvi_ptr| reaches |dvi_limit|, which is always equal
to one of the two values |half_buf| or |dvi_buf_size|, the half buffer that
is about to be invaded next is sent to the output and |dvi_limit| is
changed to its other value. Thus, there is always at least a half buffer's
worth of information present, except at the very beginning of the job.

Bytes of the \.{DVI} file are numbered sequentially starting with 0;
the next byte to be generated will be number |dvi_offset+dvi_ptr|.
A byte is present in the buffer only if its number is |≥dvi_gone|.

@!dvi_index=0..dvi_buf_size; {an index into the output buffer}

@ Some systems may find it more efficient to make |dvi_buf| a |packed|
array, since output of four bytes at once may be facilitated.

@!dvi_buf:array[dvi_index] of eight_bits; {buffer for \.{DVI} output}
@!half_buf:dvi_index; {half of |dvi_buf_size|}
@!dvi_limit:dvi_index; {end of the current half buffer}
@!dvi_ptr:dvi_index; {the next available buffer address}
@!dvi_offset:integer; {|dvi_buf_size| times the number of times the
	output buffer has been fully emptied}
@!dvi_gone:integer; {the number of bytes already output to |dvi_file|}

@ Initially the buffer is all in one piece; we will output half of it only
after it first fills up.

@<Set init...@>=
half_buf←dvi_buf_size div 2; dvi_limit←dvi_buf_size; dvi_ptr←0;
dvi_offset←0; dvi_gone←0;

@ The actual output of |dvi_buf[a..b]| to |dvi_file| is performed by calling
|write_dvi(a,b)|. For best results, this procedure should be optimized to
run as fast as possible on each particular system, since it is part of
\TeX's inner loop. It is safe to assume that |a| and |b+1| will both be
multiples of 4 when |write_dvi(a,b)| is called; therefore it is possible on
many machines to use efficient methods to pack four bytes per word and to
output an array of words with one system call.
@^system dependencies@>
@^inner loop@>

@p procedure write_dvi(@!a,@!b:dvi_index);
var k:dvi_index;
begin for k←a to b do write(dvi_file,dvi_buf[k]);

@ To put a byte in the buffer without paying the cost of invoking a procedure
each time, we use the macro |dvi_out|.

@d dvi_out(#)==@+begin dvi_buf[dvi_ptr]←#; incr(dvi_ptr);
	if dvi_ptr=dvi_limit then dvi_swap;

@p procedure dvi_swap; {outputs half of the buffer}
begin if dvi_limit=dvi_buf_size then
	begin write_dvi(0,half_buf-1); dvi_limit←half_buf;
	dvi_offset←dvi_offset+dvi_buf_size; dvi_ptr←0;
else	begin write_dvi(half_buf,dvi_buf_size-1); dvi_limit←dvi_buf_size;

@ Here is how we clean out the buffer when \TeX\ is all through; |dvi_ptr|
will be a multiple of@@4.

@<Empty the last bytes out of |dvi_buf|@>=
if dvi_limit=half_buf then write_dvi(half_buf,dvi_buf_size-1);
if dvi_ptr>0 then write_dvi(0,dvi_ptr-1)

@ The |dvi_four| procedure outputs four bytes in two's complement notation,
without risking arithmetic overflow.

@p procedure dvi_four(@!x:integer);
begin if x≥0 then dvi_out(x div @'100000000)
else	begin x:=x+@'10000000000;
	dvi_out((x div @'100000000) + 128);
x←x mod @'100000000; dvi_out(x div @'200000);
x←x mod @'200000; dvi_out(x div @'400);
dvi_out(x mod @'400);

@ Versions of \TeX\ intended for small computers might well choose to omit
the ideas in the next few parts of this program, since it is not really
necessary to optimize the \.{DVI} code by making use of the |w0|, |x0|,
|y0|, and |z0| commands. Furthermore, the algorithm that we are about to
describe does not pretend to give an optimum reduction in the length
of the \.{DVI} code; after all, speed is more important than compactness.
But the method is surprisingly effective, and it takes comparatively little

We can best understand the basic idea by first considering a simpler problem
that has the same essential characteristics. Given a sequence of digits,
say $3\,1\,4\,1\,5\,9\,2\,6\,5\,3\,5\,8\,9$, we want to assign subscripts
$d$, $y$, or $z$ to each digit so as to maximize the number of ``$y$-hits''
and ``$z$-hits''; a $y$-hit is an instance of two appearances of the same
digit with the subscript $y$, where no $y$'s intervene between the two
appearances, and a $z$-hit is defined similarly. For example, the sequence
above could be decorated with subscripts as follows:
There are three $y$-hits ($1↓y\ldotsm1↓y$ and $5↓y\ldotsm5↓y\ldotsm5↓y$) and
one $z$-hit ($3↓z\ldotsm3↓z$); there are no $d$-hits, since the two appearances
of $9↓d$ have $d$'s between them, but we don't count $d$-hits so it doesn't
matter how many there are. These subscripts are analogous to the \.{DVI}
commands called \\{down}, $y$, and $z$, and the digits are analogous to
different amounts of vertical motion; a $y$-hit or $z$-hit corresponds to
the opportunity to use the one-byte commands |y0| or |z0| in a \.{DVI} file.

\TeX's method of assigning subscripts works like this: Append a new digit,
say $\delta$, to the right of the sequence. Now look back through the
sequence until one of the following things happens: (a)@@You see
$\delta↓y$ or $\delta↓z$, and this was the first time you encountered a
$y$ or $z$ subscript, respectively.  Then assign $y$ or $z$ to the new
$\delta$; you have scored a hit. (b)@@You see $\delta↓d$, and no $y$
subscripts have been encountered so far during this search.  Then change
the previous $\delta↓d$ to $\delta↓y$ (this corresponds to changing a
command in the output buffer), and assign $y$ to the new $\delta$; it's
another hit.  (c)@@You see $\delta↓d$, and a $y$ subscript has been seen
but not a $z$.  Change the previous $\delta↓d$ to $\delta↓z$ and assign
$z$ to the new $\delta$. (d)@@You encounter both $y$ and $z$ subscripts
before encountering a suitable $\delta$, or you scan all the way to the
front of the sequence. Assign $d$ to the new $\delta$; this assignment may
be changed later.

The subscripts $3↓z\,1↓y\,4↓d\ldotsm$ in the example above were, in fact,
produced by this procedure, as the reader can verify. (Go ahead and try it.)

@ In order to implement such an idea, \TeX\ maintains a stack of pointers
to the \\{down}, $y$, and $z$ commands that have been generated for the
current page. And there is a similar stack for \\{right}, |w|, and |x|
commands. These stacks are called the down stack and right stack, and their
top elements are maintained in the variables |down_ptr| and |right_ptr|.

Each entry in these stacks contains four fields: The |width| field is
the amount of motion down or to the right; the |location| field is the
byte number of the \.{DVI} command in question (including the appropriate
|dvi_offset|); the |link| field points to the next item below this one
on the stack; and the |info| field encodes the options for possible change
in the \.{DVI} command.

@d movement_node_size=3 {number of words per entry in the down and right stacks}
@d location(#)==mem[#+2].int {\.{DVI} byte number for a movement command}

@!down_ptr,@!right_ptr:pointer; {heads of the down and right stacks}

@ @<Set init...@>=
down_ptr←null; right_ptr←null;

@ Here is a subroutine that produces a \.{DVI} command for some specified
downward or rightward motion. It has two parameters: |w| is the amount
of motion, and |o| is either |down1| or |right1|. We use the fact that
the command codes have convenient arithmetic properties: |y1-down1=w1-right1|
and |z1-down1=x1-right1|.

@p procedure movement(@!w:scaled;@!o:eight_bits);
label exit,found,not_found,2,1;
var mstate:small_number; {have we seen a |y| or |z|?}
@!p,@!q:pointer; {current and top nodes on the stack}
@!k:integer; {index into |dvi_buf|, modulo |dvi_buf_size|}
begin q←get_node(movement_node_size); {new node for the top of the stack}
width(q)←w; location(q)←dvi_offset+dvi_ptr;
if o=down1 then
	begin link(q)←down_ptr; down_ptr←q;
else	begin link(q)←right_ptr; right_ptr←q;
@<Look at the other stack entries until deciding what sort of \.{DVI} command
	to generate; |goto found| if node |p| is a ``hit''@>;
@<Generate a |down| or |right| command for |w| and |return|@>;
found: @<Generate a |y0| or |z0| command in order to reuse a previous
	appearance of@@|w|@>;

@ The |info| fields in the entries of the down stack or the right stack
have six possible settings: |y_here| or |z_here| mean that the \.{DVI}
command refers to |y| or |z|, respectively (or to |w| or |x|, in the
case of horizontal motion); |yz_OK| means that the \.{DVI} command is
\\{down} (or \\{right}) but can be changed to either |y| or |z| (or
to either |w| or |x|); |y_OK| means that it is \\{down} and can be changed
to |y| but not |z|; |z_OK| is similar; and |d_fixed| means it must stay

The four settings |yz_OK|, |y_OK|, |z_OK|, |d_fixed| would not need to
be distinguished from each other if we were simply solving the
digit-subscripting problem mentioned above. But in \TeX's case there is
a complication because of the nested structure of |push| and |pop|
commands. Suppose we add parentheses to the digit-subscripting problem,
redefining hits so that $\delta↓y\ldotsm \delta↓y$ is a hit if all $y$'s between
the $\delta$'s are enclosed in properly nested parentheses, and if the
parenthesis level of the right-hand $\delta↓y$ is deeper than or equal to
that of the left-hand one. Thus, `(' and `)' correspond to `|push|'
and `|pop|'. Now if we want to assign a subscript to the final 1 in the
we cannot change the previous $1↓d$ to $1↓y$, since that would invalidate
the $2↓y\ldotsm2↓y$ hit. But we can change it to $1↓z$, scoring a hit
since the intervening $8↓z$'s are enclosed in parentheses.

@d y_here=1 {|info| when the movement entry points to a |y| command}
@d z_here=2 {|info| when the movement entry points to a |z| command}
@d yz_OK=3 {|info| corresponding to an unconstrained \\{down} command}
@d y_OK=4 {|info| corresponding to a \\{down} that can't become a |z|}
@d z_OK=5 {|info| corresponding to a \\{down} that can't become a |y|}
@d d_fixed=6 {|info| corresponding to a \\{down} that can't change}

@ When the |movement| procedure gets to the label |found|, the value of
|info(p)| will be either |y_here| or |z_here|. If it is, say, |y_here|,
the procedure generates a |y0| command (or a |w0| command), and marks
all |info| fields between |q| and |p| so that |y| is not OK in that range.

@<Generate a |y0| or |z0| command...@>=
if info(q)=y_here then
	begin dvi_out(o+y0-down1); {|y0| or |w0|}
	while link(q)≠p do
		begin q←link(q);
		case info(q) of
		yz_OK: info(q)←z_OK;
		y_OK: info(q)←d_fixed;
		othercases do_nothing
else	begin dvi_out(o+z0-down1); {|z0| or |x0|}
	while link(q)≠p do
		begin q←link(q);
		case info(q) of
		yz_OK: info(q)←y_OK;
		z_OK: info(q)←d_fixed;
		othercases do_nothing

@ @<Generate a |down| or |right|...@>=
if abs(w)≥@'40000000 then
	begin dvi_out(o+3); {|down4| or |right4|}
	dvi_four(w); return;
if abs(w)≥@'100000 then
	begin dvi_out(o+2); {|down3| or |right3|}
	if w<0 then w←w+@'100000000;
	dvi_out(w div @'200000); w←w mod @'200000; goto 2;
if abs(w)≥@'200 then
	begin dvi_out(o+1); {|down2| or |right2|}
	if w<0 then w←w+@'200000;
	goto 2;
dvi_out(o); {|down1| or |right1|}
if w<0 then w←w+@'400;
goto 1;
2: dvi_out(w div @'400);
1: dvi_out(w mod @'400); return

@ As we search through the stack, we are in one of three states,
|y_seen|, |z_seen|, or |none_seen|, depending on whether we have
encountered |y_here| or |z_here| nodes. These states are encoded as
multiples of 6, so that they can be added to the |info| fields for quick

@d none_seen=0 {no |y_here| or |z_here| nodes have been encountered yet}
@d y_seen=6 {we have seen |y_here| but not |z_here|}
@d z_seen=12 {we have seen |z_here| but not |y_here|}

@<Look at the other stack entries until deciding...@>=
p←link(q); mstate←none_seen;
while p≠null do
	begin if width(p)=w then @<Consider a node with matching width;
		|goto found| if it's a hit@>
	else	case mstate+info(p) of
		none_seen+y_here: mstate←y_seen;
		none_seen+z_here: mstate←z_seen;
		y_seen+z_here,z_seen+y_here: goto not_found;
		othercases do_nothing

@ We might find a valid hit in a |y| or |z| byte that is already gone
from the buffer. But we can't change bytes that are gone forever; ``the
moving finger writes, $\ldotss\,$.''

@<Consider a node with matching width...@>=
case mstate+info(p) of
	if location(p)<dvi_gone then goto not_found
	else @<Change buffered instruction to |y| or |w| and |goto found|@>;
	if location(p)<dvi_gone then goto not_found
	else @<Change buffered instruction to |z| or |x| and |goto found|@>;
none_seen+y_here,none_seen+z_here,y_seen+z_here,z_seen+y_here: goto found;
othercases do_nothing

@ @<Change buffered instruction to |y| or |w| and |goto found|@>=
begin k←location(p)-dvi_offset;
if k<0 then k←k+dvi_buf_size;
info(p)←y_here; goto found;

@ @<Change buffered instruction to |z| or |x| and |goto found|@>=
begin k←location(p)-dvi_offset;
if k<0 then k←k+dvi_buf_size;
info(p)←z_here; goto found;

@ In case you are wondering when all the movement nodes are removed from
\TeX's memory, the answer is that they are recycled just before
|hlist_out| and |vlist_out| finish outputting a box. This restores the
down and right stacks to the state they were in before the box was output,
except that some |info|'s may have become more restrictive.

@p procedure prune_movements(@!l:integer);
	{delete movement nodes with |location≥l|}
label done,exit;
var p:pointer; {node being deleted}
begin while down_ptr≠null do
	begin if location(down_ptr)<l then goto done;
	p←down_ptr; down_ptr←link(p); free_node(p,movement_node_size);
done: while right_ptr≠null do
	begin if location(right_ptr)<l then return;
	p←right_ptr; right_ptr←link(p); free_node(p,movement_node_size);

@ The actual distances by which we want to move might be computed as the
sum of several separate movements. For example, there might be several
glue nodes in succession, or we might want to move right by the width of
some box plus some amount of glue. More importantly, the baselineskip
distances are computed in terms of glue together with the depth and
height of adjacent boxes, and we want the \.{DVI} file to lump these
three quantities together into a single motion.

Therefore \TeX\ maintains two pairs of global variables: |dvi_h| and |dvi_v|
are the |h| and |v| coordinates corresponding to the commands actually
output to the \.{DVI} file, while |cur_h| and |cur_v| are the coordinates
corresponding to the current state of the output routines. Coordinate
changes will accumulate in |cur_h| and |cur_v| without being reflected
in the output, until such a change becomes necessary or desirable; we
can call the |movement| procedure whenever we want to make |dvi_h=cur_h|
or |dvi_v=cur_v|.

The current font reflected in the \.{DVI} output is called |dvi_f|;
there is no need for a `\\{cur\_f}' variable.

The depth of nesting of |hlist_out| and |vlist_out| is called |cur_s|;
this is essentially the depth of |push| commands in the \.{DVI} output.

@d synch_h==if cur_h≠dvi_h then
		begin movement(cur_h-dvi_h,right1); dvi_h←cur_h;
@d synch_v==if cur_v≠dvi_v then
		begin movement(cur_v-dvi_v,down1); dvi_v←cur_v;

@!dvi_h,@!dvi_v:scaled; {a \.{DVI} reader program thinks we are here}
@!cur_h,@!cur_v:scaled; {\TeX\ thinks we are here}
@!dvi_f:internal_font_number; {the current font}
@!cur_s:integer; {current depth of output box nesting}

@ @<Initialize variables as |ship_out| begins@>=
dvi_h←0; dvi_v←0; cur_h←0; dvi_f←undefined_font;
cur_s←-1; ensure_dvi_open;

@ When |hlist_out| is called, its duty is to output the box represented
by the |hlist_node| pointed to by |temp_ptr|. The reference point of that
box has coordinates |(cur_h,cur_v)|.

Similarly, when |vlist_out| is called, its duty is to output the box represented
by the |vlist_node| pointed to by |temp_ptr|. The reference point of that
box has coordinates |(cur_h,cur_v)|.

@p procedure@?vlist_out; forward; {|hlist_out| and |vlist_out| are mutually

@ The recursive procedures |hlist_out| and |vlist_out| each have local variables
|save_h| and |save_v| to hold the values of |dvi_h| and |dvi_v| just before
entering a new level of recursion.  In effect, the values of |save_h| and
|save_v| on \TeX's run-time stack correspond to the values of |h| and |v|
that a \.{DVI}-reading program will push onto its coordinate stack.

@d advance=13 {go to this label when advancing past glue or a rule}
@d fin_rule=14 {go to this label to finish processing a rule}
@d next_p=15 {go to this label when finished with node |p|}

@p @t\4@>@<Declare procedures needed in |hlist_out|, |vlist_out|@>@t@>@/
procedure hlist_out; {output an |hlist_node| box}
label reswitch, advance, fin_rule, next_p;
var base_line: scaled; {the baseline coordinate for this box}
@!save_h,@!save_v: scaled; {what |dvi_h| and |dvi_v| should pop to}
@!this_box: pointer; {pointer to containing box}
@!g_order: glue_ord; {applicable order of infinity for glue}
@!g_sign: normal..shrinking; {selects type of glue}
@!p:pointer; {current position in the hlist}
@!save_loc:integer; {\.{DVI} byte location upon entry}
@!leader_box:pointer; {the leader box being replicated}
@!leader_wd:scaled; {width of leader box being replicated}
@!outer_doing_leaders:boolean; {were we doing leaders?}
@!edge:scaled; {left edge of sub-box, or right edge of leader space}
begin this_box←temp_ptr; g_order←glue_order(this_box);
g_sign←glue_sign(this_box); p←list_ptr(this_box);
if cur_s>max_push then max_push←cur_s;
save_loc←dvi_offset+dvi_ptr; base_line←cur_v;
while p≠null do @<Output node |p| for |hlist_out| and move to the next node,
	maintaining the condition |cur_v=base_line|@>;
prune_movements(save_loc); decr(cur_s);

@ We ought to give special care to the efficiency of one part of |hlist_out|,
since it belongs to \TeX's inner loop. When a |char_node| is encountered,
we save a little time by processing several nodes in succession until
reaching a non-|char_node|. The program uses the fact that |set_char_0=0|.
@^inner loop@>

@<Output node |p| for |hlist_out|...@>=
reswitch: if is_char_node(p) then
	begin synch_h; synch_v;
	repeat f←font(p); c←character(p);
	if f≠dvi_f then @<Change font |dvi_f| to |f|@>;
	if c<128+min_quarterword then dvi_out(qo(c))
	else	begin dvi_out(set1); dvi_out(qo(c));
	until not is_char_node(p);
else if p≠null then @<Output the non-|char_node| |p| for |hlist_out|
		and move to the next node@>

@ @<Change font |dvi_f| to |f|@>=
begin if f≤64+font_base then dvi_out(f-font_base-1+fnt_num_0)
else	begin dvi_out(fnt1); dvi_out(f-font_base-1);

@ @<Output the non-|char_node| |p| for |hlist_out|...@>=
begin case type(p) of
hlist_node,vlist_node:@<Output a box in an hlist@>;
rule_node: begin rule_ht←height(p); rule_dp←depth(p); rule_wd←width(p);
	goto fin_rule;
whatsit_node: @<Output the whatsit node |p| in an hlist@>;
glue_node: @<Move right or output leaders@>;
ligature_node: @<Make node |p| look like a |char_node| and |goto reswitch|@>;
othercases do_nothing
goto next_p;
fin_rule: @<Output a rule in an hlist@>;
advance: cur_h←cur_h+rule_wd;

@ @<Output a box in an hlist@>=
if list_ptr(p)=null then cur_h←cur_h+width(p)
else	begin save_h←dvi_h; save_v←dvi_v; dvi_out(push);
	cur_v←base_line+shift_amount(p); {shift the box down}
	temp_ptr←p; edge←cur_h;
	if type(p)=vlist_node then vlist_out@+else hlist_out;
	dvi_out(pop); dvi_h←save_h; dvi_v←save_v;
	cur_h←edge+width(p); cur_v←base_line;

@ @<Output a rule in an hlist@>=
if is_running(rule_ht) then rule_ht←height(this_box);
if is_running(rule_dp) then rule_dp←depth(this_box);
rule_ht←rule_ht+rule_dp; {this is the rule thickness}
if (rule_ht>0)∧(rule_wd>0) then {we don't output empty rules}
	begin synch_h; cur_v←base_line+rule_dp; synch_v;
	dvi_out(set_rule); dvi_four(rule_ht); dvi_four(rule_wd);
	cur_v←base_line; dvi_h←dvi_h+rule_wd;

@ @<Move right or output leaders@>=
begin g←glue_ptr(p); rule_wd←width(g);
if g_sign≠normal then
	begin if g_sign=stretching then
		begin if stretch_order(g)=g_order then
	else	begin if shrink_order(g)=g_order then
if subtype(p)≥a_leaders then
	@<Output leaders in an hlist, |goto fin_rule| if a rule
		or to |next_p| if done@>;
goto advance;

@ @<Output leaders in an hlist...@>=
begin leader_box←leader_ptr(p);
if type(leader_box)=rule_node then
	begin rule_ht←height(leader_box); rule_dp←depth(leader_box);
	goto fin_rule;
if (leader_wd>0)∧(rule_wd>0) then
	begin edge←cur_h+rule_wd;
	@<Let |cur_h| be the position of the first box, and set |leader_wd|
		to the spacing between corresponding parts of boxes@>;
	while cur_h+leader_wd≤edge do
		@<Output a leader box at |cur_h|,
			then advance |cur_h| by |leader_wd|@>;
	cur_h←edge; goto next_p;

@ The calculations related to leaders require a bit of care. First, in the
case of |a_leaders| (aligned leaders), we want to move |cur_h| to the
smallest multiple of |leader_wd| that is not less than the current
value of |cur_h|, namely $|leader_wd|\ast\lceil |cur_h|/|leader_wd|\rceil$.
The program here should work in all cases even though \PASCAL\ does not define
the |div| operation precisely, and even when |cur_h| is negative.

In the case of |c_leaders| (centered leaders), we want to increase |cur_h|
by half of the excess space not occupied by the leaders; and in the case of
case of |x_leaders| (expanded leaders) we increase both |cur_h| and |leader_wd|
by $1/(q+1)$ of this excess space, where $q$ is the number of times the
leader box will be replicated. Slight inaccuracies in the division might
accumulate; half of this rounding error is placed at each end of the leaders.

@<Let |cur_h| be the position of the first box, ...@>=
if subtype(p)=a_leaders then
	begin save_h←cur_h; cur_h←leader_wd*(cur_h @!div leader_wd);
	if cur_h<save_h then cur_h←cur_h+leader_wd;
else	begin lq←rule_wd div leader_wd; {the number of box copies}
	lr←rule_wd mod leader_wd; {the remaining space}
	if subtype(p)=c_leaders then cur_h←cur_h+(lr div 2)
	else	begin lx←(2*lr+lq+1) div (2*lq+2); {round|(lr/(q+1))|}
		cur_h←cur_h+((lr-(lq-1)*lx) div 2);

@ The `\\{synch}' operations here are intended to decrease the number
of bytes needed to specify horizontal and vertical motion in the \.{DVI} output.

@<Output a leader box at |cur_h|, ...@>=
begin cur_v←base_line+shift_amount(leader_box); synch_v; save_v←dvi_v;@/
synch_h; save_h←dvi_h; dvi_out(push); temp_ptr←leader_box;
outer_doing_leaders←doing_leaders; doing_leaders←true;
if type(leader_box)=vlist_node then vlist_out@+else hlist_out;
dvi_out(pop); dvi_v←save_v; dvi_h←save_h; cur_v←save_v;

@ The |vlist_out| routine is similar to |hlist_out|, but a bit simpler.

@p procedure vlist_out; {output a |vlist_node| box}
label advance, fin_rule, next_p;
var left_edge: scaled; {the left coordinate for this box}
@!save_h,@!save_v: scaled; {what |dvi_h| and |dvi_v| should pop to}
@!this_box: pointer; {pointer to containing box}
@!g_order: glue_ord; {applicable order of infinity for glue}
@!g_sign: normal..shrinking; {selects type of glue}
@!p:pointer; {current position in the vlist}
@!save_loc:integer; {\.{DVI} byte location upon entry}
@!leader_box:pointer; {the leader box being replicated}
@!leader_ht:scaled; {height of leader box being replicated}
@!outer_doing_leaders:boolean; {were we doing leaders?}
@!edge:scaled; {bottom boundary of leader space}
begin this_box←temp_ptr; g_order←glue_order(this_box);
g_sign←glue_sign(this_box); p←list_ptr(this_box);
if cur_s>max_push then max_push←cur_s;
save_loc←dvi_offset+dvi_ptr; left_edge←cur_h; cur_v←cur_v-height(this_box);
while p≠null do @<Output node |p| for |vlist_out| and move to the next node,
	maintaining the condition |cur_h=left_edge|@>;
prune_movements(save_loc); decr(cur_s);

@ @<Output node |p| for |vlist_out|...@>=
begin if is_char_node(p) then confusion("vlistout")
@:confusion vlistout}{\quad vlistout@>
else @<Output the non-|char_node| |p| for |vlist_out|@>;

@ @<Output the non-|char_node| |p| for |vlist_out|@>=
begin case type(p) of
hlist_node,vlist_node:@<Output a box in a vlist@>;
rule_node: begin rule_ht←height(p); rule_dp←depth(p); rule_wd←width(p);
	goto fin_rule;
whatsit_node: @<Output the whatsit node |p| in a vlist@>;
glue_node: @<Move down or output leaders@>;
othercases do_nothing
goto next_p;
fin_rule: @<Output a rule in a vlist, |goto next_p|@>;
advance: cur_v←cur_v+rule_ht;

@ The |synch_v| here allows the \.{DVI} output to use one-byte commands
for adjusting |v| in most cases, since the baselineskip distance will
usually be constant.

@<Output a box in a vlist@>=
if list_ptr(p)=null then cur_v←cur_v+height(p)+depth(p)
else	begin cur_v←cur_v+height(p); synch_v;
	save_h←dvi_h; save_v←dvi_v; dvi_out(push);
	cur_h←left_edge+shift_amount(p); {shift the box right}
	if type(p)=vlist_node then vlist_out@+else hlist_out;
	dvi_out(pop); dvi_h←save_h; dvi_v←save_v;
	cur_v←save_v+depth(p); cur_h←left_edge;

@ @<Output a rule in a vlist...@>=
if is_running(rule_wd) then rule_wd←width(this_box);
rule_ht←rule_ht+rule_dp; {this is the rule thickness}
if (rule_ht>0)∧(rule_wd>0) then {we don't output empty rules}
	begin synch_h; synch_v;
	dvi_out(put_rule); dvi_four(rule_ht); dvi_four(rule_wd);
goto next_p

@ @<Move down or output leaders@>=
begin g←glue_ptr(p); rule_ht←width(g);
if g_sign≠normal then
	begin if g_sign=stretching then
		begin if stretch_order(g)=g_order then
	else	begin if shrink_order(g)=g_order then
if subtype(p)≥a_leaders then
	@<Output leaders in a vlist, |goto fin_rule| if a rule
		or to |next_p| if done@>;
goto advance;

@ @<Output leaders in a vlist...@>=
begin leader_box←leader_ptr(p);
if type(leader_box)=rule_node then
	begin rule_wd←width(leader_box); rule_dp←0;
	goto fin_rule;
if (leader_ht>0)∧(rule_ht>0) then
	begin edge←cur_v+rule_ht;
	@<Let |cur_v| be the position of the first box, and set |leader_ht|
		to the spacing between corresponding parts of boxes@>;
	while cur_v+leader_ht≤edge do
		@<Output a leader box at |cur_v|,
			then advance |cur_v| by |leader_ht|@>;
	cur_v←edge; goto next_p;

@ @<Let |cur_v| be the position of the first box, ...@>=
if subtype(p)=a_leaders then
	begin save_v←cur_v; cur_v←leader_ht*(cur_v @!div leader_ht);
	if cur_v<save_v then cur_v←cur_v+leader_ht;
else	begin lq←rule_ht div leader_ht; {the number of box copies}
	lr←rule_ht mod leader_ht; {the remaining space}
	if subtype(p)=c_leaders then cur_v←cur_v+(lr div 2)
	else	begin lx←(2*lr+lq+1) div (2*lq+2); {round|(lr/(q+1))|}
		cur_v←cur_v+((lr-(lq-1)*lx) div 2);

@ When we reach this part of the program, |cur_v| indicates the top of a
leader box, not its baseline.

@<Output a leader box at |cur_v|, ...@>=
begin cur_h←left_edge+shift_amount(leader_box); synch_h; save_h←dvi_h;@/
cur_v←cur_v+height(leader_box); synch_v; save_v←dvi_v;
dvi_out(push); temp_ptr←leader_box;
outer_doing_leaders←doing_leaders; doing_leaders←true;
if type(leader_box)=vlist_node then vlist_out@+else hlist_out;
dvi_out(pop); dvi_v←save_v; dvi_h←save_h; cur_h←save_h;

@ The |hlist_out| and |vlist_out| procedures are now complete, so we are
ready for the |ship_out| routine that gets them started in the first place.

@p procedure ship_out(@!p:pointer); {output the box |p|}
var page_loc:integer; {location of the current |bop|}
@!k:0..9; {runs through ten count registers}
begin if tracing_output=0 then
	begin if offset>max_print_line-9 then print_ln@+else print_char(" ");
	print_char("["); print_int(count(0)); update_terminal;
else	begin print_nl(""); print_nl("Completed box being shipped out [");
@.Completed box...@>
	for k←0 to 9 do
		begin print_int(count(k));
		if k<9 then print_char(".");
	begin_diagnostic; show_box(p); end_diagnostic;
@<Ship box |p| out@>;
if tracing_output=0 then print_char("]"); update_terminal; {progress report}
@<Flush the box from memory, showing statistics if requested@>;

@ @<Flush the box from memory, showing statistics if requested@>=
stat if tracing_stats≠0 then
	begin print_nl("Memory usage before: ");
	print_int(var_used); print_char(",");
	print_int(dyn_used); print_char(";");
stat if tracing_stats≠0 then
	begin print(" after: ");
	print_int(var_used); print_char(",");
	print_int(dyn_used); print("; max so far: ");
	print_int(max_var_used); print_char(",");
	print_int(mem_end+1-hi_mem_base); print_ln;

@ @<Ship box |p| out@>=
if height(p)+depth(p)>max_v then max_v←height(p)+depth(p);
if width(p)>max_h then max_h←width(p);
@<Initialize variables as |ship_out| begins@>;
for k←0 to 9 do dvi_four(count(k));
dvi_four(last_bop); last_bop←page_loc;
cur_v←height(p); temp_ptr←p;
if type(p)=vlist_node then vlist_out@+else hlist_out;
dvi_out(eop); incr(total_pages)

@ At the end of the program, we must finish things off by writing the
post\-amble. If |total_pages=0|, the \.{DVI} file was never opened.

An integer variable |k| will be declared for use by this routine.

@<Finish the \.{DVI} file@>=
if total_pages=0 then print_nl("No output file.")
else	begin dvi_out(pst); {beginning of the postamble}
	dvi_four(last_bop); last_bop←dvi_offset+dvi_ptr-5; {|pst| location}
	dvi_four(25400000); dvi_four(473628672); {conversion ratio for sp}
	prepare_mag; dvi_four(mag); {magnification factor}
	dvi_four(max_v); dvi_four(max_h);@/
	dvi_out(max_push div 256); dvi_out(max_push mod 256);@/
	dvi_out(total_pages div 256); dvi_out(total_pages mod 256);@/
	@<Output the font definitions for all fonts that were used@>;
	dvi_four(-1); dvi_four(last_bop); dvi_out(id_byte);
	k←4+((dvi_buf_size-dvi_ptr) mod 4); {the number of 223's}
	while k>0 do
		begin dvi_out(223); decr(k);
	@<Empty the last bytes out of |dvi_buf|@>;
	print_nl("Output written on "); print(output_file_name);
	print(" ("); print_int(total_pages); print(" pages, ");
	print_int(dvi_offset+dvi_ptr); print(" bytes)."); b_close(dvi_file);

@ @<Output the font definitions...@>=
while font_ptr>font_base do
	begin if font_used[font_ptr] then
		begin dvi_four(font_ptr-font_base-1);
		@<Output the font name whose internal number is |font_ptr|@>;
@ @<Output the font name whose internal number is |font_ptr|@>=
for k←str_start[font_area[font_ptr]] to str_start[font_area[font_ptr]+1]-1 do
for k←str_start[font_name[font_ptr]] to str_start[font_name[font_ptr]+1]-1 do
@* \[32] Packaging.
We're essentially done with the parts of \TeX\ that are concerned with
the input (|get_next|) and the output (|ship_out|). So it's time to
get heavily into the remaining part, which does the real work of typesetting.

After lists are constructed, \TeX\ wraps them up and puts them into boxes.
Two major subroutines are given the responsibility for this task: |hpack|
applies to horizontal lists (hlists) and |vpack| applies to vertical lists
(vlists). The main duty of |hpack| and |vpack| is to compute the dimensions
of the resulting boxes, and to adjust the glue if one of those dimensions
is pre-specified. The computed sizes normally enclose all of the material
inside the new box, but some items may stick out if negative glue is used,
if the box is overfull, or if a \.{\\vbox} includes other boxes that have
been shifted left.

The subroutine call |hpack(p,w,m)| returns a pointer to an |hlist_node|
for a box containing the hlist that starts at |p|. Parameter |w| specifies
a width; and pa\-ram\-eter |m| is either `|exactly|' or `|additional|'.  Thus,
|hpack(p,w,exactly)| pro\-duces a box whose width is exactly |w|, while
|hpack(p,w,additional)| yields a box whose width is the natural width plus
|w|.  It is convenient to define a macro called `|natural|' to cover the
most common case, so that we can say |hpack(p,natural)| to get a box that
has the natural width of list |p|.

Similarly, |vpack(p,w,m)| returns a pointer to a |vlist_node| for a
box containing the vlist that starts at |p|. In this case |w| represents
a height instead of a width; the parameter |m| is interpreted as in |hpack|.

@d exactly=0 {a box dimension is pre-specified}
@d additional=1 {a box dimension is increased from the natural one}
@d natural==0,additional {shorthand for parameters to |hpack| and |vpack|}

@ The parameters to |hpack| and |vpack| correspond to \TeX's primitives
like `\.{\\hbox} \.{to} \.{300pt}', `\.{\\hbox} \.{expand} \.{10pt}'; note
that `\.{\\hbox}' with no dimension following it is equivalent to
`\.{\\hbox} \.{expand} \.{0pt}'.  The |scan_spec| subroutine scans such
constructions in the user's input, including the mandatory left brace that
follows them, and it puts the specification onto |save_stack| so that the
desired box can later be obtained by executing the following code:

@p procedure scan_spec; {scans a box specification and left brace}
label found;
begin if scan_keyword("to") then saved(0)←exactly
else if scan_keyword("expand") then saved(0)←additional
else	begin saved(0)←additional; saved(1)←0;
	goto found;
scan_dimen(false,false,false); saved(1)←cur_val;
found: save_ptr←save_ptr+2; scan_left_brace;

@ To figure out the glue setting, |hpack| and |vpack| determine how much
stretchability and shrinkability are present, considering all four orders
of infinity. The highest order of infinity that has a nonzero coefficient
is then used as if no other orders were present.

For example, suppose that the given list contains six glue nodes with
the respective stretchabilities 3pt, 8fill, 5fil, 6pt, $-3$fil, $-8$fill.
Then the total is essentially 2fil; and if a total additional space of 6pt
is to be achieved by stretching, the actual amounts of stretch will be
0pt, 0pt, 15pt, 0pt, $-9$pt, and 0pt, since only `fil' glue will be
considered. (The `fill' glue is therefore not really stretching infinitely
with respect to `fil'; nobody would actually want that to happen.)

The arrays |total_stretch| and |total_shrink| are used to determine how much
glue of each kind is present.

@!total_stretch, @!total_shrink: array[glue_ord] of scaled;
	{glue found by |hpack| or |vpack|}

@ Here now is |hpack| itself, which contains only one possible surprise:
Be\-sides making a list into a box, |hpack| also removes all occurrences of
|ins_node|, |mark_node|, and |adjust_node| items; it puts the resulting
material into a list that starts at location |adjustments=link(adjust_head)|.
These adjustments will be put into the enclosing vertical list, if |hpack|
is being called to make a line of a paragraph; otherwise such nodes cannot
be present in a horizontal list, since they are syntactically forbidden in
restricted horizontal mode.

@d adjustments==link(adjust_head)

@p function hpack(@!p:pointer;@!w:scaled;@!m:small_number):pointer;
label reswitch, common_ending, exit;
var r:pointer; {the box node that will be returned}
@!q:pointer; {trails behind |p|}
@!h,@!d,@!x:scaled; {height, depth, and natural width}
@!s:scaled; {shift amount}
@!g:pointer; {points to a glue specification}
@!o:glue_ord; {order of infinity}
@!f:internal_font_number; {the font in a |char_node|}
@!i:four_quarters; {font information about a |char_node|}
@!hd:quarterword; {height and depth indices for a character}
@!t:pointer; {tail of the adjustment list}
@!b:integer; {badness of the new box}
begin r←get_node(box_node_size); type(r)←hlist_node;
subtype(r)←min_quarterword; shift_amount(r)←0;
q←r+list_offset; link(q)←p;@/
t←adjust_head; h←0; @<Clear dimensions to zero@>;
while p≠null do @<Examine node |p| in the hlist, taking account of its effect
	on the dimensions of the new box, or moving it to the adjustment list;
	then advance |p| to the next node@>;
link(t)←null; height(r)←h; depth(r)←d;@/
@<Determine the value of |width(r)| and the appropriate glue setting;
	then |return| or |goto common_ending|@>;
common_ending: @<Finish issuing a diagnostic message
			for an overfull or underfull hbox@>;
exit: hpack←r;

@ @<Clear dimensions to zero@>=
d←0; x←0;
for o←normal to filll do
	begin total_stretch[o]←0; total_shrink[o]←0;

@ @<Examine node |p| in the hlist, taking account of its effect...@>=
begin reswitch: while is_char_node(p) do
	@<Incorporate character dimensions into the dimensions of
		the hbox that will contain@@it, then move to the next node@>;
if p≠null then
	begin case type(p) of
		@<Incorporate box dimensions into the dimensions of
			the hbox that will contain@@it@>;
		@<Transfer node |p| to the adjustment list@>;
	whatsit_node:@<Incorporate a whatsit node into an hbox@>;
	glue_node:@<Incorporate glue into the totals@>;
	kern_node,math_node: x←x+width(p);
	ligature_node: @<Make node |p| look like a |char_node|
		and |goto reswitch|@>;
	othercases do_nothing

@ @<Make node |p| look like a |char_node| and |goto reswitch|@>=
begin mem[lig_trick]←mem[lig_char(p)]; link(lig_trick)←link(p);
p←lig_trick; goto reswitch;

@ @<Incorporate box dimensions into the dimensions of the hbox...@>=
begin x←x+width(p);
if type(p)≥rule_node then s←0 @+else s←shift_amount(p);
if height(p)-s>h then h←height(p)-s;
if depth(p)+s>d then d←depth(p)+s;

@ The following code is part of \TeX's inner loop; i.e., adding another
character of text to the user's input will cause each of these instructions
to be exercised one more time.
@^inner loop@>

@<Incorporate character dimensions into the dimensions of the hbox...@>=
begin f←font(p); i←char_info(f)(character(p)); hd←height_depth(i);
s←char_height(f)(hd);@+if s>h then h←s;
s←char_depth(f)(hd);@+if s>d then d←s;

@ Although node |q| is not necessarily the immediate predecessor of node |p|,
it always points to some node in the list preceding |p|. Thus, we can delete
nodes by moving |q| when necessary. The algorithm takes linear time, and the
extra computation does not intrude on the inner loop unless it is necessary
to make a deletion.

@<Transfer node |p| to the adjustment list@>=
begin while link(q)≠p do q←link(q);
if type(p)=adjust_node then
	begin link(t)←adjust_ptr(p);
	while link(t)≠null do t←link(t);
	p←link(p); free_node(link(q),small_node_size);
else	begin link(t)←p; t←p; p←link(p);
link(q)←p; p←q;

@ @<Incorporate glue into the totals@>=
begin g←glue_ptr(p); x←x+width(g);@/
o←stretch_order(g); total_stretch[o]←total_stretch[o]+stretch(g);
o←shrink_order(g); total_shrink[o]←total_shrink[o]+shrink(g);

@ When we get to the present part of the program, |x| is the natural width
of the box being packaged.

@<Determine the value of |width(r)| and the appropriate glue setting...@>=
if m=additional then w←x+w;
width(r)←w; x←w-x; {now |x| is the excess to be made up}
if x=0 then
	begin glue_sign(r)←normal; glue_order(r)←normal; glue_set(r)←0.0;
else if x>0 then @<Determine horizontal glue stretch setting, then |return|
		or \hbox{|goto common_ending|}@>
else @<Determine horizontal glue shrink setting, then |return|
		or \hbox{|goto common_ending|}@>

@ @<Determine horizontal glue stretch setting...@>=
begin @<Determine the stretch order@>;
glue_order(r)←o; glue_sign(r)←stretching;
if total_stretch[o]=0 then glue_set(r)←0.0 {there's nothing to stretch}
else glue_set(r)←x/total_stretch[o];
if (hbadness<inf_bad)∧(o=normal)∧(list_ptr(r)≠null) then
	@<Report an underfull hbox and |goto common_ending|, if this box
		is sufficiently bad@>;

@ @<Determine the stretch order@>=
if total_stretch[filll]≠0 then o←filll
else if total_stretch[fill]≠0 then o←fill
else if total_stretch[fil]≠0 then o←fil
else o←normal

@ @<Report an underfull hbox and |goto common_ending|, if...@>=
begin b←badness(x,total_stretch[normal]);
if b>hbadness then
	begin print_ln;
	if b>100 then print_nl("Underfull")@+else print_nl("Loose");
	print(" \hbox (badness "); print_int(b);
	goto common_ending;

@ In order to provide a decent indication of where an overfull or underfull
box originated, we use a global variable |par_begin_line| that is
set nonzero only when |hpack| is being called by the paragraph builder.

@!par_begin_line:integer; {source file line where the current paragraph began}

@ @<Set init...@>=

@ @<Finish issuing a diagnostic message for an overfull or underfull hbox@>=
if output_active then print(") has occurred while \output is active")
else	begin if par_begin_line≠0 then
		begin print(") in paragraph at lines "); print_int(par_begin_line);
	else print(") detected at line ");
font_in_short_display←undefined_font; short_display(list_ptr(r)); print_ln;@/
begin_diagnostic; show_box(r); end_diagnostic

@ @<Determine horizontal glue shrink setting...@>=
begin @<Determine the shrink order@>;
glue_order(r)←o; glue_sign(r)←shrinking;
if total_shrink[o]=0 then glue_set(r)←0.0 {there's nothing to shrink}
else glue_set(r)←-x/total_shrink[o];
if (total_shrink[o]<-x)∧(o=normal)∧(list_ptr(r)≠null) then
	begin glue_set(r)←1.0; {this is the maximum shrinkage}
	@<Report an overfull hbox and |goto common_ending|, if this box
		is sufficiently bad@>;
else if (hbadness<100)∧(o=normal)∧(list_ptr(r)≠null) then
	@<Report a tight hbox and |goto common_ending|, if this box
		is sufficiently bad@>;

@ @<Determine the shrink order@>=
if total_shrink[filll]≠0 then o←filll
else if total_shrink[fill]≠0 then o←fill
else if total_shrink[fil]≠0 then o←fil
else o←normal

@ @<Report an overfull hbox and |goto common_ending|, if...@>=
if -x-total_shrink[normal]>hfuzz then
	begin if overfull_rule>0 then
		begin while link(q)≠null do q←link(q);
	print_ln; print_nl("Overfull \hbox (");
	print_scaled(-x-total_shrink[normal]); print("pt too wide");
	goto common_ending;

@ @<Report a tight hbox and |goto common_ending|, if...@>=
begin b←badness(-x,total_shrink[normal]);
if b>hbadness then
	begin print_ln; print_nl("Tight \hbox (badness "); print_int(b);
	goto common_ending;

@ The |vpack| subroutine is actually a special case of a slightly more
general routine called |vpackage|, which has four parameters. The fourth
parameter, which is |max_dimen| in the case of |vpack|, specifies the
maximum depth of the page box that is constructed. The depth is first
computed by the normal rules; if it exceeds this limit, the reference
point is simply moved down until the limiting depth is attained.

@d vpack(#)==vpackage(#,max_dimen) {special case of unconstrained depth}

@p function vpackage(@!p:pointer;@!h:scaled;@!m:small_number;@!l:scaled):
label common_ending, exit;
var r:pointer; {the box node that will be returned}
@!w,@!d,@!x:scaled; {width, depth, and natural height}
@!s:scaled; {shift amount}
@!g:pointer; {points to a glue specification}
@!o:glue_ord; {order of infinity}
@!b:integer; {badness of the new box}
begin r←get_node(box_node_size); type(r)←vlist_node;
subtype(r)←min_quarterword; shift_amount(r)←0;
w←0; @<Clear dimensions to zero@>;
while p≠null do @<Examine node |p| in the vlist, taking account of its effect
	on the dimensions of the new box; then advance |p| to the next node@>;
if d>l then
	begin x←x+d-l; depth(r)←l;
else depth(r)←d;
@<Determine the value of |height(r)| and the appropriate glue setting;
	then |return| or |goto common_ending|@>;
common_ending: @<Finish issuing a diagnostic message
			for an overfull or underfull vbox@>;
exit: vpackage←r;

@ @<Examine node |p| in the vlist, taking account of its effect...@>=
begin if is_char_node(p) then confusion("vpack")
@:confusion vpack}{\quad vpack@>
else	case type(p) of
		@<Incorporate box dimensions into the dimensions of
			the vbox that will contain@@it@>;
	whatsit_node:@<Incorporate a whatsit node into a vbox@>;
	glue_node: begin x←x+d; d←0;
		@<Incorporate glue into the totals@>;
	kern_node: begin x←x+d+width(p); d←0;
	othercases do_nothing

@ @<Incorporate box dimensions into the dimensions of the vbox...@>=
begin x←x+d+height(p); d←depth(p);
if type(p)≥rule_node then s←0 @+else s←shift_amount(p);
if width(p)+s>w then w←width(p)+s;

@ When we get to the present part of the program, |x| is the natural height
of the box being packaged.

@<Determine the value of |height(r)| and the appropriate glue setting...@>=
if m=additional then h←x+h;
height(r)←h; x←h-x; {now |x| is the excess to be made up}
if x=0 then
	begin glue_sign(r)←normal; glue_order(r)←normal; glue_set(r)←0.0;
else if x>0 then @<Determine vertical glue stretch setting, then |return|
		or \hbox{|goto common_ending|}@>
else @<Determine vertical glue shrink setting, then |return|
		or \hbox{|goto common_ending|}@>

@ @<Determine vertical glue stretch setting...@>=
begin @<Determine the stretch order@>;
glue_order(r)←o; glue_sign(r)←stretching;
if total_stretch[o]=0 then glue_set(r)←0.0 {there's nothing to stretch}
else glue_set(r)←x/total_stretch[o];
if (vbadness<inf_bad)∧(o=normal)∧(list_ptr(r)≠null) then
	@<Report an underfull vbox and |goto common_ending|, if this box
		is sufficiently bad@>;

@ @<Report an underfull vbox and |goto common_ending|, if...@>=
begin b←badness(x,total_stretch[normal]);
if b>vbadness then
	begin print_ln;
	if b>100 then print_nl("Underfull")@+else print_nl("Loose");
	print(" \vbox (badness "); print_int(b);
	goto common_ending;

@ @<Finish issuing a diagnostic message for an overfull or underfull vbox@>=
if output_active then print(") has occurred while \output is active")
else	begin print(") detected at line "); print_int(line);
begin_diagnostic; show_box(r); end_diagnostic

@ @<Determine vertical glue shrink setting...@>=
begin @<Determine the shrink order@>;
glue_order(r)←o; glue_sign(r)←shrinking;
if total_shrink[o]=0 then glue_set(r)←0.0 {there's nothing to shrink}
else glue_set(r)←-x/total_shrink[o];
if (total_shrink[o]<-x)∧(o=normal)∧(list_ptr(r)≠null) then
	begin glue_set(r)←1.0; {this is the maximum shrinkage}
	@<Report an overfull vbox and |goto common_ending|, if this box
		is sufficiently bad@>;
else if (vbadness<100)∧(o=normal)∧(list_ptr(r)≠null) then
	@<Report a tight vbox and |goto common_ending|, if this box
		is sufficiently bad@>;

@ @<Report an overfull vbox and |goto common_ending|, if...@>=
if -x-total_shrink[normal]>vfuzz then
	begin print_ln; print_nl("Overfull \vbox (");
	print_scaled(-x-total_shrink[normal]); print("pt too high");
	goto common_ending;

@ @<Report a tight vbox and |goto common_ending|, if...@>=
begin b←badness(-x,total_shrink[normal]);
if b>vbadness then
	begin print_ln; print_nl("Tight \vbox (badness "); print_int(b);
	goto common_ending;

@ When a box is being appended to the current vertical list, the
baselineskip calculation is handled by the |append_to_vlist| routine.

@p procedure append_to_vlist(@!b:pointer);
var d:scaled; {deficiency of space between baselines}
@!p:pointer; {a new glue specification}
begin if prev_depth≠ignore_depth then
	begin d←width(baseline_skip)-prev_depth-height(b);
	if d<line_skip_limit then p←new_param_glue(line_skip_code)
	else	begin p←new_skip_param(baseline_skip_code);
		width(temp_ptr)←d; {|temp_ptr=glue_ptr(p)|}
	link(tail)←p; tail←p;
link(tail)←b; tail←b; prev_depth←depth(b);
@* \[33] Data structures for math mode.
When \TeX\ reads a formula that is enclosed between \.\$'s, it constructs an
{\sl mlist}, which is essentially a tree structure representing that
formula.  An mlist is a linear sequence of items, but we can regard it as
a tree structure because mlists can appear within mlists. For example, many
of the entries can be subscripted or superscripted, and such ``scripts''
are mlists in their own right.

An entire formula is parsed into such a tree before any of the actual
typesetting is done, because the current style of type is usually not
known until the formula has been fully scanned. For example, when the
formula `\.{\$a+b \\over c+d\$}' is being read, there is no way to tell
that `\.{a+b}' will be in script size until `\.{\\over}' has appeared.

During the scanning process, each element of the mlist being built is
classi\-fied as a relation, a binary operator, an open parenthesis, etc.,
or as a construct like `\.{\\sqrt}' that must be built up. This classification
appears in the mlist data structure.

After a formula has been fully scanned, the mlist is converted to an hlist
so that it can be incorporated into the surrounding text. This conversion is
controlled by a recursive procedure that decides all of the appropriate
styles by a ``top-down'' process starting at the outermost level and working
in towards the subformulas. The formula is ultimately pasted together using
combinations of horizontal and vertical boxes, with glue and penalty nodes
inserted as necessary.

An mlist is represented internally as a linked list consisting chiefly
of ``noads'' (pronounced ``no-adds''), to distinguish them from the somewhat
similar ``nodes'' in hlists and vlists. Certain kinds of ordinary nodes are
allowed to appear in mlists together with the noads; \TeX\ tells the difference
by means of the |type| field, since a noad's |type| is always greater than
that of a node. An mlist does not contain character nodes, hlist nodes, vlist
nodes, math nodes, ligature nodes, mark nodes, insert nodes, adjust nodes,
or unset nodes; in particular, each mlist item appears in the
variable-size part of |mem|, so the |type| field is always present.

@ Each noad is four or more words long. The first word contains the |type|
and |subtype| and |link| fields that are already so familiar to us; the
second, third, and fourth words are called the noad's |operand|, |subscr|,
and |supscr| fields.

Consider, for example, the simple formula `\.{\$x\UA2\$}', which would be
parsed into an mlist containing a single element called an |ord_noad|.
The |operand| of this noad is a representation of `\.x', the |subscr| is
empty, and the |supscr| is a representation of `\.2'.

The |operand|, |subscr|, and |supscr| fields are further broken into
subfields. If |p| points to a noad, and if |q| is one of its principal
fields (e.g., |q=subscr(p)|), there are several possibilities for the
subfields, depending on the |math_type| of |q|.

\yskip\hang|math_type(q)=math_char| means that |fam(q)| refers to one of
the sixteen font families, and |character(q)| is the number of a character
within a font of that family, as in a character node.

\yskip\hang|math_type(q)=empty| indicates a field with no value (the
corresponding attribute of noad |p| is not present).

\yskip\hang|math_type(q)=sub_box| means that |info(q)| points to a box
node (either an |hlist_node| or a |vlist_node|) that should be used as the
value of the field.  The |shift_amount| in the subsidiary box node is the
amount by which that box will be shifted downward.

\yskip\hang|math_type(q)=sub_mlist| means that |info(q)| points to
an mlist; the mlist must be converted to an hlist in order to obtain
the value of this field.

\yskip\noindent In the latter case, we might have |info(q)=null|. This
is not the same as |math_type(q)=empty|; for example, `\.{\$P\DA\{\}\$}'
and `\.{\$P\$}' produce different results (the former will not have the
``italic correction'' added to the width of |P|, but the ``script skip''
will be added).

The definitions of subfields given here are evidently wasteful of space,
since a halfword is being used for the |math_type| although only two
bits would be needed. However, there are hardly ever many noads present at
once, since they are soon converted to nodes that take up even more space,
so we can afford to represent them in whatever way simplifies the

@d noad_size=4 {number of words in a normal noad}
@d operand(#)==#+1 {the |operand| field of a noad}
@d supscr(#)==#+2 {the |supscr| field of a noad}
@d subscr(#)==#+3 {the |subscr| field of a noad}
@d math_type==link {a |halfword| in |mem|}
@d fam==font {a |quarterword| in |mem|}
@d empty=0 {|math_type| when the attribute is absent}
@d math_char=1 {|math_type| when the attribute is simple}
@d sub_box=2 {|math_type| when the attribute is a box}
@d sub_mlist=3 {|math_type| when the attribute is a formula}

@ Each portion of a formula is classified as Ord, Op, Bin, Rel, Open,
Close, Punct, or Inner, for purposes of spacing and line breaking. An
|ord_noad|, |op_noad|, |bin_noad|, |rel_noad|, |open_noad|, |close_noad|,
|punct_noad|, or |inner_noad| is used to represent portions of the various
types. For example, an `\.=' sign in a formula leads to the creation of a
|rel_noad| whose |operand| field is a representation of an equals sign
(usually |fam=0|, |character=@'75|).  A formula preceded by \.{\\mathrel}
also results in a |rel_noad|.  When a |rel_noad| is followed by an
|op_noad|, say, and possibly separated by one or more ordinary nodes (not
noads), \TeX\ will insert a penalty node (with the current |rel_penalty|)
just after the formula that corresponds to the |rel_noad|, unless there
already was a penalty immediately following; and a ``thick space'' will be
inserted just before the formula that corresponds to the |op_noad|.

A noad of type |ord_noad|, |op_noad|, $\ldotss$, |inner_noad| usually
has a |subtype=normal|. The only exception is that an |op_noad| might
have |subtype=switched|, if \.{\\limitswitch} has been applied to this

@d ord_noad=unset_node+2 {|type| of a noad classified Ord}
@d op_noad=ord_noad+1 {|type| of a noad classified Op}
@d bin_noad=ord_noad+2 {|type| of a noad classified Bin}
@d rel_noad=ord_noad+3 {|type| of a noad classified Rel}
@d open_noad=ord_noad+4 {|type| of a noad classified Open}
@d close_noad=ord_noad+5 {|type| of a noad classified Close}
@d punct_noad=ord_noad+6 {|type| of a noad classified Punct}
@d inner_noad=ord_noad+7 {|type| of a noad classified Inner}
@d switched=1 {|subtype| of a limit-switched |op_noad|}

@ A |radical_noad| is five words long; the fifth word is the |left_delimiter|
field, which usually represents a square root sign.

A |fraction_noad| is six words long; it has a |right_delimiter| field
as well as a |left_delimiter|.

Delimiter fields are of type |four_quarters|, and they have four subfields
called |small_fam|, |small_char|, |large_fam|, |large_char|. These subfields
represent variable-size delimiters by giving the ``small'' and ``large''
starting characters, as explained in the manual.

A |fraction_noad| is actually quite different from all other noads. Not
only does it have six words, it has |thickness|, |denominator|, and
|numerator| fields instead of |operand|, |subscr|, and |supscr|. The
|thickness| is a scaled value that tells how thick to make a fraction
rule; however, the special value |default_code| is used to stand for the
|default_rule_thickness| of the current size. The |numerator| and
|denominator| point to mlists that define a fraction; we always have
$$\hbox{|math_type(numerator)=math_type(denominator)=sub_mlist|}.$$ The
|left_delimiter| and |right_delimiter| fields specify delimiters that will
be placed at the left and right of the fraction. In this way, a
|fraction_noad| is able to represent all of \TeX's operators \.{\\over},
\.{\\atop}, \.{\\above}, \.{\\xoverx}, \.{\\xatopx}, and \.{\\xabovex}.

@d left_delimiter(#)==#+4 {first delimiter field of a noad}
@d right_delimiter(#)==#+5 {second delimiter field of a fraction noad}
@d radical_noad=inner_noad+1 {|type| of a noad for square roots}
@d radical_noad_size=5 {number of |mem| words in a radical noad}
@d fraction_noad=radical_noad+1 {|type| of a noad for generalized fractions}
@d fraction_noad_size=6 {number of |mem| words in a fraction noad}
@d small_fam(#)==mem[#].qqqq.b0 {|fam| for ``small'' delimiter}
@d small_char(#)==mem[#].qqqq.b1 {|character| for ``small'' delimiter}
@d large_fam(#)==mem[#].qqqq.b2 {|fam| for ``large'' delimiter}
@d large_char(#)==mem[#].qqqq.b3 {|character| for ``large'' delimiter}
@d thickness==width {|thickness| field in a fraction noad}
@d default_code==@'10000000000 {denotes |default_rule_thickness|}
@d numerator==supscr {|numerator| field in a fraction noad}
@d denominator==subscr {|denominator| field in a fraction noad}

@ The global variable |empty_field| is set up for initialization of empty
fields in new noads. Similarly, |null_delimiter| is for the initialization
of delimiter fields; we can use the fact that a null delimiter is the same
as an empty word of character information.

@d null_delimiter==null_character


@ @<Set init...@>=
empty_field.rh←empty; empty_field.lh←null;

@ The |new_noad| function creates an |inner_noad| that is completely null.

@p function new_noad:pointer;
var p:pointer;
begin p←get_node(noad_size);
type(p)←inner_noad; subtype(p)←normal;

@ A few more kinds of noads will complete the set: An |under_noad| has its
operand underlined; an |over_noad| has it overlined. An |accent_noad| places
an accent over its operand; the accent character appears as
|fam(accent_chr(p))| and |character(accent_chr(p))|. A |vcenter_noad|
centers its operand vertically with respect to the axis of the formula;
we always have |math_type(operand(p))=sub_box| in this case.

And finally, we have |left_noad| and |right_noad| types, to implement
\TeX's \.{\\left} and \.{\\right}. The |operand| of such noads is
replaced by a |delimiter| field; thus, for example, `\.{\\left(}' produces
a |left_noad| such that |delimiter(p)| holds the family and character
codes for all left parentheses. A |left_noad| never appears in an mlist
except as the first element, and a |right_noad| never appears in an mlist
except as the last element; furthermore, we either have both a |left_noad|
and a |right_noad|, or neither one is present. The |subscr| and |supscr|
fields are always |empty| in a |left_noad| and a |right_noad|.

@d under_noad=fraction_noad+1 {|type| of a noad for underlining}
@d over_noad=under_noad+1 {|type| of a noad for overlining}
@d accent_noad=over_noad+1 {|type| of a noad for accented subformulas}
@d accent_noad_size=5 {number of |mem| words in an accent noad}
@d accent_chr(#)==#+4 {the |accent_chr| field of an accent noad}
@d vcenter_noad=accent_noad+1 {|type| of a noad for \.{\\vcenter}}
@d left_noad=vcenter_noad+1 {|type| of a noad for \.{\\left}}
@d right_noad=left_noad+1 {|type| of a noad for \.{\\right}}
@d delimiter==operand {|delimiter| field in left and right noads}

@ Math formulas can also contain instructions like \.{\\textstyle} that
override \TeX's normal style rules. A |style_node| is inserted into the
data structure to record such instructions; it is two words long, so it
is considered a node instead of a noad. The |subtype| is either |display_style|
or |text_style| or |script_style| or |script_script_style|. The
second word of a |style_node| is not used.

\TeX\ uses even numbers 0, 2, 4, 6 to encode the basic styles
|display_style|, $\ldotss$, |script_script_style|, and adds@@1 to get the
``cramped'' versions of these styles.

@d style_node=unset_node+1 {|type| of a style node}
@d display_style=0 {|subtype| for \.{\\displaystyle}}
@d text_style=2 {|subtype| for \.{\\textstyle}}
@d script_style=4 {|subtype| for \.{\\scriptstyle}}
@d script_script_style=6 {|subtype| for \.{\\scriptscriptstyle}}
@d cramped=1 {add this to an uncramped style if you want to cramp it}

@p function new_style(@!s:small_number):pointer; {create a style node}
var p:pointer; {the new node}
begin p←get_node(small_node_size); type(p)←style_node;
subtype(p)←s; width(p)←0; {the |width| is not used}

@ Let's consider now the previously unwritten part of |show_node_list|
that displays the things that can only be present in mlists; this 
program illustrates how to access the data structures just defined.

In the context of the following program, |p| points to a node or noad that
should be displayed, and the current string contains the ``recursion history''
that leads to this point. The recursion history consists of a dot for each
outer level in which |p| is subsidiary to some node, or in which |p| is
subsidiary to the |operand| field of some noad; the dot is replaced by
`\.[' or `\.(' or `\./' or `\.\\' if |p| is descended from the |subscr|
or |supscr| or |denominator| or |numerator| fields of noads. For example,
the current string would be `\.{.(.[/}' if |p| points to the |ord_noad| for
|x| in the (ridiculous) formula `\.{\$\\sqrt\{a\UA\{\{b\DA\{c\\over x+y\}

@<Cases of |show_node_list| that arise...@>=
	left_noad,right_noad:@<Display normal noad |p|@>;
fraction_noad:@<Display fraction noad |p|@>;

@ Here are some simple routines used in the display of noads.

@<Declare procedures needed for displaying the elements of mlists@>=
procedure print_fam_and_char(@!p:pointer); {prints family and character}
begin print_esc("fam"); print_int(fam(p)); print_char(" ");
procedure print_delimiter(@!p:pointer); {prints a delimiter as 24-bit octal}
var a:integer; {accumulator}
begin a←small_fam(p)*256+qo(small_char(p));
if a<0 then print_int(a) {this should never happen}
else print_octal(a);

@ The next subroutine will descend to another level of recursion when a
subsidiary mlist needs to be displayed. The parameter |c| indicates what
character is to become part of the recursion history. An empty mlist is
distinguished from a field with |math_type(p)=empty|, because these are
not equivalent (as explained above).

@<Declare procedures needed for displaying...@>=
procedure@?show_info; forward;@t\2@>@?{|show_node_list(info(temp_ptr))|}
procedure print_subsidiary_data(@!p:pointer;@!c:ascii_code);
	{display a noad field}
begin if cur_length<depth_threshold then
	begin append_char(c); {include |c| in the recursion history}
	temp_ptr←p; {prepare for |show_info| if recursion is needed}
	case math_type(p) of
	math_char: begin print_ln; print_current_string; print_fam_and_char(p);
	sub_box: show_info; {recursive call}
	sub_mlist: if info(p)=null then
			begin print_ln; print_current_string; print("{}");
		else show_info; {recursive call}
	othercases do_nothing {|empty|}
	flush_char; {remove |c| from the recursion history}

@ The inelegant introduction of |show_info| in the code above seems better
than the alternative of using \PASCAL's strange |forward| declaration for a
procedure with parameters. The \PASCAL\ convention about dropping parameters
from a post-|forward| procedure is, frankly, so intolerable to the author
of \TeX\ that he would rather stoop to communication via a global temporary
variable. (A similar stoopidity occurred with respect to |hlist_out| and
|vlist_out| above, and it will occur with respect to |mlist_to_hlist| below.)
@^Knuth, Donald Ervin@>

@p procedure show_info; {the reader will kindly forgive this}
begin show_node_list(info(temp_ptr));

@ @<Declare procedures needed for displaying...@>=
procedure print_style(@!c:integer);
begin case c div 2 of
0: print_esc("displaystyle"); {|display_style=0|}
1: print_esc("textstyle"); {|text_style=2|}
2: print_esc("scriptstyle"); {|script_style=4|}
3: print_esc("scriptscriptstyle"); {|script_script_style=6|}
othercases print("Unknown style!")

@ @<Display normal noad |p|@>=
begin case type(p) of
ord_noad: print_esc("mathord");
op_noad: print_esc("mathop");
bin_noad: print_esc("mathbin");
rel_noad: print_esc("mathrel");
open_noad: print_esc("mathopen");
close_noad: print_esc("mathclose");
punct_noad: print_esc("mathpunct");
over_noad: print_esc("overline");
under_noad: print_esc("underline");
vcenter_noad: print_esc("vcenter");
radical_noad: begin print_esc("radical"); print_delimiter(left_delimiter(p));
accent_noad: begin print_esc("accent"); print_fam_and_char(accent_chr(p));
left_noad: begin print_esc("left"); print_delimiter(operand(p));
right_noad: begin print_esc("right"); print_delimiter(operand(p));
if subtype(p)≠normal then print_esc("limitswitch");
if type(p)<left_noad then print_subsidiary_data(operand(p),".");

@ @<Display fraction noad |p|@>=
begin print_esc("xabovex"); print(" thickness ");
if thickness(p)=default_code then print("= default")
else print_scaled(thickness(p));
if (small_fam(left_delimiter(p))≠0)∨@+
	(large_char(left_delimiter(p))≠min_quarterword) then
	begin print(", left-delimiter "); print_delimiter(left_delimiter(p));
if (small_fam(right_delimiter(p))≠0)∨@|
	(large_char(right_delimiter(p))≠min_quarterword) then
	begin print(", right-delimiter "); print_delimiter(right_delimiter(p));
@* \[34] Subroutines for math mode.
In order to convert mlists to hlists, i.e., noads to nodes, we need several
subroutines that are conveniently dealt with now.

Let us first introduce the macros that make it easy to get at the parameters and
other font information. A size code, which is a multiple of 16, is added to a
family number to get an index into the table of internal font numbers
for each combination of family and size.  (Be alert: Size codes get
larger as the type gets smaller.)

@d text_size=0 {size code for the largest size in a family}
@d script_size=16 {size code for the medium size in a family}
@d script_script_size=32 {size code for the smallest size in a family}

@<Basic printing procedures@>=
procedure print_size(@!s:integer);
begin if s=0 then print_esc("textfont")
else if s=script_size then print_esc("scriptfont")
else print_esc("scriptscriptfont");

@ Before an mlist is created, \TeX\ makes sure that 
the fonts in family@@2 have enough parameters to be math-symbol
fonts, and that the text font in family@@3 has enough parameters to be a
math-extension font. The math-symbol parameters are referred to by using the
following macros, which take a size code as their parameter; for example,
|num1(cur_size)| gives the value of the |num1| parameter for the current size.
@^parameters for symbols@>
@^font parameters@>

@d mathsy_end(#)==fam_fnt(2+#)]].sc
@d mathsy(#)==font_info[#+param_base[mathsy_end
@d math_x_height==mathsy(5) {height of `\.x'}
@d math_quad==mathsy(6) {\.{18mu}}
@d num1==mathsy(8) {numerator shift-up in display styles}
@d num2==mathsy(9) {numerator shift-up in non-display, non-\.{\\atop}}
@d num3==mathsy(10) {numerator shift-up in non-display \.{\\atop}}
@d denom1==mathsy(11) {denominator shift-down in display styles}
@d denom2==mathsy(12) {denominator shift-down in non-display styles}
@d sup1==mathsy(13) {superscript shift-up in uncramped display style}
@d sup2==mathsy(14) {superscript shift-up in uncramped non-display}
@d sup3==mathsy(15) {superscript shift-up in cramped styles}
@d sub1==mathsy(16) {subscript shift-down if superscript is absent}
@d sub2==mathsy(17) {subscript shift-down if superscript is present}
@d sup_drop==mathsy(18) {superscript baseline below top of large box}
@d sub_drop==mathsy(19) {subscript baseline below bottom of large box}
@d delim1==mathsy(20) {size of \.{\\xatopx} delimiters in display styles}
@d delim2==mathsy(21) {size of \.{\\xatopx} delimiters in non-displays}
@d axis_height==mathsy(22) {height of fraction lines above the baseline}
@d total_mathsy_params=22

@ The math-extension parameters have similar macros, but the size code is
omitted (since it is always |cur_size| when we refer to such parameters).
@^parameters for symbols@>
@^font parameters@>

@d mathex(#)==font_info[#+param_base[fam_fnt(3+cur_size)]].sc
@d default_rule_thickness==mathex(8) {thickness of \.{\\over} bars}
@d big_op_spacing1==mathex(9) {minimum clearance above a displayed op}
@d big_op_spacing2==mathex(10) {minimum clearance below a displayed op}
@d big_op_spacing3==mathex(11) {minimum baselineskip above displayed op}
@d big_op_spacing4==mathex(12) {minimum baselineskip below displayed op}
@d big_op_spacing5==mathex(13) {padding above and below displayed limits}
@d total_mathex_params=13

@ We also need to compute the change in style between mlists and their
subsidiaries. The following macros define the subsidiary style for
an overlined operand (|und_style|), for a subscript or a superscript
(|sub_style| or |sup_style|), or for a numerator or denominator (|num_style|
or |denom_style|).

@d und_style(#)==2*(# div 2)+cramped {cramp the style}
@d sub_style(#)==2*(# div 4)+script_style+cramped {smaller and cramped}
@d sup_style(#)==2*(# div 4)+script_style+(# mod 2) {smaller}
@d num_style(#)==#+2-2*(# div 6) {smaller unless already script-script}
@d denom_style(#)==2*(# div 2)+cramped+2-2*(# div 6) {smaller, cramped}

@ When the style changes, the following piece of program computes associated

@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>=
begin if cur_style<script_style then cur_size←text_size
else cur_size←16*((cur_style-text_style) div 2);

@ Here is a function that returns a pointer to a rule node having a given
thickness |t|. The rule will extend horizontally to the boundary of the vlist
that eventually contains it.

@p function fraction_rule(@!t:scaled):pointer;
	{construct the bar for a fraction}
var p:pointer; {the new node}
begin p←new_rule; height(p)←t; depth(p)←0; fraction_rule←p;

@ The |overbar| function returns a pointer to a vlist box that consists of
a given box |b|, above which has been placed a kern of height |k| under a
fraction rule of thickness |t| under additional space of height |t|.

@p function overbar(@!b:pointer;@!k,@!t:scaled):pointer;
var p,@!q:pointer; {nodes being constructed}
begin p←new_kern(k); link(p)←b; q←fraction_rule(t); link(q)←p;
p←new_kern(t); link(p)←q; overbar←vpack(p,natural);

@ The |var_delimiter| function, which finds or constructs a sufficiently
large delimiter, is the most interesting of the auxiliary functions that
currently concern us. Given a pointer |d| to a delimiter field in some noad,
together with a size code |s| and a vertical distance |v|, this function
returns a pointer to a box that contains the smallest variant of |d| whose
height plus depth is |v| or more. (And if no variant is large enough, it
returns the largest available variant.) In particular, this routine will
construct arbitrarily large delimiters from extensible components, if
|d| leads to such characters.

The value returned is a box whose |shift_amount| has been set so that
the box is vertically centered with respect to the axis in the given size.
If a built-up symbol is returned, the height of the box before shifting
will be the height of its topmost component.

@p@t\4@>@<Declare subprocedures for |var_delimiter|@>
function var_delimiter(@!d:pointer;@!s:small_number;@!v:scaled):pointer;
label found,continue;
var b:pointer; {the box that will be constructed}
@!f,@!g: internal_font_number; {best-so-far and tentative font codes}
@!c,@!x,@!y: quarterword; {best-so-far and tentative character codes}
@!m,@!n: integer; {the number of extensible pieces}
@!u: scaled; {height-plus-depth of a tentative character}
@!w: scaled; {largest height-plus-depth so far}
@!q: four_quarters; {character info}
@!hd: eight_bits; {height-depth byte}
@!r: four_quarters; {extensible pieces}
@!p: pointer; {character nodes constructed}
@!z: small_number; {runs through font family members}
@!large_attempt: boolean; {are we trying the ``large'' variant?}
begin f←undefined_font; w←0; large_attempt←false;
z←small_fam(d); x←small_char(d);
loop@+	begin @<Look at the variants of |(z,x)|; set |f| and |c| whenever
		a better character is found; |goto found| as soon as a
		large enough variant is encountered@>;
	if large_attempt then goto found; {there were none large enough}
	large_attempt←true; z←large_fam(d); x←large_char(d);
found: if f≠undefined_font then
	@<Make variable |b| point to a box for |(f,c)|@>
else	begin b←new_null_box;
	width(b)←null_delimiter_space; {use this width if no delimiter was found}
shift_amount(b)←half(height(b)-depth(b)) - axis_height(s);

@ The search process is complicated slightly by the facts that some of the
characters might not be present in some of the fonts, and they might not
be probed in increasing order of height.

@<Look at the variants of |(z,x)|; set |f| and |c|...@>=
if (z≠0)∨(x≠min_quarterword) then
	begin z←z+s+16;
	repeat z←z-16; g←fam_fnt(z);
	if g≠undefined_font then
		@<Look at the list of characters starting with |x| in
			font |g|; set |f| and |c| whenever
			a better character is found; |goto found| as soon as a
			large enough variant is encountered@>;
	until z<16;

@ @<Look at the list of characters starting with |x|...@>=
begin y←x;
continue: if (qo(y)≥font_bc[g])∧(qo(y)≤font_ec[g]) then
	begin q←char_info(g)(y);
	if char_exists(q) then
		begin if char_tag(q)=ext_tag then goto found;
		if u>w then
			begin f←g; c←y; w←u;
			if u≥v then goto found;
		if char_tag(q)=list_tag then
			begin y←rem_byte(q); goto continue;

@ When the following code is executed, |char_tag(q)| will be equal to
|ext_tag| if and only if a built-up symbol is supposed to be returned.

@<Make variable |b| point to a box for |(f,c)|@>=
if char_tag(q)=ext_tag then
	@<Construct an extensible character for |(f,c)| in box |b|@>
else	begin p←get_avail; font(p)←f; character(p)←c; list_ptr(b)←p;

@ When we build an extensible character, it's handy to have the
following subroutine, which puts a given character on top
of the characters already in box |b|:

@<Declare subprocedures for |var_delimiter|@>=
procedure stack_into_box(@!b:pointer;@!f:internal_font_number;
var p:pointer; {new node placed into |b|}
begin p←get_avail; font(p)←f; character(p)←c; p←hpack(p,natural);
link(p)←list_ptr(b); list_ptr(b)←p;

@ Another handy subroutine computes the height plus depth of
a given character:

@<Declare subprocedures for |var_delimiter|@>=
function height_plus_depth(@!f:internal_font_number;@!c:quarterword):scaled;
var q:four_quarters;
@!hd:eight_bits; {|height_depth| byte}
begin q←char_info(f)(c); hd←height_depth(q);

@ @<Construct an extensible...@>=
begin b←new_null_box;
@<Compute the minimum suitable height, |w|, and the corresponding
	number of extension steps, |n|; also set |width(b)|@>;
if c≠min_quarterword then stack_into_box(b,f,c);
for m←1 to n do stack_into_box(b,f,c);
if c≠min_quarterword then
	begin stack_into_box(b,f,c); c←ext_rep(r);
	for m←1 to n do stack_into_box(b,f,c);
if c≠min_quarterword then stack_into_box(b,f,c);

@ The width of an extensible character is the width of the repeatable
mod\-ule. If this module does not have positive height plus depth,
we don't use any copies of it, otherwise we use as few as possible
(in groups of two if there is a middle part).

@<Compute the minimum suitable height, |w|, and...@>=
c←ext_rep(r); u←height_plus_depth(f,c);
w←0; width(b)←char_width(f)(char_info(f)(c));@/
c←ext_bot(r);@+if c≠min_quarterword then w←w+height_plus_depth(f,c);
c←ext_mid(r);@+if c≠min_quarterword then w←w+height_plus_depth(f,c);
c←ext_top(r);@+if c≠min_quarterword then w←w+height_plus_depth(f,c);
if u>0 then while w<v do
	begin w←w+u; incr(n);
	if ext_mid(r)≠min_quarterword then w←w+u;

@ The next subroutine is much simpler; it is used for numerators and
denominators of fractions as well as for displayed operators and
their limits above and below. It takes a given hlist box |b| and
changes it so that the new box is centered in a box of width |w|.
The centering is done by putting \.{\\hss} glue at the left and right
of the list inside |b|, then packaging the new box; thus, the
actual box might not really be centered, if it already contains
infinite glue.

@p function rebox(@!b:pointer;@!w:scaled):pointer;
var p:pointer; {temporary register for list manipulation}
begin if (width(b)≠w)∧(list_ptr(b)≠null) then
	begin p←list_ptr(b); free_node(b,box_node_size);
	b←new_glue(ss_glue); link(b)←p;
	while link(p)≠null do p←link(p);
else	begin width(b)←w; rebox←b;

@ Here is a subroutine that creates a new glue specification from another
one that is expressed in `\.{mu}', given the value of the math unit.

@p function math_glue(@!g:pointer;@!m:scaled):pointer;
var p:pointer; {the new glue specification}
@!n:integer; {integer part of |m|}
@!f:scaled; {fraction part of |m|}
begin n←x_over_n(m,@'200000); f←remainder;@/
width(p)←nx_plus_y(n,width(g),f); {convert \.{mu} to \.{pt}}
if stretch_order(p)=normal then stretch(p)←nx_plus_y(n,stretch(g),f)
else stretch(p)←stretch(g);
if shrink_order(p)=normal then shrink(p)←nx_plus_y(n,shrink(g),f)
else shrink(p)←shrink(g);

@ The |math_kern| subroutine removes |mu_glue| from a kern node, given
the value of the math unit.

@p procedure math_kern(@!p:pointer;@!m:scaled);
var @!n:integer; {integer part of |m|}
@!f:scaled; {fraction part of |m|}
begin if subtype(p)=mu_glue then
	begin n←x_over_n(m,@'200000); f←remainder;@/
	width(p)←nx_plus_y(n,width(p),f); subtype(p)←normal;
@* \[35] Typesetting math formulas.
\TeX's major routine for dealing with formulas is called |mlist_to_hlist|.
After a formula has been scanned and represented as an mlist, this routine
converts it to an hlist that can be placed into a box or incorporated into
the text of a paragraph. There are three implicit parameters, passed in
global variables: |cur_mlist| points to the first node or noad in the
given mlist (and it might be |null|); |cur_style| is a style code; and
|mlist_penalties| is |true| if penalty nodes for potential line breaks are
to be inserted into the resulting hlist. After |mlist_to_hlist| has
acted, |link(temp_head)| points to the translated hlist.

Since mlists can be inside mlists, the procedure is recursive. And since this
is not part of \TeX's inner loop, the program has been written in a manner
that stresses compactness over efficiency.

@!cur_mlist:pointer; {beginning of mlist to be translated}
@!cur_style:small_number; {style code at current place in the list}
@!cur_size:small_number; {size code corresponding to |cur_style|}
@!cur_mu:scaled; {the math unit width corresponding to |cur_size|}
@!mlist_penalties:boolean; {should |mlist_to_hlist| insert penalties?}

@ The recursion in |mlist_to_hlist| is due primarily to a subroutine
called |clean_box| that puts a given noad field into a box using a given
math style; |mlist_to_hlist| can call |clean_box|, which can call
|mlist_to_hlist|. The box returned by |clean_box| is ``clean'' in the
sense that its |shift_amount| is zero.

@p procedure mlist_to_hlist; forward;@t\2@>@/
function clean_box(@!p:pointer;@!s:small_number):pointer;
label found,exit;
var q:pointer; {beginning of a list to be boxed}
@!save_style:small_number; {|cur_style| to be restored}
begin case math_type(p) of
math_char: begin cur_mlist←new_noad; mem[operand(cur_mlist)]←mem[p];
sub_box: begin q←info(p);
	if shift_amount(q)=0 then {already clean}
		begin clean_box←q; return;
	else goto found;
sub_mlist: cur_mlist←info(p);
othercases begin q←null; goto found;
save_style←cur_style; cur_style←s; mlist_penalties←false;
mlist_to_hlist; q←link(temp_head); {recursive call}
cur_style←save_style; {restore the style}
@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
found: clean_box←hpack(q,natural);

@ It is convenient to have a procedure that converts a |math_char|
field to an ``unpacked'' form. The |fetch| routine sets |cur_f|, |cur_c|,
and |cur_i| to the font code, character code, and character information bytes of
a given noad field. It also takes care of issuing error messages for
nonexistent characters; in such cases, |char_exists(cur_i)| will be |false|
after |fetch| has acted, and the field will also have been reset to |empty|.

@p procedure fetch(@!a:pointer); {unpack the |math_char| field |a|}
begin cur_c←character(a); cur_f←fam_fnt(fam(a)+cur_size);
if cur_f=undefined_font then
	@<Complain about an undefined family and set |cur_i| null@>
else	begin if (qo(cur_c)≥font_bc[cur_f])∧(qo(cur_c)≤font_ec[cur_f]) then
	else cur_i←null_character;
	if not(char_exists(cur_i)) then
		begin char_warning(cur_f,cur_c);

@ @<Complain about an undefined family...@>=
begin print_nl("! "); print_size(cur_size); print(" ");
print_int(fam(a)); print(" is undefined");
@:text_font_}{\.{\\textfont x is undefined}@>
@:script_font_}{\.{\\scriptfont x is undefined}@>
@:script_script_font_}{\.{\\scriptscriptfont x is undefined}@>
help3("You have to define families outside of a math")@/
	("subformula that uses them. Proceed, and I'll")@/
	("ignore the nonexistent font you referred to.");
error; cur_i←null_character; math_type(a)←empty;

@ The outputs of |fetch| are placed in global variables.

@!cur_f:internal_font_number; {the |font| field of a |math_char|}
@!cur_c:quarterword; {the |character| field of a |math_char|}
@!cur_i:four_quarters; {the |char_info| of a |math_char|,
	or a lig/kern instruction}

@ We need to do a lot of different things, so |mlist_to_hlist| makes two
passes over the given mlist.

The first pass does most of the processing: It removes ``mu'' spacing from
glue, it recursively evaluates all subsidiary mlists so that only the
top-level mlist remains to be handled, it puts fractions and square roots
and such things into boxes, it attaches subscripts and superscripts, and
it computes the overall height and depth of the top-level mlist so that
the size of delimiters for a |left_noad| and a |right_noad| will be known.
The hlist resulting from each noad is recorded in that noad's |new_hlist|
field, an integer field that replaces the |operand| or |thickness|.

The second pass eliminates all noads and inserts the correct glue and
penal\-ties between nodes.

@d new_hlist(#)==mem[operand(#)].int {the translation of an mlist}

@ Here is the overall plan of |mlist_to_hlist|, and the list of its
local variables.

@d done_with_noad=80 {go here when a noad has been fully translated}
@d done_with_node=81 {go here when a node has been fully converted}
@d check_dimensions=82 {go here to update |max_h| and |max_d|}
@d delete_q=83 {go here to delete |q| and move to the next node}

@p@t\4@>@<Declare math construction procedures@>
procedure mlist_to_hlist;
label reswitch, check_dimensions, done_with_noad, done_with_node, delete_q,
var mlist:pointer; {beginning of the given list}
@!penalties:boolean; {should penalty nodes be inserted?}
@!style:small_number; {the given style}
@!save_style:small_number; {holds |cur_style| during recursion}
@!q:pointer; {runs through the mlist}
@!r:pointer; {the most recent noad preceding |q|}
@!r_type:small_number; {the |type| of noad |r|, or |op_noad| if |r=null|}
@!t:small_number; {the effective |type| of noad |q| during the second pass}
@!p,@!v,@!x,@!y,@!z: pointer; {temporary registers for list construction}
@!pen:integer; {a penalty to be inserted}
@!s:small_number; {the size of a noad to be deleted}
@!max_h,@!max_d:scaled; {maximum height and depth of the list translated so far}
@!delta:scaled; {offset between subscript and superscript}
begin mlist←cur_mlist; penalties←mlist_penalties;
style←cur_style; {tuck global parameters away} 
q←mlist; r←null; r_type←op_noad; max_h←0; max_d←0;
@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
while q≠null do @<Process node-or-noad |q| as much as possible in preparation
		for the second pass of |mlist_to_hlist|, then move to the next
		item in the mlist@>;
@<Make a second pass over the mlist, removing all noads and inserting the
	proper spacing and penalties@>;

@ We use the fact that no character nodes appear in an mlist, hence 
the field |type(q)| is always present.

@<Process node-or-noad...@>=
begin @<Do first-pass processing based on |type(q)|; |goto done_with_noad|
	if a noad has been fully processed, |goto check_dimensions| if it
	has been translated into |new_hlist(q)|, or |goto done_with_node|
	if a node has been fully processed@>;
check_dimensions: z←hpack(new_hlist(q),natural);
if height(z)>max_h then max_h←height(z);
if depth(z)>max_d then max_d←depth(z);
done_with_noad: r←q; r_type←type(r);
done_with_node: q←link(q);

@ One of the things we must do on the first pass is change a |bin_noad| to
an |ord_noad| if the |bin_noad| is not in the context of a binary operator.
The values of |r| and |r_type| make this fairly easy.

@<Do first-pass processing...@>=
reswitch: delta←0;
case type(q) of
bin_noad: case r_type of
		begin type(q)←ord_noad; goto reswitch;
	othercases do_nothing
rel_noad,close_noad,punct_noad,right_noad: begin@t@>@;@/
	if r_type=bin_noad then type(r)←ord_noad;
	if type(q)=right_noad then goto done_with_noad;
@t\4@>@<Cases for noads that can follow a |bin_noad|@>@;
@t\4@>@<Cases for nodes that can appear in an mlist, after which we
	|goto done_with_node|@>@;
othercases confusion("mlist1")
@:confusion mlist1}{\quad mlist1@>
@<Convert \(o)|operand(q)| to an hlist and attach the sub/superscripts@>

@ @<Cases for nodes that can appear in an mlist...@>=
style_node: begin cur_style←subtype(q);
	@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
	goto done_with_node;
whatsit_node,penalty_node,disc_node: goto done_with_node;
rule_node: begin if height(q)>max_h then max_h←height(q);
	if depth(q)>max_d then max_d←depth(q); goto done_with_node;
glue_node: begin @<Convert \(m)math glue to ordinary glue@>;
	goto done_with_node;
kern_node: begin math_kern(q,cur_mu); goto done_with_node;

@ Conditional math glue (`\.{\\nonscript}') results in a |glue_node|
pointing to |zero_glue|, with |subtype(q)=cond_math_glue|; in such a case
the node following will be eliminated if it is a glue or kern node and if the
current size is different from |text_size|. Unconditional math glue
(`\.{\\mskip}') is converted to normal glue by multiplying the dimensions
by |cur_mu|.

@<Convert \(m)math glue to ordinary glue@>=
if subtype(q)=mu_glue then
	begin x←glue_ptr(q);
	y←math_glue(x,cur_mu); delete_glue_ref(x); glue_ptr(q)←y;
else if (cur_size>text_size)∧(subtype(q)=cond_math_glue) then
	begin p←link(q);
	if p≠null then if (type(p)=glue_node)∨(type(p)=kern_node) then
			begin link(q)←link(p); link(p)←null; flush_node_list(p);

@ @<Cases for noads that can follow a |bin_noad|@>=
left_noad: goto done_with_noad;
fraction_noad: begin make_fraction(q); goto check_dimensions;
op_noad: begin delta←make_op(q);
	if subtype(q)=switched then goto check_dimensions;
ord_noad: make_ord(q);
open_noad,inner_noad: do_nothing;
radical_noad: make_radical(q);
over_noad: make_over(q);
under_noad: make_under(q);
accent_noad: make_math_accent(q);
vcenter_noad: make_vcenter(q);

@ Most of the actual construction work of |mlist_to_hlist| is done
by procedures like |make_fraction|, |make_radical|, etc. To illustrate
the general setup of such procedures, let's begin with a couple of
simple ones.

@<Declare math...@>=
procedure make_over(@!q:pointer);
begin info(operand(q))←@|overbar(clean_box(operand(q),und_style(cur_style)),@|

@ @<Declare math...@>=
procedure make_under(@!q:pointer);
var p,@!x,@!y: pointer; {temporary registers for box construction}
@!delta:scaled; {overall height plus depth}
begin x←clean_box(operand(q),cur_style);
p←new_kern(3*default_rule_thickness); link(x)←p;
height(y)←height(x); depth(y)←delta-height(y);
info(operand(q))←y; math_type(operand(q))←sub_box;

@ @<Declare math...@>=
procedure make_vcenter(@!q:pointer);
var v:pointer; {the box that should be centered vertically}
@!delta:scaled; {its height plus depth}
begin v←info(operand(q));
if type(v)≠vlist_node then confusion("vcenter");
@:confusion vcenter}{\quad vcenter@>

@ According to the rules in the \.{DVI} file specifications, we ensure alignment
@^square roots@>
between a square root sign and the rule above its operand by assuming that the
baseline of the square-root symbol is the same as the bottom of the rule. The
height of the square-root symbol will be the thickness of the rule, and the
depth of the square-root symbol should exceed or equal the height-plus-depth
of the operand plus a certain minimum clearance@@|clr|. The symbol will be
placed so that the actual clearance is |clr| plus half the excess.

@<Declare math...@>=
procedure make_radical(@!q:pointer);
var x,@!y:pointer; {temporary registers for box construction}
@!delta,@!clr:scaled; {dimensions involved in the calculation}
begin x←clean_box(operand(q),und_style(cur_style));
if cur_style<text_style then
	clr←default_rule_thickness+(math_x_height(cur_size) div 4)
else	begin clr←default_rule_thickness; clr←clr + (clr div 4);
if height(y)≤0 then height(y)←default_rule_thickness;
if delta>0 then clr←clr+half(delta); {increase the actual clearance}
info(operand(q))←hpack(y,natural); math_type(operand(q))←sub_box;

@ Slants are not considered when placing accents in math mode. The accenter is
centered over the accentee, and the accent width is treated as zero with
respect to the size of the final box.

@<Declare math...@>=
procedure make_math_accent(@!q:pointer);
label done;
var p,@!x,@!y:pointer; {temporary registers for box construction}
@!delta:scaled; {amount to raise the accent}
begin fetch(accent_chr(q));
if char_exists(cur_i) then
	begin x←clean_box(operand(q),und_style(cur_style));
	@<Switch to a larger accent if available and appropriate@>;
	if delta<0 then delta←0;
	width(y)←0; p←new_kern(delta-height(x)); link(p)←x; link(y)←p;
	y←vpack(y,natural); width(y)←width(x);

@ @<Switch to a larger accent if available and appropriate@>=
loop@+	begin if char_tag(cur_i)≠list_tag then goto done;
	if char_width(cur_f)(cur_i)>width(x) then goto done;

@ The |make_fraction| procedure is a bit different because it sets
|new_hlist(q)| directly rather than making a sub-box.

@<Declare math...@>=
procedure make_fraction(@!q:pointer);
var p,@!v,@!x,@!y,@!z:pointer; {temporary registers for box construction}
	{dimensions for box calculations}
begin @<Create equal-width boxes |x| and |z| for the numerator and denominator,
	and compute the default amounts |shift_up| and |shift_down| by which they
	are displaced from the baseline@>;
if thickness(q)=0 then @<Adjust \(s)|shift_up| and |shift_down| for the case
	of no fraction line@>
else @<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>;
@<Construct a vlist box for the fraction, according to |shift_up| and
@<Put the \(f)fraction into a box with its delimiters, and make |new_hlist(q)|
	point to it@>;

@ @<Create equal-width boxes |x| and |z| for the numerator and denom...@>=
if width(x)<width(z) then x←rebox(x,width(z))
else z←rebox(z,width(x));
if cur_style<text_style then
	begin shift_up←num1(cur_size); shift_down←denom1(cur_size);
else	begin shift_down←denom2(cur_size);
	if thickness(q)≠0 then shift_up←num2(cur_size)
	else shift_up←num3(cur_size);

@ The numerator and denominator must be separated by a certain minimum
clearance, called |clr| in the following program. The difference between
|clr| and the actual clearance is |2delta|.

@<Adjust \(s)|shift_up| and |shift_down| for the case of no fraction line@>=
begin if cur_style<text_style then clr←7*default_rule_thickness
else clr←3*default_rule_thickness;
if delta>0 then
	begin shift_up←shift_up+delta;

@ In the case of a fraction line, the minimum clearance depends on the actual
thickness of the line.

@<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>=
begin if thickness(q)=default_code then thickness(q)←default_rule_thickness;
if cur_style<text_style then clr←3*thickness(q)
else clr←thickness(q);
if delta1>0 then shift_up←shift_up+delta1;
if delta2>0 then shift_down←shift_down+delta2;

@ @<Construct a vlist box for the fraction...@>=
v←new_null_box; type(v)←vlist_node;
height(v)←shift_up+height(x); depth(v)←depth(z)+shift_down;
width(v)←width(x); {this also equals |width(z)|}
if thickness(q)=0 then
	begin p←new_kern((shift_up-depth(x))-(height(z)-shift_down));
else	begin y←fraction_rule(thickness(q));@/
link(x)←p; list_ptr(v)←x

@ @<Put the \(f)fraction into a box with its delimiters...@>=
if cur_style<text_style then delta←delim1(cur_size)
else delta←delim2(cur_size);
x←var_delimiter(left_delimiter(q), cur_size, delta); link(x)←v;@/
z←var_delimiter(right_delimiter(q), cur_size, delta); link(v)←z;@/

@ An |op_noad| is a bit confusing because its |operand| is really an
``operator.''  We shall occasionally call the operator an operand, since
that is where the information appears in the data structure. If such an
operand is a single character, it is to be centered vertically with
respect to the axis, after first being enlarged (via a character list in
the font) if we are in display style.  The normal convention for placing
displayed limits is to put them to the right only when the character has a
nonzero italic correction.

The italic correction is removed from the character unless the limits are
being placed at the right and there is no subscript. The |make_op|
routine returns the value that should be used as an offset between
subscript and superscript.

After |make_op| has acted, |subtype(q)| will be |switched| if and only if
the limits have been set above and below the operand. In that case,
|new_hlist(q)| will already contain the desired final box.

@<Declare math...@>=
function make_op(@!q:pointer):scaled;
var delta:scaled; {offset between subscript and superscript}
@!p,@!v,@!x,@!y,@!z:pointer; {temporary registers for box construction}
@!shift_up,@!shift_down:scaled; {dimensions for box calculation}
begin if math_type(operand(q))=math_char then
	begin fetch(operand(q));
	if (cur_style<text_style)∧(char_tag(cur_i)=list_tag) then {make it larger}
		begin cur_c←rem_byte(cur_i); character(operand(q))←cur_c;
	delta←char_italic(cur_f)(cur_i); x←clean_box(operand(q),cur_style);
	width(x)←width(x)-delta; {remove italic correction}
	shift_amount(x)←half(height(x)-depth(x)) - axis_height(cur_size);
		{center vertically}
	math_type(operand(q))←sub_box; info(operand(q))←x;
else delta←0;
if cur_style<text_style then {display styles}
	if ((delta=0)∧(subtype(q)=normal))∨@|
		((delta≠0)∧(subtype(q)=switched)) then subtype(q)←switched
	else subtype(q)←normal
else subtype(q)←normal;
if subtype(q)=switched then
	@<Construct a box with limits above and below it, skewed by |delta|@>
else if (delta≠0)∧(math_type(subscr(q))=empty) then width(x)←width(x)+delta;

@ The following program builds a vlist box |v| for displayed limits. The
width of the box is not affected by the fact that the limits may be skewed.

@<Construct a box with limits above and below it...@>=
begin x←clean_box(supscr(q),sup_style(cur_style));
v←new_null_box; type(v)←vlist_node; width(v)←width(y);
if width(x)>width(v) then width(v)←width(x);
if width(z)>width(v) then width(v)←width(z);
x←rebox(x,width(v)); y←rebox(y,width(v)); z←rebox(z,width(v));@/
shift_amount(x)←half(delta); shift_amount(z)←-shift_amount(x);
height(v)←height(y); depth(v)←depth(y);
@<Attach the limits to |y| and adjust |height(v)|, |depth(v)| to
	account for their presence@>;

@ We use |shift_up| and |shift_down| in the following program for the
amount of glue between the displayed operator |y| and its limits |x| and
|z|. The vlist inside box |v| will consist of |x| followed by |y| followed
by |z|, with kern nodes for the spaces between and around them.

@<Attach the limits to |y| and adjust |height(v)|, |depth(v)|...@>=
if math_type(supscr(q))=empty then
	begin free_node(x,box_node_size); list_ptr(v)←y;
else	begin shift_up←big_op_spacing3-depth(x);
	if shift_up<big_op_spacing1 then shift_up←big_op_spacing1;
	p←new_kern(shift_up); link(p)←y; link(x)←p;@/
	p←new_kern(big_op_spacing5); link(p)←x; list_ptr(v)←p;
if math_type(subscr(q))=empty then free_node(z,box_node_size)
else	begin shift_down←big_op_spacing4-height(z);
	if shift_down<big_op_spacing2 then shift_down←big_op_spacing2;
	p←new_kern(shift_down); link(y)←p; link(p)←z;@/
	p←new_kern(big_op_spacing5); link(z)←p;

@ A ligature found in a math formula does not create a |ligature_node|, because
there is no question of hyphenation afterwards; the ligature will simply be
stored in an ordinary |char_node|, after residing in an |ord_noad|.

@<Declare math...@>=
procedure make_ord(@!q:pointer);
label restart;
var a:integer; {address of lig/kern instruction}
@!p:pointer; {temporary register for list manipulation}
begin restart:@;@/
if (math_type(subscr(q))=empty)∧(math_type(supscr(q))=empty)∧@|
	(math_type(operand(q))=math_char) then
	begin p←link(q);
	if p≠null then if (type(p)≥ord_noad)∧(type(p)≤punct_noad) then
		if math_type(operand(p))=math_char then
		if fam(operand(p))=fam(operand(q)) then
			begin fetch(operand(q));
			if char_tag(cur_i)=lig_tag then
				begin a←lig_kern_start(cur_f)(cur_i);
				repeat cur_i←font_info[a].qqqq;@/
				@<If instruction |cur_i| is a kern with |cur_c|,
				 attach the kern after |q|;
				 or if it is a ligature with |cur_c|, combine
				 noads |q| and |p| and |goto restart|@>;
				until stop_bit(cur_i)≥stop_flag;
@ Note that a ligature between an |ord_noad| and another kind of noad
is replaced by an |ord_noad|. Presumably a font designer will define such
ligatures only when this convention makes sense.

@<If instruction |cur_i| is a kern with |cur_c|, ...@>=
if next_char(cur_i)=cur_c then
	if op_bit(cur_i)≥kern_flag then
		begin p←new_kern(char_kern(cur_f)(cur_i));
		link(p)←link(q); link(q)←p;
	else	begin link(q)←link(p); character(operand(q))←rem_byte(cur_i);@/
		free_node(p,noad_size); goto restart;

@ When we get to the following part of the program, we have ``fallen through''
from cases that did not lead to |check_dimensions| or |done_with_noad| or
|done_with_node|. Thus, |q|@@points to a noad whose operand may need to be
converted to an hlist, and whose subscripts and superscripts need to be
appended if they are present.

If |operand(q)| is not a |math_char|, the variable |delta| is the amount
by which a superscript should be moved right with respect to a subscript
when both are present.

@<Convert \(o)|operand(q)| to an hlist and attach the sub/superscripts@>=
case math_type(operand(q)) of
math_char: @<Create a character node |p| for |operand(q)|, possibly followed
	by a kern node for the italic correction, and set |delta| to the
	italic correction if a subscript is present@>;
empty: p←null;
sub_box: p←info(operand(q));
sub_mlist: begin cur_mlist←info(operand(q)); save_style←cur_style;
	mlist_penalties←false; mlist_to_hlist; {recursive call}
	cur_style←save_style; @<Set up the values...@>;
othercases confusion("mlist2")
@:confusion mlist2}{\quad mlist2@>
if (math_type(subscr(q))=empty)∧(math_type(supscr(q))=empty) then
	goto check_dimensions;

@ @<Create a character node |p| for |operand(q)|...@>=
begin fetch(operand(q));
if char_exists(cur_i) then
	begin delta←char_italic(cur_f)(cur_i); p←new_character(cur_f,cur_c);
	if (math_type(subscr(q))=empty)∧(delta≠0) then
		begin link(p)←new_kern(delta); delta←0;
else p←null;

@ The purpose of |make_scripts(q,delta)| is to attach the subscript and/or
superscript of noad |q| to the list that starts at |new_hlist(q)|,
given that subscript and superscript aren't both empty. The superscript
will appear to the right of the subscript by a given distance |delta|.

We set |shift_down| and |shift_up| to the minimum amounts to shift the
baseline of subscripts and superscripts based on the given operand.

@<Declare math...@>=
procedure make_scripts(@!q:pointer;@!delta:scaled);
var p,@!x,@!y,@!z:pointer; {temporary registers for box construction}
@!shift_up,@!shift_down,@!clr:scaled; {dimensions in the calculation}
@!t:small_number; {subsidiary size code}
begin p←new_hlist(q);
if is_char_node(p) then
	begin shift_up←0; shift_down←0;
else	begin z←hpack(p,natural);
	if cur_style<script_style then t←script_size@+else t←script_script_size;
if math_type(supscr(q))=empty then
	@<Construct a subscript box |x| when there is no superscript@>
else	begin @<Construct a superscript box |x|@>;
	if math_type(subscr(q))=empty then shift_amount(x)←-shift_up
	else @<Construct a sub/superscript combination box |x|, with the
		superscript offset by |delta|@>;
if new_hlist(q)=null then new_hlist(q)←x
else	begin p←new_hlist(q);
	while link(p)≠null do p←link(p);

@ When there is a subscript without a superscript, the top of the subscript
should not exceed the baseline plus four-fifths of the x-height.

@<Construct a subscript box |x| when there is no superscript@>=
begin x←clean_box(subscr(q),sub_style(cur_style));
if shift_down<sub1(cur_size) then shift_down←sub1(cur_size);
clr←height(x)-((math_x_height(cur_size)*4) div 5);
if shift_down<clr then shift_down←clr;

@ The bottom of a superscript should never descend below the baseline plus
one-fourth of the x-height.

@<Construct a superscript box |x|@>=
begin x←clean_box(supscr(q),sup_style(cur_style));
if odd(cur_style) then clr←sup3(cur_size)
else if cur_style<text_style then clr←sup1(cur_size)
else clr←sup2(cur_size);
if shift_up<clr then shift_up←clr;
clr←depth(x)+(math_x_height(cur_size) div 4);
if shift_up<clr then shift_up←clr;

@ When both subscript and superscript are present, the subscript must be
at least four times |default_rule_thickness| away from the superscript.
If this condition would be violated, the subscript moves down, after which
both subscript and superscript move up so that the bottom of the superscript
is at least as high as the baseline plus four-fifths of the x-height.

@<Construct a sub/superscript combination box |x|...@>=
begin y←clean_box(subscr(q),sub_style(cur_style));
if shift_down<sub2(cur_size) then shift_down←sub2(cur_size);
if clr>0 then
	begin shift_down←shift_down+clr;
	clr←(4*math_x_height(cur_size) div 5)-(shift_up-depth(x));
	if clr>0 then
		begin shift_up←shift_up+clr;
shift_amount(x)←delta; {superscript is |delta| to the right of the subscript}
p←new_kern((shift_up-depth(x))-(height(y)-shift_down)); link(x)←p; link(p)←y;
x←vpack(x,natural); shift_amount(x)←shift_down;
@ We have now tied up all the loose ends of the first pass of |mlist_to_hlist|.
The second pass simply goes through and hooks everything together with the
proper glue and penalties. It also handles the |left_noad| and |right_noad| that
might be present, since |max_h| and |max_d| are now known. Variable |p| points
to a node at the current end of the final hlist.

@<Make a second pass over the mlist, ...@>=
p←temp_head; link(p)←null; q←mlist; r_type←0; cur_style←style;
@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
while q≠null do
	begin @<If node |q| is a style node, change the style and |goto delete_q|;
		otherwise if it is not a noad, put it into the hlist,
		advance |q|, and |goto done|; otherwise set |s| to the size
		of noad |q|, set |t| to the associated type (|ord_noad..
		inner_noad|), and set |pen| to the associated penalty@>;
	@<Append inter-element spacing based on |r_type| and |t|@>;
	@<Append any |new_hlist| entries for |q|, and any appropriate penalties@>;
	delete_q: r←q; q←link(q); free_node(r,s);
	done: end

@ Before we do the big |case| switch in the second pass, we set the default
values so that most of the branches are short.

@<If node |q| is a style node, change the style...@>=
t←inner_noad; s←noad_size; pen←inf_penalty;
case type(q) of
ord_noad,op_noad,open_noad,close_noad,punct_noad: t←type(q);
bin_noad: begin t←bin_noad; pen←bin_op_penalty;
rel_noad: begin t←rel_noad; pen←rel_penalty;
inner_noad,vcenter_noad,over_noad,under_noad: do_nothing;
radical_noad: s←radical_noad_size;
accent_noad: s←accent_noad_size;
fraction_noad: s←fraction_noad_size;
left_noad,right_noad: t←make_left_right(q,style,max_d,max_h);
style_node: @<Change the current style and |goto delete_q|@>;
 glue_node,kern_node: begin@t@>@;@/
	link(p)←q; q←link(q); link(p)←null; goto done;
othercases confusion("mlist3")
@:confusion mlist3}{\quad mlist3@>

@ The |make_left_right| function constructs a delimiter of the required size
and returns the value |open_noad| or |close_noad|. The |right_noad| and
|left_noad| will both be based on the original |style|, so they will
have consistent sizes.

We use the fact that |right_noad-left_noad=close_noad-open_noad|.

@<Declare math...@>=
function make_left_right(@!q:pointer;@!style:small_number;
var delta,@!delta1,@!delta2:scaled; {dimensions used in the calculation}
begin if style<script_style then cur_size←text_size
else cur_size←16*((style-text_style) div 2);
if delta2>delta1 then delta1←delta2; {|delta1| is max distance from axis}
delta←(delimiter_factor*delta1) div 500;
if delta<delta2 then delta←delta2;
make_left_right←type(q)-(left_noad-open_noad); {|open_noad| or |close_noad|}

@ @<Change the current style and |goto delete_q|@>=
begin cur_style←subtype(q); s←small_node_size;
@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>;
goto delete_q;

@ The inter-element spacing in math formulas depends on a $8\times8$ table that
\TeX\ preloads as a 64-digit string. The elements of this string have the
following significance:
\.0 means no space;\cr
\.1 means a conditional thin space (\.{\\nonscript\\mskip\\the\\thinmskip});\cr
\.2 means a thin space (\.{\\mskip\\the\\thinmskip});\cr
\.3 means a conditional medium space (\.{\\nonscript\\mskip\\the\\medmskip});\cr
\.4 means a thick space (\.{\\mskip\\the\\thickmskip});\cr
\.* means an impossible case.\cr}}$$
This is all pretty cryptic, but the \TeX\ manual explains what is supposed to
happen, and the string makes it happen.

A global variable |magic_offset| is computed so that if |a| and |b| are
in the range |ord_noad..inner_noad|, then |str_pool[a*8+b+magic_offset]|
is the digit for spacing between noad types |a| and |b|.

If \PASCAL\ had provided a good way to preload constant arrays, this part of
the program would not have been so strange.

@d math_spacing=@;@/

@!magic_offset:integer; {used to find inter-element spacing}

@ @<Compute the magic offset@>=

@ @<Append inter-element spacing based on |r_type| and |t|@>=
if r_type>0 then {not the first noad}
	begin case str_pool[r_type*8+t+magic_offset] of
	"0": x←0;
	"1": if cur_style<script_style then x←thin_mskip_code@+else x←0;
	"2": x←thin_mskip_code;
	"3": if cur_style<script_style then x←med_mskip_code@+else x←0;
	"4": x←thick_mskip_code;
	othercases confusion("mlist4")
@:confusion mlist4}{\quad mlist4@>
	if x≠0 then
		begin y←math_glue(glue_par(x),cur_mu);
		z←new_glue(y); glue_ref_count(y)←null; link(p)←z; p←z;@/
		subtype(z)←x+1; {store a symbolic subtype}

@ We insert a penalty node after the hlist entries of noad |q| if |pen|
is not an ``infinite'' penalty, and if the node immediately following |q|
is not a penalty node or a |rel_noad| or absent entirely.

@<Append any |new_hlist| entries for |q|, and any appropriate penalties@>=
if new_hlist(q)≠null then
	begin link(p)←new_hlist(q);
	repeat p←link(p);
	until link(p)=null;
if (pen<inf_penalty)∧ penalties ∧(link(q)≠null) then
	begin r_type←type(link(q));
	if (r_type≠penalty_node)∧(r_type≠rel_noad) then
		begin z←new_penalty(pen); link(p)←z; p←z;
@* \[36] Alignment.
It's sort of a miracle whenever \.{\\halign} and \.{\\valign} work, because
they cut across so many of the control structures of \TeX. There\-fore the
present page is probably not the best place for a beginner to start reading
this program; it is better to master everything else first.

Let us focus our thoughts on an example of what the input might be, in order
to get some idea about how the alignment miracle happens. The example doesn't
do anything useful, but it is sufficiently general to indicate all of the
special cases that must be dealt with; please do not be disturbed by its
apparent complexity and meaninglessness.
{}\\tabskip 2pt plus 3pt\cr
{}\\halign to 300pt\{u1\#v1\&\cr
\hskip 50pt\\tabskip 1pt plus 1fil u2\#v2\&\cr
\hskip 50pt u3\#v3\\cr\cr
\hskip 25pt a1\&\\omit a2\&\\vrule\\cr\cr
\hskip 25pt \\noalign\{\\vskip 3pt\}\cr
\hskip 25pt b1\\span b2\\cr\cr
\hskip 25pt \\omit\&c2\\span\\omit\\cr\}\cr}}$$
Here's what happens:

(0) When `\.{\\halign to 300pt\{}' is scanned, the |scan_spec| routine
places the 300pt dimension onto |save_stack|, and the code |align_group|
is placed above it. This will make it possible to complete the alignment
when the matching `\.\}' is found.

(1) The preamble is scanned next. Macros in the preamble are not expanded,
except as part of a tabskip specification. For example, if \.{u2} had been
a macro in the preamble above, it would have been expanded, since \TeX\
must look for `\.{minus...}' as part of the tabskip glue. A ``preamble list''
is constructed based on the user's preamble; in our case it contains the
following seven items:
{}\\glue 2pt plus 3pt&the tabskip preceding column 1\cr
{}\\alignrecord, width $-\infty$&preamble info for column 1\cr
{}\\glue 2pt plus 3pt&the tabskip between columns 1 and 2\cr
{}\\alignrecord, width $-\infty$&preamble info for column 2\cr
{}\\glue 1pt plus 1fil&the tabskip between columns 2 and 3\cr
{}\\alignrecord, width $-\infty$&preamble info for column 3\cr
{}\\glue 1pt plus 1fil&the tabskip following column 3\cr}}$$
These ``alignrecord'' entries have the same size as an |unset_node|,
since they will later be converted into such nodes. However, at the
moment they have no |type| or |subtype| fields; they have |info| fields
instead, and these |info| fields are initially set to the value |end_span|,
for reasons explained below. Furthermore, the alignrecord nodes have no
|height| or |depth| fields; these are renamed |u_part| and |v_part|,
and they point to token lists for the templates of the alignment.
For example, the |u_part| field in the first alignrecord points to the
token list `\.{u1}', i.e., the template preceding the `\.\#' for column@@1.

(2) \TeX\ now looks at what follows the \.{\\cr} that ended the preamble.
It is not `\.{\\noalign}' or `\.{\\omit}', so this input is put back to
be read again, and the template `\.{u1}' is fed to the scanner. Just
before reading `\.{u1}', \TeX\ goes into restricted horizontal mode.
Just after reading `\.{u1}', \TeX\ will see `\.{a1}', and then (when the
{\.\&} is sensed) \TeX\ will see `\.{v1}'. Then \TeX\ scans an |endv|
token, indicating the end of a column. At this point an |unset_node| is
created, containing the contents of the current hlist (i.e., `\.{u1a1v1}').
The natural width of this unset node replaces the |width| field of the
alignrecord for column@@1; in general, the alignrecords will record the
maximum natural width that has occurred so far in a given column.

(3) Since `\.{\\omit}' folows the `\.\&', the templates for column@@2
are now bypassed. Again \TeX\ goes into restricted horizontal mode and
makes an |unset_node| from the resulting hlist; but this time the
hlist contains simply `\.{a2}'. The natural width of the new unset box
is remembered in the |width| field of the alignrecord for column@@2.

(4) A third |unset_node| is created for column 3, using essentially the
mecha\-nism that worked for column@@1; this unset box contains `\.{u3\\vrule
v3}'. The vertical rule in this case has running dimensions that will later
extend to the height and depth of the whole first row, since each |unset_node|
in a row will eventually inherit the height and depth of its enclosing box.

(5) The first row has now ended; it is made into a single unset box
comprising the following seven items:
$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr
{}\\glue 2pt plus 3pt\cr
{}\\unsetbox for 1 column: u1a1v1\cr
{}\\glue 2pt plus 3pt\cr
{}\\unsetbox for 1 column: a2\cr
{}\\glue 1pt plus 1fil\cr
{}\\unsetbox for 1 column: u3\vrule v3\cr
{}\\glue 1pt plus 1fil\cr}}$$
The width of this unset row is unimportant, but it has the correct height
and depth, so the correct baselineskip glue will be computed as the row
is inserted into a vertical list.

(6) Since `\.{\\noalign}' follows the current \.{\\cr}, \TeX\ appends
additional material (in this case \.{\\vskip 3pt}) to the vertical list.
While processing this material, \TeX\ will be in internal vertical
mode, and |no_align_group| will be on |save_stack|.

(7) The next row produces an unset box that looks like this:
$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr
{}\\glue 2pt plus 3pt\cr
{}\\unsetbox for 2 columns: u1b1v1u2b2v2\cr
{}\\glue 1pt plus 1fil\cr
{}\\unsetbox for 1 column: {\rm(empty)}\cr
{}\\glue 1pt plus 1fil\cr}}$$
The natural width of the unset box that spans columns 1@@and@@2 is stored
in a ``span node,'' which we will explain later; the |info| field of the
alignrecord for column@@1 now points to the new span node, and the |info|
of the span node points to |end_span|.

(8) The final row produces the unset box
$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr
{}\\glue 2pt plus 3pt\cr
{}\\unsetbox for 1 column: {\rm(empty)}\cr
{}\\glue 2pt plus 3pt\cr
{}\\unsetbox for 2 columns: u2c2v2\cr
{}\\glue 1pt plus 1fil\cr}}$$
A new span node is attached to the alignrecord for column 2.

(9) The last step is to compute the true column widths and to change all the
unset boxes to hboxes, appending the whole works to the vertical list that
encloses the \.{\\halign}. The rules for deciding on the final widths of
each unset column box will be explained below.

Note that as \.{\\halign} is being processed, we fearlessly give up control
to the rest of \TeX. At critical junctures, an alignment routine is
called upon to step in and do some little action, but most of the time
these routines just lurk in the background. It's something like
post-hypnotic suggestion.

@ We have mentioned that alignrecords contain no |height| or |depth| fields.
Their |glue_sign| and |glue_order| are pre-empted as well, since it
is necessary to store information about what to do when a template ends.
This information is called the |extra_info| field.

@d u_part(#)==mem[#+height_offset].int {pointer to \<u↓j> token list}
@d v_part(#)==mem[#+depth_offset].int {pointer to \<v↓j> token list}
@d extra_info(#)==info(#+list_offset) {info to remember during template}

@ Alignments can occur within alignments, so a small stack is used to access
the alignrecord information. At each level we have a |preamble| pointer,
indicating the beginning of the preamble list; a |cur_align| pointer,
indicating the current position in the preamble list; a |cur_span| pointer,
indicating the value of |cur_align| at the beginning of a sequence of
spanned columns; and the |align_state| variable, which indicates the nesting
of braces so that \.{\\cr} and \.{\\span} and tab marks are properly

The current values of these four quantities appear in global variables;
when they have to be pushed down, they are stored in 3-word nodes, and
|align_ptr| points to the topmost such node.

@d preamble==link(align_head) {the current preamble list}
@d align_stack_node_size=3 {number of |mem| words to save alignment states}

@!cur_align:pointer; {current position in preamble list}
@!cur_span:pointer; {start of currently spanned columns in preamble list}
@!align_ptr:pointer; {most recently pushed-down alignment stack node}

@ The |align_state| and |preamble| variables are initialized elsewhere.

@<Set init...@>=
align_ptr←null; cur_align←null; cur_span←null;

@ Alignment stack maintenance is handled by a pair of trivial routines
called |push_alignment| and |pop_alignment|.

@p procedure push_alignment;
var p:pointer; {the new alignment stack node}
begin p←get_node(align_stack_node_size);
link(p)←align_ptr; info(p)←cur_align;
llink(p)←preamble; rlink(p)←cur_span;
mem[p+2].int←align_state; align_ptr←p;
procedure pop_alignment;
var p:pointer; {the top alignment stack node}
begin p←align_ptr; align_state←mem[p+2].int;
cur_span←rlink(p); preamble←llink(p);
cur_align←info(p); align_ptr←link(p);

@ \TeX\ has eight procedures that govern alignments: |init_align| and
|fin_align| are used at the very beginning and the very end; |init_row| and
|fin_row| are used at the beginning and end of individual rows; |init_span|
is used at the beginning of a sequence of spanned columns (possibly involving
only one column); |init_col| and |fin_col| are used at the beginning and
end of individual columns; and |align_peek| is used after \.{\\cr} to see
whether the next item is \.{\\noalign}.

We shall consider these routines in the order they are first used during
the course of a complete \.{\\halign}, namely |init_align|, |align_peek|,
|init_row|, |init_span|, |init_col|, |fin_col|, |fin_row|, |fin_align|.

@ When \.{\\halign} or \.{\\valign} has been scanned in an appropriate
mode, \TeX\ calls |init_align|, whose task is to get everything off to a
good start. This mostly involves scanning the preamble and putting its
information into the preamble list.

@p @t\4@>@<Declare the procedure called |get_preamble_token|@>@t@>@/
procedure@?align_peek; forward;@t\2@>@/
procedure@?off_save; forward;@t\2@>@/
procedure init_align;
label done, done1, done2, continue;
var save_cs_ptr:pointer; {|warning_index| value for error messages}
@!p:pointer; {for short-term temporary use}
begin save_cs_ptr←cs_ptr; {\.{\\halign} or \.{\\valign}, usually}
push_alignment; align_state←-1000000; {enter a new alignment level}
@<Check for improper alignment in displayed math@>;
push_nest; {enter a new semantic level}
@<Change current mode to |-vmode| for \.{\\halign}, |-hmode| for \.{\\valign}@>;
scan_spec; new_save_level(align_group);
@<Scan the preamble and record it in the |preamble| list@>;
align_peek; {look for \.{\\noalign} or \.{\\omit}}

@ In vertical modes, |prev_depth| already has the correct value. But
if we are in |mmode| (displayed formula mode), we reach out to the
enclosing vertical mode for the |prev_depth| value that produces the
correct baseline calculations.

@<Change current mode...@>=
if mode=mmode then
	begin mode←-vmode; prev_depth←nest[nest_ptr-2].aux_field;
else	begin if mode>0 then mode←-mode;
	if mode=-hmode then space_factor←1000;

@ When \.{\\halign} is used as a displayed formula, there should be
no other pieces of mlists present. Incomplete mlists need to be
deleted; and we must convert them to hlists first, since the
|flush_node_list| procedure doesn't know about noads.

@<Check for improper alignment in displayed math@>=
if (mode=mmode)∧((tail≠head)∨(incompleat_noad≠null)) then
	begin print_nl("! Improper \halign");
@.Improper \\halign@>
	help3("Displays can use special alignments (like \eqalignno)")@/
	("only if nothing but the alignment itself is between $$'s.")@/
	("So I've deleted the formulas that preceded this alignment.");
	cur_style←display_style; mlist_penalties←false;
	cur_mlist←link(head); mlist_to_hlist; flush_node_list(link(temp_head));@/
	cur_style←display_style; cur_mlist←incompleat_noad;
	mlist_to_hlist; {|mlist_penalties=true|}

@ @<Scan the preamble and record it in the |preamble| list@>=
preamble←null; cur_align←align_head; scanner_status←aligning;
	{at this point, |cur_cmd=left_brace| and |align_state=-999999|}
loop@+	begin @<Append the current tabskip glue to the preamble list@>;
	if cur_cmd=car_ret then goto done; {\.{\\cr} ends the preamble}
	@<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|,
		looking for changes in the tabskip glue; append an
		alignrecord to the preamble list@>;
done: scanner_status←normal

@ @<Append the current tabskip glue to the preamble list@>=

@ @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|...@>=
@<Scan the template \<u↓j>, putting the resulting token list in |hold_head|@>;
link(cur_align)←new_null_box; cur_align←link(cur_align); {a new alignrecord}
info(cur_align)←end_span; width(cur_align)←null_flag;
@<Scan the template \<v↓j>, putting the resulting token list in |hold_head|@>;

@ We enter `\.{\\span}' into |eqtb| with |tab_mark| as its command code,
and with |span_code| as the operand modifier. This makes \TeX\ interpret it
essentially the same as an alignment delimiter like `\.\&', yet it is
recognizably different when we need to distinguish it from a normal delimiter.
It also turns out to be useful to give a special |cr_code| to `\.{\\cr}'.

@d span_code=128 {distinct from any character}
@d cr_code=129 {distinct from |span_code| and from any character}

@<Put each of \TeX's primitives into the hash table@>=
@!@:span_}{\.{\\span} primitive@>
@!@:cr_}{\.{\\cr} primitive@>

@ The preamble is copied directly, except that \.{\\tabskip} causes a change
to the tabskip glue, thereby possibly expanding macros that immediately
follow it.

Note that if the preamble contains `\.{\\global\\tabskip}', the `\.{\\global}'
token survives in the preamble and the `\.{\\tabskip}' defines new
tabskip glue (locally).

@<Declare the procedure called |get_preamble_token|@>=
procedure get_preamble_token;
label reswitch;
begin reswitch: get_token;
if (cur_cmd=assign_glue)∧(cur_chr=tab_skip_code) then
	begin scan_optional_equals; scan_glue(false);
	goto reswitch;
if (cur_chr=span_code)∧(cur_cmd=tab_mark) then
	begin print_nl("! Illegal preamble");
@.Illegal preamble@>
	help2("You mustn't say \span in an alignment until after the \cr")@/
	("that ends the preamble. But proceed; I'll ignore this \span.");@/
	error; goto reswitch;

@ Spaces are eliminated from the beginning of a template.

@<Scan the template \<u↓j>...@>=
p←hold_head; link(p)←null;
loop@+	begin get_preamble_token;
	if cur_cmd=mac_param then goto done1;
	if (cur_cmd≤car_ret)∧(cur_cmd≥tab_mark) then
		begin print_nl("! Missing # inserted in alignment preamble");
@.Missing {\#} inserted...@>
		help3("There should be exactly one # between &'s, when an")@/
		("\halign or \valign is being set up. In this case you had")@/
		("none, so I've put one in; maybe that will work.");
		back_error; goto done1;
	if (cur_cmd≠spacer)∨(p≠hold_head) then
		begin link(p)←get_avail; p←link(p); info(p)←cur_tok;

@ @<Scan the template \<v↓j>...@>=
p←hold_head; link(p)←null;
loop@+	begin continue: get_preamble_token;
	if (cur_cmd≤car_ret)∧(cur_cmd≥tab_mark)∧(align_state=-1000000) then
		goto done2;
	if cur_cmd=mac_param then
		begin print_nl("! Only one # is allowed per tab");
@.Only one {\#} is allowed...@>
		help3("There should be exactly one # between &'s, when an")@/
		("\halign or \valign is being set up. In this case you had")@/
		("more than one, so I'm ignoring all but the first.");
		goto continue;
	link(p)←get_avail; p←link(p); info(p)←cur_tok;
done2: link(p)←get_avail; p←link(p);
info(p)←endv_token {put |endv| at the end}

@ The tricky part about alignments is getting the templates into the
scanner at the right time, and recovering control when a row or column
is finished.

We usually begin a row after each \.{\\cr} has been sensed, unless that
\.{\\cr} is followed by \.{\\noalign} or by the right brace that terminates
the alignment. The |align_peek| routine is used to look ahead and do
the right thing; it either gets a new row started, or gets a \.{\\noalign}
started, or finishes off the alignment.

@<Declare the procedure called |align_peek|@>=
procedure align_peek;
begin @<Get the next non-blank non-call token@>;
if cur_cmd=no_align then
	begin scan_left_brace; new_save_level(no_align_group);
else if cur_cmd=right_brace then fin_align
else	begin init_row; {start a new row}
	init_col; {start a new column and replace what we peeked at}

@ To start a row (i.e., a `row' that rhymes with `dough' but not with `bough'),
we enter a new semantic level, copy the first tabskip glue, and change
from internal vertical mode to restricted horizontal mode or vice versa.

@p @t\4@>@<Declare the procedure called |init_span|@>@t@>@/
procedure init_row;
begin push_nest; mode←(-hmode-vmode)-mode;
cur_align←link(preamble); init_span(cur_align);

@ The parameter to |init_span| is a pointer to the alignrecord where the
next column or group of columns will begin. A new semantic level is
entered, so that the columns will generate a list for subsequent packaging.

@<Declare the procedure called |init_span|@>=
procedure init_span(@!p:pointer);
begin push_nest;
if mode=-vmode then prev_depth←ignore_depth
else space_factor←1000;

@ When a column begins, we assume that |cur_cmd| is either |omit| or else
the current token should be put back into the input until the \<u↓j> template
has been scanned. We remain in the same mode, and start the template if
it is called for.

@p procedure init_col;
begin extra_info(cur_align)←cur_cmd;
if cur_cmd=omit then align_state←0
else	begin back_input; begin_token_list(u_part(cur_align),u_template);

@ The scanner sets |align_state| to zero when the \<u↓j> template ends. When
a subsequent \.{\\cr} or \.{\\span} or tab mark occurs with |align_state=0|,
the scanner activates the following code, which fires up the \<v↓j> template.
We need to remember the |cur_chr|, which is either |cr_code|, |span_code|,
or a character code, depending on how the column text has ended.

@<Insert the \(v)\<v↓j>...@>=
begin cur_cmd←extra_info(cur_align); extra_info(cur_align)←cur_chr;
if cur_cmd=omit then begin_token_list(omit_template,v_template)
else begin_token_list(v_part(cur_align),v_template);
align_state←1000000; goto restart;

@ The token list |omit_template| just referred to is a constant token
list that contains |endv| only.

@<Initialize the special...@>=
info(omit_template)←endv_token; {|link(omit_template)=null|}

@ When the |endv| token at the end of a \<v↓j> template comes through the
scanner, things really start to happen; and it is the |fin_col| routine
that makes them happen. This routine returns |true| if a row as well as a
column has been finished.

@p function fin_col:boolean;
label exit;
var p:pointer; {the alignrecord after the current one}
@!q:pointer; {runs through spanned alignrecords and span nodes}
@!s:pointer; {a new span node}
@!u:pointer; {a new unset box}
@!w:scaled; {natural width}
@!o:glue_ord; {order of infinity}
@!n:halfword; {span counter}
begin if (cur_align=null)∨(link(cur_align)=null) then confusion("endv");
@:confusion endv}{\quad endv@>
@<If the preamble list has been traversed, check that the row has ended@>;
if extra_info(cur_align)≠span_code then
	begin @<Package an unset box for the current column and
		record its width@>;
	@<Copy the tabskip glue between columns@>;
	if extra_info(cur_align)=cr_code then
		@<Package any omitted columns as dummies,
			then |return| with |fin_col=true|@>;
@<Get the next non-blank non-call token@>;
init_col; fin_col←false;
exit: end;

@ @<If the preamble list has been traversed, check that the row has ended@>=
if (p=null)∧(extra_info(cur_align)≠cr_code) then
	begin print_nl("! Extra alignment tab has been changed to \cr");
@.Extra alignment tab...@>
	help3("You have given more \span or & marks than there were")@/
	("in the preamble to the \halign or \valign now in progress.")@/
	("So I'll assume that you meant to type \cr instead.");
	extra_info(cur_align)←cr_code; error;

@ @<Copy the tabskip glue...@>=

@ @<Package any omitted columns...@>=
begin cur_align←p;
while cur_align≠null do
	begin tail_append(new_null_box);
	if width(cur_align)<0 then width(cur_align)←0;
	@<Copy the tabskip glue...@>;
fin_col←true; return;

@ @<Package an unset...@>=
begin if mode=-hmode then
	begin u←hpack(head,natural); w←width(u);
else	begin u←vpackage(head,natural,0); w←height(u);
n←min_quarterword; {this represents a span count of 1}
if cur_span≠cur_align then @<Update width entry for spanned columns@>
else if w>width(cur_align) then width(cur_align)←w;
type(u)←unset_node; span_count(u)←n;@/
@<Determine the stretch order@>;
glue_order(u)←o; glue_stretch(u)←total_stretch[o];@/
@<Determine the shrink order@>;
glue_sign(u)←o; glue_shrink(u)←total_shrink[o];@/
pop_nest; link(tail)←u; tail←u;

@ A span node is a 2-word record contining |width|, |info|, and |link|
fields. The |link| field is not really a link, it indicates the number of
spanned columns; the |info| field points to a span node for the same
starting column, having a greater extent of spanning, or to |end_span|,
which has the largest possible |link| field; the |width| field holds the
largest natural width corresponding to a particular set of spanned columns.

A list of the maximum widths so far, for spanned columns starting at a
given column, begins with the |info| field of the alignrecord for that

@d span_node_size=2 {number of |mem| words for a span node}

@<Initialize the special list heads...@>=
link(end_span)←max_quarterword+1; info(end_span)←null;

@ @<Update width entry for spanned columns@>=
repeat incr(n); q←link(link(q));
until q=cur_align;
if n>max_quarterword then confusion("256 spans"); {this can happen, but won't}
@:confusion 256 spans}{\quad 256 spans@>
q←cur_span; while link(info(q))<n do q←info(q);
if link(info(q))>n then
	begin s←get_node(span_node_size); info(s)←info(q); link(s)←n;
	info(q)←s; width(s)←w;
else if width(info(q))<w then width(info(q))←w

@ At the end of a row, we append an unset box to the current vlist (for
\.{\\halign}) or the current hlist (for \.{\\valign}). This unset box
contains the unset boxes for the columns, separated by the tabskip glue.
Everything will be set later.

@p procedure fin_row;
var p:pointer; {the new unset box}
begin if mode=-hmode then
	begin p←hpack(link(head),natural); pop_nest; append_to_vlist(p);
else	begin p←vpack(link(head),natural); pop_nest;
	link(tail)←p; tail←p; space_factor←1000;
type(p)←unset_node; glue_stretch(p)←0;

@ Finally, we will reach the end of the alignment, and we can breathe a
sigh of relief that memory hasn't overflowed. All the unset boxes will now be
set so that the columns line up, taking due account of spanned columns.

@p procedure@?do_assignments; forward;@t\2@>@/
procedure@?resume_after_display; forward;@t\2@>@/
procedure@?build_page; forward;@t\2@>@/
procedure fin_align;
var @!p,@!q,@!r,@!s,@!u: pointer; {registers for the list operations}
@!t:scaled; {width of column plus tabskip}
@!n:halfword; {matching span amount}
begin if cur_group≠align_group then confusion("align");
@:confusion align}{\quad align@>
@<Go through the preamble list, determining the column widths and
	changing the alignrecords to dummy unset boxes@>;
@<Package the preamble list, to determine the actual tabskip glue amounts,
	and let |p| point to this prototype box@>;
@<Set the glue in all the unset boxes of the current list@>;
flush_node_list(p); pop_alignment;
@<Insert the \(c)current list into its environment@>;
@<Declare the procedure called |align_peek|@>

@ It's time now to dismantle the preamble list and to compute the column
widths. Let $w↓{ij}$ be the maximum of the natural widths of all entries
that span columns $i$ through $j$, inclusive. The alignrecord for column@@$i$
contains $w↓{ii}$ in its |width| field, and there is also a linked list of
the nonzero $w↓{ij}$ for increasing $j$, accessible via the |info| field;
these span nodes contain the value $j-i-1+|min_quarterword|$ in their
|link| fields. The values of $w↓{ii}$ were initialized to |null_flag|, which
we regard as $-\infty$.

The final column widths are defined by the formula
$$w↓j=\max↓{1\L i\L j}\bigglp(w↓{ij}-\sum↓{i\L k<j}(t↓k+w↓k)\biggrp,$$
where $t↓k$ is the natural width of the tabskip glue between columns
$k$ and@@$k+1$. However, if $w↓{ij}=-\infty$ for all |i| in the range
|1≤i≤j| (i.e., if every entry that involved column@@|j| also involved
column@@|j+1|), we let $w↓j=0$.

\TeX\ computes these values by using the following scheme: First $w↓1=w↓{11}$.
Then replace $w↓{2j}$ by $\max(w↓{2j},w↓{1j}-t↓1-w↓1)$, for all $j>1$.
Then $w↓2=w↓{22}$. Then replace $w↓{3j}$ by $\max(w↓{3j},w↓{2j}-t↓2-w↓2)$
for all $j>2$; and so on. If any $w↓j$ turns out to be $-\infty$, its
value is changed to zero.

@<Go through the preamble list,...@>=
repeat flush_list(u_part(q)); flush_list(v_part(q));
if width(q)=null_flag then width(q)←0;
if (p≠null)∧(info(q)≠end_span) then
	@<Merge the widths in the span nodes of |q| with those of |p|,
		destroying the span nodes of |q|@>;
type(q)←unset_node; span_count(q)←min_quarterword; height(q)←width(q);
depth(q)←0; glue_order(q)←normal; glue_sign(q)←normal; q←p;
until q=null

@ Merging of two span-node lists is a typical exercise in the manipulation of
linearly linked data structures. The essential invariant in the following
|repeat| loop is that we want to dispense with node |r|, in |q|'s list,
and |u| is its successor; all nodes of |p|'s list up to and including |s|
have been processed, and the successor of |s| matches |r| or precedes |r|
or follows |r|, according as |link(r)=n| or |link(r)>n| or |link(r)<n|.

@<Merge the widths...@>=
begin t←width(q)+width(glue_ptr(link(q)));
r←info(q); s←end_span; info(s)←p; n←min_quarterword+1;
repeat width(r)←width(r)-t; u←info(r);
while link(r)>n do
	begin s←info(s); n←link(info(s))+1;
if link(r)<n then
	begin info(r)←info(s); info(s)←r; decr(link(r)); s←r;
else	begin if width(r)>width(info(s)) then width(info(s))←width(r);
until r=end_span;

@ Now the preamble list has been converted to a list of alternating unset
boxes and tabskip glue, where the box heights and widths are both equal
to the final column sizes. This list is both an hlist and a vlist, and
both |hpack| and |vpack| will set its glue in the same way; but we should
choose the correct alternative, because it is important to produce the
correct error messages when an alignment is overfull or underfull.
The following program changes the tabskip amounts to equivalent kerns.

@<Package the preamble list...@>=
if mode=-hmode then
else p←vpack(preamble,saved(1),saved(0));
repeat q←link(q); r←glue_ptr(q); type(q)←kern_node; subtype(q)←normal;
if glue_sign(p)=normal then width(q)←width(r)
else if glue_sign(p)=stretching then
	if stretch_order(r)≠glue_order(p) then width(q)←width(r)
	else width(q)←width(q)+round(glue_set(p)*stretch(r))
else if shrink_order(r)≠glue_order(p) then width(q)←width(r)
	else width(q)←width(q)-round(glue_set(p)*shrink(r));
delete_glue_ref(r); q←link(q);
until q=null

@ @<Set the glue in all the unset...@>=
while q≠null do
	begin if type(q)=unset_node then
		@<Set the unset box |q| and the unset boxes in it@>;

@ @<Set the unset box |q| and the unset boxes in it@>=
begin if mode=-vmode then
	begin type(q)←hlist_node; width(q)←width(p);
else	begin type(q)←vlist_node; height(q)←height(p);
glue_order(q)←glue_order(p); glue_sign(q)←glue_sign(p);
glue_set(q)←glue_set(p); shift_amount(q)←0;@/
r←link(list_ptr(q)); s←link(list_ptr(p));
repeat @<Set the glue in node |r| and change it from an unset node@>;
r←link(link(r)); s←link(link(s));
until r=null;

@ @<Set the glue in node |r|...@>=
n←span_count(r); t←width(s);
while n>min_quarterword do
	begin decr(n); s←link(s); t←t+width(s);
	s←link(s); t←t+width(s);
if mode=-vmode then
	@<Make the unset node |r| into an |hlist_node| of width |t|@>
else @<Make the unset node |r| into a |vlist_node| of height |t|@>;

@ @<Make the unset node |r| into an |hlist_node| of width |t|@>=
begin height(r)←height(q); depth(r)←depth(q);
if t=width(r) then
	begin glue_sign(r)←normal; glue_order(r)←normal;
else if t>width(r) then
	begin glue_sign(r)←stretching;
	if glue_stretch(r)=0 then glue_set(r)←0.0
	else glue_set(r)←(t-width(r))/glue_stretch(r);
else	begin glue_order(r)←glue_sign(r); glue_sign(r)←shrinking;
	if glue_shrink(r)=0 then glue_set(r)←0.0
	else glue_set(r)←(width(r)-t)/glue_shrink(r);
width(r)←t; type(r)←hlist_node;

@ @<Make the unset node |r| into a |vlist_node| of height |t|@>=
begin width(r)←width(q);
if t=height(r) then
	begin glue_sign(r)←normal; glue_order(r)←normal;
else if t>height(r) then
	begin glue_sign(r)←stretching;
	if glue_stretch(r)=0 then glue_set(r)←0.0
	else glue_set(r)←(t-height(r))/glue_stretch(r);
else	begin glue_order(r)←glue_sign(r); glue_sign(r)←shrinking;
	if glue_shrink(r)=0 then glue_set(r)←0.0
	else glue_set(r)←(height(r)-t)/glue_shrink(r);
height(r)←t; type(r)←vlist_node;

@ We now have a completed alignment, in the list that starts at |head|
and ends at |tail|. This list will be merged with the one that encloses
it. (In case the enclosing mode is |mmode|, for displayed formulas,
another routine will take care of inserting glue before and after the display.)

@<Insert the \(c)current list into its environment@>=
t←aux; p←link(head); q←tail; pop_nest;
if mode=mmode then @<Finish an alignment in a display@>
else	begin aux←t; link(tail)←p;
	if q≠null then tail←q;
	if mode=vmode then build_page;
@* \[37] Breaking paragraphs into lines.
We come now to what is probably the most interesting algorithm of \TeX:
the mechanism for choosing the ``best possible'' breakpoints that yield
the individual lines of a paragraph. \TeX's line-breaking algorithm takes
a given horizontal list and converts it to a sequence of boxes that are
appended to the current vertical list. In the course of doing this, it
creates a special data structure containing three kinds of records that are
not used elsewhere in \TeX. Such nodes are created while a paragraph is
being processed, and they are destroyed afterwards; thus, the other parts
of \TeX\ do not need to know anything about how line-breaking is done.

The method used here is based on an approach devised by Michael F. Plass and
@^Plass, Michael Frederick@>
@^Knuth, Donald Ervin@>
the author in 1977, subsequently generalized and improved by the same two
people in 1980. A detailed discussion appears in {\sl SOFTWARE---Practice
\AM\ Experience \bf11} (1981), 1119--1184, where it is shown that the
line-breaking problem can be regarded as a special case of the problem of
computing the shortest path in an acyclic network. The cited paper includes
numerous examples and describes the history of line breaking as it has been
practiced by printers through the ages. The present implementation adds two
new ideas to the algorithm of 1980: memory space requirements are considerably
reduced by using smaller records for inactive nodes than for active ones,
and arithmetic overflow is avoided by using ``delta distances'' instead of
keeping track of the total distance from the beginning of the paragraph to the
current point.

@ The |line_break| procedure should be invoked only in horizontal mode; it
leaves that mode and places its output into the current vlist of the
enclosing vertical mode (or internal vertical mode).
There is one explicit parameter:  |final_widow_penalty| is the amount of
additional penalty to be inserted before the final line of the paragraph.

There are also a number of implicit parameters: The hlist to be broken
starts at |link(head)|, and it is nonempty. The value of |already| in the
enclosing semantic level tells where the paragraph should begin in the
sequence of line numbers, in case hanging indentation or \.{\\parshape}
are in use; |already| is zero unless this paragraph is being continued
after a displayed formula.  Other implicit parameters, such as the
|par_shape_ptr| and various penalties to use for hyphenation, etc., appear
in |eqtb|.

After |line_break| has acted, it will have updated the current vlist and the
value of |already|. Furthermore, the global variable |just_box| will
point to the final box created by |line_break|, so that the width of this
line can be ascertained when it is necessary to decide whether to use
|disp_skip| or |disp_a_skip| before a displayed formula.

@!just_box:pointer; {the |hlist_node| for the last line of the new paragraph}

@ Since |line_break| is a rather lengthy procedure---sort of a small world unto
itself---we must build it up little by little, somewhat more cautiously
than we have done with the simpler procedures of \TeX. Here is the
general outline.

@p@t\4@>@<Declare subprocedures for |line_break|@>
procedure line_break(@!final_widow_penalty:integer); 
label done,done1,done2,done3,done4;
var@?@<Local variables for line breaking@>@;
begin par_begin_line←mode_line; {this is for over/underfull box messages}
@<Get ready to start line breaking@>;
@<Find optimal breakpoints@>;
@<Break the paragraph at the chosen breakpoints, justify the resulting lines
to the correct widths, and append them to the current vertical list@>;
@<Clean up the memory by removing the break nodes@>;

@ The first task is to move the list from |head| to |temp_head| and go
into the enclosing semantic level. We also append the \.{\\parfillskip}
glue to the end of the paragraph, removing a space (or other glue node) if
it was there, since spaces usually precede blank lines and instances of
`\.{\$\$}'. The |par_fill_skip| is preceded by an infinite penalty, so
it will never be considered as a potential breakpoint.

This code assumes that a |glue_node| and a |penalty_node| occupy the
same number of words in |mem|.

@<Get ready...@>=
if is_char_node(tail) then tail_append(new_penalty(inf_penalty))
else if (type(tail)≠glue_node)∨(subtype(tail)≥a_leaders) then
else	begin type(tail)←penalty_node; delete_glue_ref(glue_ptr(tail));

@ When looking for optimal line breaks, \TeX\ creates a ``break node'' for
each break that is {\sl feasible}, in the sense that there is a way to end
a line at the given place without requiring any line to stretch more than
a given tolerance. A break node is characterized by three things: the position
of the break (which is a pointer to a |glue_node|, |math_node|, |penalty_node|,
or |disc_node|); the ordinal number of the line that will follow this
breakpoint; and the fitness classification of the line that has just
ended, i.e., |tight_fit|, |decent_fit|, |loose_fit|, or |very_loose_fit|.

@d tight_fit=0 {fitness classification for lines shrinking 0.5 to 1.0 of their
@d loose_fit=2 {fitness classification for lines stretching 0.5 to 1.0 of their
@d very_loose_fit=3 {fitness classification for lines stretching more than
	their stretchability}
@d decent_fit=1 {fitness classification for all other lines}

@ The algorithm essentially determines the best possible way to achieve
each feasible combination of position, line, and fitness. Thus, it answers
questions like, ``What is the best way to break the opening part of the
paragraph so that the fourth line is a tight line ending at such-and-such
a place?'' However, the fact that all lines are to be the same length
after a certain point makes it possible to regard all sufficiently large
line numbers as equivalent, when the looseness parameter is zero, and this
makes it possible for the algorithm to save space and time.

An ``active node'' and a ``passive node'' are created in |mem| for each
feasible breakpoint that needs to be considered. Active nodes are three
words long and passive nodes are two words long. We need active nodes only
for breakpoints near the place in the paragraph that is currently being
examined, so they are recycled within a comparatively short time after
they are created.

@ An active node for a given breakpoint contains six fields:

\yskip\hang|link| points to the next node in the list of active nodes; the
last active node has |link=last_active|.

\yskip\hang|break_node| points to the passive node associated with this

\yskip\hang|line_number| is the number of the line that follows this

\yskip\hang|fitness| is the fitness classification of the line ending at this

\yskip\hang|type| is either |hyphenated| or |unhyphenated|, de\-pending on
whether this breakpoint is a |disc_node|.

\yskip\hang|total_demerits| is the minimum possible sum of demerits over all
lines leading from the beginning of the paragraph to this breakpoint.

The value of |link(active)| points to the first active node on a linked list
of all currently active nodes. This list is in order by |line_number|,
except that nodes with |line_number>easy_line| may be in any order relative
to each other.

@d active_node_size=3 {number of words in active nodes}
@d fitness==subtype {|tight_fit..very_loose_fit| on final line for this break}
@d break_node==rlink {pointer to the corresponding passive node}
@d line_number==llink {line that begins at this breakpoint}
@d total_demerits(#)==mem[#+2].int {the quantity that \TeX\ minimizes}
@d unhyphenated=0 {the |type| of a normal active break node}
@d hyphenated=1 {the |type| of an active node that breaks at a |disc_node|}
@d last_active==active {the active list ends where it begins}

@ @<Initialize the special list heads...@>=
type(last_active)←hyphenated; line_number(last_active)←max_halfword;
subtype(last_active)←0; {the |subtype| is never examined by the algorithm}

@ The passive node for a given breakpoint contains only three fields:

\yskip\hang|link| points to the passive node created just before this one,
if any, otherwise it is |null|.

\yskip\hang|cur_break| points to the position of this breakpoint in the
horizontal list for the paragraph being broken.

\yskip\hang|prev_break| points to the passive node that should precede this
one in an optimal path to this breakpoint.

There is a global variable called |passive| that points to the most
recently created passive node.

@d passive_node_size=2 {number of words in passive nodes}
@d cur_break==mark_ptr {in passive node, points to position of this breakpoint}
@d prev_break==info {points to passive node that should precede this one}

@!passive:pointer; {most recent node on passive list}

@ The active list also contains ``delta'' nodes that help the algorithm
compute the badness of individual lines. Such nodes appear only between two
active nodes, and they have |type=delta|. If |p| and |r| are active nodes
and if |q| is a delta node between them, so that |link(p)=q| and |link(q)=r|,
then |q| tells the space difference between lines in the horizontal list that
start after breakpoint |p| and lines that start after breakpoint |r|. In
other words, if we know the length of the line that starts after |p| and
ends at our current position, then the corresponding length of the line that
starts after |r| is obtained by adding the amounts in node@@|q|. A delta node
contains six scaled numbers, since it must record the net change in glue
stretchability with respect to all orders of infinity. The natural width
difference appears in |mem[q+1].sc|; the stretch differences in units of
pt, fil, fill, and filll appear in |mem[q+2..q+5].sc|; and the shrink difference
appears in |mem[q+6].sc|. The |subtype| field of a delta node is not used.

@d delta_node_size=7 {number of words in a delta node}
@d delta_node=2 {|type| field in a delta node}

@ As the algorithm runs, it maintains a set of six delta-like registers
for the length of the line following the first active breakpoint to the
current position in the given hlist. When it makes a pass through the
active list, it also maintains a similar set of six registers for the
length following the active breakpoint of current interest. A third set
holds the length of an empty line (namely, the sum of \.{\\leftskip} and
\.{\\rightskip}); and a fourth set is used to create new delta nodes.

When we pass a delta node we want to do operations like `\!|for k←1 to 6 do
cur_active_width[k]←cur_active_width[k]+mem[q+k].sc|', and we want to
do this without the overhead of |for| loops. The |do_all_six| macro
makes such six-tuples convenient.

@d do_all_six(#)==#(1);#(2);#(3);#(4);#(5);#(6)

@!active_width:array[1..6] of scaled;
	{distance from first active node to@@|cur_p|}
@!cur_active_width:array[1..6] of scaled; {distance from current active node}
@!background:array[1..6] of scaled; {length of an ``empty'' line}
@!break_width:array[1..6] of scaled; {length being computed after current break}

@ Let's state the principles of the delta nodes more precisely and concisely,
so that the following programs will be less obscure. For each legal
breakpoint@@$p$ in the paragraph, we define two quantities $\alpha(p)$ and
$\beta(p)$ such that the length of material in a line from breakpoint@@$p$
to breakpoint@@$q$ is $\gamma+\beta(q)-\alpha(p)$, for some fixed $\gamma$.
Intuitively, $\alpha(p)$ and $\beta(q)$ are the total length of material from
the beginning of the paragraph to a point ``after'' a break at $p$ and to a
point ``before'' a break at $q$; and $\gamma$ is the width of an empty line,
namely the length contributed by \.{\\leftskip} and \.{\\rightskip}.

Suppose, for example, that the paragraph consists entirely of alternating
boxes and glue skips; let the boxes have widths $x↓1\ldotsm x↓n$ and
let the skips have widths $y↓1\ldotsm y↓n$, so that the paragraph can be
represented by $x↓1y↓1\ldotsm x↓ny↓n$. Let $p↓i$ be the legal breakpoint
at $y↓i$; then $\alpha(p↓i)=x↓1+y↓1+\cdots+x↓i+y↓i$, and $\beta(p↓i)=
x↓1+y↓1+\cdots+x↓i$. To check this, note that the length of material from
$p↓2$ to $p↓5$, say, is $\gamma+x↓3+y↓3+x↓4+y↓4+x↓5=\gamma+\beta(p↓5)

The quantities $\alpha$, $\beta$, $\gamma$ involve glue stretchability and
shrinkability as well as a natural width. If we were to compute $\alpha(p)$
and $\beta(p)$ for each $p$, we would need multiple precision arithmetic, and
the multiprecise numbers would have to be kept in the active nodes.
\TeX\ avoids this problem by working entirely with relative differences
or ``deltas.'' Suppose, for example, that the active list contains
$a↓1\,\delta↓1\,a↓2\,\delta↓2\,a↓3$, where the $a$'s are active breakpoints
and the $\delta$'s are delta nodes. Then $\delta↓1=\alpha(a↓1)-\alpha(a↓2)$
and $\delta↓2=\alpha(a↓2)-\alpha(a↓3)$. If the line breaking algorithm is
currently positioned at some other breakpoint $p$, the |active_width| array
contains the value $\gamma+\beta(p)-\alpha(a↓1)$. If we are scanning through
the list of active nodes and considering a tentative line that runs from
$a↓2$ to@@$p$, say, the |cur_active_width| array will contain the value
$\gamma+\beta(p)-\alpha(a↓2)$. Thus, when we move from $a↓2$ to $a↓3$,
we want to add $\alpha(a↓2)-\alpha(a↓3)$ to |cur_active_width|; and this
is just $\delta↓2$, which appears in the active list between $a↓2$ and
$a↓3$. The |background| array contains $\gamma$. The |break_width| array
will be used to calculate values of new delta nodes when the active
list is being updated.

@ Glue nodes in a horizontal list that is being paragraphed are not supposed to
include ``infinite'' shrinkability; that is why the algorithm maintains
four registers for stretching but only one for shrinking. If the user tries to
introduce infinite shrinkability, the shrinkability will be reset to finite
and an error message will be issued. A boolean variable |no_shrink_error_yet|
prevents this error message from appearing more than once per paragraph.

@d check_shrinkage(#)==if (shrink_order(#)≠normal)∧(shrink(#)≠0) then
	begin #←finite_shrink(#);

@!no_shrink_error_yet:boolean; {have we complained about infinite shrinkage?}

@ @<Declare subprocedures for |line_break|@>=
function finite_shrink(@!p:pointer):pointer; {recovers from infinite shrinkage}
var q:pointer; {new glue specification}
begin if no_shrink_error_yet then
	begin no_shrink_error_yet←false;
	help5("The paragraph just ended includes some glue that has")@/
	("infinite shrinkability, e.g., `\hskip 0pt minus 1fil'.")@/
	("Such glue doesn't belong there---it allows a paragraph")@/
	("of any length to fit on one line. But it's safe to proceed,")@/
	("since the offensive shrinkability has been made finite.");
	print_nl("! Infinite glue shrinkage found in a paragraph"); error;
@.Infinite glue shrinkage...@>
q←new_spec(p); shrink_order(q)←normal;
delete_glue_ref(p); finite_shrink←q;

@ @<Get ready...@>=
check_shrinkage(left_skip); check_shrinkage(right_skip);@/
q←left_skip; r←right_skip; background[1]←width(q)+width(r);@/
background[2]←0; background[3]←0; background[4]←0; background[5]←0;@/

@ A pointer variable |cur_p| runs through the given horizontal list as we look
for breakpoints. This variable is global, since it is used both by |line_break|
and by its subprocedure |try_break|.

Another global variable called |threshold| is used to determine the feasibility
of individual lines: breakpoints are feasible if there is a way to reach
them without creating lines whose badness exceeds |threshold|.  (The
badness is compared to |threshold| before penalties are added, so that
penalty values do not affect the feasibility of breakpoints, except that
no break is allowed when the penalty is 10000 or more.) If |threshold|
is 10000 or more, all legal breaks are considered feasible, since the
|badness| function specified above never returns a value greater than@@10000.

Two passes might be made through the paragraph in an attempt to find at
least one set of feasible breakpoints. On the first pass, we have
|threshold=pretolerance| and |second_pass=false|. If this pass fails to find a
feasible solution, |threshold| is set to |tolerance|, |second_pass| is set
|true|, and an attempt is made to hyphenate as many words as possible.

@!cur_p:pointer; {the current breakpoint under consideration}
@!second_pass:boolean; {is this our second attempt to break this paragraph?}
@!threshold:integer; {maximum badness on feasible lines}

@ The heart of the line-breaking procedure is `|try_break|', a subroutine
that tests if the current breakpoint |cur_p| is feasible, by running
through the active list to see what lines of text can be made from active
nodes to@@|cur_p|.  If feasible breaks are possible, new break nodes are
created.  If |cur_p| is too far from an active node, that node is

The parameter |pi| to |try_break| is the penalty associated
with a break at |cur_p|; we have |pi=eject_penalty| if the break is forced,
and |pi=inf_penalty| if the break is illegal.

The other parameter, |break_type|, is set to |hyphenated| or |unhyphenated|,
de\-pending on whether or not the current break is at a |disc_node|. The
end of a para\-graph is also regarded as `|hyphenated|'; this case is
distinguishable by the condition |cur_p=null|.

@d copy_to_cur_active(#)==cur_active_width[#]←active_width[#]
@d deactivate=60 {go here when node |r| should be deactivated}

@<Declare subprocedures for |line_break|@>=
procedure try_break(@!pi:integer;@!break_type:small_number);
label exit,done,continue,deactivate;
var r:pointer; {runs through the active list}
@!prev_r:pointer; {stays a step behind |r|}
@!old_l:halfword; {maximum line number in current equivalence class of lines}
@!no_break_yet:boolean; {have we found a feasible break at |cur_p|?}
@<Other local variables for |try_break|@>@;
begin @<Make sure that |pi| is in the proper range@>;
no_break_yet←true; prev_r←active; old_l←0;
loop@+	begin continue: r←link(prev_r);
	@<If node |r| is of type |delta_node|, update |cur_active_width|,
		set |prev_r| and |prev_prev_r|, then |goto continue|@>;
	@<If a line number class has ended, create new active nodes for
		the best feasible breaks in that class; then |return|
		if |r=last_active|, otherwise compute the new |line_width|@>;
	@<Consider the demerits for a line from |r| to |cur_p|;
		deactivate node |r| if it should no longer be active;
		then |goto continue| if a line from |r| to |cur_p| is infeasible,
		otherwise record a new feasible break@>;

@ @<Other local variables for |try_break|@>=
@!prev_prev_r:pointer; {a step behind |prev_r|, if |type(prev_r)=delta_node|}
@!s:pointer; {runs through nodes ahead of |cur_p|}
@!q:pointer; {points to a new node being created}
@!v:pointer; {points to a glue specification}
@!t:quarterword; {replacement count, if |cur_p| is a discretionary node}
@!f:internal_font_number; {used in character width calculation}
@!l:halfword; {line number of current active node}
@!node_r_stays_active:boolean; {should node |r| remain in the active list?}
@!line_width:scaled; {the current line will be justified to this width}
@!fit_class:tight_fit..very_loose_fit; {possible fitness class of test line}
@!b:halfword; {badness of test line}
@!d:integer; {demerits of test line}

@ @<Make sure that |pi| is in the proper range@>=
if abs(pi)≥inf_penalty then
	if pi>0 then return {this breakpoint is inhibited by infinite penalty}
	else pi←eject_penalty {this breakpoint will be forced}

@ The following code uses the fact that |type(last_active)≠delta_node|.

@d update_width(#)==@|

@<If node |r|...@>=
if type(r)=delta_node then
	begin do_all_six(update_width);
	prev_prev_r←prev_r; prev_r←r; goto continue;

@ As we consider various ways to end a line at |cur_p|, in a given line number
class, we keep track of the best total demerits known, in an array with
one entry for each of the fitness classifications. For example,
|minimal_demerits[tight_fit]| contains the fewest total demerits of feasible
line breaks ending at |cur_p| with a |tight_fit| line; |best_place[tight_fit]|
points to the passive node for the break before@@|cur_p| that achieves such
an optimum; and |best_pl_line[tight_fit]| is the |line_number| field in the
active node corresponding to |best_place[tight_fit]|. When no feasible break
sequence is known, the |minimal_demerits| entries will be equal to
|awful_bad|, which is $2^{30}-1$. Another variable, |minimum_demerits|,
keeps track of the smallest value in the |minimal_demerits| array.

@d awful_bad==@'7777777777 {more than a billion demerits}

@!minimal_demerits:array[tight_fit..very_loose_fit] of scaled; {best total
	demerits known for current line class and position, given the fitness}
@!minimum_demerits:scaled; {best total demerits known for current line class
	and position}
@!best_place:array[tight_fit..very_loose_fit] of pointer; {how to achieve
@!best_pl_line:array[tight_fit..very_loose_fit] of halfword; {corresponding
	line number}

@ @<Get ready...@>=

@ The first part of the following code is part of \TeX's inner loop, so
we don't want to waste any time. The current active node, namely node |r|,
contains the line number that will be considered next. At the end of the
list we have arranged the data structure so that |r=last_active| and
@^inner loop@>

@<If a line number class...@>=
begin l←line_number(r);
if l>old_l then
	begin {now we are no longer in the inner loop}
	if (minimum_demerits<awful_bad)∧@|
			((old_l≠easy_line)∨(r=last_active)) then
		@<Create new active nodes for the best feasible breaks
			just found@>;
	if r=last_active then return;
	@<Compute the new line width@>;

@ It is not necessary to create new active nodes having |minimal_demerits
>minimum_demerits+abs(adj_demerits)|, since such active nodes will never
be chosen in the final paragraph breaks. This observation allows us to
omit a substantial number of feasible breakpoints from further consideration.

@<Create new active nodes...@>=
begin if no_break_yet then @<Compute the values of |break_width|@>;
@<Insert a delta node to prepare for breaks at |cur_p|@>;
for fit_class←tight_fit to very_loose_fit do
	begin if minimal_demerits[fit_class]≤minimum_demerits then
		@<Insert a new active node
			from |best_place[fit_class]| to |cur_p|@>;
@<Insert a delta node to prepare for the next active node@>;

@ When we insert a new active node for a break at |cur_p|, suppose this
new node is to be placed just before active node |a|; then we essentially
want to insert `$\delta\,|cur_p|\,\delta^\prime$' before $a$, where
$\delta=\alpha(a)-\alpha(|cur_p|)$ and $\delta^\prime=\alpha(|cur_p|)-\alpha(a)$
in the notation explained above.  The |cur_active_width| array now hold
$\gamma+\beta(|cur_p|)-\alpha(a)$; so $\delta$ can be obtained by
subtracting |cur_active_width| from the quantity $\gamma+\beta(|cur_p|)-
\alpha(|cur_p|)$. The latter quantity can be regarded as the length of a
line ``from |cur_p| to |cur_p|''; we call it the |break_width| at |cur_p|.

The |break_width| is usually negative, since it consists of the background
(which is normally zero) minus the width of nodes following@@|cur_p| that are
eliminated after a break. If, for example, node |cur_p| is a glue node, the
width of this glue is subtracted from the background; and we also look
ahead to eliminate all subsequent glue and penalty and kern and math
nodes, subtracting their widths as well.

@d set_break_width_to_background(#)==break_width[#]←background[#]

@<Compute the values of |break...@>=
begin no_break_yet←false; do_all_six(set_break_width_to_background);
if (break_type=unhyphenated)∨(cur_p=null) then
	begin s←cur_p;
	while s≠null do
		begin if is_char_node(s) then goto done;
		case type(s) of
		glue_node:@<Subtract glue from |break_width|@>;
		penalty_node: do_nothing;
		math_node,kern_node: break_width[1]←break_width[1]-width(s);
		othercases goto done
else @<Compute the discretionary |break_width| values@>;
done: end

@ @<Subtract glue from |break...@>=
begin v←glue_ptr(s); break_width[1]←break_width[1]-width(v);

@ When |cur_p| is a discretionary break, the length of a line ``from |cur_p| to
|cur_p|'' has to be defined properly so that the other calculations work out.
Suppose that the pre-break text at |cur_p| has length $l↓0$, the post-break
text has length $l↓1$, and the replacement text has length $l$. Suppose
also that |q| is the node following the replacement text. Then length of a
line from |cur_p| to |q| will be computed as $\gamma+\beta(q)-\alpha(|cur_p|)$,
where $\beta(q)=\beta(|cur_p|)-l↓0+l$. The actual length will be the background
plus $l↓1$, so the length from |cur_p| to |cur_p| should be $\gamma+l↓0+l↓1-l$.

The value of $l↓0$ need not be computed, since |line_break| will put
it into the global variable |disc_width| before calling |try_break|.

@!disc_width:scaled; {the length of discretionary material preceding a break}

@ @<Compute the discretionary |break...@>=
begin t←replace_count(cur_p); s←cur_p;
while t>0 do
	begin decr(t); s←link(s);
	@<Subtract the width of node |s| from |break_width|@>;
while s≠null do
	begin @<Add the width of node |s| to |break_width|@>;

@ Replacement texts and discretionary texts are supposed to contain
only character nodes, kern nodes, and ligature nodes.

@<Subtract the width of node |s|...@>=
if is_char_node(s) then
	begin f←font(s);
else	case type(s) of
	ligature_node: begin f←font(lig_char(s));@/
	kern_node: break_width[1]←break_width[1]-width(s);
	othercases confusion("disc1")
@:confusion disc1}{\quad disc1@>

@ @<Add the width of node |s| to |b...@>=
if is_char_node(s) then
	begin f←font(s);
else	case type(s) of
	ligature_node: begin f←font(lig_char(s));
	kern_node: break_width[1]←break_width[1]+width(s);
	othercases confusion("disc2")
@:confusion disc2}{\quad disc2@>

@ We use the fact that |type(active)≠delta_node|.

@d convert_to_break_width(#)==@|
@d store_break_width(#)==active_width[#]←break_width[#]
@d new_delta_to_break_width(#)==@|

@<Insert a delta node to prepare for breaks at |cur_p|@>=
if type(prev_r)=delta_node then {modify an existing delta node}
	begin εo_all_six(convert_to_break_width);
else if prev_r=active then {no delta node needed at the beginning}
	begin do_all_six(store_break_width);
else	begin q←get_node(delta_node_size); link(q)←r; type(q)←delta_node;
	subtype(q)←0; {the |subtype| is not used}
	link(prev_r)←q; prev_prev_r←prev_r; prev_r←q;

@ When the following code is performed, we will have just inserted at
least one active node before |r|, so |type(prev_r)≠delta_node|.

@d new_delta_from_break_width(#)==@|mem[q+#].sc←

@<Insert a delta node to prepare for the next active node@>=
if r≠last_active then
	begin q←get_node(delta_node_size); link(q)←r; type(q)←delta_node;
	subtype(q)←0; {the |subtype| is not used}
	link(prev_r)←q; prev_prev_r←prev_r; prev_r←q;

@ When we create an active node, we also create the corresponding
passive node.

@<Insert a new active node from |best_place[fit_class]| to |cur_p|@>=
begin q←get_node(passive_node_size);
link(q)←passive; passive←q; cur_break(q)←cur_p;
q←get_node(active_node_size); break_node(q)←passive;
fitness(q)←fit_class; type(q)←break_type;
link(q)←r; link(prev_r)←q; prev_r←q;

@ The length of lines depends on whether the user has specified
\.{\\parshape} or \.{\\hangindent}. If |par_shape_ptr| is not null, it
points to a $(2n+1)$-word record in |mem|, where the |info| in the first
word contains the value of |n|, and the other $2n$ words contain the left
margins and line lengths for the first |n| lines of the paragraph; the
specifications for line |n| apply to all subsequent lines. If
|par_shape_ptr=null|, the shape of the paragraph depends on the value of
|n=hang_after|; if |n≥0|, hanging indentation takes place on lines |n+1|,
|n+2|, $\ldotss$, otherwise it takes place on lines 1, $\ldotss$, $\leftv
n\rightv$. When hanging indentation is active, the left margin is
|hanging_indent|, if |hanging_indent≥0|, else it is 0; the line length is
$|hsize|-\leftv|hanging_indent|\rightv$. The normal setting is
|par_shape_ptr=null|, |hang_after=0|, and |hanging_indent=0|.
Note that if |hanging_indent=0|, the value of |hang_after| is irrelevant.
@^length of lines@> @^hanging indentation@>

@!easy_line:halfword; {line numbers |>easy_line| are equivalent in break nodes}
@!last_special_line:halfword; {line numbers |>last_special_line| all have
	the same width}
@!first_width:scaled; {the width of all lines |≤last_special_line|, if
	no \.{\\parshape} has been specified}
@!second_width:scaled; {the width of all lines |>last_special_line|}
@!first_indent:scaled; {left margin to go with |first_width|}
@!second_indent:scaled; {left margin to go with |second_width|}

@ We compute the values of |easy_line| and the other local variables relating
to line length when the |line_break| procedure is initializing itself.

@<Get ready...@>=
if par_shape_ptr=null then
	if hanging_indent=0 then
		begin last_special_line←0; first_width←hsize;
	else @<Set line length parameters in preparation for hanging indentation@>
else	begin last_special_line←info(par_shape_ptr)-1;
if looseness=0 then easy_line←last_special_line
else easy_line←max_halfword

@ @<Set line length parameters in preparation for hanging indentation@>=
begin last_special_line←abs(hang_after);
if hang_after<0 then
	begin first_width←hsize-abs(hanging_indent);
	if hanging_indent≥0 then first_indent←hanging_indent
	else first_indent←0;
	second_width←hsize; second_indent←0;
else	begin first_width←hsize; first_indent←0;
	if hanging_indent≥0 then second_indent←hanging_indent
	else second_indent←0;

@ When we come to the following code, we have just encountered the first
active node@@|r| whose |line_number| field contains |l|. Thus we want to
compute the length of the $l\,$th line of the current paragraph. Furthermore
we want to set |old_l| to the last number in the class of line numbers
equivalent to@@|l|.

@<Compute the new line width@>=
if l>easy_line then
	begin line_width←second_width; old_l←max_halfword-1;
else	begin old_l←l;
	if l>last_special_line then line_width←second_width
	else if par_shape_ptr=null then line_width←first_width
	else line_width←mem[par_shape_ptr+2*l@,].sc;

@ The remaining part of |try_break| deals with the calculation of
demerits for a break from |r| to |cur_p|.

The first thing to do is calculate the badness, |b|. This value will always
be between zero and |inf_bad+1|; the latter value occurs only in the
case of lines from |r| to |cur_p| that cannot shrink enough to fit the necessary
width. In such cases, node |r| will be deactivated.
We also deactivate node@@|r| when a break at@@|cur_p| is forced, since future
breaks must go through a forced break.

@<Consider the demerits for a line from |r| to |cur_p|...@>=
begin if cur_active_width[1]<line_width then
	@<Set the value of |b| to the badness for stretching the line,
		and compute the corresponding |fit_class|@>
else @<Set the value of |b| to the badness for shrinking the line,
		and compute the corresponding |fit_class|@>;
if (b>inf_bad)∨(pi=eject_penalty) then
	@<Prepare to deactivate node@@|r|, and |goto deactivate| unless
		there is a reason to consider lines of text from |r| to |cur_p|@>
else	begin prev_r←r;
	if b>threshold then goto continue;
@<Record a new feasible break@>;
if node_r_stays_active then goto continue; {|prev_r| has been set to |r|}
deactivate: @<Deactivate node |r|@>;

@ When a line must stretch, the available stretchability appears in the
subarray |cur_active_width[2..5]|, in units of points, fil, fill, and filll.

@<Set the value of |b| to the badness for stretching...@>=
if (cur_active_width[3]≠0)∨(cur_active_width[4]≠0)∨@|
	(cur_active_width[5]≠0) then
	begin b←0; fit_class←decent_fit; {infinite stretch}
else	begin b←badness(line_width-cur_active_width[1],cur_active_width[2]);
	if b>12 then
		if b>99 then fit_class←very_loose_fit
		else fit_class←loose_fit
	else fit_class←decent_fit;

@ Shrinkability is never infinite in a paragraph;
we can shrink the line from |r| to |cur_p| by at most |cur_active_width[6]|.

@<Set the value of |b| to the badness for shrinking...@>=
begin if cur_active_width[1]-line_width>cur_active_width[6] then
else b←badness(cur_active_width[1]-line_width,cur_active_width[6]);
if b>12 then fit_class←tight_fit@+else fit_class←decent_fit;

@ During the second pass, we dare not lose all active nodes, lest we lose
touch with the line breaks already found. The code shown here makes sure
that such a catastrophe does not happen, by permitting overfull boxes as
a last resort. This particular part of \TeX\ was a source of several subtle
bugs before the correct program logic was finally discovered; readers
who seek to ``improve'' \TeX\ should therefore think thrice before daring
to make any changes here.
@^overfull boxes@>

@<Prepare to deactivate node@@|r|, and |goto deactivate| unless...@>=
begin if second_pass ∧ (minimum_demerits=awful_bad) ∧@|
	 (link(r)=last_active) ∧
	(prev_r=active) then b←0 {set badness zero, this break is forced}
else if b>threshold then goto deactivate;

@ When we get to this part of the code, the line from |r| to |cur_p| is
feasible, its badness is@@|b|, and its fitness classification is |fit_class|.
We don't want to make an active node for this break yet, but we will
compute the total demerits and record them in the |minimal_demerits| array,
if such a break is the current champion among all ways to get to |cur_p|
in a given line-number class and fitness class.

@<Record a new feasible break@>=
@<Compute the fewest total demerits, |d|,
	from the beginning to |cur_p| via@@|r|@>;
if d≤minimal_demerits[fit_class] then
	begin minimal_demerits[fit_class]←d;
	best_place[fit_class]←break_node(r); best_pl_line[fit_class]←l;
	if d<minimum_demerits then minimum_demerits←d;

@ @<Compute the fewest total demerits...@>=
if pi≥0 then
	begin d←(line_penalty+b+pi); d←d*d;
else	begin d←line_penalty+b; d←d*d;
	if pi>eject_penalty then d←d-pi*pi;
if (break_type=hyphenated)∧(type(r)=hyphenated) then
	if cur_p≠null then d←d+double_hyphen_demerits
	else d←d+final_hyphen_demerits;
if abs(fit_class-fitness(r))>1 then d←d+adj_demerits

@ When an active node disappears, we must delete an adjacent delta node if the
active node was at the beginning or the end of the active list, or if it
was surrounded by delta nodes. We also must preserve the property that
|cur_active_width| represents the length of material from |link(prev_r)|

@d combine_two_deltas(#)==@|mem[prev_r+#].sc←mem[prev_r+#].sc+mem[r+#].sc
@d downdate_width(#)==@|cur_active_width[#]←cur_active_width[#]-

@<Deactivate node |r|@>=
link(prev_r)←link(r); free_node(r,active_node_size);
if prev_r=active then @<Update the active widths, since the first active
	node has been deleted@>
else if type(prev_r)=delta_node then
	begin r←link(prev_r);
	if r=last_active then
		begin do_all_six(downdate_width);
		free_node(prev_r,delta_node_size); prev_r←prev_prev_r;
	else if type(r)=delta_node then
		begin do_all_six(update_width);
		link(prev_r)←link(r); free_node(r,delta_node_size);

@ The following code uses the fact that |type(last_active)≠delta_node|. If the
active list has just become empty, we do not need to update the
|active_width| array, since it will be initialized when an active
node is next inserted.

@d update_active(#)==active_width[#]←active_width[#]+mem[r+#].sc

@<Update the active widths,...@>=
begin r←link(active);
if type(r)=delta_node then
	begin do_all_six(update_active);
	link(active)←link(r); free_node(r,delta_node_size);
@* \[38] Breaking paragraphs into lines, continued.
So far we have gotten a little way into the |line_break| routine, having
covered its important |try_break| subroutine. Now let's consider the
rest of the process.

The main loop of |line_break| traverses the hlist of the given paragraph,
starting at |link(temp_head)|, and calls |try_break| at each legal
breakpoint. A variable called |auto_breaking| is set to true except
within math formulas, since glue nodes are not legal breakpoints when
they appear in formulas.

The current node of interest in the hlist is pointed to by |cur_p|. Another
local variable, |prev_p|, is usually one step behind |cur_p|, but the real
meaning of |prev_p| is this: If |type(cur_p)=glue_node| then |cur_p| is a legal
breakpoint if and only if |auto_breaking| is true and |prev_p| does not
point to a glue node, penalty node, kern node, or math node.

The following declarations provide for a few other local variables that are
used in special calculations.

@<Local variables for line breaking@>=
@!auto_breaking:boolean; {is node |cur_p| outside a formula?}
@!prev_p:pointer; {helps to determine when glue nodes are breakpoints}
@!q,@!r,@!s:pointer; {miscellaneous nodes of temporary interest}
@!f:internal_font_number; {used when calculating character widths}

@ The `\!|loop|\unskip' in the following code is performed at most
twice per call of |line_break|, since it is actually a pass over the
entire paragraph.

@<Find optimal breakpoints@>=
threshold←pretolerance; second_pass←false;
loop@+	begin @<Create an active breakpoint representing the beginning of
		the paragraph@>;
	cur_p←link(temp_head); auto_breaking←true;@/
	prev_p←cur_p; {glue at beginning is not a legal breakpoint}
	while (cur_p≠null)∧(link(active)≠last_active) do
		@<Call |try_break| if |cur_p| is a legal breakpoint;
		on the second pass, also try to hyphenate the next
		word, if |cur_p| is a glue node;
		then advance |cur_p| to the next node of the paragraph
		that could possibly be a legal breakpoint@>;
	if cur_p=null then
		@<Try the final line break at the end of the paragraph,
		and |goto done| if the desired breakpoints have been found@>;
	@<Clean up the memory by removing the break nodes@>;
	threshold←tolerance; second_pass←true; {if at first you don't
		succeed, $\ldotss$}

@ The active node that represents the starting point does not need a
corresponding passive node. 

@d store_background(#)==active_width[#]←background[#]

@<Create an active breakpoint representing the beginning of the paragraph@>=
type(q)←unhyphenated; fitness(q)←decent_fit;
link(q)←last_active; break_node(q)←null;
line_number(q)←already+1; total_demerits(q)←0; link(active)←q;

@ @<Clean...@>=
while q≠last_active do
	begin cur_p←link(q);
	if type(q)=delta_node then free_node(q,delta_node_size)
	else free_node(q,active_node_size);
while q≠null do
	begin cur_p←link(q);

@ Here is the main switch in the |line_break| routine, where legal breaks
are determined. As we move through the hlist, we need to keep the |active_width|
array up to date, so that the badness of individual lines is readily calculated
by |try_break|. It is convenient to use the short name |act_width| for
the component of active width that represents real width as opposed to glue.

@d act_width==active_width[1] {length from first active node to current node}
@d kern_break==begin@t@>@;@/
	if not is_char_node(link(cur_p)) ∧ auto_breaking then
		if type(link(cur_p))=glue_node then try_break(0,unhyphenated);

@<Call |try_break| if |cur_p| is a legal breakpoint...@>=
begin while is_char_node(cur_p) do
	@<Advance \(c)|cur_p| to the node following the present character@>;
case type(cur_p) of
hlist_node,vlist_node,rule_node: act_width←act_width+width(cur_p);
whatsit_node: @<Advance \(p)past a whatsit node in the |line_break| loop@>;
glue_node: begin @<If node |cur_p| is a legal breakpoint, call |try_break|@>;
	@<Update the active widths by including the glue in |glue_ptr(cur_p)|@>;
	if second_pass ∧ auto_breaking then
		@<Try to hyphenate the following word@>;
kern_node: kern_break;
ligature_node: begin f←font(lig_char(cur_p));
disc_node: @<Try to break after a discretionary fragment@>;
math_node: begin auto_breaking←(subtype(cur_p)=after); kern_break;
penalty_node: try_break(penalty(cur_p),unhyphenated);
mark_node,ins_node,adjust_node: do_nothing;
othercases confusion("paragraph")
@:confusion paragraph}{\quad paragraph@>
prev_p←cur_p; cur_p←link(cur_p);

@ The code that passes over the characters of words in a paragraph is
part of \TeX's inner loop, so it has been streamlined for speed. We use
the fact that `\.{\\parfillskip}' glue appears at the end of each paragraph;
it is therefore unnecessary to check if |link(cur_p)=null| when |cur_p| is a
character node.

@<Advance \(c)|cur_p| to the node following the present character@>=
begin f←font(cur_p);

@ When node |cur_p| is a glue node, we look at |prev_p| to see whether or not
a breakpoint is legal at |cur_p|, as explained above.

@<If node |cur_p| is a legal breakpoint, call...@>=
if auto_breaking then
	begin if is_char_node(prev_p) then try_break(0,unhyphenated)
	else if precedes_break(prev_p) then try_break(0,unhyphenated);

@ @<Update the active widths by including the glue in |glue_ptr(cur_p)|@>=
begin check_shrinkage(glue_ptr(cur_p)); q←glue_ptr(cur_p);

@ The following code knows that discretionary texts contain
only character nodes, kern nodes, and ligature nodes.

@<Try to break after a discretionary fragment@>=
begin s←pre_break(cur_p);
if s=null then try_break(ex_hyphen_penalty,hyphenated)
else	begin disc_width←0;
	repeat @<Add the width of node |s| to |disc_width|@>;
	until s=null;

@ @<Add the width of node |s| to |disc_width|@>=
if is_char_node(s) then
	begin f←font(s);
else	case type(s) of
	ligature_node: begin f←font(lig_char(s));
	kern_node: disc_width←disc_width+width(s);
	othercases confusion("disc3")
@:confusion disc3}{\quad disc3@>

@ The forced line break at the paragraph's end will reduce the list of
breakpoints so that all active nodes represent breaks at |cur_p=null|.
On the first pass, we insist on finding an active node that has the
correct ``looseness.'' On the second pass, there will be at least one active
node, and we will match the desired looseness as well as we can.

The global variable |best_bet| will be set to the active node for the best
way to break the paragraph, and a few other variables are used to
help determine what is best.

@!best_bet:pointer; {use this passive node and its predecessors}
@!fewest_demerits:integer; {the demerits associated with |best_bet|}
@!best_line:halfword; {line number following the last line of the new paragraph}
@!actual_looseness:integer; {the difference between |line_number(best_bet)|
	and the optimum |best_line|}
@!line_diff:integer; {the difference between the current line number and
	the optimum |best_line|}

@ @<Try the final line break at the end of the paragraph...@>=
begin try_break(eject_penalty,hyphenated);
if link(active)≠last_active then
	begin @<Find an active node with fewest demerits@>;
	if looseness=0 then goto done;
	@<Find the best active node for the desired looseness@>;
	if (actual_looseness=looseness)∨ second_pass then goto done;

@ @<Find an active node...@>=
r←link(active); fewest_demerits←awful_bad;
repeat if type(r)≠delta_node then if total_demerits(r)<fewest_demerits then
	begin fewest_demerits←total_demerits(r); best_bet←r;
until r=last_active;

@ The adjustment for a desired looseness is a slightly more complicated
version of the loop just considered. Note that if a paragraph is broken
into segments by displayed equations, each segment will be subject to the
looseness calculation, independently of the other segments.

@<Find the best active node...@>=
begin r←link(active); actual_looseness←0;
repeat if type(r)≠delta_node then
	begin line_diff←line_number(cur_p)-best_line;
	if ((line_diff<actual_looseness)∧(looseness≤line_diff))∨@|
	((line_diff>actual_looseness)∧(looseness≥line_diff)) then
		begin best_bet←cur_p; actual_looseness←line_diff;
	else if (line_diff=actual_looseness)∧@|
		(total_demerits(r)<fewest_demerits) then
		begin best_bet←cur_p; fewest_demerits←total_demerits(r);
until r=last_active;

@ Once the best sequence of breakpoints has been found, we call on the
procedure |post_line_break| to finish the remainder of the work.
(By introducing this subprocedure, we are able to keep |line_break|
from getting extremely long.)

@<Break the paragraph at the chosen...@>=

@ The total number of lines that will be set by |post_line_break|
is |best_line-already-1|. The last breakpoint is specified by
|break_node(best_bet)|, and this passive node points to the other breakpoints
via the |prev_break| links. The finishing-up phase starts by linking the
relevant passive nodes in forward order, changing |prev_break| to
|next_break|. (The |next_break| fields actually reside in the same memory
space as the |prev_break| fields did, but we give them a new name because
of their new significance.) Then the lines are justified, one by one.

@d next_break==prev_break {new name for |prev_break| after links are reversed}

@<Declare subprocedures for |line_break|@>=
procedure post_line_break(@!final_widow_penalty:integer);
label done,done1;
var q,@!r,@!s:pointer; {temporary registers for list manipulation}
@!disc_break:boolean; {was the current break at a discretionary node?}
@!cur_width:scaled; {width of line number |cur_line|}
@!cur_indent:scaled; {left margin of line number |cur_line|}
@!t:quarterword; {used for replacement counts in discretionary nodes}
@!pen:integer; {use when calculating penalties between lines}
@!cur_line: halfword; {the current line number being justified}
begin @<Reverse the links of the relevant passive nodes, setting |cur_p| to the
	first breakpoint@>;
repeat @<Justify the line ending at breakpoint |cur_p|, and append it to the
	current vertical list, together with associated penalties and other
incr(cur_line); cur_p←next_break(cur_p);
if cur_p≠null then @<Prune unwanted nodes at the beginning of the next line@>;
until cur_p=null;
if (cur_line≠best_line)∨(link(temp_head)≠null) then
	confusion("line breaking");
@:confusion line breaking}{\quad line breaking@>

@ The job of reversing links in a list is conveniently regarded as the job
of taking items off one stack and putting them on another. In this case we
take them off a stack pointed to by |q| and having |prev_break| fields;
we put them on a stack pointed to by |cur_p| and having |next_break| fields.
Node |r| is the passive node being moved from stack to stack.

@<Reverse the links of the relevant passive nodes...@>=
q←break_node(best_bet); cur_p←null;
repeat r←q; q←prev_break(q); next_break(r)←cur_p; cur_p←r;
until q=null

@ Glue and penalty and kern and math nodes are deleted at the beginning of
a line, except in the unusual case that the node to be deleted is actually
one of the chosen breakpoints. The pruning done here is designed to match
the lookahead computation in |try_break|, where the |break_width| values
are computed for non-discretionary breakpoints.

@<Prune unwanted nodes at the beginning of the next line@>=
begin r←temp_head;
loop@+	begin q←link(r);
	if q=cur_break(cur_p) then goto done1;
		{|cur_break(cur_p)| is the next breakpoint}
	{now |q| cannot be |null|}
	if is_char_node(q) then goto done1;
	if non_discardable(q) then goto done1;
	r←q; {now |type(q)=glue_node|, |kern_node|, |math_node| or |penalty_node|}
done1: if r≠temp_head then
	begin link(r)←null; flush_node_list(link(temp_head));

@ The current line to be justified appears in a horizontal list starting
at |link(temp_head)| and ending at |cur_break(cur_p)|. If |cur_break(cur_p)| is
a glue node, we reset the glue to equal the |right_skip| glue; otherwise
we append the |right_skip| glue at the right. If |cur_break(cur_p)| is a
discretionary node, we modify the list so that the discretionary break
is compulsory, and we set |disc_break| to |true|. We also append
the |left_skip| glue at the left of the line, unless it is zero.

@<Justify the line ending at breakpoint |cur_p|, and append it...@>=
@<Modify the end of the line to reflect the nature of the break and to include
	\.{\\rightskip}; also set the proper value of |disc_break|@>;
@<Put the \(l)\.{\\leftskip} glue at the left and detach this line@>;
@<Call the packaging subroutine, setting |just_box| to the justified box@>;
@<Append the new box to the current vertical list, followed by the list of
	special nodes taken out of the box by the packager@>;
@<Append a penalty node, if a nonzero penalty is appropriate@>

@ At the end of the following code, |q| will point to the final node on the
list about to be justified.

@<Modify the end of the line...@>=
q←cur_break(cur_p); disc_break←false;
if q≠null then {|q| cannot be a |char_node|}
	if type(q)=glue_node then
		begin delete_glue_ref(glue_ptr(q));
		subtype(q)←right_skip_code+1; add_glue_ref(right_skip);
		goto done;
	else	begin if type(q)=disc_node then
			@<Change discretionary to compulsory and set
		if (type(q)=math_node)∨(type(q)=kern_node) then width(q)←0;
else	begin q←temp_head;
	while link(q)≠null do q←link(q);
@<Put the \(r)\.{\\rightskip} glue after node |q|@>;

@ @<Change discretionary to compulsory...@>=
begin t←replace_count(q);
@<Destroy the |t| nodes following |q|, but save the last one if it is
	a necessary kern; make |r| point to the following node@>;
if post_break(q)≠null then @<Transplant the post-break list@>;
if pre_break(q)≠null then @<Transplant the pre-break list@>;
link(q)←r; disc_break←true;

@ A subtle bug that would perhaps never have been detected is avoided here
by preserving a kern node that just might equal |cur_break(next_break(cur_p))|.

@<Destroy the |t| nodes following |q|...@>=
if t=0 then r←link(q)
else	begin r←q;
	while t>1 do
		begin r←link(r); decr(t);
	if ¬is_char_node(s) then if next_break(cur_p)≠null then
		if cur_break(next_break(cur_p))=s then s←r;
	r←link(s); link(s)←null;
	flush_node_list(link(q)); replace_count(q)←0;

@ We move the post-break list from inside node |q| to the main list by
re\-attaching it just before the present node |r|, then resetting |r|.

@<Transplant the post-break list@>=
begin s←post_break(q);
while link(s)≠null do s←link(s);
link(s)←r; r←post_break(q); post_break(q)←null;

@ We move the pre-break list from inside node |q| to the main list by
re\-attaching it just after the present node |q|, then resetting |q|.

@<Transplant the pre-break list@>=
begin s←pre_break(q); link(q)←s;
while link(s)≠null do s←link(s);
pre_break(q)←null; q←s;

@ @<Put the \(r)\.{\\rightskip} glue after node |q|@>=
r←new_param_glue(right_skip_code); link(r)←link(q); link(q)←r; q←r

@ The following code begins with |q| at the end of the list to be
justified. It ends with |q| at the beginning of that list, and with
|link(temp_head)| pointing to the remainder of the paragraph, if any.

@<Put the \(l)\.{\\leftskip} glue at the left...@>=
r←link(q); link(q)←null; q←link(temp_head); link(temp_head)←r;
if left_skip≠zero_glue then
	begin r←new_param_glue(left_skip_code);
	link(r)←q; q←r;

@ Now |q| points to the hlist that represents the current line of the
paragraph. We need to compute the appropriate line width, pack the
line into a box of this size, and shift the box by the appropriate
amount of indentation.

@<Call the packaging subroutine...@>=
if cur_line>last_special_line then
	begin cur_width←second_width; cur_indent←second_indent;
else if par_shape_ptr=null then
	begin cur_width←first_width; cur_indent←first_indent;
else	begin cur_width←mem[par_shape_ptr+2*cur_line].sc;

@ @<Append the new box to the current vertical list...@>=
if adjustments≠null then
	begin link(tail)←adjustments;
	repeat tail←link(tail);
	until link(tail)=null;

@ Penalties between the lines of a paragraph come from widow lines, from
the |inter_line_penalty| parameter, and from lines that end at
discretionary breaks.  Breaking between lines of a two-line paragraph gets
two contributions of widow-line penalties. The local variable |pen| will
be set to the sum of all relevant penalties for the current line.

@<Append a penalty node, if a nonzero penalty is appropriate@>=
if cur_line+1=best_line then pen←0@+else pen←inter_line_penalty;
if (cur_line=already+1)∧(best_line≠already+2) then pen←pen+widow_penalty;
if cur_line+2=best_line then pen←pen+final_widow_penalty;
if disc_break then pen←pen+broken_penalty;
if pen≠0 then
	begin r←new_penalty(pen);
	link(tail)←r; tail←r;
@* \[39] Pre-hyphenation.
When the line-breaking routine is unable to find a feasible sequence of
breakpoints, it makes a second pass over the paragraph, attempting to
hyphenate the hyphenatable words. The goal of hyphenation is to insert
discretionary material into the paragraph so that there are more
potential places to break.

The general rules for hyphenation are somewhat complex and technical,
because we want to be able to hyphenate words that are preceded or
followed by punctuation marks, and because we want the rules to work
for languages other than English. We also must contend with the fact
that hyphens might radically alter the ligature and kerning structure
of a word.

A sequence of characters will be considered for hyphenation only if it
belongs to a ``potentially hyphenatable part'' of the current paragraph.
This is a sequence of nodes $p↓0p↓1\ldotsm p↓m$ where $p↓0$ is a glue
node, $p↓1\ldotsm p↓{m-1}$ are either character or ligature or kern nodes,
and $p↓m$ is a glue or penalty or insertion or adjust or mark node.
(Therefore hyphenation is disabled by boxes, math formulas, whatsit nodes, and
discretionary nodes already inserted by the user.) The ligature nodes among
$p↓1\ldotsm p↓{m-1}$ are effectively expanded into the original non-ligature
characters, and the kern nodes are ignored. Each character $c$ is now
classified as either a nonletter (if |c≥128| or |lc_code(c)=0|), a lower
case letter (if |lc_code(c)=c|), or an upper case letter (otherwise); an
upper case letter is treated as if it were |lc_code(c)| for purposes of
hyphenation. The characters generated by $p↓1\ldotsm p↓{m-1}$ may begin with
nonletters; let $c↓1$ be the first letter that is not in the middle
of a ligature. All characters that do not have
the same font as $c↓1$ will be treated as nonletters. \TeX\ looks ahead
for as many consecutive letters $c↓1\ldotsm c↓n$ as possible; however,
|n| must be less than 64, so a character that would otherwise be $c↓{64}$
is effectively not a letter. Furthermore $c↓n$ must not be in the middle
of a ligature. In this way we obtain a string of letters
$c↓1\ldotsm c↓n$ that are generated by nodes $p↓a\ldotsm p↓b$, where
|1≤a≤b+1≤m|. If |n≥5|, this string qualifies for hyphenation; however,
|uc_hyph| must be nonzero, if $c↓1$ is upper case.

The hyphenation process takes place in three stages. First, the candidate
sequence $c↓1\ldotsm c↓n$ is found; then potential positions for hyphens
are determined by referring to hyphenation tables; and finally, the nodes
$p↓a\ldotsm p↓b$ are replaced by a new sequence of nodes that includes the
discretionary breaks found.

Fortunately, we do not have to do all this calculation very often, because
of the way it has been taken out of \TeX's inner loop. For example, when
the second edition of the author's 700-page book {\sl Seminumerical
Algorithms} was typeset by \TeX, only about 1.2 hyphenations needed to be
@^Knuth, Donald Ervin@>
tried per paragraph, since the line breaking algorithm needed to use two
passes on only about 5 per cent of the paragraphs.

@ The letters $c↓1\ldotsm c↓n$ that are candidates for hyphenation are placed
into an array called |hc|; the number |n| is placed into |hn|; pointers to
nodes $p↓a$ and@@$p↓b$ in the description above are placed into variables
|ha| and |hb|; and the font number is placed into |hf|.

@!hc:array[0..65] of ascii_code; {word to be hyphenated}
@!hn:small_number; {the number of positions occupied in |hc|}
@!ha,@!hb:pointer; {nodes |ha..hb| should be replaced by the hyphenated result}
@!hf:internal_font_number; {font number of the letters in |hc|}
@!hu:array[1..63] of ascii_code; {like |hc|, before conversion to lower case}

@ Hyphenation routines need a few more local variables.

@<Local variables for line...@>=
@!i,@!j,@!l:small_number; {indices into |hc| or |hu|}
@!c:0..255; {character being considered for hyphenation}

@ When the following code is activated, the |line_break| procedure is in its
second pass, and |cur_p| points to a glue node.

@<Try to hyphenate...@>=
begin s←link(cur_p);
if s≠null then
	begin @<Skip to node |ha|, or |goto done1| if no hyphenation
		should be attempted@>;
	@<Skip to node |hb|, putting letters into |hu| and |hc|,
		or |goto done1| if no hyphenation should be attempted@>;
	@<Check that the nodes following |hb| permit hyphenation and that
		at least five letters have been found, otherwise |goto done1|@>;
done1: end

@ @<Declare subprocedures for |line_break|@>=
@t\4@>@<Declare the function called |reconstitute|@>
procedure hyphenate;
label done,found,not_found,found1,exit;
var@?@<Local variables for hyphenation@>@;
begin @<Find hyphen locations for the word in |hc|@>;
@<If no hyphens were found, |return|@>;
@<Replace nodes |ha..hb| by a sequence of nodes that includes
	the discretionary hyphens@>;

@ The first thing we need to do is find the node |ha| that contains the
first letter.

@<Skip to node |ha|, or |goto done1|...@>=
loop@+	begin if is_char_node(s) then
		begin c←qo(character(s)); hf←font(s);
	else if type(s)=ligature_node then
		begin q←lig_ptr(s); c←qo(character(q)); hf←font(q);
	else if type(s)≠kern_node then goto done1
	else c←128;
	if c<128 then if lc_code(c)≠0 then
		if (lc_code(c)=c)∨(uc_hyph≠0) then goto done2
		else goto done1;
done2: ha←s

@ The word to be hyphenated is now moved to the |hu| and |hc| arrays.

@<Skip to node |hb|, putting letters...@>=
loop@+	begin if is_char_node(s) then
		begin if font(s)≠hf then goto done3;
		if c≥128 then goto done3;
		if (lc_code(c)=0)∨(hn=63) then goto done3;
		hb←s; incr(hn); hu[hn]←c; hc[hn]←lc_code(c);
	else if type(s)=ligature_node then
		@<Move the characters of a ligature node to |hu| and |hc|;
			but |goto done3| if they are not all letters@>
	else if type(s)≠kern_node then goto done3;

@ We let |j| be the index of the character being stored when a ligature node
is being expanded, since we do not want to advance |hn| until we are sure
that the entire ligature consists of letters. Note that it is possible
to get to |done3| with |hn=0| and |hb| not set to any value.

@<Move the characters of a ligature node to |hu| and |hc|...@>=
begin j←hn; q←lig_ptr(s);
if font(q)≠hf then goto done3;
repeat c←qo(character(q));
if c≥128 then goto done3;
if (lc_code(c)=0)∨(j=63) then goto done3;
incr(j); hu[j]←c; hc[j]←lc_code(c);@/
until q=null;
hb←s; hn←j;

@ @<Check that the nodes following |hb| permit hyphenation...@>=
if hn<5 then goto done1;
loop@+	begin if ¬(is_char_node(s)) then
		case type(s) of
		ligature_node,kern_node: do_nothing;
			goto done4;
		othercases goto done1
@* \[40] Post-hyphenation.
If a hyphen may be inserted between |hc[j]| and |hc[j+1]|, the hyphenation
procedure will set |hyf[j]| to some small odd number. But before we look
at \TeX's hyphenation procedure, which is independent of the rest of the
line-breaking algorithm, let us consider what we will do with the hyphens
it finds, since it is better to work on this part of the program before
forgetting what |ha| and |hb|, etc., are all about.

@!hyf:array [0..64] of 0..9; {odd values indicate discretionary hyphens}

@ @<Local variables for hyphenation@>=
@!i,@!j,@!l:small_number; {indices into |hc| or |hu|}
@!q,@!r,@!s:pointer; {temporary registers for list manipulation}

@ \TeX\ will never insert a hyphen that has fewer than two letters before
it or fewer than three after it; hence, a five-letter word has comparatively
little chance of being hyphenated. If no hyphens have been found,
we can save time by not having to make any changes to the paragraph.

@<If no hyphens were found, |return|@>=
for j←2 to hn-3 do if odd(hyf[j]) then goto found1;

@ If hyphens are in fact going to be inserted, \TeX\ first deletes the
subsequence of nodes |ha..hb|. The variable |s| will point to the node
preceding |ha|, and |q| will point to the node following |hb|, so that
things can be hooked up after we reconstitute the hyphenated word.

@<Replace nodes |ha..hb| by a sequence of nodes...@>=
q←link(hb); link(hb)←null; s←cur_p;
while link(s)≠ha do s←link(s);
link(s)←null; flush_node_list(ha);
@<Reconstitute nodes for the hyphenated word, inserting discretionary hyphens@>

@ We must now face the fact that the battle is not over, even though the
hyphens have been found: The process of reconstituting a word can be nontrivial
because ligatures might change when a hyphen is present. The manual discusses
the difficulties of the word ``difficult'', but since fonts can include
highly general ligatures, the discretionary material surrounding a hyphen can
be even more complex than that. For example, suppose that \.{abcdef} is a
word in a font for which the only ligatures are \.{b\!c}, \.{c\!d}, \.{d\!e},
and \.{e\!f}. If this word is to permit hyphenation between \.b and \.c, the
two patterns with and without hyphenation are $\.a\,\.{b\!c}\,\.{d\!e}\,\.f$
and $\.a\,\.b\,\.-\,\.{c\!d}\,\.{e\!f}$. Thus the insertion of a hyphen might
cause effects to ripple arbitrarily far into the rest of the word. A
further complication arises if additional hyphens appear together with
such rippling, e.g., if the word in the example just given could also be
hyphenated between \.c and \.d; \TeX\ avoids this by simply ignoring the
additional hyphens in such weird cases.}

@ The processing is facilitated by a subroutine called |reconstitute|. Given
an index@@|j|, this function creates a node for the next character or ligature
found in the |hu| array starting at |hu[j]|, using font |hf|. For example,
if |hu[j..j+2]| contains the three letters `f', `i', and `x', and if
font |hf| contains an `fi' ligature but no `fix' ligature, then |reconstitute|
will create a ligature node for `fi'. The index of the last character
consumed, in this case |j+1|, will be returned. Furthermore, a kern node
is created and appended, if kerning is called for between the consumed
character or ligature and the next (unconsumed) character.

A second parameter, |n|, gives the limit beyond which this procedure does not
advance. In other words, |hu[n]| might be consumed, but |hu[n+1]| is never

The global variable |hyphen_passed| is set to@@|k| if this procedure
consumes two characters |hu[k]| and |hu[k+1]| such that |hyf[k]| is odd,
i.e., if the ligature might have to be broken by a hyphen, or if a kern is
inserted between |hu[k]| and |hu[k+1]|.  If this condition holds for more
than one value of |k|, the smallest value is used; and if the condition
holds for no values of |k|, |hyphen_passed| is set to zero.

After |reconstitute| has acted, |link(hold_head)| points to the character
or ligature node that was created, and |link(link(hold_head))| will either
be |null| or a pointer to the kern node that was appended.

@!hyphen_passed:small_number; {first hyphen in a ligature, if any}

@ @<Declare the function called |reconstitute|@>=
function reconstitute(@!j,@!n:small_number):
label continue,done;
var p:pointer; {a node being created}
@!s:pointer; {a node being appended to}
@!q:four_quarters; {character information or a lig/kern instruction}
@!c:quarterword; {current character}
@!d:quarterword; {current character or ligature}
@!w:scaled; {amount of kerning}
@!r:0..font_mem_size; {position of current lig/kern instruction}
begin @<Build a list of characters in a maximal ligature, and set |w|
	to the amount of kerning that should follow@>;
@<If the list has more than one element, create a ligature node@>;
@<Attach kerning, if |w≠0|@>;

@ @<Build a list of characters in a maximal ligature...@>=
hyphen_passed←0; s←hold_head; w←0; d←qi(hu[j]); c←d;
loop@+	begin continue: p←get_avail; font(p)←hf;
	character(p)←c; link(s)←p;@/
	@<Look for a ligature or kern between |d| and the following
		character; update the data structure and |goto continue|
		if a ligature is found, otherwise update@@|w| and |goto done|@>;

@ @<Look for a ligature or kern between |d| and...@>=
if j=n then goto done;
if char_tag(q)≠lig_tag then goto done;
r←lig_kern_start(hf)(q); c←qi(hu[j+1]);
loop@+	begin q←font_info[r].qqqq;
	if next_char(q)=c then
		begin if odd(hyf[j])∧(hyphen_passed=0) then hyphen_passed←j;
		if op_bit(q)<kern_flag then
			@<Append to the ligature and |goto continue|@>
		else	begin w←char_kern(hf)(q);
			goto done;
	else if stop_bit(q)<stop_flag then incr(r)
	else goto done;

@ @<Append to the ligature...@>=
begin d←rem_byte(q);
incr(j); s←p; goto continue;

@ After the list has been built, |link(s)| points to the final list element.

@<If the list has more than one element, create a ligature node@>=
if s≠hold_head then
	begin p←new_ligature(hf,d,link(hold_head));

@ @<Attach kerning, if |w≠0|@>=
if w≠0 then link(link(hold_head))←new_kern(w)

@ Okay, we're ready to insert the potential hyphenations that were found.
When the following program is executed, we want to append the word
|hu[]| after node |s|, and node |q| should be appended to the result.
During this process, the variable |i| will be a temporary counter or an
index into |hu|; the variable |j| will be an index to our current position
in |hu|; the variable |l| will be the counterpart of |j|, in a discretionary
branch; the variable |r| will point to new nodes being created; and
we need two new local variables:

@<Local variables for hyph...@>=
@!major_tail,@!minor_tail:pointer; {the end of lists in the main and
	discretionary branches being reconstructed}

@ When the following code is performed, |hyf[j]| will be zero for |j=1|
and for |j≥hn-2|.

@<Reconstitute nodes for the hyphenated word...@>=
repeat l←j; j←reconstitute(j+1,hn);
if hyphen_passed≠0 then
	@<Create and append a discretionary node as an alternative to the
		ligature, and continue to develop both branches until they
		become equivalent@>
else	begin link(s)←link(hold_head); s←link(s);
	if link(s)≠null then s←link(s);
if odd(hyf[j]) then @<Insert a discretionary hyphen after |s|@>;
until j=hn;

@ @<Create and append a discretionary node as an alternative...@>=
begin r←get_node(small_node_size);
link(s)←r; link(r)←link(hold_head); type(r)←disc_node;
if link(major_tail)≠null then major_tail←link(major_tail);
@<Put the \(c)characters |hu[l+1..i]| and a hyphen into |pre_break(r)|@>;
@<Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|, appending to this
	list and to |major_tail| until synchronization has been achieved@>;
@<Move pointer |s| to the end of the current list, and set |replace_count(r)|

@ The new hyphen might combine with the previous character via ligature
or kern. At this point we have |i≤j|.

@<Put the \(c)characters |hu[l+1..i]| and a hyphen into |pre_break(r)|@>=
minor_tail←null; c←hu[i+1]; hu[i+1]←"-";
repeat l←reconstitute(l+1,i+1);
if minor_tail=null then pre_break(r)←link(hold_head)
else link(minor_tail)←link(hold_head);
if link(minor_tail)≠null then minor_tail←link(minor_tail);
until l>i;
hu[i+1]←c; {restore the character in the hyphen position}
decr(l); hyf[l]←0

@ The synchronization algorithm begins with |l≤j|.

@<Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|...@>=
minor_tail←null; post_break(r)←null;
while l<j do
	begin repeat l←reconstitute(l+1,hn);
	if minor_tail=null then post_break(r)←link(hold_head)
	else link(minor_tail)←link(hold_head);
	if link(minor_tail)≠null then
		begin hyf[l]←0; minor_tail←link(minor_tail); {kern present}
	until l≥j;
	while l>j do
		begin j←reconstitute(j+1,hn);
		if link(major_tail)≠null then
			begin hyf[j]←0; major_tail←link(major_tail); {kern present}

@ @<Move pointer |s| to the end of the current list...@>=
i←0; s←r;
while link(s)≠null do
	begin incr(i); s←link(s);

@ At this point |link(s)| is |null|.

@<Insert a discretionary hyphen after |s|@>=
begin r←new_disc; pre_break(r)←new_character(hf,"-");
link(s)←r; s←r;
@* \[41] Hyphenation.
When a word |hc[]| has been set up to contain a candidate for hyphenation,
\TeX\ first looks to see if it is in the user's exception dictionary. If not,
hyphens are inserted based on patterns that appear within the given word,
using an algorithm due to Frank M. Liang.
@^Liang, Franklin Mark@>

Let's consider Liang's method first, since it is much more interesting than the
exception-lookup routine.  The algorithm begins by setting |hyf[j]| to zero
for all |j|, and invalid `\.{\^\^?}' characters are inserted into |hc[0]|
and |hc[hn+1]| to serve as delimiters. Then a reasonably fast method is
used to see which of a given set of patterns occurs in the word
|hc[0..(hn+1)]|. Each pattern $p↓1\ldotsm p↓k$ of length |k| has an associated
sequence of |k+1| numbers $n↓0\ldotsm n↓k$; and if the pattern occurs in
|hc[(j+1)..(j+k)]|, \TeX\ will set |hyf[j+i]←@tmax@>(hyf[j+i],@t$n↓i$@>)| for
|0≤i≤k|. After this has been done for each pattern that occurs, a
discretionary hyphen will be inserted between |hc[j]| and |hc[j+1]| when
|hyf[j]| is odd, as we have already seen.

The set of patterns $p↓1\ldotsm p↓k$ and associated numbers $n↓0\ldotsm n↓k$
depends, of course, on the language whose words are being hyphenated, and
on the degree of hyphenation that is desired. A method for finding
appropriate |p|'s and |n|'s, from a given dictionary of words and acceptable
hyphenations, is discussed in Liang's Ph.D. thesis (Stanford University,
1982); \TeX\ simply starts with the patterns and works from there.

@ The patterns are stored in a compact table that is also efficient for
retrieval, using a variant of ``trie memory'' [cf.\ {\sl The Art of
Computer Program\-ming \bf3} (1973), 481--505]. We can find each pattern
$p↓1\ldotsm p↓k$ by setting |@t$z↓1$@>←@t$p↓1$@>| and then, for |1<i≤k|,
setting |@t$z↓i$@>←trie_link@t$(z↓{i-1})+p↓i$@>|; the pattern will be
identified by the number $z↓k$. Since all the pattern information is
packed together into a single |trie_link| array, it is necessary to
prevent confusion between the data from inequivalent patterns, so another
table is provided such that |trie_char@t$(z↓i)=p↓i$@>| for all |i|. There
is also a table |trie_op|$(z↓k)$ to identify the numbers $n↓0\ldotsm n↓k$
associated with $p↓1\ldotsm p↓k$.

Comparatively few different number sequences $n↓0\ldotsm n↓k$ actually occur,
since most of the |n|'s are generally zero. Therefore the number sequences
are encoded in such a way that |trie_op|$(z↓k)$ is only one byte long.
If |trie_op(@t$z↓k$@>)≠min_quarterword|, when $p↓1\ldotsm p↓k$ has matched
the letters in |hc[(l-k+1)..l@,]|, we perform all of the required operations
for this pattern by carrying out the following little program: Set
|v←trie_op(@t$z↓k$@>)|. Then set |hyf[l-hyf_distance[v]]←@tmax@>(
hyf[l-hyf_distance[v]], hyf_num[v])|, and |v←hyf_next[v]|; repeat, if
necessary, until |v=min_quarterword|.

@!trie_pointer=0..trie_size; {an index into |trie|}

@ @d trie_link(#)==trie[#].rh {``downward'' link in a trie}
@d trie_char(#)==trie[#].b1 {character matched at this trie location}
@d trie_op(#)==trie[#].b0 {program for hyphenation at this trie location}

@!trie:array[trie_pointer] of two_halves; {|trie_link|, |trie_char|, |trie_op|}
@!hyf_distance:array[quarterword] of small_number; {position |k-j| of $n↓j$}
@!hyf_num:array[quarterword] of small_number; {value of $n↓j$}
@!hyf_next:array[quarterword] of quarterword; {continuation of this |trie_op|}

@ @<Local variables for hyph...@>=
@!z:trie_pointer; {an index into |trie|}
@!v:quarterword; {an index into |hyf_distance|, etc.}

@ Assuming that these auxiliary tables have been set properly, the
hyphenation algorithm is quite short. In the following code we use the
fact that no pattern contains $p↓i=0$; setting |hc[hn+2]| to zero therefore
guarantees that |hc[hn+3]| will never be fetched.

@<Find hyphen locations for the word in |hc|@>=
for j←0 to hn do hyf[j]←0;
@<Look for the word |hc[]| in the exception table, and |goto found| (with
	|hyf| containing the hyphens) if an entry is found@>;
hc[0]←127; hc[hn+1]←127; hc[hn+2]←0; {insert delimiters}
for j←0 to hn-2 do
	begin z←hc[j]; l←j;
	while hc[l]=trie_char(z) do
		begin if trie_op(z)≠min_quarterword then
			@<Store \(m)maximum values in the |hyf| table@>;
		incr(l); z←trie_link(z)+hc[l];
found: hyf[1]←0; hyf[hn-2]←0; hyf[hn-1]←0; hyf[hn]←0

@ @<Store \(m)maximum values in the |hyf| table@>=
begin v←trie_op(z);
repeat i←l-hyf_distance[v];
if hyf_num[v]>hyf[i] then hyf[i]←hyf_num[v];
until v=min_quarterword;

@ The exception table that is built by \TeX's \.{\\hyphenation} primitive is
organized as an ordered hash table [cf.\ Amble and Knuth, {\sl The Computer
@^Amble, Ole@> @^Knuth, Donald Ervin@>
Journal\/\ \bf17} (1974), 135--142] using linear probing. If $\alpha$ and
$\beta$ are words, we will say that $\alpha<\beta$ if $\leftv\alpha\rightv<
\leftv\beta\rightv$ or if $\leftv\alpha\rightv=\leftv\beta\rightv$ and
$\alpha$ is lexicographically smaller than $\beta$. (The notation $\leftv
\alpha\rightv$ stands for the length of $\alpha$.) The idea of ordered hashing
is to arrange the table so that a given word $\alpha$ can be sought by computing
a hash address $h=h(\alpha)$ and then looking in table positions $h$, $h-1$,
$\ldotss$, until encountering the first word $\L\alpha$. If this word is
different from $\alpha$, we can conclude that $\alpha$ is not in the table.

The words in the table point to lists in |mem| that specify hyphen positions
in their |info| fields. The list for $c↓1\ldotsm c↓n$ contains $k$ if
the word $c↓1\ldotsm c↓n$ has a discretionary hyphen between $c↓k$ and

@!hyph_pointer=0..hyph_size; {an index into the ordered hash table}

@ @<Glob...@>=
@!hyph_word:array[hyph_pointer] of str_number; {exception words}
@!hyph_list:array[hyph_pointer] of pointer; {list of hyphen positions}
@!hyph_count:hyph_pointer; {the number of words in the exception dictionary}

@ @<Local variables for init...@>=
@!z:hyph_pointer; {runs through the exception dictionary}

@ @<Set init...@>=
for z←0 to hyph_size do hyph_word[z]←0;

@ The algorithm for exception lookup is quite simple, as soon as we have
a few more local variables to work with.

@<Local variables for hyph...@>=
@!h:hyph_pointer; {an index into |hyph_word| and |hyph_list|}
@!k:str_number; {an index into |str_start|}
@!u:pool_pointer; {an index into |str_pool|}

@ First we compute the hash code |h|, then we search until we either
find the word or we don't.

@<Look for the word |hc[1...@>=
for j←2 to hn do h←(h+h+hc[j]) mod hyph_size;
loop@+	begin @<If the string |hyph_word[h]| is less than \(hc)|hc[]|,
		|goto not_found|; but if the two strings are equal,
		set |hyf| to the hyphen positions and |goto found|@>;
	if h>0 then decr(h)@+else h←hyph_size;

@ @<If the string |hyph_word[h]| is less than \(hc)...@>=
k←hyph_word[h]; if k=0 then goto not_found;
if length(k)<hn then goto not_found;
if length(k)=hn then
	begin j←1; u←str_start[k];
	repeat if str_pool[u]<hc[j] then goto not_found;
	if str_pool[u]>hc[j] then goto done;
	incr(j); incr(u);
	until j>hn;
	@<Insert hyphens as specified in |hyph_list[h]|@>;
	goto found;

@ @<Insert hyphens as specified...@>=
while s≠null do
	begin hyf[info(s)]←1; s←link(s);

@ We have now completed the hyphenation routine, so the |line_break| procedure
is finished at last. Since the hyphenation exception table is fresh in our
minds, it's a good time to deal with the routine that adds new entries to it.

When \TeX\ has scanned the `\.{\\hyphenation}' control sequence, it calls
on a procedure named |new_hyph_exceptions| to do the right thing.

@p procedure new_hyph_exceptions; {enters new exceptions}
label exit, found, not_found;
var n:small_number; {length of current word}
@!j:small_number; {an index into |hc|}
@!h:hyph_pointer; {an index into |hyph_word| and |hyph_list|}
@!k:str_number; {an index into |str_start|}
@!p:pointer; {head of a list of hyphen positions}
@!q:pointer; {used when creating a new node for list |p|}
@!s,@!t:str_number; {strings being compared or stored}
@!u,@!v:pool_pointer; {indices into |str_pool|}
begin scan_left_brace; {a left brace must follow \.{\\hyphenation}}
@<Enter as many hyphenation exceptions as are listed,
until coming to a right brace; then skip an optional space and |return|@>;

@ @<Enter as many...@>=
n←0; p←null;
loop@+	begin get_nc_token;
	case cur_cmd of
	letter,other_char:@<Append a new letter or hyphen@>;
	spacer,right_brace: begin if n>4 then @<Enter a hyphenation exception@>;
		if cur_cmd=right_brace then
			begin @<Scan an optional space@>; return;
		n←0; p←null;
	othercases @<Give improper \.{\\hyphenation} error@>

@ @<Give improper \.{\\hyph...@>=
begin help2("Hyphenation exceptions must contain only letters")@/
	("and hyphens. But continue; I'll forgive and forget.");
print_nl("! Improper \hyphenation will be flushed"); error;
@.Improper \\hyphenation...@>

@ @<Append a new letter or hyphen@>=
if cur_chr="-" then @<Append the value |n| to list |p|@>
else	begin if lc_code(cur_chr)=0 then
		begin print_nl("! Not a letter");
@.Not a letter@>
		help2("Letters in \hyphenation words must have \lccode>0.")@/
			("Proceed; I'll ignore the character I just read.");
	else if n<63 then
		begin incr(n); hc[n]←lc_code(cur_chr);

@ @<Append the value |n| to list |p|@>=
begin if n>1 then
	begin q←get_avail; link(q)←p; info(q)←n; p←q;

@ @<Enter a hyphenation exception@>=
begin str_room(n); h←0;
for j←1 to n do
	begin h←(h+h+hc[j]) mod hyph_size;
while info(p)>n-3 do {eliminate hyphens \TeX\ doesn't like}
	begin q←link(p); free_avail(p); p←q;
@<Insert the \(p)pair |(s,p)| into the exception table@>;

@ @<Insert the \(p)pair |(s,p)|...@>=
if hyph_count=hyph_size then overflow("exception dictionary",hyph_size);
while hyph_word[h]≠0 do
	begin @<If the string |hyph_word[h]| is less than \(or)or equal to
	|s|, interchange |(hyph_word[h],hyph_list[h])| with |(s,p)|@>;
	if h>0 then decr(h)@+else h←hyph_size;
hyph_word[h]←s; hyph_list[h]←p

@ @<If the string |hyph_word[h]| is less than \(or)...@>=
if length(k)<length(s) then goto found;
if length(k)>length(s) then goto not_found;
u←str_start[k]; v←str_start[s];
repeat if str_pool[u]<str_pool[v] then goto found;
if str_pool[u]>str_pool[v] then goto not_found;
incr(u); incr(v);
until u=str_start[k+1];
found:q←hyph_list[h]; hyph_list[h]←p; p←q;@/
t←hyph_word[h]; hyph_word[h]←s; s←t;
@* \[42] Initializing the hyphenation tables.
The trie for \TeX's hyphenation algorithm is built from a sequence of
patterns following a \.{\\patterns} speci\-fi\-ca\-tion. Such a specification
is allowed only in \.{INITEX}, since the extra memory for auxiliary tables
and for the initialization program itself would only clutter up the
production version of \TeX\ with a lot of deadwood.

The initialization first builds a trie that is linked instead of packed
into sequential storage, so that insertions are readily made. Then it
compresses the linked trie by identifying common subtries, and finally the
trie is packed into the efficient sequential form that the hyphenation
algorithm actually uses.

@p init @<Declare procedures for preprocessing hyphenation patterns@>@;

@ Before we discuss trie building in detail, let's consider the simpler
problem of creating the |hyf_distance|, |hyf_num|, and |hyf_next| arrays.

Suppose, for example, that \TeX\ reads the pattern `\.{ab2cde1}'. This is
a pattern of length 5, with $n↓0\ldotsm n↓5=0\,0\,2\,0\,0\,1$ in the
notation above. We want the corresponding |trie_op| code |v| to have
|hyf_distance[v]=3|, |hyf_num[v]=2|, and |hyf_next[v]=@t$v^\prime$@>|,
where the auxiliary |trie_op| code $v^\prime$ has
|hyf_distance[@t$v^\prime$@>]=0|, |hyf_num[@t$v^\prime$@>]=1|, and

\TeX\ computes an appropriate value |v| with the |new_trie_op| subroutine
below, by setting
This subroutine looks up its three
parameters in a special hash table, assigning a new value only if these
three have not appeared before.

The hash table is called |trie_op_hash|, and the number of entries it contains
is |trie_op_ptr|. If the table overflows, the excess ops are ignored,
and |trie_op_ptr=max_quarterword|.

@d quarterword_diff=max_quarterword-min_quarterword
@d trie_op_hash_size=quarterword_diff+quarterword_diff {double}

init@! trie_op_hash:array[0..trie_op_hash_size] of quarterword;
	{trie op codes for triples}
@t\hskip1em@>@!trie_op_ptr:quarterword; {highest |trie_op| assigned}

@ The hash function used by |new_trie_op| is based on the idea that
313/510 is an approximation to the golden ratio [cf.\ {\sl The Art of
Computer Programming \bf3} (1973), 510--512]; but the choice is
comparatively unimportant in this particular application.

@<Declare procedures for preprocessing hyph...@>=
function new_trie_op(@!d,@!n:small_number;@!v:quarterword):quarterword;
label exit;
var h:0..trie_op_hash_size; {trial hash location}
@!u:quarterword; {trial op code}
begin h←(n+313*d+361*v) mod trie_op_hash_size;
loop@+	begin u←trie_op_hash[h];
	if u=min_quarterword then {empty position found}
		begin if trie_op_ptr≥max_quarterword-1 then {overflow}
			begin trie_op_ptr←max_quarterword;
			new_trie_op←min_quarterword; return;
		incr(trie_op_ptr); hyf_distance[trie_op_ptr]←d;
		hyf_num[trie_op_ptr]←n; hyf_next[trie_op_ptr]←v;
		new_trie_op←trie_op_ptr; return;
	if (hyf_distance[u]=d)∧(hyf_num[u]=n)∧(hyf_next[u]=v) then
		begin new_trie_op←u; return;
	if h>min_quarterword then decr(h)@+else h←max_quarterword;

@ The linked trie that is used to preprocess hyphenation patterns appears
in several global arrays. Each node represents an instruction of the form
``if you see character |c|, then perform operation |o|, move to the
next character, and go to node |l|; otherwise go to node |r|.''
The four quantities |c|, |o|, |l|, and |r| are stored in four arrays
|trie_c|, |trie_o|, |trie_l|, and |trie_r|. The root of the trie
is |trie_l[0]|, and the number of nodes is |trie_ptr|. Null trie
pointers are represented by zero. To initialize the trie, we simply
set |trie_l[0]| and |trie_ptr| to zero. We also set |trie_c[0]| to some
arbitrary value, since the algorithm may access it.

The algorithms maintain the condition |trie_c[trie_r[z]]>trie_c[z]|
whenever |z≠0| and |trie_r[z]≠0|; in other words, sibling nodes are
ordered by their |c| fields.

@d trie_root==trie_l[0] {root of the linked trie}

init @!trie_c:array[trie_pointer] of ascii_code; {characters to match}
@t\hskip1em@>@!trie_o:array[trie_pointer] of quarterword;
	{operations to perform}
@t\hskip1em@>@!trie_l:array[trie_pointer] of trie_pointer; {left subtrie links}
@t\hskip1em@>@!trie_r:array[trie_pointer] of trie_pointer; {right subtrie links}
@t\hskip1em@>@!trie_ptr:trie_pointer; {the number of nodes in the trie}

@ Let us suppose that a linked trie has already been constructed.
Experi\-ence shows that we can often reduce its size by recognizing common
subtries; therefore another hash table is introduced for this purpose,
somewhat similar to |trie_op_hash|. The new hash table will be
initialized to zero.

init @!trie_hash:array[trie_pointer] of trie_pointer;
tini	{to identify equivalent subtries}

@ The function |trie_node(p)| returns |p| if |p| is distinct from other nodes
that it has seen, otherwise it returns the number of the first equivalent
node that it has seen.

@<Declare procedures for preprocessing hyph...@>=
function trie_node(@!p:trie_pointer):trie_pointer; {converts
	to a canonical form}
label exit;
var h:trie_pointer; {trial hash location}
@!q:trie_pointer; {trial trie node}
begin h←(trie_c[p]+1009*trie_o[p]+@|
		2718*trie_l[p]+3142*trie_r[p]) mod trie_size;
loop@+	begin q←trie_hash[h];
	if q=0 then
		begin trie_hash[h]←p; trie_node←p; return;
	if (trie_c[q]=trie_c[p])∧(trie_o[q]=trie_o[p])∧@|
		(trie_l[q]=trie_l[p])∧(trie_r[q]=trie_r[p]) then
		begin trie_node←q; return;
	if h>0 then decr(h)@+else h←trie_size;

@ A neat recursive procedure is now able to compress a trie by
traversing it and applying |trie_node| to its nodes in ``bottom up''
fashion. We will compress the entire trie by clearing |trie_hash| to
zero and then saying `|trie_root←compress_trie(trie_root)|'.

@<Declare procedures for preprocessing hyph...@>=
function compress_trie(@!p:trie_pointer):trie_pointer;
begin if p=0 then compress_trie←0
else	begin trie_l[p]←compress_trie(trie_l[p]);

@ Before we forget how to initialize the data structures that have been
mentioned so far, let's write a procedure that does the initialization.

@<Declare procedures for preprocessing hyph...@>=
procedure init_pattern_memory; {gets ready to build a linked trie}
var h:0..trie_op_hash_size; {an index into |trie_op_hash|}
@!p:trie_pointer; {an index into |trie_hash|}
begin for h←0 to trie_op_hash_size do trie_op_hash[h]←min_quarterword;
trie_op_ptr←min_quarterword; trie_root←0; trie_c[0]←0; trie_ptr←0;
for p←0 to trie_size do trie_hash[p]←0;

@ The compressed trie will be packed into the |trie| array using a
``top-down first-fit'' procedure. This is a little tricky, so the reader
should pay close attention: The |trie_hash| array is cleared to zero
again and renamed |trie_ref| for this phase of the operation; later on,
|trie_ref[p]| will be nonzero if the linked trie node |p| is the oldest sibling
in a family and if the characters |c| of that family have been allocated to
locations |trie_ref[p]+c| in the |trie| array. Locations of |trie| that
are in use will have |trie_link=0|, while the unused holes in |trie|
will be doubly linked with |trie_link| pointing to the next larger vacant
location and |trie_back| pointing to the next smaller one. This double
linking will have been carried out only as far as |trie_max|, where
|trie_max| is the largest index of |trie| that will be needed.
Another array |trie_taken| tells whether or not a given location is
equal to |trie_ref[p]| for some |p|; this array is used to ensure that
distinct nodes in the compressed trie will have distinct |trie_ref|

@d trie_ref==trie_hash {where linked trie families go into |trie|}
@d trie_back(#)==trie[#].lh {backward links in |trie| holes}

init@!trie_taken:array[trie_pointer] of boolean; {does a family start here?}
@t\hskip1em@>@!trie_max:trie_pointer; {largest location used in |trie|}
	{all locations |≤trie_min| are vacant in |trie|}

@ Here is how these data structures are initialized.

@<Declare procedures for preprocessing hyph...@>=
procedure init_trie_memory; {gets ready to pack into |trie|}
var p:trie_pointer; {index into |trie_ref|, |trie|, |trie_taken|}
begin for p←0 to trie_ptr do trie_ref[p]←0;
trie_max←128; trie_min←128; trie_link(0)←1; trie_taken[0]←false;
for p←1 to 128 do
	begin trie_back(p)←p-1; trie_link(p)←p+1; trie_taken[p]←false;

@ Each time \.{\\patterns} appears, it overrides any patterns that were
entered earlier, so the arrays are not initialized until \TeX\ sees
\.{\\patterns}. However, some of the global variables must be
initialized when \.{INITEX} is loaded, in case the user never mentions
any \.{\\patterns}.

@<Initialize table entries...@>=
trie_link(0)←0; trie_char(0)←0; trie_op(0)←0;
for k←1 to 127 do trie[k]←trie[0];

@ The |first_fit| procedure finds the smallest hole |z| in |trie| such that
a trie family starting at a given node |p| will fit into vacant positions
starting at |z|. If |c=trie_c[p]|, this means that location |z-c| must
not already be taken by some other family, and that |z-c+@t$c^\prime$@>|
must be vacant for all characters $c^\prime$ in the family. The procedure
sets |trie_ref[p]| to |z-c| when the first fit has been found.

@<Declare procedures for preprocessing hyph...@>=
procedure first_fit(@!p:trie_pointer); {packs a family into |trie|}
label not_found,found;
var h:trie_pointer; {candidate for |trie_ref[p]|}
@!z:trie_pointer; {runs through holes}
@!q:trie_pointer; {runs through the family starting at |p|}
@!c:ascii_code; {smallest character in the family}
begin c←trie_c[p]; {we have |c≠0|}
if c<trie_min then trie_min←c;
z←trie_link(trie_min-1); {get the first conceivably good hole}
loop@+	begin if z<c then goto not_found;
	@<Ensure that |trie_max≥h+128|@>;
	if trie_taken[h] then goto not_found;
	@<If all characters of the family fit relative to |h|, then
		|goto found|,\30\ otherwise |goto not_found|@>;
	not_found: z←trie_link(z); {move to the next hole}
found: @<Pack the family into |trie| relative to |h|@>;

@ By making sure that |trie_max| is at least |h+128|, we can be sure that
|trie_max>z|, since |h=z+c|. It follows that location |trie_max| will
never be occupied in |trie|, and we will have |trie_max≥trie_link(z)|.

@<Ensure that |trie_max≥h+128|@>=
if trie_max<h+128 then
	begin if trie_size≤h+128 then overflow("pattern memory",trie_size);
	repeat incr(trie_max); trie_taken[trie_max]←false;
	trie_link(trie_max)←trie_max+1; trie_back(trie_max)←trie_max-1;
	until trie_max=h+128;

@ @<If all characters of the family fit relative to |h|...@>=
while q>0 do
	begin if trie_link(h+trie_c[q])=0 then goto not_found;
goto found

@ @<Pack the family into |trie| relative to |h|@>=
trie_taken[h]←true; trie_ref[p]←h; q←p;
repeat z←h+trie_c[q]; trie_back(trie_link(z))←trie_back(z);
trie_link(trie_back(z))←trie_link(z); trie_link(z)←0; q←trie_r[q];
until q=0

@ To pack the entire linked trie, we use the following recursive procedure.

@<Declare procedures for preprocessing hyph...@>=
procedure trie_pack(@!p:trie_pointer); {pack subtries of a family}
var q:trie_pointer; {a local variable that need not be saved on recursive calls}
begin repeat q←trie_l[p];
if (q>0)∧(trie_ref[q]=0) then
	begin first_fit(q); trie_pack(q);
until p=0;

@ When the whole trie has been allocated into the sequential table, we
must go through it once again so that |trie| contains the correct
information. Null pointers in the linked trie will be replaced by the
first untaken position |r| in |trie|, since this properly implements an
``empty'' family. The value of |r| is stored in |trie_ref[0]| just before
the fixup process starts. Note that |trie_max| will always be at least as
large as |r+127|, since it is always at least 128 more than each location
that is taken.

@<Move the data into |trie|@>=
while trie_taken[r] do incr(r);
trie_ref[0]←r; {|r| will be used for null pointers}
trie_fix(trie_root) {this fixes the non-holes in |trie|}

@ The fixing-up procedure is, of course, recursive. Since the linked trie
usually has overlapping subtries, the same data may be moved several
times; but that causes no harm, and at most as much work is done as it
took to build the uncompressed trie.

@<Declare procedures for preprocessing hyph...@>=
procedure trie_fix(@!p:trie_pointer); {moves |p| and its siblings into |trie|}
var q:trie_pointer; {a local variable that need not be saved on recursive calls}
@!c:ascii_code; {another one that need not be saved}
@!z:trie_pointer; {|trie| reference; this local variable must be saved}
begin z←trie_ref[p];
while p≠0 do
	begin q←trie_l[p]; c←trie_c[p];
	trie_link(z+c)←trie_ref[q]; trie_char(z+c)←c; trie_op(z+c)←trie_o[p];
	if q>0 then trie_fix(q);

@ Now let's put all these routines together. When \.{INITEX} has scanned
the `\.{\\patterns}' control sequence, it calls on |new_patterns| to do
the right thing. After |new_patterns| has acted, the compacted pattern data
will appear in the array |trie[1..trie_max]|, and the associated numeric
hyphenation data will appear in locations |[(min_quarterword+1)..trie_op_ptr]|
of the arrays |hyf_distance|, |hyf_num|, |hyf_next|.

@<Declare procedures for preprocessing hyph...@>=
procedure new_patterns; {initializes the hyphenation pattern data}
label done, done1;
var k,@!l:small_number; {indices into |hc| and |hyf|}
@!digit_sensed:boolean; {should the next digit be treated as a letter?}
@!v:quarterword; {trie op code}
@!p,@!q:trie_pointer; {nodes of trie traversed during insertion}
@!first_child:boolean; {is |p=trie_l[q]|?}
@!c:ascii_code; {character being inserted}
@!r,@!s:trie_pointer; {used to clean up the packed |trie|}
@!h:two_halves; {template used to zero out |trie|'s holes}
begin scan_left_brace; {a left brace must follow \.{\\patterns}}
@<Enter all of the patterns into a linked trie, until coming to a right
	brace; then skip an optional space@>;
trie_root←compress_trie(trie_root); {compress the trie}
@<Pack the trie@>;

@ Novices are not supposed to be using \.{\\patterns}, so the error
messages are terse. (Note that all error messages appear in \TeX's string
pool, even if they are used only by \.{INITEX}.)

@<Enter all of the patterns into a linked trie...@>=
k←0; hyf[0]←0; digit_sensed←false;
loop@+	begin get_nc_token;
	case cur_cmd of
	letter,other_char:@<Append a new letter or a hyphen level@>;
	spacer,right_brace: begin if k>0 then
			@<Insert a new pattern into the linked trie@>;
		if cur_cmd=right_brace then
			begin @<Scan an optional space@>; goto done;
		k←0; hyf[0]←0; digit_sensed←false;
	othercases begin print_nl("! Bad \patterns");
@.Bad \\patterns@>
		help1("(See Appendix H.)"); error; pass_block(1); goto done;

@ @<Append a new letter or a hyphen level@>=
if digit_sensed ∨(cur_chr<"0")∨(cur_chr>"9") then
	begin if cur_chr="." then cur_chr←127 {change edge-of-word delimiter
		to |@'177|}
	else	begin cur_chr←lc_code(cur_chr);
		if cur_chr=0 then
			begin print_nl("! Nonletter");
			help1("(See Appendix H.)"); error; cur_chr←127;
	if k<63 then
		begin incr(k); hc[k]←cur_chr; hyf[k]←0; digit_sensed←false;
else	begin hyf[k]←cur_chr-"0"; digit_sensed←true;

@ When the following code comes into play, the pattern $p↓1\ldotsm p↓k$
appears in |hc[1..k]|, and the corresponding sequence of numbers $n↓0\ldotsm
n↓k$ appears in |hyf[0..k]|.

@<Insert a new pattern into the linked trie@>=
begin @<Compute the trie op code, |v|, and set |l←0|@>;
while l<k do
	begin incr(l); c←hc[l]; p←trie_l[q]; first_child←true;
	while (p>0)∧(c>trie_c[p]) do
		begin q←p; p←trie_r[q]; first_child←false;
	if (p=0)∨(c<trie_c[p]) then
		@<Insert a new trie node between |q| and |p|, and
			make |p| point to it@>;
	q←p; {now node |q| represents $p↓1\ldotsm p↓l$}
if trie_o[q]≠min_quarterword then
	begin print_nl("! Duplicate pattern");
@.Duplicate pattern@>
	help1("(See Appendix H.)"); error;

@ @<Insert a new trie node between |q| and |p|...@>=
begin if trie_ptr=trie_size then overflow("pattern memory",trie_size);
incr(trie_ptr); trie_r[trie_ptr]←p; p←trie_ptr; trie_l[p]←0;
if first_child then trie_l[q]←p@+else trie_r[q]←p;
trie_c[p]←c; trie_o[p]←min_quarterword;

@ @<Compute the trie op code, |v|...@>=
l←k; v←min_quarterword;
loop@+	begin if hyf[l]≠0 then v←new_trie_op(k-l,hyf[l],v);
	if l>0 then decr(l)@+else goto done1;

@ The following packing routine is rigged so that the root of the linked
tree gets mapped into location 0 of |trie|, as required by the hyphenation
algorithm. This happens because the first call of |first_fit| will
``take'' location@@0.

@<Pack the trie@>=
if trie_root≠0 then
	begin first_fit(trie_root); trie_pack(trie_root);
@<Move the data into |trie|@>;
r←0; {finally, we will zero out the holes}
h.rh←0; h.b0←min_quarterword; h.b1←0; {|trie_link←0|,
	|trie_op←min_quarterword|, |trie_op←0|}
repeat s←trie_link(r); trie[r]←h; r←s;
until r>trie_max
@* \[43] Breaking vertical lists into pages.
The |vsplit| procedure, which implements \TeX's \.{\\vsplit} operation,
is considerably simpler than |line_break| because it doesn't have to
worry about hyphenation, and because its mission is to discover a single
break instead of an optimum sequence of breakpoints.  But before we get
into the details of |vsplit|, we need to consider a few more basic things.

@ A subroutine called |prune_page_top| takes a pointer to a vlist and
returns a pointer to a modified vlist in which all glue, kern, and penalty nodes
have been deleted before the first box or rule node. However, the first
box or rule is actually preceded by a newly created glue node designed so that
the topmost baseline will be at distance |split_top_skip| from the top,
whenever this is possible without backspacing.

In this routine and those that follow, we make use of the fact that a
vertical list contains no character nodes, hence the |type| field exists
for each node in the list.

@p function prune_page_top(@!p:pointer):pointer; {adjust top after page break}
var prev_p:pointer; {lags one step behind |p|}
@!q:pointer; {temporary variable for list manipulation}
begin prev_p←temp_head; link(temp_head)←p;
while p≠null do
	case type(p) of
	hlist_node,vlist_node,rule_node:@<Insert glue for |split_top_skip|
		and set@@|p←null|@>;
	whatsit_node,mark_node,ins_node: begin prev_p←p; p←link(prev_p);
	glue_node,kern_node,penalty_node: begin q←p; p←link(q); link(q)←null;
		link(prev_p)←p; flush_node_list(q);
	othercases confusion("pruning")
@:confusion pruning}{\quad pruning@>

@ @<Insert glue for |split_top_skip|...@>=
begin q←new_skip_param(split_top_skip_code); link(prev_p)←q; link(q)←p;
	{now |temp_ptr=glue_ptr(q)|}
if width(temp_ptr)>height(p) then width(temp_ptr)←width(temp_ptr)-height(p)
else width(temp_ptr)←0;

@ The next subroutine finds the best place to break a given vertical list
so as to obtain a box of height |h|, taking account of the |split_max_depth|
parameter. A pointer to the beginning of the vertical list is given,
and a pointer to the optimum breakpoint is returned. The list is effectively
followed by a forced break, i.e., a penalty node with the |eject_penalty|;
if the best break occurs at this artificial node, the value |null| is returned.

An array of six |scaled| distances is used to keep track of the height
from the beginning of the list to the current place, just as in |line_break|.
In fact, we use one of the same arrays, only changing its name to reflect
its new significance.

@d active_height==active_width {new name for the six distance variables}
@d cur_height==active_height[1] {the natural height}
@d set_height_zero(#)==active_height[#]←0 {initialize the height to zero}
@d update_heights=90 {go here to record glue in the |active_height| table}

@p function vert_break(@!p:pointer; @!h:scaled):pointer;
	{finds optimum page break}
label done,not_found,update_heights;
var prev_p:pointer; {if |p| is a glue node, |type(prev_p)| determines
	whether |p| is a legal breakpoint}
@!q,@!r:pointer; {glue specifications}
@!pi:integer; {penalty value}
@!b:integer; {badness at a trial breakpoint}
@!least_badness:integer; {the smallest badness found so far}
@!best_place:pointer; {the most recent break that leads to |least_badness|}
@!prev_dp:scaled; {depth of previous box in the list}
@!t:small_number; {|type| of the node following a kern}
begin prev_p←p; {an initial glue node is not a legal breakpoint}
least_badness←awful_bad; do_all_six(set_height_zero); prev_dp←0;
loop@+	begin @<If node |p| is a legal breakpoint, check if this break is
		the best known, and |goto done| if |p| is null or
		if the page-so-far is already too full to accept more stuff@>;
	prev_p←p; p←link(prev_p);
done: vert_break←best_place;

@ A global variable |best_height_plus_depth| will be set to the natural size
of the box that corresponds to the optimum breakpoint found by |vert_break|.
(This value is used by the insertion-splitting algorithm of the page builder.)

@!best_height_plus_depth:scaled; {height of the best box, without stretching or

@ A subtle point to be noted here is that |split_max_depth| might be negative,
so |cur_height| and |prev_dp| might need to be corrected even after a
glue or kern node.

@<If node |p| is a legal breakpoint, check...@>=
if p=null then pi←eject_penalty
else	@<Use node |p| to update the current height and depth measurements;
		if this node is not a legal breakpoint, |goto not_found|
		or |update_heights|,
		otherwise set |pi| to the associated penalty at the break@>;
@<Check if node |p| is a new champion breakpoint; then \(go)|goto done|
	if |p| is a forced break or if the page-so-far is already too full@>;
if (type(p)<glue_node)∨(type(p)>kern_node) then goto not_found;
update_heights: @<Update the current height and depth measurements with
	respect to a glue or kern node |p|@>;
not_found: if prev_dp>split_max_depth then
		begin cur_height←cur_height+prev_dp-split_max_depth;

@ @<Use node |p| to update the current height and depth measurements...@>=
case type(p) of
hlist_node,vlist_node,rule_node: begin@t@>@;@/
	cur_height←cur_height+prev_dp+height(p); prev_dp←depth(p);
	goto not_found;
whatsit_node:@<Process whatsit |p| in |vert_break| loop, |goto not_found|@>;
glue_node: if precedes_break(prev_p) then pi←0
	else goto update_heights;
kern_node: begin if link(p)=null then t←penalty_node
	else t←type(link(p));
	if t=glue_node then pi←0@+else goto update_heights;
penalty_node: pi←penalty(p);
mark_node,ins_node: goto not_found;
othercases confusion("vertbreak")
@:confusion vertbreak}{\quad vertbreak@>

@ @<Check if node |p| is a new champion breakpoint; then \(go)...@>=
if pi<inf_penalty then
	begin @<Compute the badness, |b|, using |awful_bad|
		if the box is too full@>;
	if b≤inf_bad then
		if pi≤eject_penalty then b←pi@+else b←b+pi;
	if b≤least_badness then
		begin best_place←p; least_badness←b;
	if (b=awful_bad)∨(pi≤eject_penalty) then goto done;

@ @<Compute the badness, |b|, using |awful_bad| if the box is too full@>=
if cur_height<h then
	if (active_height[3]≠0) or (active_height[4]≠0) or
		(active_height[5]≠0) then b←0
	else b←badness(h-cur_height,active_height[2])
else if cur_height-h>active_height[6] then b←awful_bad
else b←badness(cur_height-h,active_height[6])

@ Vertical lists that are subject to the |vert_break| procedure should not
contain infinite shrinkability, since that would permit any amount of
information to ``fit'' on one page.

@<Update the current height and depth measurements with...@>=
if type(p)=kern_node then q←p
else	begin q←glue_ptr(p);
	if (shrink_order(q)≠normal)∧(shrink(q)≠0) then
		print_nl("! Infinite glue shrinkage found in box being split");@/
@.Infinite glue shrinkage...@>
		help4("The box you are \vsplitting contains some infinitely")@/
			("shrinkable glue, e.g., `\vss' or `\vskip 0pt minus 1fil'.")@/
			("Such glue doesn't belong there; but you can safely proceed,")@/
			("since the offensive shrinkability has been made finite.");
		error; r←new_spec(q); shrink_order(r)←normal; delete_glue_ref(q);
cur_height←cur_height+prev_dp+width(q); prev_dp←0

@ Now we are ready to consider |vsplit| itself. Most of
its work is accomplished by the two subroutines that we have just considered.

Given the number of a vlist box |n|, and given a desired page height |h|,
the |vsplit| function finds the best initial segment of the vlist and
returns a box for a page of height@@|h|. The remainder of the vlist, if
any, replaces the original box, after removing glue and penalties and
adjusting for |split_top_skip|. Mark nodes in the split-off box are used to
set the values of |split_first_mark| and |split_bot_mark|; we use the
fact that |split_first_mark=null| if and only if |split_bot_mark=null|.

The original box becomes ``absent'' if and only if it has been entirely
extracted.  The extracted box is ``absent'' if and only if the original
box was absent (or if it was, erroneously, an hlist box).

@p function vsplit(@!n:quarterword; @!h:scaled):pointer;
	{extracts a page of height |h| from box |n|}
label exit,done;
var v:pointer; {the box to be split}
p:pointer; {runs through the vlist}
q:pointer; {points to where the break occurs}
begin v←box(n);
if split_first_mark≠null then
	begin delete_token_ref(split_first_mark); split_first_mark←null;
	delete_token_ref(split_bot_mark); split_bot_mark←null;
@<Dispense with trivial cases of absent or bad boxes@>;
@<Look at all the marks in nodes before the break, and set the final
	link to |null| at the break@>;
q←prune_page_top(q); p←list_ptr(v); free_node(v,box_node_size);
if q=null then box(n)←null {the |eq_level| of the box stays the same}
else box(n)←vpack(q,natural);
exit: end;

@ @<Dispense with trivial cases of absent or bad boxes@>=
if v=null then
	begin vsplit←null; return;
if type(v)≠vlist_node then
	begin help2("The box you are trying to split is an \hbox.")@/
		("I can't split such boxes, so I'll leave it alone.");
	print_nl("! \vsplit needs a \vbox"); error; vsplit←null; return;
@:vsplit_}{\.{\\vsplit needs a \\vbox}@>

@ It's possible that the box begins with a penalty node that is the
``best'' break, so we must be careful to handle this special case correctly.

@<Look at all the marks...@>=
if p=q then list_ptr(v)←null
else loop@+begin if type(p)=mark_node then
		if split_first_mark=null then
			begin split_first_mark←mark_ptr(p);
		else	begin delete_token_ref(split_bot_mark);
	if link(p)=q then
		begin link(p)←null; goto done;
@* \[44] The page builder.
When \TeX\ appends new material to its main vlist in vertical mode, it uses
a method something like |vsplit| to decide where a page ends, except that
the calculations are done ``on line'' as new items are placed on the list.
The main complication in this process is that insertions have to be put
into their boxes and removed from the vlist, in a more-or-less optimum manner.

We shall use the term ``current page'' for that part of the main vlist that
is being considered as a candidate for being broken off and sent to the
user's output routine. The current page starts at |link(page_head)|, and
it ends at |page_tail|.  We have |page_head=page_tail| if this list is empty.
@^current page@>

Utter chaos would reign if the user kept changing page specifications
while a page is being constructed, so the page builder keeps the pertinent
specifications frozen as soon as the page receives its first box or
insertion.  The global variable |page_contents| is |empty| when the
current page contains only mark nodes and content-less whatsit nodes; it
is |inserts_only| if the page contains only insertion nodes in addition to
marks and whatsits.  Glue nodes, kern nodes, and penalty nodes are
discarded until a box or rule node appears, at which time |page_contents|
changes to |box_there|.  As soon as |page_contents| becomes non-|empty|,
the current |vsize| and |max_depth| are squirreled away into |page_size|
and |page_depth_max|; the latter values will be used until the page has
been forwarded to the user's output routine. The \.{\\topskip} adjustment
is made when |page_contents| changes to |box_there|.

Although |page_size| starts out equal to |vsize|, it is decreased by the
scaled natural height-plus-depth of the insertions considered so far, and by
the \.{\\skip} corrections for those insertions. Therefore it represents
the size into which the non-inserted material should fit, assuming that
all insertions in the current page have been made.

The variables |best_page_break| and |least_page_badness| correspond to the
variables |best_place| and |least_badness| in the |vert_break| routine
that we have already studied; i.e., they record the location and value of
the best place currently known for breaking the current page. The value of
|page_size| at the time of the best break is stored in |best_size|.

@d inserts_only=1
	{|page_contents| when an insert node has been contributed, but no boxes}
@d box_there=2 {|page_contents| when a box or rule has been contributed}

@!page_tail:pointer; {the final node on the current page}
@!page_contents:empty..box_there; {what is on the current page so far?}
@!page_size:scaled; {desired height of information on page being built}
@!page_depth_max:scaled; {maximum box depth on page being built}
@!best_page_break:pointer; {break here to get the best page known so far}
@!least_page_badness:integer; {the score for this currently best page}
@!best_size:scaled; {its |page_size|}

@ The page builder has another data structure to keep track of insertions.
This is a list of four-word nodes, starting and ending at |page_ins_head|.
That is, the first element of the list is node |r@t$↓1$@>=link(page_ins_head)|;
node $r↓j$ is followed by |r@t$↓{j+1}$@>=link(r@t$↓j$@>)|; and if there are
|n| items we have |r@t$↓{n+1}$@>=page_ins_head|. The |subtype| field of
each node in this list refers to an insertion number; for example, `\.{\\insert
250}' would correspond to a node whose |subtype| is |qi(250)|
(the same as the |subtype| field of the relevant |ins_node|). These |subtype|
fields are in increasing order, and |subtype(page_ins_head)=
qi(255)|, so |page_ins_head| serves as a convenient sentinel
at the end of the list. A record is present for each insertion number that
appears in the current page.

The |type| field in these nodes distinguishes two possibilities that
might occur as we look ahead before deciding on the optimum page break.
If |type(r)=inserting|, then |width(r)| contains the total of the
height-plus-depth dimensions of the box and all its inserts seen so far.
If |type(r)=split_up|, then no more insertions will be made into this box,
because at least one previous insertion was too big to fit on the current
page; |broken_ptr(r)| points to the node where that insertion will be
split, if \TeX\ decides to split it, |broken_ins(r)| points to the
insertion node that was tentatively split, and |width(r)| includes the
natural height plus depth of the part that would be split off.

In both cases, |last_ins_ptr(r)| points to the last |ins_node|
encountered for box |qo(subtype(r))| that would be at least partially
inserted on the next page; and |best_ins_ptr(r)| points to the last
such |ins_node| that should actually be inserted, to get the page with
minimum badness among all page breaks considered so far. We have
|best_ins_ptr(r)=null| if and only if no insertion for this box should
be made to produce this optimum page.

@d page_ins_node_size=4 {number of words for a page insertion node}
@d inserting=0 {an insertion class that has not yet overflowed}
@d split_up=1 {an overflowed insertion class}
@d broken_ptr(#)==link(#+3)
	{an insertion for this class will break here if anywhere}
@d broken_ins(#)==info(#+3) {this insertion might break at |broken_ptr|}
@d last_ins_ptr(#)==link(#+2) {the most recent insertion for this |subtype|}
@d best_ins_ptr(#)==info(#+2) {the optimum most recent insertion}

@<Initialize the special list heads...@>=
type(page_ins_head)←split_up; link(page_ins_head)←page_ins_head;

@ An array |page_so_far| records the heights and depths of everything that
on the current page. This array contains six |scaled| numbers, like the
similar arrays already considered in |line_break| and |vert_break|. The
value of |page_so_far[1]| is also called |cur_page_height|.  The stretch
and shrink components of the \.{\\skip} corrections for each insertion are
included in |page_so_far|, but the natural space components of these
corrections are not, since they have been subtracted from |page_size|.

Another variable, |cur_page_depth|, records the depth of the current page,
adjusted to be at most |page_depth_max|. The variable |last_page_glue|
points to the glue specification of the most recent node contributed from
the contribution list, if this was a glue node; otherwise
|last_page_glue=max_halfword|. (If the contribution list is nonempty,
however, the value of |last_page_glue| is not necessarily accurate.)  And
finally, |ins_penalties| holds the sum of the penalties associated with
all split insertions.

@d cur_page_height==page_so_far[1] {height of the current page}

@!page_so_far:array [1..6] of scaled; {height and glue of the current page}
@!cur_page_depth:scaled; {depth of the current page}
@!last_page_glue:pointer; {used to implement \.{\\lastskip}}
@!ins_penalties:integer; {sum of the penalties for split insertions}

@ @d print_plus_end(#)==print(#);@+end
@d print_plus(#)==if page_so_far[#]≠0 then
	begin print(" plus "); print_scaled(page_so_far[#]); print_plus_end

@<Show the status of the current page@>=
if page_contents>empty then
	begin print_nl("total height "); print_scaled(page_so_far[1]);
	if page_so_far[6]≠0 then
		begin print(" minus "); print_scaled(page_so_far[6]);
	print_nl(" goal height"); print_scaled(page_size);
	while r≠page_ins_head do
		begin print_ln; print_esc("insert"); a←qo(subtype(r));
		print_int(a); print(" adds ");
		a←x_over_n(width(r),1000)*count(a); print_scaled(a);
		if type(r)=split_up then
			begin q←page_head; a←0;
			repeat q←link(q);
			if (type(q)=ins_node)∧(subtype(q)=subtype(r)) then incr(a);
			until q=broken_ins(r);
			print(", #"); print_int(a); print(" might split");

@ Here is a procedure that is called when the |page_contents| is changing
from |empty| to |inserts_only| or |box_there|.

@d set_page_so_far_zero(#)==page_so_far[#]←0

@p procedure freeze_page_specs(@!s:small_number);
begin page_contents←s;
page_size←vsize; page_depth_max←max_depth;
cur_page_depth←0; do_all_six(set_page_so_far_zero);

@ Pages are built by appending nodes to the current list in \TeX's
vertical mode, which is at the outermost level of the semantic nest. This
vlist is split into two parts; the ``current page'' that we have been
talking so much about already, and the ``contribution list'' that receives
new nodes as they are created.  The current page contains everything that
the page builder has accounted for in its data structures, as described
above, while the contribution list contains other things that have been
generated by other parts of \TeX\ but not yet seen by the page builder.
The contribution list starts at |link(contrib_head)|, and it ends at the
current node in \TeX's vertical mode.

When \TeX\ has appended new material in vertical mode, it calls the procedure
|build_page|, which tries to catch up by moving nodes from the contribution
list to the current page. This procedure will succeed in its goal of
emptying the contribution list, unless a page break is discovered, i.e.,
unless the current page has grown to the point where the optimum next
page break has been determined. In the latter case, the nodes after the
optimum break will go back onto the contribution list, and control will
effectively pass to the user's output routine.

We make |type(page_head)=glue_node|, so that an initial glue node on
the current page will not be considered a valid breakpoint.

@<Initialize the special list...@>=
type(page_head)←glue_node; subtype(page_head)←normal;

@ The global variable |output_active| is true during the time the
user's output routine is driving \TeX.

@!output_active:boolean; {are we in the midst of an output routine?}

@ @<Set init...@>=

@ The page builder is ready to start a fresh page if we initialize
the following state variables. (However, the page insertion list is initialized

@<Start a new current page@>=
page_contents←empty; page_tail←page_head; link(page_head)←null;@/
last_page_glue←max_halfword; ins_penalties←0;
cur_page_depth←0; page_depth_max←0;

@ \TeX\ is not always in vertical mode at the time |build_page|
is called; the current mode reflects what \TeX\ should return to, after
the contribution list has been emptied. A call on |build_page| should
be immediately followed by `|goto big_switch|', which is \TeX's central
control point.

@d contribute=80 {go here to link a node into the current page}

@p procedure build_page; {append contributions to the current page}
label exit,done,done1,continue,contribute,update_heights;
var p:pointer; {the node being appended}
@!q,@!r,@!s:pointer; {nodes being examined}
@!prev_p:pointer; {precedessor of |p|}
@!b:integer; {badness of current page}
@!pi:integer; {penalty to be added to the badness}
@!n:min_quarterword..255; {insertion box number}
@!delta,@!h,@!w:scaled; {sizes used for insertion calculations}
@!wait:boolean; {should the present insertion be held over?}
begin if (link(contrib_head)=null)∨ output_active then return;
repeat continue: p←link(contrib_head);@/
@<Update the value of |last_page_glue|@>;
@<Move node |p| to the current page; if it is time for a page break,
	put the nodes following the break back onto the contribution list,
	and |return| to the user's output routine if there is one@>;
until link(contrib_head)=null;
@<Make the contribution list empty by setting its tail to |contrib_head|@>;

@ @d contrib_tail==nest[0].tail_field {tail of the contribution list}

@<Make the contribution list empty...@>=
if nest_ptr=0 then tail←contrib_head {vertical mode}
else contrib_tail←contrib_head {other modes}

@ @<Update the value of |last_page_glue|@>=
if last_page_glue≠max_halfword then delete_glue_ref(last_page_glue);
if type(p)=glue_node then
	begin last_page_glue←glue_ptr(p); add_glue_ref(last_page_glue);
else last_page_glue←max_halfword

@ The code here is an example of a many-way switch into routines that
merge together in different places. Some people call this unstructured
programming, but the author doesn't see much wrong with it, as long as
@^Knuth, Donald Ervin@>
the various labels have a well-understood meaning.

@<Move node |p| to the current page; ...@>=
@<If the current page is empty and node |p| is to be deleted, |goto done1|;
	otherwise use node |p| to update the state of the current page;
	if this node is an insertion, |goto contribute|; otherwise if this node
	is not a legal breakpoint, |goto contribute| or |update_heights|;
	otherwise set |pi| to the penalty associated with this breakpoint@>;
@<Check if node |p| is a new champion breakpoint; then \(if)if it is time for
	a page break, prepare for output, and either fire up the user's
	output routine and |return| or ship out the page and |goto done|@>;
if (type(p)<glue_node)∨(type(p)>kern_node) then goto contribute;
update_heights:@<Update the current page measurements with respect to the
	glue or kern specified by node |p|@>;
contribute: @<Make sure that |page_depth_max| is not exceeded@>;
@<Link node |p| into the current page and |goto done|@>;
done1:@<Recycle node |p|@>;

@ @<Link node |p| into the current page and |goto done|@>=
link(page_tail)←p; page_tail←p;
link(contrib_head)←link(p); link(p)←null; goto done

@ @<Recycle node |p|@>=
link(contrib_head)←link(p); link(p)←null; flush_node_list(p)

@ The title of this section is already so long, it seems best to avoid
making it more accurate but still longer, by mentioning the fact that a
kern node at the end of the contribution list will not be contributed until
we know its successor.

@<If the current page is empty...@>=
case type(p) of
hlist_node,vlist_node,rule_node: if page_contents<box_there then
		@<Initialize the current page, insert the \.{\\topskip} glue
			ahead of |p|, and |goto continue|@>
	else @<Prepare to move a box or rule node to the current page,
		then |goto contribute|@>;
whatsit_node: @<Prepare to move whatsit |p| to the current page,
	then |goto contribute|@>;
glue_node: if page_contents<box_there then goto done1
	else if precedes_break(page_tail) then pi←0
	else goto update_heights;
kern_node: if page_contents<box_there then goto done1
	else if link(p)=null then return
	else if type(link(p))=glue_node then pi←0
	else goto update_heights;
penalty_node: if page_contents<box_there then goto done1@+else pi←penalty(p);
mark_node: goto contribute;
ins_node: @<Append an insertion to the current page and |goto contribute|@>;
othercases confusion("page")
@:confusion page}{\quad page@>

@ @<Initialize the current page, insert the \.{\\topskip} glue...@>=
begin if page_contents=empty then freeze_page_specs(box_there)
else page_contents←box_there;
q←new_skip_param(top_skip_code); link(q)←p; {now |temp_ptr=glue_ptr(q)|}
if width(temp_ptr)>height(p) then width(temp_ptr)←width(temp_ptr)-height(p)
else width(temp_ptr)←0;
link(q)←p; link(contrib_head)←q; goto continue;

@ @<Prepare to move a box or rule node to the current page...@>=
begin cur_page_height←cur_page_height+cur_page_depth+height(p);
goto contribute;

@ @<Make sure that |page_depth_max| is not exceeded@>=
if cur_page_depth>page_depth_max then
	begin cur_page_height←@|

@ @<Update the current page measurements with respect to the glue...@>=
if type(p)=kern_node then q←p
else begin q←glue_ptr(p);
	if (shrink_order(q)≠normal)∧(shrink(q)≠0) then
		print_nl("! Infinite glue shrinkage found on current page");@/
@.Infinite glue shrinkage...@>
		help4("The page about to be output contains some infinitely")@/
			("shrinkable glue, e.g., `\vss' or `\vskip 0pt minus 1fil'.")@/
			("Such glue doesn't belong there; but you can safely proceed,")@/
			("since the offensive shrinkability has been made finite.");
		r←new_spec(q); shrink_order(r)←normal; delete_glue_ref(q);
cur_page_height←cur_page_height+cur_page_depth+width(q); cur_page_depth←0

@ @<Check if node |p| is a new champion breakpoint; then \(if)...@>=
if pi<inf_penalty then
	begin @<Compute the badness, |b|, of the current page,
		using |awful_bad| if the box is too full@>;
	if b≤inf_bad then
		if pi≤eject_penalty then b←pi
		else b←b+pi+ins_penalties;
	if b≤least_page_badness then
		begin best_page_break←p; best_size←page_size;
		while r≠page_ins_head do
			begin best_ins_ptr(r)←last_ins_ptr(r);
	if (b=awful_bad)∨(pi≤eject_penalty) then
		@<Prepare to output the current page at the best place;
			then fire up the user's output routine and |return|,
			or ship out the page and |goto done|@>;

@ @<Compute the badness, |b|, of the current page...@>=
if cur_page_height<page_size then
	if (page_so_far[3]≠0) or (page_so_far[4]≠0) or@|
		(page_so_far[5]≠0) then b←0
	else b←badness(page_size-cur_page_height,page_so_far[2])
else if cur_page_height-page_size>page_so_far[6] then b←awful_bad
else b←badness(cur_page_height-page_size,page_so_far[6])

@ @<Append an insertion to the current page and |goto contribute|@>=
begin if page_contents=empty then freeze_page_specs(inserts_only);
n←subtype(p); r←page_ins_head;
while n≥subtype(link(r)) do r←link(r);
if subtype(r)≠qi(n) then
	@<Create a page insertion node with |subtype(r)=qi(n)|, and
		include the glue correction for box |n| in the
		current page state@>;
if type(r)=inserting then
	begin last_ins_ptr(r)←p;
		{this much room is left if we shrink the maximum}
	h←x_over_n(width(p),1000)*count(n); {this much room is needed}
	if (h≤delta)∧(width(p)+width(r)≤dimen(n)) then
		begin page_size←page_size-h; width(r)←width(r)+width(p);
	else @<Find the best way to split the insertion, and change
		|type(r)| to |split_up|@>;
goto contribute;

@ We take note of the value of \.{\\skip} |n| and the height plus depth
of \.{\\box}@@|n| only when the first \.{\\insert}@@|n| node is
encountered for a new page. A user who changes the contents of \.{\\box}@@|n|
after that first \.{\\insert}@@|n| had better be either extremely careful
or extremely lucky, or both.

@<Create a page insertion node...@>=
begin q←get_node(page_ins_node_size); link(q)←link(r); link(r)←q; r←q;
subtype(r)←qi(n); type(r)←inserting;
if box(n)=null then width(r)←0
else width(r)←height(box(n))+depth(box(n));
q←skip(n); h←x_over_n(width(r),1000)*count(n);
if (shrink_order(q)≠normal)∧(shrink(q)≠0) then
	begin print_nl("! Infinite glue shrinkage inserted from \skip ");
@.Infinite glue shrinkage@>
	help3("The correction glue for page breaking with insertions")@/
		("must have finite shrinkability. But you may proceed,")@/
		("since the offensive shrinkability has been made finite.");

@ Here is the code that will split a long footnote between pages, in an
emergency. The current situation deserves to be recapitulated: Node |p|
is an insertion into box |n|; the insertion will not fit, in its entirety,
either because it would make the total contents of box |n| greater than
\.{\\dimen} |n|, or because it would make the incremental amount of growth
|h| greater than the available space |delta|, or both. (This amount |h| has
been weighted by the insertion scaling factor, i.e., by \.{\\count} |n|
over 1000.) Now we will choose the best way to break the vlist of the
insertion, using the same criteria as in the \.{\\vsplit} operation.

@<Find the best way to split the insertion...@>=
begin if count(n)≤0 then w←max_dimen
else w←x_over_n(delta,count(n))*1000;
if w>dimen(n)-width(r) then w←dimen(n)-width(r);
type(r)←split_up; broken_ptr(r)←q; broken_ins(r)←p;
if q=null then ins_penalties←ins_penalties+eject_penalty
else if type(q)=penalty_node then ins_penalties←ins_penalties+penalty(q);

@ When the page builder has looked at as much material as could appear before
the next page break, it makes its decision. The break that gave minimum
badness will be used to put a completed ``page'' into box 255, with insertions
appended to their other boxes.

We also set the values of |top_mark|, |first_mark|, and |bot_mark|. The
program uses the fact that |bot_mark≠null| implies |first_mark≠null|;
|bot_mark=null| implies |top_mark=first_mark=null|.

@<Prepare to output the current page at the best place...@>=
begin @<Set the value of |output_penalty|@>;
if bot_mark≠null then
	begin if top_mark≠null then delete_token_ref(top_mark);
	top_mark←bot_mark; add_token_ref(top_mark);
	delete_token_ref(first_mark); first_mark←null;
@<Put the \(o)optimal current page into box 255, update |first_mark| and
	|bot_mark|, append insertions to their boxes, and put the
	remaining nodes back on the contribution list@>;
if (top_mark≠null)∧(first_mark=null) then
	begin first_mark←top_mark; add_token_ref(top_mark);
if output_routine≠null then @<Fire up the user's output routine and |return|@>
else @<Perform the default output routine and |goto done|@>;

@ @<Set the value of |output_penalty|@>=
if type(best_page_break)=penalty_node then
	begin geq_word_define(int_base+output_penalty_code,penalty(best_page_break));
else geq_word_define(int_base+output_penalty_code,inf_penalty)

@ As the page is finally being prepared for output, |p| runs through the vlist,
with |prev_p| trailing behind, and |q| is the tail of a list of insertions that
are being held over for a subsequent page.

@<Put the \(o)optimal current page into box 255...@>=
if p=best_page_break then best_page_break←null; {|p| not yet linked in}
@<Ensure that box 255 is empty before output@>;
@<Prepare all the boxes involved in insertions to act as queues@>;
q←hold_head; link(q)←null; prev_p←page_head; p←link(prev_p);
while p≠best_page_break do
	begin if type(p)=ins_node then @<Either insert the material
		specified by node |p| into the appropriate box, or
		hold it for the next page; also delete node |p| from
		the current page@>
	else if type(p)=mark_node then @<Update the values of
		|first_mark| and |bot_mark|@>;
	prev_p←p; p←link(prev_p);
@<Break the current page at node |p|, put it in box@@255,
	and put the remaining nodes on the contribution list@>;
@<Delete the page-insertion nodes@>

@ @<Ensure that box 255 is empty before output@>=
if box(255)≠null then
	begin print_nl("! \box255 is not absent");
@:box255}{\.{\\box255 is not absent}@>
	help2("You shouldn't use \box255 except in \output routines.")@/
		("Proceed, and I'll discard its present contents.");
	error; flush_node_list(box(255)); box(255)←null;

@ @<Update the values of |first_mark| and |bot_mark|@>=
begin if first_mark=null then
	begin first_mark←mark_ptr(p);
if bot_mark≠null then delete_token_ref(bot_mark);
bot_mark←mark_ptr(p); add_token_ref(bot_mark);

@ When the following code is executed, the current page runs from node
|link(page_head)| to node |prev_p|, and the nodes from |p| to |page_tail|
are to be placed back at the front of the contribution list. Furthermore
the heldover insertions appear in a list from |link(hold_head)| to |q|; we
will put them into the current page list for safekeeping while the user's
output routine is active.  We might have |q=hold_head|; and |p=null| if
and only if |prev_p=page_tail|.

@<Break the current page at node |p|, put it...@>=
if p≠null then
	begin if link(contrib_head)=null then
		if nest_ptr=0 then tail←page_tail
		else contrib_tail←page_tail;
@<Start a new current page@>;
if q≠hold_head then
	begin link(page_head)←link(hold_head); page_tail←q;

@ If many insertions are supposed to go into the same box, we want to know
the position of the last node in that box, so that we don't need to waste time
when linking further information into it. The |last_ins_ptr| fields of the
page insertion nodes are therefore used for this purpose during the
packaging phase.

@<Prepare all the boxes involved in insertions to act as queues@>=
while r≠page_ins_head do
	begin if best_ins_ptr(r)≠null then
		begin n←qo(subtype(r));
		if box(n)=null then box(n)←new_null_box;
		while link(p)≠null do p←link(p);

@ @<Delete the page-insertion nodes@>=
while r≠page_ins_head do
	begin q←link(r); free_node(r,page_ins_node_size); r←q;

@ We will set |best_ins_ptr←null| and package the box corresponding to
insertion node@@|r|, just after making the final insertion into that box.
If this final insertion is `|split_up|', the remainder after splitting
and pruning (if any) will be carried over to the next page.

@<Either insert the material specified by node |p| into...@>=
begin r←link(page_ins_head);
while subtype(r)≠subtype(p) do r←link(r);
if best_ins_ptr(r)=null then wait←true
else	begin wait←false; s←ins_ptr(p); {|s≠null|}
	link(last_ins_ptr(r))←s; s←last_ins_ptr(r);
	if best_ins_ptr(r)=p then
		@<Wrap up the box specified by node |r|, splitting node |p| if
		called for; set |wait←true| if node |p| holds a remainder after
	else	begin while link(s)≠null do s←link(s);
@<Either append the insertion node |p| after node |q|, and remove it
	from the current page, or delete |node(p)|@>;

@ @<Wrap up the box specified by node |r|, splitting node |p| if...@>=
begin if type(r)=split_up then
	if (broken_ins(r)=p)∧(broken_ptr(r)≠null) then 
		begin while link(s)≠broken_ptr(r) do s←link(s);
		if ins_ptr(p)≠null then
			begin temp_ptr←vpack(ins_ptr(p),natural);
		link(s)←null; wait←(ins_ptr(p)≠null);

@ @<Either append the insertion node |p|...@>=
link(prev_p)←link(p); link(p)←null;
if wait then
	begin link(q)←p; q←p;
else free_node(p,ins_node_size);

@ The list of heldover insertions, running from |link(page_head)| to
|page_tail|, must be moved to the contribution list when the user has
specified no output routine.

@<Perform the default output routine and |goto done|@>=
begin if link(page_head)≠null then
	begin if link(contrib_head)=null then
		if nest_ptr=0 then tail←page_tail@+else contrib_tail←page_tail
	else link(page_tail)←link(contrib_head);
	link(page_head)←null; page_tail←page_head;
ship_out(box(255)); box(255)←null; goto done;

@ @<Fire up the user's output routine and |return|@>=
begin output_active←true;
push_nest; mode←-vmode; prev_depth←ignore_depth; mode_line←-line;

@ When the user's output routine finishes, it has constructed a vlist
in internal vertical mode, and \TeX\ will do the following:

@<Resume the page builder after an output routine has come to an end@>=
begin end_graf; unsave; output_active←false;
@<Ensure that box 255 is empty after output@>;
if tail≠head then {current list goes after heldover insertions}
	begin link(page_tail)←link(head);
if link(page_head)≠null then {and both go before heldover contributions}
	begin if link(contrib_head)=null then contrib_tail←page_tail;
	link(page_head)←null; page_tail←page_head;
pop_nest; build_page;

@ @<Ensure that box 255 is empty after output@>=
if box(255)≠null then
	begin help3("Your \output commands should empty \box255,")@/
		("e.g., by saying `\shipout\box255'.")@/
		("Proceed; I'll discard its present contents.");
	print_nl("! \output routine didn't use all of \box255"); error;
@:output_}{\.{\\output routine didn't use...}@>
	flush_node_list(box(255)); box(255)←null;
@* \[45] The chief executive.
\hskip-1.1pt % avoid overfull box in published version!
We come now to the |main_control| routine, which contains the master
switch that causes all the various pieces of \TeX\ to do their things,
in the right order.

In a sense, this is the grand climax of the program: It applies all of the
tools that we have worked so hard to construct. In another sense, this is
the messiest part of the program: It necessarily refers to other pieces
of code all over the place, so that a person can't fully understand what is
going on without paging back and forth to be reminded of conventions that
are defined elsewhere. We are now at the hub of the web, the central nervous
system that touches most of the other parts and ties them together.

The structure of |main_control| itself is quite simple. There's a label
called |big_switch|, at which point the next token of input is fetched
using |get_nc_token|. Then the program branches at high speed into one of
about 100 possible directions, based on the the value of the current
mode and the newly fetched command code; the sum |abs(mode)+cur_cmd|
indicates what to do next. For example, the case `|vmode+letter|' arises
when a letter occurs in vertical mode (or internal vertical mode); this
case leads to instructions that initialize a new paragraph and enter
horizontal mode.

The big |case| statement that contains this multiway switch has been labeled
|reswitch|, so that the program can |goto reswitch| when the next token
has already been fetched. Most of the cases are quite short; they call
an ``action procedure'' that does the work for that case, and then they
either |goto reswitch| or they ``fall through'' to the end of the |case|
statement, which returns control back to |big_switch|. Thus, |main_control|
is not an extremely large procedure, in spite of the multiplicity of things
it must do; it is small enough to be handled by \PASCAL\ compilers that put
severe restrictions on procedure size.
@!@^action procedure@>

One case is singled out for special treatment, because it accounts for most
of \TeX's activities in typical applications. The process of reading simple
text and converting it into |char_node| records, while looking for ligatures
and kerns, is part of \TeX's ``inner loop''; the whole program runs
efficiently when its inner loop is fast, so this part has been written
with special care.

@ We shall concentrate first on the inner loop of |main_control|, deferring
consideration of the other cases until later.
@^inner loop@>

@d big_switch=60 {go here to branch on the next token of input}
@d main_loop=70 {go here to typeset |cur_chr| in the current font}
@d main_loop_1=71 {like |main_loop|, but |(f,c)| = current font and char}
@d main_loop_2=72 {like |main_loop_1|, but |c| is known to be in range}
@d main_loop_3=73 {like |main_loop_2|, but several variables are set up}
@d append_normal_space=74 {go here to append a normal space between words}

@p @t\4@>@<Declare action procedures for use by |main_control|@>@;
@t\4@>@<Declare the procedure called |handle_right_brace|@>@;
procedure main_control; {governs \TeX's activities}
label big_switch,reswitch,main_loop,main_loop_1,main_loop_2,main_loop_3,
var t:integer; {general-purpose temporary variable}
@<Local variables for the inner loop of |main_control|@>@;
begin big_switch: get_nc_token;@/
reswitch: @<Give diagnostic information, if requested@>;
case abs(mode)+cur_cmd of
hmode+letter,hmode+other_char: goto main_loop;
hmode+char_num: begin scan_char_num; cur_chr←cur_val; goto main_loop;
hmode+spacer: if space_factor=1000 then goto append_normal_space
	else app_space;
hmode+ex_space,mmode+ex_space: goto append_normal_space;
@t\4@>@<Cases of |main_control| that are not part of the inner loop@>@;
end; {of the big |case| statement}
goto big_switch;
main_loop:@<Append character |cur_chr| and the following characters (if@@any)
	to the current hlist in the current font; |goto reswitch| when
	a non-character has been fetched@>;
append_normal_space:@<Append a normal inter-word space to the current list,
	then |goto big_switch|@>;

@ When a new token has just been fetched at |big_switch|, we have an
ideal place to monitor \TeX's activity.

@<Give diagnostic information, if requested@>=
if interrupt>0 then if OK_to_interrupt then
	begin back_input; pause_for_instructions; goto big_switch;
debug if panicking then check_mem(false);@+@;@+gubed
if tracing_commands≠0 then
	begin begin_diagnostic; print_nl("{");
	if mode≠shown_mode then
		begin print_mode(mode); print(": "); shown_mode←mode;
	print_cmd_chr(cur_cmd,cur_chr); print_char("}");

@ In the following program, |l| is the current character or ligature;
it might grow into a longer ligature. One or more characters has been
used to define |l|, and the last of these was |c|. The chief use of |c|
will be to modify |space_factor| and to insert discretionary nodes after
explicit hyphens in the text.

@<Local variables for the inner loop of |main_control|@>=
@!l:quarterword; {the current character or ligature}
@!c:eight_bits; {the most recent character}
@!f:internal_font_number; {the current font}
@!r:halfword; {the next character for ligature/kern matching}
@!p:pointer; {the current |char_node|}
@!k:0..font_mem_size; {index into |font_info|}
@!q:pointer; {where a ligature should be detached}
@!i:four_quarters; {character information bytes for |l|}
@!j:four_quarters; {ligature/kern command}
@!s:integer; {space factor code}
@!ligature_present:boolean; {should a ligature node be made?}

@ @<Append character |cur_chr| and the following characters...@>=
@^inner loop@>
if f=undefined_font then
	begin missing_font; goto big_switch;
main_loop_1: if (c<font_bc[f])∨(c>font_ec[f]) then
	begin char_warning(f,c); goto big_switch;
main_loop_2: q←tail; ligature_present←false; l←qi(c);
main_loop_3: @<Adjust \(t)the space factor,
	based on its current value and |c|@>;
@<Append character |l| and the following characters (if any) to the current
	hlist, in font |f|; if |ligature_present|, detach a ligature node
	starting at |link(q)|; if |c="-"|, append a null |disc_node|;
	finally |goto reswitch|@>

@ @<Declare action...@>=
procedure missing_font;
begin print_nl("! No font has been selected");
@.No font has been selected@>
help4("You have to use `\:' to define the current font,")@/
("before you can typeset characters or accents.")@/
("If you aren't trying to typeset anything just now, something")@/
("else is wrong up above---a missing } or %, or a bad \chcode?");

@ We leave |space_factor| unchanged if |sf_code(c)=0|; otherwise we set it
to |sf_code(c)|, except that the space factor never changes from a value
less than 1000 to a value exceeding 1000. If |c≥128|, its |sf_code| is
implicitly 1000. The most common case is |sf_code(c)=1000|, so we want
that case to be fast.
@^inner loop@>

@<Adjust \(t)the space factor...@>=
if c<128 then
	begin s←sf_code(c);
	if s=1000 then space_factor←1000
	else if s<1000 then
		begin if s>0 then space_factor←s;
	else if space_factor<1000 then space_factor←1000
	else space_factor←s;
else space_factor←1000

@ Still in the inner loop here.
@^inner loop@>

@<Append character |l| and the following...@>=
if char_exists(i) then
	begin p←avail; {avoid |get_avail| if possible, to save time}
	if p=null then p←get_avail
	else	begin avail←link(p); link(p)←null;
	stat incr(dyn_used);@+tats@/
	font(p)←f; character(p)←qi(c);
	link(tail)←p; tail←p;
else char_warning(f,qo(l));
@<Look ahead for ligature or kerning, either continuing the main loop
	or going to |reswitch|@>

@ The result of \.{\\char} can participate in a ligature or kern, so
we must look ahead for it.
@^inner loop@>

@<Look ahead for ligature...@>=
get_next; {set only |cur_cmd| and |cur_chr|}
if (cur_cmd=letter)∨(cur_cmd=other_char) then r←qi(cur_chr)
else	begin nc_token; {set |cur_cmd|, |cur_chr|, |cur_tok|}
	if cur_cmd=char_num then
		begin scan_char_num; r←qi(cur_val);
	else r←qi(256); {this flag means that no character follows}
if (char_tag(i)=lig_tag)∧(r≠qi(256)) then
	@<Follow the lig/kern program; |goto main_loop_3| if scoring a hit@>;
@<Make a ligature node, if |ligature_present|; insert a discretionary
	node for an explicit hyphen, if |c="-"|@>;
if r=qi(256) then goto reswitch; {|cur_cmd|, |cur_chr|, |cur_tok| are untouched}
c←qo(r); goto main_loop_1 {|f| is still valid}

@ Even though comparatively few characters have a lig/kern program, the |repeat|
construction here counts as part of \TeX's inner loop, since it involves a
potentially long sequential search. For example, tests with one commonly
used font showed that about 40 per cent of all characters had a lig/kern
program, and the |repeat| loop was performed about four times for every
such character.
@^inner loop@>

@<Follow the lig/kern...@>=
begin k←lig_kern_start(f)(i);
repeat j←font_info[k].qqqq; {fetch a lig/kern command}
if next_char(j)=r then
	if op_bit(j)<kern_flag then @<Extend a ligature, |goto main_loop_3|@>
	else @<Append a kern, |goto main_loop_2|@>;
until stop_bit(j)≥stop_flag;

@ @<Append a kern,...@>=
begin @<Make a ligature node,...@>;
c←qo(r); goto main_loop_2;

@ @<Make a ligature node,...@>=
@^inner loop@>
if ligature_present then
	begin p←new_ligature(f,l,link(q)); link(q)←p; tail←p;
if c="-" then tail_append(new_disc)

@ @<Extend a ligature...@>=
begin ligature_present←true; l←rem_byte(j); c←qo(r); goto main_loop_3;

@ The occurrence of blank spaces is almost part of \TeX's inner loop,
since we usually encounter about one space for every five non-blank characters.
There\-fore |main_control| gives second-highest priority to ordinary spaces.

When a glue parameter like \.{\\spaceskip} is set to `\.{0pt}', we will
see to it later that the corresponding glue specification is precisely
|zero_glue|, not merely a pointer to some other specification that happens
to be full of zeroes. Therefore it is simple to test whether a glue parameter
is zero or not.

@<Append a normal inter-word space...@>=
if space_skip=zero_glue then
	begin @<Find the glue specification, |p|, for 
		text spaces in the current font@>;
else q←new_param_glue(space_skip_code);
link(tail)←q; tail←q;
goto big_switch

@ Having |font_glue| allocated for each text font saves both time and memory.
If any of the three spacing parameters are subsequently changed by the
use of \.{\\texinfo}, the |scan_tex_info| procedure deallocates the
|font_glue| specification allocated here.

@<Find the glue specification...@>=
begin p←font_glue[cur_font];
if p=null then
	begin f←cur_font; p←new_spec(zero_glue); k←param_base[f]+space_code;

@ @<Declare act...@>=
procedure app_space; {handle spaces when |space_factor≠1000|}
var p:pointer; {glue specification}
@!q:pointer; {glue node}
@!f:internal_font_number; {the current font}
@!k:0..font_mem_size; {index into |font_info|}
begin if (space_factor≥2000)∧(xspace_skip≠zero_glue) then
else	begin if space_skip≠zero_glue then p←space_skip
	else @<Find the glue specification...@>;
	@<Modify the glue specification in |p| according to the space factor@>;
	q←new_glue(p); glue_ref_count(p)←null;
link(tail)←q; tail←q;

@ @<Modify the glue specification in |p| according to the space factor@>=
if space_factor≥2000 then width(p)←width(p)+extra_space(cur_font);

@ Whew---that covers the main loop. We can now proceed at a leisurely
pace through the other combinations of possibilities.

@d any_mode(#)==vmode+#,hmode+#,mmode+# {for mode-independent commands}

@<Cases of |main_control| that are not part of the inner loop@>=
any_mode(ignore_space): begin @<Get the next non-blank non-call...@>;
	goto reswitch;
vmode+input: start_input;
vmode+stop: return; {this is the only way out}
@t\4@>@<Forbidden cases detected in |main_control|@>@+@,any_mode(mac_param):
@<Math-only cases in non-math modes, or vice versa@>: begin@t@>@;@/
	insert_dollar_sign; goto reswitch;
@t\4@>@<Cases of |main_control| that build boxes and lists@>@;
@t\4@>@<Cases of |main_control| that don't depend on |mode|@>@;
@t\4@>@<Cases of |main_control| that are for extensions to \TeX@>@;

@ Either \.{\\dump} or \.{\\end} will cause |main_control| to exit,
since both of them have `|stop|' as their command code.

@<Put each...@>=
@!@:end_}{\.{\\end} primitive@>
@!@:dump_}{\.{\\dump} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
stop:if chr_code=1 then print_esc("dump")@+else print_esc("end");

@ Here is a list of cases where the user has probably gotten into or out of math
mode by mistake. \TeX\ will insert a dollar sign and rescan the current token.

@d non_math(#)==vmode+#,hmode+#

@<Math-only cases in non-math modes...@>=
non_math(sup_mark), non_math(sub_mark), non_math(math_char_num),
non_math(math_comp), non_math(delim_num), non_math(left_right),
non_math(above), non_math(radical), non_math(math_style),
non_math(vcenter), non_math(non_script), non_math(mkern),
non_math(limit_switch), non_math(mskip), non_math(math_accent),
mmode+endv, mmode+par_end, mmode+stop, mmode+vskip

@ @<Declare action...@>=
procedure insert_dollar_sign;
begin print_nl("! Missing $ inserted");
@.Missing {\$} inserted@>
help2("I've inserted a begin-math/end-math symbol since I think")@/
("you left one out. Proceed, with fingers crossed."); back_error;
cur_cmd←math_delim; cur_chr←"$"; cur_tok←math_delim_token+"$";

@ When erroneous situations arise, \TeX\ usually issues an error message
specific to the particular error. For example, `\.{\\noalign}' should
not appear in any mode, since it is recognized by the |align_peek| routine
in all of its legitimate appearances; a special error message is given
when `\.{\\noalign}' occurs elsewhere. But sometimes the most appropriate
error message is simply that the user is not allowed to do what he or she
has attempted. For example, `\.{\\input}' is allowed only in vertical mode.
Such cases are enumerated here and in the other sections referred to
under `See also $\ldotss$.'

@<Forbidden cases...@>=

@ The `|you_cant|' procedure prints a line saying that the current command
is illegal in the current mode; it identifies these things symbolically.

@<Declare action...@>=
procedure you_cant;
begin print_nl("! You can't use `");
@.You can't use x in y mode@>
print("' in "); print_mode(mode);

@ @<Declare act...@>=
procedure report_illegal_case;
begin you_cant;
help4("Sorry, but I'm not programmed to handle this case;")@/
("I'll just pretend that you didn't ask for it.")@/
("If you're in the wrong mode, you might be able to")@/
("return to the right one by typing `I}' or `I$' or `I\par'.");@/

@ Some operations are allowed only in privileged modes, i.e., in cases
that |mode>0|. The |privileged| function is used to detect violations
of this rule; it issues an error message and returns |false| if the
current |mode| is negative.

@<Declare act...@>=
function privileged:boolean;
begin if mode>0 then privileged←true
else	begin report_illegal_case; privileged←false;
@* \[46] Building boxes and lists.
The most important parts of |main_control| are concerned with \TeX's
chief mission of box-making. We need to control the activities that put
entries on vlists and hlists, as well as the activities that convert
those lists into boxes. All of the necessary machinery has already been
developed; it remains for us to ``push the buttons'' at the right times.

@ As an introduction to these routines, let's consider one of the simplest
cases: What happens when `\.{\\hrule}' occurs in vertical mode, or
`\.{\\vrule}' in horizontal mode or math mode? The code in |main_control|
is short, since the |scan_rule_spec| routine already does most of what is
required; thus, there is no need for a special action procedure.

Note that baselineskip calculations are disabled after a rule in vertical
mode, by setting |prev_depth←ignore_depth|.

@<Cases of |main_control| that build...@>=
vmode+hrule,hmode+vrule,mmode+vrule: begin tail_append(scan_rule_spec);
	if abs(mode)=vmode then prev_depth←ignore_depth;

@ We disallow \.{\\vrule} in vertical mode, \.{\\hrule} in horizontal
or math mode.


@ The processing of things like \.{\\hskip} and \.{\\vskip} is slightly
more complicated. But the code in |main_control| is very short, since
it simply calls on the action routine |append_glue|. Similarly, \.{\\kern}
activates |append_kern|.

@<Cases of |main_control| that build...@>=
vmode+vskip,hmode+hskip,mmode+hskip,mmode+mskip: append_glue;
any_mode(kern),mmode+mkern: append_kern;

@ The |hskip| and |vskip| command codes are used for control sequences
like \.{\\hss} and \.{\\vfil} as well as for \.{\\hskip} and \.{\\vskip}.
The difference is in the value of |cur_chr|.

@d fil_code=0 {identifies \.{\\hfil} and \.{\\vfil}}
@d fill_code=1 {identifies \.{\\hfill} and \.{\\vfill}}
@d ss_code=2 {identifies \.{\\hss} and \.{\\vss}}
@d fil_neg_code=3 {identifies \.{\\hfilneg} and \.{\\vfilneg}}
@d skip_code=4 {identifies \.{\\hskip} and \.{\\vskip}}
@d mskip_code=5 {identifies \.{\\mskip}}

@<Put each...@>=
@!@:hskip_}{\.{\\hskip} primitive@>
@!@:hfil_}{\.{\\hfil} primitive@>
@!@:hfill_}{\.{\\hfill} primitive@>
@!@:hss_}{\.{\\hss} primitive@>
@!@:hfil_neg_}{\.{\\hfilneg} primitive@>
@!@:vskip_}{\.{\\vskip} primitive@>
@!@:vfil_}{\.{\\vfil} primitive@>
@!@:vfill_}{\.{\\vfill} primitive@>
@!@:vss_}{\.{\\vss} primitive@>
@!@:vfil_neg_}{\.{\\vfilneg} primitive@>
@!@:mskip_}{\.{\\mskip} primitive@>
@!@:kern_}{\.{\\kern} primitive@>
@!@:mkern_}{\.{\\mkern} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
hskip: case chr_code of
	othercases print_esc("hfilneg")
vskip: case chr_code of
	othercases print_esc("vfilneg")
mskip: print_esc("mskip");
kern: print_esc("kern");
mkern: print_esc("mkern");

@ All the work relating to glue creation has been relegated to the
following subroutine. It does not call |build_page|, because it is
used in at least one place where that would be a mistake.

@<Declare action...@>=
procedure append_glue;
var s:small_number; {modifier of skip command}
begin s←cur_chr;
case s of
fil_code: cur_val←fil_glue;
fill_code: cur_val←fill_glue;
ss_code: cur_val←ss_glue;
fil_neg_code: cur_val←fil_neg_glue;
skip_code: scan_glue(false);
mskip_code: scan_glue(true);
end; {now |cur_val| points to the glue specification}
if s≥skip_code then
	begin decr(glue_ref_count(cur_val));
	if s>skip_code then subtype(tail)←mu_glue;

@ @<Declare act...@>=
procedure append_kern;
var s:quarterword; {|subtype| of the kern node}
begin s←cur_chr; scan_dimen(s=mu_glue,false,false);
tail_append(new_kern(cur_val)); subtype(tail)←s;

@ Many of the actions related to box-making are triggered by the appearance
of a right brace in the input. For example, when the user says `\.{\\hbox}
\.{to} \.{100pt\{$\langle\,\hbox{hlist}\,\rangle$\}}' in vertical mode,
the information about the box size (100pt, exactly) is put onto |save_stack|
with a level boundary word just above it, and |cur_group←hbox_group|;
\TeX\ enters restricted horizontal mode to process the hlist. The right
brace eventually causes |save_stack| to be restored to its former state,
at which time the information about the box size (100pt, exactly) is
available once again; a box is packaged and we leave restricted horizontal
mode, appending the new box to the current list of the enclosing mode
(in this case to the current list of vertical mode).

The next few sections of the program are therefore concerned with the
treatment of left and right curly braces.

@ If a left brace occurs in the middle of a page or paragraph, it simply
introduces a new level of grouping, and the matching right brace will not have
such a drastic effect. Such grouping affects neither the mode nor the
current list.

@<Cases of |main_control| that build...@>=
non_math(left_brace): new_save_level(simple_group);
any_mode(group_begin): new_save_level(semi_simple_group);
any_mode(group_end): if cur_group=semi_simple_group then unsave
	else off_save;

@ We have to deal with errors in which braces and such things are not
properly nested. Sometimes the user makes an error of commission by
inserting an extra symbol, but sometimes the user makes an error of omission.
\TeX\ can't always tell one from the other, so it makes a guess and tries
to avoid getting into a loop.

The |off_save| routine is called when the current group code is wrong. It tries
to insert something into the user's input that will help clean off
the top level.

@<Declare act...@>=
procedure off_save;
var p:pointer; {inserted token}
begin if cur_group=bottom_level then
	@<Drop current token and complain that it was unmatched@>
else	begin back_input; p←get_avail; link(temp_head)←p;
	print_nl("! Missing ");
	@<Prepare to insert a token that matches |cur_group|,
		and print what it is@>;
	print(" inserted"); ins_list(link(temp_head));
	help5("I've inserted something that you may have forgotten.")@/
	("(See the things marked <to be read again> above.)")@/
	("With luck, this will get me unwedged. But if you really")@/
	("didn't forget anything, try typing `2' now; then")@/
	("my insertion and my current dilemma will both disappear.");

@ A slight problem occurs here: What if we want to insert, say, `\.{\\groupend}'
in order to get unstuck, but the user has redefined this control sequence?
Two special unredefinable control sequences are included in |eqtb| so that
this difficulty is avoided.

@d special_group_end==special_control_sequence {permanent `\.{\\groupend}'}
@d special_right==special_control_sequence+1 {permanent `\.{\\right}'}

@<Initialize table entries...@>=
text(special_group_end)←"groupend"; eq_level(special_group_end)←level_one;
eq_type(special_group_end)←group_end; equiv(special_group_end)←0;@/
text(special_right)←"right"; eq_level(special_right)←level_one;
eq_type(special_right)←left_right; equiv(special_right)←right_noad;
@:group_end_}{\.{\\groupend} primitive@>
@:right_}{\.{\\right} primitive@>

@ At this point, |link(temp_head)=p|, a pointer to an empty one-word node.

@<Prepare to insert a token that matches |cur_group|...@>=
case cur_group of
semi_simple_group: begin info(p)←cs_token_flag+special_group_end;
@.Missing \\groupend inserted@>
math_delim_group: begin info(p)←math_delim_token+"$"; print_char("$");
@.Missing {\$} inserted@>
math_left_group: begin info(p)←cs_token_flag+special_right; link(p)←get_avail;
	p←link(p); info(p)←other_token+"."; print_esc("right.");
@.Missing \\right. inserted@>
othercases begin info(p)←right_brace_token+"}"; print_char("}");
@.Missing {\}} inserted@>

@ This particular case may be impossible, but can anybody prove that it
will never arise?

@<Drop current token and complain that it was unmatched@>=
begin print_nl("! Extra "); print_cmd_chr(cur_cmd,cur_chr);
@.Extra x@>
help1("Things are pretty mixed up, but I think the worst is over.");@/

@ The routine for a |right_brace| character branches into many subcases,
since a variety of things may happen, depending on |cur_group|. Some
types of groups are not supposed to be ended by a right brace; error
messages are given in hopes of pinpointing the problem. Most branches
of this routine will be filled in later, when we are ready to understand
them; meanwhile, we must prepare ourselves to deal with such errors.

@<Cases of |main_control| that build...@>=
any_mode(right_brace): handle_right_brace;

@ @<Declare the procedure called |handle_right_brace|@>=
procedure handle_right_brace;
label done,exit;
var p:pointer; {for short-term use}
begin case cur_group of
simple_group: unsave;
bottom_level: begin print_nl("! Too many }'s");
@.Too many \}'s@>
	help2("You've closed more groups than you opened.")@/
	("Such booboos are generally harmless, so keep going."); error;
semi_simple_group,math_delim_group,math_left_group: extra_right_brace;
@t\4@>@<Cases of |handle_right_brace| where a |right_brace| triggers
	a delayed action@>@;
othercases confusion("rightbrace")
@:confusion rightbrace}{\quad rightbrace@>

@ @<Declare act...@>=
procedure extra_right_brace;
begin print_nl("! Extra }, or forgotten ");
@.Extra \}, or forgotten x@>
case cur_group of
semi_simple_group: print_esc("groupend");
math_delim_group: print_char("$");
math_left_group: print_esc("right");
help5("I've deleted a group-closing symbol because it seems to be")@/
("spurious, as in `$x}$'. But perhaps the } is legitimate and")@/
("you forgot something else, as in `\hbox{$x}'. In such cases")@/
("the way to recover is to insert both the forgotten and the")@/
("deleted material, e.g., by typing `I$}'."); error;

@ Now let's turn to the question of how \.{\\hbox} is treated. We actually
need to consider also a slightly larger context, since constructions like
`\.{\\setbox3=}\penalty0\.{\\hbox...}' and `\.{\\lower3.8pt\\hbox...}' and
`\.{\\leaders\\hbox...}' are supposed to invoke quite different actions
after the box has been packaged. Conversely, constructions like
`\.{\\setbox3=}' can be followed by a variety of different kinds of
boxes, and we would like to encode such things in an efficient way.

In other words, there are two problems: To represent the context of a box,
and to represent its type.

The first problem is solved by putting a ``context code'' on the |save_stack|,
just below the two entries that give the dimensions produced by |scan_spec|.
The context code is either a (signed) shift amount, or it is a large
integer |≥box_flag|, where |box_flag=@t$2^{30}$@>|. Codes |box_flag| through
|box_flag+255| represent `\.{\\setbox0}' through `\.{\\setbox255}';
codes |box_flag+256| through |box_flag+511| represent `\.{\\global\\setbox0}'
through `\.{\\global\\setbox255}';
code |box_flag+512| represents `\.{\\shipout}'; and codes |box_flag+513|
through |box_flag+515| represent `\.{\\leaders}', `\.{\\cleaders}',
and `\.{\\xleaders}'.

The second problem is solved by giving the command code |make_box| to all
control sequences that produce a box, and by using the following |chr_code|
values to distinguish between them: |box_code|, |copy_code|, |last_box_code|,
|vsplit_code|, |vtop_code|, |vtop_code+vmode|, and |vtop_code+hmode|,
where the latter two are used denote \.{\\vbox} and \.{\\hbox}, respectively.

@d box_flag==@'10000000000 {context code for `\.{\\setbox0}'}
@d ship_out_flag==box_flag+512 {context code for `\.{\\shipout}'}
@d leader_flag==box_flag+513 {context code for `\.{\\leaders}'}
@d box_code=0 {|chr_code| for `\.{\\box}'}
@d copy_code=1 {|chr_code| for `\.{\\copy}'}
@d last_box_code=2 {|chr_code| for `\.{\\lastbox}'}
@d vsplit_code=3 {|chr_code| for `\.{\\vsplit}'}
@d vtop_code=4 {|chr_code| for `\.{\\vtop}'}

@<Put each...@>=
@!@:move_left_}{\.{\\moveleft} primitive@>
@!@:move_right_}{\.{\\moveright} primitive@>
@!@:raise_}{\.{\\raise} primitive@>
@!@:lower_}{\.{\\lower} primitive@>
@!@:box_}{\.{\\box} primitive@>
@!@:copy_}{\.{\\copy} primitive@>
@!@:last_box_}{\.{\\lastbox} primitive@>
@!@:vsplit_}{\.{\\vsplit} primitive@>
@!@:vtop_}{\.{\\vtop} primitive@>
@!@:vbox_}{\.{\\vbox} primitive@>
@!@:hbox_}{\.{\\hbox} primitive@>
primitive("shipout",leader_ship,a_leaders-1); {|ship_out_flag=leader_flag-1|}
@!@:ship_out_}{\.{\\shipout} primitive@>
@!@:leaders_}{\.{\\leaders} primitive@>
@!@:c_leaders_}{\.{\\cleaders} primitive@>
@!@:x_leaders_}{\.{\\xleaders} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
hmove: if chr_code=1 then print_esc("moveleft")@+else print_esc("moveright");
vmove: if chr_code=1 then print_esc("raise")@+else print_esc("lower");
make_box: case chr_code of
	box_code: print_esc("box");
	copy_code: print_esc("copy");
	last_box_code: print_esc("lastbox");
	vsplit_code: print_esc("vsplit");
	vtop_code: print_esc("vtop");
	vtop_code+vmode: print_esc("vbox");
	othercases print_esc("hbox");
leader_ship: if chr_code=a_leaders then print_esc("leaders")
	else if chr_code=c_leaders then print_esc("cleaders")
	else if chr_code=x_leaders then print_esc("xleaders")
	else print_esc("shipout");

@ Constructions that require a box are started by placing a context code
on |save_stack| and calling |scan_box|. The |scan_box| routine verifies
that a |make_box| command comes next and then it calls |begin_box|.

@<Cases of |main_control| that build...@>=
vmode+hmove,hmode+vmove,mmode+vmove: begin t←cur_chr;
	if t=0 then saved(0)←cur_val@+else saved(0)←-cur_val;
any_mode(leader_ship): begin saved(0)←leader_flag-a_leaders+cur_chr; scan_box;
any_mode(make_box): begin saved(0)←0; begin_box;

@ You can't say `\.{\\lower}' in vertical mode, nor can you `\.{\\moveright}'
in horizontal or math mode.


@ The global variable |cur_box| will point to a newly-made box. If the box
is absent, we will have |cur_box=null|. Otherwise we will have
|type(cur_box)=hlist_node| or |vlist_node| or |rule_node|; the |rule_node|
case can occur only with leaders.

@!cur_box:pointer; {box to be placed into its context}

@ The |box_end| procedure does the right thing with |cur_box|, if
|saved(0)| represents the context as explained above.

@<Declare act...@>=
procedure box_end;
var p:pointer; {|inner_noad| for new box in math mode}
begin if saved(0)<box_flag then @<Append box |cur_box| to the current list,
		shifted by |saved(0)|@>
else if saved(0)<ship_out_flag then @<Store \(c)|cur_box| in a box register@>
else if cur_box≠null then
	if saved(0)>ship_out_flag then @<Append a new leader node that
			uses |cur_box|@>
	else ship_out(cur_box);

@ @<Append box |cur_box| to the current...@>=
begin if cur_box≠null then
	begin shift_amount(cur_box)←saved(0);
	if abs(mode)=vmode then
		begin append_to_vlist(cur_box);
		if mode>0 then build_page;
	else	begin if abs(mode)=mmode then
			begin p←new_noad; math_type(operand(p))←sub_box;
			info(operand(p))←cur_box; cur_box←p;
		link(tail)←cur_box; tail←cur_box;

@ @<Store \(c)|cur_box| in a box register@>=
if saved(0)<box_flag+256 then
else geq_define(box_base-box_flag-256+saved(0),box_ref,cur_box)

@ @<Append a new leader node ...@>=
begin @<Get the next non-blank non-call...@>;
if ((cur_cmd=hskip)∧(abs(mode)≠vmode))∨@|
	 ((cur_cmd=vskip)∧(abs(mode)=vmode)) then
	begin append_glue; subtype(tail)←saved(0)-(leader_flag-a_leaders);
else	begin print_nl("! \leaders not followed by proper glue");
@:leaders_}{\.{\\leaders not followed by...}@>
	help3("You should say `\leaders <box or rule><hskip or vskip>'.")@/
	("I found the <box or rule>, but there's no suitable")@/
	("<hskip or vskip>, so I'm ignoring these leaders."); error;

@ Now that we can see what eventually happens to boxes, we can consider
the first steps in their creation. The |begin_box| routine is called when
|saved(0)| is a context specification and |cur_chr| specifies the type of
box desired.

@<Declare act...@>=
procedure begin_box;
label exit;
var p:pointer; {runs through the current list}
@!k:halfword; {0 or |vmode| or |hmode|}
@!n:eight_bits; {a box number}
begin case cur_chr of
box_code: begin scan_eight_bit_int; cur_box←box(cur_val);
	box(cur_val)←null; {the box becomes absent, at the same level}
copy_code: begin scan_eight_bit_int; cur_box←copy_node_list(box(cur_val));
last_box_code: @<If the current list ends with a box node, delete it from
	the list and make |cur_box| point to it; otherwise set |cur_box←null|@>;
vsplit_code: @<Split off part of a vertical box, make |cur_box| point to it@>;
othercases @<Initiate the construction of an hbox or vbox, then |return|@>
box_end; {in simple cases, we use the box immediately}

@ Note that the condition |¬ is_char_node(tail)| implies that |head≠tail|,
since |head| is a one-word node.

@<If the current list ends with a box node, delete it...@>=
begin cur_box←null;
if (mode=vmode)∧(head=tail) then
	@<Apologize for inability to do \.{\\lastbox} now@>
else	begin if not is_char_node(tail) then
		if (type(tail)=hlist_node)∨(type(tail)=vlist_node) then
			begin p←head; cur_box←tail;
			while link(p)≠tail do p←link(p);
			tail←p; link(p)←null;

@ @<Apologize for inability to do \.{\\lastbox} now@>=
begin you_cant;
help2("Sorry...I'm usually unable to take things from the current")@/
	("page. This \lastbox will therefore be absent."); error;

@ Here we deal with things like `\.{\\vsplit 13 to 100pt}'.

@<Split off part of a vertical box, make |cur_box| point to it@>=
begin scan_eight_bit_int; n←cur_val;
if not scan_keyword("to") then
	begin print_nl("! Missing `to' inserted");
@.Missing `to' inserted@>
	help2("I'm working on `\vsplit<box number> to <dimen>';")@/
	("will look for the <dimen> next."); error;

@ Here is where we enter restricted horizontal mode or internal vertical
mode, in order to make a box.

@<Initiate the construction of an hbox or vbox, then |return|@>=
begin k←cur_chr-vtop_code;
incr(save_ptr); scan_spec;
if k=hmode then new_save_level(hbox_group)
else if k=vmode then new_save_level(vbox_group)
else	begin new_save_level(vtop_group); k←vmode;
push_nest; mode←-k;
if k=vmode then prev_depth←ignore_depth else space_factor←1000;

@ @<Declare act...@>=
procedure scan_box; {the next input should specify a box or perhaps a rule}
begin @<Get the next non-blank non-call...@>;
if cur_cmd=make_box then begin_box
else if (saved(0)≥leader_flag)∧((cur_cmd=hrule)∨(cur_cmd=vrule)) then
	begin cur_box←scan_rule_spec; box_end;
else	begin@t@>@;@/
	print_nl("! A box specification was supposed to be here");@/
@.A box specification was...@>
	help3("I was expecting to see \hbox or \vbox or \copy or \box or")@/
	("\vsplit or \vtop. So you might find something missing in")@/
	("your output. But keep trying; you can fix this later."); back_error;

@ When the right brace occurs at the end of an \.{\\hbox} or \.{\\vbox} or
\.{\\vtop} construction, the |package| routine comes into action. We might
also have to finish a paragraph that hasn't ended.

@<Cases of |handle...@>=
hbox_group: package(0);
vbox_group: begin end_graf; package(0);
vtop_group: begin end_graf; package(vtop_code);

@ @<Declare action...@>=
procedure package(@!c:small_number);
var h:scaled; {height of box}
@!f:internal_font_number; {font in a |char_node|}
@!p:pointer; {first node in a box}
begin unsave; save_ptr←save_ptr-3;
if mode=-hmode then cur_box←hpack(link(head),saved(2),saved(1))
else	begin cur_box←vpack(link(head),saved(2),saved(1));
	if c=vtop_code then @<Readjust the height and depth of |cur_box|,
		for \.{\\vtop}@>;
pop_nest; box_end;

@ The height of a `\.{\\vtop}' box depends on the first item on its list.

@<Readjust the height...@>=
begin h←0; p←list_ptr(cur_box);
if p≠null then
	if (type(p)=hlist_node)∨(type(p)=vlist_node)∨@|(type(p)=rule_node) then
depth(cur_box)←depth(cur_box)-h+height(cur_box); height(cur_box)←h;

@ A paragraph begins when horizontal-mode material occurs in vertical mode,
or when the paragraph is explicitly started by `\.{\\indent}' or

@<Put each...@>=
@!@:indent_}{\.{\\indent} primitive@>
@!@:no_indent_}{\.{\\noindent} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
start_par: if chr_code=0 then print_esc("noindent")@+ else print_esc("indent");

@ @<Cases of |main_control| that build...@>=
vmode+start_par: new_graf(cur_chr>0);
	back_input; new_graf(true);

@ @<Forbidden...@>= mmode+start_par,

@ @<Declare act...@>=
procedure new_graf(@!indented:boolean);
begin already←0;
if (mode=vmode)∨(head≠tail) then
push_nest; mode←hmode; space_factor←1000;
if indented then
	begin tail←new_null_box; link(head)←tail; width(tail)←par_indent;
if every_par≠null then begin_token_list(every_par,every_par_text);
if nest_ptr=1 then build_page; {put |par_skip| glue on current page}

@ @<Cases of |main_control| that build...@>=
hmode+start_par: indent_in_hmode;

@ @<Declare act...@>=
procedure indent_in_hmode;
begin if cur_chr>0 then {\.{\\indent}}
	begin tail_append(new_null_box);

@ A paragraph ends when a |par_end| command is sensed, or when we are in
horizontal mode when reaching the right brace of vertical-mode routines
like \.{\\vbox}, \.{\\insert}, or \.{\\output}.

@<Cases of |main_control| that build...@>=
hmode+par_end: begin end_graf; {this takes us to the enclosing mode}
	if mode=vmode then build_page;
hmode+stop,hmode+vskip: begin back_input;
	cur_tok←par_token; cs_ptr←par_loc;
	cur_cmd←eq_type(cs_ptr); cur_chr←equiv(cs_ptr); {insert `\.{\\par}'}
	goto reswitch;
@ Here is where we clear the parameters that are supposed to revert to their
default values after every paragraph.

@<Declare act...@>=
procedure end_graf;
begin if mode=hmode then
	begin if head=tail then pop_nest {null paragraphs are ignored}
	else line_break(widow_penalty);
	if looseness≠0 then eq_word_define(int_base+looseness_code,0);
	if hanging_indent≠0 then eq_word_define(dimen_base+hanging_indent_code,0);
	if par_shape_ptr≠null then eq_define(par_shape_loc,shape_ref,null);

@ Insertion and adjustment and mark nodes are constructed by the following
pieces of the program.

@<Cases of |main_control| that build...@>=
any_mode(insert),hmode+vadjust: begin_insert_or_adjust;
any_mode(mark): make_mark;

@ @<Forbidden...@>=

@ @<Declare act...@>=
procedure begin_insert_or_adjust;
begin if mode<-vmode then report_illegal_case
else	begin if cur_cmd=vadjust then cur_val←255
	else	begin scan_eight_bit_int;
		if cur_val=255 then
			begin print_nl("! You can't \insert 255");
@.You can't \\insert 255@>
			help1("I'm changing to \insert0; box 255 is special.");
			error; cur_val←0;
	saved(0)←cur_val; incr(save_ptr);
	new_save_level(insert_group); scan_left_brace;
	push_nest; mode←-vmode; prev_depth←ignore_depth;

@ @<Cases of |handle...@>=
insert_group: begin end_graf; unsave; decr(save_ptr);
	{now |saved(0)| is the insertion number, or 255 for |vadjust|}
	if head≠tail then
		begin p←vpack(link(head),natural); pop_nest;
		if saved(0)<255 then
			begin tail_append(get_node(ins_node_size));
			type(tail)←ins_node; subtype(tail)←qi(saved(0));
			width(tail)←height(p)+depth(p); ins_ptr(tail)←list_ptr(p);
		else	begin tail_append(get_node(small_node_size));
			subtype(tail)←0; {the |subtype| is not used}
	@<Scan an optional space@>;
output_group: @<Resume the page builder...@>;

@ @<Declare act...@>=
procedure make_mark;
var p:pointer; {new node}
begin if mode<-vmode then report_illegal_case
else	begin p←scan_toks(false,true); p←get_node(small_node_size);
	type(p)←mark_node; subtype(p)←0; {the |subtype| is not used}
	mark_ptr(p)←cur_val; link(tail)←p; tail←p;

@ The |space_factor| or |prev_depth| settings are changed when a |set_aux|
command is sensed.

@<Cases of |main_control| that build...@>=

@ @<Declare act...@>=
procedure change_aux;
begin if cur_chr≠abs(mode) then report_illegal_case
else if cur_chr=vmode then
	begin scan_dimen(false,false,false); prev_depth←cur_val;
else	begin scan_int;
	if (cur_val<0)∨(cur_val>65536) then
		begin print_nl("! Bad space factor");
@.Bad space factor@>
		help1("I allow only values in the range 1..65536 here.");
	else space_factor←cur_val;

@ Penalty nodes get into a list via the |break_penalty| command.

@<Cases of |main_control| that build...@>=
any_mode(break_penalty): append_penalty;

@ @<Declare action...@>=
procedure append_penalty;
begin scan_int; tail_append(new_penalty(cur_val));
if mode=vmode then build_page;

@ The |unskip| command removes glue if it appears at the tail of the
current list, using a brute-force linear scan. Like \.{\\lastbox},
this command is not allowed in vertical mode (except internal vertical
mode), since the current list in vertical mode is sent to the page builder.
But if we happen to be able to implement it in vertical mode, we do.

@<Cases of |main_control| that build...@>=
any_mode(unskip): delete_skip;

@ @<Declare action...@>=
procedure delete_skip;
var p:pointer; {runs through the current list}
begin if (mode=vmode)∧(tail=head) then
	@<Apologize for inability to do \.{\\unskip} now@>
else	begin if not is_char_node(tail) then if type(tail)=glue_node then
		begin p←head;
		while link(p)≠tail do p←link(p);
		link(p)←null; flush_node_list(tail); tail←p;

@ @<Apologize for inability to do \.{\\unskip} now@>=
begin you_cant;
help2("Sorry...I'm usually unable to take things from the current")@/
	("page. Try `I\vskip\minus\lastskip' instead."); error;

@ The |unbox| command unwraps one of the 256 current boxes.

@<Cases of |main_control| that build...@>=
vmode+unbox,hmode+unbox: unpackage;

@ @<Declare act...@>=
procedure unpackage;
var p:pointer; {the box}
begin scan_eight_bit_int; p←box(cur_val);
if p≠null then
	if ((abs(mode)=vmode)∧(type(p)=vlist_node))∨@|
		 ((abs(mode)=hmode)∧(type(p)=hlist_node)) then
		begin link(tail)←list_ptr(p); box(cur_val)←null;
		while link(tail)≠null do tail←link(tail);
	else	begin print_nl("! Incompatible list can't be unboxed");
@.Incompatible list...@>
		help2("Sorry, Pandora. (You sneaky devil.)")@/
		("I refuse to unbox an \hbox in vertical mode or vice versa.");@/

@ @<Forbidden...@>=mmode+unbox,vmode+ital_corr,

@ Italic corrections are converted to kern nodes when the |ital_corr| command
follows a character. In math mode the same effect is achieved by appending
a kern of zero here, since italic corrections are supplied later.

@<Cases of |main_control| that build...@>=
hmode+ital_corr: append_italic_correction;
mmode+ital_corr: tail_append(new_kern(0));

@ @<Declare act...@>=
procedure append_italic_correction;
var f:internal_font_number; {the font in the |char_node|}
begin if is_char_node(tail)∧(tail≠head) then
	begin f←font(tail);

@ Discretionary nodes are easy in the common case `\.{\\-}', but in the
general case we must process three braces full of items.

@<Put each...@>=
@!@:Single-character primitives -}{\quad\.{\\-}@>
@!@:discretionary_}{\.{\\discretionary} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
discretionary: if chr_code=1 then
	print_esc("-")@+else print_esc("discretionary");

@ @<Cases of |main_control| that build...@>=
hmode+discretionary,mmode+discretionary: append_discretionary;

@ @<Declare act...@>=
procedure append_discretionary;
begin tail_append(new_disc);
if cur_chr=1 then pre_break(tail)←new_character(cur_font,"-")
else	begin incr(save_ptr); saved(-1)←0; scan_left_brace;
	new_save_level(disc_group); push_nest; mode←-hmode;

@ The three discretionary lists are constructed somewhat as if they were
hboxes. A subroutine called |build_discretionary| handles the transitions.
(This is sort of fun.)

@<Cases of |handle...@>=
disc_group: build_discretionary;

@ @<Declare act...@>=
procedure build_discretionary;
label done,exit;
var p,@!q:pointer; {for link manipulation}
@!n:integer; {length of discretionary list}
begin unsave;
@<Prune the current list, if necessary, until it contains only
	|char_node|, |kern_node|, and |ligature_node| items; set |n| to
	the length of the list, and set |q| to the list's tail@>;
p←link(head); pop_nest;
case saved(-1) of
2:@<Attach list |p| to the current list, and record its length;
	then finish up and |return|@>;
end; {there are no other cases}
incr(saved(-1)); scan_left_brace; new_save_level(disc_group);
push_nest; mode←-hmode;

@ @<Attach list |p| to the current...@>=
begin link(tail)←p;
if n<128 then replace_count(tail)←n
else	begin print_nl("! Discretionary list is too long");
@.Discretionary list is too long@>
	help2("Wow---I never thought anybody would tweak me here.")@/
	("You can't seriously need such a huge discretionary list?");
tail←q;	decr(save_ptr); return;

@ During this loop, |p=link(q)| and there are |n| items preceding |p|.

@<Prune the current list, if necessary...@>=
q←head; p←link(q); n←0;
while p≠null do
	begin if not is_char_node(p) then
		if (type(p)≠kern_node)∧(type(p)≠ligature_node) then
			begin print_nl("! Improper discretionary list");
@.Improper discretionary list@>
			begin_diagnostic; show_box(p); end_diagnostic;
			help1("Discretionary lists must contain only text characters.");@/
			error; flush_node_list(p); link(q)←null; goto done;
	q←p; p←link(q); incr(n);

@ The user is allowed to say `\.{\\the}' or `\.{\\minus}' or `\.{\\number}',
in horizontal mode or math mode, to get the text equivalent of
numerical quantities.

@<Cases of |main_control| that build...@>=
hmode+the,mmode+the: begin link(garbage)←the_toks(cur_chr>0);
hmode+number,mmode+number: begin link(garbage)←num_toks;

@ @<Declare act...@>=
procedure ins_the;
begin print_nl("! Missing \the inserted");
help3("Please don't say \count or \dimen or \skip in the midst of")@/
	("a paragraph or formula. I'm going to assume that you")@/
	("meant to say `\the\count' or `\the\dimen' or `\the\skip'.");
back_error; link(garbage)←the_toks(false); ins_list(link(temp_head));

@ We need only one more thing to complete the horizontal mode routines, namely
the \.{\\accent} primitive.

@<Cases of |main_control| that build...@>=
hmode+accent: make_accent;

@ @<Forbidden...@>=mmode+accent,

@ The positioning of accents is straightforward but tedious. Given an accent
of width |a|, designed for characters of height |x| and slant |s|;
and given a character of width |w|, height |h|, and slant |t|: We will shift
the accent down by |x-h|, and we will insert kern nodes that have the effect of
centering the accent over the character and shifting the accent to the
right by $\delta={1\over2}(w-a)+h\cdot t-x\cdot s$.  If either character is
absent from the font, we will simply use the other, without shifting.

@<Declare act...@>=
procedure make_accent;
var s,@!t: real; {amount of slant}
@!p,@!q: pointer; {character and box nodes}
@!f:internal_font_number; {relevant font}
@!a,@!h,@!x,@!w,@!delta:scaled; {heights and widths, as explained above}
@!i:four_quarters; {character information}
begin scan_char_num; f←cur_font; p←new_character(f,cur_val);
if p≠null then
	begin x←x_height(f); s←slant(f)/65536.0;
	@<Create a character node |q| for the next character,
		but set |q←null| if problems arise@>;
	if q≠null then @<Append the accent with appropriate kerns,
			then set |p←q|@>;
	link(tail)←p; tail←p;

@ @<Create a character node |q| for the next...@>=
q←null; f←cur_font;
if (cur_cmd=letter)∨(cur_cmd=other_char) then q←new_character(f,cur_chr)
else if cur_cmd=char_num then
	begin scan_char_num; q←new_character(f,cur_val);
else back_input

@ @<Append the accent with appropriate kerns...@>=
begin t←slant(f)/65536.0;
w←char_width(f)(i); h←char_height(f)(height_depth(i));
if h≠x then {the accent must be shifted up or down}
	begin p←hpack(p,natural); shift_amount(p)←x-h;
link(tail)←new_kern(delta); link(link(tail))←p;
link(p)←new_kern(-a-delta); tail←link(p); p←q;

@ When `\.{\\cr}' or `\.{\\span}' or a tab mark comes through the scanner
into |main_control|, it might be that the user has foolishly inserted
one of them into something that has nothing to do with alignment. But it is
far more likely that a left brace or right brace has been omitted, since
|get_next| takes actions appropriate to alignment only when `\.{\\cr}'
or `\.{\\span}' or tab marks occur with |align_state=0|. The following
program attempts to make an appropriate recovery.

@<Cases of |main_control| that build...@>=
any_mode(car_ret), any_mode(tab_mark): begin align_error; goto reswitch;
any_mode(no_align): no_align_error;
any_mode(omit): omit_error;

@ @<Declare act...@>=
procedure align_error;
begin if abs(align_state)>2 then
	@<Express consternation over the fact that no alignment is in progress@>
else	begin back_input;
	if align_state<0 then
		begin print_nl("! Missing { inserted");
@.Missing {\{} inserted@>
		cur_cmd←left_brace; incr(align_state);
	else	begin print_nl("! Missing } inserted");
@.Missing {\}} inserted@>
		cur_cmd←right_brace; decr(align_state);
	help3("I've put in what seems to be necessary to fix")@/
		("the current column of the current alignment.")@/
		("Try to go on, since this might almost work."); error;

@ @<Express consternation...@>=
begin print_nl("! There's no \halign or \valign going on");
@.There's no \\halign...going on@>
help5("I can't figure out why you would want to use a tab mark or")@/
	("\cr or \span just now. Perhaps you've inserted a right")@/
	("brace that ended a previous alignment prematurely?")@/
	("If so, you're probably due for more error messages, and you")@/
	("might try typing `S' now just to see what is salvageable.");
error; cur_cmd←relax;

@ The help messages here contain a little white lie, since \.{\\noalign}
and \.{\\omit} are allowed also after `\.{\\noalign\{...\}}'.

@<Declare act...@>=
procedure no_align_error;
begin print_nl("! Misplaced \noalign");
@.Misplaced \\noalign@>
help2("I expect to see \noalign only after the \cr of")@/
	("an alignment. Proceed, and I'll ignore this case."); error;
procedure omit_error;
begin print_nl("! Misplaced \omit");
@.Misplaced \\omit@>
help2("I expect to see \omit only after tab marks or the \cr of")@/
	("an alignment. Proceed, and I'll ignore this case."); error;

@ @<Forbidden...@>=vmode+valign,hmode+halign,mmode+valign,

@ We've now covered most of the abuses of \.{\\halign} and \.{\\valign}.
Let's take a look at what happens when they are used correctly.

@<Cases of |main_control| that build...@>=
mmode+halign: if privileged then if cur_group=math_delim_group then init_align
	else off_save;
vmode+endv,hmode+endv: do_endv;

@ An |align_group| code is supposed to remain on the |save_stack|
during an entire alignment, until |fin_align| removes it.

@<Declare act...@>=
procedure do_endv;
begin if cur_group=align_group then
	begin end_graf; unsave; new_save_level(align_group);
	if fin_col then fin_row;
else off_save;

@ @<Cases of |handle_right_brace|...@>=
no_align_group: begin end_graf; unsave; align_peek;
@* \[47] Building math lists.
The routines that \TeX\ uses to create mlists are similar to those we have
just seen for the generation of hlists and vlists. But it is necessary to
make ``noads'' as well as nodes, so the reader should review the
discussion of math mode data structures before trying to make sense out of
the following program.

First let's consider how we get into math mode. This happens when we are
in horizontal mode and a `\.\$' (i.e., a |math_delim| character) is
scanned; however, we won't enter math mode unless certain members of the
critical font families have been defined. We must also check to see
whether this `\.\$' is immediately followed by another, in case display
math mode is called for.

@<Cases of |main_control| that build...@>=

@ @<Declare act...@>=
procedure init_math;
label reswitch,found,not_found,done,exit;
var w:scaled; {new or partial |pre_display_size|}
@!l:scaled; {new |display_width|}
@!s:scaled; {new |display_indent|}
@!p:pointer; {current node when calculating |pre_display_size|}
@!q:pointer; {glue specification when calculating |pre_display_size|}
@!f:internal_font_number; {font in current |char_node|}
@!n:integer; {scope of paragraph shape specification}
@!v:scaled; {|w| plus possible glue amount}
@!d:scaled; {increment to |v|}
begin @<Check that the necessary fonts for math symbols are present;
	if not, |return|@>;
if (cur_cmd=math_delim)∧(mode>0) then @<Go into display math mode@>
else	begin back_input; @<Go into ordinary math mode@>;

@ @<Go into ordinary math mode@>=
begin push_nest; mode←-mmode; incompleat_noad←null;
new_save_level(math_delim_group); eq_define(cur_fam_loc,data,1);

@ @<Check that the necessary fonts...@>=@t\2@>@/
if (font_params[fam_fnt(2+text_size)]<total_mathsy_params)∨@|
	 (font_params[fam_fnt(3+text_size)]<total_mathex_params) then
	begin print_nl("! Math mode disallowed: Insufficient symbol fonts");@/
@.Math mode disallowed...@>
	help4("Sorry, but I can't go into math mode unless \textfont 2 and")@/
		("\scriptfont 2 and \scriptscriptfont 2 have all the texinfo")@/
		("needed in math symbol fonts, and unless \textfont 3 has")@/
		("all the texinfo needed in a math extension font.");
	error; return;

@ We get into ordinary math mode from display math mode when `\.{\\eqno}' or
`\.{\\leqno}' appears. In such cases |cur_chr| will be 0 or@@1, respectively;
the value of |cur_chr| is placed onto |save_stack| for safe keeping.

@<Cases of |main_control| that build...@>=
mmode+eq_no: if privileged then start_eq_no;

@ @<Put each...@>=
@!@:eq_no_}{\.{\\eqno} primitive@>
@!@:leq_no_}{\.{\\leqno} primitive@>

@ @<Declare act...@>=
procedure start_eq_no;
begin if cur_group=math_delim_group then
	begin saved(0)←cur_chr; incr(save_ptr);
	@<Go into ordinary math mode@>;
else off_save;

@ @<Cases of |print_cmd_chr|...@>=
eq_no:if chr_code=1 then print_esc("leqno")@+else print_esc("eqno");

@ @<Forbidden...@>=vmode+eq_no,hmode+eq_no,

@ When we enter display math mode, we need to call |line_break| to
process the partial paragraph that has just been interrupted by the
display. Then we can set the proper values of |display_width| and
|display_indent| and |pre_display_size|.

@<Go into display math mode@>=
begin if head=tail then {`\.{\\noindent\$\$}' or `\.{\$\${ }\$\$}'}
	begin pop_nest; w←-max_dimen;
else	begin line_break(display_widow_penalty);@/
	@<Calculate the natural width, |w|, by which the characters of the
		final line extend to the right of the reference point,
		plus two ems; or set |w←max_dimen| if the non-blank information
		on that line is affected by stretching or shrinking@>;
{Now we are in vertical mode, working on the list that will contain the display}
@<Calculate the length, |l|, and the shift amount, |s|, of the display lines@>;
push_nest; mode←mmode; incompleat_noad←null; new_save_level(math_delim_group);
if mode=vmode then build_page;

@ @<Calculate the natural width, |w|, by which...@>=
v←shift_amount(just_box)+2*quad(cur_font); w←-max_dimen;
while p≠null do
	begin @<Let |d| be the natural width of node |p|;
		if the node is ``visible,'' |goto found|;
		if the node is glue that stretches or shrinks, set |v←max_dimen|@>;
	if v<max_dimen then v←v+d;
	goto not_found;
	found: if v<max_dimen then
		begin v←v+d; w←v;
	else	begin w←max_dimen; goto done;
	not_found: p←link(p);

@ @<Let |d| be the natural width of node |p|...@>=
reswitch: if is_char_node(p) then
	begin f←font(p); d←char_width(f)(char_info(f)(character(p)));
	goto found;
case type(p) of
hlist_node,vlist_node,rule_node: begin d←width(p); goto found;
ligature_node:@<Make node |p| look like a |char_node|...@>;
kern_node,math_node: d←width(p);
glue_node:@<Let |d| be the natural width of this glue; if stretching
	or shrinking, set |v←max_dimen|; |goto found| in the case of leaders@>;
whatsit_node: @<Let |d| be the width of the whatsit |p|@>;
othercases d←0

@ We need to be careful that |w|, |v|, and |d| do not depend on any |glue_set|
values, since such values are subject to system-dependent rounding.
System-dependent numbers are not allowed to infiltrate parameters like
|pre_display_size|, since \TeX82 is supposed to make the same decisions on all

@<Let |d| be the natural width of this glue...@>=
begin q←glue_ptr(p); d←width(q);
if ((glue_sign(just_box)=stretching)∧
	 (glue_order(just_box)=shrink_order(q))) then  
if subtype(p)≥a_leaders then goto found;

@ A displayed equation is considered to be three lines long, so we
calculate the length and offset of line number |already+2|.

@<Calculate the length, |l|, ...@>=
if par_shape_ptr=null then
	if (hanging_indent≠0)∧@|
		(already+1<-hang_after)) then
		begin l←hsize-abs(hanging_indent);
		if hanging_indent>0 then s←hanging_indent@+else s←0;
	else	begin l←hsize; s←0;
else	begin n←info(par_shape_ptr);
	if already+2≥n then p←par_shape_ptr+2*n
	else p←par_shape_ptr+2*(already+2);
	s←mem[p-1].sc; l←mem[p].sc;

@ Subformulas of math formulas cause a new level of math mode to be entered,
on the semantic nest as well as the save stack. These subformulas arise in
several ways: (1)@@A left brace by itself indicates the beginning of a
subformula that will be put into a box, thereby freezing its glue and
preventing line breaks. (2)@@A subscript or superscript is treated as a
subformula if it is not a single character; the same applies to
the operand of things like \.{\\underline}. (3)@@The \.{\\left} primitive
initiates a subformula that will be terminated by a matching \.{\\right}.
The group codes placed on |save_stack| in these three cases are
|math_group|, |math_group|, and |math_left_group|, respectively.

Here is the code that handles case (1); the other cases are not quite as
trivial, so we shall consider them later.

@<Cases of |main_control| that build...@>=
mmode+left_brace: begin tail_append(new_noad);
	back_input; scan_math(operand(tail));

@ Recall that the |operand|, |subscr|, and |supscr| fields in a noad are
broken down into subfields called |math_type| and either |info| or
|(fam,character)|. The job of |scan_math| is to figure out what to place
in one of these principal fields; it looks at the subformula that
comes next in the input, and places an encoding of that subformula
into a given word of |mem|.

@d var_code==@'70000 {math code meaning ``use the current family''}

@<Declare act...@>=
procedure scan_math(@!p:pointer);
label exit;
var c:integer; {math character code}
begin @<Get the next non-blank non-call...@>;
case cur_cmd of
letter,other_char: c←math_code(cur_chr);
char_num: begin scan_char_num; c←cur_val+var_code;
math_char_num: begin scan_fifteen_bit_int; c←cur_val;
delim_num: begin scan_twenty_seven_bit_int; c←cur_val div @'10000;
othercases @<Scan a subformula enclosed in braces and |return|@>
math_type(p)←math_char; character(p)←c mod 256;
if c≥var_code then fam(p)←cur_fam
else fam(p)←(c div 256) mod 16;

@ The pointer |p| is placed on |save_stack| while a complex subformula
is being scanned.

@<Scan a subformula...@>=
begin back_input; scan_left_brace;@/
saved(0)←p; incr(save_ptr); new_save_level(math_group);@/
push_nest; incompleat_noad←null;

@ The simplest math formula is, of course, `\.{\${ }\$}', when no noads are
gen\-era\-ted. The next simplest cases involve a single character, e.g.,
`\.{\$x\$}'. Even though such cases may not seem to be very interesting,
the reader can perhaps understand how happy the author was when `\.{\$x\$}'
was first properly typeset by \TeX. The code in this module was used.
@^Knuth, Donald Ervin@>

@<Cases of |main_control| that build...@>=
mmode+letter,mmode+other_char: set_math_char(math_code(cur_chr));
mmode+char_num: begin scan_char_num; set_math_char(cur_val+var_code);
mmode+math_char_num: begin scan_fifteen_bit_int; set_math_char(cur_val);
mmode+delim_num: begin scan_twenty_seven_bit_int;
	set_math_char(cur_val div @'10000);

@ The |set_math_char| procedure creates a new noad appropriate to a given
math code, and appends it to the current mlist.

@<Declare act...@>=
procedure set_math_char(@!c:integer);
var p:pointer; {the new noad}
begin p←new_noad; math_type(operand(p))←math_char;
character(operand(p))←c mod 256;
if c≥var_code then
	begin fam(operand(p))←cur_fam; type(p)←ord_noad;
else	begin fam(operand(p))←(c div 256) mod 16;
	type(p)←ord_noad+(c div @'10000);
link(tail)←p; tail←p;

@ Primitive math operators like \.{\\mathop} and \.{\\underline} are given
the command code |math_comp|, supplemented by the noad type that they

@<Put each...@>=
@!@:math_ord_}{\.{\\mathord} primitive@>
@!@:math_op_}{\.{\\mathop} primitive@>
@!@:math_bin_}{\.{\\mathbin} primitive@>
@!@:math_rel_}{\.{\\mathrel} primitive@>
@!@:math_open_}{\.{\\mathopen} primitive@>
@!@:math_close_}{\.{\\mathclose} primitive@>
@!@:math_punct_}{\.{\\mathpunct} primitive@>
@!@:underline_}{\.{\\underline} primitive@>
@!@:overline_}{\.{\\overline} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
math_comp: case chr_code of
	ord_noad: print_esc("mathord");
	op_noad: print_esc("mathop");
	bin_noad: print_esc("mathbin");
	rel_noad: print_esc("mathrel");
	open_noad: print_esc("mathopen");
	close_noad: print_esc("mathclose");
	punct_noad: print_esc("mathpunct");
	under_noad: print_esc("underline");
	othercases print_esc("overline")

@ @<Cases of |main_control| that build...@>=
mmode+math_comp: begin tail_append(new_noad);
	type(tail)←cur_chr; scan_math(operand(tail));
mmode+limit_switch: math_limit_switch;

@ @<Declare act...@>=
procedure math_limit_switch;
label exit;
begin if head≠tail then if type(tail)=op_noad then
	begin subtype(tail)←switched-subtype(tail); return;
print_nl("! \limitswitch must follow a math operator");
@:limit_switch_}{\.{\\limitswitch must follow...}@>
help1("I'm ignoring this misplaced \limitswitch.");

@ Delimiter fields of noads are filled in by the |scan_delimiter| routine.
The first parameter of this procedure is the |mem| address where the
delimiter is to be placed; the second tells if this delimiter follows
\.{\\radical} or not.

@<Declare act...@>=
procedure scan_delimiter(@!p:pointer;@!r:boolean);
begin if r then scan_twenty_seven_bit_int
else	begin @<Get the next non-blank non-call...@>;
	case cur_cmd of
	letter,other_char: cur_val←del_code(cur_chr);
	delim_num: scan_twenty_seven_bit_int;
	othercases cur_val←-1
if cur_val<0 then @<Report that an invalid delimiter code is being changed
	 to null; set@@|cur_val←0|@>;
small_fam(p)←(cur_val div @'4000000) mod 16;
small_char(p)←(cur_val div @'10000) mod 256;
large_fam(p)←(cur_val div 256) mod 16;
large_char(p)←cur_val mod 256;

@ @<Report that an invalid delimiter...@>=
begin print_nl("! Improper delimiter, changed to null");
@.Improper delimiter...@>
help6("I was expecting to see something like `(' or `\{' or")@/
	("`\}' here. If you typed, e.g., `{' instead of `\{', you")@/
	("should probably delete the `{' by typing `1' now, so that")@/
	("braces don't get unbalanced. Otherwise just proceed.")@/
	("Acceptable delimiters are characters whose \delcode is")@/
	("nonnegative, or you can use `\delimiter <delimiter code>'.");
back_error; cur_val←0;

@ @<Cases of |main_control| that build...@>=

@ @<Declare act...@>=
procedure math_radical;
begin tail_append(get_node(radical_noad_size));
type(tail)←radical_noad; subtype(tail)←normal;
scan_delimiter(left_delimiter(tail),true); scan_math(operand(tail));

@ @<Cases of |main_control| that build...@>=

@ @<Declare act...@>=
procedure math_ac;
begin tail_append(get_node(accent_noad_size));
type(tail)←accent_noad; subtype(tail)←normal;
character(accent_chr(tail))←cur_val mod 256;
if cur_val≥var_code then fam(accent_chr(tail))←cur_fam
else fam(accent_chr(tail))←(cur_val div 256) mod 16;

@ @<Cases of |main_control| that build...@>=
mmode+vcenter: begin scan_spec; new_save_level(vcenter_group);
	push_nest; mode←-vmode; prev_depth←ignore_depth;

@ @<Cases of |handle...@>=
vcenter_group: begin end_graf; unsave; save_ptr←save_ptr-2;
	p←vpack(link(head),saved(1),saved(0)); pop_nest;
	tail_append(new_noad); type(tail)←vcenter_noad;
	math_type(operand(tail))←sub_box; info(operand(tail))←p;

@ The routine that inserts a |style_node| holds no surprises.

@<Put each...@>=
@!@:display_style_}{\.{\\displaystyle} primitive@>
@!@:text_style_}{\.{\\textstyle} primitive@>
@!@:script_style_}{\.{\\scriptstyle} primitive@>
@!@:script_script_style_}{\.{\\scriptscriptstyle} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
math_style: print_style(chr_code);

@ @<Cases of |main_control| that build...@>=
mmode+math_style: tail_append(new_style(cur_chr));
mmode+non_script: math_non_script;

@ The \.{\\nonscript} command must be immediately followed by glue;
we use the fact that the command codes |hskip|, |vskip|, |mskip|, |kern|,
and |mkern| are consecutive.

@<Declare act...@>=
procedure math_non_script;
begin @<Get the next non-blank non-call...@>;
if (cur_cmd>mkern)∨(cur_cmd<hskip) then
	begin print_nl("! \nonscript must be followed by glue");
@:non_script_}{\.{\\nonscript must be...}@>
	help2("Next time say, e.g., `\nonscript\mskip\the\thinmskip' or")@/
		("`\nonscript\kern3pt'. I'm ignoring the present \nonscript.");
else	begin back_input; tail_append(new_glue(zero_glue));

@ Subscripts and superscripts are attached to the previous operand by the
action procedure called |sub_sup|. We use the facts that |sub_mark=sup_mark+1|
and |subscr(p)=supscr(p)+1|.

@<Cases of |main_control| that build...@>=
mmode+sub_mark,mmode+sup_mark: sub_sup;

@ @<Declare act...@>=
procedure sub_sup;
var t:small_number; {type of previous sub/superscript}
@!p:pointer; {field to be filled by |scan_math|}
begin t←empty; p←null;
if tail≠head then if (type(tail)≥ord_noad)∧(type(tail)<left_noad) then
	begin p←supscr(tail)+cur_cmd-sup_mark; {|supscr| or |subscr|}
if (p=null)∨(t≠empty) then @<Insert a dummy noad to be sub/superscripted@>;

@ @<Insert a dummy...@>=
begin tail_append(new_noad); type(tail)←ord_noad;
p←supscr(tail)+cur_cmd-sup_mark; {|supscr| or |subscr|}
if t≠empty then
	begin if cur_cmd=sup_mark then
		begin print_nl("! Double superscript");
@.Double superscript@>
		help1("I treat `x^1^2' essentially like `x^1{}^2'.");
	else	begin print_nl("! Double subscript");
@.Double subscript@>
		help1("I treat `x↓1↓2' essentially like `x↓1{}↓2'.");

@ An operation like `\.{\\over}' causes the current mlist to go into a
state of suspended animation: |incompleat_noad| points to a |fraction_noad|
that contains the mlist-so-far as its numerator, while the denominator
is yet to come. Finally when the mlist is finished, the denominator will
go into the incompleat fraction noad, and that noad will become the
whole formula, unless it is surrounded by `\.{\\left}' and `\.{\\right}'

@d above_code=0 { `\.{\\above}' }
@d over_code=1 { `\.{\\over}' }
@d atop_code=2 { `\.{\\atop}' }
@d xx_code=3 { `\.{\\xabovex}', `\.{\\xoverx}', `\.{\\xatopx}' }

@<Put each...@>=
@!@:above_}{\.{\\above} primitive@>
@!@:over_}{\.{\\over} primitive@>
@!@:atop_}{\.{\\atop} primitive@>
@!@:x_above_x_}{\.{\\xabovex} primitive@>
@!@:x_over_x_}{\.{\\xoverx} primitive@>
@!@:x_atop_x_}{\.{\\xatopx} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
above: case chr_code of
	othercases print_esc("above")

@ @<Cases of |main_control| that build...@>=
mmode+above: math_fraction;

@ @<Declare act...@>=
procedure math_fraction;
var c:small_number; {the type of generalized fraction we are scanning}
begin c←cur_chr;
if incompleat_noad≠null then
	@<Ignore the fraction operation and complain about this ambiguous case@>
else	begin incompleat_noad←get_node(fraction_noad_size);
	link(head)←null; tail←head;
	@<Use code |c| to distinguish between generalized fractions@>;

@ @<Use code |c|...@>=
case c mod xx_code of
above_code: begin scan_dimen(false,false,false);
over_code: thickness(incompleat_noad)←default_code;
atop_code: thickness(incompleat_noad)←0;
end; {there are no other cases}
if c≥xx_code then
	begin scan_delimiter(left_delimiter(incompleat_noad),false);

@ @<Ignore the fraction...@>=
begin if c mod xx_code=above_code then scan_dimen(false,false,false);
if c≥xx_code then
	begin scan_delimiter(garbage,false); scan_delimiter(garbage,false);
print_nl("! Ambiguous; you need another { and }");
help3("I'm ignoring this fraction specification, since I don't")@/
	("know whether a construction like `x \over y \over z'")@/
	("means `{x \over y} \over z' or `x \over {y \over z}'.");

@ At the end of a math formula or subformula, the |fin_mlist| routine is
called upon to return a pointer to the newly completed mlist, and to
pop the nest back to the enclosing semantic level. The parameter to
|fin_mlist|, if not null, points to a |right_noad| that ends the
current mlist; this |right_noad| has not yet been appended.

@<Declare act...@>=
function fin_mlist(@!p:pointer):pointer;
var q:pointer; {the mlist to return}
begin if incompleat_noad≠null then @<Compleat the incompleat noad@>
else	begin link(tail)←p; q←link(head);
pop_nest; fin_mlist←q;

@ @<Compleat...@>=
begin math_type(denominator(incompleat_noad))←sub_mlist;
if p=null then q←incompleat_noad
else	begin q←info(numerator(incompleat_noad));
	if type(q)≠left_noad then confusion("right");
@:confusion right}{\quad right@>
	link(q)←incompleat_noad; link(incompleat_noad)←p;

@ Now at last we're ready to see what happens when a right brace occurs
in a math formula.

@<Cases of |handle...@>=
math_group: begin unsave; decr(save_ptr);@/
	math_type(saved(0))←sub_mlist; info(saved(0))←fin_mlist(null);

@ We have dealt with all constructions of math mode except `\.{\\left}' and
`\.{\\right}', so the picture is completed by the following sections of
the program.

@<Put each...@>=
@!@:left_}{\.{\\left} primitive@>
@!@:right_}{\.{\\right} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
left_right: if chr_code=left_noad then print_esc("left")
else print_esc("right");

@ @<Cases of |main_control| that build...@>=
mmode+left_right: math_left_right;

@ @<Declare act...@>=
procedure math_left_right;
var t:small_number; {|left_noad| or |right_noad|}
@!p:pointer; {new noad}
begin t←cur_chr;
if (t=right_noad)∧(cur_group≠math_left_group) then
	@<Try to recover from mismatched \.{\\right}@>
else	begin p←new_noad; type(p)←t;
	if t=left_noad then
		begin push_nest; new_save_level(math_left_group);
		link(head)←p; tail←p;
	else	begin p←fin_mlist(p); unsave; {end of |math_left_group|}

@ @<Try to recover from mismatch...@>=
begin if cur_group=math_delim_group then
	begin scan_delimiter(garbage,false);
	print_nl("! Extra \right");
@.Extra \\right.@>
	help1("I'm ignoring a \right that had no matching \left.");
else off_save;

@ Here is the only way out of math mode.

@<Cases of |main_control| that build...@>=
mmode+math_delim: if cur_group=math_delim_group then fin_math
	else off_save;

@ @<Declare act...@>=
procedure fin_math;
var l:boolean; {`\.{\\leqno}' instead of `\.{\\eqno}'}
@!m:integer; {|mmode| or |-mmode|}
@!p:pointer; {the formula}
@!a:pointer; {box containing equation number}
@<Local variables for finishing a displayed formula@>@;
begin m←mode; l←false; p←fin_mlist(null); {this pops the nest}
if mode=-m then {end of equation number}
	begin cur_mlist←p; cur_style←text_style; mlist_penalties←false;
	mlist_to_hlist; a←hpack(link(temp_head),natural);
	unsave; decr(save_ptr); {now |cur_group=math_delim_group|}
	if saved(0)=1 then l←true;
	m←mode; p←fin_mlist(null);
else a←null;
if m<0 then @<Finish math in text@>
else	begin @<Check that another \.{\$} follows@>;
	@<Finish displayed math@>;

@ The |unsave| is done after everything else here; therefore an appearance of
`\.{\\mathsurround}' inside of `\.{\$...\$}' affects the spacing at these
particular \.\$'s. This is consistent with the conventions of
`\.{\$\$...\$\$}', since `\.{\\dispskip}' inside a display affects the
space surrounding that display.

@<Finish math in text@>=
begin tail_append(new_math(math_surround,before));
cur_mlist←p; cur_style←text_style; mlist_penalties←true; mlist_to_hlist;
while link(tail)≠null do tail←link(tail);
space_factor←1000; unsave;

@ We have saved the worst for last: The fussiest part of math mode processing
occurs when a displayed formula is being centered and placed with an optional
equation number.

\TeX\ gets to the following part of the program when the first `\.\$' ending
a display has been scanned. 

@<Check that another \.{\$} follows@>=
begin get_nc_token;
if cur_cmd≠math_delim then
	begin print_nl("! Display math should end with $$");
@.Display math...with \$\$@>
	help2("The `$' that I just saw supposedly matches a previous `$$'.")@/
		("So I shall assuming that you typed `$$' both times.");

@ @<Local variables for finishing...@>=
@!b:pointer; {box containing the equation}
@!w:scaled; {width of the equation}
@!z:scaled; {width of the line}
@!e:scaled; {width of equation number}
@!q:scaled; {width of equation number plus space to separate from equation}
@!d:scaled; {displacement of equation in the line}
@!s:scaled; {move the line right this much}
@!g1,@!g2:small_number; {glue parameter codes for before and after}
@!r:pointer; {kern node used to position the display}

@ At this time |p| points to the mlist for the formula; |a| is either
|null| or it points to a box containing the equation number; and we are in
vertical mode (or internal vertical mode).

@<Finish displayed math@>=
cur_mlist←p; cur_style←display_style; mlist_penalties←false;
mlist_to_hlist; p←link(temp_head);
b←hpack(p,natural); w←width(b); z←display_width; s←display_indent;
if a=null then
	begin e←0; q←0;
else	begin e←width(a); q←e+math_quad(text_size);
if w+q>z then
	@<Squeeze the equation as much as possible; if there is an equation
		number that should go on a separate line by itself,
@<Determine the displacement, |d|, of the left edge of the equation, with
	respect to the line size |z|, assuming that |l=false|@>;
@<Append the glue or equation number preceding the display@>;
@<Append the display and perhaps also the equation number@>;
@<Append the glue or equation number following the display@>;

@ @<Declare act...@>=
procedure resume_after_display;
begin if cur_group≠math_delim_group then confusion("display");
@:confusion display}{\quad display@>
unsave; already←already+3;
push_nest; mode←hmode; space_factor←1000;
@<Scan an optional space@>;
if nest_ptr=1 then build_page;

@ The user can force the equation number to go on a separate line
by making its width zero.

@<Squeeze the equation as much as possible...@>=
begin if (e≠0)∧((w-total_shrink[normal]+q≤z)∨@|
	 (total_shrink[filll]≠0)) then
	begin free_node(b,box_node_size);
else	begin e←0;
	if w>z then
		begin free_node(b,box_node_size);

@ We try first to center the display without regard to the existence of
the equation number. If that would make it too close (where ``too close''
means that the space between display and equation number is less than the
width of the equation number), we either center it in the remaining space
or move it as far from the equation number as possible. The latter alternative
is taken only if the display begins with glue, since we assume that the
user put glue there to control the spacing precisely.

@<Determine the displacement, |d|, of the left edge of the equation...@>=
if (e>0)∧(d<2*e) then {too close}
	begin d←half(z-w-e);
	if p≠null then if type(p)=glue_node then d←0;

@ If the equation number is set on a line by itself, either before or
after the formula, we append an infinite penalty so that no page break will
separate the display from its number; and we use the same size and
displacement for all three potential lines of the display, even though
`\.{\\parshape}' may specify them differently.
@!@:par_shape_}{\.{\\parshape} primitive@>

@<Append the glue or equation number preceding the display@>=
if (d+s≤pre_display_size)∨ l then {not enough clearance}
	begin g1←disp_skip_code; g2←disp_skip_code;
else	begin g1←disp_a_skip_code; g2←disp_b_skip_code;
if l ∧(e=0) then {it follows that |type(a)=hlist_node|}
	begin shift_amount(a)←s; append_to_vlist(a);
else tail_append(new_param_glue(g1))

@ @<Append the display and perhaps also the equation number@>=
if e≠0 then
	begin p←new_kern(z-w-e-d);
	if l then
		begin link(a)←r; link(r)←b; b←a; d←0;
	else	begin link(b)←r; link(r)←a;
shift_amount(b)←s+d; append_to_vlist(b)

@ @<Append the glue or equation number following the display@>=
if (a≠null)∧(e=0)∧ ¬ l then
	begin tail_append(new_penalty(inf_penalty));
else	begin tail_append(new_penalty(post_display_penalty));

@ When \.{\\halign} appears in a display, the alignment routines operate
essentially as they do in vertical mode. Then the following program is
activated, with |p| and |q| pointing to the beginning and end of the
resulting list, and with |t| the |prev_depth| value.

@<Finish an alignment in a display@>=
begin do_assignments;
if cur_cmd≠math_delim then @<Complain about improper alignment in display@>
else @<Check that another \.{\$} follows@>;
if q≠null then tail←q;
prev_depth←t; resume_after_display;
if cur_group≠math_delim_group then confusion("dispalign");
@:confusion dispalign}{\quad dispalign@>

@ @<Complain about improper alignment in display@>=
begin print_nl("! Missing $$ inserted");
help2("Displays can use special alignments (like \eqalignno)")@/
	("only if nothing but the alignment itself is between $$'s.");
@* \[48] Conditional processing.
The `\.{\\if}' commands can be given in any mode, and they cause \TeX\ to
select between alternative texts.

@d if_char_code=0 { `\.{\\if}' }
@d if_int_code=1 { `\.{\\ifnum}' }
@d if_dimen_code=2 { `\.{\\ifdim}' }
@d if_even_code=3 { `\.{\\ifeven}' }
@d if_vmode_code=4 { `\.{\\ifvmode}' }
@d if_hmode_code=5 { `\.{\\ifhmode}' }
@d if_mmode_code=6 { `\.{\\ifmmode}' }
@d if_inner_code=7 { `\.{\\ifinner}' }
@d if_absent_code=8 { `\.{\\ifabsent}' }
@d ifx_code=9 { `\.{\\ifx}' }

@<Put each...@>=
@!@:if_char_}{\.{\\if} primitive@>
@!@:if_int_}{\.{\\ifnum} primitive@>
@!@:if_dimen_}{\.{\\ifdim} primitive@>
@!@:if_even_}{\.{\\ifeven} primitive@>
@!@:if_vmode_}{\.{\\ifvmode} primitive@>
@!@:if_hmode_}{\.{\\ifhmode} primitive@>
@!@:if_mmode_}{\.{\\ifmmode} primitive@>
@!@:if_inner_}{\.{\\ifinner} primitive@>
@!@:if_absent_}{\.{\\ifabsent} primitive@>
@!@:ifx_}{\.{\\ifx} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
if_test: case chr_code of
	othercases print_esc("if")

@ @<Cases of |main_control| that don't depend on |mode|@>=

@ @<Declare act...@>=
procedure conditional;
label exit;
var b:boolean; {is the condition true?}
@!r:"<"..">"; {relation to be evaluated}
@!n:integer; {to be tested against the second operand}
@!c:if_int_code..if_dimen_code; {remembers which case we are in}
@!p,@!q:pointer; {for traversing token lists in \.{\\ifx} tests}
begin case cur_chr of
if_char_code: @<Test if two characters match@>;
if_int_code, if_dimen_code: @<Test relation between integers or dimensions@>;
if_even_code: @<Test if a counter is even@>;
if_vmode_code: b←(abs(mode)=vmode);
if_hmode_code: b←(abs(mode)=hmode);
if_mmode_code: b←(abs(mode)=mmode);
if_inner_code: b←(mode<0);
if_absent_code: @<Test if a box is absent@>;
ifx_code: @<Test if two control sequences match@>;
end; {there are no other cases}
@<Scan the first or second block, according as |b| is true or false@>;

@ Here we use the fact that |"<"|, |"="|, and |">"| are consecutive ascii
@^ascii code@>

@<Test relation between integers or dimensions@>=
begin c←cur_chr;
if c=if_int_code then scan_int@+else scan_dimen(false,false,false);
n←cur_val; @<Get the next non-blank non-call...@>;
if (cur_tok≥other_token+"<")∧(cur_tok≤other_token+">") then
else	begin print_nl("! Missing = inserted for "); print_cmd_chr(if_test,c);
@.Missing = inserted@>
	help1("I was expecting to see `<', `=', or `>'. Didn't.");
	back_error; r←"=";
if c=if_int_code then scan_int@+else scan_dimen(false,false,false);
case r of
"<": b←(n<cur_val);
"=": b←(n=cur_val);
">": b←(n>cur_val);

@ @<Test if a counter is even@>=
begin scan_eight_bit_int; b←not odd(count(cur_val));

@ @<Test if a box is absent@>=
begin scan_eight_bit_int; b←(box(cur_val)=null);

@ The `\.{\\if}' test will accept characters of classes |left_brace|,
|right_brace|, |math_delim|, |tab_mark|, |mac_param|, |sup_mark|,
|sub_mark|, |letter|, and |other_char|.

@<Test if two characters match@>=
begin @<Get the next non-blank non-call...@>;
if (cur_tok≥match_token)∨(cur_cmd=endv) then n←256
else	begin n←cur_chr; @<Get the next non-blank non-call...@>;
	if (cur_tok≥match_token)∨(cur_cmd=endv) then n←256;
if n<256 then b←(n=cur_chr)
else	begin print_nl("! Incomplete \if");
@.Incomplete \\if@>
	help2("I need to see two characters immediately after \if.")@/
		("Something's wrong here, so I won't make any \if test.");
	back_error; return;

@ Note that `\.{\\ifx}' will declare two macros different if one is |long|
or |outer| and the other isn't, even though the texts of the macros are
the same.

@<Test if two control sequences match@>=
begin get_token; n←cs_ptr;
if n≠0 then get_token;
if cs_ptr≠0 then
	begin if cur_cmd≠eq_type(n) then b←false
	else if cur_cmd<call then b←(cur_chr=equiv(n))
	else @<Test if two macro texts match@>;
else	begin print_nl("! Incomplete \ifx");
@.Incomplete \\ifx@>
	help2("I need to see two control sequences immediately after \ifx.")@/
		("Something's wrong here, so I won't make any \if test.");
	back_error; return;

@ Note also that `\.{\\ifx}' decides that macros \.{\\a} and \.{\\b} are
different in examples like this:

@<Test if two macro texts match@>=
begin p←link(cur_cmd); q←link(equiv(n)); {omit reference counts}
while (p≠null)∧(q≠null) do
	if info(p)≠info(q) then p←null
	else	begin p←link(p); q←link(q);

@ When the condition is true, a left brace must follow immediately.
Other\-wise there can be junk between the conditional command and the
first block.

We must decrease |cur_level| after calling |new_save_level| for
conditionals, since the level number is not supposed to increase in
such cases; the braces are ``transparent.''

@<Scan the first or second block, according as |b| is true or false@>=
if b then
	begin scan_left_brace; new_save_level(true_group); decr(cur_level);
else	begin pass_block(0); {skip the first block}
	@<Get the next non-blank non-call...@>;
	if cur_cmd=else_code then
		begin scan_left_brace; new_save_level(false_group); decr(cur_level);
	else	begin print_nl("! Missing \else{} inserted");
@.Missing \\else{\{\}} inserted@>
		help3("An \if test turned out to be false; but")@/
			("no \else followed the first clause.")@/
			("So I'm going on as if you said `\else{}'."); back_error;

@ @<Cases of |handle...@>=
true_group: begin leave_transparent_group;
	@<Get the next non-blank non-call...@>;
	if cur_cmd=else_code then pass_block(0)
	else	begin print_nl("! Missing \else{} inserted");
@.Missing \\else{\{\}} inserted@>
		help3("An \if test turned out to be true; but")@/
			("no \else followed the first clause.")@/
			("So I'm going on as if you said `\else{}'."); back_error;
false_group: begin leave_transparent_group;
	@<Scan an optional space@>;

@ @<Cases of |main_control| that don't...@>=
any_mode(case_branch): case_conditional;
any_mode(else_code): spurious_else;

@ @<Declare act...@>=
procedure spurious_else;
begin print_nl("! Extra \else");
@.Extra \\else@>
help2("If there was an \if or \case before this \else,")@/
	("I've lost track of it, so something else is wrong.");

@ @<Declare act...@>=
procedure case_conditional;
label exit,done;
var n:integer; {the number of cases to pass}
begin scan_int; n←cur_val;
while n≠0 do
	begin @<Get the next non-blank non-call...@>;
	if cur_cmd≠left_brace then
		if cur_cmd=else_code then goto done
		else @<Insert missing `\.{\\else\{\}}' for `\.{\\case}' and |return|@>;
	pass_block(1); decr(n);
if n=0 then new_save_level(case_group)@+else new_save_level(false_group);

@ @<Cases of |handle...@>=
case_group: begin leave_transparent_group;
	loop@+	begin @<Get the next non-blank non-call...@>;
		if cur_cmd=left_brace then pass_block(1)
		else if cur_cmd=else_code then goto done
		else @<Insert missing `\.{\\else\{\}}' for `\.{\\case}' and |return|@>;
	done: pass_block(0);

@ @<Insert missing `\.{\\else\{\}}' for `\.{\\case}' and |return|@>=
begin print_nl("! Missing \else{} inserted for \case");
@.Missing \\else{\{\}} inserted@>
help3("The \case construction should have had an \else here,")@/
	("or a new group beginning with `{', but I found neither.")@/
	("So I'm going on as if you said `\else{}'.");
back_error; return;
@* \[49] Mode-independent processing.
The long |main_control| procedure has now been fully specified, except for
certain activities that are independent of the current mode. These activities
do not change the current vlist or hlist or mlist; if they change anything,
it is the value of a parameter or the meaning of a control sequence.

Assignments to values in |eqtb| can be global or local. Furthermore, a
control sequence can be defined to be `\.{\\long}' or `\.{\\outer}', and
it might or might not be expanded. The prefixes `\.{\\global}', `\.{\\long}',
and `\.{\\outer}' can occur in any order. Therefore we assign binary numeric
codes, making it possible to accumulate the union of all specified prefixes
by adding the corresponding codes.  (\PASCAL's |set| operations could also
have been used.)

@<Put each...@>=
@!@:long_}{\.{\\long} primitive@>
@!@:outer_}{\.{\\outer} primitive@>
@!@:global_}{\.{\\global} primitive@>
@!@:def_}{\.{\\def} primitive@>
@!@:gdef_}{\.{\\gdef} primitive@>
@!@:edef_}{\.{\\edef} primitive@>
@!@:xdef_}{\.{\\xdef} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
prefix: if chr_code=1 then print_esc("long")
	else if chr_code=2 then print_esc("outer")
	else print_esc("global");
def: if chr_code=0 then print_esc("def")
	else if chr_code=1 then print_esc("gdef")
	else if chr_code=2 then print_esc("edef")
	else print_esc("xdef");

@ Every prefix, and every command code that might or might not be prefixed,
calls the action procedure |prefixed_command|. This routine accumulates
a sequence of prefixes until coming to a non-prefix, then it carries out
the command.

@<Cases of |main_control| that don't...@>=

@ If the user says, e.g., `\.{\\global\\global}', the redundancy is
silently accepted.

@<Declare act...@>=
@t\4@>@<Declare the procedure called |new_font|@>@t@>@;
@t\4@>@<Declare the procedure called |new_interaction|@>@t@>@;
@t\4@>@<Declare the procedure called |do_register_command|@>@t@>@;
procedure prefixed_command;
label exit;
var a:small_number; {accumulated prefix codes so far}
@!j:halfword; {index into a \.{\\parshape} specification}
@!k:0..font_mem_size; {index into |font_info|}
@!p,@!q,@!r:pointer; {for temporary short-term use}
@!n:integer; {ditto}
@!e:boolean; {should a definition be expanded? or was \.{\\let} not done?}
begin a←0;
while cur_cmd=prefix do
	begin if not odd(a div cur_chr) then a←a+cur_chr;
	@<Get the next non-blank non-call...@>;
	if cur_cmd≤max_non_prefixed_command then
		@<Discard erroneous prefixes and |return|@>;
@<Discard the prefixes \.{\\long} and \.{\\outer} if they are irrelevant@>;
case cur_cmd of
othercases confusion("prefix")
@:confusion prefix}{\quad prefix@>

@ @<Discard erroneous...@>=
begin print_nl("! You can't use a prefix with `");
@.You can't use a prefix with x@>
print_cmd_chr(cur_cmd,cur_chr); print_char("'");
help1("I'll pretend you didn't say \long or \outer or \global.");
back_error; return;

@ @<Discard the prefixes...@>=
if (cur_cmd≠def)∧(a mod 4≠0) then
	begin print_nl("! You can't use \long or \outer with `");
@.You can't use \\long...@>
	print_cmd_chr(cur_cmd,cur_chr); print_char("'");
	help1("I'll pretend you didn't say \long or \outer here.");

@ The previous routine does not have to adjust |a| so that |a mod 4=0|,
since the following routines test for the \.{\\global} prefix as follows.

@d global==(a≥4)
@d define(#)==if global then geq_define(#)@+else eq_define(#)
@d word_define(#)==if global then geq_word_define(#)@+else eq_word_define(#)

@ Here's an example of the way many of the following routines operate.
(Unfortunately, they aren't all as simple as this.)

set_font: begin scan_font_number; define(cur_font_loc,data,cur_val);
set_family: begin scan_four_bit_int; define(cur_fam_loc,data,cur_val);

@ When a |def| command has been scanned,
|cur_chr| is odd if the definition is supposed to be global, and
|cur_chr≥2| if the definition is supposed to be expanded.

def: begin if odd(cur_chr)∧ not global then a←a+4;
	e←(cur_chr≥2); get_token;
	if cur_tok<cs_token_flag then @<Pontificate about the absurdity of
		using \.{\\def} without a control sequence@>
	else	begin p←cs_ptr; q←scan_toks(true,e);
		define(p,call+(a mod 4),cur_val);

@ @<Pontificate...@>=
begin print_nl("! You can only define a control sequence");
@.You can only define...@>
help4("Please don't say `\def cs{...}', say `\def\cs{...}'.")@/
	("I'm ignoring the \def command here. You can")@/
	("recover graciously from this error if you're careful;")@/
	("see exercise 27.xx in the manual.");

@ @<Assignments@>=
let: begin e←true;@/
	@<Carry out the \.{\\let} operation and set |e←false|, unless
		the user's syntax is bad@>;
	if e then
		begin print_nl("You can use \let only with control sequences");
		help2("I'm not \let-ting anything change here,")@/
			("since I can only do things like `\let\a=\b'.");

@ @<Carry out the \.{\\let}...@>=
if cs_ptr≠0 then
	begin p←cs_ptr;
	repeat get_token;
	until (cur_cmd≠spacer)∧(cur_tok≠other_token+"=");
	if cur_tok≥cs_token_flag then
		begin e←false;
		if cur_cmd≥call then add_token_ref(cur_chr);

@ The two token-list parameters, \.{\\output} and \.{\\everypar}, receive
their values in the following way. (For safety's sake, we place an
enclosing pair of braces around an \.{\\output} list.)

assign_toks: begin p←cur_chr; scan_optional_equals; q←scan_toks(false,false);
	if link(cur_val)=null then define(p,undefined_cs,null)
	else	begin if p=output_routine_loc then {enclose in curlies}
			begin link(q)←get_avail; q←link(q);
			q←get_avail; info(q)←left_brace_token+"{";
			link(q)←link(cur_val); link(cur_val)←q;

@ Similar routines are used to assign values to the numeric parameters.

assign_int: begin p←cur_chr; scan_optional_equals; scan_int;
assign_dimen: begin p←cur_chr; scan_optional_equals;
	scan_dimen(false,false,false); word_define(dimen_base+p,cur_val);
assign_glue: begin p←cur_chr; scan_optional_equals;
	if (width(cur_val)=0)∧(stretch(cur_val)=0)∧(shrink(cur_val)=0) then
		begin add_glue_ref(zero_glue);
		delete_glue_ref(cur_val); cur_val←zero_glue;

@ The \.{\\hangindent} operation is like a special case of |assign_dimen|,
but it also changes \.{\\hangafter}. If the user says that the hanging
indentation applies `\.{for 0}' lines, we change the amount of indentation to

hang_indent:begin scan_optional_equals; scan_dimen(false,false,false);
	if scan_keyword("after") then
		begin scan_int; n←cur_val;
	else if scan_keyword("for") then
		begin scan_int; n←-cur_val;
		if n=0 then hanging_indent←0;
	else n←1; {`\.{after 1}' is the default}

@ The various character code tables are changed by the |def_code| commands,
and the font families are declared by |def_family|.

@<Put each...@>=
@!@:ch_code_}{\.{\\chcode} primitive@>
@!@:math_code_}{\.{\\mathcode} primitive@>
@!@:lc_code_}{\.{\\lccode} primitive@>
@!@:uc_code_}{\.{\\uccode} primitive@>
@!@:sf_code_}{\.{\\sfcode} primitive@>
@!@:del_code_}{\.{\\delcode} primitive@>
@!@:text_font_}{\.{\\textfont} primitive@>
@!@:script_font_}{\.{\\scriptfont} primitive@>
@!@:script_script_font_}{\.{\\scriptscriptfont} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
def_code: if chr_code=ch_code_base then print_esc("chcode")
	else if chr_code=math_code_base then print_esc("mathcode")
	else if chr_code=lc_code_base then print_esc("lccode")
	else if chr_code=uc_code_base then print_esc("uccode")
	else if chr_code=sf_code_base then print_esc("sfcode")
	else print_esc("delcode");
def_family: print_size(chr_code-math_font_base);

@ The different types of code values have different legal ranges; the
following program is careful to check each case properly.

def_code: begin @<Let |n| be the largest legal code value, based on |cur_chr|@>;
	p←cur_chr; scan_seven_bit_int; p←p+cur_val; scan_optional_equals;
	if ((cur_val<0)∧(p<del_code_base))∨(cur_val>n) then
		begin print_nl("! Invalid code ("); print_int(cur_val);
@.Invalid code@>
		if p<del_code_base then print("), should be in the range 0..")
		else print("), should be at most ");
		help1("I'm going to use 0 instead of that illegal code value.");@/
		error; cur_val←0;
	if p<del_code_base then define(p,data,cur_val)
	else word_define(p,cur_val);

@ @<Let |n| be the largest...@>=
if cur_chr=ch_code_base then n←max_char_code
else if cur_chr=math_code_base then n←@'77777
else if cur_chr=sf_code_base then n←65536
else if cur_chr=del_code_base then n←@'77777777
else n←127

@ @<Assignments@>=
def_family: begin p←cur_chr; scan_four_bit_int; p←p+cur_val;
	scan_optional_equals; scan_int;
	if (cur_val≥bad_font_code)∨(cur_val<0) then n←undefined_font
	else n←font_number[cur_val];
	if n=undefined_font then
		begin print_nl("! Font number "); print_int(cur_val);
@.Font number x has not...@>
		print(" has not been defined");
		help1("Proceed; I'm going to ignore this math font definition.");@/
	else define(p,data,n);

@ Here are the primitives that assign new values to the user's ``registers.''

@<Put each...@>=
@!@:set_count_}{\.{\\setcount} primitive@>
@!@:set_dimen_}{\.{\\setdimen} primitive@>
@!@:set_skip_}{\.{\\setskip} primitive@>
@!@:adv_count_}{\.{\\advcount} primitive@>
@!@:adv_dimen_}{\.{\\advdimen} primitive@>
@!@:adv_skip_}{\.{\\advskip} primitive@>
@!@:mult_count_}{\.{\\multcount} primitive@>
@!@:mult_dimen_}{\.{\\multdimen} primitive@>
@!@:mult_skip_}{\.{\\multskip} primitive@>
@!@:div_count_}{\.{\\divcount} primitive@>
@!@:div_dimen_}{\.{\\divdimen} primitive@>
@!@:div_skip_}{\.{\\divskip} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
	begin case cmd of
	set_register: print_esc("set");
	adv_register: print_esc("adv");
	mult_register: print_esc("mult");
	othercases print_esc("div")
	if chr_code=int_val then print("count")
	else if chr_code=dimen_val then print("dimen")
	else print("skip");

@ @<Assignments@>=
set_register,adv_register,mult_register,div_register: do_register_command(a);

@ We use the fact that |set_register<adv_register<mult_register<div_register|.

@<Declare the procedure called |do_register_command|@>=
procedure do_register_command(@!a:small_number);
label exit;
var p,@!q,@!r:pointer; {for list manipulation}
@!n:eight_bits; {register number}
begin p←cur_chr; q←cur_cmd; scan_eight_bit_int; n←cur_val;
if q=set_register then scan_optional_equals
else @<Scan the word `\.{by}'; |return| if it's not there@>;
if q<mult_register then @<Compute result of |set_register| or
		|adv_register|, put it in |cur_val|@>
else @<Compute result of |mult_register| or |div_register|,
		put it in |cur_val|@>;
if arith_error then
	begin print_nl("! Arithmetic overflow");
@.Arithmetic overflow@>
	help2("I can't carry out that multiplication or division,")@/
		("since the result is out of range.");
	error; return;
case p of
int_val: word_define(count_base+n,cur_val);
dimen_val: word_define(scaled_base+n,cur_val);
glue_val: define(skip_base+n,glue_ref,cur_val);
end; {there are only three cases}
exit: end;

@ @<Scan the word `\.{by}'...@>=
if not scan_keyword("by") then
	begin print_nl("! Missing `by'");
@.Missing `by'@>
	help2("I didn't do the arithmetic because you have to tell")@/
		("me by what; e.g., `\advcount 0 by 1'.");
	error; return;

@ @<Compute result of |set_register| or |adv_register|...@>=
case p of
int_val: begin scan_int; if q=adv_register then cur_val←cur_val+count(n);
dimen_val: begin scan_dimen(false,false,false);
	if q=adv_register then cur_val←cur_val+dimen(n);
glue_val: begin scan_glue(false);
	if q=adv_register then @<Compute the sum of two glue specs@>;
end {there are only three cases}

@ @<Compute the sum of two glue specs@>=
begin q←new_spec(cur_val); r←skip(n); delete_glue_ref(cur_val);
if stretch(q)=0 then stretch_order(q)←normal;
if stretch_order(q)=stretch_order(r) then stretch(q)←stretch(q)+stretch(r)
else if (stretch_order(q)<stretch_order(r))∧(stretch(r)≠0) then
	begin stretch(q)←stretch(r); stretch_order(q)←stretch_order(r);
if shrink(q)=0 then shrink_order(q)←normal;
if shrink_order(q)=shrink_order(r) then shrink(q)←shrink(q)+shrink(r)
else if (shrink_order(q)<shrink_order(r))∧(shrink(r)≠0) then
	begin shrink(q)←shrink(r); shrink_order(q)←shrink_order(r);

@ @<Compute result of |mult_register| or |div_register|...@>=
begin scan_int;
case p of
int_val: if q=mult_register then cur_val←nx_plus_y(count(n),cur_val,0)
	else cur_val←x_over_n(count(n),cur_val);
dimen_val: if q=mult_register then cur_val←nx_plus_y(dimen(n),cur_val,0)
	else cur_val←x_over_n(dimen(n),cur_val);
glue_val: begin r←new_spec(skip(n));
	if q=mult_register then
		begin width(r)←nx_plus_y(width(skip(n)),cur_val,0);
	else	begin width(r)←x_over_n(width(skip(n)),cur_val);
end; {there are only three cases}

@ The processing of boxes is somewhat different, because it may be necessary
to scan and create an entire box before we actually change the value of the old

set_box: begin scan_eight_bit_int;
	if global then saved(0)←box_flag+256+cur_val
	else saved(0)←box_flag+cur_val;
	scan_optional_equals; scan_box;

@ Paragraph shapes are set up in the obvious way.

set_shape: begin scan_int; n←cur_val;
	if n≤0 then p←null
	else	begin p←get_node(2*n+1); info(p)←n;
		for j←1 to n do
			begin scan_dimen(false,false,false);
			mem[p+2*j-1].sc←cur_val; {indentation}
			mem[p+2*j].sc←cur_val; {width}

@ Here's something that isn't quite so obvious. It guarantees that
|info(par_shape_ptr)| can hold any positive@@|n| such |get_node(2*n+1)|
doesn't overflow the memory capacity.

@<Check the ``constant''...@>=
if 2*max_halfword<hi_mem_base-mem_base then bad←41;

@ New hyphenation data is loaded by the |hyph_data| command.

@<Put each...@>=
@!@:hyphenation_}{\.{\\hyphenation} primitive@>
@!@:patterns_}{\.{\\patterns} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
hyph_data: if chr_code=1 then print_esc("patterns")
	else print_esc("hyphenation");

@ @<Assignments@>=
hyph_data: if cur_chr=1 then
		begin init new_patterns; return;@;@+tini@/
		print_nl("! \patterns can be loaded only by INITEX");
@:patterns_}{\.{\\patterns can be...}@>
		help0; error;
	else new_hyph_exceptions;

@ All of \TeX's parameters are kept in |eqtb| except the font information,
the interaction mode, and the hyphenation tables; these are strictly global.

assign_tex_info: begin scan_tex_info(true);
	k←cur_val; scan_dimen(false,false,false); font_info[k].sc←cur_val;

@ Here is where the information for a new font gets loaded.

def_font: new_font;

@ @<Declare the procedure called |new_font|@>=
procedure new_font;
label exit;
var u:0..bad_font_code;
@!s:scaled; {stated ``at'' size, or zero if none is stated}
@!n:str_number; {file name}
@!f:internal_font_number; {runs through existing fonts}
begin scan_int;
if (cur_val<0)∨(cur_val≥bad_font_code) then
	begin print_nl("! Invalid code ("); print_int(cur_val);
@.Invalid code@>
	print("), should be in the range 0.."); print_int(bad_font_code-1);
	help1("I'm going to use 0 instead of that illegal code value.");@/
	error; cur_val←0;
u←cur_val; scan_optional_equals; scan_file_name;
@<Scan the ``at'' size specification@>;
@<If font |u| is already defined, |return|, and issue an error message
	if the previous definition was different@>;
@<If this font has already been loaded with another number,
	make the two numbers essentially identical and |return|@>;

@ @<Scan the ``at'' size specification@>=
if scan_keyword("at") then
	begin scan_dimen(false,false,false); s←cur_val;
	if (s≤0)∨(s≥@'1000000000) then
		begin print_nl("! Improper `at' size (");
		print_scaled(s); print("pt), replaced by 10pt");
@.Improper `at' size...@>
		help2("I can only handle fonts at positive sizes that are")@/
		("less than 2048pt, so I've changed what you said to 10pt.");
		error; s←10*unity;
else s←0

@ @<If font |u| is already defined,...@>=
if font_number[u]≠undefined_font then
	begin f←font_number[u];
	if ((s=0)∧ font_scaled[f])∨
		¬ str_eq_str(font_name[f],cur_name)∨@|
		¬ str_eq_str(font_area[f],cur_area) then
		@<Inform the user that font |u| was already defined differently@>;

@ @<Inform the user that font |u| was already defined differently@>=
begin print_nl("! Font "); print_int(u); print(" is already defined to be ");
@.Font x is already defined...@>
if font_scaled[f] then
	begin print(" at ");
help2("I can't change a font code that has already been used,")@/
("so I'll stick with the previous specification.");

@ When the user gives a new code number to a loaded font, the new value becomes
the |font_code| of record.

@<If this font has already been loaded with another number,...@>=
for f←font_base+1 to font_ptr do
	if str_eq_str(font_name[f],cur_name)∧@|
		((s=0)∧ ¬ font_scaled[f])∨((s≠0)∧(s=font_size[f])) then
		begin font_number[u]←f; font_code[f]←u; return;

@ @<Put each...@>=
@!@:batch_mode_}{\.{\\batchmode} primitive@>
@!@:nonstop_mode_}{\.{\\nonstopmode} primitive@>
@!@:scroll_mode_}{\.{\\scrollmode} primitive@>
@!@:error_stop_mode_}{\.{\\errorstopmode} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
set_interaction: case chr_code of
	batch_mode: print_esc("batchmode");
	nonstop_mode: print_esc("nonstopmode");
	scroll_mode: print_esc("scrollmode");
	othercases print_esc("errorstopmode")

@ @<Assignments@>=
set_interaction: new_interaction;

@ @<Declare the procedure called |new_interaction|@>=
procedure new_interaction;
begin print_ln;
@<Initialize the print |selector| based on |interaction|@>;
if job_name≠0 then selector←selector+2;

@ Here is a procedure that might be called `Get the next non-blank non-call
non-assignment token'.

@<Declare act...@>=
procedure do_assignments;
begin @<Get the next non-blank non-call...@>;
while cur_cmd>max_non_prefixed_command do
	begin prefixed_command;
	@<Get the next non-blank non-call...@>;

@ The user can issue messages to the terminal, regardless of the
current mode.

@<Cases of |main_control| that don't...@>=

@ @<Put each...@>=
@!@:message_}{\.{\\message} primitive@>
@!@:err_message_}{\.{\\errmessage} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
message: if chr_code=0 then print_esc("message")
	else print_esc("errmessage");

@ @<Declare act...@>=
procedure issue_message;
var old_setting:0..max_selector; {holds |selector| setting}
@!c:0..1; {identifies \.{\\message} and \.{\\errmessage}}
@!s:str_number; {the message}
begin c←cur_chr; link(garbage)←scan_toks(false,true);
old_setting←selector; selector←new_string;
token_show(cur_val); selector←old_setting;
str_room(1); s←make_string;
if c=0 then @<Print string |s| on the terminal@>
else @<Print string |s| as an error message@>;

@ @<Print string |s| on the terminal@>=
begin if offset+length(s)≥max_print_line then print_ln
else if offset>0 then print_char(" ");
print(s); update_terminal;

@ If \.{\\errmessage} occurs often in |scroll_mode|, we don't want to give
a long help message each time. So we give a verbose explanation only once.

@!err_help_seen:boolean; {has the long \.{\\errmessage} help been used?}

@ @<Set init...@>=err_help_seen←false;

@ @<Print string |s| as an error message@>=
begin print_nl("! "); print(s);
if err_help_seen then help1("(That was another \errmessage.)")
else help4("This error message was generated by an \errmessage")@/
	("command, so I can't give any explicit help.")@/
	("Pretend that you're Hercule Poirot, examine all clues,")@/
	("and deduce the truth by order and method.");
if interaction<error_stop_mode then err_help_seen←true;

@ The \.{\\uppercase} and \.{\\lowercase} commands are implemented by
building a token list and then changing the cases of the letters in it.

@<Cases of |main_control| that don't...@>=

@ @<Put each...@>=
@!@:lowercase_}{\.{\\lowercase} primitive@>
@!@:uppercase_}{\.{\\uppercase} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
case_shift:if chr_code=lc_code_base then print_esc("lowercase")
	else print_esc("uppercase");

@ @<Declare act...@>=
procedure shift_case;
var b:pointer; {|lc_code_base| or |uc_code_base|}
@!p:pointer; {runs through the token list}
@!t:halfword; {token}
@!c:eight_bits; {character code}
begin p←scan_toks(false,false); p←link(cur_val);
while p≠null do
	begin @<Change the case of the token in |p|, if a change is appropriate@>;
ins_list(link(cur_val)); free_avail(cur_val); {omit reference count}

@ When the case of a |chr_code| changes, we don't change the |cmd|.

@<Change the case of the token in |p|, if a change is appropriate@>=
if (t<match_token)∧(t≥letter_token) then
	begin c←t mod 256;
	if c<128 then if equiv(b+c)≠0 then info(p)←256*(t div 256)+equiv(b+c);

@ We come finally to the last pieces missing from |main_control|, namely the
`\.{\\show}' commands that are useful when debugging.

@<Cases of |main_control| that don't...@>=
any_mode(xray): show_whatever;

@ @d show_code=0 { \.{\\show} }
@d show_box_code=1 { \.{\\showbox} }
@d show_the_code=2 { \.{\\showthe} }
@d show_lists=3 { \.{\\showlists} }

@<Put each...@>=
@!@:show_}{\.{\\show} primitive@>
@!@:show_box_}{\.{\\showbox} primitive@>
@!@:show_the_}{\.{\\showthe} primitive@>
@!@:show_lists_}{\.{\\showlists} primitive@>

@ @<Cases of |print_cmd_chr|...@>=
xray: case chr_code of
	othercases print_esc("show")

@ @d show_error==@;@/
	if interaction<error_stop_mode then
		begin help0; decr(error_count);
	else	begin@t@>@;@/
		help3("This isn't an error message; I'm just \showing something.")
		("Type `I\show...' to show more (e.g., \show\cs,")@/
		("\showthe\count10, \showbox255, \showlists).");

@<Declare act...@>=
procedure show_whatever;
var p:pointer; {a token list to show}
begin case cur_chr of
show_code: @<Show the current meaning of a control sequence@>;
show_box_code: @<Show the current contents of a box@>;
show_the_code: @<Show the current value of some parameter or register@>;
othercases @<Show the current semantic nest@>

@ @<Show the current meaning of a control sequence@>=
begin get_token;
if cs_ptr=0 then
	begin print_nl("! Not a control sequence");
@.Not a control sequence@>
	help1("Next time say, e.g., `\show\cs'.");
else	begin print_nl("> "); sprint_cs(cs_ptr); print_char("=");
	if cur_cmd≥call then token_show(cur_chr)
	else if cur_cmd=top_bot_mark then

@ The following cases of |print_cmd_chr| arise only when \TeX\ has been asked
to \.{\\show} the current meaning of a control sequence.

@<Cases of |print_cmd_chr|...@>=
undefined_cs: print("undefined");
call: print("macro:");
long_call: print_esc("long macro:");
outer_call: print_esc("outer macro:");
long_outer_call: print_esc("long\outer macro:");

@ @<Show the current contents of a box@>=
begin scan_eight_bit_int;
print_nl("> \box"); print_int(cur_val); print_char("=");
if box(cur_val)=null then print("absent")
else show_box(box(cur_val));
print_nl("! OK"); show_error;

@ @<Show the current value of some parameter...@>=
begin p←the_toks(false); print_nl("> "); token_show(temp_head);
flush_list(link(temp_head)); show_error;

@ @<Show the current semantic nest@>=
begin show_activities; print_nl("! OK"); show_error;
@* \[50] Dumping and undumping the tables.
After \.{INITEX} has seen a collection of fonts and macros, it
can write all the necessary information on an auxiliary file so
that production versions of \TeX\ are able to initialize their
memory at high speed. The present section of the program takes
care of such output and input. We shall consider simultaneously
the processes of storing and restoring,
so that the inverse relation between them is clear.

The global variable |format_ident| is a string that is printed right
after the |banner| line when \TeX\ is ready to start. For \.{INITEX} this
string says `\.{(INITEX)}'; for other versions of \TeX\ it says, e.g.,
`\.{(format=BASIC 82.11.19)}', showing the year, month, and day that the
format file was created. We have |format_ident=0| before \TeX's
tables are loaded.


@ @<Set init...@>=

@ @<Initialize table entries...@>=
format_ident←" (INITEX)";

@ @<Declare act...@>=
init procedure store_fmt_file;
label found1,found2,done1,done2;
var j,@!k,@!l:integer; {all-purpose indices}
@!p,@!q: pointer; {all-purpose pointers}
@!x: integer; {something to dump}
@!w: four_quarters; {four ascii codes}
begin @<If dumping is not allowed, abort@>;
@<Create the |format_ident| and inform the user that dumping has begun@>;
@<Dump constants for consistency check@>;
@<Dump the string pool@>;
@<Dump the dynamic memory@>;
@<Dump the table of equivalents@>;
@<Dump the font information@>;
@<Dump the hyphenation tables@>;
@<Dump a couple more things and the closing check word@>;

@ Corresponding to the procedure that dumps a format file, we have a function
that reads one in. The function returns |false| if the dumped format is
incompatible with the present \TeX\ table sizes, etc.

@d bad_fmt=6666 {go here if the format file is unacceptable}
@d too_small(#)==begin write_ln(term_out,'---! Must increase the ',#);
@.Must increase the x@>
	goto bad_fmt;

@p @t\4@>@<Declare the function called |open_fmt_file|@>@;
function load_fmt_file:boolean;
label bad_fmt,exit;
var j,@!k:integer; {all-purpose indices}
@!p,@!q: pointer; {all-purpose pointers}
@!x,@!y: integer; {something undumped}
@!w: four_quarters; {four ascii codes}
begin @<Undump constants for consistency check@>;
@<Undump the string pool@>;
@<Undump the dynamic memory@>;
@<Undump the table of equivalents@>;
@<Undump the font information@>;
@<Undump the hyphenation tables@>;
@<Undump a couple more things and the closing check word@>;
load_fmt_file←true; return; {it worked!}
bad_fmt: write_ln(term_out,'(Fatal format file error; I''m stymied)');
@.Fatal format file error@>

@ The user is not allowed to dump a format file unless |save_ptr=0|.
This condition implies that |cur_level=level_one|, hence
the |xeq_level| array is constant and it need not be dumped.

@<If dumping is not allowed, abort@>=
if save_ptr≠0 then
	begin print_nl("! You can't \dump inside a group");
@.You can't \\dump...@>
	help1("`{...\dump}' is a no-no."); succumb;

@ Format files consist of |memory_word| items, and we use the following
macros to dump words of different types:

@d dump_wd(#)==begin fmt_file^←#; put(fmt_file);@+end
@d dump_int(#)==begin fmt_file^.int←#; put(fmt_file);@+end
@d dump_hh(#)==begin fmt_file^.hh←#; put(fmt_file);@+end
@d dump_qqqq(#)==begin fmt_file^.qqqq←#; put(fmt_file);@+end

@!fmt_file:word_file; {for input or output of format information}

@ The inverse macros are slightly more complicated, since we need to check
the range of the values we are reading in. We say `|undump(a)(b)(x)|' to
read an integer value |x| that is supposed to be in the range |a≤x≤b|.

@d undump_wd(#)==begin get(fmt_file); #←fmt_file^;@+end
@d undump_int(#)==begin get(fmt_file); #←fmt_file^.int;@+end
@d undump_hh(#)==begin get(fmt_file); #←fmt_file^.hh;@+end
@d undump_qqqq(#)==begin get(fmt_file); #←fmt_file^.qqqq;@+end
@d undump_end_end(#)==#←x;@+end
@d undump_end(#)==(x>#) then goto bad_fmt@+else undump_end_end
@d undump(#)==begin undump_int(x); if (x<#) or undump_end
@d undump_size_end_end(#)==too_small(#)@+else undump_end_end
@d undump_size_end(#)==if x># then undump_size_end_end
@d undump_size(#)==begin undump_int(x);
	if x<# then goto bad_fmt; undump_size_end

@ The next few sections of the program should make it clear how we use the
dump/undump macros.

@<Dump constants for consistency check@>=

@ Sections of a \.{WEB} program that are ``commented out'' still contribute
strings to the string pool; therefore \.{INITEX} and \TeX\ will have
the same strings. (And it is, of course, a good thing that they do.)
@^string pool@>

@<Undump constants for consistency check@>=
if x≠@$ then goto bad_fmt; {check that strings are the same}
if x≠hi_mem_base then goto bad_fmt;
if x≠eqtb_size then goto bad_fmt;
if x≠hash_prime then goto bad_fmt;
if x≠hyph_size then goto bad_fmt

@ @d dump_four_ascii==
	w.b0←str_pool[k]; w.b1←str_pool[k+1];
	w.b2←str_pool[k+2]; w.b3←str_pool[k+3];

@<Dump the string pool@>=
for k←0 to str_ptr do dump_int(str_start[k]);
while k+4<pool_ptr do
	begin dump_four_ascii; k←k+4;
k←pool_ptr-4; dump_four_ascii;
print_ln; print_int(str_ptr); print(" strings of total length ");

@ @d undump_four_ascii==
	str_pool[k]←w.b0; str_pool[k+1]←w.b1;
	str_pool[k+2]←w.b2; str_pool[k+3]←w.b3

@<Undump the string pool@>=
undump_size(0)(pool_size)('string pool size')(pool_ptr);
undump_size(0)(max_strings)('max strings')(str_ptr);
for k←0 to str_ptr do undump(0)(pool_ptr)(str_start[k]);
while k+4<pool_ptr do
	begin undump_four_ascii; k←k+4;
k←pool_ptr-4; undump_four_ascii

@ By sorting the list of available spaces in the variable-size portion of
|mem|, we are usually able to get by without having to dump very much
of the dynamic memory.

We recompute |var_used| and |dyn_used|, so that \.{INITEX} dumps valid
information even when it has not been gathering statistics.

@<Dump the dynamic memory@>=
sort_avail; var_used←0;
p←mem_base; q←rover; x←0;
repeat for k←p to q+1 do dump_wd(mem[k]);
x←x+q+2-p; var_used←var_used+q-p;
p←q+node_size(q); q←rlink(q);
until q=rover;
var_used←var_used+hi_mem_base-p; dyn_used←mem_end+1-hi_mem_base;@/
dump_int(mem_end); dump_int(avail);
for k←p to mem_end do dump_wd(mem[k]);
while p≠null do
	begin decr(dyn_used); p←link(p);
dump_int(var_used); dump_int(dyn_used);
print_ln; print_int(x);
print(" memory locations dumped; current usage is ");
print_int(var_used); print_char(","); print_int(dyn_used)

@ @<Undump the dynamic memory@>=
p←mem_base; q←rover; x←0;
repeat for k←p to q+1 do undump_wd(mem[k]);
if (p>hi_mem_base)∨((q≥rlink(q))∧(rlink(q)≠rover)) then goto bad_fmt;
until q=rover;
undump_size(hi_mem_base)(mem_max)('mem max')(mem_end);
for k←p to mem_end do undump_wd(mem[k]);
undump_int(var_used); undump_int(dyn_used);

@ @<Dump the table of equivalents@>=
@<Dump regions 1 to 4 of |eqtb|@>;
@<Dump regions 5 and 6 of |eqtb|@>;
dump_int(par_loc); dump_int(send_loc);
@<Dump the hash table@>

@ @<Undump the table of equivalents@>=
@<Undump regions 1 to 6 of |eqtb|@>;
@<Undump the hash table@>

@ The table of equivalents usually contains repeated information, so we dump it
in compressed form: The sequence of $n+2$ values $(n,x↓1,\ldotss,x↓n,m)$ in the
format file represents $n+m$ consecutive entries of |eqtb|, with |m| extra
copies of $x↓n$, namely $(x↓1,\ldotss,x↓n,x↓n,\ldotss,x↓n)$.

@<Dump regions 1 to 4 of |eqtb|@>=
repeat j←k;
while j<int_base-1 do
	begin if (equiv(j)=equiv(j+1))∧(eq_type(j)=eq_type(j+1))∧@|
		(eq_level(j)=eq_level(j+1)) then goto found1;
l←int_base; goto done1; {|j=int_base-1|}
found1: incr(j); l←j;
while j<int_base-1 do
	begin if (equiv(j)≠equiv(j+1))∨(eq_type(j)≠eq_type(j+1))∨@|
		(eq_level(j)≠eq_level(j+1)) then goto done1;
while k<l do
	begin dump_wd(eqtb[k]); incr(k);
k←j+1; dump_int(k-l);
until k=int_base

@ @<Dump regions 5 and 6 of |eqtb|@>=
repeat j←k;
while j<eqtb_size do
	begin if eqtb[j].int=eqtb[j+1].int then goto found2;
l←eqtb_size+1; goto done2; {|j=eqtb_size|}
found2: incr(j); l←j;
while j<eqtb_size do
	begin if eqtb[j].int≠eqtb[j+1].int then goto done2;
while k<l do
	begin dump_wd(eqtb[k]); incr(k);
k←j+1; dump_int(k-l);
until k>eqtb_size

@ @<Undump regions 1 to 6 of |eqtb|@>=
repeat undump_int(x);
if (x<1)∨(k+x>eqtb_size) then goto bad_fmt;
for j←k to k+x-1 do undump_wd(eqtb[j]);
if (x<0)∨(k+x>eqtb_size+1) then goto bad_fmt;
for j←k to k+x-1 do eqtb[j]←eqtb[k-1];
until k>eqtb_size

@ A different scheme is used to compress the hash table, since its lower
region is usually sparse. When |text(p)≠0| for |p≤hash_used|, we output
two words, |p| and |hash[p]|. The hash table is, of course, densely packed
for |p≥hash_used|, so the remaining entries are output in a block.

@<Dump the hash table@>=
dump_int(hash_used); cs_count←special_control_sequence-1-hash_used;
for p←hash_base to hash_used do if text(p)≠0 then
	begin dump_int(p); dump_hh(hash[p]); incr(cs_count);
for p←hash_used+1 to undefined_control_sequence-1 do dump_hh(hash[p]);
print_ln; print_int(cs_count); print(" multiletter control sequences")

@ @<Undump the hash table@>=
undump(hash_base)(special_control_sequence)(hash_used); p←hash_base-1;
repeat undump(p+1)(hash_used)(p); undump_hh(hash[p]);
until p=hash_used;
for p←hash_used+1 to undefined_control_sequence-1 do undump_hh(hash[p]);

@ The (slightly) interesting thing about font information, as far as
dumping is concerned, is that several user font codes might be associated
with the same internal font number.

@<Dump the font information@>=
for k←7 to fmem_ptr-1 do dump_wd(font_info[k]);
for k←undefined_font+1 to font_ptr do
	@<Dump the array info for internal font number |k|@>;
for k←0 to bad_font_code-1 do
	if (font_number[k]≠undefined_font)∧(font_code[font_number[k]]≠k) then
	begin dump_int(font_number[k]); dump_int(k);
print_ln; print_int(fmem_ptr-7); print(" words of font info for ");
print_int(font_ptr-font_base); print(" preloaded font");
if font_ptr≠font_base+1 then print_char("s")

@ @<Undump the font information@>=
undump_size(7)(font_mem_size)('font mem size')(fmem_ptr);
for k←7 to fmem_ptr-1 do undump_wd(font_info[k]);
undump_size(font_base)(font_max)('font max')(font_ptr);
for k←undefined_font+1 to font_ptr do
	@<Undump the array info for internal font number |k|@>;
repeat undump_int(y);
if eof(fmt_file) then goto bad_fmt;
if y≠undefined_font then
	begin if (y<undefined_font)∨(y≥font_ptr) then goto bad_fmt;
until y=undefined_font

@ @<Dump the array info for internal font number |k|@>=
begin dump_int(font_code[k]);
if font_scaled[k] then
	begin dump_int(1);
else	begin dump_int(0);
print_nl("\font "); print_int(font_code[k]); print_char("=");
if font_scaled[k] then
	begin print(" at "); print_scaled(font_size[k]); print("pt");

@ @<Undump the array info for internal font number |k|@>=
begin undump(0)(bad_font_code)(font_code[k]); font_number[font_code[k]]←k;
if x=0 then font_scaled[k]←false
else if x=1 then font_scaled[k]←true
else goto bad_fmt;

@ @<Dump the hyphenation tables@>=
for k←0 to hyph_size do if hyph_word[k]≠0 then
	begin dump_int(k); dump_int(hyph_word[k]); dump_int(hyph_list[k]);
for k←0 to trie_max do dump_hh(trie[k]);
for k←min_quarterword+1 to trie_op_ptr do
	begin dump_int(hyf_distance[k]);
print_ln; print_int(hyph_count); print(" hyphenation exception");
if hyph_count≠1 then print_char("s");
print_nl("Hyphenation trie of length "); print_int(trie_max);
print(" has "); print_int(qo(trie_op_ptr)); print(" op");
if trie_op_ptr≠min_quarterword+1 then print_char("s")

@ @<Undump the hyphenation tables@>=
for k←1 to hyph_count do
	begin undump(0)(hyph_size)(j);
undump_size(0)(trie_size)('trie size')(trie_max);
for k←0 to trie_max do undump_hh(trie[k]);
for k←min_quarterword+1 to trie_op_ptr do
	begin undump(0)(63)(hyf_distance[k]); {a |small_number|}

@ We have already printed a lot of statistics, so we set |tracing_stats←0|
to prevent them appearing again.

@<Dump a couple more things and the closing check word@>=
dump_int(interaction); dump_int(format_ident); dump_int(69069);
w_close(fmt_file); tracing_stats←0

@ @<Undump a couple more things and the closing check word@>=
if (x≠69069)∨ eof(fmt_file) then goto bad_fmt

@ @<Create the |format_ident| and inform the user that dumping has begun@>=
print(" (format="); print(job_name); print_char(" ");
print_int(abs(year) mod 100); print_char(".");
print_int(month); print_char("."); print_int(day); print_char(")");
if interaction=batch_mode then selector←err_only
else selector←term_and_err;
while ¬ w_open_out(fmt_file) do prompt_file_name("format file name",".fmt");
print_nl("Beginning to dump on file "); 
print(w_make_name_string(fmt_file)); flush_string;
@* \[51] The main program.
This is it: the part of \TeX\ that executes all those procedures we have

Well---almost. Let's leave space for a few more procedures that we may
have forgotten.

@p @<Last-minute procedures@>

@ We have noted that there are two versions of \TeX82. One, called \.{INITEX},
has to be run first; it initializes everything from scratch, without
reading a format file, and it has the capability of dumping a format file.
The other one is called `\.{VIRTEX}'; it is a ``virgin'' program that needs
to input a format file in order to get started. \.{VIRTEX} typically has
more memory capacity than \.{INITEX}, because it does not need the space
consumed by the auxiliary hyphenation tables and the numerous calls on
|primitive|, etc.

The \.{VIRTEX} program cannot read a format file instantaneously, of course;
the best implementations therefore allow for production versions of \TeX\ that
not only avoid the loading routine for \PASCAL\ object code, they also have
a format file pre-loaded. This is impossible to do if we stick to standard
\PASCAL, but there is a simple way to fool many systems into avoiding the
initialization, as follows.\quad(1)@@We declare a global integer variable
called |ready_already|. The probability is negligible that this
variable holds any particular value like 314159 when \.{VIRTEX} is first
loaded.\quad(2)@@After we have read in a format file and initialized
everything, we set |ready_already←314159|.\quad(3)@@Soon \.{VIRTEX}
will print `\.*', waiting for the first input; and at this point we
interrupt the program and save its core image in some form that the
operating system can reload speedily.\quad(4)@@When that core image is
activated, the program starts again at the beginning; but now
|ready_already=314159| and all the other global variables have
their initial values too. The former chastity has vanished!

In other words, if we allow ourselves to test the condition
|ready_already=314159|, before |ready_already| has been
assigned a value, we can avoid the lengthy initialization. Dirty tricks
rarely pay off so handsomely.
@^dirty \PASCAL@>
@^system dependencies@>

On systems that allow such preloading, the standard program called \.{TeX}
should be the one that has \.{BASIC} format preloaded, since that agrees
with the user manual. Other versions, e.g., \.{AmSTeX}, should also
be provided for commonly used formats.

@!ready_already:integer; {a sacrifice of purity for economy}

@ @<Set init...@>=

@ Now this is really it: \TeX\ starts and ends here.

The initial test involving |ready_already| should be deleted if the
\PASCAL\ runtime system is smart enough to detect such a ``mistake.''
@^system dependencies@>

@p begin {|start_here|}
t_open_out; {open the terminal for output}
if ready_already=314159 then goto start_of_TEX;
initialize; {set global variables to their starting values}
@<Check the ``constant'' values...@>@;
if bad>0 then
	begin write(term_out,'Ouch---my internal constants have been clobbered!');
	write(term_out,'---case ',bad:0);
	goto final_end;
init if not init_strings then goto final_end;
init_prim; {call |primitive| for each primitive}
start_of_TEX: @<Initialize the output routines@>;
@<Get the first line of input and prepare to start@>;
main_control; {come to life}
final_cleanup; {prepare for death}
end_of_TEX: close_files_and_terminate;

@ Here we do whatever is needed to complete \TeX's job gracefully
on the local operating system.
@^system dependencies@>


procedure close_files_and_terminate;
var k:integer; {all-purpose index}
debug debug_help;@+gubed {the last chance to examine memory}
@<Finish the extensions@>;
stat if tracing_stats≠0 then @<Output statistics about this job@>;@;@+tats@/
@<Finish the \.{DVI} file@>;
if job_name>0 then
	begin write_ln(err_file); a_close(err_file);

@ @<Output statistics...@>=
begin print_nl("Statistics for this run:"); print_ln;@/
print_int(str_ptr); print(" strings of total length "); print_int(pool_ptr);@/
print_nl("Maximum memory usage "); print_int(max_var_used);
print_char(","); print_int(mem_end+1-hi_mem_base);@/
print_ln; print_int(cs_count); print(" multiletter control sequences");@/
print_ln; print_int(fmem_ptr); print(" words of font info for ");
print_int(font_ptr-font_base); print(" font");
if font_ptr≠font_base+1 then print_char("s");
print_ln; print_int(hyph_count); print(" hyphenation exception");
if hyph_count≠1 then print_char("s");
print_nl("Maximum stack usage "); print_int(max_in_stack);
print_char(","); print_int(max_nest_stack);
print_char(","); print_int(max_param_stack);
print_char(","); print_int(max_buf_stack+1);
print_char(","); print_int(max_save_stack+6);

@ We get to the |final_cleanup| routine when \.{\\end} or \.{\\dump} has
been scanned.

procedure final_cleanup;
label done,exit;
var c:small_number; {0 for \.{\\end}, 1 for \.{\\dump}}
@!k:small_number; {the number of times we tried to eject a page}
begin c←cur_chr;
if job_name=0 then open_err_file;
@<Eject the remaining page(s)@>;
if cur_level>level_one then
	begin print_nl("(\end occurred inside a group at level ");
@:end_}{\.{\\end occurred inside...}@>
	print_int(cur_level-level_one); print_char(")");
if c=1 then
	begin init store_fmt_file; return;@+tini@/
	print_nl("(\dump is performed only by INITEX)");
@:dump_}{\.{\\dump...only by INITEX}@>

@ The following loop is performed after |main_control| has completed its
activity, but its actions are like those done in |main_control|.

@<Eject the remaining page(s)@>=
loop@+	begin build_page; {empty the contribution list}
	if page_head=page_tail then goto done;
	if k>25 then
		print_nl("(something on current page hasn't been shipped out)");
@.something on current...@>
		goto done;

@ @<Last-minute...@>=
init procedure init_prim; {initialize all the primitives}
begin no_new_control_sequence←false;
@<Put each...@>;

@ When we begin the following code, \TeX's tables may still contain garbage;
the strings might not even be present. Thus we must proceed cautiously to get
bootstrapped in.

But when we finish this part of the program, \TeX\ is ready to call on the
|main_control| routine to do its work.

@<Get the first line...@>=
begin @<Initialize the input routines@>;
if format_ident=0 then
	begin if not open_fmt_file then goto final_end;
	if not load_fmt_file then goto final_end;
	while (loc<limit)∧(buffer[loc]=" ") do incr(loc);
@<Compute the magic offset@>;
@<Initialize the print |selector|...@>;
if (loc<limit)∧(ch_code(buffer[loc])≠escape) then start_input;
	{\.{\\input} assumed}
@* \[52] Debugging.
Once \TeX\ is working, it should be possible to diagnose most errors with
the \.{\\show} commands and other diagnostic features. But for the initial
stages of debugging, and for the revelation of really deep mysteries, you
can compile \TeX\ with a few more aids including the \PASCAL\ runtime
checks and its debugger. An additional routine called |debug_help|
will also come into play when you type `\.D' after an error message.
@^system dependencies@>

The interface to |debug_help| is primitive, but it is good enough when used
with a \PASCAL\ debugger that allows you to set breakpoints and to read
variables and change their values. After getting the prompt `\.\#', you
type either a negative number (this exits |debug_help|), or zero (this
goes to a location where you can set a breakpoint, thereby entering into
dialog with the \PASCAL\ debugger), or a positive number |m| followed by
an argument |n|. The meaning of |m| and |n| will be clear from the
program below. (If |m=13|, there is an additional argument, |l|.)

@d breakpoint=888 {place where a breakpoint is desirable}
@d avoid_breakpoint=889 {place where a breakpoint is unwanted}

debug procedure debug_help; {routine to display various things}
label avoid_breakpoint,breakpoint,exit;
var k,@!l,@!m,@!n:integer;
begin goto avoid_breakpoint;
loop@+	begin print_nl("#"); update_terminal;
	if m<0 then return;
	if m=0 then goto breakpoint;
	case m of
	@t\4@>@<Numbered cases for |debug_help|@>@;
	othercases print("?")

@ @<Numbered cases...@>=
1: print_memory_word(mem[n]); {display |mem[n]| in all forms}
2: print_int(info(n));
3: print_int(link(n));
4: print_memory_word(eqtb[n]);
5: print_memory_word(font_info[n]);
6: print_memory_word(save_stack[n]);
7: show_box(n);
	{|show_box| abbreviated by |show_box_depth| and |show_box_breadth|}
8: begin breadth_max←10000; depth_threshold←pool_size-pool_ptr-10;
	show_node_list(n); {|show_box| in its entirety}
9: show_token_list(n,null,1000);
10: print(n);
11: check_mem(n>0); {check wellformedness; print new busy locations if |n>0|}
12: search_mem(n); {look for pointers to |n|}
13: begin read(term_in,l); print_cmd_chr(n,l);
14: for k←0 to n do print(buffer[k]);
15: begin font_in_short_display←undefined_font; short_display(n);
@* \[53] Extensions.
The program above includes a bunch of ``hooks'' that allow further
capabilities to be added without upsetting \TeX's basic structure.
Most of these hooks are concerned with ``whatsit'' nodes, which are
intended to be used for special purposes; whenever a new extension to
\TeX\ involves a new kind of whatsit node, a corresponding change needs
to be made to the routines below that deal with such nodes,
but it will usually be unnecessary to make many changes to the
other parts of this program.

In order to demonstrate how extensions can be made, we shall treat
`\.{\\send}', `\.{\\open}', `\.{\\close}', and `\.{\\xsend}'
as if they were extensions. These commands are actually primitives
of \TeX82, and they should appear in all implementations of the system;
but let's try to imagine that they aren't. Then the program below
illustrates how a person could add them.

Sometimes, of course, an extension will require changes to \TeX\ itself;
no system of hooks could be complete enough for all conceivable extensions.
The features associated with `\.{\\send}' are almost all confined to the
following paragraphs, but there are small parts of the |print_ln| and
|print_char| procedures that were introduced specifically to \.{\\send}
characters. Furthermore one of the token lists recognized by the scanner
is a |send_text|; and there are a few other miscellaneous places where we
have already provided for some aspect of \.{\\send}.  The goal of a \TeX\
extender should be to minimize alterations to the standard parts of the
program, and to avoid them completely if possible.
@^system dependencies@>
@^extensions to \TeX@>

@ First let's consider the format of whatsit nodes that are used to represent
the data associated with \.{\\send} and its relatives. Recall that a whatsit
has |type=whatsit_node|, and the |subtype| is supposed to distinguish
different kinds of whatsits. Each node occupies two or more words; the
exact number is immaterial, as long as it is readily determined from the
|subtype| or other data.

We shall introduce four |subtype| values here, corresponding to \.{\\open},
\.{\\send}, \.{\\close}, and \.{\\xsend}. The second word of such whatsits
has a |send_stream| field that identifies the send-stream number (0 to 15).
In the case of \.{\\send} and \.{\\xsend}, there is also
a field that points to the reference count of a token list that should
be sent. In the case of \.{\\open}, we need three words and three auxiliary
subfields to hold the string numbers for name, area, and extension.

@d send_node_size=2 {number of words in a send/whatsit node}
@d open_node_size=3 {number of words in an open/whatsit node}
@d open_node=0 {|subtype| in whatsits that represent files to \.{\\open}}
@d send_node=1 {|subtype| in whatsits that represent things to \.{\\send}}
@d close_node=2 {|subtype| in whatsits that represent streams to \.{\\close}}
@d xsend_node=3 {|subtype| in whatsits that represent things to \.{\\xsend}}
@d send_tokens(#) == link(#+1) {reference count of token list to send}
@d send_stream(#) == info(#+1) {stream number (0 to 15)}
@d open_name(#) == link(#+1) {string number of file name to open}
@d open_area(#) == info(#+2) {string number of file area for |open_name|}
@d open_ext(#) == link(#+2) {string number of file extension for |open_name|}

@ The sixteen possible \.{\\send} streams are represented by the |send_file|
array. The |j|th file is open if and only if |send_open[j]=true|.

@!send_file:array[0..15] of alpha_file;
@!send_open:array[0..15] of boolean;

@ @<Set init...@>=
for k←0 to 15 do send_open[k]←false;

@ Extensions might introduce new command codes; but it's best to use
|extension| with a modifier, whenever possible, so that |main_control|
stays the same.

@<Put each...@>=
@!@:open_}{\.{\\open} primitive@>
primitive("send",extension,send_node); send_loc←cur_val;@/
@!@:send_}{\.{\\send} primitive@>
@!@:close_}{\.{\\close} primitive@>
@!@:xsend_}{\.{\\xsend} primitive@>

@ The variable |send_loc| just introduced is used to provide an
appropriate error message in case of ``runaway'' send texts.

@!send_loc:pointer; {|eqtb| address of \.{\\send}}

@ @<Cases of |print_cmd_chr|...@>=
extension: case chr_code of
	othercases print("[unknown extension!]")

@ When an |extension| command occurs in |main_control|, in any mode,
the |do_extension| routine is called.

@<Cases of |main_control| that are for extensions...@>=

@ @<Declare act...@>=
@t\4@>@<Declare procedures needed in |do_extension|@>@;
procedure do_extension;
var i,@!j,@!k:integer; {all-purpose integers}
@!p,@!q,@!r:pointer; {all-purpose pointers}
begin case cur_chr of
open_node:@<Implement \.{\\open}@>;
send_node:@<Implement \.{\\send}@>;
close_node:@<Implement \.{\\close}@>;
xsend_node:@<Implement \.{\\xsend}@>;
othercases confusion("ext1")
@:confusion ext1}{\quad ext1@>

@ Here is a subroutine that creates a whatsit node having a given |subtype|
and a given number of words. It initializes only the first word of the whatsit,
and appends it to the current list.

@<Declare procedures needed in |do_extension|@>=
procedure new_whatsit(@!s:small_number;@!w:small_number);
var p:pointer; {the new node}
begin p←get_node(w); type(p)←whatsit_node; subtype(p)←s;
link(tail)←p; tail←p;

@ The next subroutine uses |cur_chr| to decide what sort of whatsit is
involved, and also inserts a |send_stream| number.

@<Declare procedures needed in |do_ext...@>=
procedure new_send_whatsit(@!w:small_number);
var p:pointer; {the new node}
begin new_whatsit(cur_chr,w); scan_four_bit_int; send_stream(tail)←cur_val;

@ @<Implement \.{\\open}@>=
begin new_send_whatsit(open_node_size);
scan_optional_equals; scan_file_name;@/
open_name(tail)←cur_name; open_area(tail)←cur_area; open_ext(tail)←cur_ext;

@ When `\.{\\send 12\{...\}}' appears, we scan the token list `\.{\{...\}}'
without expanding its macros; the macros will be expanded later when this
token list is rescanned.

@<Implement \.{\\send}@>=
begin k←cs_ptr; new_send_whatsit(send_node_size);@/
cs_ptr←k; p←scan_toks(false,false); send_tokens(tail)←cur_val;

@ @<Implement \.{\\close}@>=
begin new_send_whatsit(send_node_size); send_tokens(tail)←0;

@ When `\.{\\xsend\{...\}}' appears, we expand the macros in the token
list as in \.{\\xdef} and \.{\\mark}.

@ @<Implement \.{\\xsend}@>=
begin new_whatsit(xsend_node,send_node_size); send_stream(tail)←0;
p←scan_toks(false,true); send_tokens(tail)←cur_val;

@ Each new type of node that appears in our data structure must be capable
of being displayed, copied, destroyed, and so on. The routines that we
need for send-oriented whatsits are somewhat like those for mark nodes;
other extensions might, of course, involve more subtlety here.

@<Display the whatsit...@>=
case subtype(p) of
open_node:begin print_esc("open"); print_int(send_stream(p));
	print_char("="); print_file_name(open_name(p),open_area(p),open_ext(p));
send_node:begin print_esc("send"); print_int(send_stream(p));
close_node:begin print_esc("close"); print_int(send_stream(p));
xsend_node:begin print_esc("xsend");
othercases print("whatsit?")

@ @<Make a partial copy of the whatsit...@>=
case subtype(p) of
open_node: begin r←get_node(open_node_size); words←open_node_size;
send_node,xsend_node: begin r←get_node(send_node_size);
	add_token_ref(send_tokens(p)); words←send_node_size;
close_node: begin r←get_node(small_node_size); words←small_node_size;
othercases confusion("ext2")
@:confusion ext2}{\quad ext2@>

@ @<Wipe out the whatsit...@>=
begin case subtype(p) of
open_node: free_node(p,open_node_size);
send_node,xsend_node: begin delete_token_ref(send_tokens(p));
	free_node(p,send_node_size); goto done;
close_node: free_node(p,small_node_size);
othercases confusion("ext3")
@:confusion ext3}{\quad ext3@>
goto done;

@ @<Incorporate a whatsit node into a vbox@>=do_nothing

@ @<Incorporate a whatsit node into an hbox@>=do_nothing

@ @<Let |d| be the width of the whatsit |p|@>=d←0

@ @<Advance \(p)past a whatsit node in the |line_break| loop@>=do_nothing

@ @<Prepare to move whatsit |p| to the current page, then |goto contribute|@>=
goto contribute

@ @<Process whatsit |p| in |vert_break| loop, |goto not_found|@>=
goto not_found

@ @<Output the whatsit node |p| in a vlist@>=

@ @<Output the whatsit node |p| in an hlist@>=

@ @<Finish the extensions@>=
for k←0 to 15 do if send_open[k] then a_close(send_file[k])

@ After all this preliminary shuffling, we come finally to the routines
that actually \.{\\send} or \.{\\xsend} the requested data. Let's do
\.{\\xsend} first (it's easier).

@<Declare procedures needed in |hlist_out|, |vlist_out|@>=
procedure xsend_out(@!p:pointer);
var old_setting:0..max_selector; {holds print |selector|}
@!k:pool_pointer; {index into |str_pool|}
begin synch_h; synch_v;@/
old_setting←selector; selector←new_string;
show_token_list(link(send_tokens(p)),null,300); selector←old_setting;
if cur_length>255 then
	begin print_nl("\xsend has been truncated to 255 characters!");
@:xsend_}{\.{\\xsend has been truncated...}@>
	begin_diagnostic; print_current_string; end_diagnostic;
dvi_out(xxx); dvi_out(cur_length);
for k←str_start[str_ptr] to pool_ptr-1 do dvi_out(str_pool[k]);
pool_ptr←str_start[str_ptr]; {erase the string}

@ To send a token list, we must run it through \TeX's scanner, expanding
macros and \.{\\the} and \.{\\number}, etc. This might cause runaways,
if a delimited macro parameter isn't matched, and runaways would be
extremely confusing since we are calling on \TeX's scanner in the middle
of a \.{\\shipout} command. Therefore we will put a dummy control sequence as
a ``stopper,'' right after the token list. This control sequence is
artificially defined to be \.{\\outer}.

@d special_send==special_control_sequence+2

@<Initialize table...@>=
text(special_send)←"endsend"; eq_level(special_send)←level_one;
eq_type(special_send)←outer_call; equiv(special_send)←null;

@ @<Declare procedures needed in |hlist_out|, |vlist_out|@>=
procedure send_out(@!p:pointer);
var old_setting:0..max_selector; {holds print |selector|}
@!old_mode,@!old_offset:integer; {saved values}
@!j:small_number; {send stream number}
@!q,@!r:pointer; {temporary variables for list manipulation}
begin @<Expand macros in the token list
	and make |link(cur_val)| point to the result@>;
old_setting←selector; old_offset←offset; j←send_stream(p);
if send_open[j] then selector←j
else print_nl(""); {send to the terminal if file isn't open}
show_token_list(link(cur_val),null,buf_size-10); print_ln;
flush_list(cur_val); selector←old_setting;
if send_open[j] then offset←old_offset;

@ The final line of this routine is slightly subtle; at least, the author
didn't think about it until getting burnt! There is a used-up token list
@^Knuth, Donald Ervin@>
on the stack, namely the one that contained |special_send_token|. (We
insert this artificial `\.{\\endsend}' to prevent runaways, as explained
above.) If it were not removed, and if there were numerous sends on a
single page, the stack would overflow.

@d special_send_token==cs_token_flag+special_send

@<Expand macros in the token list and...@>=
q←get_avail; info(q)←right_brace_token+"}";@/
r←get_avail; info(r)←space_token; link(q)←r;@/
link(r)←get_avail; r←link(r); info(r)←special_send_token; ins_list(q);@/
q←get_avail; info(q)←left_brace_token+"{"; ins_list(q);
{now we're ready to scan `\.\{$\langle\,$token list$\,\rangle$\.{\} \\endsend}'}
old_mode←mode; mode←0;
	{disable \.{\\prevdepth}, \.{\\spacefactor}, \.{\\lastskip}}
cs_ptr←send_loc; q←scan_toks(false,true); {expand macros, etc.}
get_token;@+if cur_tok≠special_send_token then confusion("send");
@:confusion send}{\quad send@>
end_token_list; {conserve stack space}

@ The |out_what| procedure takes care of outputting whatsit nodes for
|vlist_out| and |hlist_out|.

@<Declare procedures needed in |hlist_out|, |vlist_out|@>=
procedure out_what(@!p:pointer);
var j:small_number; {send stream number}
begin case subtype(p) of
open_node,send_node,close_node:@<Do some work that has been queued up
	for \.{\\send}@>;
othercases confusion("ext4")
@:confusion ext4}{\quad ext4@>

@ We don't implement \.{\\send} inside of leaders. (The reason is that
the number of times a leader box appears might be different in different
implementations, due to machine-dependent rounding in the glue calculations.)

@<Do some work that has been queued up...@>=
if not doing_leaders then
	begin j←send_stream(p);
	if subtype(p)=send_node then send_out(p)
	else	begin if send_open[j] then a_close(send_file[j]);
		if subtype(p)=close_node then send_open[j]←false
		else	begin cur_name←open_name(p); cur_area←open_area(p);
			if cur_ext="" then cur_ext←".tex";
			while not a_open_out(send_file[j]) do
				prompt_file_name("file name to open",".tex");
@* \[54] System-dependent changes.
This module should be replaced, if necessary, by changes to the program
that are necessary to make \TeX\ work at a particular installation.
It is usually best to design your change file so that all changes to
previous modules preserve the module numbering; then everybody's version
will be consistent with the published program. More extensive changes,
which introduce new modules, can be inserted here; then only the index
itself will get a new module number.
@^system dependencies@>
@* \[55] Index.
Here is where you can find all uses of each identifier in the program,
with underlined entries pointing to where the identifier was defined.
If the identifier is only one letter long, however, you get to see only
the underlined entries. All references are to section numbers instead of
page numbers.

This index also lists error messages and other aspects of the program
that you might want to look up some day. For example, the entry
for ``system dependencies'' lists all sections that should receive
special attention from people who are installing \TeX\ in a new
operating environment.