\datethis
\input epsf
\let\from=\gets
\def\bit#1{\\{bit}[#1]}
\def\losub#1{^{\vphantom\prime}_{#1}} % for contexts like $x'_n+x\losub n$
@*Introduction. The purpose of this program is to implement a pretty
algorithm that has a very pleasant theory. But I apologize
at the outset that the algorithm seems to be rather subtle, and
I have not been able to think of any way to explain it to dummies.
Readers who like discrete mathematics and computer science are
encouraged to persevere nonetheless.
The companion program called {\mc KODA-RUSKEY} should probably be
read first, because it solves a significantly simpler problem.
After I wrote that program, Frank Ruskey told me that he had developed
a much more general procedure, in yet-unpublished work with his
student Gang (Kenny) Li.
I~wasn't able to refrain from pondering what their method might
be, so I came up with the algorithm below, which probably produces the
same sequence of outputs as theirs.
@d maxn 1000 /* the maximum number of vertices we can handle */
@d abort(mess,m,n) {@+fprintf(stderr,mess,m);@+exit(n);@+}

@c
#include <stdio.h>
#include <stdlib.h>
@<Type definitions@>@;
@<Global variables@>@;
@<Subroutines@>@;
int main (int argc,char*argv[])
{
@<Local variables@>;
@<Parse the command line@>;
@<Initialize the data structures@>;
@<Generate the answers@>;
return 0;
}
@ Given a digraph that is {\it totally acyclic}, in the sense that it has
no cycles when we ignore the arc directions, we want to find all ways to
label its vertices with 0s and 1s in such a way that
$x\to y$ implies $\bit x\le\bit y$.
Moreover, we want to list all such labelings as a Gray path,
changing only one bit at a time. The algorithm below does this, with
an extra proviso: Given a designated ``root'' vertex~$v$, $\bit v$
begins at 0 and changes exactly once.
The simple three-vertex digraph with $x\to y\from z$ has only five such
labelings, and they form a Gray path in essentially only one way,
namely $(000,010,011,111,110)$. This example shows that we cannot require
the Gray path to end at a convenient prespecified labeling like $11\ldots1$;
and the dual graph, obtained by reversing all the arrows and complementing
all the bits, shows that we can't require the path to start at $00\ldots0$.
[Generalizations of this example, in which the vertices are
$\{x_0,x_1,\ldots,x_n\}$ and each arc is either $x_{k-1}\to x_k$ or
$x_{k-1}\from x_k$, have solutions related to continued fractions.
Interested readers will enjoy working out the details.]
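Such small cases are easily checked by brute force. The following little
function is not part of the program below; it's just an illustrative
sketch that counts the monotone labelings of $x\to y\from z$ by trying
all $2^3$ possibilities, and it returns the value~5 quoted above.
$$\vbox{\halign{#\hfil\cr
|int count_example(void)|\cr
|{|\cr
\quad|register int x,y,z,count=0;|\cr
\quad|for (x=0;x<2;x++) for (y=0;y<2;y++) for (z=0;z<2;z++)|\cr
\quad\quad|if (x<=y && z<=y) count++;|\cr
\quad|return count;|\cr
|}|\cr}}$$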
We may assume that the given graph is connected.
It is convenient to describe a connected, rooted totally acyclic digraph by
calling the root~0, and by assigning the integer names $\{1,\ldots,n\}$
to the other vertices in such a way that the undirected path connecting
each vertex to~0 runs through vertex names monotonically. In other
words, we specify for each vertex $k>0$ a vertex $j_k<k$ that is adjacent
to~$k$; then the undirected path from $k$ to the root passes through
$j_k$, $j_{j_k}$, etc. We also insist that the vertices be numbered in
preorder, so that each $j_k$ lies on the path from vertex $k-1$ to the
root. The graph is specified on the command line by listing, for
$k=1$,~2, \dots,~$n$ in order, either \.{+}$j_k$ or \.{-}$j_k$, according
as the arc between $j_k$ and $k$ is directed $j_k\to k$ or $j_k\from k$.

@<Parse the command line@>=
{
register char *c=argv[1];
if (argc!=2) abort("Usage: %s graphspec\n",argv[0],1);
for (k=1; *c; k++) {
if (k>=maxn) abort("Sorry, I can only handle %d vertices!\n",maxn,2);
if (*c=='+') js[k]=0;
else if (*c=='-') js[k]=1;
else abort("Parsing error: `%s' should start with + or -!\n",c,3);
for (j=0,c++; *c>='0' && *c<='9'; c++) j=10*j+*c-'0';
@<Check that |j| is a legitimate predecessor of |k|@>;
jj[k]=j;
}
n=k-1;
}
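@ If, for example, the program has been compiled into a binary called
\.{li-ruskey} (the name is of course up to the installer), the example
graph considered below would be presented to it by saying
`\.{li-ruskey +0+1-2+1+0-5-0+7}'.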
@ Here we test the preorder condition, using the fact that |stack[0]=0|
and that |l| is initially 0.
@<Check that |j| is a legitimate predecessor of |k|@>=
if (j>=k) {
fprintf(stderr,"Parsing error: `%d%s' should start ",j,c);
abort("with a number less than %d!\n",k,4);
}
while (j<stack[l]) l--;
if (j!=stack[l])
  abort("Parsing error: %d is not a legitimate predecessor!\n",j,5);
stack[++l]=k;

@ @<Global variables@>=
char jj[maxn]; /* the vertex $j_k$ */
char js[maxn]; /* 0 if $j_k\to k$,\quad 1 if $j_k\from k$ */
char stack[maxn]; /* values that are legitimate for the next $j_k$ */
@ @<Local variables@>=
register int j,k,l=0; /* heavily-used miscellaneous indices */
int n; /* size of the input graph */
@ Consider the example
$$\epsfbox{li-ruskey.1}$$
in which all arcs are directed upward; it would be written
\.{+0+1-2+1+0-5-0+7} in the notation above. A nonroot vertex~$k$ is
called {\it positive\/} if $j_k\to k$ and {\it negative\/} if $j_k\from k$;
thus $\{1,2,4,5,8\}$ are positive in this example, and $\{3,6,7\}$ are
negative.
We write $x\preceq y$ if there is a directed path from $x$ to $y$.
Removing all vertices $x$ such that $x\preceq 0$ disconnects the graph
into a number of pieces having positive roots; in our example, removing
$\{0,7\}$ leaves three components rooted at $\{1,5,8\}$. We call these roots
the set of {\it positive vertices near\/}~0, and denote that set by~$A$.
Similarly, the {\it negative vertices near\/}~0 are obtained when we remove
all $y$ such that $0\preceq y$; the set of resulting roots, denoted by~$B$, is
$\{3,6,7\}$ in our example.
Why are the sets $A$ and $B$ so important? Because
the labelings for which $\bit0=0$ are
precisely those that we obtain by setting $\bit x=0$ for all $x\preceq0$
and then labeling the subgraphs rooted at $a$ for $a\in A$.
Similarly, all labelings for which $\bit0=1$ are obtained by
setting $\bit y=1$ for all $0\preceq y$ and labeling the subgraphs
rooted at $b$ for $b\in B$.
Thus if $n_k$ is the number of labelings of the subgraph rooted at~$k$,
the total number of labelings of the whole graph is
$\prod_{a\in A}n_a\,+\,\prod_{b\in B}n_b$.
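In our example, for instance, the subgraphs rooted at $\{1,5,8\}$ and
$\{3,6,7\}$ have respectively $(n_1,n_5,n_8)=(8,3,2)$ and
$(n_3,n_6,n_7)=(2,2,3)$ labelings, if I have counted correctly; so the
graph as a whole has $8\cdot3\cdot2+2\cdot2\cdot3=60$ labelings.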
\smallskip
For each subgraph rooted at $k$, we define the positive and negative
vertices near~$k$ in the same way as we did for the case $k=0$,
calling those sets $A_k$ and $B_k$.
Every positive vertex adjacent to 0 appears in $A$, and every negative
vertex adjacent to~0 appears in $B$. These are called the {\it principal\/}
elements of~$A$ and~$B$. Every nonprincipal member of~$A$ is a member
of $A_b$ for some unique principal vertex~$b\in B$. Similarly, every
nonprincipal member of~$B$ is a member of $B_a$ for some unique
principal vertex $a\in A$. For example, 8~belongs to $A_7$, and
$3\in B_1$.
@<Type definitions@>=
typedef struct info_struct {
char id; /* name of vertex in an $A$ or $B$ set */
@<More fields of |info| records@>@;
struct info_struct *sib; /* previous element in the set */
struct info_struct *ref; /* further info about nonprincipal element */
} info;
@ @<Initialize the data structures@>=
list[0][0]=list[0][1]=NULL;
for (k=1,i=&infonode[0]; k<=n; k++,i++)
for (j=jj[k],ii=NULL; ; ii=i++,j=jj[j]) {
i->id=k, i->ref=ii;
i->sib=list[j][js[k]], list[j][js[k]]=i;
if (j==0 || js[j]==js[k]) break;
}
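@ In terms of our running example, this loop builds the lists
$A=\{1,5,8\}$, $B=\{3,6,7\}$, $A_1=\{2,4\}$, $B_1=\{3\}$, $B_2=\{3\}$,
$B_5=\{6\}$, and $A_7=\{8\}$, each linked from its largest element to its
smallest, if I have traced the code correctly; all other lists are empty.
The nodes for vertex~3 in $B_1$ and~$B$, for vertex~6 in~$B$, and for
vertex~8 in~$A$ are nonprincipal; each of their |ref| pointers leads to
the node for the same vertex one level nearer to it.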
@ @<Global variables@>=
info infonode[((maxn*maxn)>>2)+(maxn>>1)]; /* elements of $A_k$ and $B_k$ */
info *list[maxn][2]; /* heads of those sets */
@ @<Local variables@>=
info *i, *ii; /* pointers to |info| nodes */
@ Aha! We can begin to see how to get the desired Gray path. The well-known
reflected Gray code for mixed-radix number systems tells us how to obtain a
path $P_0$ of length $\prod_{a\in A}n_a$ for the labelings with $\bit0=0$ as
well as a path $P_1$ of length $\prod_{b\in B}n_b$ for the labelings with
$\bit0=1$. All we have to do is figure out a way to end $P_0$ with
a labeling that differs only in $\bit0$ from the starting point of $P_1$.
Suppose $P_0$ begins with $\bit a=\alpha_a$ for each $a\in A$,
and $P_1$ ends with $\bit b=\beta_b$ for each $b\in B$. Then $P_0$
ends with $\bit a=\alpha_a'=(\alpha_a+\delta_a)\bmod 2$ and $P_1$ begins
with $\bit b=\beta_b'=(\beta_b+\epsilon_b)\bmod 2$, where
$$\delta_a=\prod_{\scriptstyle a'<a\atop\scriptstyle a'\in A}n_{a'}
\qquad{\rm and}\qquad
\epsilon_b=\prod_{\scriptstyle b'<b\atop\scriptstyle b'\in B}n_{b'}.$$
(Only the parities of $\delta_a$ and $\epsilon_b$ will matter.) The
analogous quantities $\alpha\losub{ka}$, $\alpha'_{ka}$, $\delta_{ka}$ and
$\beta\losub{kb}$, $\beta'_{kb}$, $\epsilon_{kb}$, defined in the same way
for the subgraph rooted at each vertex~$k$, are recorded in two additional
fields of the |info| nodes:

@ @<More fields of |info| records@>=
char alfprime; /* $\alpha'_{ka}$ or $\beta'_{kb}$ */
char del; /* $\delta_{ka}\bmod2$ or $\epsilon_{kb}\bmod2$ */
@ We need not actually calculate the values $n_k$ exactly; we only need
to know if $n_k$ is even or odd. However, this program does compute the
exact values (unless they exceed our computer's word size).
@<Initialize the data structures@>=
for (k=n; k>=0; k--) {
register int s,t;
for (s=1,i=list[k][0],ii=NULL; i; s*=nn[i->id], i=i->sib) {
i->del=1;
if ((nn[i->id]&1)==0) ii=i;
if (i->ref) i->alfprime=i->ref->alfprime^i->ref->del;
else i->alfprime=1;
}
for (i=list[k][0]; ii; i=i->sib)
if (i==ii) ii=NULL;@+ else i->del=0;
for (t=1,i=list[k][1],ii=NULL; i; t*=nn[i->id], i=i->sib) {
i->del=1;
if ((nn[i->id]&1)==0) ii=i;
if (i->ref) i->alfprime=i->ref->alfprime^i->ref->del;
else i->alfprime=0;
}
for (i=list[k][1]; ii; i=i->sib)
if (i==ii) ii=NULL;@+ else i->del=0;
nn[k]=s+t;
}
@ @<Global variables@>=
int nn[maxn]; /* $n_k$, the number of labelings of $G_k$ */
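@ For instance, when $k=0$ in our running example the first loop runs
through $A=\{8,5,1\}$ in that order and computes
$s=n_8n_5n_1=2\cdot3\cdot8=48$; the loops for $B$ give
$t=n_7n_6n_3=3\cdot2\cdot2=12$, hence $n_0=60$. The principal elements
1 and~5 of~$A$ get $\alpha'_{01}=\alpha'_{05}=1$; and since $n_1=8$ is
even while no smaller element of~$A$ has an even count, we end up with
$\delta_{01}=1$ but $\delta_{05}=\delta_{08}=0$, if I have applied the
rules correctly. Thus $\bit1$ will begin at~0 and $\bit5$ at~1 on the
path~$P_0$, as we will see again later.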
@ Here are two subroutines that I used when debugging.
@<Subroutines@>=
void print_info(int k)
{
register info *i;
printf("Info for vertex %d: A =",k);
for (i=list[k][0]; i; i=i->sib) printf(" %d",i->id);
if (list[k][0]) printf(", B =");@+else printf(" (), B =");
for (i=list[k][1]; i; i=i->sib) printf(" %d",i->id);
if (list[k][1]) printf("\n");@+else printf(" ()\n");
for (i=list[k][0]; i; i=i->sib)
printf(" alf%d=%d, alf%d'=%d\n",
i->id,i->alfprime^i->del,i->id,i->alfprime);
for (i=list[k][1]; i; i=i->sib)
printf(" bet%d=%d, bet%d'=%d\n",
i->id,i->alfprime^i->del,i->id,i->alfprime);
}
@#
void print_all_info(int n)
{
register int k;
for (k=0;k<=n;k++) print_info(k);
}
@ @<Initialize the data structures@>=
if (verbose) {
print_all_info(n);
printf("(Altogether %d solutions.)\n",nn[0]);
}
@* Coroutines. As in the program {\mc KODA-RUSKEY}, we can represent
the path-generation process by considering a system of cooperating programs,
each of which has the same basic structure. There is one coroutine
for each pair of nodes $(k,l)$ such that $k\in A_l$ or $k\in B_l$,
and it looks essentially like this:
$$\vbox{\halign{#\hfil\cr
\&{coroutine} |p()|\cr
\quad|{|\cr
\quad\quad|while(1) {|\cr
\quad\quad\quad|while (p->child_A()) return true;|\cr
\quad\quad\quad|p->bit=1;@+return true;|\cr
\quad\quad\quad|while (p->child_B()) return true;|\cr
\quad\quad\quad|return p->sib();|\cr
\quad\quad\quad|while (p->child_B()) return true;|\cr
\quad\quad\quad|p->bit=0;@+return true;|\cr
\quad\quad\quad|while (p->child_A()) return true;|\cr
\quad\quad\quad|return p->sib();|\cr
\quad\quad|}|\cr
\quad|}|\cr
}}$$
Here |p->child_A| is a pointer to the rightmost element of $A_k$, and
|p->child_B| points to the rightmost element of $B_k$; |p->sib| points
to the nearest sibling of |p| to the left, in $A_l$ or $B_l$.
If any of these pointers is |NULL|, the corresponding coroutine
|NULL()| is assumed to simply return |false|.
The reader is urged to compare this coroutine with the analogous one
in {\mc KODA-RUSKEY}, which is the special case where |p->child_A=NULL|
for all~|p|.
Suppose |p->child_A()| first returns |false| after it has been called
$\alpha$ times; thus |p->child_A()| generates $\alpha$ different labelings,
including the initial one. Similarly, suppose that |p->child_B()|
and |p->sib()| generate $\beta$ and $\sigma$
different labelings, respectively, before first returning |false|.
Then the coroutine |p()|
itself will generate $\sigma(\alpha+\beta)$ labelings between the times
when it returns |false|.
The final labeling for |p| will be the final labeling
for |p->sib|, together with either the initial or final labeling of the subtree
rooted at~|k|, depending on whether $\sigma$ is even or odd, respectively.
After the coroutine has first returned |false|, invoking it again will
cause it to generate the labelings in reverse order before
returning |false| a second time. Then the process repeats.
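For example, the coroutine for vertex~1 as an element of~$A$ has
$\alpha=n_2n_4=6$ and $\beta=n_3=2$, while $\sigma=1$ because vertex~1 is
the leftmost element of~$A$ and its sibling pointer is null; so that
coroutine generates $\sigma(\alpha+\beta)=8=n_1$ labelings, in agreement
with the Gray path for the subgraph $G_1$ that is worked out below.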
@ How many coroutines can there be? Our example graph defines a total of
13 coroutines (including a coroutine~0, which corresponds to the special
case where $k=0$ and there are no siblings).
It isn't difficult to prove that the worst case, about $n^2\!/4$, occurs in
graphs that have a specification like \.{+0+1+2+3+4-5-5-5-5-5}.
Such graphs have roughly $n$ times $2^{n/2}$ labelings, so we do not have to be
embarrassed about the nonlinear initialization time needed to set up
data structures and to compute the values $\alpha\losub{ka}$, $\alpha'_{ka}$,
$\beta\losub{kb}$, and $\beta'_{kb}$.
Indeed, if $k$ is a positive vertex, it appears in $r$ coroutines
$(k,l_1)$, \dots, $(k,l_r)$ if and only if the path from $k$ to~0
has the form $k\from l_1\to\cdots\to l_r=0$ or begins with
$k\from l_1\to\cdots\to l_r\from l_{r+1}$.
If $k$ is a negative vertex, it appears in $r$ coroutines
$(k,l_1)$, \dots, $(k,l_r)$ if and only if the path from $k$ to~0
has the form $k\to l_1\from\cdots\from l_r=0$ or begins with
$k\to l_1\from\cdots\from l_r\to l_{r+1}$.
Only the coroutine $(k,l_1)$ is principal. Notice that if $r>1$,
nodes $(l_1,\ldots,l_{r-1})$ appear only in their principal coroutine,
because a nonprincipal coroutine arises only at points where the path to~0
switches directions.
If there are $s$ direction-switching vertices $k$ such that the path to 0
begins $k\to l_1\from l_2$, the $2^s$ independent settings of their bits all
appear in at least one labeling.
And if there are $t$ direction-switching vertices $k$ such that the path to 0
begins $k\from l_1\to l_2$, the same remark applies to the $2^t$ independent
settings of {\it their\/} individual bits.
Therefore if the number of coroutines $(k,l)$ exceeds $(c+1)n$, the
number of labelings must exceed $2^{\,\max(s,t)}\ge2^{\,c/2}$.
@ In Section 1.4.2 of {\sl Fundamental Algorithms}, I wrote,
``Initialization of coroutines tends
to be a little tricky, although not really difficult.''
Perhaps I should reconsider that statement in the light of the present
program, because the initialization of all the coroutines considered
here is more than a little tricky.
In fact, I've decided not to implement the coroutines directly,
although a simulation along the lines of {\mc KODA-RUSKEY} would
make a nice exercise. My real goal is to come up with a loopless
implementation, because I was told that no such implementation
(nor even an implementation with constant {\it amortized\/} time)
was presently known.
[Readers who do take the time to work out the exercise suggested in the
previous paragraph will notice an interesting difference with respect to
the previous case: The |parent| pointers in
the coroutine implementation of {\mc KODA-RUSKEY} are essentially
static, but now they must in general be dynamic. For example,
in a graph like $0\from1\to2\from3$, coroutine $(3,2)$ is called
by both $(2,0)$ and $(2,1)$. This is related to the key difficulty
we will face in coming up with a loopless way to solve our
more general problem.]
@* A generalized fringe. The loopless implementation in {\mc KODA-RUSKEY}
is based on a notion called the {\it fringe}, which is a linear list
containing nodes that are either {\it active\/} or {\it passive}.
We are dealing here with a generalization of the problem solved there,
so we naturally seek an extension of the fringe concept in hopes that
fringes will help us again.
Each loopless {\mc KODA-RUSKEY} step consists of four basic operations:
\smallskip
\indent\indent 1) Find the rightmost active node, |p|;\par
\indent\indent 2) Complement $\bit p$;\par
\indent\indent 3) Insert or delete appropriate
fringe nodes at the right of |p|;\par
\indent\indent 4) Make |p| passive and activate all nodes to its right.
\smallskip\noindent
Suitable data structures make all these operations efficient.
Fortunately, the same scheme does in fact handle our more general problem,
except that operation (3) must now involve both insertion and deletion.
The root node 0 is always present in the fringe.
And if $k$ is any node in the fringe, it is immediately followed
by its current {\it descendants}, which are defined as follows:
If $\bit k=0$, the descendants of $k$ are the nodes $a$ for $a\in A_k$
and their descendants. If $\bit k=1$, the descendants of $k$ are the nodes
$b$ for $b\in B_k$ and their descendants.
@ An example will make this clear; for simplicity, we will consider only
the subgraph $G_1$ on the vertices $\{1,2,3,4\}$ of our larger example.
The initial labeling 0000 means that the fringe begins with three nodes
$$1_0\quad 2_0\quad 4_0\,,$$
where the subscript on $k$ indicates the value of $\bit k$. Nodes
$2_0$ and $4_0$ have no descendants, because $A_2=A_4=\emptyset$.
All nodes are initially active.
\def\p#1{\overline#1}
Since $4_0$ is the rightmost active node, we complement $\bit4$, and
the fringe becomes
$$1_0\quad 2_0\quad \p4_1\,.$$
The bar over 4 indicates that this node is now passive. Again, $\p4_1$
has no descendants, this time because $B_4=\emptyset$.
The next step complements $\bit2$, and $B_2=\{3\}$ now enters the fringe:
$$1_0\quad \p2_1\quad 3_0\quad 4_1\,.$$
The initial value of $\bit3$ is $\beta'_{23}=0$.
@ Proceeding in this way,
the first eight steps can be summarized as follows:
$$\vbox{\halign{$#\hfil$&
\quad\smash{\lower.5\baselineskip\hbox{$\cdots$ complement $\bit#$\hfil}}\cr
1_0\quad 2_0\quad 4_0&4\cr
1_0\quad 2_0\quad \p4_1&2\cr
1_0\quad \p2_1\quad 3_0\quad 4_1&4\cr
1_0\quad \p2_1\quad 3_0\quad \p4_0&3\cr
1_0\quad \p2_1\quad \p3_1\quad 4_0&4\cr
1_0\quad \p2_1\quad \p3_1\quad \p4_1&1\cr
\p1_1\quad 3_1&3\cr
\p1_1\quad \p3_0\cr}}$$
and at this point all of the labelings have been generated.
Restarting the process causes
them to be regenerated in reverse order, stopping again when the original
starting point is reached:
$$\vbox{\halign{$#\hfil$&
\quad\smash{\lower.5\baselineskip\hbox{$\cdots$ complement $\bit#$\hfil}}\cr
1_1\quad 3_0&3\cr
1_1\quad \p3_1&1\cr
\p1_0\quad 2_1\quad 3_1\quad 4_1&4\cr
\p1_0\quad 2_1\quad 3_1\quad \p4_0&3\cr
\p1_0\quad 2_1\quad \p3_0\quad 4_0&4\cr
\p1_0\quad 2_1\quad \p3_0\quad \p4_1&2\cr
\p1_0\quad \p2_0\quad 4_1&4\cr
\p1_0\quad \p2_0\quad \p4_0\cr}}$$
@ Let's look more closely at what happens when $\bit k$ changes its state.
Suppose the rightmost active node is~$k_0$. Then $k_0$ is immediately followed
in the fringe by its descendants; those descendants were all passive, but
we can reactivate them in anticipation of step~(4). The current descendants
of $k_0$ are the elements of $A_k$ and {\it their\/} descendants, and each
element $a\in A_k$ has $\bit a=\alpha'_{ka}$.
We will say that the {\it entourage\/} of $k_0$ is the contents of the
fringe at the beginning of the process that would generate the Gray path
for the subgraph rooted at~$k$; and the entourage of $k_1$ is the
contents of the fringe at the end of that process. Thus, the small
example we've just seen shows us that the entourage of $1_0$ is
$1_0\,2_0\,4_0$, and the entourage of $1_1$ is $1_1\,3_0$.
In general, the entourage of $k_0$ consists of $k_0$ itself followed by
the consecutive entourages of $a_t$ for each $a\in A_k$, with $t=\alpha_{ka}$,
in order of increasing~$a$. We find therefore that the entourage of $0_0$
in our main example is
$$0_0\quad 1_0\quad 2_0\quad 4_0\quad 5_1\quad 6_1\quad 8_0\,;$$
this is the initial contents of the fringe. The corresponding entourage of
$0_1$---the final fringe---is
$$0_1\quad 3_1\quad 6_1\quad 7_0\quad 8_0\,.$$
The final element of an entourage always has the form $j_s$ where
either $s=0$ and $A_j=\emptyset$ or $s=1$ and $B_j=\emptyset$.
Here is a short program that locates the final~$j$ value at the
end of the entourage for $k_t$:
@<Initialize the data structures@>=
for (k=n;k>=0;k--) for (j=0;j<=1;j++) {
i=list[k][j];
if (i) fj[k][j]=fj[i->id][i->alfprime^i->del];
else fj[k][j]=k;
}
@ @<Global variables@>=
char fj[maxn][2]; /* final vertices in the entourages of $k_0$, $k_1$ */
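@ In our main example this computation sets, for instance, |fj[1][0]=4|
and |fj[1][1]=3|, because the entourages of $1_0$ and $1_1$ are
$1_0\,2_0\,4_0$ and $1_1\,3_0$ as we saw a moment ago; similarly
|fj[0][0]| and |fj[0][1]| both turn out to be~8.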
@ At the point where $\bit k$ is about to change from 0 to 1, the fringe
following $k_0$ contains $k$'s {\it transition string\/} $\tau_{k0}$, which
consists again of the consecutive entourages $a_t$ of all $a\in A_k$, but
this time with $t=\alpha'_{ka}$ instead of $t=\alpha\losub{ka}$.
The fringe is then modified from `$\ldots\,k_0\,\tau_{k0}\,\ldots$' to
`$\ldots\,k_1\,\tau_{k1}\,\ldots$',
where the transition string $\tau_{k1}$ consists of the consecutive
entourages $b_t$ of all $b\in B_k$, with $t=\beta'_{kb}$.
Thus the two transition strings for vertex 0 in our main example are
$\tau_{00}=1_1\,3_0\,5_1\,6_1\,8_0$ and $\tau_{01}=3_0\,6_1\,7_0\,8_0$.
At the moment $\bit0$ changes from 0 to~1, the fringe changes from
$0_0\,\p1_1\,\p3_0\,\p5_1\,\p6_1\,\p8_0$ to $\p0_1\,3_0\,6_1\,7_0\,8_0$;
and if we run the sequence backwards, it will go from
$0_1\,\p3_0\,\p6_1\,\p7_0\,\p8_0$ to $\p0_0\,1_1\,3_0\,5_1\,6_1\,8_0$
when $\bit0$ becomes~0.
@ Now comes a key observation: Recall that some elements of $A_k$ and $B_k$
are called ``principal,'' namely the vertices adjacent to~$k$ (the
``children'' of $k$ in tree terminology). {\sl The nonprincipal vertices of
$A_k$ and $B_k$ appear in both transition strings $\tau_{k0}$ and $\tau_{k1}$,
with their entourages in the same states.} Indeed, the entourage
of each principal vertex $a$ of $A_k$ is $a_1$ followed by the
entourages of all $b\in B_a$; and all such $b$ are nonprincipal,
because $\alpha'_{ka}=1$ and $B_a\subseteq B_k$ when $a$ is principal.
Similarly, the entourage
of each principal vertex $b$ of $B_k$ is $b_0$ followed by the
entourages of all $a\in A_b$; and all such $a$ are nonprincipal,
because $\beta'_{kb}=0$ and $A_b\subseteq A_k$ when $b$ is principal.
Therefore, if we maintain the fringe as a doubly linked list---which we
definitely want to do---the task of replacing one of $k$'s transition strings
by the other is fairly easy to describe at link level: When $\bit k$ changes
from 0 to~1, we remove the existing substring $\tau_{k0}$ from the fringe
and replace it by the entourages of each $b\in B_k$, in increasing order
of the $b$'s. If $b$ is principal, the entourage of $b$ consists of
$b_0$ followed by the entourages of all $a\in A_b$, and the latter vertices
are nonprincipal; therefore the entourages of every such $a$ are already
properly linked within themselves, because they appear as substrings of
$\tau_{k0}$. Moreover, our assumption that the tree vertices were
labeled in preorder guarantees that the entourages of all $a\in A_b$
appear consecutively in\/ $\tau_{k0}$; thus they are properly linked
to each other, and all we need to do when forming the entourage of a
principal vertex $b\in B_k$ is link it to the leftmost element of $A_b$.
If on the other hand $b$ is nonprincipal, none of its internal links
need to be changed at all, because its entourage was already present
as a substring of~$\tau_{k0}$.
All these entourages appear in preorder, in both $\tau_{k0}$ and $\tau_{k1}$.
Therefore, to change $\tau_{k0}$ to $\tau_{k1}$ we simply need to
remove all of the principal members of $A_k$ and insert all of the
principal members of $B_k$, in their appropriate preorder position.
For example, we saw a moment ago that the task of
going from $\tau_{00}$ to $\tau_{01}$ consists of
removing $1_1$ and $5_1$, then inserting $7_0$.
The total number of link-pairs we need to change is at most
twice the number of elements of $A_k$ plus twice the number of
elements of $B_k$.
@ The preorder assumption makes still further simplification possible.
Indeed, we know that the special case in {\mc KODA-RUSKEY} requires
at most two splices per transition, so we can expect to discover
a generalization of the principle that led to such an improvement.
Suppose all of the elements of $A_k\cup B_k$ have been merged into
preorder, thus giving us the ``union'' of $\tau_{k0}$ and $\tau_{k1}$.
Suppose further that two vertices $b\,b'$ of $B_k$ are adjacent in
this union, where $b$ is principal. It follows that $b'$ is also
principal; otherwise some element of $A_k$ would intervene.
When $\tau_{k0}$ is being replaced by $\tau_{k1}$ in the fringe,
we therefore need to insert both $b$ and~$b'$. Fortunately, they were
properly linked to each other when they last left the fringe,
and neither one has entered the fringe since then; so they still are
properly linked. (We also need to be sure that they were linked properly
to each other during the initialization, before processing starts.)
The same argument applies when two vertices $a\,a'$ of $A_k$ are
adjacent and $a$ is principal.
One can often, but not always,
avoid some of the link-updating with respect to $\tau_{k0}$ and
$\tau_{k1}$ by cleverly reordering the subtrees of~$k$.
@ An algorithm that changes transition strings as just described will run
in constant amortized time per output, because the cost of changing
links in the fringe can be charged to the processing time when the
corresponding fringe element next becomes passive.
But it looks like bad news for our goal of loopless computation,
because we might need to do $\Omega(n)$ operations when $\bit k$ changes.
No, all is not lost: We can leave ``stale'' links in the fringe,
fixing them one at a time, just before the algorithm needs to look at them!
Namely, suppose the transition string $\tau_{k1}$ consists of substrings
$\sigma_1\sigma_2\ldots\sigma_s$ that are properly linked within themselves
but not necessarily to each other. We can prepare a list of ``fixup
instructions'' $(x_1,\ldots,x_s)$, where $x_j$ says that the first element
of $\sigma_j$ should be joined to the last element of $\sigma_{j-1}$, or
to element $k_1$ if $j=1$. And we can plant a flag that will cause $x_j$ to
be performed just as the first element of $\sigma_j$ becomes passive.
The execution of $x_j$ will then move the flag in preparation for $x_{j-1}$,
or it will remove the flag if $j=1$.
@* Data structures. Now that we know basically what needs to be done,
we are ready to design the records that will be linked together
to form the fringe. These records, which we will call fnodes, naturally
contain |left| and |right| fields, because the fringe is doubly
linked (modulo stale pointers). An fnode also contains a |fixup| field,
which (if non-|NULL|) points to information that will correct a
stale |left| pointer. And of course there's also a |bit| field,
giving the status of $\bit k$ in fnode number~$k$.
Each fnode also has a |focus| pointer, which implicitly classifies
fringe elements as active or passive, just as it did in the
{\mc KODA-RUSKEY} program: Usually |p->focus=p|, except when
|p| is a passive node to the immediate left of an active node.
In the latter case, |p->focus| points to the first active node
to the left of~|p|.
Besides these dynamically changing fields, each fnode also contains
two static fields that are precomputed during the initialization,
namely |tau[0]| and |tau[1]|. These fields refer to sequential
lists that specify the transition strings $\tau_{k0}$ and $\tau_{k1}$
for fnode number~$k$.
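Readers who haven't seen focus pointers before may find it helpful to
recall their simplest incarnation, the classical loopless algorithm for
the ordinary reflected Gray code on $n$ independent bits. Here is a
sketch of that special case in array form, where |nmax| is an arbitrary
compile-time bound and |visit| stands for whatever we want to do with
each bit string; it is an illustration only, not part of the present
program.
$$\vbox{\halign{#\hfil\cr
|void simple_gray(int n)|\cr
|{|\cr
\quad|int a[nmax],f[nmax+1];|\cr
\quad|register int j;|\cr
\quad|for (j=0;j<n;j++) a[j]=0,f[j]=j;|\cr
\quad|f[n]=n;|\cr
\quad|while (1) {|\cr
\quad\quad|visit(a);|\cr
\quad\quad|j=f[0], f[0]=0;|\cr
\quad\quad|if (j==n) break;|\cr
\quad\quad|f[j]=f[j+1], f[j+1]=j+1;|\cr
\quad\quad|a[j]^=1;|\cr
\quad|}|\cr
|}|\cr}}$$
Every one of the $2^n$ bit strings is visited exactly once, and only a
bounded number of operations separate one visit from the next.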
@<Type definitions@>=
typedef struct fnode_struct {
char bit; /* either 0 or 1; always 0 when an A-child's bit is~0,
always 1 when a B-child's bit is~1 */
struct fnode_struct *left,*right; /* neighbors in the fringe */
struct fnode_struct **fixup; /* remedy for a stale |left| link */
struct fnode_struct *focus; /* red-tape cutter for efficiency */
struct fnode_struct **tau[2]; /* list of transition string link points */
} fnode;
@ The fringe is extended to contain a special fnode called |head|,
which makes the list circular. In other words, |head->right| and |head->left|
are the leftmost and rightmost elements of the fringe. Also,
|head->left->focus| is the rightmost active element.
@d head (vertex+1+n)
@<Global variables@>=
fnode vertex[maxn+1]; /* the collection of all potential fringe nodes */
@ And how should we store the |tau| information? Let's ``compile'' the
set of necessary link changes into a sequential list $(r_1,l_1,
\ldots, r_t, l_t,\,$|NULL|) with the meaning that
we want to set |@[@t$l_j$@>@]->right=@[@t$r_j$@>@]|
and |@[@t$r_j$@>@]->left=@[@t$l_j$@>@]|.
The case $j=1$ is, however, an exception, because it refers to a
potentially unknown part of the fringe (lying to the right of
$\tau_{k0}$ and $\tau_{k1}$); in that case we want to set
|@[@t$l_j$@>@]->right=@[@t$r_j$@>@]->right|
and |@[@t$r_j$@>@]->right->left=@[@t$l_j$@>@]|.
@<Initialize the data structures@>=
tau_ptr=tau_table;
for (k=0;k<=n;k++) {
@<Merge the lists $A_k$ and $B_k$ into the |ab| array@>;
@<Compile the |tau| lists for vertex |k|@>;
}
@ @<Global variables@>=
fnode *tau_table[((maxn*maxn)>>1)+maxn+maxn]; /* elements of |tau| fields */
fnode **tau_ptr; /* the first unused place in |tau_table| */
char ct[maxn]; /* |ct[l]=0| if $\\{ab}[l]\in A_k$, otherwise |ct[l]=1| */
char verbose=1; /* should details of tau compilation be printed? */
info *ab[maxn]; /* an element of $A_k$ or $B_k$ */
@ It is most convenient to sort in {\it decreasing\/} order here,
so that the rightmost link-updates are listed first, because we have
stored them in that order and we will need them in that order.
This part of the program is the most subtle, so I've included a
provision for verbose printing during debugging sessions.
A simple case in which $A_k=\{5\}$ and $B_k=\emptyset$ will be
printed as \.{(a5)}. A slightly more complicated case in which
$A_k=\{2\}$ and $B_k=\{4\}$ would be \.{(a2)(b4)} if both are
principal, or \.{(a2b4)} if only 2~is principal. A more complicated
case, used as an example below, has $A_k=\{1,2,5,8,11\}$ and
$B_k=\{3,4,6,7,9,10\}$, with elements $\{1,2,5,7,9,10\}$ principal;
this would be printed as \.{(a1)(a2b3b4)(a5b6)(b7a8)(b9)(b10a11)}.
Recall that an element |j| of $A_k\cup B_k$ is principal if and only if
it is a child of~$k$. The parentheses in the verbose printout
are used to group $k$'s children with their nonprincipal descendants.
@d principal(l) (jj[ab[l]->id]==k)
@<Merge the lists $A_k$ and $B_k$ into the |ab| array@>=
for (i=list[k][0],ii=list[k][1],l=0; i||ii; l++) {
if (!ii || (i && i->id>ii->id)) ab[l]=i, ct[l]=0, i=i->sib;
else ab[l]=ii, ct[l]=1, ii=ii->sib;
}
if (l && verbose) {
printf("Union%d: (",k);
for (j=l-1;;) {
printf("%c%d",'a'+ct[j],ab[j]->id);
if (j==0) break;
j--;
if (principal(j)) printf(")(");
}
printf(")\n");
}
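@ With the main example, for instance, the verbose output of this step
includes the line `\.{Union0: (a1b3)(a5b6)(b7a8)}', if I have traced the
code correctly: the children 1, 5, and~7 of the root each head a group
that also contains their nonprincipal descendants 3, 6, and~8,
respectively.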
@ @<Compile the |tau| lists for vertex |k|@>=
if (!l) vertex[k].tau[0]=vertex[k].tau[1]=NULL;
else {
ab[l]=NULL; /* sentinel at end of list */
vertex[k].tau[1]=tau_ptr;
@<Compile the instructions for |tau[1]|@>;
vertex[k].tau[0]=tau_ptr;
@<Compile the instructions for |tau[0]|@>;
@<Link the children of |k| together@>;
}
@ Our job here is to specify the links that must change when the
principal elements of~$A_k$ are deleted and the principal elements of~$B_k$
are inserted. In the simple example \.{(a5)} mentioned above,
we simply set $r_1=\;$|&vertex[5]| and $l_1=\;$|vertex[k]|. The
verbose printout will say `\.{k:5+}', meaning that |vertex[k]| is to
be followed in the fringe by the vertex that currently follows |vertex[5]|.
In the next simplest example, \.{(a2)(b4)}, we will set
$r_1=\;$|&vertex[2]|,
$l_1=\;$|&vertex[4]|,
$r_2=\;$|&vertex[4]|,
$l_2=\;$|&vertex[k]|;
the verbose printout will symbolize this by `\.{4:2+ k:4}'.
The result of \.{(a2b4)} would, on the other hand, be `\.{k:2+}',
because only |vertex[2]| needs to be removed from the fringe when
|vertex[4]| is nonprincipal.
Finally, the instructions compiled from the complex example
\.{(a1)(a2b3b4)(a5b6)(b7a8)(b9)(b10a11)} would be
$$\.{10:8r+}\quad\.{8r:9}\quad\.{7:8}\quad\.{6r:7}\quad
\.{4r:6}\quad\.{k:3}.$$
The `\.r' here denotes the rightmost element of a nonprincipal
vertex's entourage.
(As explained earlier, the instruction `\.{9:10}' need
not be compiled.)
The reader might be able to understand from these examples why the
author took time out to think before writing this part of the code.
@<Compile the instructions for |tau[1]|@>=
@<Compute $r_1$ for |tau[1]|@>;
@<Compute $l_1$ for |tau[1]|@>;
while (ab[l]) @<Compute $r_j$ and $l_j$ for |tau[1]|@>;
tau_ptr++; /* the |tau| list terminates with |NULL| */
@ @<Compute $r_1$ for |tau[1]|@>=
for (l=0;!principal(l);l++) ; /* nonprincipals at the end can be ignored */
if (ct[l]) { /* principal $b$ */
for (j=l+1; ab[j] && ct[j]; j++) ;
if (ab[j]) {
if (verbose) sprintf(rbuf,"%dr+",ab[j]->id);
*tau_ptr++=vertex+fj[ab[j]->id][ab[j]->alfprime];
}@+else {
if (verbose) sprintf(rbuf,"%d+",k);
*tau_ptr++=vertex+k;
}
}@+else { /* principal $a$ */
if (verbose) sprintf(rbuf,"%d+",ab[l]->id);
*tau_ptr++=vertex+ab[l]->id;
}
@ @<Global variables@>=
char rbuf[8]; /* symbolic form of $r_1$ for verbose printing */
@ Once we've found $r_1$, principal $a$'s at the end can be ignored.
@<Compute $l_1$ for |tau[1]|@>=
while (ab[l] && !ct[l] && principal(l)) l++;
if (!ab[l]) {
if (verbose) printf(" %d:%s\n",k,rbuf);
*tau_ptr++=vertex+k;
}@+else if (principal(l)) {
if (verbose) printf(" %d:%s",ab[l]->id,rbuf);
*tau_ptr++=vertex+ab[l]->id;
}@+else {
if (verbose) printf(" %dr:%s",ab[l]->id,rbuf);
*tau_ptr++=vertex+fj[ab[l]->id][ab[l]->alfprime];
}
@ At this point |ab[l]| is the |info| node from which we produced $l_{j-1}$.
If it refers to a principal element, it's an element $b$ that is absent from
$\tau_{k0}$ but present in $\tau_{k1}$. Otherwise it's a nonprincipal
element, which appears in both $\tau_{k0}$ and $\tau_{k1}$.
@<Compute $r_j$ and $l_j$ for |tau[1]|@>=
{
if (principal(l))
for (l++; ab[l] && ct[l] && principal(l); l++);
else@+ for (l++; !principal(l); l++);
*tau_ptr++=vertex+ab[l-1]->id;
if (ab[l] && !ct[l] && principal(l))
for (l++; ab[l] && !ct[l] && principal(l); l++);
if (!ab[l]) {
if (verbose) printf(" %d:%d\n",k,*(tau_ptr-1)-vertex);
*tau_ptr++=vertex+k;
}@+else if (principal(l)) {
if (verbose) printf(" %d:%d",ab[l]->id,*(tau_ptr-1)-vertex);
*tau_ptr++=vertex+ab[l]->id;
}@+else {
if (verbose) printf(" %dr:%d",ab[l]->id,*(tau_ptr-1)-vertex);
*tau_ptr++=vertex+fj[ab[l]->id][ab[l]->alfprime];
}
}
@ The next few sections are identical to the previous ones, with
$a$'s and $b$'s swapped. (Unless I have erred.) The compiled instructions
for |tau[0]| in the complex example are
$$\.{8r:10+}\quad \.{6r:8}\quad \.{5:6}\quad \.{4r:5}\quad \.{2:3}\quad \.{k:1}
\,.$$
@<Compile the instructions for |tau[0]|@>=
@<Compute $r_1$ for |tau[0]|@>;
@<Compute $l_1$ for |tau[0]|@>;
while (ab[l]) @<Compute $r_j$ and $l_j$ for |tau[0]|@>;
tau_ptr++; /* the |tau| list terminates with |NULL| */
@ @<Compute $r_1$ for |tau[0]|@>=
for (l=0;!principal(l);l++) ; /* nonprincipals at the end can be ignored */
if (!ct[l]) { /* principal $a$ */
for (j=l+1; ab[j] && !ct[j]; j++) ;
if (ab[j]) {
if (verbose) sprintf(rbuf,"%dr+",ab[j]->id);
*tau_ptr++=vertex+fj[ab[j]->id][ab[j]->alfprime];
}@+else {
if (verbose) sprintf(rbuf,"%d+",k);
*tau_ptr++=vertex+k;
}
}@+else { /* principal $b$ */
if (verbose) sprintf(rbuf,"%d+",ab[l]->id);
*tau_ptr++=vertex+ab[l]->id;
}
@ @<Compute $l_1$ for |tau[0]|@>=
while (ab[l] && ct[l] && principal(l)) l++;
if (!ab[l]) {
if (verbose) printf(" %d:%s\n",k,rbuf);
*tau_ptr++=vertex+k;
}@+else if (principal(l)) {
if (verbose) printf(" %d:%s",ab[l]->id,rbuf);
*tau_ptr++=vertex+ab[l]->id;
}@+else {
if (verbose) printf(" %dr:%s",ab[l]->id,rbuf);
*tau_ptr++=vertex+fj[ab[l]->id][ab[l]->alfprime];
}
@ At this point |ab[l]| is the |info| node from which we produced $l_{j-1}$.
If it refers to a principal element, it's an element $a$ that is absent from
$\tau_{k1}$ but present in $\tau_{k0}$. Otherwise it's a nonprincipal
element, which appears in both $\tau_{k0}$ and $\tau_{k1}$.
@<Compute $r_j$ and $l_j$ for |tau[0]|@>=
{
if (principal(l))
for (l++; ab[l] && !ct[l] && principal(l); l++);
else@+ for (l++; !principal(l); l++);
*tau_ptr++=vertex+ab[l-1]->id;
if (ab[l] && ct[l] && principal(l))
for (l++; ab[l] && ct[l] && principal(l); l++);
if (!ab[l]) {
if (verbose) printf(" %d:%d\n",k,*(tau_ptr-1)-vertex);
*tau_ptr++=vertex+k;
}@+else if (principal(l)) {
if (verbose) printf(" %d:%d",ab[l]->id,*(tau_ptr-1)-vertex);
*tau_ptr++=vertex+ab[l]->id;
}@+else {
if (verbose) printf(" %dr:%d",ab[l]->id,*(tau_ptr-1)-vertex);
*tau_ptr++=vertex+fj[ab[l]->id][ab[l]->alfprime];
}
}
@ We need to link the siblings of a family together if they are adjacent and
of the same type, because of one of the optimizations we're doing.
So we might as well link {\it all\/} siblings together.
@<Link the children of |k| together@>=
for (l=0,p=NULL;ab[l];l++) if (principal(l)) {
q=vertex+ab[l]->id;
if (p) q->right=p, p->left=q;
p=q;
}
@ @<Local variables@>=
fnode *p,*q,*r;
@* Doing it. The time has come to construct the loopless implementation
in practice, as we have been doing so far in theory.
@<Generate the answers@>=
while (1) {
@<Print the current labeling@>;
@<Set |p| to the rightmost active node@>;
if (p!=head) {
if (p->fixup) @<Fix up |p->left|@>;
if (p->bit==0) @<Set |p->bit=1|@>@;
else @<Set |p->bit=0|@>@;
}@+else if (been_there_and_done_that) break;
else {
printf("... and now we generate in reverse:\n");
been_there_and_done_that=1;@+continue;
}
@<Make node |p| passive@>;
}
@ @<Global variables@>=
char been_there_and_done_that; /* have we completed a cycle already? */
@ The nice thing is that the algorithm is not only loopless; its
operations are also quite simple.
@<Set |p| to the rightmost active node@>=
q=head->left;
p=q->focus;
q->focus=q;
@ At this point we know that |p->right| (more precisely, the
node that would be |p->right| if links weren't stale) is active.
And we also know that |p->left| is not stale.
@<Make node |p| passive@>=
q=p->left;
p->focus=q->focus;
q->focus=q;
@ @<Fix up |p->left|@>=
{
q=*(p->fixup), r=*(p->fixup+1);
p->left=q, q->right=p;
if (r) r->fixup=p->fixup+2;
p->fixup=NULL;
}
@ @<Set |p->bit=1|@>=
{
p->bit=1;
if (p->tau[1]) {
q=(*(p->tau[1]))->right;
r=*(p->tau[1]+1);
q->left=r, r->right=q;
r=*(p->tau[1]+2);
if (r) r->fixup=p->tau[1]+3;
}
}
@ @<Set |p->bit=0|@>=
{
p->bit=0;
if (p->tau[0]) {
q=(*(p->tau[0]))->right;
r=*(p->tau[0]+1);
q->left=r, r->right=q;
r=*(p->tau[0]+2);
if (r) r->fixup=p->tau[0]+3;
}
}
@ @<Print the current labeling@>=
for (k=0;k<=n;k++) putchar('0'+vertex[k].bit);
if (verbose) @<Print the fringe@>;
putchar('\n');
@ Once again, I'm writing optional code that should help me gain confidence
(and pinpoint errors) while debugging. Such code also ought to help an observer
understand what the program thinks it is doing, or at least what I thought
it should be doing when I wrote it.
The fringe is printed right-to-left, because that's the way the implicit
parts of the data need to be interpreted (namely, the active/passive bits
and the stale-pointer corrections). I could, of course, take the trouble to
reverse the order and make the printout more intuitive; but hey, this is for
educated eyes only.
Passive vertices are shown in parentheses.
@<Print the fringe@>=
for (l=0,q=head->left,p=q->focus; q!=head; q=r) {
printf(" %s%d_%d%s", q!=p?"(":"", q-vertex, q->bit, q!=p?")":"");
@<Set |r| to the left neighbor of |q|, fixing a stale |q->left|@>;
if (q==p) p=r->focus;
}
@ Here we need to maintain a stack of currently active fixups.
@<Type definitions@>=
typedef struct {
fnode *vert; /* vertex with a stale |left| pointer */
fnode **ref; /* |tau| string with info about future vertices */
} fstack_node;
@ @<Set |r| to the left neighbor of |q|, fixing a stale |q->left|@>=
if (q->fixup) fstack[++l].vert=q, fstack[l].ref=q->fixup;
if (q==fstack[l].vert) {
putchar('!'); /* indicate that a stale link is being fixed */
r=*(fstack[l].ref);
if (*(fstack[l].ref+1))
fstack[l].vert=*(fstack[l].ref+1), fstack[l].ref+=2;
else l--;
}@+else r=q->left;
@ @<Global variables@>=
fstack_node fstack[maxn]; /* stack used in verbose printout */
@*Priming the pump. The program is complete except for one niggling
detail: We need to set up the initial contents of the fringe.
Everything will maintain itself, once the whole structure is up and running,
but how do we get chickens before we have eggs?
Here are two recursive subroutines that come to the rescue. The
first one sets |vertex[j].bit=1| in all vertices |j| such that $k\preceq j$.
The second one contributes the entourage of a given vertex in a given state to
the current fringe.
@<Subroutines@>=
void setbits(char k)
{
register info *i;
vertex[k].bit=1;
for (i=list[k][0]; i; i=i->sib)
if (jj[i->id]==k) setbits(i->id);
}
@ A global variable called |cur_vert| points to the left end of the
fringe-so-far. We construct the entourages from right to left, because
that's the way the |list| info is set up.
All bits are assumed to be zero initially; therefore we don't need a
routine to set them zero, only the |setbits| routine to make
some of them nonzero.
@<Subroutines@>=
void entourage(char k,char t)
{
register fnode *p=vertex+k;
register info *i;
if (t) setbits(k);
for (i=list[k][t]; i; i=i->sib)
entourage(i->id,i->alfprime^i->del);
cur_vert->left=p, p->right=cur_vert, cur_vert=p;
}
@ @<Global variables@>=
fnode *cur_vert; /* front of a doubly linked list under construction */
@ And now we're done!
@<Initialize the data structures@>=
for (k=0;k<=n+1;k++) vertex[k].focus=&vertex[k];
cur_vert=head;
entourage(0,0);
cur_vert->left=head, head->right=cur_vert;
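@ For the main example, these calls leave the fringe in the state
$0_0\,1_0\,2_0\,4_0\,5_1\,6_1\,8_0$ displayed earlier, with
|head->right==vertex| and |head->left==vertex+8|, and with every node
initially active.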
@* Comment about looplessness. In practice, the algorithm
would run a bit faster if the |fixup| links were dispensed with
and all splicing were done directly at transition time. Constant
{\it amortized\/} time---or, rather, total execution time in general---is the
main criterion in most applications; there usually is no
reason to care whether some generation steps are slow and others are
fast, as long as the entire job is done quickly.
However, the construction of loopless algorithms carries an academic cachet:
``Look at how clever I am, I can even do it looplessly!''
That's why this program has gone the
extra mile to assure constant time per generation step, even though the
added complications
may well have been a step backwards from a practical standpoint.
[Excuse me, I'm thinking only of sequential computation here; loopless
algorithms can have definite advantages with respect to parallel processing.]
On the other hand, many academic purists may reasonably claim that I have not
actually reached my stated goal.
When Gideon Ehrlich introduced the notion of loopless implementations
in {\sl JACM\/ \bf20} (1973), 500--513, he insisted that the
initialization time be $O(n)$; the program above can only
guarantee an initialization time that is $O\bigl(\sum_k(\Vert A_k\Vert+
\Vert B_k\Vert)\bigr)$, and that sum can be $\sim n^2\!/4$.
Clearly the initialization time
ought to be substantially smaller than the number of outputs,
or a loopless algorithm would be trivial.
But I think it's fair to allow initialization to take as long
as, say, $O\bigl(\min(m,n^2)\bigr)$ when there are $m$ outputs,
especially when $m$ is usually (but not always) exponential in~$n$.
@* Index.