\datethis \input epsf \let\possiblyflakyepsfbox=\epsfbox \def\epsfbox#1{\hbox{\possiblyflakyepsfbox{#1}}} \def\caret/{\smash{\lower4pt\hbox to0pt{\hss$\scriptscriptstyle\land$\hss}}} \def\qcaret/{`\thinspace\caret/\thinspace'} % quoted caret @s delete unknown @*Introduction. The purpose of this program is to enumerate fixed polyominoes of up to 55 cells (although I won't really be able to get that far until Moore's law carries on for a few more years). The method is essentially that of Iwan Jensen [{\tt arXiv:cond-mat/0007239}, to appear in {\sl Journal of @^Jensen, Iwan@> Statistical Physics}, spring 2001], who discovered that the important techniques of Andrew Conway [{\sl Journal of Physics\/ \bf A28} (1995), @^Conway, Andrew Richard@> 335--349] can be vastly improved in the special case of polyomino enumeration. The basic idea is quite simple: We will count the number of fixed polyominoes that span a rectangle that is $h$ cells high and $w$ cells wide, where $h$ and $w$ are as small as possible; then we will add the totals for all relevant $h$ and $w$. We can assume that $h\ge w$. For each $h$ and $w$ that we need, we enumerate the spanning polyominoes by considering one cell at a time of an $h\times w$ array, working from left to right and top to bottom, deciding whether or not that cell is occupied, and combining results for all boundary configurations that are equivalent as far as future decisions are concerned. For example, we might have a polyomino that starts out like this: $$\epsfbox{polyomino.2}$$ (This partial polyomino obviously has more than 54 cells already, but large examples will help clarify the concepts that are needed in the program below.) Most of the details of the upper part of this pattern have no effect on whether the yet-undetermined cells will form a suitable polyomino or not. All we really need to know in order to answer that question is whether the bottom cells of each column-so-far are occupied or unoccupied, and which of the occupied ones are already connected to each other. We also need to know whether the left column and right column are still blank. In this case the 26 columns have occupancy pattern \.{01001001001010110011010110} at the bottom, and the occupied cells belong to six connected components, namely $$\vbox{\halign{\.{#}\cr 01000000000000000000010000\cr 00001000001000110000000000\cr 00000001000000000000000000\cr 00000000000010000000000000\cr 00000000000000000011000000\cr 00000000000000000000000110\cr}}$$ Fortunately the fact that polyominoes lie in a plane forces the components to be nested within each other; we can't have ``crossings'' like \.{1000100} with \.{0010001}. Therefore we can encode the component and occupancy information conveniently as $$\.{0(00(00100-010-)00()0)0()0}$$ using a five-character alphabet: $$\vbox{\halign{\.#\enspace&#\hfil\cr 0&means the cell is unoccupied;\cr 1&means the cell is a single-cell component;\cr (&means the cell is the leftmost of a multi-cell component;\cr )&means the cell is the rightmost of a multi-cell component;\cr -&means the cell is in the midst of a multi-cell component.\cr}}$$ Furthermore we can treat the cases where the entire leftmost column is nonblank by considering that the left edge of the array belongs to the leftmost cell component; and the rightmost column can be treated similarly. 
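@ A reader who wants to experiment with this encoding can decode it mechanically. The following throwaway fragment is an independent sketch, not part of the present program (all of its names are ad hoc); it takes a configuration string over the five-character alphabet, prints the occupancy pattern, and lists the connected components by matching parentheses with a little stack. For simplicity it ignores the left-edge and right-edge conventions just mentioned, so it handles only strings whose parentheses are completely matched, like the 26-column example above.

#include <stdio.h>
#include <string.h>

int main(void)
{
  const char *cfg = "0(00(00100-010-)00()0)0()0"; /* the example configuration */
  int comp[64]; /* component number of each cell, or -1 if unoccupied */
  int open[64]; /* stack of currently open components */
  int w = strlen(cfg);
  int k, d = 0, comps = 0, m;
  for (k = 0; k < w; k++)
    switch (cfg[k]) {
    case '0': comp[k] = -1; break;
    case '1': comp[k] = comps++; break; /* a single-cell component */
    case '(': open[d++] = comps, comp[k] = comps++; break; /* open a new component */
    case '-': comp[k] = open[d - 1]; break; /* midst of the innermost open component */
    case ')': comp[k] = open[--d]; break; /* close the innermost open component */
    }
  printf("occupancy: ");
  for (k = 0; k < w; k++) printf("%c", comp[k] < 0 ? '0' : '1');
  printf("\n%d components:\n", comps);
  for (m = 0; m < comps; m++) {
    for (k = 0; k < w; k++) printf("%c", comp[k] == m ? '1' : '0');
    printf("\n");
  }
  return 0;
}

Its output reproduces the occupancy pattern and the six component rows displayed above.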
With these conventions, we encode the boundary condition at the lower fringe of the partially filled array above by the 26-character string $$\.{0(00(00100-010-\caret/)00()0)0(-0}\,,\eqno(*)$$ using \qcaret/ to show where the last partial row ends. \vskip1pt A string like $(*)$ represents a so-called {\it configuration}. If no \qcaret/ appears, the partial row is assumed to be a complete row. The number of rows above a given occupancy pattern is implicitly part of the configuration but not explicitly shown in the notation, because this program never has to deal simultaneously with configurations that have different numbers of rows above the current partial row. @ A bit of theory may help the reader internalize these rules: It turns out that the set of all connectivity/occupancy configuration codes at the end of a row has the interesting unambiguous context-free grammar $$\eqalign{ S&\to L\,\.0J_0R \mid Z\.-I\.-Z\mid Z\.-Z\cr Z&\to\epsilon\mid\.0Z\cr L&\to\epsilon\mid Z\,\.)\mid Z\.-I\.)\cr R&\to\epsilon\mid \.(Z\mid \.(I\.-Z\cr I_0&\to\epsilon\mid\.0J_0\cr J_0&\to I_0\mid A\,\.0J_0\cr I&\to\epsilon\mid\.-I\mid\.0J\cr J&\to I\mid A\,\.0J\cr A&\to 1\mid \.(I\.)\cr }$$ [Translation: $I$ is any sequence of \.0's and \.-'s and $A$'s with each $A$ preceded and followed by~\.0; $I_0$ is similar, but with no \.- at top level.] The number $s_n$ of strings of length $n$ in $S$ has the generating function $$\sum s_nz^n={1-4z^2-4z^3+z^4-(1+z)(1-z^2)\sqrt{\mkern1mu1+z} \sqrt{\mkern1mu1-3z}\over2z^3(1-z)}=2z+6z^2+16z^3+\cdots{};$$ hence $s_n$ is asymptotically proportional to $3^n\!/n^{3/2}$. @ Any polyomino with the configuration $(*)$ in the midst of row 10 must have occupied at least $28+18+1+1+2+4=54$ cells so far, and 16 more will be needed to connect it up and make it touch the left edge of the array. Moreover, the array has only 10 rows so far, but it has 26 columns, and we will only be interested in arrays for which $h\ge w$. Therefore at least 15 additional cells must be occupied if we are going to complete a polyomino in an $h\times26$ array. In other words, we would need to consider the boundary configuration $(*)$ only if we were enumerating polyominoes of at least $54+16+15=85$ cells. Such considerations greatly limit the number of configurations that can arise; Jensen's experiments show that the total number of cases we must deal with grows approximately as $c^n$ where $c$ is slightly larger than~1.4. This is exponentially large, but it is considerably smaller than the $3^{n/2}$ configurations that arise in Conway's original method; and it's vastly smaller than the total number of polyominoes, which turns out to be approximately $4.06^n$ times $0.3/n$. @ This program doesn't actually do the whole job of enumeration; it only outputs a set of instructions that tell how to do it. Another program reads those instructions and completes the computation. @c #include @@; @@; @@; main(int argc, char *argv[]) { @; @; @; @; @; @; exit(0); } @ @= void panic(char *mess) { fprintf(stderr,"%s!\n",mess); exit(-1); } @ The user specifies the maximum size of polyominoes to be counted, $n$, and the desired width, $w$, on the command line. All $h\times w$ rectangles spanned by polyominoes of $n$ cells or less will be counted, for $w\le h\le n+1-w$. (No solutions are possible for $h>n+1-w$.) The present version of this program restricts |w| to be at most 23, for simplicity. 
But the packing and unpacking routines below could be adapted via \.{CWEB} change files in order to deal with values of |w| as large as 27, when we're pushing the envelope. The command line should also specify the amount of memory allocated for configurations in this program and for individual counters in the output. Statistics will be printed for guidance in the choice of those parameters. The base name of the output file should be given as the final command-line argument. This name will actually be extended by \.{.0}, \.{.1}, \dots, as explained below, because there might be an enormous amount of output. @d wmax 23 /* for quinary/octal packing into two tetrabytes */ @d nmax (wmax+wmax+126) @d bad(k,v) sscanf(argv[k],"%d",&v)!=1 @= if (argc!=6 || bad(1,n) || bad(2,w) || bad(3,conf_size) || bad(4,slave_size)) { fprintf(stderr, "Usage: %s n w confsize slavesize outfilename\n",argv[0]); exit(-2); } if (w>wmax) panic("Sorry, that w is too big for this implementation"); if (w<2) panic("No, w must be at least 2"); if (nw+w+126) panic("Eh? That n is incredible"); base_name=argv[5]; @ @= int n; /* we will count polyominoes of $n$ or fewer cells */ int w; /* provided that they span a rectangle of width $w$ and height $\ge w$ */ int conf_size; /* the number of \&{config} structures in our memory */ int slave_size; /* the number of counter positions in the slave program memory */ @* Output. Let's get the basics of output out of the way first, so that we know where we're heading. The postprocessing program {\mc POLYSLAVE} will interpret instructions according to a compact binary format, with either one or four bytes per instruction. Several gigabytes might well be generated, and my Linux system is not real happy with files of length greater than $2^{31}-1=2147483647$. Therefore this program breaks the output up into a sequence of files called \.{foo.0}, \.{foo.1}, \dots, each at most one large gigabyte in size. (That's one GGbyte${}=2^{30}$~bytes.) Some unfortunate hardware failures led me to add a |checksum| feature. 
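@ Incidentally, the reason no solutions are possible when $h>n+1-w$ is that a polyomino spanning an $h\times w$ rectangle must contain at least $h+w-1$ cells: a spanning tree of its cell-adjacency graph must contain an edge joining each of the $h-1$ pairs of adjacent rows and an edge joining each of the $w-1$ pairs of adjacent columns, and those $h+w-2$ edges are distinct. Thus, for example, the runs with $n=47$ and $w=20$ reported at the end of this program need to consider only the heights $20\le h\le28$.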
@d filelength_threshold (1<<30) /* maximum file size in bytes */ @d buf_size (1<<16) /* buffer size, should be a divisor of |filelength_threshold| */ @= FILE* out_file; /* the output file */ union { unsigned char buf[buf_size+4]; /* place for binary output */ int foo; /* force |out.buf| to be aligned somewhat sensibly */ } out; unsigned char *buf_ptr; /* our current place in the buffer */ int bytes_out; /* the number of bytes so far in the current output file */ unsigned int checksum; /* a way to help identify bad I/O */ FILE *ck_file; /* the checksum file */ int file_extension; /* the number of GGbytes output */ char *base_name, filename[100]; @ @= void open_it() { sprintf(filename,"%.90s.%d",base_name,file_extension); out_file=fopen(filename,"wb"); if (!out_file) { fprintf(stderr,"I can't open file %s",filename); panic(" for output"); } bytes_out=checksum=0; } @ @= void close_it() { if (fwrite(&checksum,sizeof(unsigned int),1,ck_file)!=1) panic("I couldn't write the check sum"); if (fclose(out_file)!=0) panic("I couldn't close the output file"); printf("[%d bytes written on file %s, checksum %u.]\n", bytes_out,filename,checksum); } @ @= void write_it(int bytes) { register int k; register unsigned int s; if (bytes_out>=filelength_threshold) { if (bytes_out!=filelength_threshold) panic("Improper buffer size"); close_it(); file_extension++; open_it(); } if (fwrite(&out.buf,sizeof(unsigned char),bytes,out_file)!=bytes) panic("Bad write"); bytes_out+=bytes; for (k=s=0; k= typedef enum {@!sync,@!clear,@!copy,@!add, @!inc_src,@!dec_src,@!inc_trg,@!dec_trg} opcode; @ @d end_of_buffer &out.buf[buf_size] @= void put_inst(unsigned char o,unsigned char p) { register unsigned char *b=buf_ptr; *b++=0x80+(o<<4)+(p<16? p: 0); if (p>=16) *b++=p; if (b>=end_of_buffer) { write_it(buf_size); out.buf[0]=out.buf[buf_size]; b-=buf_size; } buf_ptr=b; } @ @= void put_four(register unsigned int x) { register unsigned char *b=buf_ptr; *b=x>>24; *(b+1)=(x>>16)&0xff; *(b+2)=(x>>8)&0xff; *(b+3)=x&0xff; b+=4; if (b>=end_of_buffer) { write_it(buf_size); out.buf[0]=out.buf[buf_size]; out.buf[1]=out.buf[buf_size+1]; out.buf[2]=out.buf[buf_size+2]; b-=buf_size; } buf_ptr=b; } @ The first six bytes of the instruction file are, however, special. Byte~0 is the number $n$ of cells in the largest polyominoes being enumerated. When a |sync| is interpreted, {\mc POLYSLAVE} outputs the current values of |count[j]| for $1\le j\le n$. Byte 1 is the number of the final row. If this number is $r$, {\mc POLYSLAVE} will terminate after interpreting the instruction |sync|~$r$. Bytes 2--5 specify the (big-endian) number of elements in the |count| array. Initially |s=t=0|, |count[0]=1|, and |count[j]| is assumed to be zero for $1\le j\le n$. @= sprintf(filename,"%.90s.ck",base_name); ck_file=fopen(filename,"wb"); if (!ck_file) panic("I can't open the checksum file"); open_it(); out.buf[0]=n; out.buf[1]=n+2-w; buf_ptr=&out.buf[2]; put_four(slave_size); @ Here's what we'll do when it's all over. @= if (buf_ptr!=&out.buf[0]) write_it(buf_ptr-&out.buf[0]); close_it(); @ Most of the output is generated by the |basic_inst| routine. 
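@ To get a feeling for the compact format before looking at the code, suppose the slave's current source address is 500 and the next instruction needs source address 503: the difference is positive, less than 256, and small enough to fit in four bits, so a single byte |0x80+(inc_src<<4)+3| (hexadecimal |0xc3|, since |inc_src| is opcode~4) will be emitted. A difference of, say, 200 needs two bytes, |0x80+(inc_src<<4)| followed by the byte 200; and a difference of 256 or more, or one below $-255$, falls back to a four-byte absolute address. The operation that does the real work (|clear|, |copy|, or |add|, together with its count of counter positions) then follows in the same one-or-two-byte form.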
@= void basic_inst(int op, int src_addr, int trg_addr, unsigned char count) { register int del; if (verbose>1) { if (op==clear) printf("{clear %d ->%d}\n",count,trg_addr); else printf("{%s %d %d->%d}\n",sym[op],count,src_addr,trg_addr); } del=src_addr-cur_src; if (del>0 && del<256) put_inst(inc_src,del); else if (del<0 && del>-256) put_inst(dec_src,-del); else if (del) put_four(src_addr); cur_src=src_addr; del=trg_addr-cur_trg; if (del>0 && del<256) put_inst(inc_trg,del); else if (del<0 && del>-256) put_inst(dec_trg,-del); else if (del) put_four(trg_addr+targ_bit); cur_trg=trg_addr; put_inst(op,count); } @ @= char *sym[4]={"sync","clear","copy","add"}; int cur_src, cur_trg; /* current source and target addresses in the slave */ int verbose=0; /* set nonzero when debugging */ @* Connectivity. The hardest task that confronts us is to figure out how to determine the cutoff threshold: Given a configuration like \.{0(00(00100-010-\caret/)00()0)0(-0}, what is the minimum number of additional cells that are needed to connect it up and to make it stretch out to at least a given number of further rows? We claimed above, without proof, that this particular configuration needs at least 16 more cells before it will be connected and touch the left boundary. Now we want to prove that claim, and solve the general problem as~well. Some cases of this problem are easy. For example, let's consider first the case when we are at the beginning or end of a complete row. Then it is clear that a configuration like \.{00-)0(0--)00(--0} needs at least $3+4$ more cells to become occupied. [Well, if this {\it isn't\/} clear, please stop now and think about it until it {\it is}. Remember that it stands for a pattern with three connected components of occupied cells; the left component is connected to the left edge, the right component is connected to the right edge, and the middle component is standing alone.] Suppose we have a pattern like $\.0^{g_0}\alpha_1\.0^{g_1}\alpha_2\ldots \.0^{g_{k-1}}\alpha_k\.0^{g_k}$, where each $\alpha_j$ is a separate component beginning with \.( and ending with \.). For example, a typical $\alpha$ might be \.{()} or \.{(-)} or \.{(-00--00-0-0)}, etc. Again the problem we face is easily solved: We need to occupy $g_0+1$ cells in order to connect $\alpha_1$ to the left edge, $g_j+2$ cells to connect $\alpha_j$ to $\alpha_{j+1}$ for $1\le ji$ and if the components $\alpha_{i+1}$, \dots, $\alpha_j$ are mutually disconnected, then we are allowed to deduct any one of the terms $(g_i+2)$, \dots, $(g_{j-1}+2)$, after which we can treat $\alpha_i\ldots\alpha_j$ as a {\it single\/} component with respect to further deductions. Finally after choosing a subset of terms to deduct, we get a bonus for each $\alpha_j$ of length~1 such that neither $g_{j-1}$ nor $g_j$ were left disconnected. (Length~1 means that the code for $\alpha_j$ is a single character, either \.1 or \.( or \.) or \.-.) @ A recursive strategy can be used to solve the minimum connectivity problem in linear time, but we must design it carefully because of the examples considered earlier. The key idea will be to associate four costs $c_{ij}$ with each subcomponent, where $0\le i,j\le1$. Cost $c_{ij}$ is the minimum number of future occupied cells needed to connect everything up within the component, with the further proviso that there is a cell below the leftmost cell if $i=1$, and a cell below the rightmost cell if $j=1$. 
The $2\times2$ matrix $(c_{ij})$ will then represent all we need to know about connecting this component at a higher level. \vskip1pt \def\mx#1#2#3#4{{#1\,#2\choose#3\,#4}} For example, the cost matrix for a single-character component is $\mx0111$, and the cost matrix for a multi-character component with no internal subcomponents is $\mx0112$. (The 0 in the upper left corner signifies that we don't need any cells to connect it up, since it's already connected. But we may have to occupy one or two cells as ``hooks'' at the left and/or the right if the environment wants them.) \vskip1pt \def\gmp#1#2{\mathbin{^#1_#2}} The theory that underlies the algorithm below is best understood in terms of {\it min-plus matrix multiplication\/} $C=AB$, where $c_{ij}= \min_k(a_{ik}+b_{kj})$. (I should probably use a special symbol to denote this multiplication, like $A\gmp\land+B$ instead of $AB$; see, for example, exercise 1.3.1$'$--32 in {\sl The Art of Computer Programming}, Fascicle~1. But that would clutter up a huge number of the formulas below. And I have no use for ordinary matrix multiplication in the present program. Therefore min-plus multiplication will be assumed to need no special marking in the following discussion.) \vskip1pt \def\mtx#1#2#3#4{\hbox{$\bigl({#1\atop#2}\ {#3\atop#4}\bigr)$}} Let $X_g$ be the matrix \mtx\infty\infty\infty g. Then if $\alpha_1$, \dots, $\alpha_k$ are distinct subcomponents with cost matrices $A_1$, \dots,~$A_k$, respectively, the cost of connecting up $\.0^{g_0}\alpha_1\.0^{g_1}\alpha_2\ldots\.0^{g_{k-1}}\alpha_k\.0^{g_k}$ and touching the left and right edges is the upper left corner entry of $$OX_{g_0}A_1X_{g_1}\ldots X_{g_{k-1}}A_kX_{g_k}O,\qquad \qquad \hbox{where $O=\mtx0000$},$$ because matrix $X_g$ essentially means ``occupy $g$ cells below and require the cells to the left and right of these cells to be occupied as well.'' Notice that this rule yields $(g_0+1)+(g_1+2)+\cdots+(g_{k-1}+2)+(g_k+1)$ in the special case when each matrix $A_j$ is $\mx0112$, and there's a bonus of~1 whenever we replace $A_j$ by $\mx0111$. Notice also that $X_gX_h=X_{g+h}$; thus we can essentially think of each \.0 in the configuration as a multiplication by~$X_1$. @ If $\alpha$ and $\beta$ are subcomponents that are already connected to each other, having cost matrices $A$ and~$B$ respectively, the cost matrix for $\alpha\,\.0^g\beta$ as a single component is $ANB$, where $N=\mtx0\infty\infty\infty$ is the matrix meaning ``don't occupy anything below the $\.0^g$ and don't insist that the cells to the left and right of that gap must be occupied.'' For example, this rule gives $\mx0111N\mx0111=\mx0112$ as it should. And in general, suppose $\alpha_0$, $\alpha_1$, \dots, $\alpha_k$ are subcomponents that are distinct except that $\alpha_0$ is already connected to $\alpha_k$. Then the cost matrix for $\alpha_0\.0^{g_1}\alpha_1\ldots\alpha_{k-1}\.0^{g_k}\alpha_k$ as a single subcomponent can be expressed in terms of the individual cost matrices $A_0$, $A_1$, \dots, $A_k$ by using the formula $$\min_{1\le j\le k}\bigl(A_0X_{g_1}\ldots X_{g_{j-1}}A_{j-1}NA_j X_{g_{j+1}}\ldots X_{g_k}A_k\bigr), \eqno(**)$$ because we are allowed to leave any one of the gaps $\.0^{g_j}$ disconnected. Rule $(**)$ is the basic principle that makes a recursive algorithm work. @ Our discussion so far has assumed for simplicity that we're at the end of a row. But we also need to consider the case that a gap $\.0^g$ might be $\.0^a\caret/\.0^b$ in the vicinity of the place where a partial row is being filled. 
Fortunately the same theory applies with only a slight variation: Instead of $X_g=X_{a+b}$, we use $X_{a+b+1}$ if $a>0$, or we use $Y_b=\mtx\infty\infty b\infty$ if $a=0$; either $a$ or $b$ or both might be zero. Matrix $Y_b$ means, ``Occupy the cells below $\.0^b$ and require the next cell on the right (but not necessarily the left) to be occupied as well.'' The extra~1 in $X_{a+b+1}$ is needed when $a>0$ because one of the newly occupied cells must pass through the current row~$r$ in order to reach row~$r+1$. @ After calculating the minimum cost of connection, we might also need to add the cost of extension to span a $w\times w$ square. Thus, if row~$r$ is partially filled and row $r-1$ is completely filled, and if $r= typedef enum { @!zero, @!one, @!rt, @!mid, @!lft, @!eol } code; /* \.0, \.1, \.), \.-, \.(, or edge delimiter in a configuration string */ @ @= char decode[5]={'0','1',')','-','('}; code reflect[5]={zero,one,lft,mid,rt}; @ Then in the cost matrices and in our answer describing the minimum connection cost of a given configuration, we will represent numbers $a+b\epsilon$ as short integers, knowing that $a$ and $b$ will never exceed 27. @d unity (1<<8) @d epsilon 1 @d uunity (unity+epsilon) @d int_part(x) ((x)>>8) @d eps_part(x) ((x)&0xff) @= typedef unsigned short cost; /* $a+b\epsilon$ represented as |(a<<8)+b| */ @# typedef struct { cost c[2][2]; } cost_matrix; @ We will need to distinguish between the cost matrices $X_g=\mtx\infty\infty\infty {g_{\mathstrut}}$ and $Y_g=\mtx\infty\infty g\infty$, as well as a third case that arises when a configuration is already connected to the left edge. @= typedef enum { @!ytyp, @!xtyp, @!otyp } gap_type; @ A few easy subroutines do the basic operations on cost matrices that we will need. @= cost_matrix a_n_b(cost_matrix a, cost_matrix b) /* computes $A N B$ */ { cost_matrix c; c.c[0][0]=a.c[0][0]+b.c[0][0]; c.c[0][1]=a.c[0][0]+b.c[0][1]; c.c[1][0]=a.c[1][0]+b.c[0][0]; c.c[1][1]=a.c[1][0]+b.c[0][1]; return c; } @ @= cost_matrix a_x_b(cost_matrix a, cost_matrix b, gap_type typ, cost g) /* computes $A X_{g\,}B$, $A Y_{g\,} B$, or $B$, depending on |typ| */ { cost_matrix c; if (typ==otyp) return b; c.c[0][0]=a.c[0][typ]+g+b.c[1][0]; c.c[0][1]=a.c[0][typ]+g+b.c[1][1]; c.c[1][0]=a.c[1][typ]+g+b.c[1][0]; c.c[1][1]=a.c[1][typ]+g+b.c[1][1]; return c; } @ @= void min_mat(cost_matrix *a, cost_matrix b) /* sets $A=\min(A,B)$ */ { if ((*a).c[0][0]>b.c[0][0]) (*a).c[0][0]=b.c[0][0]; if ((*a).c[0][1]>b.c[0][1]) (*a).c[0][1]=b.c[0][1]; if ((*a).c[1][0]>b.c[1][0]) (*a).c[1][0]=b.c[1][0]; if ((*a).c[1][1]>b.c[1][1]) (*a).c[1][1]=b.c[1][1]; } @ The algorithm we use is inherently recursive, but we implement it iteratively using a stack because it involves only simple algebraic operations. Each stack entry typically represents a string of \.0's and a subcomponent that has not been completely scanned as yet; the string of \.0's is represented by a cost matrix $X_g$ or $Y_g$, and the incomplete subcomponent is represented by a partial evaluation of formula $(**)$. At stack level~0, however, there is no incomplete subcomponent and only the |closed_cost| field is relevant. This field corresponds to the cost matrix of all components scanned so far. 
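@ Here is a tiny independent computation, not part of the program, that exercises the min-plus formulas above; it uses plain integers and an explicit ``infinite'' cost instead of the packed $a+b\epsilon$ representation, and it ignores the partial-row refinements. For the end-of-row configuration \.{00()0()00} the product $OX_2AX_1AX_2O$ should have $(2+1)+(1+2)+(2+1)=9$ in its upper left corner, and replacing the second component by a single cell, as in \.{00()0100}, should exhibit the promised bonus of~1.

#include <stdio.h>
#define INF 1000 /* effectively infinite, since all genuine costs here are tiny */

typedef struct { int c[2][2]; } mat;

mat mul(mat a, mat b) /* min-plus matrix multiplication */
{
  mat r;
  int i, j, k, v;
  for (i = 0; i < 2; i++)
    for (j = 0; j < 2; j++) {
      r.c[i][j] = INF + INF;
      for (k = 0; k < 2; k++) {
        v = a.c[i][k] + b.c[k][j];
        if (v < r.c[i][j]) r.c[i][j] = v;
      }
    }
  return r;
}

mat X(int g) /* the matrix X_g */
{
  mat r;
  r.c[0][0] = r.c[0][1] = r.c[1][0] = INF;
  r.c[1][1] = g;
  return r;
}

int main(void)
{
  mat O = {{{0, 0}, {0, 0}}};  /* the all-zero matrix */
  mat A = {{{0, 1}, {1, 2}}};  /* a multi-cell component such as () */
  mat A1 = {{{0, 1}, {1, 1}}}; /* a single-cell component */
  mat m;
  m = mul(mul(mul(mul(mul(mul(O, X(2)), A), X(1)), A), X(2)), O);
  printf("00()0()00 needs %d more cells\n", m.c[0][0]); /* prints 9 */
  m = mul(mul(mul(mul(mul(mul(O, X(2)), A), X(1)), A1), X(2)), O);
  printf("00()0100 needs %d more cells\n", m.c[0][0]); /* prints 8 */
  return 0;
}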
@= typedef struct { cost gap; /* zeros to cover in the previous gap */ gap_type gap_typ; /* type of previous gap (|xtyp| or |ytyp| or |otyp|) */ cost_matrix closed_cost; /* cost matrix with no gaps */ cost_matrix open_cost; /* cost matrix with optional gap */ } stack_entry; @ The given configuration string will appear in |c[1]| through |c[w]|; also |c[0]| and |c[w+1]| will be set to |eol|. @= code c[64]; /* codes of the current configuration string */ stack_entry stk[64]; /* partially evaluated costs */ @ This program operates in two phases that are almost identical: Before |row_end| has been sensed, the cost of a connection cell is |unity|, but afterwards it is |unity+epsilon|. In order to streamline the code I'm using a little trick explained in Example~7 of my paper ``Structured programming with |goto| statements'': I make two almost identical copies of the code, one for the actions to be taken before |k==row_end| and one for the actions to be taken subsequently. The first copy jumps to the second as soon as the condition |k==row_end| is sensed. This avoids all kinds of conditional coding and makes ``variables'' into ``constants,'' although it does have the somewhat disconcerting feature of jumping from one loop into the body of another. The program could be made faster if I would look at high speed for special cases like subcomponents of the form \.{(0--00-)}, since that subcomponent is equivalent to \.{()} with respect to the connectivity measure we are computing. But I~intentionally avoided tricky optimizations in order to keep this program simpler and easier to verify. On the other hand, I do reduce \.0-less forms like \.{(---)} to \.{()}, and \.{(-\caret/--)} to \.{(\caret/)}, etc. Such optimizations aren't strictly necessary but I found them irresistible, because they arise so frequently. Variable |row_end| in the following routine points to the character just following the \qcaret/. @= cost connectivity(register int row_end) { register int k; /* our place in the string */ register int s; /* the number of open items on the stack */ int g; /* the current gap size */ gap_type typ; /* its type */ int open; /* set nonzero if previous token was \.( or \.- */ @; scan_zeros0: @; scan_tokens0: @; scan_zeros1: @; scan_tokens1: @; @; } @ In practice |row_end| will not be zero. But I decided to make the algorithm general enough to work correctly also in that case, because only one more line of code was needed. 
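@ The trick is perhaps easiest to see in miniature. The following throwaway example, unrelated to polyominoes, sums an array with weight~1 per element until a given position |p| is reached and with weight~2 thereafter; the test |k==p| occurs only in the first loop, which jumps straight into the body of the second loop when the test succeeds, just as |scan_zeros0| and |scan_tokens0| below jump to |scan_zeros1| and |scan_tokens1| when |k==row_end|.

#include <stdio.h>

int main(void)
{
  int a[8] = {3, 1, 4, 1, 5, 9, 2, 6};
  int m = 8, p = 5, k, s = 0;
  for (k = 0; k < m; k++) { /* phase 0: weight 1, watching for position p */
    if (k == p) goto phase1;
    s += a[k];
  }
  goto done; /* position p was never reached */
  for (; k < m; k++) { /* phase 1: weight 2, no further tests against p */
  phase1: s += 2 * a[k];
  }
done:
  printf("%d\n", s); /* 3+1+4+1+5 + 2*(9+2+6) = 48 */
  return 0;
}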
@= s=open=0; stk[0].closed_cost=zero_cost; k=1; if (mid_or_rt(c[1])) { g=0, typ=xtyp; /* |xtyp| and |ytyp| are equivalent at the left edge */ if (row_end==1) goto scan_tokens1; else goto scan_tokens0; } @ @= if (k==row_end) { g=0, typ=ytyp; goto scan_zeros1x; } if (c[k]) panic("Syntax error, 0 expected"); typ=xtyp, g=unity, k++; while (c[k]==0) { if (k==row_end) { g+=unity+uunity, k++; /* correction for straddling rows */ goto scan_zeros1x; } g+=unity, k++; } if (k==row_end) { g+=unity; goto scan_tokens1; } @ @= cm=base_cost0; k++; switch (c[k-1]) { case lft: @+ if (!mid_or_rt(c[k])) goto scan_open0; @ if (c[k-1]!=rt) goto scan_open0; case one: @; open=0;@+ goto scan_zeros0; case mid:@+if (!(mid_or_rt(c[k]))) goto scan_mid0; @ if (c[k-1]!=rt) goto scan_mid0; case rt:@+ if (!s) { if (stk[0].closed_cost.c[1][1]) panic("Unmatched )"); stk[0].closed_cost=cm; /* already connected to left edge */ }@+else { @; @; } open=0;@+ goto scan_zeros0; case eol: goto scan_eol0; default: panic("Illegal code"); } scan_open0: @;@+goto check_eol0; scan_mid0: @; check_eol0: open=1; if (c[k]!=eol) goto scan_zeros0; if (k==row_end) goto scan_eol1; scan_eol0: panic("Row end missed"); @ @= gap_type typ; /* the current type of gap |g| */ cost_matrix cm; /* the current cost matrix */ cost_matrix acm; /* another cost matrix */ @# const cost_matrix zero_cost={0,0,0,0}; /* $0\,0\choose0\,0$ */ const cost_matrix base_cost0={0,unity,unity,unity}; /* $0\,1\choose1\,1$ */ const cost_matrix base_cost1={0,uunity,uunity,uunity}; /* $\bigl({0\atop1+\epsilon}\ {1+\epsilon\atop1+\epsilon}\bigr)$ */ @ @= { do { if (k==row_end) goto scan_tokens1a; k++; if (c[k-1]==rt) break; }@+while (mid_or_rt(c[k])); cm.c[1][1]=unity+unity; /* now |cm| is $0\,1\choose1\,2$ */ } @ @= { do { if (k==row_end) goto scan_tokens1b; k++; if (c[k-1]==rt) break; }@+while (mid_or_rt(c[k])); cm.c[1][1]=unity+unity; /* now |cm| is $0\,1\choose1\,2$ */ } @ @= if (s) @; stk[s].closed_cost=a_x_b(stk[s].closed_cost,cm,typ,g); @ @= { acm=a_n_b(stk[s].closed_cost,cm); stk[s].open_cost=a_x_b(stk[s].open_cost,cm,typ,g); min_mat(&stk[s].open_cost,acm); } @ @= s--; if (s) @; stk[s].closed_cost=a_x_b(stk[s].closed_cost,stk[s+1].open_cost, stk[s+1].gap_typ, stk[s+1].gap); @ @= { acm=a_n_b(stk[s].closed_cost,stk[s+1].open_cost); stk[s].open_cost=a_x_b(stk[s].open_cost,stk[s+1].open_cost, stk[s+1].gap_typ, stk[s+1].gap); min_mat(&stk[s].open_cost,acm); } @ @= stk[++s].gap_typ=typ; stk[s].gap=g; stk[s].closed_cost=stk[s].open_cost=cm; @ @= if (!s) { if (stk[0].closed_cost.c[1][1]) panic("Unmatched -"); s=1, stk[1].gap_typ=otyp, stk[1].closed_cost=stk[1].open_cost=cm; }@+ else { @; stk[s].closed_cost=stk[s].open_cost; } @ @= if (c[k]) panic("Syntax error, 0 expected"); typ=xtyp, g=uunity, k++; scan_zeros1x:@+ while (c[k]==0) { g+=uunity, k++; } @ @= cm=base_cost1; k++; switch (c[k-1]) { case lft: @+ if (!mid_or_rt(c[k])) goto scan_open1; @ if (c[k-1]!=rt) goto scan_open1; case one: @; open=0;@+ goto scan_zeros1; case mid:@+if (!(mid_or_rt(c[k]))) goto scan_mid1; @ if (c[k-1]!=rt) goto scan_mid1; case rt:@+ if (!s) { if (stk[0].closed_cost.c[1][1]) panic("Unmatched )"); stk[0].closed_cost=cm; /* already connected to left edge */ }@+ else { @; @; } open=0;@+ goto scan_zeros1; case eol: goto scan_eol1; default: panic("Illegal code"); } scan_open1: @;@+goto check_eol1; scan_mid1: @; check_eol1: open=1; if (c[k]!=eol) goto scan_zeros1; scan_eol1: @; /* fall through to return the answer */ @ @= { scan_tokens1a:@+ do { k++; if (c[k-1]==rt) break; }@+while (mid_or_rt(c[k])); 
cm.c[0][1]=uunity; cm.c[1][1]+=uunity; /* now |cm| is $\bigl({0\atop1}\ {1+\epsilon\atop2+\epsilon}\bigr)$ or $\bigl({0\atop1+\epsilon}\ {1+\epsilon\atop2+2\epsilon}\bigr)$ */ } @ @= { scan_tokens1b:@+ do { k++; if (c[k-1]==rt) break; }@+while (mid_or_rt(c[k])); cm.c[0][1]=uunity; cm.c[1][1]+=uunity; /* now |cm| is $\bigl({0\atop1}\ {1+\epsilon\atop2+\epsilon}\bigr)$ or $\bigl({0\atop1+\epsilon}\ {1+\epsilon\atop2+2\epsilon}\bigr)$ */ } @ The cost of reaching the right edge is |g| if |g| has the form $b+b\epsilon$, but it is $a+b$ if |g| has the exceptional form $a+1+b+b\epsilon$ that arises when filling an unfilled row. @= if (open) { @; if (s) panic("Missing )"); return stk[0].closed_cost.c[0][0]; }@+ else { /* we need to reach the right edge */ if (s) panic("Missing )"); if (int_part(g)==eps_part(g)) return stk[0].closed_cost.c[0][typ]+g; return stk[0].closed_cost.c[0][1]+((int_part(g)-1)<<8); } @*Data structures. If we want to count $n$-ominoes efficiently for large~$n$, our most precious resource turns out to be the random-access memory that is available. I think at least 20 bytes of memory are needed per active configuration, and that limits us to about 50 million active configurations per gigabyte of memory. Under such circumstances I don't mind recomputing results several times in order to save space. For example, many configurations will occur on several different rows, and this program recomputes their connectivity cost each time they appear. The main loop of our computation will consist of taking a viable configuration $\alpha$ and looking at its two successors $\alpha_0$ and $\alpha_1$. The value of |row_end| in $\alpha_0$ and $\alpha_1$ (the position of \qcaret/) will be one greater than its value in~$\alpha$; $\alpha_0$ will leave the new cell empty, but $\alpha_1$ will occupy it. We gain time when $\alpha_0$ and/or $\alpha_1$ have been seen before; a hash table helps us determine whether or not they are d\'ej\`a vu. Newly seen configurations are subjected to the |connectivity| bound, and accepted into the computation only if they are found to be viable. Each configuration $\alpha$ has an associated {\it generating function\/} $g(\alpha)$. For example, the generating function $5z^8+2z^9$ would mean that there are 5 ways to reach $\alpha$ with 8 cells occupied and 2 ways to reach it with 9 cells occupied (so far). We will add $g(\alpha)$ to $g(\alpha_0)$ if $\alpha_0$ is viable, and $zg(\alpha)$ to $g(\alpha_1)$ if $\alpha_1$ is viable. After that we are able to forget $\alpha$ and $g(\alpha)$, reclaiming precious memory space. Indeed, we don't actually have enough space to deal with the generating functions $g(\alpha)$. Therefore this program compiles and outputs a sequence of instructions that will be interpreted by another program, {\mc POLYSLAVE}; that program will subsequently do the actual additions, without needing a hash table or other space-hungry data. The algorithm proceeds row by row, starting in row $r=1$, and continues until no viable configurations remain. In each row it makes $w$ complete passes over the existing configurations, with |row_end| running from 0 to $w-1$. Every such pass processes and discards all configurations that were generated on the previous pass, and we are free to process them in any convenient order. 
Therefore we adopt a ``rolling'' strategy that uses memory with near-maximum efficiency: Suppose one pass has produced $M$ configurations in the first $M$ slots $\alpha^{(1)}$, \dots, $\alpha^{(M)}$ of a memory pool consisting of $N=|conf_size|$ total slots. We start by finding the successors $\alpha^{(M)}_0$ and $\alpha^{(M)}_1$ of $\alpha^{(M)}$, putting them into slots $N$ and $N-1$. Then slot~$M$ is free and we turn to $\alpha^{(M-1)}$, etc., thereby running out of memory only if $N$ configurations are fully in use. On the following pass we reverse direction, filling slots from 1~upwards instead of from $N$ downwards. The same strategy makes it easy to allocate the memory space needed for generating functions in the {\mc POLYSLAVE} program. Indeed, this rolling allocation scheme fills memory almost perfectly, even though each generating function occupies a variable number of bytes. (Sometimes holes do appear, if a generating function turns out to need more bytes than we thought it would. But the behavior in general is quite satisfactory.) @ Eight bytes suffice to encode any configuration $c_0\ldots c_{w-1}$ in a 5-letter alphabet, since $5^{27}<2^{64}<5^{28}$ and we are assuming that $w$ is at most~27. We define \&{cstring} to a union type, so that the hash function can readily access its individual bytes, yet packing and unpacking can be done with radix 5 or 8. (The hash function will evaluate differently on a big-endian machine versus a little-endian machine, but that doesn't matter.) @= typedef struct { unsigned int h,l; /* high-order and low-order halves */ } octa; /* two tetrabytes make one octabyte */ typedef union { octa o; unsigned char byte[8]; } cstring; /* packed version of a configuration string */ @ The |pack| subroutine produces a |cstring| from the codes in array |c|. Since we are assuming provisionally that |w| is at most 23, we can use octal notation to put 10 codes in one tetrabyte and quinary notation to put 13 in the other. But quinary notation could obviously be used in both tetrabytes, or we could use pure quinary on octabytes, in variants of this program designed for larger values of~|w|. @= cstring packit() { register int j,k; cstring packed; k=w-1,j=c[w]; if (w<=10) packed.o.h=0; else { for (; k>10; k--) j=(j<<2)+j+c[k]; packed.o.h=j; k=9,j=c[10]; } for (;k>0;k--) j=(j<<3)+c[k]; packed.o.l=j; return packed; } @ That which can be packed can be unpacked. This routine puts the results into two arrays, |sc| and |c|, because it is used only when unpacking a new source configuration $\alpha$. There's an all-binary way to divide by 5 that is faster than division on some machines (see {\sl TAOCP\/} exercise 4.4--9). But the simple `/5' works best on my computer, winning also over floating point division. Curiously, I also found that `|x*5|' is slower than `$(x<<2)+x$', but `|y-5*x|' is faster than `|y-((x<<2)+x)|'. Some quirk of pipelining probably underlies these phenomena. I am content to leave such mysteries unexplained for now, because the present speed is acceptable. 
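@ Incidentally, the arithmetic behind the quinary/octal packing limit is simple: ten codes in octal occupy at most 30 bits, since $8^{10}=2^{30}$, and thirteen codes in quinary give a value less than $5^{13}=1220703125<2^{32}$; that is why |wmax| is $10+13=23$ here, whereas pure quinary packing of the whole octabyte would handle $w\le27$.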
@= void unpackit(cstring s) { register int j,k,q; if (w>10) { for (k=1,j=s.o.l; k<10; k++) { sc[k]=c[k]=j&7; j>>=3; } sc[10]=c[10]=j; for (k=11,j=s.o.h; k>=3; } sc[k]=c[k]=j; } @ @= code sc[64]; /* codes of the current source configuration */ @ @= c[0]=sc[0]=c[w+1]=sc[w+1]=eol; @ @= void print_config(int row_end) { register int k; for (k=1;k<=w;k++) { if (row_end==k) printf("^"); if (c[k]= typedef enum {@!active, @!raw, @!deleted} status; @ Here then are the 20 precious bytes that represent a configuration. The rolling strategy allows us to get by with only one link field. @= typedef struct conf_struct { cstring s; /* the configuration name */ unsigned int addr; /* where the slave keeps the generating function */ struct conf_struct *link; /* the next item in a hash chain or hole list */ char lo; /* smallest exponent of $z$ in the current generating function */ char hi; /* largest exponent of $z$ in the current generating function */ char lim; /* largest viable exponent of $z$, if this is a target */ status state; /* |active|, |raw|, or |deleted| */ } config; @ @= conf=(config*)calloc(conf_size,sizeof(config)); if (!conf) panic("I can't allocate the config table"); conf_end=conf+conf_size; @ The main high-level routine, called |update|, is used to add terms |p->lo| through |hi| of the generating function for configuration~|p| to the generating function for configuration~|q|. The special case |q=NULL| is used to update the counters for polyominoes that have been completed. @= void update(config *p, config *q, char hi) { if (!q) basic_inst(add,p->addr,p->lo,hi+1-p->lo); else if (q->state==raw) { q->state=active; if (q->lo!=p->lo || q->hi!=hi) basic_inst(clear,cur_src,q->addr,q->hi+1-q->lo); basic_inst(copy,p->addr,q->addr+p->lo-q->lo,hi+1-p->lo); }@+else basic_inst(add,p->addr,q->addr+p->lo-q->lo,hi+1-p->lo); } @ ``Universal hashing'' ({\sl TAOCP\/} exercise 6.4--72) is used to get a good hash function, because most of the key bits tend to be zero. @d hash_width 20 /* lg of hash table size */ @d hash_mask ((1<= int mangle(cstring s) { register unsigned int h,l; for (l=1,h=hash_bits[0][s.byte[0]]; l<8; l++) h+=hash_bits[l][s.byte[l]]; return h&hash_mask; } @ @= unsigned int hash_bits[8][256]; /* random bits for universal hashing */ config *hash_table[hash_mask+1]; /* heads of the hash chains */ @ The random number generator used here doesn't have to be of sensational quality. We can keep |hash_bits[j][0]=0| without loss of universality. @= row_end=314159265; /* borrow a register temporarily (bad style, sorry) */ for (j=0;j<8;j++) for (k=1; k<256; k++) { row_end=69069*row_end+1; hash_bits[j][k]=row_end>>(32-hash_width); } @ @= register int j,k; /* all-purpose indices */ register int row_end; /* size of the current partial row |r| */ @ On odd-numbered passes, |src| runs down towards |conf|, while |trg| starts at |conf_end-1| and proceeds downward. On even-numbered passes, |src| runs up towards |conf_end-1| and |trg| starts up from |conf|. The variables |ssrc| and |strg| have a similar significance but they refer to addresses in the slave memory. @= config *conf; /* first item in the pool of configuration nodes */ config *conf_end; /* last item (plus 1) in the pool of configuration nodes */ config *src; /* the current configuration $\alpha$ about to be recycled */ config *trg; /* the first unused configuration slot */ int ssrc, strg; /* allocation pointers for slave counts */ @ When a new configuration is created, we allocate space for its generating function in the slave module. 
Later on, we might discover that more space is needed because another generating function (with more terms) must be combined with it. At such times, we copy the data to another slot, leaving a hole in the configuration array and in the slave's array of counters. All holes of a given size are linked together, so that they can hopefully be plugged again soon. Here is the basic subroutine that allocates space for a configuration with |s+1| terms in its generating function. The subroutine has two versions, one for passes in which allocation goes upward and the other for passes in which allocation goes downward. It maintains statistics so that we can judge how fragmented the memory has become at the most stressful times. @= config *get_slot_up(register int s) { register config *p=slot[s]; if (p) { slot[s]=p->link; holes--, sholes-=s+1; }@+else { p=trg++; @addr| (upward) and check that memory hasn't overflowed@>; } p->state=raw; return p; } @ @addr| (upward) and check that memory hasn't overflowed@>= { if (src-trgaddr=strg; strg+=s+1; if (ssrc-strg= int holes; /* current number of holes in the target area */ int sholes; /* current number of vacated counters in slave target area */ int min_space=1000000000; /* how close did |src| and |trg| get? */ int min_holes; /* and how many holes were present at that time? */ int space_row, space_col; /* and where were we then? */ int min_sspace=1000000000; /* how close did |ssrc| and |strg| get? */ int min_sholes; /* and how many wasted counters were present then? */ int slave_row, slave_col; /* and where were we then? */ int moves; /* the number of times a hole was created */ int configs; /* total configurations recorded so far, mod $10^9$ */ int hconfigs; /* billions of configurations so far */ int r; /* number of the partially filled row */ int re; /* non-register copy of |row_end| */ config *slot[nmax+1]; /* heads of the available-slot chains */ @ @= printf("Altogether "); if (hconfigs) printf("%d%09d", hconfigs, configs); else printf("%d", configs); printf(" viable configurations examined;\n"); printf(" %d slots needed (with %d holes) in position (%d,%d);\n", conf_size-min_space,min_holes,space_row,space_col); printf(" %d counters needed (with %d wasted) in position (%d,%d);\n", slave_size-min_sspace,min_sholes,slave_row,slave_col); printf(" %d moves.\n",moves); @ @= config *get_slot_down(register int s) { register config *p=slot[s]; if (p) { slot[s]=p->link; holes--, sholes-=s+1; }@+ else { p=trg--; @addr| (downward) and check that memory hasn't overflowed@>; } p->state=raw; return p; } @ @addr| (downward) and check that memory hasn't overflowed@>= { if (trg-srcaddr=strg+1; } @ The |move_down| and |move_up| subroutines are invoked when an active target configuration |p| needs more space for its generating function. The global variable |hash| will have been set so that |hash_table[hash]=p|; we effectively move that configuration to another place in the sequential list of targets, and return a pointer to the new place. The former node |p| is now marked |deleted|, but its |addr| field remains valid (in case \\{get\_slot} is able to reuse it). 
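@ For example, if a target configuration was allocated three counters, for exponents 8 through 10, and a later source forces it to hold five, the old node goes onto |slot[2]| (the list of holes that consist of exactly three counters), a fresh node with five counters is obtained from \\{get\_slot}, the data already accumulated is copied across by emitting the appropriate instructions, and the fresh node replaces the old one at the head of its hash chain; a subsequent new configuration that happens to need exactly three counters can then plug the hole.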
@= config *move_down(config *p, int lo, int hi) { register config *q,*r; register int s=p->lo, t=p->hi; r=p->link; p->link=slot[t-s], slot[t-s]=p; p->state=deleted; holes++, sholes+=t-s+1; if (s>lo) s=lo; if (tlo=s, q->hi=t; q->s=p->s, q->lim=p->lim; hash_table[hash]=q, q->link=r; update(p,q,p->hi); moves++; return q; } @ @= config *move_up(config *p, int lo, int hi) { register config *q,*r; register int s=p->lo, t=p->hi; r=p->link; p->link=slot[t-s], slot[t-s]=p; p->state=deleted; holes++, sholes+=t-s+1; if (s>lo) s=lo; if (tlo=s, q->hi=t; q->s=p->s, q->lim=p->lim; hash_table[hash]=q, q->link=r; update(p,q,p->hi); moves++; return q; } @* The main loop. Now that we have some infrastructure in place, we can map out the top levels of this program's main processing cycle. We start with an all-|zero| configuration, at the very top of the width-$w$ array of cells that we will conceptually traverse; this configuration serves as the great-$\,\ldots\,$-great grandparent of all other configurations that will arise later. The slave module will begin by giving this configuration the trivial generating function `1' (namely $z^0$) in its counter cell number~0, meaning that there's just one way to reach the initial configuration, and that no cells are occupied so far. There is no need to initialize |conf[0].lim| or |conf[0].link|, because those fields are used only when a configuration is a target. The other fields---namely |conf[0].s|, |conf[0].addr|, |conf[0].lo|, |conf[0].hi|, and |conf[0].state|--- are initially zero by the conventions of \CEE/, and luckily those zeros happen to be just what we want. @= r=0, row_end=w; trg=conf+1; /* pretend that the previous pass produced a single result */ strg=1; @ Once again it seems best to write two nearly identical pieces of code, depending on whether the allocation is actually moving upward or downward. (We're supposed to be hoarding memory, but the space required for this program is small potatoes.) @= while (1) { @; @; @; @; } @ @= if (row_end; }@+else { if (r) { printf("Finished row %d", r); @; if (r>w) put_inst(sync,r); } @; r++, row_end=1; } if (trg==conf) break; /* the previous pass was sterile */ src=trg-1; /* start the source pointer at the highest occupied node */ ssrc=strg-1; /* and the highest occupied counter position */ trg=conf_end-1; /* start the target pointer at the highest unoccupied node */ strg=slave_size-1; /* and the highest unoccupied counter position */ re=row_end; @ @= while (src>=conf) { if (src->state==active) { unpackit(src->s); /* Put the source configuration $\alpha$ into |sc| and |c| */ if (verbose) { print_config(row_end);@+printf("\n"); } @; if (viable) @; for (k=1;k<=w;k++) c[k]=sc[k]; @; if (viable) @; } ssrc=src->addr-1; src--; /* the old |src| node is now outta here */ } @ Timely progress reports let the user know that we are still chugging along. We are about to start an upward pass if and only if |src==conf-1|. @= if (src==conf-1) printf(" (%d,%d,",conf_end-1-trg,slave_size-1-strg); else printf(" (%d,%d,",trg-conf,strg-n-1); printf("%d,%d,%d,%d,%d)\n", conf_size-min_space, min_holes, slave_size-min_sspace, min_sholes,bytes_out); @; fflush(stdout); @ Counter positions 1 through $n$ in the slave memory are reserved for the final polyomino counts. 
@= if (row_end; }@+else { if (r) { printf("Finished row %d", r); @; if (r>w) put_inst(sync,r); } r++, row_end=1; } if (trg==conf_end-1) break; /* the previous pass was sterile */ src=trg+1; /* start the source pointer at the lowest occupied node */ /* and we'll soon set |ssrc| to |src->addr|, which equals |strg+1|, the lowest occupied counter */ trg=conf; /* start the target pointer at the lowest unoccupied node */ strg=n+1; /* and the lowest unoccupied counter position */ re=row_end; @ @= while (srcstate==active) { ssrc=src->addr; unpackit(src->s); /* Put the source configuration $\alpha$ into |sc| and |c| */ if (verbose) { print_config(row_end);@+printf("\n"); } @; if (viable) @; for (k=1;k<=w;k++) c[k]=sc[k]; @; if (viable) @; } src++; /* the old |src| node is now outta here */ } @* Nitty-gritty. The basic logic of a so-called ``transfer-matrix'' approach is embedded in the following program steps, which change a configuration string when a new cell in row~$r$ is or is not to be occupied. Here, for example, we observe that when the previous configuration has `\.{1\caret/(}' at the end of a partial row, and if we occupy the new cell, the new configuration has `\.{(-}\caret/\thinspace' instead. But if we don't occupy that cell, the new configuration has `\.{10}\caret/\thinspace' and a further change must also be made because of the \.( that has disappeared. \def\sp(#1){\rlap{$^{\rm\,#1}$}} \def\key(#1) {&\omit&\multispan5\quad\sp(#1)\quad} Most of the cases that arise are completely straightforward. But each of the thirty combinations of two adjacent codes must of course be handled perfectly. The following chart summarizes what the program is supposed to do when the new cell is being left vacant. $$\vbox{\offinterlineskip \halign{\strut\hfil\tt#\quad&\vrule#& \hbox to4em{\hfil\tt#\hfil}& \hbox to4em{\hfil\tt#\hfil}& \hbox to4em{\hfil\tt#\hfil}& \hbox to4em{\hfil\tt#\hfil}& \hbox to4em{\hfil\tt#\hfil}& \vrule#\cr &\omit&0&1&(&-&)\cr \noalign{\vskip2pt} \omit&&\multispan5\hrulefill&\cr \omit&height3pt&&&&&&\cr 0&&00&\sp(a)&00\sp(b)&00\sp(c)&00\sp(d)&\cr 1&&10&\sp(a)&10\sp(b)&10\relax&10\sp(d)&\cr (&&(0&\sp(e)& \sp(e)&(0\relax&(0\sp(d)&\cr -&&-0&\sp(a)&-0\sp(b)&-0\relax&-0\sp(d)&\cr )&&)0&\sp(a)&-0\sp(b)&)0\relax&-0\sp(d)&\cr \rm left edge&&0&\sp(e)&\sp(e)&0\sp(c)&\sp(a)&\cr \omit&height1pt&&&&&&\cr \omit&&\multispan5\hrulefill&\cr \noalign{\smallskip} \key(a) Not viable\hfil\cr \key(b) Downgrade the successor of \.(\hfil\cr \key(c) A polyomino may have been completed\hfil\cr \key(d) Downgrade the predecessor of \.)\hfil\cr \key(e) Impossible case\hfil\cr}}$$ Special handling is necessary when a \.- is eliminated, if it is preceded or followed by nothing but zeros; for example, \.{0(010-} must become \.{010(00}, and \.{-010()0-0} must become either \.{00)0()0(0} or \.{)010(-000}. These somewhat unusual cases are approached cautiously in the program below. 
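@ As a concrete instance of the commoner cases, consider the configuration $(*)$ from the introduction, whose partial row ends with \.{-\caret/)}. If the new cell is left vacant, the pair straddling the caret becomes \.{-0}, and by rule (d) the predecessor of the vanished \.) must be downgraded; the leftward scan finds the \.- just before the caret, which therefore becomes the new rightmost boundary cell of its component. The result is $$\.{0(00(00100-010)0\caret/00()0)0(-0}\,,$$ in which the component that formerly had boundary cells in columns 5, 11, 15, and~16 now ends at column~15.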
@d f(x,y) ((x<<3)+y) @= pair=f(sc[row_end-1],sc[row_end]); c[row_end]=zero; viable=1; switch(pair) { case f(zero,one): case f(one,one): case f(mid,one): case f(rt,one): case f(eol,rt): viable=0; /* component would be isolated */ case f(zero,zero): case f(one,zero): case f(lft,zero): case f(mid,zero): case f(rt,zero): case f(eol,zero): case f(lft,mid): case f(mid,mid): break; case f(zero,lft): case f(one,lft): case f(mid,lft): case f(rt,lft): @;@+break; case f(zero,rt): case f(one,rt): case f(lft,rt): case f(mid,rt): case f(rt,rt): @;@+break; case f(zero,mid): case f(eol,mid): @; case f(one,mid): case f(rt,mid): @; break; case f(lft,one): case f(lft,lft): case f(eol,one): case f(eol,lft): panic("Impossible configuration"); default: panic("Impossible pair"); } @ @= int pair; /* the two codes surrounding the \qcaret/ in |sc| */ int viable; /* might the target configuration lead to a relevant polyomino? */ @ In this step, we have just zeroed out a left parenthesis. If that \.( is followed by a \.-, we change the \.- to \.(; if it is followed by a \.), we change the \.) to \.1. Here ``followed by'' really means ``followed on the same level by,'' because nested subcomponents may intervene. We therefore need a level counter, |j|, as we scan to the right. If the \.( had no successor because it simply marked a connection to the right edge, we shouldn't have deleted it; its component is now disconnected, so we set |viable| to zero. @= for (k=row_end+1,j=0; ; k++) { switch (c[k]) { case lft: j++; case zero: case one: continue; case mid:@+if (j) continue; c[k]=lft;@+break; case rt:@+if (j) {@+j--;@+continue;@+} c[k]=one;@+break; case eol:@+if (j) panic("Unexpected eol"); viable=0; } break; } @ Contrariwise, an erased \.) is like an erased \.( but vice versa. @= for (k=row_end-1,j=0; ; k--) { switch (c[k]) { case rt: j++; case zero: case one: continue; case mid:@+if (j) continue; c[k]=rt;@+break; case lft:@+if (j) {@+j--;@+continue;@+} c[k]=one;@+break; case eol:@+if (j) panic("Unexpected eol"); viable=0; } break; } @ @= for (k=row_end-1; c[k]==zero; k--); if (c[k]==eol) { /* yes, the |mid| was leftmost */ for (k=row_end+1; c[k]==zero; k++) ; switch (c[k]) { case mid: case rt: case eol: break; /* no problem */ default:@+if (c[k]==one) c[k]=rt, j=0; else c[k]=mid, j=1; /* |c[k]| was |lft| */ for (k++; ; k++) { /* we must downgrade the successor of the |mid| */ switch (c[k]) { case lft: j++; case zero: case one: continue; case mid:@+if (j) continue; c[k]=lft;@+break; case rt:@+if (j) {@+j--;@+continue;@+} c[k]=one;@+break; case eol: panic("This can't happen"); } break; } } } @ @= for (k=row_end+1; c[k]==zero; k++); if (c[k]==eol) { /* yes, the |mid| was rightmost */ for (k=row_end-1; c[k]==zero; k--) ; switch (c[k]) { case mid: case lft: case eol: break; /* no problem */ default:@+if (c[k]==one) c[k]=lft, j=0; else c[k]=mid, j=1; /* |c[k]| was |rt| */ for (k--; ; k--) { /* we must downgrade the predecessor of the |mid| */ switch (c[k]) { case rt: j++; case zero: case one: continue; case mid:@+if (j) continue; c[k]=rt;@+break; case lft:@+if (j) {@+j--;@+continue;@+} c[k]=one;@+break; case eol: panic("This can't happen"); } break; } } } @ A different kind of excitement awaits us when we consider occupying the new cell. If the cases |f(lft,mid)|, |f(lft,rt)|, |f(mid,mid)|, and |f(mid,rt)| are modified here to set |viable=0|, the program will count {\it polyomino trees\/} instead of normal polyominoes. (In a polyomino tree there is exactly one way to get from one cell to another via rook moves.) 
These are the four cases in which already-connected cells are connected again. $$\vbox{\offinterlineskip \halign{\strut\hfil\tt#\quad&\vrule#& \hbox to4em{\hfil\tt#\hfil}& \hbox to4em{\hfil\tt#\hfil}& \hbox to4em{\hfil\tt#\hfil}& \hbox to4em{\hfil\tt#\hfil}& \hbox to4em{\hfil\tt#\hfil}& \vrule#\cr &\omit&0&1&(&-&)\cr \noalign{\vskip2pt} \omit&&\multispan5\hrulefill&\cr \omit&height3pt&&&&&&\cr 0&&01\sp(k)&01& 0(&0-\relax&0)\relax&\cr 1&&()& ()&(-\relax&--\relax&-)\relax&\cr (&&(-&\sp(e)& \sp(e)&(-\sp(f)&()\sp(f)&\cr -&&--& --&--\sp(g)&--\sp(f)&-)\sp(f)&\cr )&&-)& -)&--\relax&--\sp(h)&-)\sp(h)&\cr \rm left edge&&)\sp(i)&\sp(e)&\sp(e)&-&)&\cr \omit&height1pt&&&&&&\cr \omit&&\multispan5\hrulefill&\cr \noalign{\smallskip} \key(e) Impossible case\hfil\cr \key(f) Not viable in polyomino trees\hfil\cr \key(g) Merge with mate of \.(\hfil\cr \key(h) Merge with mate of \.)\hfil\cr \key(i) Downgrade the next component if open\hfil\cr \key(j) Downgrade the previous component if open\hfil\cr \key(k) Or possibly \.{0)\sp(i)} or \.{0(\sp(j)}\hfil\cr }}$$ The somewhat unusual case \.{01\sp(k)}\ \ becomes \.{0)\sp(i)} if no nonzero cells lie to the left but the left edge has already been occupied somewhere in the rows above. It becomes \.{0(\sp(j)} if no nonzero cells lie to the right but the right edge has already been occupied somewhere in the rows above. @= viable=1; src->lo++, src->hi++; /* implicitly multiply the generating function $g(\alpha)$ by $z$ */ switch(pair) { case f(one,zero): case f(one,one): c[row_end-1]=lft, c[row_end]=rt; case f(zero,one): case f(zero,lft): case f(zero,mid): case f(zero,rt): case f(lft,mid): case f(lft,rt): case f(mid,mid): case f(mid,rt): case f(eol,mid): case f(eol,rt): break; case f(one,lft): c[row_end-1]=lft, c[row_end]=mid;@+ break; case f(one,mid): case f(one,rt): c[row_end-1]=mid;@+ break; case f(lft,zero): case f(mid,zero): case f(mid,one): c[row_end]=mid;@+ break; case f(mid,lft): c[row_end]=mid; @;@+break; case f(rt,zero): case f(rt,one): c[row_end-1]=mid, c[row_end]=rt;@+ break; case f(rt,lft): c[row_end-1]=c[row_end]=mid;@+ break; case f(rt,mid): case f(rt,rt): c[row_end-1]=mid; @;@+break; case f(eol,zero): c[row_end]=rt; @;@+break; case f(zero,zero): @;@+ break; case f(lft,one): case f(lft,lft): case f(eol,one): case f(eol,lft): panic("Impossible configuration"); default: panic("Impossible pair"); } if (row_end==w) @; @ @= for (k=row_end+1,j=0;;k++) { switch(c[k]) { case lft: j++; case zero: case one: case mid: continue; case rt:@+if (!j) break; j--;@+ continue; case eol: panic("Unexpected eol"); } c[k]=mid; @+break; } @ @= for (k=row_end-2,j=0;;k--) { switch(c[k]) { case rt: j++; case zero: case one: case mid: continue; case lft:@+if (!j) break; j--;@+ continue; case eol: panic("Unexpected eol"); } c[k]=mid; @+break; } @ @= for (k=2;;k++) { switch(c[k]) { case zero: continue; case mid: c[k]=lft; case one: case lft: case eol: break; case rt: c[k]=one; } break; } @ @= c[row_end]=one; for (k=row_end-2; c[k]==zero; k--) ; if (!k) { /* we're introducing a new leftmost \.1 */ for (k=row_end+1;; k++) { switch(c[k]) { case zero: continue; case mid: c[k]=lft, c[row_end]=rt; case one: case lft: case eol: break; case rt: c[k]=one, c[row_end]=rt; } break; } }@+else { for (j=row_end+1; c[j]==zero; j++) ; if (c[j]==eol) { /* we're introducing a new rightmost \.1 */ if (c[k]==mid) c[k]=rt, c[row_end]=lft; else if (c[k]==lft) c[k]=one, c[row_end]=lft; } } @ @= switch(c[row_end]) { case rt: c[row_end]=mid; case zero: case mid: case lft: break; case one: c[row_end]=lft; @; } 
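@ Continuing the example of configuration $(*)$: if the new cell is occupied instead of left vacant, it joins both the cell above it (represented by the \.) just after the caret) and the cell at its left (the \.- just before the caret), which already belong to the same component; so the codes are unchanged, the caret simply advances past the \.), and the generating function is implicitly multiplied by~$z$. This is one of the four cycle-creating situations, hence a program modified as described above would not count it among the polyomino trees.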
@ @= for (k=row_end-1;;k--) { switch(c[k]) { case zero: continue; case mid: c[k]=rt; case one: case rt: case eol: break; case lft: c[k]=one; } break; } @* Nittier-and-grittier. The last nontrivial hurdle facing us is the problem of what to do after a target configuration has been constructed in |c[1]| through |c[w]|. It's not a Big Problem, but it does require care, especially with respect to the generating function arithmetic. An all-|zero| target configuration is preserved only in the first few passes, before we've reached the end of row~1. @= { if (row_end==w) @; target=packit(); if (target.o.l || target.o.h || (r==1 && row_end; if (!p) @; if (p && (src->lo<=(j=p->lim))) { if (src->hihi; if (j>p->hi || src->lolo) p=move_down(p,src->lo,j); if (verbose) { printf(" -> ");@+print_config(row_end+1);@+printf("\n"); } update(src,p,j); } }@+else if (r>w) { if (verbose) printf(" -> 0\n"); update(src,NULL,src->hi); /* polyominoes completed */ } } @ @= cstring target; /* the packed name of the current target configuration */ int hash; /* its hash address */ @ @= register config *p; /* current target of interest */ @ At the end of a row we change the configuration to its left-right reflection, if the reflection is lexicographically smaller. This reduction to a canonical form reduces the number of active configurations by a factor of nearly~2, at least for the next few passes. (The reduction can be justified by observing that we could have operated from right to left instead of from left to right, on each row that follows a left-heavy row.) Notice that a code like \.{0(0} will be reflected to \.{0)0}. @= { for (j=1,k=w; j<=k; j++,k--) if (c[j]!=reflect[c[k]]) break; if (c[j]>reflect[c[k]]) for (; j<=k; j++,k--) { register int i=c[k]; c[k]=reflect[c[j]]; c[j]=reflect[i]; } } @ We take care to move |p| to the top of its hash list, when present, and to set the global variable |hash| as required by |move_down| and |move_up|. @= hash=mangle(target); p=hash_table[hash]; if (p && !(p->s.o.l==target.o.l && p->s.o.h==target.o.h)) { register config *q; for (q=p,p=p->link; p; q=p,p=p->link) if (p->s.o.l==target.o.l && p->s.o.h==target.o.h) break; if (p) { q->link=p->link; /* remove |p| from its former place in the list */ p->link=hash_table[hash]; /* and insert it at the front */ hash_table[hash]=p; } } @ If the target is viable, |p| will be set to a fresh node with |p->state=raw|. The reader can verify that |move_down| will not then be necessary. @= { j=connectivity(row_end+1); if (r>=w) j=int_part(j); else if (int_part(j)==eps_part(j)) j=int_part(j)+(w-r); else j=int_part(j)+(w-1-r); /* |j| more cells are needed in a valid polyomino */ if (src->lo+j<=n) { if (++configs==1000000000) configs=0, hconfigs++; p=get_slot_down((src->hi>n-j? 
n-j: src->hi) - src->lo);
    p->link=hash_table[hash], hash_table[hash]=p;
    p->s=target;
    p->lo=src->lo, p->hi=src->hi, p->lim=n-j;
    if (p->hi>p->lim) p->hi=p->lim;
  }
}

@ @=
{ if (row_end==w) @;
  target=packit();
  if (target.o.l || target.o.h || (r==1 && row_end<w)) {
    @;
    if (!p) @;
    if (p && (src->lo<=(j=p->lim))) {
      if (src->hi<j) j=src->hi;
      if (j>p->hi || src->lo<p->lo) p=move_up(p,src->lo,j);
      if (verbose) { printf(" -> ");@+print_config(row_end+1);@+printf("\n"); }
      update(src,p,j);
    }
  }@+else if (r>w) { if (verbose) printf(" -> 0\n"); update(src,NULL,src->hi); /* polyominoes completed */ } }

@ @=
{ j=connectivity(row_end+1);
  if (r>=w) j=int_part(j);
  else if (int_part(j)==eps_part(j)) j=int_part(j)+(w-r);
  else j=int_part(j)+(w-1-r); /* |j| more cells are needed in a valid polyomino */
  if (src->lo+j<=n) {
    if (++configs==1000000000) configs=0, hconfigs++;
    p=get_slot_up((src->hi>n-j? n-j: src->hi) - src->lo);
    p->link=hash_table[hash], hash_table[hash]=p;
    p->s=target;
    p->lo=src->lo, p->hi=src->hi, p->lim=n-j;
    if (p->hi>p->lim) p->hi=p->lim;
  }
}

@* Checkpointing. One of the goals of this program is to establish new world records. Thus, local resources are probably being stretched to their current limits, and several days of running time might well be involved. It's prudent therefore to make the program stop at a suitable ``checkpoint,'' firming up what has been accomplished so far; then we won't have to go back to square one when recovering from a disaster. We should also use {\mc POLYSLAVE} to reduce the intermediate data at such times, thereby freeing up nearly all of the disk space we've been filling before we proceed to fill some more. The code in this section is executed at a particularly convenient time: A new row is about to begin, and so is a new downward pass. It's as good a time as any to dump out the configuration-table-so-far in a form that can easily be used by a special version of this program to get going again when we're ready to resume. (See the change file \.{polynum-restart.ch} for details.)

@d gig_threshold 5 /* try to avoid filling more than about twice this many gigabytes of disk space */

@=
if (file_extension>=gig_threshold && trg!=conf) {
  @;
  sprintf(filename,"%.90s.dump",base_name);
  out_file=fopen(filename,"wb");
  if (!out_file) panic("I can't open the dump file");
  @;
  @;
  printf("[%d bytes written on file %s.]\n",ftell(out_file),filename);
  exit(1);
}

@ A special |sync| instruction with parameter 255 tells {\mc POLYSLAVE} that it should invoke its own checkpointing activity.

@=
put_inst(sync,255);
@;
printf("Checkpoint stop: Please process that data with polyslave,\n");
printf("then resume the computation with polynum-restart.\n");

@ Since we're at the beginning of a downward pass, the user will be able to restart this program with different values of |conf_size| and |slave_size| if desired.

@=
dump_data[0]=n;
dump_data[1]=w;
dump_data[2]=r;
dump_data[3]=trg-conf;
dump_data[4]=strg;
if (fwrite(dump_data,sizeof(int),5,out_file)!=5) panic("Bad write at beginning of dump");
if (fwrite(conf,sizeof(config),trg-conf,out_file)!=trg-conf) panic("Couldn't dump the configuration table");

@ @=
int dump_data[5]; /* parameters needed to restart */

@* Computational experience. With a suitable change file it is not difficult to convert this program to a one-pass routine that does the evaluation directly, provided that $n$ and $w$ are reasonably small. For example, when $n=30$ and $2\le w\le15$, all the computations were completed in 192 seconds (on 12 December 2000).
The most difficult case, which took 68 seconds to complete, was for $w=13$, when 100,488 slots (with 0 holes) and 218980 counters (with 4114 wasted) were needed. The resulting number of $n$-ominoes for $n\le30$ agreed perfectly with the answers obtained from a completely different algorithm, using my now-obsolete program {\mc POLYENUM}. That program had taken more than 15 hours to count 30-ominoes, so Jensen's method ran almost 300 times faster. @^Jensen, Iwan@> Setting $n=47$ led to much more of an adventure, of course, since all space and time requirements grow exponentially. Some runs lasted several days, and various glitches and hardware failures added to the excitement. Detailed statistics about the performance, including the histograms computed here, were helpful for planning and for diagnosing various problems. @d hist_size 100 @= for (k=0;klink,j++) chist[p->hi-p->lo]++; if (j>jj) { if (j>=hist_size) j=hist_size-1; jj=j; } hhist[j]++; hash_table[k]=NULL; } printf("Hash histogram:"); for (j=1;j<=jj;j++) printf(" %d",hhist[j]); printf("\nCounters:"); for (k=nmax;k>=0;k--) if (chist[k]) break; for (j=0;j<=k;j++) printf(" %d",chist[j]); for (k=nmax;k>=0;k--) if (slot[k]) break; if (k>=0) { printf("\nHoles:"); for (j=0;j<=k;j++) { for (p=slot[j],jj=0;p;p=p->link,jj++) ; printf(" %d",jj); slot[j]=NULL; } } printf("\n"); holes=sholes=0; @ @= int hhist[hist_size]; /* histogram of hash chain lengths */ int chist[nmax+1]; /* histogram of counter table lengths */ int jj; /* auxiliary variable for statistics calculations */ @ The greatest difficulty for $n=47$ occurred when $w=20$; indeed, more than 100 gigabytes of data were passed to {\mc POLYSLAVE} in that case, and the computation lasted several days, so the checkpointing algorithm proved to be particularly helpful. Here is a summary of the main statistics from those runs. The number of ``configs'' is the total of distinct configurations, summed over all passes. The number of ``moves'' is the number of times |move_up| or |move_down| was called to increase the space allocated to a generating function. $$\vbox{\halign{&\quad\hfil#\cr $w\ $&slots&counters&configs&moves&bytes&\mc POLYNUM&\mc POLYSLAVE\cr \noalign{\vskip2pt} 23& 0.3M& 0.3M& 109M& 3M& 0.4G& 14 min& 5 min\cr 22& 6.2M& 8.0M& 2150M& 129M& 10.2G& 267 min\rlap*& 35 min\rlap*\cr 21& 28.6M& 46.5M& 9481M&1053M& 58.8G& 1911 min\rlap*&314 min\rlap*\cr 20& 40.2M& 94.0M& 12852M&2267M&103.6G& 2960 min\rlap*&574 min\rlap*\cr 19& 31.5M& 105.6M& 9183M&2099M& 86.8G& 1803 min\rlap*&497 min\rlap*\cr 18& 19.4M& 85.5M& 5220M&1318M& 52.9G& 749 min\rlap*&324 min\rlap*\cr 17& 10.1M& 58.3M& 2514M& 678M& 26.7G& 280 min\rlap*&172 min\rlap*\cr 16& 4.5M& 34.2M& 1091M& 308M& 11.9G& 137 min\rlap*&104 min\rlap*\cr 15& 1.9M& 18.2M& 437M& 127M& 4.9G& 51 min\rlap*& 44 min\rlap*\cr 14& 0.7M& 8.8M& 167M& 49M& 2.0G& 22 min& 31 min\cr 13& 0.3M& 4.1M& 62M& 19M& 0.8G& 8 min& 12 min\cr 12& 98K& 1.8M& 23M& 7M& 0.3G& 3 min& 5 min\cr 11& 37K& 789K& 8.5M& 2.6M& 0.1G& 84 sec& 2 min\cr 10& 14K& 334K& 3.1M& 0.9M& 40M& 45 sec& 21 sec\cr 9& 5K& 148K& 1124K& 340K& 15M& 30 sec& 7 sec\cr 8& 2K& 59K& 402K& 119K& 5M& 23 sec& 2 sec\cr 7& 875& 24K& 144K& 43K& 1.9M& 20 sec& 1 sec\cr 6& 350& 10K& 50K& 14K& 618K& 17 sec& 0 sec\cr 5& 146& 4K& 17K& 5K& 206K& 14 sec& 0 sec\cr 4& 57& 1484& 5410& 1460& 59K& 11 sec& 0 sec\cr 3& 22& 553& 1658& 430& 17K& 8 sec& 0 sec\cr 2& 7& 180& 318& 76& 3K& 6 sec& 0 sec\cr }}$$ * Done on computers with 1 gigabyte of memory, thanks to Andy Kacsmar of Stanford's database group. 
@^Kacsmar, Andrew Charles@>
\smallskip\noindent
(The case $w=24$ was omitted because of the formula $$ 8{h+w-2\choose w-1}-3hw+2h+2w-8, $$ which gives the total number of $n$-ominoes spanning an $h\times w$ rectangle when $n=h+w-1$ and $h>1$ and $w>1$; for example, when $h=w=2$ and $n=3$ it correctly gives $8\cdot2-12+4+4-8=4$, the four rotations of the L-shaped tromino.)

@ I was glad to see that the allocation system used here for variable-length nodes, allowing ``holes,'' worked quite well: More than 98 percent of the memory space was typically being put to good use when it was needed. In fact, only 8 holes were present when the maximum demand of 40,219,325 slots occurred in the runs for $n=47$ (row 10 and column 11 when $w=20$); only 105 counters were wasted when the maximum demand of 105,578,552 counters occurred (row 12 and column 10 when $w=19$).

@ Joke: George P\'olya was a polymath who worked on polyominoes.
@^joke@>

@*Index.