\datethis
\def\falling#1{^{\ff{#1}}}
\def\ff#1{\mkern1mu\underline{\mkern-1mu#1\mkern-2mu}\mkern2mu}
@*Intro. This program, inspired by {\mc HISTOSCAPE-COUNT}, calculates
the number of $m\times n$ ``whirlpool permutations,'' given $m$ and~$n$.
What's a whirlpool permutation, you ask? Good question.
An $m\times n$ matrix has $(m-1)(n-1)$ submatrices of size $2\times2$.
An $m\times n$ whirlpool permutation is a permutation of $(mn)!$
elements for which the relative order of the elements in each of those
submatrices is a ``vortex''---that is, it travels a cyclic path from
smallest to largest, either clockwise or counterclockwise.
Thus there are exactly eight $2\times2$ whirlpool permutations.
If the entries of the matrix are denoted $abcd$ from top to bottom
and left to right, they are 1243, 1423, 2134, 2314, 3241, 3421, 4132, 4312.
One simple test is to compare $a:b$, $b:d$, $d:c$, $c:a$; the number
of `$<$' must be odd. (Hence the number of `$>$' must also be odd.)
The enumeration is by a somewhat mind-boggling variant of dynamic programming
that I've not seen before. It needs to represent $n+1$ elements of
a permutation of~$t$ elements, where $t$ is at most $mn$,
and there are up to $(mn)\falling{n+1}$ such partial permutations;
so I can't expect to solve the problem unless $m$ and $n$ are
fairly small. Even so, when I {\it can\/} solve the problem it's kind of
thrilling, because this program makes use of a really interesting
way to represent $t\falling{n+1}$ counts in computer memory.
The same method could actually be used to enumerate matrices of permutations
whose $2\times2$ submatrices satisfy any arbitrary relations, when those
relations depend only the relative order of the four elements.
(Thus any of $2^{24}$ constraints might be prescribed for each of the
$(m-1)(n-1)$ submatrices. The whirlpool case, which accepts
only the eight relative orderings listed above, is just one of
zillions of possibilities.)
It's better to have $m\ge n$. But I'll try some cases with $m
#include
int m,n; /* command-line parameters */
unsigned long long *count; /* the big array of counts */
unsigned long long newcount[maxmn]; /* counts that will replace old ones */
unsigned long long mems; /* memory references to octabytes */
int x[maxn+1]; /* indices being looped over */
int ay[maxn+1];
int l[maxmn],u[maxmn];
int tpow[maxmn+1]; /* factorial powers $t\falling{n+1}$ */
@;
main(int argc,char*argv[]) {
register int a,b,c,d,i,j,k,p,q,r,mn,t,tt,kk,bb,cc,pdel;
@;
for (i=1;i;
@;
}
@ @=
if (argc!=3 || sscanf(argv[1],"%d",
&m)!=1 || sscanf(argv[2],"%d",
&n)!=1) {
fprintf(stderr,"Usage: %s m n\n",
argv[0]);
exit(-1);
}
mn=m*n;
if (m<2 || m>maxn || n<2 || n>maxn || mn>maxmn) {
fprintf(stderr,"Sorry, m and n should be between 2 and %d, with mn<=%d!\n",
maxn,maxmn);
exit(-2);
}
for (k=n+1;k<=mn;k++) {
register unsigned long long acc=1;
for (j=0;j<=n;j++) acc*=k-j;
if (acc>=0x80000000) {
fprintf(stderr,"Sorry, mn\\falling(n+1) must be less than 2^31!\n");
exit(-666);
}
tpow[k]=acc;
}
count=(unsigned long long*)malloc(tpow[mn]*sizeof(unsigned long long));
if (!count) {
fprintf(stderr,"I couldn't allocate %d entries for the counts!\n",
tpow[mn]);
exit(-4);
}
@ Suppose I want to represent $n+1$ specified elements of a permutation
of $t+1$ elements. For example, we might have $n=3$ and $t=8$, and
the final four elements of a permutation $y_0\ldots y_8$ might be
$y_5y_6y_7y_8=3142$. There are $(t+1)\falling{n+1}$
such partial permutations, and I can represent them compactly with
$n+1$ integer variables $x_{t-n}$, \dots, $x_{t-1}$,~$x_t$, where
$0\le x_j\le j$. The rule is that $x_j$ is $y_j-w_j$,
where $w_j$ is the number of elements ``inverted'' by $y_j$
(the number of elements to the right of $y_j$ that are less than~$y_j$).
In the example, $w_0w_1w_2w_3=2010$, so $x_5x_6x_7x_8=1132$.
(Or going backward, if $x_5x_6x_7x_8=3141$ then $y_5y_6y_7y_8=6251$.)
This representation has a beautiful property that we shall exploit.
Every permutation $y_0\ldots y_t$ of $\{0,\ldots,t\}$ yields
$t+2$ permutations $y'_0\ldots y'_{t+1}$ of $\{0,\ldots,t+1\}$ if
we choose $y'_{t+1}$ arbitrarily,
and then set $y'_j=y_j+\hbox{[$y_j{\ge}y'_{t+1}$]}$.
For example, if $t=8$ and $y_5y_6y_7y_8=3142$, the ten permutations
obtained from $y_0\ldots y_8$ will have $y'_5y'_6y'_7y'_8y'_9=
42530$, 42531, 41532, 41523, 31524, 31425, 31426, 31427, 31428, or 31429.
And the representations $x'_5x'_6x'_7x'_8x'_9$ of those last five
elements will simply be respectively 31420, 31421, \dots, 31429!
In general, we'll have $x'_j=x_j$ for $0\le j\le t$, and $x'_{t+1}=y'_{t+1}$
will be arbitrary.
@ Now comes the mind-boggling part.
I want to maintain a count $c(x_{t-n},\ldots,x_t)$ for each setting of
the indices $(x_{t-n},\ldots,x_t)$, but I want to put those counts
into memory in such a way that I won't clobber any of the existing
counts when I'm updating $t$ to $t+1$. In particular, if $x'_{t+1}\le t-n$,
I'll want $c(x'_{t+1-n},\ldots,x'_t,x'_{t+1})$ to be stored in exactly
the same place as $c(x'_{t+1},x_{t+1-n},\ldots,x_t)$ was stored
in the previous round. But if $x'_{t+1}>t-n$, I'll store
$c(x'_{t+1-n},\ldots,x'_t,x'_{t+1})$ in a brand-new, previously
unused location of memory.
Thus we shall use a memory mapping function $\mu_t$, different for each~$t$,
such that $c(x_{t-n},x_{t-n+1},\ldots,x_t)$ is stored in location
$\mu_t(x_{t-n},x_{t-n+1},\ldots,x_t)$ during round~$t$ of the computation.
Let's go back to the example in the previous section and apply it to
whirlpool permutations for $n=3$. Denote the permutation in the first
three rows by $y_0\ldots y_8$, where $y_6y_7y_8$ is the third row and
$y_5$ is the last element of the second row. (It's a permutation
of $\{0,\ldots,8\}$, representing the relative order of a final
permutation of $\{0,\ldots,3m-1\}$ that will fill the entire matrix.)
At this point we've
calculated counts $c(x_5,x_6,x_7,x_8)$ that tell us how many such
partial whirlpool permutations have any given setting of $y_5y_6y_7y_8$. In
particular, $c(1,1,3,2)$ counts those for which $y_5y_6y_7y_8=3142$.
To get to the next round, we essentially want to know how many partial
permutations $y'_0\ldots y'_9$ of $\{0,\ldots,9\}$ will have a given setting
of $y'_6y'_7y'_8y'_9$; the second row is now irrelevant to future computations.
It's the same as asking how many permutations have $y_6y_7y_8=142$.
Answer: $c(0,1,3,2)+c(1,1,3,2)+c(2,1,3,2)+c(3,1,3,2)+c(4,1,3,2)+c(5,1,3,2)$,
because these count the permutations with $y_5y_6y_7y_8=0142$, 3142,
5142, 6142, 7142, 8142.
Those six counts $c(k,1,3,2)$ appear in memory locations $\mu_8(k,1,3,2)$,
for $0\le k\le5$.
On the next round we'll want $c'(x'_6,x'_7,x'_8,x'_9)=c'(1,3,2,x'_9)$
to be set to their sum. These new counts will appear in memory locations
$\mu_9(1,3,2,x'_9)$. So we'd like $\mu_9(1,3,2,k)=\mu_8(k,1,3,2)$
when $0\le k\le5$.
Let
$\lambda_t(x_{t-n},\ldots,x_t)=
\bigl(\cdots((x_tt+x_{t-1})(t-1)+x_{t-2})\cdots\bigr)(t-n+1)+x_{t-n}=
x_t t\falling n+x_{t-1}(t-1)\falling{n-1}+\cdots x_{t-n}(t-n)\falling0$
be the standard mixed-radix representation of $(x_t\ldots x_{t-n})$
with radices $(t+1,t,\ldots,t-n+1)$. When each $x_j$ ranges from 0 to~$j$,
$\lambda_t(x_{t-n},\ldots,x_t)$ ranges from $\lambda_t(0,\ldots,0)=0$
to $\lambda_t(t-n,\ldots,t)=(t+1)\falling{n+1}-1$.
Therefore $\lambda_t$ would be the natural choice for $\mu_t$,
if we didn't want to avoid clobbering.
Instead, we use $\lambda_t$ only when $x_t$ is sufficiently large:
We define
$$\mu_t(x_{t-n},\ldots,x_t)=\cases{
\lambda_t(x_{t-n},\ldots,x_t),&if $x_t\ge t-n$;\cr
\mu_{t-1}(x_t,x_{t-n},\ldots,x_{t-1}),&if $x_t\le t-n-1$.\cr}$$
This recursion terminates with $\mu_n=\lambda_n$, because
$x_n$ is always $\ge0$. One can also show that
$\mu_{n+1}=\lambda_{n+1}$.
Back to our earlier example, what is $\mu_8(k,1,3,2)$? Since
$2\le4$, it's $\mu_7(2,k,1,3)$. And since $3\le3$, it's
$\mu_6(3,2,k,1)$. Which is $\mu_5(1,3,2,k)$. Finally, therefore,
if $k\le1$, the value is $\lambda_4(k,1,3,2)=68+k$;
but if $2\le k\le5$ it's $\lambda_5(1,3,2,k)=60k+34$.
In this program we will keep $x_j$ in location $x_{j\bmod(n+1)}$.
Consequently the arguments to $\mu_t$ and $\lambda_t$ will always be in
locations $(x_{(t+1)\bmod(n+1)},x_{(t+2)\bmod(n+1)},\ldots,
x_{t\bmod(n+1)})$.
@=
int mu(int t) {
register int r,a,p,tt;
for (r=t%(n+1),
tt=t;o,x[r]=
x1:@+for (k=0;k<=t;k++) o,l[k]=k+1;
o,l[t+1]=0; /* circularly linked list */
k=0, kk=t%(n+1);
x2:@+if (k==n) @;
oo,p=t+1,q=l[p],x[kk]=0;
x3:@+o,ay[k]=q;
x4:@+ooo,u[k]=p,l[p]=l[q],k++,kk=(kk?kk-1:n);
goto x2;
x5:@+o,p=q,q=l[p];
if (q<=t) {
oo,x[kk]++;
goto x3;
}
x6:@+if (--k>=0) {
kk=(kk==n?0:kk+1);
ooo,p=u[k],q=ay[k],l[p]=q;
goto x5;
}
@ At this point we're ready to do the ``inner loop'' calculation,
by using all counts $c(x_{t-n},\ldots,x_t)$ for $0\le x_{t-n}\le t-n$
to obtain updated counts that will allow us to increase~$t$. The array
$a_{n-1}\ldots a_0$ corresponds to $y_{t-n+1}\ldots y_t$ in the discussion
above; we want to loop over all choices for $y_{t-n}$, namely all
choices for $a_n$. Fortunately there's a linked list containing precisely
those choices, beginning at |l[t+1]|.
@=
{
@;
for (d=0;d<=t+1;d++) o,newcount[d]=0;
oo,b=ay[n-1],c=ay[0];
if (bcc) { /* whirlpool constraint when |a| not middle */
for (d=bb+1;d<=cc;d++) oo,newcount[d]+=tmp;
}@+else { /* whirlpool constraint when |d| not middle */
for (d=0;d<=bb;d++) oo,newcount[d]+=tmp;
for (d=cc+1;d<=t+1;d++) oo,newcount[d]+=tmp;
}
}
if (pdel) {
for (d=0;d<=t-n;d++) oo,count[p+d*pdel]=newcount[j?d:0];
for (;d<=t+1;d++) ooo,x[kk]=d,count[mu(t+1)]=newcount[j?d:0];
}@+else {
for (d=0;d<=t+1;d++) ooo,x[kk]=d,count[mu(t+1)]=newcount[j?d:0];
}
}
goto x6;
}
@ Our example of $\mu_8(k,1,3,2)$ shows that the mission of this
step is sometimes impossible. But the addressing scheme is usually simple,
so I decided to exploit that fact. (Being aware, of course, that
premature optimization is the root of all evil in programming.)
@=
for (tt=t,a=0,r=t%(n+1);
a=tt-n) break;
if (a==n) pdel=0; /* a difficult case */
else {
for (p=pdel=0,a=0;a<=n;a++,r=(r?r-1:n)) {
if (r!=kk) p=p*(tt+1-a)+x[r],pdel=pdel*(tt+1-a);
else p=p*(tt+1-a),pdel=pdel*(tt+1-a)+1;
}
}
@ @=
{
t=n*i+j-1;
if (t;
fprintf(stderr," done with %d,%d ..%lld, %lld mems\n",
i,j,count[0],mems);
}
@ @d thresh 1000000000000000000
@=
for (newcount[0]=newcount[1]=newcount[2]=0,p=tpow[mn]-1;p>=0;p--) {
if (count[p]>newcount[2]) newcount[2]=count[p],pdel=p;
o,newcount[0]+=count[p];
if (newcount[0]>=thresh) ooo,newcount[0]-=thresh,newcount[1]++;
}
fprintf(stderr,"(Maximum count %lld is obtained for params",
newcount[2])@t)@>;
for (q=mn-n-1;q," %d",pdel%(q+1));
pdel/=q+1;
}
fprintf(stderr,")\n"@t(@>);
if (newcount[1]==0)
printf("Altogether %lld %dx%d whirlpool perms (%lld mems).\n",
newcount[0],m,n,mems);
else printf("Altogether %lld%018lld %dx%d whirlpool perms (%lld mems).\n",
newcount[1],newcount[0],m,n,mems);
@*Index.