The Random Sentence Generator

The "Random Sentence Generator" (or RSG as it is affectionately known) is an assignment we have given in various forms and languages in the introductory programming courses here at Stanford. Mike Cleron, a former Stanford lecturer, is credited with originating this nifty assignment idea.

Assignment concept

The basic idea is to have the students generate random expansions from a given context-free grammar. The grammar is described in a text file specifying each non-terminal with its list of possible expansions. The first task the students tackle is parsing the grammar from its file. The file format is pretty rigid, so it doesn't involve a lot of surprises. The data structure we have them build is a hashtable of definitions, where the non-terminal is the key to look up its list of possible productions. Each production is a list of tokens, where each token may be a terminal or another non-terminal.

After parsing the grammar into the table, there is a "start" non-terminal from which the process begins. The program looks up "start" in the hashtable to find its definition. This gives the list of possible productions for that non-terminal. Each production is itself a list of tokens. To expand, the program chooses one of its production at random, and then recursively expands the tokens within the production to generate the random output. When a terminal is encountered, that serves as the base case to halt the recursion.

Over the years, the students have supplied us with a good collection of humorous grammar files (plots for Star Trek episodes, "Dear John" letters, extension requests, haikus, and so on) that make the assignment particularly entertaining to run.

Here try it for yourself in Java:

You need a Java-enabled browser to experience the marvel of the RSG...


The assignment task is fairly straight-forward but involves synthesizing some interesting concepts. The value of the assignment is mostly focused on client use of the collection abstractions to solve an interesting problem, along with a little file-processing and a bit of recursion.

The bulk of the work is done in the first phase of parsing the grammar file since you encourage the students to store the grammar in a data structure that allows for easy expansion later. The grammar files are designed as white-space delimited to minimize tokenizing complication. Giving students access to (or having them write) some simple but robust file-reading routines/ADT/object is helpful here.

As it is read in, the grammar is stored into a layered data structure involving a hashtable of definitions and with nested vectors for the list of productions which are each a list of tokens. The needed collection abstractions can either be pre-provided things (such as the java.util or C++ STL objects) or ones that the students construct themselves. When teaching in C, I do the latter, but have the students build the ADTs in a earlier piece and then assign the RSG as a follow-on to help them separate the roles of implementor and client.

Expanding the grammar is quite simple, given use of the appropriate data structure to store it. The recursion here is not particularly tricky or challenging, but it does help reinforce what the students have earlier learned about recursive problem-solving.

Positioning the assignment

We usually position the RSG as a CS2 or later assignment, since it requires recursion and the use of various ADTs. It's probably not a good fit for the first recursion assignment since the recursion is not particularly central to the entire program. It makes a good "using built-in ADTs" assignment or a good "build the ADTs, then use them" follow-on assignment.

I have quite successfully used the RSG in a programming paradigm class as the "reference program", one that the students develop or see solutions for in a variety of languages (C, Ada, C++, Lisp, Java, etc.) to allow them to compare and contrast the language facilities for solving the same problem. It is pretty good choice for such a task since it stresses a language's facilities for string manipulation, file processing, and managing arbitrary collections, which are pretty useful features in any language.

A particularly notable feature of the RSG is that it doesn't need any platform-specific features (i.e. no graphics, sounds, events, threads, etc.) so it has few portability obstacles. Many of our assignments use flashy graphics, sounds, mouse input, etc. to entice and excite the students but those features tend to tie us to a particular platform. The RSG is one of the most entertaining text-only programs we have assigned.


Here are some resources that might help you adopt this program for your class. First there is the directory of all the grammar files. Many of them are quite humorous, check them out! I also have put up various versions of the handouts (in Adobe PDF) from different incarnations. I made no effort to sanitize these to remove references to my courses, so they will need some tweaking to fit into your schema, but hopefully a lot of the write-up is re-usable. I have RSG solutions in C, Pascal, Ada, LISP, and Java which you are welcome to ask me for. (I won't put them up here in case, your or my students are searching the web to avoid doing their work :-)

Contact info

Let me know if you find this page useful, have additional ideas to contribute, want to provide more grammar files, and so on. I'd love to hear from you!

Julie Zelenski
Department of Computer Science
Stanford University

This page in support Nick Parlante's ACM SIGCSE Symposia 1999 Panel on Nifty Assignments. Other nifty assignments are available!

Last updated: March 19, 1999.