Regular Expressions and Finite Automata

Difficulty: ***

The specification for this exercise is quite long, so we gave students a short introductory document to read the night before the test. Most students did very well in Parts I and II, but were then slowed down by Part III. Common errors were in the state labelling and/or the chaining of the state identifiers into and out of the make function. Quite a number of students had a go at Part IV, which is hard to get your head around in the time available. Amazingly, one student completed the entire exercise in just over two hours, scoring 100%, so it could be done.

I have made one small adjustment to the original structure by including the labels function in Part II. A number of students who attempted Part IV made heavy weather of groupTransitions, which should be a one-liner. The labels function, simple though it is, might help.

The acceptance testing algorithm used here is the simple backtracking version. However, if you implement getFrontier correctly from Part IV then you’ll find that it’s relatively straightforward to use it to build the breadth-first version of acceptance testing which is substantially more efficient (see below).

Remark: Most treatments of NFA to DFA conversion are based on the calculation of the so-called $\epsilon$-closure of a set of states. The trouble with this is that it introduces superfluous states in the resulting DFA. These are avoided by using the $\epsilon$-frontier instead, as described in the specification. Does the use of frontiers yield the minimal DFA? Sadly not.

Specification

Template File

More Information

There are countless on-line sources of information on regular expressions and automata. I would particular recommend Chapter 10 of Al Aho and Jeff Ullman’s on-line book on Foundations of Computer Science (C version). There is also an excellent article by Russ Cox on the efficiency of regular expression matching which addresses, in particular, the advantages of breadth-first matching over backtracking. It also includes further details of Ken Thompson’s original paper on regular expression search algorithms. For a tutorial-style introduction to regular expressions and automata in Haskell, this article by Simon Thompson develops variations of several of the algorithms covered here and also addresses the issue of DFA minimisation. A neat translator from NFAs to DFAs can also be found in Wouter Swierstra’s article on Practical Graph Handling.