HaLeX

A Haskell Library to Model, Manipulate and Animate Regular Languages

Joăo Saraiva

Department of Computer Science

University of Minho

Joăo Saraiva

jas@di.uminho.pt 1: The HaLeX Library

 This library defines a number of Haskell data types and functions that manipulate them, which provide the following facilities: Definition of regular expressions , non-deterministic and deterministic finite automata directly and straightforwardly in Haskell. Definition of the acceptance functions for all those models. Transformation of regular expressions into non-deterministic and deterministic finite automata. Minimization of the states of deterministic finite automata. Lex-like facilities , e.g. it generates a Haskell program from a regular expression (expressed as it's minimized finite automata). Equivalence of regular expressions and finite automata. Definition of reactive finite automata . Graphical representation of finite automata. Animation of the acceptance function of finite automata.

2: The HaLeX Library

 The HaLeX library is a small extension to Simon Thompson, "Regular Expressions and Automata using Haskell", January 2000. Johan Jeuring and Doaitse Swierstra, "Advanced Programming Concepts in a Course on Grammars and Parsing", FDPE'99. Basically, it extends definitions in these papers with the graphic visualization of automata the definition of reactive finite automata.

3: Motivation

 This library was developed in the context of a programming methodology course for undergraduate students (3rd year of our degree on software engineering). It was developed mainly with educational purposes : It was developed in order to provide a clear , efficient and concise way to define, to understand and to manipulate regular languages in Haskell. The construction of parts of this library has been proposed as assigment projects the students had to solve in that course. HaLeX is now being used to support the programming methodology course.

4: Regular Expressions

Regular expressions are modelled in Haskell by the abstract data type RegExp . We define one data constructor for each "standard" regular expression operator.

 data RegExp = Epsilon | Literal Char | Or RegExp RegExp | Then RegExp RegExp | Star RegExp | OneOrMore RegExp | Optional RegExp

and we can write:
 a = Literal 'a' aOrb = ( Literal 'a') ` Or ` ( Literal 'b')

5: Deterministic Finite Automata

Formally, a Deterministic Finite Automaton (DFA) is a five-tuple A = (V,Q,S,Z,D) . In order to make our code more readable, we prefer to define a new data type, to express the automata's five-tuple, named Dfa with a single constructor, named Dfa as well, rather than using the Haskell built-in tuples. To make our DFA's more general, we parameterize the data type with the types of states and symbols

 data Dfa st sy = Dfa [sy] -- Vocabulary [st] -- Finite set of states st -- The start state [st] -- The set of final states (st - > sy - > st) -- The transition function

where sets are modelled through the Haskell built-in lists.

6: Haskell-based deterministic finite automata: example ex1 :: Dfa Char Char ex1 = Dfa ['a','b'] ['A','B','C','D'] 'A' ['C'] delta where delta 'A' 'a' = 'A' delta 'A' 'b' = 'B' delta 'B' 'a' = 'C' delta _ _ = 'D'

7: Non-Deterministic Finite Automata with Epsilon Transitions

We define non-deterministic finite automata with epsilon transitions and several initial states.

To epsilon transitions we use the pre-defined datatype Maybe, where the constructor Nothing models epsilon-transitions, while the constructor Just models transitions labeled by its argument symbol.

 data Ndfa st sy = Ndfa [sy] -- Vocabulary [st] -- Finite set of states [st] -- The set of start states [st] -- The set of final states (st - > Maybe sy - > [st]) -- The transition function

8: Non-Deterministic Finite Automata with Epsilon Transitions: Example

 ex1 :: Ndfa Char Char ex1 = Ndfa ['a','b'] ['A','B','C'] ['A'] ['C'] delta where delta 'A' ( Just 'a') = ['A','B'] delta 'A' ( Just 'b') = ['B'] delta 'B' ( Just 'a') = ['C'] delta 'C' Nothing = ['A','B'] delta _ _ = [] 9: The Acceptance Functions

We have defined acceptance functions for all these models.

For example, the acceptance function of a DFA expresses the recursion pattern captured by the Haskell prelude foldl function. Thus, the function accept is concisely defined as follows:

 accept ( Dfa v q s z delta) simb = ( foldl delta s simb) `elem` z

10: Graphic Representation of Finite Automata

 Finite automata are more comprehensible in a graphical representation. Usually automata are represented as graphs: states are depicted as the nodes of a graph, final states get a double circle, and the initial states are explicitly mentioned usually with an incoming arrow. But, drawing graphs is a complex task and an intensive area of research. Rather than implementing a graph visualization system from scracth (and reinventing the wheel...), we compute a concrete representation for a high quality graph visualization system: the GraphViz system . GraphViz provides a collection of tools for manipulating graph structures and generating graph layouts.

11: Graphic Representation of Finite Automata digraph HaLeX { rankdir = LR ; "1" [shape = circle , color = green]; "2" [shape = doublecircle , color = red]; "3" [shape = doublecircle , color = red]; node [shape = circle , color = black]; "1" - > "2" [label = "'a'"]; "1" - > "3" [label = "'b'"]; "2" - > "4" [label = "'a','b'"]; "3" - > "3" [label = "'a'"]; "3" - > "4" [label = "'b'"]; "4" - > "4" [label = "'a','b'"]; }

12: Graphic Representation of Finite Automata

 The HaLeX library includes a pretty-printing function, named toGraph that, given a FA, produces its (textual) graph representation in the Dot language. In order to beautify the graphical representation, we have included options to ignore sink states to fuse transitions with the same origin and destination. We omit here its trivial implementation.

13: Reactive Finite Automata

 While accepting a sentence, we may wish the automaton to react. For example, we may wish to compute some trace information to perform some IO operations to compute semantic functions A reactive finite automaton (RFA) is a machine that reacts while moving from one state to another. To add reactions to finite automata we use monads and we define monadic finite automata.

14: Reactive Deterministic Finite Automata

Because we wish to associate different effects to finite automata, we parameterize the type of the DFA with the type of the monad. The reactions are defined in the (monadic) transition function. As a result, those functions not only have to indicate the destination state(s), but also the reactions to be performed in the transitions.

 data Dfa m st sy = Dfa [sy] -- Vocabulary [st] -- Finite set of states st -- The start state [st] -- The set of final states (st - > sy - > m st) -- The Monadic transition function

15: Acceptance Function

The acceptance function is now defined as the standard \Haskell\ monadic foldM function:

 dfaaccept ( Dfa _ _ s z delta) sent = do st < - foldM delta s sent return (st `elem` z)

16: Communication protocol

 ptl :: Dfa (State ([Char],[Int])) Int Char ptl = Dfa ['1','0'] [1,2,3,4,5,6,7,8,9,10,11,12,13] 1  delta where delta 1 '0' = return 2 delta 2 '0' = return 3 delta 3 '0' = return 4 delta 4 s = do { bits s ; return 5 } delta 5 s = do { bits s ; return 6 } delta 6 s = do { bits s ; return 7 } delta 7 '0' = do { values ; return 8 } delta 7 '1' = do { values ; return 10 } delta 8 '0' = return 9 delta 9 '1' = return 4 delta 10 '1' = return 11 delta 11 '1' = return 12 delta _ _ = return 13 bits x = modify (\ s - > ((fst s) + + [x],snd s)) values = modify (\ s - > ("",snd s + + [bits2int (fst s)])) 17: Animating Regular Languages

 The acceptance function finds a path from the initial state into a final state of the automata. HaLeX animations are just the step-by-step display of such a path in the graphic representation of the automaton Thus, to animate finite automata we need to compute the sequence of visited nodes. This trace information can be easily computed using our reactive FA: the monadic automata just have to accumulate in the state the nodes being visited.

18: Animating Regular Languages

For example, HaLeX derives such monadic DFA from regular expressions

 mdfa :: Dfa (State [Char]) Int Char mdfa = Dfa v q s z delta where v = " + -d." q = [1,2,3,4,5,6] s = 1 z = [3,6] delta 1 ' + ' = do { accum '2' ; return 2 } delta 1 '-' = do { accum '2' ; return 2 } delta 1 'd' = do { accum '3' ; return 3 } delta 1 '.' = do { accum '4' ; return 4 }

19: The halex Tool

 HaLeX is available at: http://www.di.uminho.pt/~jas/Research/HaLeX/HaLeX.html Having defined the HaLeX library, we can now define useful tools to manipulate and visualize regular languages. We have constructed the halex program entirely in HaLeX. Tool Demo

20: More Animations?

 Today, 16:30, GPCE Joăo Saraiva, "Component-based Programming for Higher-Order Attribute Grammars"