Linguistics 581
Regular Expressions and Finite-State Automata
Preliminaries:
Searching for patterns with regular expressions:
More complex examples
| Definition |
| Kleene-Closure: For any set of strings S, we write the set of all strings that can be formed by concatenating zero or more members of S (including the empty string) as S*. |
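Since S* is infinite whenever S contains a non-empty string, it can only be enumerated up to a length bound. The following is a minimal sketch of that idea; the function name `kleene_closure` and the `max_len` cutoff are illustrative assumptions, not part of the definition.

```python
def kleene_closure(s, max_len):
    """Return the members of S* whose length is at most max_len."""
    result = {""}        # the empty string is always in S*
    frontier = {""}      # strings found in the previous round
    while frontier:
        next_frontier = set()
        for prefix in frontier:
            for piece in s:
                candidate = prefix + piece
                if len(candidate) <= max_len and candidate not in result:
                    result.add(candidate)
                    next_frontier.add(candidate)
        frontier = next_frontier
    return result


# Members of {a, bb}* of length 3 or less, shortest first.
print(sorted(kleene_closure({"a", "bb"}, 3), key=lambda w: (len(w), w)))
# ['', 'a', 'aa', 'bb', 'aaa', 'abb', 'bba']
```

Note that the closure of the empty set still contains the empty string.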
| Definition |
| The concatenation product of the sets of strings A and B (written AB) is the set of strings that can be constructed by concatenating an element of A with an element of B. (Similar to Cartesian product but we're defining a set of strings not a set of pairs). |
| Example: If A = { a, b } and B = { cc, d }, then AB = { acc, ad, bcc, bd } and |
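The concatenation product translates directly into a set comprehension. This is a small sketch that reproduces the example sets A and B; the function name `concat_product` is an illustrative choice.

```python
def concat_product(a, b):
    """Return the set of strings xy with x drawn from a and y drawn from b."""
    return {x + y for x in a for y in b}


A = {"a", "b"}
B = {"cc", "d"}
print(sorted(concat_product(A, B)))  # ['acc', 'ad', 'bcc', 'bd']
```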
A language is a set of strings.
The regular languages are a particular set of languages.
Which set? Given an alphabet Σ:
| Definition |
|
| Theorem |
| The regular languages are closed under intersection. |
| Theorem |
| The regular languages are closed under complementation. |
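One common way to see both closure results is through deterministic machines: swapping accepting and non-accepting states gives the complement, and running two machines in parallel (the product construction) gives the intersection. The sketch below assumes each machine is a complete DFA (a transition for every state and symbol) over a shared alphabet, stored as a plain dictionary; that representation and the two example machines are illustrative assumptions.

```python
from itertools import product


def accepts(dfa, word):
    """Run a complete DFA on word and report whether it ends in an accepting state."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]


def complement(dfa):
    """Swap accepting and non-accepting states (requires a complete DFA)."""
    return {**dfa, "accepting": dfa["states"] - dfa["accepting"]}


def intersection(m1, m2):
    """Product construction: track a pair of states, one per machine."""
    states = set(product(m1["states"], m2["states"]))
    delta = {((p, q), a): (m1["delta"][(p, a)], m2["delta"][(q, a)])
             for (p, q) in states for a in m1["alphabet"]}
    accepting = {(p, q) for (p, q) in states
                 if p in m1["accepting"] and q in m2["accepting"]}
    return {"states": states, "alphabet": m1["alphabet"], "delta": delta,
            "start": (m1["start"], m2["start"]), "accepting": accepting}


# Hypothetical example machines over the alphabet {a, b}.
EVEN_A = {"states": {0, 1}, "alphabet": {"a", "b"}, "start": 0, "accepting": {0},
          "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}}
ENDS_B = {"states": {"n", "y"}, "alphabet": {"a", "b"}, "start": "n", "accepting": {"y"},
          "delta": {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"}}

both = intersection(EVEN_A, ENDS_B)
print(accepts(both, "aab"))                # True: two a's and ends in b
print(accepts(both, "ab"))                 # False: odd number of a's
print(accepts(complement(EVEN_A), "ab"))   # True: odd number of a's
```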
Failure examples:
Deterministic machine for accepting spoken numbers 1-99
Represent FSA as a state-transition table
State-transition table for sheep language
More formally (p. 36):
| Theorem |
| Every regular language is accepted by some FSA (Not too hard to see). |
| Theorem |
| Every language accepted by an FSA is a regular language (Kind of tricky, homework exercise illustrates this). |
| Bottom Line |
| FSAs and regular expressions define the same set of languages. |
| Observation |
| Many infinite sets of strings are regular languages. |
| Example: Σ* |
| Observation |
| All finite sets of strings are regular languages. |
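One way to convince yourself of this observation is to build a loop-free machine shaped like a trie, with one path per string. The sketch below is an illustration under that assumption; state 0 as the start state and the function names are arbitrary choices.

```python
def build_trie_acceptor(words):
    """Build a loop-free DFA (as a transition dict) accepting exactly the given words."""
    delta = {}
    accepting = set()
    next_state = 1                      # state 0 is the start state
    for word in words:
        state = 0
        for symbol in word:
            if (state, symbol) not in delta:
                delta[(state, symbol)] = next_state
                next_state += 1
            state = delta[(state, symbol)]
        accepting.add(state)            # the end of each word is an accepting state
    return delta, accepting


def trie_accepts(delta, accepting, word):
    """Follow the transitions; a missing transition means rejection."""
    state = 0
    for symbol in word:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in accepting


delta, accepting = build_trie_acceptor({"cat", "car", "dog"})
print(trie_accepts(delta, accepting, "car"))   # True
print(trie_accepts(delta, accepting, "ca"))    # False
```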
| Observation |
| An FSA M that accepts an infinite language must have a loop. |
| M has a finite set of states (of cardinality n): To accept any string w of length m (m > n), M must visit at least one state twice. |
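The counting argument can be made concrete by recording the states a machine visits and cutting the input at the first repeated state. The sketch below uses a hand-built four-state machine for a(ba)*c; the transition table, state numbering, and function names are assumptions made for illustration.

```python
# Hypothetical DFA for a(ba)*c: 0 -a-> 1, 1 -b-> 2, 2 -a-> 1, 1 -c-> 3 (accepting).
DELTA = {(0, "a"): 1, (1, "b"): 2, (2, "a"): 1, (1, "c"): 3}
START, ACCEPTING = 0, {3}


def run(word):
    """Return the list of states visited, or None if the machine gets stuck."""
    path = [START]
    for symbol in word:
        key = (path[-1], symbol)
        if key not in DELTA:
            return None
        path.append(DELTA[key])
    return path


def accepts(word):
    path = run(word)
    return path is not None and path[-1] in ACCEPTING


def split_at_loop(word):
    """Split an accepted word into (x, y, z) where y drives the first loop found."""
    path = run(word)
    seen = {}                            # state -> position where it was first visited
    for position, state in enumerate(path):
        if state in seen:
            i = seen[state]
            return word[:i], word[i:position], word[position:]
        seen[state] = position
    return None                          # no state repeated: the word is too short


x, y, z = split_at_loop("ababac")
print((x, y, z))                                       # ('a', 'ba', 'bac')
print(all(accepts(x + y * n + z) for n in range(6)))   # True
```

Because the machine returns to the same state after reading y, repeating or deleting y cannot change where the run ends.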
This leads at once to an important theorem called The Pumping Lemma:
|
| Example |
a(ba)*c is a regular language. Pumping string = xyz, where
|
| a(ba)ⁿc ∈ a(ba)*c, n = 1, 2, 3, ... |
|
The theorem is often used in its contrapositive form:
| Contrapositive Pumping Lemma |
|
If L is an infinite language and there are no strings x, y, z ∈ Σ*, with y non-empty, such that xyⁿz is in L for all n ≥ 0, then L is not a finite-state automaton language. xyⁿz is called a pumping string. y is called the pumpable portion of the pumping string (the portion accepted by the loop in the machine that accepts L). |
| Example |
|
It can be shown, using The Pumping Lemma, that aⁿbⁿ is not a finite state automaton language. |
Clearly aⁿbⁿ IS an infinite language. So if aⁿbⁿ IS a finite state automaton language, the PL applies. We proceed by assuming that aⁿbⁿ IS a finite automaton language and deriving a contradiction.
According to the PL, if L = aⁿbⁿ IS a finite automaton language then there is some value of n, say i, such that
Given the form of L, the pumpable portion y either consists entirely of a's, entirely of b's, or of a sequence of a's followed by a sequence of b's. We investigate all 3 possibilities.
|
But now we've exhausted all the possibilities. Therefore L has no pumping strings. Therefore, by the PL, L is not a finite-state automaton language.
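The case analysis can be spot-checked by brute force: for a few strings aⁱbⁱ, try every split into x, y, z with y non-empty and test whether pumping stays inside the language. This is only a finite sanity check under assumed bounds (i up to 6, n up to 3), not a substitute for the proof; the function names are illustrative.

```python
def in_language(word):
    """True if word has the form a^k b^k for some k >= 0."""
    half, remainder = divmod(len(word), 2)
    return remainder == 0 and word == "a" * half + "b" * half


def pumping_splits(word, max_n=3):
    """Yield every split word = x + y + z (y non-empty) whose pumped strings
    x + y*n + z stay in the language for n = 0 .. max_n."""
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            x, y, z = word[:i], word[i:j], word[j:]
            if all(in_language(x + y * n + z) for n in range(max_n + 1)):
                yield x, y, z


for i in range(1, 7):
    word = "a" * i + "b" * i
    print(i, list(pumping_splits(word)))   # the list is empty for every i
```

A y made only of a's or only of b's unbalances the counts, and a y that straddles the boundary puts a's after b's once it is repeated, which matches the three cases above.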
Deterministic Finite-State Automaton (DFSA) as a recognizer or string accepter:
Implementing a DFSA
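A table-driven recognizer needs only the current state and the next input symbol. The sketch below assumes the sheep language is the textbook's b, then two or more a's, then ! (that is, baa+!), and assumes a five-state table for it; the table layout and the name `d_recognize` are illustrative choices rather than the course's own code.

```python
# Assumed state-transition table for the sheep language baa+!.
# A missing entry means the machine has no move and rejects.
SHEEP_TABLE = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
ACCEPT_STATES = {4}


def d_recognize(tape, table=SHEEP_TABLE, start=0, accept_states=ACCEPT_STATES):
    """Return True if the deterministic machine accepts the whole tape."""
    state = start
    for symbol in tape:
        if symbol not in table[state]:
            return False                 # no transition: reject immediately
        state = table[state][symbol]
    return state in accept_states        # accept only if input ends in a final state


print(d_recognize("baaa!"))   # True
print(d_recognize("ba!"))     # False
```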
Non-Deterministic Finite-State Automata (NFSA)
NFSAs as recognizers:
Implementing an NFSA
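One simple way to recognize with a non-deterministic machine is to track the set of all states it could be in after each symbol, instead of searching with backtracking. The sketch below uses that state-set technique on an assumed non-deterministic table for the sheep language baa+! (state 2 can either loop or move on when it reads an a); it does not handle ε-transitions, and the table and function name are illustrative assumptions.

```python
# Assumed non-deterministic table: each entry maps a symbol to a SET of next states.
NFSA_TABLE = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {2, 3}},     # the non-deterministic choice: stay in 2 or move to 3
    3: {"!": {4}},
    4: {},
}


def nd_recognize(tape, table=NFSA_TABLE, start=0, accept_states=frozenset({4})):
    """Return True if some path through the machine accepts the whole tape."""
    current = {start}
    for symbol in tape:
        # Follow every possible transition from every currently possible state.
        current = {nxt
                   for state in current
                   for nxt in table[state].get(symbol, set())}
        if not current:
            return False                 # every path is stuck
    return bool(current & accept_states)


print(nd_recognize("baaa!"))   # True
print(nd_recognize("ba!"))     # False
```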