Linguistics 570
Regular Expressions and Finite-State Automata
| Definition |
| Kleene closure: For any set of strings S, we write the set of all possible strings composed using only members of S (including the empty string) as S*. |
| Definition |
| The concatenation product of the sets of strings A and B (written AB) is the set of strings that can be constructed by concatenating an element of A with an element of B. (This is similar to the Cartesian product, but we are defining a set of strings, not a set of pairs.) |
| Example: If A = { a, b } and B = { cc, d }, then AB = { acc, ad, bcc, bd } and |
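The two definitions above can be tried out directly on small sets. The following is a minimal sketch, not part of the original notes: the helper names `concat` and `kleene_upto` are hypothetical, and because S* is usually infinite the second helper only lists members up to an assumed length bound.

```python
from itertools import product

def concat(A, B):
    """Concatenation product AB: each element of A followed by each element of B."""
    return {a + b for a, b in product(A, B)}

def kleene_upto(S, max_len):
    """Members of S* of length <= max_len (a finite slice of a usually infinite set)."""
    result = {""}        # the empty string is always in S*
    frontier = {""}      # strings added in the previous round
    while frontier:
        # Extend each newly found string by one more member of S.
        frontier = {w + s for w in frontier for s in S
                    if len(w + s) <= max_len} - result
        result |= frontier
    return result

A = {"a", "b"}
B = {"cc", "d"}
AB = concat(A, B)   # the set from the example: acc, ad, bcc, bd
BA = concat(B, A)   # order matters: cca, ccb, da, db
short_closure = kleene_upto(A, 2)
```

Comparing `AB` with `BA` shows that the concatenation product is not commutative, which is one way it differs from a plain set operation such as union.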
A language is a set of strings.
The regular languages are a particular set of languages.
Which set? Given an alphabet Sigma:
| Definition |
|
Searching for patterns with regular expressions:
To achieve the full power of the definition of regular languages, you need the following:
Complex expressions can be combined using the operators:
| Theorem |
| The regular languages are closed under intersection. |
| Theorem |
| The regular languages are closed under complementation. |
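One standard way to make these two closure properties concrete is to work with deterministic finite automata. The sketch below is an illustration under stated assumptions, not the argument given in the lecture: it assumes each language is given as a complete DFA over a shared alphabet, and the class `DFA`, the functions `complement` and `intersection`, and the two sample machines are all hypothetical names.

```python
from itertools import product

class DFA:
    """A complete DFA: delta maps every (state, symbol) pair to a state."""
    def __init__(self, states, alphabet, delta, start, accepting):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = dict(delta)
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, w):
        # Assumes every character of w is in the alphabet.
        q = self.start
        for ch in w:
            q = self.delta[(q, ch)]
        return q in self.accepting

def complement(m):
    """Same machine, with accepting and non-accepting states swapped."""
    return DFA(m.states, m.alphabet, m.delta, m.start, m.states - m.accepting)

def intersection(m1, m2):
    """Product construction: run both machines in lockstep on paired states."""
    states = set(product(m1.states, m2.states))
    delta = {((p, q), ch): (m1.delta[(p, ch)], m2.delta[(q, ch)])
             for (p, q) in states for ch in m1.alphabet}
    accepting = {(p, q) for (p, q) in states
                 if p in m1.accepting and q in m2.accepting}
    return DFA(states, m1.alphabet, delta, (m1.start, m2.start), accepting)

# Sample machine 1: strings over {a, b} with an even number of a's.
even_a = DFA({0, 1}, "ab",
             {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1},
             0, {0})
# Sample machine 2: strings over {a, b} that end in b.
ends_b = DFA({"n", "y"}, "ab",
             {("n", "a"): "n", ("y", "a"): "n", ("n", "b"): "y", ("y", "b"): "y"},
             "n", {"y"})

both = intersection(even_a, ends_b)
odd_a = complement(even_a)
```

Here `both` accepts exactly the strings that have an even number of a's and end in b, and `odd_a` accepts exactly the strings that `even_a` rejects.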
Also in the usual battery of regular language tricks:
Question: Are the regular languages closed under union?
| Theorem |
| Every regular language is accepted by some FSA (not too hard to see). |
| Theorem |
| Every language accepted by an FSA is a regular language (kind of tricky; another lecture illustrates this). |
| Summary point |
| FSAs and regular expressions define the same set of languages. |
| Observation |
| Many infinite sets of strings are regular languages. |
| Example: Sigma* |
| Observation |
| All finite sets of strings are regular languages. |
| Observation |
| An FSA accepting an infinite language must have a loop. |
| Reason: A finite set of states must serve to accept an infinite set of strings. |
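The loop observation can be seen by tracing a small machine. The sketch below is an assumption-laden illustration rather than material from the lecture: it hand-builds a four-state automaton for the expression a(ba)*c (the state names `q0` to `q3` are hypothetical), checks it against Python's `re` module on all short strings over {a, b, c}, and records the states visited on one accepted string.

```python
import re
from itertools import product

# Hypothetical FSA for a(ba)*c. Missing transitions mean "reject".
DELTA = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q2",
    ("q2", "a"): "q1",   # closes the loop q1 -> q2 -> q1
    ("q1", "c"): "q3",
}
START, ACCEPT = "q0", {"q3"}

def run(w):
    """Return the list of states visited on w, or None if the machine gets stuck."""
    q, trace = START, [START]
    for ch in w:
        q = DELTA.get((q, ch))
        if q is None:
            return None
        trace.append(q)
    return trace

def accepts(w):
    trace = run(w)
    return trace is not None and trace[-1] in ACCEPT

# Compare the automaton with the regular expression on every string of length 0..6.
pattern = re.compile(r"a(ba)*c")
agree = all(
    accepts("".join(t)) == bool(pattern.fullmatch("".join(t)))
    for n in range(7) for t in product("abc", repeat=n)
)

# An accepted string longer than the number of states must revisit a state.
trace = run("ababac")
has_repeat = len(set(trace)) < len(trace)
```

The trace for `ababac` has seven entries but the machine has only four states, so some state must occur more than once; that repeated state marks the loop.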
Formalization of this intuition:
| Theorem: The Pumping Lemma |
| If L is an infinite finite automaton language over alphabet Sigma, then there are strings x, y, z in Sigma*, y non-empty, such that xy^nz is in L for all n ≥ 0. |
| Example |
|
a(ba)*c denotes a regular language. Pumping string:
|
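A pumping decomposition for this language can be checked mechanically. The sketch below assumes one possible choice, x = a, y = ba, z = c, which is not necessarily the one used in the lecture; the helper name `pump` is hypothetical, and Python's `re.fullmatch` stands in for membership in the language.

```python
import re

LANG = re.compile(r"a(ba)*c")

def pump(x, y, z, n):
    """Build the string x y^n z."""
    return x + y * n + z

# Assumed decomposition: y is one trip around the (ba) repetition.
x, y, z = "a", "ba", "c"
pumped = [pump(x, y, z, n) for n in range(5)]
all_in_language = all(LANG.fullmatch(w) is not None for w in pumped)

# Not every split works: with x = "", y = "a", z = "c", taking n = 0 gives "c".
bad = pump("", "a", "c", 0)
bad_in_language = LANG.fullmatch(bad) is not None
```

The lemma only promises that some decomposition pumps, which is why the second split failing is not a contradiction.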
The theorem is often used in its contrapositive form:
| Contrapositive Pumping Lemma |
|
If there are no strings x, y, z in Sigma*, y non-empty, such that xy^nz is in L for all n ≥ 0, then L is not an infinite finite automaton language. |
| Example |
|
It can be shown, using the Pumping Lemma, that a^nb^n is not an infinite finite automaton language. |
Proof by enumeration of cases. Since y cannot be empty, there are 3 possibilities for a pumping string for a^nb^n.
In each of these cases, using y as the repeating part of a pumping string generates strings that are not in the language:
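A brute-force check can illustrate the case analysis. The sketch below assumes the three possibilities are the usual ones (y consists only of a's, only of b's, or of a's followed by b's) and that the language includes the empty string; the function names are hypothetical, and a finite check of this kind illustrates the argument rather than proving it.

```python
def in_anbn(w):
    """True iff w = a^k b^k for some k >= 0."""
    k = len(w) // 2
    return w == "a" * k + "b" * k

def classify(y):
    """Name which of the three assumed cases a non-empty substring y falls into."""
    if set(y) == {"a"}:
        return "only a's"
    if set(y) == {"b"}:
        return "only b's"
    return "a's followed by b's"

def decompositions(w):
    """Yield every (x, y, z) with x + y + z == w and y non-empty."""
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            yield w[:i], w[i:j], w[j:]

cases_seen = set()
pumping_ever_works = False
for k in range(1, 6):
    w = "a" * k + "b" * k
    for x, y, z in decompositions(w):
        cases_seen.add(classify(y))
        # Repeat y twice (n = 2) and test membership.
        if in_anbn(x + y * 2 + z):
            pumping_ever_works = True
```

For every split tried, doubling y either unbalances the counts of a's and b's or puts a b before an a, so `pumping_ever_works` stays `False`.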