FSA transducers

Linguistics 581

FSAs

An FSA defines a set of strings.

It is primarily used to accept and reject strings.

FSTs

Instead of defining sets of strings, FSTs define sets of PAIRS of strings. A set of pairs in set theory is called a relation.

FSTs define relations on strings. Not all string relations can be captured by FSTs (as we'll see). The set that can be captured is called the regular relations.

Example 1: Sheep language transducer: transduces "b"s into 2 and "a"s into 1. Leaves "!" alone.

Example 2: A and B swapper: Given a string that consists of an even number of a's followed by an even number of b's, it will swap the a's and b's. Run the demo at the bottom of the file.

An FST is still an FSA with a relational alphabet.

Upper and Lower

An FST therefore has TWO alphabets, the upper alphabet (left side of the colon) and the lower alphabet (right side of the colon).

In the sheep language transducer:
```
    Upper = {a,b, !}
    Lower = {1,2, !}
    
```
because the transition label "!" is an abbreviation for "!:!".
In the A and B swapper, the upper and lower alphabets are the same.
An FST also has two languages, an upper and a lower language. In the sheep language transducer, the upper language is baa⁺! and the lower language is 211⁺! (since b is transduced to 2 and a to 1).
An FST defines a correspondence between the strings of the upper and lower language; it can be many-one in either direction.

Recognition, Analysis, Generation

The basic idea of a transducer is that it is an FSA with two tapes, which we will call the upper and lower tapes.

Transducers can be run in three ways:

As recognizers: They read both tapes and check to see that the strings on the two tapes stand in the relation described by the machine.
As analyzers: They read the lower tape and find all possible upper strings that stand in the relation described by the machine. In our terms they find all analyses of a surface string.
As generators: They read the upper tape and find all possible lower strings that stand in the relation described by the machine. In our terms they find all possible realizations of an underlying string.
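The three modes can be sketched over a single arc list. The toy transducer below is hypothetical (upper "a" and "b" both surface as "x"), chosen so that analysis returns more than one answer; it has no epsilon arcs, so the two tapes always have equal length.

```python
# Hypothetical toy transducer: each arc is (source, upper symbol, lower symbol, target).
ARCS = [(0, "a", "x", 0), (0, "b", "x", 0), (0, "c", "c", 0)]
START, FINALS = 0, {0}


def run(tape, read_side):
    """Read `tape` on one side and collect every string written on the other side."""
    results = set()
    stack = [(START, 0, "")]
    while stack:
        state, i, out = stack.pop()
        if i == len(tape):
            if state in FINALS:
                results.add(out)
            continue
        for src, up, lo, dst in ARCS:
            read, write = (up, lo) if read_side == "upper" else (lo, up)
            if src == state and read == tape[i]:
                stack.append((dst, i + 1, out + write))
    return results


def generate(upper):
    """Generator mode: upper tape in, all lower strings out."""
    return run(upper, "upper")


def analyze(lower):
    """Analyzer mode: lower tape in, all upper strings out."""
    return run(lower, "lower")


def recognize(upper, lower):
    """Recognizer mode: read both tapes in step and check the pairing."""
    if len(upper) != len(lower):
        return False  # valid only because this sketch has no epsilon arcs
    states = {START}
    for u, l in zip(upper, lower):
        states = {dst for src, up, lo, dst in ARCS
                  if src in states and (up, lo) == (u, l)}
    return bool(states & FINALS)


print(generate("ab"), analyze("xc"), recognize("ab", "xx"))
```

The same arcs serve all three modes; only the choice of which side is read and which is written changes.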

Non-Determinism

As with FSAs, FSTs can be deterministic or non-deterministic.

Example. Non-deterministic sheep language transducer.

Epsilons

FSTs can have epsilon transitions:

∅:a An "a" on the lower tape that does not correspond to anything on the upper tape. (insert an "a" on the lower tape)
a:∅ The "a" on the upper tape does not correspond to anything on the lower tape. (omit the "a" on the lower tape)
∅:∅ Jump to another state without reading either tape.

All of these possibilities have their uses. The only truly pernicious epsilon uses, as with FSAs, involve epsilon loops.

Examples:

A ∅:∅ loop: Can endlessly loop without advancing either tape.
A ∅:0 loop: While generating, can endlessly insert 0's. While analyzing, just ignores leading 0s.
A 0:∅ loop: While analyzing, can endlessly hallucinate underlying 0's. While generating, just deletes leading 0s in underlying structure.
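To see why an insertion loop is troublesome for generation, here is a small sketch with a hypothetical two-state machine whose start state has a ∅:0 loop. Without a bound the search would never finish, so the function takes a cap on the lower-string length (`max_lower_len`, an assumption of this sketch, not something a real FST toolkit necessarily offers).

```python
EPS = ""  # stands in for the empty symbol ∅

# Hypothetical machine: (source, upper symbol, lower symbol, target).
ARCS = [
    (0, EPS, "0", 0),   # ∅:0 loop: insert a 0 on the lower tape
    (0, "1", "1", 1),
    (1, "1", "1", 1),
    (1, "0", "0", 1),
]
START, FINALS = 0, {1}


def generate(upper, max_lower_len):
    """Collect all lower strings for `upper`, up to a length bound."""
    results, seen = set(), set()
    stack = [(START, 0, "")]
    while stack:
        config = stack.pop()
        if config in seen:
            continue
        seen.add(config)
        state, i, out = config
        if len(out) > max_lower_len:
            continue  # the bound is the only thing that stops the ∅:0 loop
        if i == len(upper) and state in FINALS:
            results.add(out)
        for src, up, lo, dst in ARCS:
            if src != state:
                continue
            if up == EPS:
                stack.append((dst, i, out + lo))       # upper tape does not advance
            elif i < len(upper) and upper[i] == up:
                stack.append((dst, i + 1, out + lo))
    return results


print(sorted(generate("10", 4)))
```

Raising the bound adds one more output for each extra leading 0, which is the endless insertion described above.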

Rules

One application of FSTs is to implement linguistic rules.

Closure Properties

From Jurafsky and Martin, Ch 2, we learned: FSAs are closed under:

Union
Intersection
Concatenation
Complementation
Reversal

FSTs have only a subset of these closure properties:

Union
Concatenation

Example:

R1= { < aⁿ, b^*cⁿ > | n > 0 } : A relation that pairs n underlying a's with any number of b's followed by exactly n c's.
R2= { < aⁿ, bⁿc^* > | n > 0 } : A relation that pairs n underlying a's with exactly n b's followed by any number of c's.
Int(R1,R2) = { < aⁿ, bⁿcⁿ > | n > 0 }

No regular relation can have a non-regular language as either its upper or lower language. But the lower language of this relation IS non-regular. So this relation is non-regular. And yet it is the intersection of two regular relations.
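The intersection can be checked on a finite sample. The sketch below enumerates bounded fragments of R1 and R2 (the bounds `max_n` and `max_k` are arbitrary assumptions) and intersects them as plain Python sets of pairs; it illustrates the example, it does not prove non-regularity.

```python
def r1(max_n, max_k):
    """Finite fragment of R1: n a's paired with any number of b's then exactly n c's."""
    return {("a" * n, "b" * k + "c" * n)
            for n in range(1, max_n + 1) for k in range(max_k + 1)}


def r2(max_n, max_k):
    """Finite fragment of R2: n a's paired with exactly n b's then any number of c's."""
    return {("a" * n, "b" * n + "c" * k)
            for n in range(1, max_n + 1) for k in range(max_k + 1)}


both = r1(3, 3) & r2(3, 3)
for upper, lower in sorted(both):
    print(upper, lower)
```

Every surviving pair has equally many a's, b's, and c's, matching Int(R1,R2).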

In general for a transducer T1 and T2:

Intersect(Paths(T1),Paths(T2)) might only identify a subset of Intersect(L(T1),L(T2)). Why?

Finite-State Model

Phonological rules LOOK like context-sensitive grammar rules:

A -> B \ LC _ RC This is the FORM of context sensitive grammar rules. Chomsky grammar hierarchy:

Recursively enumerable languages (any grammar)
Context-sensitive languages (CS rules as above)
Context-Free languages (A -> w)
Regular languages (A -> w, no center embedding)

Only regular languages are describable with Finite-state automata!

It turns out the languages described by Phonological rules are regular, because they obey an implicit restriction:

No rule ever feeds itself. (Johnson 1972) For example, consider a rule like:

∅ -> ab \ a ___ b If this rule is allowed to reapply n times in the same string position it can generate the non-regular language (context-free) aⁿ bⁿ

Note: It's not the rule itself that is the problem. It's the way the rule is allowed to apply: The domain of application of the rule must always move left or right, including the case where material is inserted. Using the description from Karttunen (1991), and using "^" to mark the position where the rule applies:

Reapplying to output pos	Drifting right
a^b	a^b
aa^bb	aab^b
aaa^bbb	aabab^b
aaaa^bbbb	aababab^b

Applying the rule in the drifting-right regime (a convention assumed without comment by phonologists), the language after n applications is a(ab)ⁿb, which is regular.
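The two regimes in the table can be replayed with a few lines of position bookkeeping. This is only a sketch: it tracks where the insertion lands, as the table does, and does not re-check the a ___ b context at each step. The function name and the `drift` flag are hypothetical.

```python
import re


def apply_repeatedly(n, drift):
    """Insert "ab" n times, starting from a^b.

    drift=False: the rule reapplies inside its own output (position moves by 1).
    drift=True:  the position moves right past the inserted material (by 2).
    """
    s, pos = "ab", 1
    for _ in range(n):
        s = s[:pos] + "ab" + s[pos:]
        pos += 2 if drift else 1
    return s


for n in range(4):
    print(apply_repeatedly(n, drift=False), apply_repeatedly(n, drift=True))

# Every drifting-right output matches the regular expression a(ab)*b.
assert all(re.fullmatch(r"a(ab)*b", apply_repeatedly(n, True)) for n in range(10))
```

The first column grows as aⁿ bⁿ, which no finite-state machine can accept; the second stays inside a(ab)*b.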

Rewriting Rules as Regular Relations

We allow rewrite rules. In the simplest case of a rewrite rule, an underlying form is ALWAYS rewritten into some surface form:

abc -> de

This says EVERY instance of "abc" is turned into a "de". This means the string "abc" is not part of the surface language.

In general, rewrite rules have contexts. Consider a rule:

a -> b \ c _ d This rewrite rule has a left and right context. In effect, it says "cad" rewrites as "cbd".

cad -> cbd It can be thought of as describing a RELATION R between strings.

What strings stand in the relation? Those that obey the rule.

Sample Strings Table
In: Underlying (Lexical)	Out: Surface	Allowed?
cad	cbd	Yes
cad	cad	No
cae	cae	Yes
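One way to test whether a pair of strings obeys the rule is to apply the rule to the underlying string and compare. The sketch below treats the rule as obligatory and uses Python's `re` lookarounds for the contexts; the names `apply_rule` and `allowed` are hypothetical.

```python
import re


def apply_rule(underlying):
    # Obligatory rewrite a -> b \ c _ d: every "a" between "c" and "d" becomes "b".
    # The lookbehind and lookahead check the context without consuming it.
    return re.sub(r"(?<=c)a(?=d)", "b", underlying)


def allowed(underlying, surface):
    """True if the (underlying, surface) pair stands in the rule's relation."""
    return apply_rule(underlying) == surface


for pair in [("cad", "cbd"), ("cad", "cad"), ("cae", "cae")]:
    print(pair, allowed(*pair))
```

Because the rule is obligatory, "cad" paired with itself is rejected, while a string that never meets the context is simply paired with itself.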

In fact, each rule defines a regular relation. [No proof given. Notice this is a claim about the practice of phonologists as well as a claim about human languages.]

Key Idea

Since each rule defines a regular relation, it can be described by a Finite-State Transducer.

For an industrial-strength implementation of Finite-State Transducers, see xfst (Finite-State Morphology Home page):

xfst[3]: read regex [a b c -> d e];
472 bytes. 5 states, 22 arcs, Circular.
xfst[4]: apply up
apply up> abc
apply up> de
abc
de
apply up> xr
xr
(type Control-D)
xfst[4]: apply down
apply down> abc
de
apply down> de
de
apply down> abcabc

dede
apply down> xr
xr
apply down> 
(type Control-D)
xfst[4]: apply up
apply up> dede
abcabc
abcde
deabc
dede
apply up>
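The behavior in this session can be imitated in plain Python. This is a simulation of the relation for the single rule abc -> de, not of xfst itself: `apply_down` is a string replacement, and `apply_up` (a hypothetical helper) tries each way of undoing a "de" and keeps the candidates that map back down to the surface string.

```python
from itertools import product


def apply_down(upper):
    """Rewrite every "abc" as "de"."""
    return upper.replace("abc", "de")


def apply_up(surface):
    """Find every upper string that apply_down maps to `surface`."""
    parts = surface.split("de")
    results = set()
    # Each surface "de" is either an original "de" or the output of "abc".
    for fillers in product(("de", "abc"), repeat=len(parts) - 1):
        candidate = parts[0] + "".join(f + p for f, p in zip(fillers, parts[1:]))
        if apply_down(candidate) == surface:
            results.add(candidate)
    return results


print(apply_down("abcabc"))
print(sorted(apply_up("dede")))
print(apply_up("abc"))
```

Applying up to "abc" yields nothing because "abc" can never survive on the surface, which matches the empty response in the transcript.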

It is possible to define rules that apply in parallel, simultaneously to every portion of the string they are suited for:

% xfst
xfst[0]: read regex [a -> b, b -> a]
;
124 bytes. 1 state, 3 arcs, Circular.
xfst[1]: apply up
apply up> abba
baab
apply up> exit
exit
apply up> 
xfst[1]: apply down
apply down> abba
baab
apply down> bbbbb
aaaaa
apply down> aaaaa
bbbbb
apply down>
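The difference between parallel and sequential application is easy to see in Python. The sketch uses `str.translate` for the simultaneous swap and chained `str.replace` calls for the one-rule-after-another version; it illustrates the idea only and says nothing about how xfst compiles the rules.

```python
# Parallel: both replacements look at the ORIGINAL string.
SWAP = str.maketrans("ab", "ba")


def parallel(s):
    return s.translate(SWAP)


def sequential(s):
    # a -> b first, then b -> a: the second rule also hits the first rule's output.
    return s.replace("a", "b").replace("b", "a")


print(parallel("abba"))    # same pairing as the xfst session
print(sequential("abba"))  # everything collapses to a's
```

Sequential application feeds the output of the first rule to the second, so the swap is lost.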

The regular relation of a rule

Suppose we have a rule of the form:

a -> b \ L _ R

A rough approximation of the corresponding regular relation is:

[ Id(Σ*) Opt( Id(L) (a × b) Id(R) ) ]*

where '*' is just the usual Kleene star, 'Id' of a set gives the identity relation on the set, 'Opt' (optional) of a set unions it with the empty string, and '×' is just the Cartesian product.

This is approximate because it leaves out the complications due to the fact that the output of a rule may serve as a context for another application.

e-insertion Revisited

Here's the machine in the textbook.

Here's another version generated by software tools we have in the CL lab (fsa).

Sample Strings Table
In: Underlying (Lexical)	Out: Surface	Allowed?
fox^s#	fox^es#	Yes
fox^s#	fox^s#	No
fop^s#	fop^s#	Yes
lag	lag	Yes
lag	lak	?
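The first four rows of the table can be checked with a regular-expression sketch of e-insertion. The rule as coded here (insert "e" after a morpheme boundary "^" that follows x, s, or z and precedes "s#") is an assumption based on the textbook's statement of the rule; it is not spelled out in these notes. The last row is left out because it depends on feasible pairs rather than on this rule.

```python
import re


def insert_e(lexical):
    # Zero-width match: the spot right after "x^", "s^", or "z^" and right before "s#".
    return re.sub(r"(?<=[xsz]\^)(?=s#)", "e", lexical)


def allowed(lexical, surface):
    """True if the pair obeys the (obligatory) e-insertion rule."""
    return insert_e(lexical) == surface


rows = [("fox^s#", "fox^es#"), ("fox^s#", "fox^s#"), ("fop^s#", "fop^s#"), ("lag", "lag")]
for lexical, surface in rows:
    print(lexical, surface, allowed(lexical, surface))
```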

Feasible Pairs

Corresponding to the notion of "alphabet" in our FSAs we have the notion "feasible pairs" in our FST.

If g:k is a feasible pair, then the last row in the table is:

lag	lak	Yes

Otherwise no. In other words there's some background "space" or "theory" of what can be realized as what.

Most sounds can be realized as themselves in some environment. Why?

So each sound paired with itself is one of the feasible pairs:

a:a,b:b,c:c, ... We usually abbreviate a:a as a. So

g o:e o:e s e abbreviates

g:g o:e o:e s:s e:e
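The abbreviation convention is mechanical enough to write down. In this sketch (function names are hypothetical) a label without a colon is expanded to an identity pair, and the upper and lower strings are read off the two sides.

```python
def expand(labels):
    """Expand abbreviated labels: "g" becomes "g:g", "o:e" stays as it is."""
    return " ".join(lab if ":" in lab else f"{lab}:{lab}" for lab in labels.split())


def tapes(labels):
    """Return the (upper, lower) strings spelled out by a label sequence."""
    pairs = [lab.split(":") for lab in expand(labels).split()]
    upper = "".join(up for up, _ in pairs)
    lower = "".join(lo for _, lo in pairs)
    return upper, lower


print(expand("g o:e o:e s e"))
print(tapes("g o:e o:e s e"))
```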

Identity Relation

The regular relation which allows every symbol to be realized as itself is special. It is called the identity relation.

Special Symbols

Some symbols are special and are never realized as themselves:

^ Morpheme boundary. Morpheme boundaries are always deleted on the surface. So

^:∅ is a feasible pair. And

^:^ isn't.

Rule-ordering: Karttunen's example

Consider the following pair of rules.

These are discussed in Karttunen (1991) and in Finite-State Morphology, Section 3.5.3, p. 137.

(a) changes N to m before p. (b) changes p to m following m.

This transducer implements both rules.

For xfst, I prepare a file named "kaNpat.regex" with the following contents:

[ N -> m || _ p ]
.o.
[ p -> m || m _ ];

This "composes" the two rules. Now to load these into xfst:

%xfst
xfst[1]: read regex < kaNpat.regex;
Opening file kaNpat.regex...
356 bytes. 4 states, 15 arcs, Circular.
Closing file kaNpat.regex...
xfst[2]: apply down 
apply down> kaNpat
kammat
apply down> 
xfst[2]: apply up 
apply up> kammat
kaNpat
kampat
kammat
apply up>
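The composed rules can be imitated in Python to see where the three analyses of "kammat" come from. This is a simulation, not xfst: the two rules are applied in order with `re.sub`, with contexts checked by lookarounds, and `apply_up` (a hypothetical helper) brute-forces the candidates, assuming a surface "m" may come from underlying m, N, or p.

```python
import re
from itertools import product


def apply_down(underlying):
    step1 = re.sub(r"N(?=p)", "m", underlying)   # (a) N -> m before p
    return re.sub(r"(?<=m)p", "m", step1)        # (b) p -> m following m


def apply_up(surface):
    """Try every underlying candidate and keep those that surface correctly."""
    # Assumption: only a surface "m" can have a different underlying symbol.
    options = [("m", "N", "p") if ch == "m" else (ch,) for ch in surface]
    return {"".join(c) for c in product(*options)
            if apply_down("".join(c)) == surface}


print(apply_down("kaNpat"))
print(sorted(apply_up("kammat")))
```

Rule (a) feeds rule (b): the m created from N is what lets the following p become m, so the order of composition matters.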

Points of note:

Roughly: This defines a relation that accepts any pair of underlying and surface tapes without an underlying N.

More precisely, it accepts any pair of tapes without an underlying N, as long as the corresponding surface and underlying letters are feasible pairs.

The feasible pairs:

a .. z, p:m, N:n, N:m