Problem 2: Speech recognition
We have an acoustic signal,
suitably digitized, and analyzed into a set of features for each
[suitably short] span of signal. This acoustic signal
could correspond to any word in the dictionary.
Which one do we choose? Or, if we want a list of possibilities,
how do we rank the elements of the list?
Components of the noisy channel model
- An observation O
- A word w to which it corresponds
- A noisy channel C which may distort w.
- A decoder responsible for finding the most likely
w given O. Note: because of noise,
the same O may potentially correspond to many different w.
The situations:
- Speech recognition. O is an acoustic signal. The word w
is the word the speaker actually uttered.
- Orthography. O is a sequence of letters (possibly misspelled).
The word w is the word the writer actually intended.
We use a probabilistic model
- w = argmax_{w in V} Prob(w | O)
- w = argmax_{w in V} Prob(O | w) * Prob(w) ÷ Prob(O)
- w = argmax_{w in V} Prob(O | w) * Prob(w)
(Prob(O) is the same for every candidate w, so dropping it does not change the argmax.)
P(O | w) | * | P(w) |
Likelihood | * | Prior |
- w is the word that maximizes the probability that w
gave rise to the observation (signal).
- w is the word that maximizes the product of
the likelihood of the observation given w (Prob(O | w)) times
the prior probability (Prob(w)) of w
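The decoder step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `decode`, the toy vocabulary, and all the probabilities are invented for the example and are not estimates from any corpus.

```python
def decode(observation, vocabulary, likelihood, prior):
    """Return the w in vocabulary that maximizes P(O | w) * P(w)."""
    # P(O) is constant across candidates, so it is left out of the score.
    return max(vocabulary, key=lambda w: likelihood(observation, w) * prior(w))

# Made-up numbers for illustration only (hypothetical, not corpus estimates).
TOY_PRIOR = {"actress": 0.2, "across": 0.7, "acres": 0.1}
TOY_LIKELIHOOD = {
    ("acress", "actress"): 0.5,
    ("acress", "across"): 0.1,
    ("acress", "acres"): 0.3,
}

def toy_prior(w):
    return TOY_PRIOR.get(w, 0.0)

def toy_likelihood(observation, w):
    return TOY_LIKELIHOOD.get((observation, w), 0.0)

best = decode("acress", TOY_PRIOR.keys(), toy_likelihood, toy_prior)
print(best)
```

With these toy numbers the prior alone would favor "across", but the product of likelihood and prior favors "actress", which is the point of combining the two models.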
Spelling Correction
Different kinds of spelling errors (typographic
versus cognitive). Different sources (OCR versus human).
An OCR example:
- The quick brown fox jumps over the lazy yellow dog.
- The q~ick brown foxjumps over tb l azy yellow dog.
Two approaches
- Channel modeling: Build a model of how signals are realized given the
properties of the channel (an error or distortion model).
- Message modeling: Build a model of what messages look like.
Both models can be probabilistic.
Speech recognition:
- Acoustic model relating acoustic information to phones (channel model)
- Language model predicting what word comes next given
previous context (message model)
Spelling correction:
- Error model relating dictionary words to their misspellings
(channel model)
- Language model predicting what word comes next given
previous context (message model)
We apply the Bayesian method to the problem of spelling correction
first using an error model.
- Observation = t (for "typo")
- Word = c (for "correction")
We bring in our probabilistic model:
- c = argmax_{c in V} Prob(c | t)
- c = argmax_{c in V} Prob(t | c) * Prob(c)
We need two models:
- Likelihood model: P(t | c)
- Prior model: P(c)
P(c) Model
We build a frequency table to get the priors (dividing
counts by 44 million to get P(c)):
c | freq(c) | p(c) |
actress | 1343 | .0000315 |
cress | 0 | 0 |
caress | 4 | .0000001 |
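The prior computation described above (divide each count by the corpus size) can be sketched as follows. The counts and the 44 million total come from the table; the names `FREQ`, `CORPUS_SIZE`, and `prior` are assumptions made for this example.

```python
# Counts from the frequency table; corpus size of 44 million words as stated.
CORPUS_SIZE = 44_000_000
FREQ = {"actress": 1343, "cress": 0, "caress": 4}

def prior(word, freq=FREQ, total=CORPUS_SIZE):
    """Relative-frequency estimate of P(c): count(c) / corpus size."""
    return freq.get(word, 0) / total

for word in FREQ:
    print(f"{word:8} {FREQ[word]:5d} {prior(word):.7f}")
```

Note that a word with a zero count, like cress here, gets a prior of exactly 0 under this plain relative-frequency estimate, so it can never win the argmax no matter how good its likelihood is.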
P(t | c) Model
For the P(t | c) model,
we first classify errors.
We estimate p(t | c) using 4 confusion matrices,
one each for deletion, substitution, insertion, and transposition.
Using [ ... ] for the number of times that ... happened:
- del[x,y] = [xy => x]
- ins[x,y] = [x => xy]
- sub[x,y] = [x => y]
- trans[x,y] = [xy => yx]
For a deletion, for example, we estimate P(t | c) as follows:
P(x | xy) = del[x,y] ÷ count(xy)
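As a rough sketch, the deletion estimate is one division over two count tables. The function name, the dictionary layout, and the counts below are hypothetical and chosen only to make the arithmetic visible; real counts would come from a corpus of observed typos.

```python
def deletion_likelihood(x, y, del_counts, bigram_counts):
    """Estimate P(x typed | xy intended) = del[x,y] / count(xy)."""
    total = bigram_counts.get(x + y, 0)
    if total == 0:
        return 0.0  # the sequence xy was never seen, so there is no estimate
    return del_counts.get((x, y), 0) / total

# Hypothetical counts: "ct" seen 1000 times, typed as "c" in 10 of those cases.
toy_del = {("c", "t"): 10}
toy_bigrams = {"ct": 1000}

p = deletion_likelihood("c", "t", toy_del, toy_bigrams)
print(p)
```

The other three matrices would be used the same way, each with the count of the intended character sequence in the denominator.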
Table of final probabilities
Minimum Edit Distance
Notice that in order to apply our probability model, we
need to take the form acress and, for each
correction c in our list of candidates, compute
the exact set of editing steps that takes
you from acress to c.
This can be thought of as a search for the best possible alignment
of two strings. Consider
a nontrivial case: aligning the words
execution and intention to maximize
the amount of overlap.
Alignment trace
Or think of Unix diff on intention.txt and execution.txt
(one letter per line):
% diff intention.txt execution.txt
1,3d0
< i
< n
< t
5c2,5
< n
---
> x
> e
> c
> u
This corresponds to the following alignment (- marks a gap):
i n t e n - - - t i o n
- - - e x e c u t i o n
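A similar edit script can be produced in Python with the standard library's `difflib`. This is a sketch for exploration only: `SequenceMatcher` uses its own matching heuristic, so its script is not guaranteed to be minimal or to match the Unix diff output above. The helper name `edit_script` is an assumption made for this example.

```python
from difflib import SequenceMatcher

def edit_script(source, target):
    """List (operation, source_piece, target_piece) triples turning source into target."""
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return [
        (tag, source[i1:i2], target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]

script = edit_script("intention", "execution")
for tag, old, new in script:
    # tag is one of 'equal', 'delete', 'insert', 'replace'
    print(f"{tag:8} {old!r:10} -> {new!r}")
```

Reading the source pieces in order gives back intention, and reading the target pieces in order gives back execution, which is exactly what an alignment records.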
Idea: Finding the shortest possible edit distance
between words corresponds to finding the best possible
alignment.
Note: One can also do spelling correction without using the
probabilistic model, just using edit distance.
Assign a cost (possibly the same cost) to each
editing operation and add up the costs; that gives the cost
of a misspelling target under the best alignment
with a candidate source. The source that is the shortest
editing distance away is the winner.
But either way, you have to solve the best alignment
problem.
Edit distance between two strings T and S
The smallest number of individual editing
operations needed to transform T into S
Minimum editing distance between T and S.
- First assign a cost to each editing operation.
Alternative Levenshtein Distance (ALD): The operations
insertion and deletion cost 1. Substitution
(a combined deletion and insertion) costs 2,
except in the case of
substituting a character for itself,
which costs 0.
Note: We could put the probabilities in here,
thus combining the alignment computation with the probability
computation
(using the model for Prob(T | S) sketched above).
For simplicity we use ALD. Combining the probabilities
with the alignment calculation could give
different results than doing the 2 calculations separately.
How?
- The ALD between T and S is the minimum cost possible for
a sequence of editing operations
that transform T into S.
Essential intuition of the algorithm.
Each path from T to S goes through a sequence
of intermediate strings. If S_i
lies on the optimal path P from T to S,
then the sequence leading
from T to S_i must also be optimal.
We now need ways of computing the shortest distance between any target
and any source: shortest_distance(s,t).
- We let concatenation be represented by "+":
- "inte" + "n" = "inte" concatenated with "n" = "inten"
- "exec" + "u" = "exec" concatenated with "u" = "execu"
- We give an inductive definition of path cost.
- shortest_distance(w, w) = 0
[special case: shortest_distance(eps,eps)=0]
- PathCost(w, w'+x) = shortest_distance(w,w') + ins-cost(w, w'+x)
- PathCost(w+x,w') = shortest_distance(w,w') + del-cost(w+x,w')
- PathCost(w+x,w'+y) = shortest_distance(w,w') + subst-cost(w+x,w'+y)
- Example: inten, execu
- PathCost(inten, exec+u) = shortest_distance(inten,exec) + ins-cost(inten, execu)
- PathCost(inte+n,execu) = shortest_distance(inte,execu) + del-cost(inten,execu)
- PathCost(inte+n,exec+u) = shortest_distance(inte,exec) + subst-cost(inten,execu)
- We define shortest_distance as the minimum path cost.
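The inductive definition translates almost directly into a memoized recursive function. The sketch below assumes the ALD costs given earlier (insert 1, delete 1, substitute 2, or 0 for identical characters). The constant names and the empty-string base cases are choices made for this example rather than part of the definition above.

```python
from functools import lru_cache

INS_COST = 1
DEL_COST = 1

def subst_cost(x, y):
    """ALD substitution cost: 0 for identical characters, otherwise 2."""
    return 0 if x == y else 2

@lru_cache(maxsize=None)
def shortest_distance(source, target):
    """Minimum path cost over the three PathCost cases."""
    if not source:
        return len(target) * INS_COST  # only insertions remain
    if not target:
        return len(source) * DEL_COST  # only deletions remain
    return min(
        # PathCost(w, w'+x): insert the last character of target
        shortest_distance(source, target[:-1]) + INS_COST,
        # PathCost(w+x, w'): delete the last character of source
        shortest_distance(source[:-1], target) + DEL_COST,
        # PathCost(w+x, w'+y): substitute the last characters
        shortest_distance(source[:-1], target[:-1])
        + subst_cost(source[-1], target[-1]),
    )

print(shortest_distance("intention", "execution"))
```

Memoization matters here: without the cache the same pairs of prefixes would be recomputed many times, which is the inefficiency a table-based computation avoids.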
Computation of a minimum edit distance
Target
n | 2 insert | 3 insert | 4 |
i | 1 insert | 2 subst | 3 del |
# | 0 | 1 del | 2 del |
  | # | e | x | Source
Different alignments, different costs:
target | i | r | t | i | o | n | cost |
source | i |   | t | i | o | n | 1 |
source | i | t | i | o | n |   | 6 |
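The small cost table shown earlier for source ex and target in can be reproduced with a bottom-up fill. This is a minimal sketch assuming the ALD costs (insert 1, delete 1, substitute 2, identical characters 0). The function name `edit_table` and the row/column layout are assumptions made for the example.

```python
def edit_table(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Return table[i][j] = cost of turning source[:j] into target[:i]."""
    rows, cols = len(target) + 1, len(source) + 1
    table = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):  # bottom row: delete source characters
        table[0][j] = table[0][j - 1] + del_cost
    for i in range(1, rows):  # first column: insert target characters
        table[i][0] = table[i - 1][0] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[j - 1] == target[i - 1]
            table[i][j] = min(
                table[i - 1][j] + ins_cost,
                table[i][j - 1] + del_cost,
                table[i - 1][j - 1] + (0 if same else sub_cost),
            )
    return table

table = edit_table("ex", "in")
# Print with the target growing upward, as in the table in the text.
for label, row in reversed(list(zip("#in", table))):
    print(label, row)
print(" ", list("#ex"))
```

The top-right cell of the printed grid holds the distance between the full strings. Where several moves tie for the minimum, the operation label attached to a cell is an arbitrary choice among them.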
The algorithm