| Car Prices |
Let's reconsider the joint distribution of COLOR and MAKE in which the two variables are independent.
Now let's consider prices:
|
|
Expected Value |
Suppose we want the average price of a car. We don't just add the 9 prices together and divide by 9, because that ignores the relative frequencies of the prices. The price of a green VW needs to be weighted more heavily in the final average because green VWs are more common. Weighting by relative frequency:
This turns out just to be another way of taking an AVERAGE.
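The weighting step can be sketched in a few lines of Python. All colors, makes, probabilities, and prices below are made-up placeholders (the slide's own table is not reproduced here); the only thing taken from the text is that COLOR and MAKE are independent and that there are 9 prices.

```python
from itertools import product

# Hypothetical marginals; independence means the joint is their product.
p_color = {"green": 0.5, "red": 0.25, "blue": 0.25}
p_make = {"VW": 0.5, "BMW": 0.25, "Jaguar": 0.25}

# Hypothetical prices: a base price per make plus a surcharge per color.
base_price = {"VW": 10_000, "BMW": 30_000, "Jaguar": 50_000}
color_extra = {"green": 0, "red": 1_000, "blue": 2_000}

joint_prob = {(c, m): p_color[c] * p_make[m] for c, m in product(p_color, p_make)}
price = {(c, m): base_price[m] + color_extra[c] for c, m in joint_prob}

# Unweighted mean: treats all 9 (color, make) pairs as equally common.
naive_mean = sum(price.values()) / len(price)

# Expected value: each price weighted by the probability of its pair.
weighted_mean = sum(joint_prob[cm] * price[cm] for cm in joint_prob)

print(naive_mean, weighted_mean)
```

With these placeholder numbers the weighted mean comes out lower than the unweighted one, because the most common pair (green VW) is also the cheapest.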
|
General Expected Value |
The general idea of expected value is that we have some function that assigns a number to every member of a sample space.
The expected value of the function is just the sum of its values, each weighted by its probability.
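Written as a formula, with $f$ the function and $p(x)$ the probability of outcome $x$ in the sample space $\Omega$ (the symbols $f$ and $\Omega$ are notation chosen here, not taken from the slide):

```latex
E[f] = \sum_{x \in \Omega} p(x)\, f(x)
```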
|
We're interested in assigning a NUMBER to an event that characterizes the quantity of information it carries.
|
Two Probabilistic Criteria |
|
|
Inverse Probabilities (Attempt 1) |
Let's try:
Examples:
p(m) = 1/8 ==> I(m) = 8
Result: Contradiction of criterion 2, additivity. Consider two independent events such that p(x1) = 1/4 and p(x2) = 1/8. Then:
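The contradiction can be checked numerically. This is a minimal sketch, assuming Attempt 1 means I(x) = 1/p(x), as the example p(m) = 1/8 ==> I(m) = 8 suggests; the function name `inverse_info` is made up for illustration.

```python
from fractions import Fraction

def inverse_info(p):
    """Attempt 1: information as the inverse of the probability."""
    return 1 / p

p_x1 = Fraction(1, 4)
p_x2 = Fraction(1, 8)
p_both = p_x1 * p_x2  # independent events: probabilities multiply

separate = inverse_info(p_x1) + inverse_info(p_x2)  # information of each, added
combined = inverse_info(p_both)                     # information of the joint event

# Additivity would require these two numbers to be equal.
print(separate, combined)
```

The two values differ, so the inverse-probability measure cannot satisfy additivity for independent events.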
|
|
Log probability (Attempt 2) |
Observation: We know the probabilities of independent events multiply, but we want their combined information content to add:
I(p(x) * p(y)) = I(p(x)) + I(p(y))
One function that does something VERY like what we want is log:
Thus we have two ideas:
Examples:
Now we satisfy additivity:
The unit is bits. Think of bits as counting binary choices. Taking probabilities out of it:
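A quick numerical check of the additivity claim, assuming Attempt 2 is I(x) = -log2 p(x) (base 2, so the unit is bits); the helper name `info_bits` is made up for illustration.

```python
import math

def info_bits(p):
    """Attempt 2: information content in bits, -log2 of the probability."""
    return -math.log2(p)

p_x1, p_x2 = 1 / 4, 1 / 8

separate = info_bits(p_x1) + info_bits(p_x2)  # bits for x1 plus bits for x2
combined = info_bits(p_x1 * p_x2)             # bits for the joint event

print(separate, combined)
```

Here the sum of the separate quantities matches the quantity for the joint event, which is the additivity that the inverse-probability attempt could not deliver.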
|
|
Information Quantity Defined |
Assume a random variable X with probability mass function (pmf) p. For each x in the range of X:
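In symbols, taking the log-probability measure from Attempt 2 with base 2 so that the unit is bits (a reconstruction of the standard definition, since the slide's own formula is not shown here):

```latex
I(x) = -\log_2 p(x) = \log_2 \frac{1}{p(x)}
```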
|
We next define entropy as the expected information quantity.
|
Expected Information Quantity |
I (Information quantity) is a function that returns a number for each member of a sample space of possible events. We can compute the expected value of I, which we call H: |
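Spelled out, assuming $I(x) = -\log_2 p(x)$ and the usual convention $0 \log_2 0 = 0$ (the convention is an addition here, not stated on the slide):

```latex
H(X) = E[I(X)] = \sum_{x} p(x)\, I(x) = -\sum_{x} p(x) \log_2 p(x)
```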
|
Coin tossing |
Suppose the probability of a head is 1/2. Then,
H(X) = [1/2 * 1] + [1/2 * 1]
H(X) = 1
Suppose the probability of a head is 3/4. Then,
H(X) = [1/4 * 2] + [3/4 * .415]
H(X) = .5 + .311 = .811
Suppose the probability of a head is 7/8. Then,
H(X) = [1/8 * 3] + [7/8 * .193]
H(X) = .375 + .169 = .544 |
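The same arithmetic can be reproduced with a short function. This is a sketch assuming H(X) is the probability-weighted sum of -log2 p(x); the name `entropy_bits` is made up for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Entropy of a coin for increasingly biased values of p(head).
for p_head in (1 / 2, 3 / 4, 7 / 8):
    h = entropy_bits([p_head, 1 - p_head])
    print(f"p(H) = {p_head:.3f}  H(X) = {h:.3f} bits")
```

The entropy falls as the coin becomes more biased: the outcome is more predictable, so each toss carries less information on average.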
|
Entropy as disorder |
Consider the graph of all possible values of p(H) (textbook: p. 63):
General fact:
|
|
Entropy as Choice Number |
Consider the 8-sided die of the textbook. Suppose the probability of each face is 1/8. Then,
H(X) = 8 * [1/8 * 3]
H(X) = 3
Suppose the probability of one face is 1/4, and the others are all 3/28. Then,
H(X) = [1/4 * 2] + [7 * 3/28 * 3.22]
H(X) = .5 + 2.42 = 2.92
Suppose the probability of one face is 7/8, and the others are all 1/56. Then,
H(X) = [7/8 * .193] + [7 * 1/56 * 5.81]
H(X) = .169 + .726 = .895
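The die calculations can be checked the same way. The sketch below also prints 2**H, read here as the number of equally likely choices that would give the same entropy; that reading of "choice number" is an interpretation, not something the slide states, and `entropy_bits` is an illustrative name.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform_die = [1 / 8] * 8              # every face equally likely
skewed_die = [1 / 4] + [3 / 28] * 7    # one face at 1/4, the other seven at 3/28

for name, die in (("uniform", uniform_die), ("skewed", skewed_die)):
    h = entropy_bits(die)
    # 2**h: how many equally likely outcomes would carry the same entropy
    print(f"{name}: H = {h:.3f} bits, 2**H = {2 ** h:.2f}")
```

For the uniform die, 2**H recovers the 8 faces; any departure from uniformity lowers both numbers.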
|
| 20 Questions |
Consider a variant of 20 questions in which your pay-off halves after each question. To maximize average payoff you want a strategy that guarantees you the least average number of questions. Consider the following horse race:
One good question strategy is something like this:
Another strategy:
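To make the comparison of question strategies concrete, here is a sketch with hypothetical win probabilities for eight horses. The probabilities and both strategies are assumptions chosen for illustration; they are not taken from the slide's table.

```python
import math

# Hypothetical win probabilities for eight horses (they sum to 1).
win_prob = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]

# Strategy A: ask about the most likely horse first ("Is it horse 1?",
# "Is it horse 2?", ...). Horses 1-4 are identified after 1, 2, 3, 4
# questions; the last four are equally likely and need 2 more questions
# after the first 4, so 6 each.
questions_a = [1, 2, 3, 4, 6, 6, 6, 6]

# Strategy B: ignore the probabilities and halve the list of horses
# each time, which always takes 3 questions for 8 horses.
questions_b = [3] * 8

def expected_questions(probs, counts):
    """Average number of questions, weighted by each horse's probability."""
    return sum(p * n for p, n in zip(probs, counts))

avg_a = expected_questions(win_prob, questions_a)
avg_b = expected_questions(win_prob, questions_b)
entropy = -sum(p * math.log2(p) for p in win_prob)

print(avg_a, avg_b, entropy)
```

With these numbers the probability-aware strategy needs fewer questions on average than the fixed halving strategy, and its average coincides with the entropy of the distribution.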
|
|
Entropy of Joint Distribution |
|
|
Entropy of Conditional Distribution |
|
|
Mutual Information |
We write the Mutual Information of X and Y as I(X;Y). This is symmetric and defined as:
Lots of different intuitions. Very interesting concept, lots of applications. Here's one intuition:
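A small numerical sketch may help. It assumes the standard form I(X;Y) = sum over (x, y) of p(x,y) * log2[ p(x,y) / (p(x) p(y)) ], which may be written differently on the original slide; the function name and the two toy distributions are made up for illustration.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(X;Y) in bits for a joint pmf given as {(x, y): probability}."""
    p_x = defaultdict(float)
    p_y = defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p  # marginal of X
        p_y[y] += p  # marginal of Y
    return sum(
        p * math.log2(p / (p_x[x] * p_y[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

# X and Y independent: knowing one tells you nothing about the other.
independent = {(x, y): 0.25 for x in "ab" for y in "cd"}

# Y is a copy of X: knowing one tells you everything about the other.
identical = {("a", "a"): 0.5, ("b", "b"): 0.5}

print(mutual_information(independent))
print(mutual_information(identical))
```

The independent case gives zero bits and the copied case gives the full one bit of entropy of X, which fits the reading of mutual information as the amount one variable tells you about the other.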
|
|
Interdisciplinary Nature of Information Theory |
Fields where the notion of entropy (or something like it) plays a role:
|
|
Cross entropy (Intuition) |
Measure the average amount of surprise
|
|
Cross Entropy Definition |
Cross-Entropy
(Per Word) Cross-Entropy of Model for a corpus of size n
(Per Word) Cross-Entropy of Model for the Language
By a wonderful, magical theorem (the Shannon-McMillan-Breiman theorem), this equals:
|
|
Cross-entropy Intuition |
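One way to see cross-entropy as average surprise is to score two models on the same text. The sketch below assumes the per-word cross-entropy of a model m on a corpus w_1 ... w_n is -(1/n) log2 m(w_1 ... w_n) and, for simplicity, that m is a unigram model, so the corpus probability is a product of word probabilities. The toy corpus, both models, and the function name are made up for illustration.

```python
import math

def per_word_cross_entropy(corpus, model_prob):
    """-(1/n) * log2 of the model's probability of the corpus, in bits per word.

    Assumes a unigram model: log2 m(w_1 ... w_n) is the sum of log2 m(w_i).
    """
    n = len(corpus)
    log_prob = sum(math.log2(model_prob[w]) for w in corpus)
    return -log_prob / n

# Toy corpus of 8 words.
corpus = "the the the the cat cat sat mat".split()

# Model 1 matches the corpus frequencies exactly.
matched_model = {"the": 1 / 2, "cat": 1 / 4, "sat": 1 / 8, "mat": 1 / 8}

# Model 2 spreads probability evenly over the four word types.
uniform_model = {"the": 1 / 4, "cat": 1 / 4, "sat": 1 / 4, "mat": 1 / 4}

h_matched = per_word_cross_entropy(corpus, matched_model)
h_uniform = per_word_cross_entropy(corpus, uniform_model)
print(h_matched, h_uniform)
```

The model that matches the corpus is less surprised per word, so its cross-entropy is lower; a worse model pays for its mismatch in extra bits.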
|