One important use of probabilistic models is in predicting some phenomenon when we don't know all the factors that can influence it, or we do know them, but they are much too complicated to build a precise model of.
Exactly the situation with cards (for most of us)!
A Set of Outcomes
We consider drawing cards from a deck of ordinary playing cards. Let CARDS be the set of all 52 cards in the deck:

CARDS = { A♠, 2♠, ..., K♠, A♣, 2♣, ..., K♣, A♥, 2♥, ..., K♥, A♦, 2♦, ..., K♦ }
We define |X| as the number of elements in the set X (also called the cardinality of the set).
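As a concrete illustration, here is a minimal Python sketch (the set name CARDS comes from the text; everything else is ours) that builds the deck and checks its cardinality:

```python
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["♠", "♣", "♥", "♦"]

# One outcome per card in an ordinary deck.
CARDS = {r + s for r, s in product(RANKS, SUITS)}

assert len(CARDS) == 52  # |CARDS|, the cardinality of the set
```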
Sample Spaces
The sample space is the mathematical set of possible outcomes of the sort we care about, often called Ω. When we are talking ONLY about black versus non-black (B, N), there are two outcomes, and the sample space is Ω = { B, N }.

On the other hand, we may want to be very fine-grained and care about which individual card was chosen. Then the sample space is the set CARDS that we introduced above.
Samples
Contrast the sample space with a sample, some actual sequence of outcomes gotten by performing experiments or sorting through a corpus. Say we draw 1000 cards from the deck, each time returning the drawn card to the deck. That's a sample of size 1000. Samples are sequences of trials; each trial yields an element of the sample space.
With any real sample we can ask the question: Is it biased? That is, has it been chosen in some way that changes the probabilistic properties of the space? Is that coin that gave us two heads in a row weighted? Did we get our card deck from a pinochle player? It looks as if the coin may be biased, and the card sample not. Why?
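As a minimal sketch (continuing the Python example above; the seed and sample size are just illustrative), here is a sample of 1000 trials drawn with replacement:

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the illustration is reproducible

deck = sorted(CARDS)  # fix an ordering so random.choice can index it

# 1000 trials, each yielding one element of the sample space CARDS.
sample = [random.choice(deck) for _ in range(1000)]

# A crude bias check: under a fair draw each card should appear
# about 1000/52 ≈ 19 times.
print(Counter(sample).most_common(3))
```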
Events
We will associate probabilities with sets of outcomes, which we call events. Consider the sample space CARDS. Let's identify some possible events in CARDS:

B: Black card = { A♠, 2♠, ..., K♠, A♣, 2♣, ..., K♣ }
N: Non-black card = { A♥, 2♥, 3♥, ..., K♥, A♦, 2♦, ..., K♦ }
F: Face card = { K♠, Q♠, J♠, K♣, Q♣, J♣, K♥, Q♥, J♥, K♦, Q♦, J♦ }
X: Non-face card = { A♠, 2♠, 3♠, ..., 10♠, A♣, 2♣, 3♣, ..., 10♣, A♥, 2♥, 3♥, ..., 10♥, A♦, 2♦, 3♦, ..., 10♦ }
B ∪ F: Black card or Face card
B ∪ X: Black card or non-Face card
F ∪ X: Face card or non-Face card
F ∪ N: Face card or non-Black card
F ∪ N ∪ X: Face card or non-Black card or non-Face card
{ }: the IMPOSSIBLE EVENT
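In the running Python sketch, events are just subsets of CARDS, and set operations build the compound events (variable names ours):

```python
B = {c for c in CARDS if c[-1] in "♠♣"}   # black card
N = CARDS - B                              # non-black card
F = {c for c in CARDS if c[0] in "KQJ"}    # face card
X = CARDS - F                              # non-face card

B_or_F = B | F            # black card or face card
assert F | X == CARDS     # face or non-face exhausts the sample space
assert F & X == set()     # ...and the impossible event is the empty set
```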
Probability Distribution
Prob is called a probability distribution function. It assigns a number between 0 and 1 to every set of outcomes (every event) in the sample space. We require two properties of Prob:

Axiom I: If A and B are disjoint events (A ∩ B = { }), then Prob(A ∪ B) = Prob(A) + Prob(B).

Axiom II: Prob(Ω) = 1.
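For a fair deck, the natural distribution is uniform: Prob(E) = |E| / |CARDS|. A minimal sketch (the function name prob is ours) continuing the Python example, with both axioms checked:

```python
def prob(event):
    """Uniform distribution over CARDS: Prob(E) = |E| / |Ω|."""
    return len(event) / len(CARDS)

# Axiom I: additivity for disjoint events (B and N share no cards).
assert prob(B | N) == prob(B) + prob(N)
# Axiom II: the certain event Ω = CARDS has probability 1.
assert prob(CARDS) == 1
```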
Conditional Probability
Let Chosen be a sample, a sequence of cards, and let Blk be the subset of those draws that are black and Fc the subset that are Face cards.

We define Prob(B | F), the conditional probability of B given F, as follows:

Prob(B | F) = Prob(B ∩ F) / Prob(F)

which we estimate from the sample as |Blk ∩ Fc| / |Fc|.

Conditional probabilities should be thought of as probabilities relativized to subsets of the sample space. That is, Prob(_ | F) is a probability distribution over the events of Ω. By Axiom II:

Prob(Ω | F) = Prob(Ω ∩ F) / Prob(F) = Prob(F) / Prob(F) = 1
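Continuing the sketch, we can estimate Prob(B | F) from the simulated sample of 1000 draws (variable names ours):

```python
# Relativize to the face-card draws, then count how many are black.
Fc = [card for card in sample if card in F]    # face-card trials
Blk_Fc = [card for card in Fc if card in B]    # black face-card trials
print(len(Blk_Fc) / len(Fc))  # estimate of Prob(B | F); near 6/12 = 0.5
```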
Chain Rule
From the definition of conditional probability we immediately get two equivalent formulations of the chain rule:

Prob(B ∩ F) = Prob(B | F) * Prob(F)

Prob(B ∩ F) = Prob(F | B) * Prob(B)
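Continuing the uniform-distribution sketch from above (function names ours), both factorizations can be checked numerically:

```python
def cond_prob(a, b):
    """Prob(A | B) = Prob(A ∩ B) / Prob(B), under the uniform prob()."""
    return prob(a & b) / prob(b)

# Both chain-rule factorizations recover the joint Prob(B ∩ F) = 6/52.
assert abs(cond_prob(B, F) * prob(F) - prob(B & F)) < 1e-12
assert abs(cond_prob(F, B) * prob(B) - prob(B & F)) < 1e-12
```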
Independence
An immediate consequence of the chain rule is an account of a special case. Suppose knowing F tells us nothing about B; that is, suppose

Prob(B | F) = Prob(B)

Then the chain rule gives

Prob(B ∩ F) = Prob(B) * Prob(F)

and we say B and F are independent. For example, in CARDS, Prob(B ∩ F) = 6/52 and Prob(B) * Prob(F) = (26/52) * (12/52) = 6/52, so B and F are independent: a card's being a face card tells us nothing about its color.
Bayes' Rule
The chain rule is symmetric: B ∩ F = F ∩ B. So it follows that

Prob(B | F) * Prob(F) = Prob(F | B) * Prob(B)

This is called Bayes' Rule. Bayes' Rule is often written in this form:

Prob(B | F) = (Prob(F | B) * Prob(B)) / Prob(F)
Maximum Likelihood Estimate of Probability
Prob(A♥) is a number x between 0 and 1. It represents the
actual probability that a card will be the A♥
when randomly selected from a representative set of cards.
Claim: As you draw more and more cards (each time returning the drawn card to the deck), the ratio of the number of A♥ outcomes to the total number of outcomes will tend to approach x [the actual probability!]. Let Chosen be the total set of outcomes, and A♥ the set of outcomes that were the A♥. We estimate the probability of drawing an A♥ as follows:

Prob(A♥) ≈ |A♥| / |Chosen|
This estimate of the probability is called a maximum likelihood estimate. We now justify this name. We can think of drawing a card as an experiment with 2 outcomes, A♥ or not A♥, and call those draws that were the A♥ successes. Given the true probability p of a success, we can compute the probability of k successes in any sequence of N independent experiments with something called the Binomial Distribution:

Binomial(k | N, p) = (N choose k) * p^k * (1 - p)^(N - k)
When we do a maximum likelihood estimate of p, we ask what p maximizes the probability of our sample. That is, fixing the facts of our sample (N, k), we ask what p_emp is such that

Binomial(k | N, p_emp) = max over p of Binomial(k | N, p)
Let's say in 1000 trials we get 19 draws with A♥. We look for our max: Binomial(19 | 1000, p) is largest at p_emp = 19/1000 = .019, exactly the ratio of successes to trials. In general the binomial likelihood is maximized at p_emp = k/N, which is why the ratio estimate above is called the maximum likelihood estimate.
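A minimal pure-Python sketch (names ours) that scans candidate values of p and confirms the likelihood of 19 successes in 1000 trials peaks at 19/1000:

```python
from math import comb

def binomial(k, n, p):
    """Binomial(k | N, p): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

N, k = 1000, 19
grid = [i / 10000 for i in range(1, 1000)]  # candidate p values in (0, 0.1)
p_emp = max(grid, key=lambda p: binomial(k, N, p))
print(p_emp)  # 0.019, i.e., k/N
```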
Maximum Likelihood Estimate of Conditional Probability
Let's imagine we have a contagious disease and a test for the disease. The facts are the following:

1 in 100 people has the disease (D): Prob(D) = .01, so Prob(H) = .99 for healthy (H) subjects.
The test is always positive (P) when the subject has the disease: Prob(P | D) = 1.
The test is positive about 5% of the time when the subject is healthy (a false positive): Prob(P | H) ≅ .05.
Our question is a policy question. Do we quarantine subjects who have a positive test result?
We're interested in the probability that a subject has the disease GIVEN a positive result. We use Bayes' rule:

Prob(D | P) = (Prob(P | D) * Prob(D)) / Prob(P)
The facts of the problem directly give us several of the numbers on the right hand side: Prob(P | D) = 1 and Prob(D) = .01.
We can compute Prob(P) as follows, summing over the two ways a subject can test positive (diseased or healthy):

Prob(P) = Prob(P | D) * Prob(D) + Prob(P | H) * Prob(H) = 1 * .01 + .05 * .99 ≈ .06
We now have all the numbers to plug into Bayes' rule:

Prob(D | P) = (1 * .01) / .06 = 1/6 ≈ 16 2/3 %
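The same computation as a short Python sketch (variable names ours, numbers from the facts above):

```python
p_d = 0.01          # Prob(D): 1 in 100 subjects has the disease
p_p_given_d = 1.0   # Prob(P | D): the test always detects the disease
p_p_given_h = 0.05  # Prob(P | H): false positive rate on healthy subjects

# Prob(P) by summing over the two ways to test positive, then Bayes' rule.
p_p = p_p_given_d * p_d + p_p_given_h * (1 - p_d)
print(p_p_given_d * p_d / p_p)  # ≈ 0.168: Prob(D | P), about 1 in 6
```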
Summarizing what we know:
In a representative sample of 300 subjects, the counts come out as follows:

|  | Positive test (P) | Negative test (N) | Total |
|---|---|---|---|
| Diseased (D) | 3 (bonafide positives) | 0 | 3 |
| Healthy (H) | 15 (false positives) | 282 | 297 |
| Total | 18 | 282 | 300 |

False positive rate = 15/297 ≅ 5 %
Prob(H | P) = 15/18 = 83 1/3 %
Prob(D | P) = 3/18 = 16 2/3 %