Redundancy, Language, and Encoding



Alchimie du verbe
Elle est retrouvée. Quoi? L'Éternité. C'est la mer mêlée au soleil.
Alchemy of the Word
It is found. What? Eternity. It is the sea mingled with the sun.
--Arthur Rimbaud


Information  

Information: The number of bits necessary to encode the message.

A quantitative measure. Two distinct messages may have the same amount of information.

Information is channel (or context) based. If a certain channel (say, a particular field in a database) is defined to hold one of two possible messages/values, then that channel/field has an information content of 1 bit, whichever value it actually holds:

Examples

  1. Handedness: Suppose a particular field in a medical database of hospital patient records stores whether each patient is left- or right-handed. Two possible values:
    1. left
    2. right
    Information measure: 1 bit.
  2. Suppose an equipment database has a field for each item of equipment recording whether it is functioning or broken. Information measure: 1 bit per item of equipment. If there are 23 pieces of equipment, the total amount of functioning/broken information in the database is 23 bits.

Another way of thinking about this: the information COULD just be encoded by putting a 1 (= right/functioning) or a 0 (= left/broken) in the database field. It can literally be encoded with 1 bit.
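A minimal Python sketch of that idea, assuming every field value is equally likely (so a field with n possible values carries log2 n bits). The 23 equipment statuses are made up, and `pack`/`unpack` are hypothetical helper names; they store one functioning/broken status per bit of a single integer.

```python
import math

def information_bits(num_values: int) -> float:
    """Information in bits for a field with num_values equally likely values."""
    return math.log2(num_values)

def pack(flags: list[bool]) -> int:
    """Store one boolean per bit: bit i is 1 if item i is functioning."""
    word = 0
    for i, flag in enumerate(flags):
        if flag:
            word |= 1 << i
    return word

def unpack(word: int, n: int) -> list[bool]:
    """Recover n booleans from the low n bits of word."""
    return [bool((word >> i) & 1) for i in range(n)]

# Hypothetical statuses for 23 pieces of equipment (True = functioning).
statuses = [True, False, True] * 7 + [True, True]

packed = pack(statuses)
print(information_bits(2))                        # 1.0
print(len(statuses) * information_bits(2))        # 23.0
print(unpack(packed, len(statuses)) == statuses)  # True
```

The whole database column fits in one 23-bit integer, one bit per item, which matches the 23-bit information measure.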

Example 3: Suppose a database contains a day-of-the-week field. There are 7 days of the week. These can be encoded with seven 3-digit binary numbers:

  1. 000 Sunday
  2. 001 Monday
  3. 010 Tuesday
  4. 011 Wednesday
  5. 100 Thursday
  6. 101 Friday
  7. 110 Saturday
This is 3 bits. Notice that one 3-digit number (111) is left over to use for something else. So the information actually takes a little less than 3 bits to encode: log2 7 ≈ 2.81 bits.
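A short Python sketch of the fixed-length code in the list above, plus the log2 7 figure. The names `encode`, `decode`, and `unused` are illustrative.

```python
import math

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

# Smallest whole number of bits that gives every day its own code.
bits_per_day = math.ceil(math.log2(len(DAYS)))  # 3

# Day i gets i written as a zero-padded 3-digit binary number.
encode = {day: format(i, f"0{bits_per_day}b") for i, day in enumerate(DAYS)}
decode = {code: day for day, code in encode.items()}

# Codes that no day uses (left over "for something else").
unused = [format(i, f"0{bits_per_day}b") for i in range(2 ** bits_per_day)
          if format(i, f"0{bits_per_day}b") not in decode]

print(encode["Monday"], decode["110"], unused)  # 001 Saturday ['111']
print(round(math.log2(len(DAYS)), 3))           # 2.807
```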
Compression and Transmission

For efficient storage, or efficient transmission over a noisy channel, we want to pay attention to the frequency of different values, because this allows still more efficient compression of the information.

Strategy: Use longer signals for rare messages. Use the shortest signal for the most frequent message.
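The lecture does not name an algorithm here; Huffman coding is one standard way to carry out this strategy. A minimal Python sketch using `heapq`, with hypothetical message frequencies. It repeatedly merges the two least frequent groups, so rare messages end up deeper in the tree and get longer codes.

```python
import heapq
from itertools import count

def huffman_code(freqs: dict[str, float]) -> dict[str, str]:
    """Return a prefix code giving shorter codewords to more frequent messages."""
    if len(freqs) == 1:                      # a lone message still needs 1 bit
        return {sym: "0" for sym in freqs}
    tiebreak = count()                       # avoids comparing dicts on equal weights
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # least frequent group
        w2, _, right = heapq.heappop(heap)   # next least frequent group
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical frequencies: one common message, three rarer ones.
freqs = {"ok": 0.6, "warn": 0.2, "error": 0.1, "fatal": 0.1}
code = huffman_code(freqs)
avg_len = sum(freqs[m] * len(code[m]) for m in freqs)
print(code)               # {'warn': '00', 'error': '010', 'fatal': '011', 'ok': '1'}
print(round(avg_len, 2))  # 1.6, versus 2 bits for a fixed-length code
```

The most frequent message gets a 1-bit code and the rarest get 3 bits, which is exactly the "longer signals for rare messages" rule.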

20 Questions  

Consider a variant of 20 Questions in which your payoff halves after each question. To maximize your average payoff you want a strategy that minimizes the average number of questions asked.

Consider the following horse race:

    Distribution X for horses
    H1 1/4
    H2 1/4
    H3 1/8
    H4 1/8
    H5 1/8
    H6 1/16
    H7 1/16

H(X) = Σ p(x) log2(1/p(x))
  = 1/2(log2 4) + 3/8(log2 8) + 1/8(log2 16)
  = 1 + 9/8 + .5
  = 2.625 bits
The optimal yes/no question strategy asks 2.625 questions on average.
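The same calculation in Python, as a quick check. The dictionary `P` copies the distribution above; `entropy` is an illustrative helper name. Because every probability is a power of 1/2, the floating-point result is exact.

```python
import math

# Distribution X for the horses, copied from the table above.
P = {"H1": 1/4, "H2": 1/4, "H3": 1/8, "H4": 1/8,
     "H5": 1/8, "H6": 1/16, "H7": 1/16}

def entropy(dist: dict[str, float]) -> float:
    """Shannon entropy in bits: the sum of p * log2(1/p) over all outcomes."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

print(entropy(P))  # 2.625
```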

One simple question strategy is something like this:

  1. Is it H1? H1: [001]
  2. If no, Is it H2? H2: [010]
  3. If no, Is it H3? H3: [011]
  4. If no, Is it H4? H4: [100]
  5. If no, Is it H5? H5: [101]
  6. If no, Is it H6? H6: [110]
After each question is a proposed message to encode the identity of the horse corresponding to a yes answer to that question.
    Expected number of questions
    Horse   Questions   P(x)   Expected number
    H1      1           1/4    .25
    H2      2           1/4    .5
    H3      3           1/8    .375
    H4      4           1/8    .5
    H5      5           1/8    .625
    H6      6           1/16   .375
    H7      6           1/16   .375
    Average questions          3.0

Since each question costs a bit in message length, this gives us an average message length of 3 bits.
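The expected-number column can be recomputed in Python with exact fractions. The `questions` table restates the counts for the "Is it H1?", "Is it H2?", ... strategy; H7 is known after a "no" to the sixth question, so it also costs 6.

```python
from fractions import Fraction as F

P = {"H1": F(1, 4), "H2": F(1, 4), "H3": F(1, 8), "H4": F(1, 8),
     "H5": F(1, 8), "H6": F(1, 16), "H7": F(1, 16)}

# Questions needed per horse when asking "Is it H1?", "Is it H2?", ... in order.
questions = {"H1": 1, "H2": 2, "H3": 3, "H4": 4, "H5": 5, "H6": 6, "H7": 6}

expected = sum(P[h] * questions[h] for h in P)  # sum of P(x) * questions
print(expected)  # 3
```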

Another strategy:

  1. Is it H1 or H2?
      If yes, Is it H1?
  2. If no, Is it H3 or H4?
      If yes, Is it H3?
  3. If no, Is it H5?
  4. If no, Is it H6?
    Expected number of questions
    Horse   Questions   P(x)   Expected number
    H1      2           1/4    .5
    H2      2           1/4    .5
    H3      3           1/8    .375
    H4      3           1/8    .375
    H5      3           1/8    .375
    H6      4           1/16   .25
    H7      4           1/16   .25
    Average questions          2.625
This is optimal. And there is a way of encoding it (not shown here) that achieves 2.625 bits per message.

This number corresponds to the entropy measure introduced in a previous lecture.
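A hedged check of such an encoding in Python. Because every probability is a power of 1/2, an optimal code gives each horse a codeword of length log2(1/p): 2, 2, 3, 3, 3, 4, 4 bits. The specific codewords in `CODE` are one illustrative choice, not taken from the lecture; any prefix code with those lengths works.

```python
import math
from fractions import Fraction as F

P = {"H1": F(1, 4), "H2": F(1, 4), "H3": F(1, 8), "H4": F(1, 8),
     "H5": F(1, 8), "H6": F(1, 16), "H7": F(1, 16)}

# Illustrative prefix code with length log2(1/p) for each horse.
CODE = {"H1": "00", "H2": "01", "H3": "100", "H4": "101",
        "H5": "110", "H6": "1110", "H7": "1111"}

def is_prefix_free(code: dict[str, str]) -> bool:
    """True if no codeword is the start of another codeword."""
    words = list(code.values())
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words)) for j in range(len(words)))

def decode(bits: str, code: dict[str, str]) -> list[str]:
    """Read bits left to right, emitting a horse whenever a codeword completes."""
    inverse = {word: horse for horse, word in code.items()}
    horses, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            horses.append(inverse[buf])
            buf = ""
    return horses

kraft = sum(F(1, 2 ** len(w)) for w in CODE.values())  # 1 means no codeword space wasted
avg_len = sum(P[h] * len(CODE[h]) for h in P)
entropy = sum(float(p) * math.log2(1 / float(p)) for p in P.values())

print(is_prefix_free(CODE), kraft, float(avg_len), entropy)  # True 1 2.625 2.625
print(decode("00" + "1110" + "1111", CODE))                  # ['H1', 'H6', 'H7']
```

The average codeword length equals the entropy exactly, which is the sense in which 2.625 bits per message is optimal here.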

Optimal Encodings and Language

So much for optimal encodings.

In the last lecture we argued that language is redundant.

This is the same as saying it is NOT an optimal encoding.

Entropy is a measure of the average amount of surprise (improbability) per signal. Optimal encodings maximize the amount of information per signal; that is the same as saying they maximize the amount of surprise per signal.

Language does not.
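One way to make "information per signal" concrete: divide the entropy of the source by the average number of bits actually sent. For the horse-race distribution, an optimal prefix code (the codewords are an illustrative choice) delivers exactly 1 bit of information per transmitted bit, and its 0s and 1s come out equally often, like fair coin flips. `DOUBLED`, which sends every codeword twice, is a hypothetical redundant encoding: same messages, half the information per bit.

```python
import math
from fractions import Fraction as F

P = {"H1": F(1, 4), "H2": F(1, 4), "H3": F(1, 8), "H4": F(1, 8),
     "H5": F(1, 8), "H6": F(1, 16), "H7": F(1, 16)}
OPTIMAL = {"H1": "00", "H2": "01", "H3": "100", "H4": "101",
           "H5": "110", "H6": "1110", "H7": "1111"}
DOUBLED = {h: w + w for h, w in OPTIMAL.items()}  # hypothetical redundant code

# Entropy of the source: average surprise per horse, in bits.
H = sum(float(p) * math.log2(1 / float(p)) for p in P.values())

def information_per_bit(code: dict[str, str]) -> float:
    """Bits of information carried by each transmitted bit, on average."""
    avg_len = sum(P[h] * len(code[h]) for h in P)
    return H / float(avg_len)

# Expected share of 0s in the optimal code's output stream.
zero_share = (sum(P[h] * OPTIMAL[h].count("0") for h in P)
              / sum(P[h] * len(OPTIMAL[h]) for h in P))

print(information_per_bit(OPTIMAL))  # 1.0
print(information_per_bit(DOUBLED))  # 0.5
print(zero_share)                    # 1/2
```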

Can we measure this?

Guessing  


['-', '-', '-', '-', '-', '-', '-', '-', '-', 'k', 'l', '-', '-', 
'-', '-', '-', '-', 'f', '-', '-', '-', '-', '-', '-', '-', '-', 
'-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', '-', 'i', 
'-', '-', '-', 'S', '-', '-', '-', '-', '-', '-', 'e', '-', '-', 
'-', '-', '-', 'n']

['T', 'h', 'a', 't', ' ', 'c', 'h', 'u', 'c', 'k', 'l', '-', 'h',
 'e', 'a', 'd', ' ', '-', 'r', '-', '-', ' ', 'a', '-', 'c', 'o',
 'u', '-', 't', 's', ' ', '-', 's', ' ', 'p', 'l', 'a', 'y', 'i',
 'n', 'g', ' ', 'S', '-', '-', '-', 'b', '-', 'l', 'e', ' ', 'a',
 'g', '-', 'i', 'n']

['-', '-', '-', '-', ' ', '-', '-', '-', 'c', '-', '-', '-', '-',
 '-', '-', '-', ' ', 'f', '-', 'o', 'm', ' ', '-', '-', 'c', 'o',
 '-', '-', '-', 's', '-', '-', 's', '-', '-', 'l', '-', 'y', 'i',
 'n', 'g', '-', '-', '-', 'r', 'a', 'b', '-', 'l', '-', '-', '-',
 '-', '-', '-', '-']

['T', '-', '-', 't', '-', '-', '-', '-', '-', '-', 'l', '-', '-',
 '-', 'a', '-', '-', 'f', '-', '-', 'm', ' ', '-', '-', '-', '-',
 '-', '-', 't', 's', '-', '-', 's', '-', '-', '-', 'a', '-', 'i',
 '-', '-', ' ', 'S', '-', '-', 'a', '-', '-', '-', '-', '-', 'a',
 '-', '-', 'i', '-']

['-', '-', '-', 't', ' ', '-', '-', '-', 'c', 'k', 'l', 'e', 'h',
 'e', '-', 'd', '-', '-', '-', '-', 'm', ' ', '-', '-', 'c', '-',
 '-', '-', '-', '-', ' ', '-', '-', '-', '-', '-', 'a', '-', 'i',
 'n', '-', ' ', '-', 'c', 'r', 'a', '-', 'b', '-', 'e', '-', '-',
 '-', '-', '-', '-']

['-', 'h', 'a', '-', ' ', 'c', '-', 'u', '-', 'k', 'l', 'e', 'h',
 '-', 'a', 'd', ' ', 'f', 'r', '-', 'm', ' ', '-', 'c', 'c', '-',
 'u', 'n', '-', 's', '-', '-', '-', ' ', 'p', 'l', '-', '-', 'i',
 'n', '-', ' ', 'S', 'c', 'r', '-', 'b', 'b', 'l', '-', '-', '-',
 '-', '-', '-', '-']

['-', 'h', 'a', 't', ' ', 'c', '-', 'u', '-', 'k', 'l', '-', '-',
 '-', 'a', 'd', ' ', 'f', 'r', '-', 'm', '-', '-', 'c', 'c', 'o',
 'u', '-', 't', '-', ' ', 'i', 's', '-', '-', 'l', 'a', 'y', '-',
 '-', 'g', ' ', '-', 'c', 'r', '-', 'b', 'b', 'l', 'e', ' ', '-',
 'g', 'a', 'i', '-']

['T', 'h', 'a', 't', ' ', 'c', 'h', 'u', 'c', 'k', 'l', 'e', 'h',
 'e', 'a', 'd', ' ', 'f', 'r', 'o', 'm', ' ', 'a', 'c', 'c', 'o',
 'u', 'n', 't', 's', ' ', 'i', 's', ' ', 'p', 'l', 'a', 'y', 'i',
 'n', 'g', ' ', 'S', 'c', 'r', 'a', 'b', 'b', 'l', 'e', ' ', 'a',
 'g', 'a', 'i', 'n']

['T', 'h', 'a', 't', ' ', 'c', 'h', 'u', 'c', 'k', 'l', 'e', 'h',
 'e', 'a', 'd', ' ', 'f', 'r', 'o', 'm', ' ', 'a', 'c', 'c', 'o',
 'u', 'n', 't', 's', ' ', 'i', 's', '-', 'p', 'l', 'a', 'y', 'i',
 'n', 'g', ' ', '-', 'c', '-', 'a', 'b', 'b', 'l', '-', ' ', 'a',
 'g', 'a', 'i', '-']

Measures for English

Given 26 letters, the entropy of a completely random stream of letters would be log2 26 ≈ 4.7 bits/letter.

absolute rate = r = 4.7 bits/letter
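The absolute rate is just log2 of the alphabet size, assuming every letter is equally likely and independent of the letters before it. In Python:

```python
import math
import string

letters = string.ascii_lowercase   # the 26 letters a-z
r = math.log2(len(letters))        # bits per letter if all 26 are equally likely
print(len(letters), round(r, 2))   # 26 4.7
```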

Playing guessing games and recording the average number of guesses required to guess a message (correcting for message length), we get:

Per-letter entropy of English = H = 1.3 bits/letter
Redundancy = r - H = 3.4 bits/letter
That is, the average letter of English carries 3.4 redundant bits of information.
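The lecture does not spell out how guess counts become an entropy figure. One standard approach, from Shannon's 1951 paper "Prediction and Entropy of Printed English", takes the entropy of the distribution of guess numbers (how often the first guess was right, how often the second, and so on) as an upper bound on the per-letter entropy. A minimal Python sketch of that upper bound; the guess counts are made up for illustration, so the estimate it prints is not the 1.3 quoted above.

```python
import math
from collections import Counter

def guess_entropy_bound(guesses: list[int]) -> float:
    """Upper bound on bits/letter from a guessing game.

    guesses[k] is how many tries the player needed for letter k
    (1 means the first guess was right). The bound is the entropy of
    the distribution of those try numbers.
    """
    n = len(guesses)
    return -sum((c / n) * math.log2(c / n) for c in Counter(guesses).values())

# Hypothetical guess counts for a 20-letter passage.
guesses = [1] * 14 + [2] * 3 + [3, 5, 9]

r = math.log2(26)                  # absolute rate, bits/letter
H = guess_entropy_bound(guesses)   # estimated entropy, bits/letter
print(f"H <= {H:.2f}, redundancy >= {r - H:.2f}")
print(f"lecture figures: {r:.1f} - 1.3 = {r - 1.3:.1f}")  # lecture figures: 4.7 - 1.3 = 3.4
```

Since H here is an upper bound, r - H is a lower bound on the redundancy.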
Information Theory

Mathematical theory of information based on the notion of entropy and a set of related, probabilistically based concepts.

Founders: Shannon and Weaver