Zero knowledge proofs

Proving you know it without telling it


Review
Revealing THAT you know x without revealing x.
Heads and tails, given a 1-1 one-way function f
Alice and Bob want to play a game of heads and tails online
1) Alice chooses an x that is either even or odd.
2) Alice computes f(x) and sends it to Bob.
3) Bob guesses even/odd (heads/tails) and sends his guess to Alice, essentially guessing which kind of x produced the f(x) he just received.
4) Alice informs Bob whether he's right by sending x.
5) Bob confirms that he's not being hornswoggled by computing f(x) and comparing it with the value Alice sent in step 2. (Note: he probably only bothers to do this when he's lost.)
6) Bob or Alice pays off depending on the result.
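A minimal sketch of one round of this game, assuming a toy 1-1 function f(x) = 3^x mod 563 with x restricted to 0..280 (3 has order 281 modulo 563, so those exponents give 281 distinct values). The function names and the tiny modulus are assumptions made for illustration only; a real f would have to be one-way at cryptographic sizes and must not leak the parity of x.

```python
import secrets

P = 563        # toy prime modulus; far too small for real use
G = 3          # base; 3 has order 281 modulo 563
ORDER = 281    # exponents 0..280 map to distinct values, so f is 1-1 on that range

def f(x):
    """Toy stand-in for a 1-1 one-way function."""
    return pow(G, x, P)

def alice_commit():
    """Steps 1-2: Alice picks x and sends only f(x)."""
    x = secrets.randbelow(ORDER)
    return x, f(x)

def bob_guess():
    """Step 3: Bob can do no better than guess the parity of x."""
    return secrets.choice(["even", "odd"])

def bob_verifies(x, commitment):
    """Step 5: Bob recomputes f(x) to check that Alice did not switch x."""
    return 0 <= x < ORDER and f(x) == commitment

def play_round():
    """Return True if Bob's guess was right."""
    x, commitment = alice_commit()
    guess = bob_guess()
    assert bob_verifies(x, commitment)    # step 4: Alice reveals x
    return guess == ("even" if x % 2 == 0 else "odd")
```

Because f is 1-1 on the allowed range, Alice cannot find a second x of the opposite parity with the same f(x) after she has seen Bob's guess.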
One-way functions

In order for Alice to be able to send Bob the required information at the required time, without giving him an unfair advantage, f must be a 1-1 one-way function:
  • Easily computable direction
    f(x) must be easily computable given x.
  • Hard to compute direction
    x must be hard to compute given f(x).
  • f is 1-1. Given any result f(x), there is a unique x that produced it.
  • When Bob gets f(x) from Alice, he has no practical way of computing which x produced it. Thus he must guess, as desired. At the same time, once Alice sends f(x), she is committed to that x because f is 1-1.
    Examples of one-way functions
    Raising to a power in modular arithmetic
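    3^{353} mod 563 = 52
    log_3 52 (mod 563) = ??
    Note that raising to a power is 1-1 in modular arithmetic provided the exponents are restricted to one period of the base, that is, to 0 up to one less than the order of the base. Within that range, only one integer can be the log of 52 relative to base 3 mod 563. (Here 3 has order 281 modulo 563, so 353 and 353 - 281 = 72 both give 52 if exponents are allowed to range over all of 0..561.)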
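The asymmetry can be seen directly in Python, whose built-in three-argument pow performs modular exponentiation. The sketch below uses the toy numbers from this example; the helper name brute_force_logs is an assumption made for illustration. Searching for the exponent is only feasible here because 563 is tiny, and the search also shows that the log is unique only within one period of the base (3 has order 281 modulo 563).

```python
P = 563

def brute_force_logs(base, target, modulus):
    """Return every exponent k in 0..modulus-2 with base**k % modulus == target."""
    return [k for k in range(modulus - 1) if pow(base, k, modulus) == target]

print(pow(3, 353, P))               # easy direction: prints 52
print(brute_force_logs(3, 52, P))   # hard direction, by exhaustive search: prints [72, 353]
```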
    Multiplying two large primes together
    The examples below use Kuchling's Python number and randpool modules. The imports assume the module structure used in The Python RSA Gui (pyrsagui). (If you install Kuchling's Cryptography tools yourself, they will undoubtedly require different import commands.)
    >>> from Crypto.Util.number import *
    >>> isPrime(8)
    0
    >>> isPrime(27)
    0
    >>> isPrime(29)
    1
    
    A list of the small prime numbers (less than 256)
    sieve =
    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
      53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 
     109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 
     173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 
     233, 239, 241, 251]
    
    Also:
    >>> isPrime(65537)
    1
    >>> pow(2,16)+1
    65537
    
    Note that 65537 is the number PyRSAGUI always uses for e.

    Finding random prime numbers! (Note: In my variant of Kuchling's implementation, RandomPool objects include methods that access Python's standard "random" module, which is unsuitable for cryptography):

    >>> from Crypto.Util.randpool import *
    >>> pool = RandomPool(32)
    >>> p = getPrime(32, pool.get_bytes)
    >>> p
    3662502317L
    >>> q = getPrime(32, pool.get_bytes)
    >>> q
    2480067991L
    >>> isPrime(p)
    1
    >>> isPrime(q)
    1
    >>> n = p * q
    >>> n
    9083254763355035147L
    
    Given n = p * q with p and q prime, there is a unique pair of primes that are the factors of n, but it is in general very difficult to find p and q from n alone.
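As a rough illustration of the asymmetry, multiplying the two primes from the session above is a single operation, while even the simplest factoring method, trial division, needs on the order of sqrt(n) steps. The helper trial_division is a hypothetical name, and it is run only on a small product of two primes from the sieve list, since trial division on the 64-bit n above would take billions of iterations in pure Python.

```python
def trial_division(n):
    """Return (d, n // d) for the smallest divisor d > 1 of n, or None if n is prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None

# Easy direction: checking the product from the session above.
p, q = 3662502317, 2480067991
assert p * q == 9083254763355035147

# Hard direction, at toy size: 241 and 251 both appear in the sieve list.
print(trial_division(241 * 251))   # prints (241, 251)
```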
    Users of one-way functions
    The Diffie-Hellman key-exchange protocol uses raising to a power mod n.
    RSA uses multiplication of primes.


    One-way relations

    Consider some instances of what we'll first call hard problems.

    Consider a graph H isomorphic to a given graph G.

    1. Creating a new graph H isomorphic to a given graph G is easy.
    2. Given two graphs G and H, it is a VERY hard problem to find an isomorphism between them.
    3. Given a proposed isomorphism, it is easy to check that it IS an isomorphism.
    Consider, for example, the following pair of graphs (in which not just the names but also the layout has been changed):

      [Figure: Graph A]

      [Figure: Graph B]

    The desired isomorphism is:

      Graph A   Graph B
      0         0
      1         3
      2         1
      3         2
      4         5
      5         4
    To check this solution in the A to B direction, we relabel the transition table for B with the names from A and make sure it is the same transition table as A's, and similarly in the B to A direction.
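The check described above can be sketched in a few lines of Python. The figures for Graph A and Graph B are not reproduced here, so the edge list below is a hypothetical stand-in, and the function names are assumptions made for illustration; only the relabeling itself is taken from the table above.

```python
def relabel(edges, mapping):
    """Apply a node relabeling to a set of undirected edges."""
    return {frozenset(mapping[v] for v in edge) for edge in edges}

def is_isomorphism(edges_a, edges_b, mapping):
    """Check that mapping is one-to-one and carries A's edges exactly onto B's."""
    one_to_one = len(set(mapping.values())) == len(mapping)
    return one_to_one and relabel(edges_a, mapping) == edges_b

# The relabeling from the table above.
a_to_b = {0: 0, 1: 3, 2: 1, 3: 2, 4: 5, 5: 4}

# Hypothetical edges for Graph A: a 6-cycle plus one chord.
graph_a = {frozenset(e) for e in
           [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]}
graph_b = relabel(graph_a, a_to_b)    # generating an isomorphic graph is easy

identity = {v: v for v in range(6)}
print(is_isomorphism(graph_a, graph_b, a_to_b))      # prints True
print(is_isomorphism(graph_a, graph_b, identity))    # prints False
```

Checking a proposed mapping takes one pass over the edges; finding one without help means searching among the relabelings.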

    It will be helpful to have TWO examples of hard graph problems.

    Consider the problem of finding a Hamiltonian circuit. A Hamiltonian path through a graph is a path that visits each vertex exactly once. If the path can be extended by one edge to end up at the same vertex it started at, we have not just a Hamiltonian path, but a Hamiltonian circuit.

    Just as with the problem of isomorphism, it is easy to verify that a Hamiltonian circuit is in fact a Hamiltonian circuit if given one, but it is very hard to find one (and not every graph has one).

    Both graph problems have another important feature. It is easy to generate instances of the problem. You can generate an instance of a graph with a Hamiltonian cycle simply by defining transitions between consecutively created nodes. You can generate an instance of two graphs with an isomorphism simply by creating an arbitrary graph and then an arbitrary relabeling of its nodes. So each of these problems has 3 features:

    1. It is hard to solve (computationally impractical).
    2. It is easy to generate instances of the problem complete with known solutions
    3. Once a candidate solution is given, it is easy to check it.
    We will call problems with these features one-way relations. They are relations rather than functions because the solutions are not unique. There may be more than one Hamiltonian circuit through a graph. There are arbitrarily many graphs isomorphic to a given graph.
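Features 2 and 3 can be illustrated for the Hamiltonian circuit problem with a short sketch. The function names, and the idea of hiding the circuit among randomly chosen extra edges, are assumptions made for illustration; a graph this small is of course not hard to solve.

```python
import random

def make_hamiltonian_graph(n, extra_edges, rng):
    """Return (edges, circuit): a graph on nodes 0..n-1 built around a known circuit.

    Assumes n >= 3 and n + extra_edges <= n * (n - 1) // 2.
    """
    circuit = list(range(n))
    rng.shuffle(circuit)
    # Link consecutively chosen nodes, closing the loop back to the start.
    edges = {frozenset((circuit[i], circuit[(i + 1) % n])) for i in range(n)}
    while len(edges) < n + extra_edges:    # camouflage the circuit with extra edges
        edges.add(frozenset(rng.sample(range(n), 2)))
    return edges, circuit

def is_hamiltonian_circuit(n, edges, circuit):
    """Easy direction: check that a candidate visits every node once along real edges."""
    if sorted(circuit) != list(range(n)):
        return False
    return all(frozenset((circuit[i], circuit[(i + 1) % n])) in edges
               for i in range(n))

edges, circuit = make_hamiltonian_graph(8, 4, random.Random(0))
print(is_hamiltonian_circuit(8, edges, circuit))    # prints True
```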

    Note the similarity with one-way 1-1 functions:

    1. One-way 1-1 functions are hard to compute in one direction.
    2. It is easy to generate instances of the "problem" because it is easy to choose an appropriate x and compute f(x).
    3. For the same reason, knowing f(x), it is easy to check a candidate y to see if f(y)=f(x).
    Ultimately one-way functions are a special case of one-way relations. You can think of f(x)=y as a relation between x and y, written:
    f(x,y)
    
    Knowing y, it is hard to find an x that stands in the f relation to it. The only thing that makes one-way 1-1 functions special, then, is that there is guaranteed to be exactly one such x.
    Zero-knowledge protocol

    The following protocol uses two one-way relations, the graph examples introduced above.

    Secret transfer and commitment stage:

    1. Alice and Bob share a common graph G. Alice knows a Hamiltonian circuit in G (she generated G so as to have one).
    2. Alice creates a new graph H isomorphic to G, encrypts the transition table of H line by line under a key K,
      H' = ''
      for each line l in the table for H:
         H' += E_K(l)
      
      and sends H' to Bob.

    Verification stage:

    1. Bob challenges Alice to do one of two things:
      1. Either prove that H' is an encryption of a graph isomorphic to G
      2. Show him a Hamiltonian circuit for H.
    2. Alice responds either by:
      1. Revealing K and the relabeling of G that gives H; or
      2. Decrypting only the lines of the transition table for H that show the cycle.
    Note that Alice can't do BOTH in the same round. If you know both the isomorphism and the circuit for H, then you can just use the relabeling information to construct a circuit for G. So performing both tasks reveals the circuit for G. But performing just one of the two reveals nothing.
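One round of the protocol can be sketched as follows. This is a toy simulation under stated assumptions, not the construction above verbatim: instead of encrypting the transition table line by line with a key, it seals each possible edge of H with a separate SHA-256 hash commitment (a swapped-in technique that makes it easy to open only some entries), and all function names are hypothetical.

```python
import hashlib
import secrets

def commit(bit):
    """Seal one bit; returns (digest, nonce). The digest is public, the nonce secret."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(nonce + bytes([bit])).hexdigest(), nonce

def opens_to(digest, nonce, bit):
    return hashlib.sha256(nonce + bytes([bit])).hexdigest() == digest

def cycle_pairs(circuit):
    """Sorted node pairs for the consecutive edges of a closed circuit."""
    return [tuple(sorted(pair)) for pair in zip(circuit, circuit[1:] + circuit[:1])]

def alice_commits(n, edges_g):
    perm = list(range(n))
    secrets.SystemRandom().shuffle(perm)     # node v of G becomes perm[v] in H
    edges_h = {frozenset(perm[v] for v in e) for e in edges_g}
    secret_table = {}
    for i in range(n):
        for j in range(i + 1, n):
            bit = int(frozenset((i, j)) in edges_h)
            digest, nonce = commit(bit)
            secret_table[(i, j)] = (digest, nonce, bit)
    sealed = {pair: entry[0] for pair, entry in secret_table.items()}   # plays the role of H'
    return perm, secret_table, sealed

def alice_responds(challenge, perm, secret_table, circuit_g):
    if challenge == "isomorphism":
        shown, pairs = perm, list(secret_table)      # reveal relabeling, open everything
    else:
        shown = [perm[v] for v in circuit_g]         # the circuit in H's labels
        pairs = cycle_pairs(shown)                   # open only the circuit's entries
    return shown, {pair: secret_table[pair][1:] for pair in pairs}

def bob_checks(challenge, n, edges_g, sealed, response):
    shown, opened = response
    if sorted(shown) != list(range(n)):
        return False
    if not all(opens_to(sealed[pair], nonce, bit) for pair, (nonce, bit) in opened.items()):
        return False
    if challenge == "isomorphism":
        edges_h = {frozenset(shown[v] for v in e) for e in edges_g}
        return set(opened) == set(sealed) and all(
            bool(bit) == (frozenset(pair) in edges_h) for pair, (_, bit) in opened.items())
    return set(opened) == set(cycle_pairs(shown)) and all(
        bit == 1 for _, bit in opened.values())

# Hypothetical G: a 5-cycle plus one chord, with its known circuit.
edges_g = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]}
circuit_g = [0, 1, 2, 3, 4]
for challenge in ("isomorphism", "circuit"):
    perm, table, sealed = alice_commits(5, edges_g)
    response = alice_responds(challenge, perm, table, circuit_g)
    print(challenge, bob_checks(challenge, 5, edges_g, sealed, response))
```

Each round uses a fresh relabeling, so what Bob sees when he asks for the circuit is a cycle through randomly renamed nodes.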

    The key wrinkle is this: Alice and Bob need to repeat the commitment and verification stages, with a fresh H each time, an arbitrary number of times to guarantee that Alice isn't cheating. For suppose Alice DOESN'T in fact know a Hamiltonian circuit for G:

    1. If Alice guesses correctly that Bob is going to ask for a proof of isomorphism, she only needs to create a graph isomorphic to G, which does not require knowing a circuit.
    2. If Alice guesses correctly that Bob is going to ask for a circuit of H, she constructs a new graph not isomorphic to G (but with the same number of nodes and edges) and builds into it a circuit that she knows.
    If she guesses wrong she's found out, because neither strategy works if Bob asks for the other solution. So she has a 50% chance of hornswoggling successfully for one round, a 25% chance of hornswoggling successfully for 2 rounds, a 12.5% chance of succeeding for 3, and so on.

    Bob can run the protocol for as many rounds as he needs, to guarantee whatever level of security he wishes. At the end of n rounds there is a 1 in 2^n chance that Alice has succeeded in hornswoggling him. This means Bob can run the protocol until he becomes reasonably certain that Alice really DOES know a Hamiltonian circuit, without learning anything about what that Hamiltonian circuit is.
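The arithmetic is simple enough to check directly. The helper names below are hypothetical; the sketch assumes each round is independent and gives a cheating Alice exactly a one-half chance.

```python
import math

def cheat_probability(rounds):
    """Chance that a cheating Alice survives the given number of rounds."""
    return 0.5 ** rounds

def rounds_needed(max_cheat_probability):
    """Smallest number of rounds that pushes the cheating chance to the bound or below."""
    return math.ceil(-math.log2(max_cheat_probability))

print(cheat_probability(3))    # prints 0.125
print(rounds_needed(1e-6))     # prints 20
```

Twenty rounds already leave a cheater less than a one-in-a-million chance, since 2^20 = 1,048,576.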

    Alice has given Bob what is called a zero-knowledge proof that she knows a circuit. A zero-knowledge proof of p is a proof that p is known which reveals 0 information about what p is.

    The XOR strategy

    The strategy being used in the zero-knowledge proof just described might be called the XOR strategy.

    Recall the XOR strategy from a previous lecture:

    We want to split a certain piece of information between Alice and Bob and arrange things so that neither of them can learn ANYTHING about it without the help of the other.

    Let's say the information is:

    Howdy Doody
    
    We can't simply split this in half because anyone who knows Howdy has a good chance of guessing that Doody is the other half. And anyone who knows Doody has a good chance of guessing that Howdy is the other half. And in general knowing half of any text gives you a pretty good start on guessing the other half.

    The way to do this is with XOR-ing:

      Alice   Bob   Secret (Alice XOR Bob)
      1       1     0
      1       0     1
      0       1     1
      0       0     0
    Notice that if Alice has a 1 that can lead either to a 1 or a 0 in the secret, depending entirely on what Bob has. Notice that if Alice has a 0 that too can lead either to a 1 or a 0 in the secret, depending entirely on what Bob has. Thus knowing one of the binary numbers in Alice or Bob's half of the secret gives no information about the corresponding number in the secret.

    So Alice and Bob both have 0 information about the content of the secret (Howdy Doody).
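A minimal sketch of this splitting, assuming Alice's share is a string of uniformly random bytes as long as the secret and Bob's share is the XOR of the secret with Alice's share. The function names are hypothetical.

```python
import secrets

def split(secret):
    """Split a byte string into two shares, each useless without the other."""
    alice = secrets.token_bytes(len(secret))                # uniformly random share
    bob = bytes(a ^ s for a, s in zip(alice, secret))       # secret XOR alice
    return alice, bob

def combine(alice, bob):
    """XOR the two shares back together to recover the secret."""
    return bytes(a ^ b for a, b in zip(alice, bob))

alice_share, bob_share = split(b"Howdy Doody")
print(combine(alice_share, bob_share))    # prints b'Howdy Doody'
```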

    The zero-knowledge proof of the knowledge of the graph circuit is really a generalization of this strategy.

    We split the information of the circuit into pieces, a relabeling of G and a circuit in the relabeled graph H. Neither piece of information alone gives away the circuit in G; put together they determine it completely in a way that is easy to compute.
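Putting the two pieces together really is easy, as this sketch shows. It assumes the relabeling is stored as a list perm in which node v of G is called perm[v] in H; the names are hypothetical.

```python
def circuit_in_g(perm, circuit_h):
    """Translate a circuit in H back into G by inverting the relabeling."""
    h_to_g = {h: g for g, h in enumerate(perm)}
    return [h_to_g[v] for v in circuit_h]

perm = [2, 0, 3, 1]                  # hypothetical relabeling of a 4-node G
circuit_h = [2, 0, 3, 1]             # a circuit expressed in H's node names
print(circuit_in_g(perm, circuit_h))    # prints [0, 1, 2, 3]
```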

    Note one difference: Knowing 1/2 of an XOR string really gives NO information about the original bits. In contrast, revealing the circuit in H does in fact give information about the circuit in G. It just does it in a way that creates a major computational problem, a problem of a difficulty equal to the original problem of finding a circuit in G.

    The value of Alice's information is the computational work it saves. She protects it with a secret Bob would have to spend an equal amount of computer time to penetrate.

    Usefulness of zero-knowledge

    What is the use of such a thing?

    Suppose Alice knows a secret number that only Alice is supposed to know. This number might be used to verify who she is, or that she has certain assets that can be used in a commercial transaction.

    She wants to convince Bob, a merchant, of the fact that she has the number, but she doesn't want to reveal the number, for a variety of possible reasons:

    1. She wants to re-use it.
    2. The number itself leads to information she wants to keep secret, such as her true identity.
    Thus Bob needs to become convinced that she has a number that has some given property P and he needs to learn nothing about the number.

    This is a zero-knowledge proof problem.

    We will see the logic of zero-knowledge proofs applied in digital cash.

    Zero-knowledge identity proofs

    Let's assume that what's secret is x and what's public is f(x), where f is some one-way function. Of course this is the situation with private and public keys. In general, in this kind of situation, knowledge of x can be used as a proof of identity.

    But we don't want to reveal private knowledge every time we prove our identity. So this is just the kind of situation that calls for a zero-knowledge proof.

    To illustrate the use of zero-knowledge in identity verification, we'll take as our example of f(x) 

    f(x) = a^x mod n:   raising to a power in modular arithmetic.
    
    This is a one-way function because the inverse problem, the discrete-log problem, is very hard.

    The following algorithm is due to Chaum, Evertse and van de Graaf ["An Improved Protocol for Demonstrating Possession of Discrete Logarithms and Some Generalizations"; Advances in Cryptology: EUROCRYPT '87 Proceedings, Springer-Verlag 1988, pp. 127-141].

    Requirements

    We need a prime p and numbers A, B, x such that:

    A^x ≡ B (mod p)
    
    Public     Private
    A, B, p    x

    Commitment stage

    1. Alice generates z random numbers r_1, r_2, r_3, ..., r_z, all less than p - 1.
    2. Alice sends Bob
      h_i = A^{r_i} mod p
      
      for all the r_i.
    3. Alice and Bob follow a coin-flipping protocol to generate z bits b_1, b_2, b_3, ..., b_z.
    4. For each bit b_i, Alice sends Bob something we'll call s_i. To compute s_i, Alice follows this rule:
      1. If bit b_i = 0,
        s_i = r_i
        
      2. If bit b_i = 1,
        s_i = (r_i - r_j) mod (p - 1)
        
        where j is the lowest value for which b_j = 1.

    Confirmation part I (confirming protocol adherence):

    1. For each bit b_i, Bob confirms that the s_i Alice sent conforms to the rule by doing the following:
      1. If b_i = 0, he confirms that
        A^{s_i} ≡ h_i (mod p)
        
        This confirms that s_i = r_i.
      2. If b_i = 1, he confirms that
        A^{s_i} ≡ h_i * h_j^{-1} (mod p)
        
        This confirms that s_i = (r_i - r_j) mod (p - 1) since:
        A^{r_i - r_j} = A^{r_i} * A^{-r_j}
        = A^{r_i} * (A^{r_j})^{-1}
        = h_i * (h_j)^{-1}
        
      Notice that this confirmation step does not require that Bob know r_j, which is good, since in fact Bob does NOT know r_j. He has been sent:
      (r_j - r_j), (r_l - r_j), (r_m - r_j), ... (r_p - r_j)
      
      where r_l - r_j, r_m - r_j, ... r_p - r_j correspond to all the other bits that were 1 (the subscripts l, m, ..., p here are just indices; this p is not the prime modulus).
    In fact, Bob knows NONE of the random numbers whose corresponding bit is 1. However, if he learns r_j, he knows all of them.

    Confirmation Part II (Alice proves she knows x, the discrete log of B)

    1. Alice sends
      s_{z+1} = (x - r_j) mod (p - 1)
      
    2. Bob confirms that:
      A^{s_{z+1}} ≡ B * h_j^{-1} (mod p)
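An honest run of the whole protocol can be sketched as follows. The parameters are toy assumptions (p = 563 and A = 2, which generates the whole multiplicative group mod 563), the coin-flipping protocol is replaced by a local random-bit generator that retries until at least one bit is 1 so that j exists, and all function names are hypothetical.

```python
import secrets

P = 563    # toy prime; real use needs a prime of cryptographic size
A = 2      # public base

def alice_commit(z):
    """Pick z random exponents r_i < p - 1 and publish h_i = A^{r_i} mod p."""
    rs = [secrets.randbelow(P - 1) for _ in range(z)]
    return rs, [pow(A, r, P) for r in rs]

def flip_bits(z):
    """Stand-in for the coin-flipping protocol; retries until some bit is 1."""
    while True:
        bits = [secrets.randbits(1) for _ in range(z)]
        if 1 in bits:
            return bits

def alice_respond(x, rs, bits):
    """Compute s_1..s_z and the final s_{z+1} = (x - r_j) mod (p - 1)."""
    j = bits.index(1)                                   # lowest index with b_j = 1
    ss = [r if b == 0 else (r - rs[j]) % (P - 1) for r, b in zip(rs, bits)]
    return ss, (x - rs[j]) % (P - 1)

def bob_verify(B, hs, bits, ss, s_last):
    """Confirmation parts I and II."""
    j = bits.index(1)
    hj_inv = pow(hs[j], -1, P)                          # modular inverse (Python 3.8+)
    for h, b, s in zip(hs, bits, ss):
        expected = h if b == 0 else (h * hj_inv) % P
        if pow(A, s, P) != expected:
            return False
    return pow(A, s_last, P) == (B * hj_inv) % P

x = 123                      # Alice's private value (arbitrary for the demo)
B = pow(A, x, P)             # public value with A^x = B (mod p)
rs, hs = alice_commit(8)
bits = flip_bits(8)
ss, s_last = alice_respond(x, rs, bits)
print(bob_verify(B, hs, bits, ss, s_last))    # prints True
```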
      

    Discussion

    Note first of all that by assumption

    A^x ≡ B,  A^{r_j} ≡ h_j (mod p)
    
    so in step 2 of Confirmation Part II, the congruence Bob verifies should hold because:
    A^{s_{z+1}} ≡ A^{x - r_j} ≡ A^x * A^{-r_j} ≡ B * A^{-r_j} ≡ B * (A^{r_j})^{-1} ≡ B * h_j^{-1} (mod p)
    

    Alice successfully cheats if she succeeds in convincing Bob that she knows the discrete log of B (x), without actually knowing it.

    Now a necessary condition for Alice to cheat is that she know WHICH r is going to be rj (the first random number to be assigned 1 as its bit).

    Here's how that helps. Alice chooses a number to serve as s_{z+1} in advance and computes:

    A^{s_{z+1}} mod p
    
    She then solves the equation:
    A^{s_{z+1}} ≡ B * y (mod p)
    
    for y. Then the inverse of y is what she sends to Bob as h_j. And 0, of course, is what she sends as s_j.

    Notice that if she does this, she does not know r_j, because that is the discrete log of h_j:

    A^{r_j} ≡ h_j (mod p)
    
    And finding the discrete log of a number is a hard problem. So h_j cannot both be determined by independent constraints and be something whose discrete log we know.

    Now what does Alice send as her other h_i's and s_i's? For the verification step in Confirmation Part I, each s_i whose bit is 1 has to pass the test:

    A^{s_i} ≡ h_i * h_j^{-1} (mod p)
    
    So she can choose an s_i at random. Each h_i then needs to be the z that solves the following equation (z here is just the unknown, not the count of random numbers):
    A^{s_i} ≡ z * h_j^{-1} (mod p)
    
    Again this is easy, but this kind of cheating again precludes knowing r_i, because that is the discrete log of h_i, which is hard to compute. Let's call an h_i that has been computed in this way a bogus h_i and an h_i that has been computed by choosing r_i and then computing
    A^{r_i} ≡ h_i (mod p)
    
    a genuine h_i. The key fact distinguishing bogus h_i's is that Alice does not know their discrete logs.

    So on this scheme, Alice has to prepare bogus h_i's in advance for ALL the bits b_i that are going to get value 1.

    Thus to cheat Alice must somehow guess ALL of the bit values right for b_1...b_z, because she has to send all her h_i's before the bits are chosen.

    Now consider what happens if she guesses wrong for one of those bits. Suppose b_i, a bit she thought was a 0, actually gets a 1. Then she has sent a genuine h_i. Now she must send an s_i that passes the test:

    A^{s_i} ≡ h_i * h_j^{-1} (mod p)
    
    Now this, as shown above, is trivial if you know the discrete logs of h_i AND h_j. But Alice knows only the discrete log of h_i, and thus is faced with the equally hard problem of finding the discrete log of h_i * h_j^{-1}. Thus she is exposed.

    Suppose on the other hand that a bit she thought was a 1 gets a 0. Then she is asked to send the discrete log of a bogus h_i. Once again she must solve a discrete log problem, a problem just as hard as the problem she is pretending to solve.

    Conclusion: The only way Alice can successfully cheat is to guess all the bits right in advance. She has only a 1 in 2^z chance of doing that.
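The cheating strategy described in this discussion can be sketched as follows. The parameters p = 563 and A = 2 are toy assumptions, the function names are hypothetical, and the forger never touches x: it only needs the public B and a guess at the bits. Genuine exponents are drawn from 1..p-2 here so that the failing case in the demo is deterministic.

```python
import secrets

P, A = 563, 2          # toy public parameters (assumed for illustration)

def forge(B, guessed_bits):
    """Build h_i and s_i that pass Bob's checks IF the real bits equal guessed_bits."""
    j = guessed_bits.index(1)
    s_last = secrets.randbelow(P - 1)               # chosen first, to serve as s_{z+1}
    h_j = (B * pow(A, -s_last, P)) % P              # bogus h_j: A^{s_last} = B * h_j^{-1}
    hs, ss = [], []
    for i, b in enumerate(guessed_bits):
        if i == j:
            hs.append(h_j)
            ss.append(0)                            # s_j = r_j - r_j = 0
        elif b == 1:
            s = secrets.randbelow(P - 1)
            hs.append((pow(A, s, P) * h_j) % P)     # bogus h_i: A^{s_i} = h_i * h_j^{-1}
            ss.append(s)
        else:
            r = 1 + secrets.randbelow(P - 2)        # genuine h_i with a known log r_i
            hs.append(pow(A, r, P))
            ss.append(r)
    return hs, ss, s_last

def bob_verify(B, hs, bits, ss, s_last):
    """Bob's checks from Confirmation parts I and II."""
    j = bits.index(1)
    hj_inv = pow(hs[j], -1, P)                      # modular inverse (Python 3.8+)
    for h, b, s in zip(hs, bits, ss):
        expected = h if b == 0 else (h * hj_inv) % P
        if pow(A, s, P) != expected:
            return False
    return pow(A, s_last, P) == (B * hj_inv) % P

B = pow(A, 123, P)                 # public value; the forger does not use its log
guess = [0, 1, 1, 0]
hs, ss, s_last = forge(B, guess)
print(bob_verify(B, hs, guess, ss, s_last))           # guess was right: prints True
print(bob_verify(B, hs, [1, 1, 1, 0], ss, s_last))    # one bit wrong: prints False
```

In the second call the prepared answers no longer fit, and producing answers that do would require a discrete log Alice does not have.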