Stream Ciphers Part Three
What is a Secure Cipher?
PRG Security Definitions
Consider a PRG \(G: K \rightarrow \{0,1\}^n\)
We will define what it means that the output of the generator be indistinguishable from random.
We’ll pick a random key from the keyspace, take its output and identify why it is indistinguishable from a random choice of the output space.
\[[k \xleftarrow{R} K,\;output \; G(k)]\]
is indistinguishable from:
\[ [r \xleftarrow{R} \{0,1\}^n. output r]\]
In other words an adversary looking at the output of the generator can’t distinguish it from a uniform sampling of the output set.
To determine whether it is indistinguishable we use a statistical test.
A statistical test on \(\{0,1\}^n\).
We define an algorithm \(A\) such that \(A(x)\), where \(x\) is an \(n\) bit string, outputs 0 or 1.
We’ll think of 0 as saying that the input (x) is not random, and 1 as saying that it is random.
Here are some examples of what A could be - A(x) = 1 iff the number of 0s in x minus the number of 1s in x is not too far apart (eg less than 10 . root n).
Another example: A(x) = 1 iff the number of consecutive 0s (00) minus n/4 (the statistical likely number) is less than 10 . root n
One more: A(x) = 1 iff the max run of 0s in x is less then 10 . log2(n). (note that this passes the all 1 string)
In the old days we’d take a battery of similar statistical tests and if a string passes the battery it would pass.
How do we evaluate a statistical test?
We define the advantage concept.
The advantage of a statistical test against a PRG is defined as the difference of two quantities.
We ask how likely is the test to return 1 when we sample at random from the key space: \(Pr\left[ A \left( G(k) \right) = 1 \right]\) where \(k \xleftarrow{R} K\) versus how likely is it to output 1 when we give it a truly random string.
If the advantage is close to 1 the test behaved differently in pseudorandom cases than truly random output. A can distinguish G from random.
If the advantage is close to 0 the test behaves the same on pseudorandom cases from random output. A cannot distinguish the generator from random.
So for example a test that always outputs the same result will have no advantage at all.
Now we can define a secure PRG.
A PRG is secure if no efficient statistical test can distinguish it from random: For all efficient statistical tests the advantage of the test on G is negligible.
So are there provably secure PRGs? Currently unknown. If we could prove it that would imply p != np.
But we have heuristic candidates.
A secure PRG is unpredictable. The predictor feeds the statistical test.
A theorem from Yao says the converse is true: If next-bit predictors cannot distinguish G from random then no statistical test can.
We can generalise to say that any two probability distributions are indistinguishable computationally if for all efficient statistical tests the difference between the output on the two distributions is negligible.
Semantic Security
If we use a secure PRG, we get a secure stream cipher, we’ll show this.
What is a secure cipher? When we define security we define it in terms of what can the attacker do, and what is the attacker trying to do.
In the context of stream ciphers, these are used with a one time key, so the most the attacker will see is one ciphertext.
What is it for the cipher to be secure? Possible requirements:
- Attacker cannot recover the secret key.
This is no good - imagine a cipher that used a secret key bug ignored it and output the message. Key is unrecoverable, but the attacker still gets the message.
- Attacker cannot recover the entire plain text.
This is no good either. Imagine a cipher that takes two messages and outputs one in plain text, the other encrypted. The attacker gets the cipher text and can see one of the source messages.
Recall Shannon’s idea - cipher text reveals no information about the plain text.
Shannon’s perfect secrecy:
Where \((E,D)\) is a cipher over \((K,M,C)\), \((E,D)\) has perfect secrecy if \(\forall m_0,m_1 \in M \; (|m_0| = |m_1|)\) (messages of the same length).
\(\{E(k,m_0)\} = \{E(k,m_1)\}\) where \(k \leftarrow K\).
So the distributions are identical, and the attacker can’t identify anything about the messages.
The definition is too strong, in that requires long keys. We can loosen it a little to say that, rather than identical, the distributions have to be computationally indistinguishable.
So an efficient attacker cannot distinguish the distributions, even if they are different.
This is still too strong. Rather than holding true for all messages \(m_0\) and \(m_1\) it needs to be true only of messages that the attacker exhibits explicitly.
This leads us to the definition of semantic security, which for one-time keys we define as follows.
For \(b=0,1\) define experiments \(EXP(0)\) and \(EXP(1)\) as:
We have an adversary A (analogue of statistical test).
We have two challengers (one taking an input bit (b) of 0, the other 1).
The challenger picks a random key.
The adversary gives the messages \(m_0\) and \(m_1\) with equal length.
The challenger will output either the encryption of \(m_0\) or \(m_1\) (ie they output \(m_b\)).
The adversary tries to guess whether they received the encryption of \(m_0\) or \(m_1\).
We define an event \(W_b := \left[ \textrm{event that} EXP(b)=1 \right]\)
Now we can define the semantic security advantage of the adversary as:
\[\textrm{Adv}_{SS}[A,E] := \left| Pr[W_0] - Pr[W_1] \right| \in [0,1]\]
So we look at whether the attacker behaves differently depending on whether they were given \(m_0\) or \(m_1\).
We’re interested in whether the adversary output 1 in the experiment.
If the probability that the adversary output 1 in both experiments is the same, then the adversary wasn’t able to distinguish when it received \(m_0\) or \(m_1\).
So if the advantage is close to 0, the adversary couldn’t distinguish, if it’s close to 1, then the adversary could distinguish very well.
Now we can say that the cipher is semantically secure for all efficient A, if the advantage is negligible.
Semantic Security of Stream Ciphers
We want to prove that given a Generator \(G\) producing a key \(K\) in the keyspace \(\{0,1\}^n\), which we know to be secure, the resulting stream cipher \(E\) based on \(G\) is semantically secure.
We’ll show that for all semantaically secure adversaris A, there exists a PRG adversary B, such that the Advantage of A over E is less than or equal to twice the advantage of B.
Since we know that the advantage of B is negligible, it follows the advantage of A will be negligible.
The way we do this is to change the behaviour of the challenger to pick a truly random value in the keyspace. We know that this is indistinguishable from the PRG output based on the security of the PRG.
Now the adversary is playing the game with a OTP, but we know that the advantage in a OTP is 0. The adversary can’t tell the difference between the OTP and the PRG, it can’t beat the OTP, and so it can’t beat the PRG.