Discrete Probability Crash Course (Boneh)
Background Reading: https://en.wikibooks.org/wiki/High%5FSchool%5FMathematics%5FExtensions/Discrete%5FProbability
Over the years, many natural-looking cryptographic constructions have turned out to be insecure. Modern cryptography therefore relies on rigorous proofs of security.
The language used to describe security is discrete probability.
Universes and Distributions
Discrete probability is defined over a universe \(U\). In our case the universe will always be a finite set.
Very commonly our universe will be the set of all n-bit strings, denoted by \(U = \{0,1\}^n\)
For example, \(\{0,1\}^2 = \{00,01,10,11\}\).
The set \(\{0,1\}^n\) contains \(2^n\) elements.
A Probability Distribution \(P\) over \(U\) is a function that assigns every element in the universe a number between 0 and 1, such that the sum of P over all elements in U is 1.
That number is the weight, or probability of the element.
There are several classic examples of probability distributions.
The first is the uniform distribution, where \(P(x) = \frac{1}{|U|}\) for all \(x \in U\).
That is, every element in the universe has the same weight.
If I sample repeatedly from this distribution, each element appears equally often.
Another classic distribution is the point distribution at \(x_0\), where \(P(x_0) = 1\) and \(P(x) = 0\) for all other elements.
If I sample from this distribution, I’ll always pick the same element.
Since in our case \(U\) is a finite set, we can capture a distribution as a distribution vector, which records the weight of every element.
In the case of \(\{0,1\}^3\) for example, we’ll end up with a vector of dimension 8, a vector of 8 real numbers.
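A minimal sketch of these ideas, enumerating \(\{0,1\}^3\) and writing the uniform and point distributions as dictionaries (effectively the distribution vector of dimension 8):

```python
from itertools import product

# Universe U = {0,1}^3, enumerated as bit strings.
U = ["".join(bits) for bits in product("01", repeat=3)]

# Uniform distribution: every element gets weight 1/|U|.
uniform = {x: 1 / len(U) for x in U}

# Point distribution at x0 = "000": all weight on one element.
point = {x: (1.0 if x == "000" else 0.0) for x in U}

# Both are valid distributions: the weights sum to 1.
print(len(U))                 # 8
print(sum(uniform.values()))  # 1.0
print(sum(point.values()))    # 1.0
```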
Events
Consider a subset \(A \subseteq U\).
The probability of A will be the sum of the weights of all elements in A: \(Pr[A] = \sum_{x \in A} P(x) \in [0,1]\).
The set A is called an event, and the probability of A is the probability of that event.
For example, let \(U = \{0,1\}^8\), the set of byte strings.
Let \(A = \{x \in U : lsb_2(x) = 11\}\), the set of byte strings whose two least significant bits are both 1.
For the uniform distribution on \(U\), \(Pr[A] = 1/4\), since exactly a quarter of all byte strings end in 11.
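We can check this by brute-force enumeration of the event:

```python
from itertools import product

# Universe: all byte strings, written as 8-character bit strings.
U = ["".join(bits) for bits in product("01", repeat=8)]

# Event A: the two least significant bits are both 1.
A = [x for x in U if x.endswith("11")]

# Under the uniform distribution, Pr[A] = |A| / |U|.
pr_A = len(A) / len(U)
print(pr_A)  # 0.25
```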
A simple bound on the probability of two events is the union bound.
For events \(A_1\) and \(A_2\):
\[ Pr[A_1 \cup A_2] \leq Pr[A_1] + Pr[A_2]\]
If the two events are disjoint, the union bound holds with equality.
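A quick sanity check of the union bound on two overlapping events (the particular events here are illustrative choices, not from the text above):

```python
from itertools import product

U = ["".join(bits) for bits in product("01", repeat=8)]

# Two overlapping events under the uniform distribution:
A1 = {x for x in U if x.endswith("11")}   # lsb_2(x) = 11
A2 = {x for x in U if x.startswith("1")}  # msb(x) = 1

def pr(E):
    return len(E) / len(U)

# Union bound: Pr[A1 ∪ A2] <= Pr[A1] + Pr[A2].
print(pr(A1 | A2))      # 0.625 (events overlap, so strictly less)
print(pr(A1) + pr(A2))  # 0.75
```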
Random Variables
Random variables are intuitive objects, but the formal definition is confusing.
Formally, a random variable \(X\) is a function \(X: U \rightarrow V\)
It’s a function from the universe to a set \(V\). The set V is the set from where the random variable takes its value.
For example \(X: \{0,1\}^n \rightarrow \{0,1\}\)
This maps our n-bit strings to a single bit. It could do this by taking the least significant bit, for example, so that \(X(y) = lsb(y) \in \{0,1\}\).
For the uniform distribution on \(U\), \(Pr[X=0] = Pr[X=1] = \frac{1}{2}\).
More generally, a random variable \(X\) induces a distribution on \(V\): \(Pr[X=v] := Pr[X^{-1}(v)]\)
In other words, the probability that \(X\) takes the value \(v\) is the probability of sampling an element of \(U\) whose image under \(X\) is \(v\).
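The induced distribution of the lsb example can be computed directly by counting preimages:

```python
from itertools import product

n = 4
U = ["".join(bits) for bits in product("01", repeat=n)]

# Random variable X: U -> {0,1}, the least significant bit.
def X(y):
    return y[-1]

# Induced distribution on V = {0,1}:
# Pr[X = v] = |X^{-1}(v)| / |U| under the uniform distribution on U.
induced = {v: sum(1 for y in U if X(y) == v) / len(U) for v in "01"}
print(induced)  # {'0': 0.5, '1': 0.5}
```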
There is a particularly important random variable called the uniform random variable.
Let \(U\) be some finite set.
We write \(r \xleftarrow{R} U\) to denote a uniform random variable over \(U\):
for all \(a \in U\): \(Pr[r=a] = \frac{1}{|U|}\)
Formally, \(r\) is the identity function: \(r(x) = x\) for all \(x \in U\).
Randomized Algorithm
A deterministic algorithm takes a particular input and always returns the same output: \(y \leftarrow A(m)\)
A randomized algorithm is a little different: it still takes an input \(m\), but it also has an implicit random input \(r \xleftarrow{R} \{0,1\}^n\).
\(r\) is sampled anew every time the algorithm is run. Now every time we run the algorithm we’ll get a different output.
A way to think about this is that the algorithm defines a random variable: given a particular input message \(m\), its output is a distribution over all possible outputs of the algorithm:
\[y \xleftarrow{R} A(m)\]
For example, let \(A(m;k) = E(k,m)\) and \(y \xleftarrow{R} A(m)\).
We can think of this as an encryption algorithm, the algorithm \(A\) takes a message \(m\) and an implicit input \(k\) and encrypts the message \(m\) according to the input.
It defines a random variable that is an encryption of the message \(m\): a distribution over all possible encryptions of \(m\) under a uniform key.
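A minimal sketch of this, using XOR with a fresh random key as a toy stand-in for \(E\) (the block length and the choice of cipher are illustrative assumptions, not fixed by the text):

```python
import secrets

N = 16  # block length in bits (an illustrative choice)

def E(k: int, m: int) -> int:
    # Toy one-time-pad-style cipher: c = m XOR k.
    return k ^ m

def A(m: int) -> int:
    # Randomized algorithm: the implicit input r is sampled fresh
    # on every run, so A(m) defines a distribution over ciphertexts.
    r = secrets.randbits(N)
    return E(r, m)

m = 0b1010101010101010
c1, c2 = A(m), A(m)
# Two runs will almost surely give different outputs.
print(c1, c2)
```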
Independence
Events \(A\) and \(B\) are independent if \(Pr[A \land B] = Pr[A] \cdot Pr[B]\)
If you know that event A happened, that tells you nothing about whether B happened.
The same can be said for random variables.
Random variables X,Y taking values in V are independent if:
\[\forall a,b \in V: Pr[X=a \land Y=b] = Pr[X=a] \cdot Pr[Y=b]\]
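For instance, the two bits of a uniform 2-bit string are independent random variables, which we can verify exhaustively:

```python
from itertools import product

# Universe {0,1}^2 under the uniform distribution.
U = ["".join(bits) for bits in product("01", repeat=2)]

def X(s):  # first bit
    return s[0]

def Y(s):  # second bit
    return s[1]

def pr(pred):
    return sum(1 for s in U if pred(s)) / len(U)

# Check Pr[X=a ∧ Y=b] = Pr[X=a] · Pr[Y=b] for every pair (a, b).
independent = all(
    pr(lambda s: X(s) == a and Y(s) == b) == pr(lambda s: X(s) == a) * pr(lambda s: Y(s) == b)
    for a, b in product("01", repeat=2)
)
print(independent)  # True
```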
XOR
XOR of two strings in \(\{0,1\}^n\) is their bitwise addition mod 2.
XOR is central to cryptography; there's a joke that the only thing cryptographers know how to do is XOR bit strings.
Why is this? XOR has a really important property:
Suppose we have a random variable \(Y\) over \(\{0,1\}^n\) with an arbitrary distribution, and \(X\), an independent uniform random variable on \(\{0,1\}^n\).
Then \(Z := Y \oplus X\) is a uniform random variable on \(\{0,1\}^n\).
So if I take an arbitrary distribution, and XOR it with a uniform variable, I will end up with a uniform variable.
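We can verify this property exhaustively for \(n = 2\), starting from a deliberately skewed distribution for \(Y\) (the particular weights are an arbitrary illustrative choice):

```python
from itertools import product

n = 2
V = ["".join(b) for b in product("01", repeat=n)]

def xor(x, y):
    return "".join(str(int(a) ^ int(b)) for a, b in zip(x, y))

# Y has an arbitrary (here: skewed) distribution; X is uniform and independent.
P_Y = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
P_X = {x: 1 / len(V) for x in V}

# Distribution of Z = Y XOR X, by summing over all (x, y) pairs.
P_Z = {z: 0.0 for z in V}
for y in V:
    for x in V:
        P_Z[xor(x, y)] += P_Y[y] * P_X[x]

print(P_Z)  # every value is 0.25: Z is uniform
```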
The Birthday Paradox
Let \(r_1, \dots , r_n \in U\) be independent identically distributed random variables.
When \(n = 1.2 \times |U|^{1/2}\), there's a good chance two of the samples will be the same: \(Pr[\exists i \neq j: r_i = r_j] \geq \frac{1}{2}\).
So, for example, take the universe \(\{0,1\}^{128}\), i.e. \(2^{128}\) elements. If I sample from it about \(2^{64}\) times, the probability that two sampled values are the same is at least 1/2.
It’s often described in terms of birthdays, since with \(1.2 \times \sqrt{365} \approx 23\) people in a room there’s a better-than-even chance that two of them share a birthday.
But people’s birthdays are not uniform…
As the number of samples grows past the square root of the universe size, the probability of a repeat converges very quickly to 1.
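The collision probability for uniform samples can be computed exactly with the standard product formula, confirming the 23-person birthday figure:

```python
def birthday_collision_prob(num_samples: int, universe_size: int) -> float:
    # Pr[at least one repeat] = 1 - prod_{i=0}^{n-1} (1 - i/|U|)
    # for independent uniform samples from a universe of the given size.
    p_distinct = 1.0
    for i in range(num_samples):
        p_distinct *= 1 - i / universe_size
    return 1 - p_distinct

# 23 people, 365 possible birthdays: already past 1/2.
print(round(birthday_collision_prob(23, 365), 3))  # 0.507
```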