In dictionary, cryptography is defined as the art of writing or solving codes. Historically, cryptography focused exclusively on codes (or private-key encryption schemes) ensuring secret communication between two parties who share secret in advance. Modern cryptography however has a much broader scope like data integrity, user authentication, protocols, etc. Furthermore cryptography also considers the public-key setting, where communicating users do not share any secret in advance.
Modern cryptography is design, analysis, and implementation of mathematical techniques for securing information, systems, and computation against adversarial attack. Instead of being an art in the past, nowadays modern cryptography is much more of a science and has permeated other areas of computer science.
Private-key Encryption
A private-key encryption scheme is defined by a message space M and algorithms (Gen, Enc, Dec):
- Gen: Key-generation algorithm, which generates k.
- Enc: Encryption algorithm, which takes key k and message m β M as input; and outputs cipher text c.
c β Enck(m)
- Dec: Decryption algorithm, which takes key k and cipher text c as input; outputs m or “error”.
m := Deck(m)
Note: β
means assignements to the output of an algorithm that might by randomized. :=
means assignments to the outputs of a deterministic algorithm. So we are allowing encryption to be randomized, meanwhile decryption is deterministic.
Modular arithmetic
x = x’ mod N | x and x’ have the same reminder when divided by N. |
[x mod N] | The remainder when x is divided by N. The unique value x’ β {0, …, N-1} such that x = x’ mod N |
For example:
- 25 = 35 mod 10
- 15 β [35 mod 10]
- 5 = [35 mod 10]
Kerckhoffs’s principle
The encryption scheme being used is not to be considered secret. But instead, the only secret information is the key shared by the parties. This implies the key must be chosen randomly. If not, then the attacker knows it.
Some arguments are in favor of this principle:
- It is easier to keep secret key than secret algorithm.
- It is easier to change key than to change algorithm.
- Standardization – ease of deployment, public validation.
The key space should be large enough to prevent “brute-force” exhaustive search attacks. However have a large key space does not necessarily guarantee that an encryption scheme is secure.
Hexadecimal Notation
Hexadecimal notation is just a way of describing integers in a different base (base 16) other than the usual decimal base (base 10) that we are familiar with. There is a one to one correspondence between each hex digit and sequence of four bits (called nibble, half of a byte). For example:
Hexadecimal | Decimal | Binary |
0x10 | 1*16 + 0 = 16 | 0001 0000 |
0xAF | A*16 + F = 10*16 + 15 = 175 | 1010 1111 |
ASCII Representation
Even though ultimately everything in a computer is represented as bits, ASCII representation is very often used to represent English letters or characters. In ASCII representation each character is represented using 1 byte (8 bits, 2 hex digits). For example
Character | Hexadecimal | Binary |
‘1’ | 0x31 | 0011 0001 |
‘F’ | 0x46 | 0100 0110 |
Vigenère Cipher
The key of this encryption scheme is a string of letters from English alphabet. To encrypt a given plaintext with a key, one simply shifts every character in the plaintext by the amount dictated by the next character of the key modular 26, wrapping around in the key as needed. Decryption just reverses the process. For example:
plain text: tellhimaboutme
key: cafecafecafeca
cipher text: veqpjiredozxoe
A variant of Vigenère cipher is to view plain text and key in terms of bytes rather than of letters. The key is a string of bytes. The plain text is a string of ASCII characters. To encrypt, XOR each byte in the plain text with the next byte of the key, wrapping around in the key as needed. We start with an ASCII plain text and we are going to have a cipher text which is presented in Hexadecimal. The decryption can reverse the process, because XOR is invertible.
- It is easier to implement.
- It is easier to use since we are not limited to lowercase characters.
- It is easier to work with byte-wise XOR rather than modular addition. Take a byte of plain text, and take a byte of the key, and XOR them together to get the next byte of the cipher text.
For example, to encrypt the string “Hello!” with key 0xA1 2F
:
plain text: 0x48 65 6C 6C 6F 21
XOR with: 0xA1 2F A1 2F A1 2F
cipher text: 0xE9 4A CD 43 CE 0E
Attacking the Vigerère cipher
When we are attacking the scheme, we are given some cipher text, we are going to try to recover the key, there are two steps involved until we have the whole thing:
Step1: Determine the key length
It is possible to determine the plain text letter frequencies, by referring to Wikipedia or by calculating them yourself over some corpus. Let pi
for 0 β€ i β€ 255 be the frequency of byte i in the plain text. Suppose we use ASCII representation of English, pi = 0
for i < 32 or i > 127, while p97
is the frequency of character ‘a’. You probably would also need to take account the punctuations, spaces, numerals, upper and lower cases, etc. The distribution of pi
is very far from uniform.
If the key length is N, then every N-th byte of the plain text is encrypted using the same XOR. If we take very N-th byte in the cipher text and then calculate the frequencies of different bytes, then what you expect to get are exactly the pi
values, but in some permuted order.
On the other hand if we take every M-th byte (M is not a multiple of N), then we are picking out bytes that are all XOR-ed by different key bytes. In this case we would not expect the frequencies of those bytes to match up the frequencies of the underlying plain text, we should get something close to uniform.
The both cases above can be clearly distinguished by calculating the summation of candidate frequencies squared, i.e.: β(qi)2
. In the former case, β(qi)2 β β(pi)2
, but in the latter one β(qi)2 β 1/256
. What you do is simply try all possible values for the key length and find the max value of β(qi)2
.
The time cost of this step is about 256 * L, where the key length is between 1 and L.
Step2: Determine each byte of the key
Assume the key length N is known. Starting with any byte i we want, we pluck out every N-th bytes in the cipher text. We could call this the i-th cipher text stream. Every byte in the cipher text stream was generated by XOR-ing some underlying plain text with the same byte of the key.
Then we try to decrypt that stream using every possible byte B ranging from 0 to 255. It will give us a candidate plain text stream for each possible value of that byte B. We still expect that the plain text frequencies are observed. We can deduce a couple of things when guessing whether B is correct:
- All bytes in the plain text stream will be meaningful, say value is between 32 and 127.
- The calculated frequencies should be close to known letter frequencies
β(qi pi) β β(pi)2 β 0.065
The time cost of this step is about 2562 * L, where the key length is between 1 and L.
For comparison, if the brute-force key search is used, the number of possible keys of length L is about 256L.
The attack gets more reliable as the cipher text grows larger. The attack still works for short cipher texts, but more tweaking and manual involvement is needed.
Principles of Modern Cryptography
Cryptography was defined as an art in the past, in the late ’70s cryptography began to develop into more of a science. The three core principles of modern cryptography:
- Formal definitions – precise, mathematical model and definition of what security means
- Assumptions – clearly stated and unambiguous
- Proofs of security – move away from design-break-patch
Cryptography remains partly an art as well, even within the paradigm of formal definitions and rigorous proofs. There is lots of room for creativity in coming up with novel definitions for new applications.
Perfect Secrecy
In general cryptographic definitions have two components:
- Thread mode: what capabilities the attacker is assumed to have in real world.
- Security guarantee/goal: what we want to prevent the attacker from doing.
In the settings of private-key encryption scheme, there are several different threat models we could consider:
Cipher text only attack | The attacker only gets to observe cipher text being sent and nothing else. |
Known plain text attack | In addition to the cipher text whose underlying plain text is unknown, the attacker knows the cipher text encrypted using the same key along with the corresponding plain text. |
Chosen plain text attack | In addition to the cipher text whose underlying plain text is unknown, the attacker is able to obtain cipher text encrypted using the same key corresponding to the plain text of the attacker’s choice. |
Chosen cipher text attack | In addition to having the ability to carry out a chosen plain text attack, the attacker is able to get the parties to decrypt certain cipher texts of that attacker’s choice. |
For many years the cryptographers converge upon the definition of secure encryption, that an encryption scheme is secure if the following is true (the informal definition):
Regardless of any prior info that the attackers has about the plain text, the cipher text should leak no additional information about the plain text.
Probability review
Random variables are variable that take on (discrete) values with certain probabilities. Discrete means there is some finite set of values that variable can possibly take.
Probability distribution of a random variable specifies the probabilities with which the variable takes on each possible value. Each probability must be between 0 and 1, all of them must sum to 1.
Event is a particular occurrence in some experiment. Pr[E]
means the probability of event E.
Conditional probability is the probability that one event occurs, assuming some other event occurred. Pr[A|B] = Pr[A and B] / Pr[B]
Two random variables X and Y are independent if for all x, y: Pr[X=x | Y=y] = Pr[X=x]
Law of total probability: say E1, E2, …, En are a partition of all possibilities, then for any event A:
Pr[A] = βi Pr[A and Ei] = βi Pr[A|Ei]βPr[Ei]
Formal definition
Let M be a random variable denoting the value of the message, the M ranges over the message space M. The random variable M is meant to reflect the likelihood of different messages being sent by the parties and this takes into account anything that the attacker might know about what those messages might be. For example: Pr[M='attack today'] = 0.7; Pr[M='don't attack'] = 0.3
.
Let K be a random variable denoting the key, the K ranges over the key space K. If we fix the encryption scheme (Gen, Enc and Dec), then key generation algorithm Gen defines for you the probability distribution for K: Pr[K=k] = Pr[Gen outputs key k]
.
A very crucial and important assumption: M and K are independent. It means the message that a party sends does not depend on the key used to encrypt that message.
Let C be a random variable denoting the value of the cipher text. The process of sampling a random message, sampling a key, and encrypting the message using the key defines some distribution on the cipher text.
What we are going to do now is to equate “the attacker’s information about the plain text” with “the attacker’s known distribution over the plain text”. The perfect secrecy will require that observing the cipher text should not change the attacker’s knowledge about the distribution of the plain text at all.
The scheme is defined to be perfectly secure if and only if the a priori distribution before observing a cipher text and the a posterior distribution after the observing a cipher text should be exactly equal. The formal definition is below:
Encryption scheme (Gen, Enc, Dec) with message space M and cipher text space C is perfectly secret if, for every distribution over M, every m β M, and every c β C with
Pr[C=c] > 0
, it holds thatPr[M=m | C=c] = Pr[M=m]
.
We could use this to demonstrate that the shift cipher is not perfectly secret.
One-Time Pad
The scheme that achieves this formal definition is known as the one-time pad, which was patented in 1917 by Vernam, and proven perfectly secret by Shannon in 1940s.
Message space M | The set of all bit strings of length exactly n.M = {0, 1}n |
Gen | Choose a uniform key k from the set of all n-bit strings, i.e.: k β {0, 1}n .Each possible n-bit string is chosen with probability 1/2n . |
Enc | Bit-wise XOR the key with the message.Enck(m) = k β m |
Dec | Bit-wise XOR the key with the cipher text.Deck(c) = k β c |
The correctness can be proven by the fact that Deck(Enck(m)) = k β (k β m) = (k β k) β m = m
.
In order to prove that one-time pad is perfect secret, let us just fix some arbitrary distribution over the massage space M, and fix arbitrary m, c β {0, 1}n. Using Bayes’ Law, we could get:
Pr[M=m | C=c]
= Pr[C=c | M=m] β Pr[M=m] / Pr[C=c]
Use the law of total probability, we could express the probability of cipher text equal to some value c. Note M=m’ do indeed partition the space because the message has to take on some value, and the only possibility of those values are the every message in the message space M.
Pr[C=c]
= βm' Pr[C=c | M=m'] β Pr[M=m']
= βm' Pr[K=(m'βc)] β Pr[M=m'] # one-time pad specifics
= βm' 2-n β Pr[M=m']
= 2-n
Continue the derivation above, we could show this below and complete the proof: one-time pad achieves perfect secrecy.
Pr[M=m | C=c]
= Pr[C=c | M=m] β Pr[M=m] / Pr[C=c]
= Pr[K=(m'βc)] β Pr[M=m] / 2-n
= 2-n β Pr[M=m] / 2-n
= Pr[M=m]
Random Number Generation
Two steps to generate random number in computers (handled by underlying operating system):
- Continually collect a ‘pool’ of high-entropy (i.e., unpredictable) data.
- External inputs to the processing chip: network events, hard disk access times, keystroke, etc.
- Hardware random-number generator
- When random bits needed, process this data to generate independent, uniform sequence of bits. This process may be blocked until sufficient entropy is available.
My Certificate
For more on Introduction to Classical Cryptography, please refer to the wonderful course here https://www.coursera.org/learn/cryptography
Related Quick Recap
I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai