Message Authentication Codes & Authenticated Encryption

Table of Contents

Besides the secrecy of communication, we also need to be concerned with integrity, which ensures that a message received by the receiver, originated from the intended sender, and was not modified, even if an attacker controls the channel.

The standard error-correction techniques are not enough, because here we are not concerned with random errors, instead we are concerned here with an attacker who can decide how to modify what’s being sent.

Message Integrity

The right tool for ensuring integrity (in the private key setting) is called Message Authentication Codes, which work as follows:

Sender and receiver have shared a key k in advance.
When sender sends a message m, it uses key k to compute a tag t = Mac_k(m), where Mac is an algorithm.
The sender transmits both of m and t to the receiver across the channel.
In the case there is an attacker, who can modify both of m and t to m' and t'.
Receiver can use shared key k to verify the correctness of the tag t' on the message m', i.e. Verify_k(m', t'), which returns 1 if the message did originate from the sender.

Note in this process, there is no concern of secrecy at all. Secrecy and Integrity are orthogonal concerns, it is possible to have either one without the other. NOTE; in general just using encryption does not provide any integrity whatsoever. This is related to the issue of malleability. In particular, the one time pad (perfect secrecy) is not providing any sort of integrity protection at all.

Message Authentication Codes

A message authentication code is defined by three probabilistic polynomial time algorithms.

Gen	takes as input `1ⁿ` where n is security pamameter; outputs `k`; assuming `\|k\| ≥ n`.
Mac	takes as input key `k`, and message `m ∈ {0,1}*`; outputs tag `t := Mac_k(m)`. Most commonly used Macs are deterministic.
Vrfy	takes as input key `k`, message `m`, and tag `t`; output 1 (accept) or 0 (reject). `Vrfy_k(m, Mac_k(m)) = 1`

We neither want to make assumptions about what the sender might authenticate, nor what forgeries are “meaningful”. We want to have a strong enough definition, such that a scheme satisfying that definition can be used anywhere. There is only one standard of security for message authentication codes.

threat model	“Adaptive chosen-message attack”: the attacker is able to induce the sender to authenticate arbitrary messages of the attacker’s choice.
security goal	“Existential unforgeability”: attacker should be unable to forge a valid tag on any message not authenticated by the sender.

Formal definition

For a fixed message authentication code Π, and attacker A; we define a randomized experiment Forge_A,Π(n), where n is security parameter. The experiment works as follows:

Generate a key k ← Gen(1ⁿ)
The attacker A interacts with an MAC oracle Mac_k(∙) specifying a set of messages M, and get back the corresponding tags.
The attacker A outputs a pair of message and tag (m, t).
The attacker A succeeds and experiments evaluates to 1, if Vrfy_k(m, t) = 1 and m ∈ M.

Π is secure if for all probabilistic polynomial-time attackers A, there is a negligible function ε such that:

Pr[ Forge_A,Π(n) = 1 ] ≤ ε(n)

Note that replay attacks are not prevented. No stateless mechanism can prevent them. Replay attacks can be a significant real-world concern. For example: an attacker replays a request of transferring $100 for 10 times, in order to achieve the goal of transferring $1,000. The replay attacks need to be protected against from a higher level of protocol.

A Fixed-Length MAC

Intuitively, we need a keyed function Mac such that, for a uniform key k, given Mac_k(m₁), Mac_k(m₂), ..., it should be infeasible for an attacker to predict the value of Mac_k(m) for any m ∉ {m₁, m₂, ...}. Particularly, a pseudorandom function would imply the properties we want. A pseudorandom function is indistinguishable from a random (uniform) function, which has exact property that “knowing value of f(m₁), f(m₂), ...” does not mean we will learn about the value of f of any additional point. i.e. f(m).

Let F be denote a length-preserving pseudorandom function (say, a block cipher), we could construct the message authenticate code Π:

Gen	choose a uniform key k for F.
Mac_k(m)	output F_k(m)
Vrfy_k(m, t)	output 1 if and only if F_k(m) = t

The theorem: Π is a secure message authentication code. As usual, this could be proven by reduction. Imagine an attacker A who is implemented as a subroutine in a distinguisher D. The D is interacting with either a random function or a pseudorandom function F with a uniform key k.

Every time A requests a tag on some message m_i, the distinguisher D will forward m_i to the MAC oracle and get back a value of t_i, which will be given to the A.
Attacker A outputs its forgery: a pair of message and tag (m, t) and m ∉ {m_i}.
Distinguisher D forwards m to the oracle MAC, and get back a value t'.
D outputs 1 if t = t'.

The pseudorandom case: when D interacts with F_k for a uniform k, the view of the attacker A is identical to its view in the real message authentication code experiment.

Pr[D^F_k ouputs 1] = Pr[ Forge_A,Π(n) = 1]

The real uniform case: when D interacts with f, then seeing f(m₁), …, f(m_i) does not help predict f(m) for any m ∉ {m₁, …, m_i}.

Pr[D^f outputs 1] = 2^-n

Since F is a pseudorandom function, we have:

|Pr[D^F_k outputs 1] - Pr[D^f ouputs 1]| < neg(n)
⟹ Pr[D^F_k outputs 1] < Pr[D^f ouputs 1] + neg(n)
⟹ Pr[ Forge_A,Π(n) = 1] ≤ 2^-n + neg(n)

Drawbacks

In practice, pseudorandom functions (block ciphers) have short, fixed-length block size. So the previous construction is limited to authenticating short, fixed-length messages. This is quite a limitation.

CBC-MAC for Arbitrary-Length Messages

The basic CBC-MAC construction works in the following way:

Given a message, which is split into m₁ ... m_l; where each m_i represents a block of the message.
Calculate the tag for the first block, t₁ = F_k(m₁).
Use tags as chaining value, XOR it with the next message: t_i = F_k(m_i ⊕ t_i-1).
The last tag t_l is based on m₁ through m_l.

CBC-MAC	CBC-mode encryption
deterministic	random, there is initialization vector (IV)
only outputs final tag value	outputs value for each block

Both of “deterministic” and “outputs final tag” are essential for the security of CBC-MAC.

Security of Basic CBC-MAC

Basic CBC-MAC can handle messages of arbitrary, but fixed, length. If F is a length-preserving pseudorandom function, then for any fixed ℓ, basic CBC-MAC is a secure MAC for messages of length n∙ℓ, where n is the length of each block. That is, the sender and receiver must agree on the length parameter ℓ in advance, otherwise basic CBC-MAC is not secure.

Extensions of CBC-MAC

There are extensions of CBC-MAC to handle variable length messages. One of the simplest is to prepend the message length before applying basic CBC-MAC. Now to authenticate a message consists of block m₁ through m_l:

Encode the value ℓ as a single block (representing ℓ using n bits, padding to the left with zeros if necessary) as m₀.
Calculate the tag for the first block, t₀ = F_k(m₀)
Run the basic CBC-MAC: t_i = F_k(m_i ⊕ t_i-1)

This will give us a secure MAC which can handle messages of any length ℓ (the number of blocks). This can also be adapted to handle messages whose length is not a multiple of the block length.

Hash Functions

Hash functions are important cryptographic primitives. Very roughly speaking, a cryptographic hash function is a function that maps an arbitrary length input to a short, fixed-length output, called digest. You can consider two classes of hash functions: keyed or unkeyed. To meaningfully define security properties, keyed hash functions are needed. But in practice, hash functions are usually unkeyed.

Let H be a hash function, H: {0,1}^* → {0,1}ⁿ, a collision is a pair of distinct inputs x, x' such that H(x) = H(x'). H is collision-resistant if it is infeasible to find a collision in H.

Generic Hash-Function Collision Attacks

For a hash function H: {0,1}^* → {0,1}ⁿ, the output length of the hash function is only n bits, there are 2ⁿ possible outputs of the hash function. If we compute all H(x₁), …, H(x_2ⁿ+1) (1 input more than 2ⁿ), we are guaranteed to find a collision (some pair of inputs, that are distinct, but have the same hash value).

But we can do much better than above using Birthday attack, which evaluate the hash function on 2ⁿ/2 distinct inputs. It could be that all of the 2ⁿ/2 hash values just computed turn out to be distinct. But there is a high probability that we could find two inputs that map to the same output.

Similar to the problem of randomly throwing balls into bins, in the question of hash function, the number of bins is the number of possible outputs of the hash function ( 2ⁿ ), the balls denote computation of our hash function (the output value that we obtain when evaluating the function on some input).

We can mode the hash function as a random function (actually any deviation from random will only make it easier to find collisions). So we assume that the hash function chooses a random output for every distinct input you feed it.

Theorem: When the number of balls is O(N^1/2), the probability of a collision is 50%.

The Birthday Paradox tells us that if there is 23 people in a room, then there is at least 50% probability that some pair of two people share a common birthday.

In the case of hash function, this means if we evaluate the hash function about 2^n/2 (square root of the size of the range of the hash function) times, then we expect with probability 50% to find some pair of inputs that hash to the same output. This makes it significant in terms of the time required to find a hash function collision.

In terms of security, if we want security against attackers running in time about 2ⁿ, then we need the output length of our hash function to be about 2n bits long. Importantly, this is twice the length that we need for block cipher keys. So if we want a block cipher to defend against brute force attacks running in time 2ⁿ, we can use a block cipher keys that are exactly n bits long. On the other hand for a hash function, we need the output length of the hash function to be twice that, i.e. 2n. Only then, do we have a hope of getting security against 2ⁿ time attackers. For example: a block cipher with 128-bit key provides equivalent security to a hash function with 256-bit output.

Hash Functions in Practice

There are a few hash functions that are very commonly encountered.

MD5	128-bit output length, not secure any more. Collisions found in 2004. Don’t use.
SHA-1	160-bit output length. Theoretical analysis indicates some weaknesses. Migrate to SHA-2.
SHA-2	In the same design family as SHA-1. 256-bit or 512-bit output lengths. No known significant weaknesses.
SHA-3 Keccak	Very different design than SHA family. Support 224, 256, 384, 512-bit outputs.

Hash-and-MAC for Arbitrary-Length Messages

Let us assume between two parties, there are 2 channels: one is secure but can only handle short messages, the other one is insecure

Sender	1. Take the message M and hash it down to a shorter fixed length digest h. `h = H(M)`. 2. Send the digest h to the receiver using the *secure channel*. 3. Transmit the longer message M over the insecure channel.
Receiver	1. Check whether the hash of the longer message M is equal to the digest h. `h =? H(M)`

This method is secure, the adversary can not temper with the digest h since it is transmitted in a secure channel. The adversary can try to temper with the longer message M, but the attacker will not be able to convince the receiver to accept a different message M’, unless the attacker can find an M’ that also hashes to the same digest h. But if the hash function is collision resistant, then the attacker won’t be able to do this.

In the case there is no secure channel, we can achieve the same by using MAC for digest. Assume two parties share some key k in advance, and there is no secure channel between 2 parties. Sender has some longer message M that it wants to transmit reliably to the receiver.

Sender	1. Take the message M and hash it down to a shorter fixed length digest h. `h = H(M)`. 2. Compute a message authentication code on the digest h using the shared key k. `t = Mac_k(h)`. 3. Send the digest h and tag t to the receiver over the *insecure channel* (as a way of sending h over what’s effectively a reliable secure channel). 4. Transmit the longer message M over the insecure channel.
Receiver	1. Check whether the hash of the longer message M is equal to the digest h. `h =? H(M)` 2. Verify the tag t. `Vrfy_k(h, t) =? 1`

You could realize quickly that there is no need for the sender to actually transmit the digest h. It suffices for the sender to transmit the longer message M and the tag t. It is also obvious that if the MAC is secure for fixed-length messages, and H is collision-resistant, then the Hash-and-MAC construction is a secure MAC for arbitrary-length message.

Instantiation in Practice

The simple combination of hash function and block cipher-based MAC is not a good idea:

Potential block length mismatch, e.g. SHA-1 has 160-bit output length, but AES has 128-bit block length.
Still need to implement two crypto primitives: a hash function and a block cipher.

To address these issues, researchers have designed what’s called HMAC.

HMAC for Arbitrary-Length Messages

HMAC is another method of extending message authentication code for arbitrary-length messages based on certain types of hash functions. It can be viewed as following the Hash-and-MAC paradigm, with part of the hash function being used as a block cipher.

HMAC can be defined using MD5, SHA-1 and SHA-2, but not SHA-3. Note MD5 alone is not secure any more, nevertheless it remains secure when used to construct HMAC. But still it is prudent to avoid MD5.

Authenticated Encryption

Authenticated encryption is an encryption scheme that achieves the joint goals of secrecy and integrity. Sender and receiver want to be able to communicate while ensuring both the secrecy and the integrity of their communication. This is achievable by simply combining any private key encryption scheme with a message authentication code.

The definition of security for a MAC has nothing to do with hiding information about the message. In fact, if the MAC is deterministic (as most constructions like CBC-MAC, HMAC are), then it will leak whether or not the same message is being encrypted twice, that is the tags will be the same, if the corresponding messages are identical. This will be problematic.

Fashion
Encrypt and authenticate	Not secure, tag t might leak information about m.
Encrypt then authenticate	The combination is CPA-secure and a secure MAC, if: 1. the underlying encryption scheme is CPA-secure, 2. the underlying message authentication code is secure.

Actually authenticated encryption combination achieve something stronger: given a bunch of ciphertexts and tags (c₁, t₁), (c₂, t₂), … corresponding to chosen plaintexts m₁, m₂, … It is infeasible for an attacker to generate any new valid ciphertext (c, t) that receiver would accept as valid, the receiver will, with overwhelming probability, output an error message.

It turns out that the property of authenticated encryption, in combination with CPA-security, implies that the encryption scheme achieves Chosen Ciphertext Attacks security (CCA-security).

Secure Communication Sessions

Consider two parties who wish to communicate securely (with both secrecy and integrity) over the course of a session. Session means period of time over which parties are willing to maintain state.

It is nature to use Authenticated Encryption scheme to communicate, if an attacker tries to inject or modifies any ciphertext, that will be detected as invalid. However there are still several attacks that can still be carried out.

Replay attack	The attacker keeps sending (replays) a single ciphertext to receiver. A priori, the receiver has no idea whether this is intended.
Reordering attack	The attacker, who might have control over scheduling in the network, change the order of ciphertexts.
Reflection attack	The attacker redirects the ciphertext from the sender to the sender as if it was from the receiver.

These all cause mismatches between the view of the sender and receiver in their session. So using authenticated encryption alone is not enough to get the security and integrity guarantee that we want in a secure session.

Actually these attacks can be simply prevented using counters and identities. The sender, when sending a message, includes both a counter value and its identity.

Enc_k(id|message|counter)

The receiver, upon decrypting, will check the identity and also verifies that the sequence number matches what she expected.

One can define a notion of secure sessions and prove that this construction (authenticated encryption + counters and identities) realizes this notion of secure sessions.

My Certificate

For more on Message Authentication Codes & Authenticated Encryption, please refer to the wonderful course here https://www.coursera.org/learn/cryptography

My #90 course certificate from Coursera

Related Quick Recap

Private Key Encryption

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai