Factoring, RSA, Discrete-Logarithm and Diffie-Hellman

Table of Contents

Problems like addition, multiplication, modular arithmetic, exponentiation can be solved in polynomial time, so they are seen as easy problem. Factoring a random number seems not hard, because 50% of the time, random number is even; 1/3 of the time, random number is divisible by 3. But the problem of factoring some special numbers can not be done in polynomial time, particularly breaking a product of two large, equal-length primes into its constituent factors is conjectured to be hardest.

However, the factoring problem is not directly useful for cryptography. Another problem called the RSA problem, which is not identical but related to the factoring problem, is what we are interested in.

RSA Problem

Suppose N = p * q, where p and q are two distinct, equal-length, odd primes. Also suppose Z^*_N is the set of invertible elements under multiplication modulo N and it is actually a group (in the Group Theory). The order of

Actually computing φ(N) given N is equivalent to factoring N. So we have an interesting asymmetry situation here, given a number N:

If someone knows the factorization of N, s/he can compute the group order φ(N).
If someone does not know the factorization of N, s/he can still perform operations (multiply elements and reduce the modulo N) in the group Z^*_N, however they are not able to compute the order φ(N) of the group.

Recall Fermat’s theorem, which states: “Let G be a finite group of order m, for any integer e, define f_e(g) = g^e. If gcd(e, m) = 1, then f_e is a permutation.” Here we have group Z^*_N of order φ(N). Let’s fix e with gcd(e, φ(N)) = 1, then Fermat’s theorem tells us that the exponentiation to the e-th power is a permutation on Z^*_N.

For example: The elements of Z^*₁₅ are {1, 2, 4, 7, 8, 11, 13, 14}. Define e = 3, f₃(x) = [x³ mod 15]. Then f3 is a permutation on Z^*₁₅.

1³ = 1 mod 15
2³ = 8 mod 15
4³ = 64 = 4 mod 15
7³ = 343 = 13 mod 15
8³ = 512 = 2 mod 15
11³ = 1331 = 11 mod 15
13³ = 2197 = 7 mod 15
14³ = 2744 = 14 mod 15

Fermat’s theorem further states: “if d = e^-1 mod m, then f_d is also a permutation and is the inverse of f_e.” That is raising to the d-th power undoes the operation of raising to the e-th power. Another way to express this is “x^d is the e-th root of x modulo N“, because if we take x to the d and raise it to the e-th power, we get x again. It is a unique e-th root, because of the fact that raising to the e-th power is a permutation.

For example: p = 3, q = 11, so N = 33, φ(N) = 20. Now suppose e = 7, gcd(7, 20) = 1, then f₇ is a permutation of the group. Since we have d = 3 such that d * e = 1 mod φ(N) ⟹ 3 * 7 = 1 mod 20. So the 7-th root of 2 modulo 33, is equal to the 3-th power of 2 modulo 33. i.e. 2³ = 8.

The asymmetry situation revisited:

If p, q are known:
1. N and φ(N) can be computed.
2. d = e^-1 mod φ(N) can be computed.
3. possible to compute e-th root modulo N.
If p, q are not known, but only N is known:
1. computing φ(N) is as hard as factoring N.
2. computing d is as hard ad factoring N.

This is very useful for public key cryptography.

RSA Assumption

If we don’t have d, then there is no other way to compute the e-th root of elements modulo N. Unfortunately, we can not show that it is equivalent to factoring N, we therefore are going to state this as an independent assumption called the RSA Assumption.

The RSA Assumption informally states: “Given a modulus N and an integer e (relatively prime to φ(N), it is hard to compute the e-th root of a uniform element y ∈ Z^*_N.

In order to define the assumption formally, we need to be careful and specify how N, e are generated. Let GenRSA be an algorithm that takes input 1ⁿ (where n is the security parameter, i.e. the number of bits), is going to output three elements: N, e and d, where N = p * q, and e * d = 1 mod φ(N).

Then we can define an experiment , say RSA-inverse, relative to GenRSA, and any attacker / adversary algorithm A:

Experiment RSA-inverse_A,GenRSA(n):
 - Compute(N, e, d) ← GenRSA(1ⁿ)
 - Choose uniform y ∈ Z^*_N
 - x ← Ask A(N, e, y) to compute e-th root of y mod N
 - Experiment evaluates to 1 if x^e = y mod N

The RSA problem is hard relative to GenRSA, if it holds that, for all probabilistic polynomial time algorithms A, the probability, with which A can succeed in the RSA-inverse experiment, is negligible.

Pr[ RSA-inverse_A,GenRSA(n) = 1 ] < negligible(n)

Implementing `GenRSA`

One way to implement GenRSA:

Generate uniform n-bit primes p and q.
Get modulus N = p * q.
Choose arbitrary e with gcd(e, φ(N)) = 1.
Compute d = [e^-1 mod φ(N)].
Output (N, e, d).

The way e is chosen does not really seem to affect hardness of the RSA problem. In practice, it is very common to set e = 3 or e = 2¹⁶ + 1, to allow efficient exponentiation.

RSA and Factoring

If factoring moduli output by GenRSA is easy (that is, if given some modulus N output by GenRSA, and you can easily factor it and get p and q), then the RSA problem must be easy relative to GenRSA. This means if factoring is easy, then RSA problem is easy. So factoring being hard is a necessary condition for the RSA problem to possibly be hard.

However, formally, the hardness of the RSA problem is not known to be implied by the hardness of the factoring problem. Now, as far as our current algorithms lead us to believe, the RSA problem is equally as hard as factoring. So even though it’s true that mathematically speaking and formally speaking the hardness of factoring does not imply hardness of RSA, as far as cryptographic assumptions go, it’s reasonable to assume that the RSA problem is hard as it is currently to assume that factoring is hard.

Cyclic Groups

Cyclic groups give rise to another class of assumptions on which public key cryptography can be based. Cyclic groups are defined as follow: let G be a finite group of order m (written using multiplicative notation), g ∈ G. Consider the set of all powers of g, that is {g⁰ = 1, g¹ = g, g², …}. From the Fermat’s theorem we know g^m = 1 = g⁰, so the set will start to cycle, it has at most m elements, because the group G itself has m elements. If the new set has m elements, then it is all of G, we say g is a generator of G. If the group G has any element that is the generator, then we will say that the group G is cyclic.

Z_N (written using additive notation) is cyclic for any value of N, because the element 1 is always a generator. For example additive group Z₈, besides 1, the number 3 is also a generator.

0 * 3 = 0 mod 8
1 * 3 = 3 mod 8
2 * 3 = 6 mod 8
3 * 3 = 9 = 1 mod 8
4 * 3 = 12 = 4 mod 8
5 * 3 = 15 = 7 mod 8
6 * 3 = 18 = 2 mod 8
7 * 3 = 21 = 5 mod 8

Theorem 1: any group of prime order is cyclic, and every non-identity element is a generator.

This theorem is very useful because you don’t have to care about details of the underlying group, if the order is prime, then the group is cyclic.

Theorem 2: If p is prime, then the multiplicative group Z^*_p is cyclic (of order p-1).

In this theorem, the p is prime, but the order of the group Z^*_p is not prime. For example Z^*₁₁ = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, the group is cyclic. 2 is a generator.

2⁰ = 1 mod 11
2¹ = 2 mod 11
2² = 4 mod 11
2³ = 8 mod 11
2⁴ = 16 = 5 mod 11
2⁵ = 32 = 10 mod 11
2⁶ = 64 = 9 mod 11
2⁷ = 128 = 7 mod 11
2⁸ = 256 = 3 mod 11
2⁹ = 512 = 6 mod 11
2¹⁰ = 1024 = 1 mod 11 = 2⁰

Uniform Sampling

Given a group G of order m and generator g, it is easy to sample a uniform element h ∈ G:

Choose uniform x ∈ {0, …, m-1}
Set h := g^x.

The reason that this gives you a uniform element of the group G is because there is a one-to-one correspondence between the exponents {0,…,m-1} and the elements of the group. This is very useful when we want to sample an element uniformly from the group.

Discrete-Logrithm Problem and Assumption

Fix a cyclic group G of order m, and generator g, we know that {g⁰, g¹, g², …} is exactly the group G itself. What this means is that for every h in the group, there’s a unique exponent x ∈ Z_m (the range from 0 to m-1), such that g^x = h. We’ll define the discrete logarithm of h with respect to g to be this value x in the group G.

log_gh = x

For example, in the group Z^*₁₁, N = 11, m = 10. Suppose g = 2, then log₂10 will be 5.

The discrete-logarithm problem in G roughly speaking is: given a generator g and an element h, compute the discrete logarithm of h with respect to g, i.e. log_gh.

The discrete-logarithm assumption in G is: solving the discrete logarithm problem in G is hard (can not be done in polynomial time).

Let’s define a group generation algorithm ? (script G), which on input 1ⁿ (where n is the security parameter), outputs a cyclic group G, its order q (with |q| = n, the bit length of q is n), and a generator g.

For any such group generation algorithm ? and any attacker / adversary A, we can define the following experiment:

Experiment Discrete-logarithm_A,?(n):
 - Compute (G, q, g) ← ?(1ⁿ)
 - Choose uniform h ∈ G
 - x ← Ask A(G, q, g, h) to compute the discrete logarithm of h
 - Experiment evaluates to 1 if g^x = h

The discrete-logarithm problem is hard relative to the group generation algorithm ? if for all probabilistic polynomial time algorithms A, the probability, with which A can successfully compute this discrete logarithm of the uniform element h, is negligible.

Pr[ Discrete-logarithm_A,?(n) = 1 ] < negligible(n)

Now it is very useful to define problems that are based on, but not known to be equivalent to the discrete logarithm problem.

Diffie-Hellman Problems

As usual, fix a group G with a generator g. h₁ and h₂ are elements of the group G. We are going to define DH_g(h₁, h₂) = DH(g^x, g^y) = g^xy, where x is the discrete logarithm of h₁ with respect to g, and y is the discrete logarithm of h₂ with respect to g.

For example, in the group Z^*₁₁, N = 11, m = 10. Suppose g = 2, then DH₂(5, 8) = DH(2^(log₂5) , 2^(log₂8)) = 2^(3*4) = 4096 = 4 mod 11.

The Computational Diffie-Hellman (CDH) problem asks to compute DH_g(h₁, h₂), which could be done by computing the discrete logarithms of h₁ and h₂, but it is hard, and we can not efficiently solve the Computational Diffie-Hellman problem.

The Decisional Diffie-Hellman (DDH) problem asks for something stronger. Given a g, h₁ and h₂, it is even harder to distinguish the correct answer DH_g(h₁, h₂) from a completely uniform and independent element of G.

Decisional Diffie-Hellman Problem

To formally define the Decisional Diffie-Hellman problem, let ? (script G) be a group-generation algorithm. We say that the DDH problem is hard relative to the group generation algorithm ?, and for all probabilistic polynomial time algorithm Attacker / Adversary A, the difference in these two probabilities is negligible:

| Pr[ A(G, q, g, g^x, g^y, g^z) = 1 ] -
  Pr[ A(G, q, g, g^x, g^y, g^xy) = 1 ] |
< negligible(n)

where, x, y, and z are all chosen uniformly at random. For the first probability, we give A (along with G, q, and g) three uncorrelated elements g^x, g^y, and g^z; and for the second probability, we give A the g^x, g^y and correct solution g^xy. So it is hard for any algorithm A to distinguish the correct solution g^xy from a completely uniform element g^z.

If the discrete-logarithm problem is easy, so is the CDH problem. If the CDH problem is easy, so is the DDH problem. This means the DDH assumption is a stronger assumption than CDH assumption, which is in turn stronger than the discrete logarithm assumption.

Group Selection

For cryptographic applications, we have to discuss what kind of groups are there, for which all these problems are conjectured to be hard. In general:

It is best to use groups of prime order, because prime-order groups are always cyclic.
The discrete logarithm is “easier” if the order of the group has small prime factors.

It turns out that in cryptography, there are really only 2 common classes of groups that people use.

Common Class 1: Subgroup of Z^*_p

Prime-order subgroup of Z^*_p, where p is prime. Remember Z^*_p itself is cyclic, but because we want to work in a prime order group, we are going to restrict ourselves to a subset of the Z^*_p, which has prime order. This subgroup can be found as follows:

Find p and q such that p = t * q + 1.
Take the subgroup of t-th powers, i.e. G = { [x^t mod p] | x ∈ Z^*_p}.

For example: p = 11, Z^*₁₁ = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, let q = 5, t = 2, G = {1, 3, 4, 5, 9}, which is derived as below. G is a closed group, the order of this group is (p-1)/t = q. G is cyclic since p is prime.

[1² mod 11] = 1
[2² mod 11] = 4
[3² mod 11] = 9
[4² mod 11] = 5
[5² mod 11] = 3
[6² mod 11] = 3
[7² mod 11] = 5
[8² mod 11] = 9
[9² mod 11] = 4
[10² mod 11] = 1

So if we work in the subgroup of t-th powers modulo p, then we do indeed get a cyclic group of prime order in which the discrete-logarithm problem and Diffie-Hellman problems are conjectured to be hard.

Common Class 2: Subgroup of Elliptic Curve Group

Another class of groups in which the discrete-logarithm problem and Diffie-Hellman problems are conjectured to be hard, are the prime order subgroups of an elliptic curve group. Elliptic curve cryptography means cryptography based on the discrete logarithm problem or the Diffie-Hellman problems in such groups.

For our purposes, we will usually describe algorithms in “abstract” groups, so we can ignore details of the underlying group, and can instantiate with any appropriate group.

Choosing Parameters

We have discussed two classes of cryptographic assumptions:

Factoring-based	Factoring, RSA assumption.
Discrete-logarithm based	Discrete-logarithm, Computational Diffie-Hellman assumption, Decisional Diffie-Hellman assumption.

All these problems are believed to be hard (no polynomial time algorithms). In practice, in order to set concrete value properly for some parameters, we need to understand how hard these problems are.

Recall what we have learned in symmetric key cryptography:

Block cipher with n-bit key ≈ security against 2ⁿ-time attacks.
Hash function with n-bit output ≈ security against 2^n/2-time attacks.

Algorithms for Factoring

There exist factoring algorithms that run in much less than 2^||N|| time, where ||N|| is the length of the modulus that we are factoring. The best known algorithm (asymptotically) is the General Number Field Sieve with a heuristic running time of approximately 2^O(||N||^1/3 log||N||^2/3).

So if you have a 128-bit modulus N, that is trivial to factor; whereas 128-bit key in symmetry encryption is sufficient enough to give good security in practice.

Algorithms for Discrete-logarithm

There are two classes of algorithms:

Ones that work for arbitrary “generic” groups
Ones that target specific groups

For the first class, the best “generic” algorithm run in time 2^n/2 in a group of order 2ⁿ (where n is the bit length of the order). This is known to be optimal.

For the second class, the best known algorithm (asymptotically) for subgroups of Z^*_p is Number Field Sieve with running time heuristically about 2^O(||p||^1/3 log||p||^2/3).

For (appropriately chosen) elliptic-curve groups, nothing better than generic algorithms is known! In some sense the elliptic curve groups are currently optimal with respect to what we could hope for as far as security is concerned.

Recommendation by NIST

If you want to achieve 112-bit security equivalent to a 112-bit symmetric key cipher:

Factoring	2048-bit modulus
Discrete-logarithm, order-q subgroup of Z^*_p	\|\|q\|\| = 224, \|\|p\|\| = 2048
Discrete-logarithm, elliptic-curve of order q	\|\|q\|\| = 224

Roughly speaking, using elliptic curve groups will give you the same security at better efficiency.

In any case, these parameters are much larger than what you have for symmetric key algorithms. Perhaps with the exception of hash functions. And this explains in part why they are much less efficient than the symmetric key cryptography.

My Certificate

For more on Factoring, RSA, Discrete-Logarithm and Diffie-Hellman, please refer to the wonderful course here https://www.coursera.org/learn/cryptography

My #90 course certificate from Coursera

Related Quick Recap

Number Theory and Group Theory

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai

Don't forget to sign up newsletter, don't miss any chance to learn.

Or share what you've learned with friends!