Lagrangian Duality and KKT Condition

Table of Contents

In the case of unconstrained non-linear programs, we may determine whether the objective function is convex and then use the first order condition (FOC) to find all local minima. However in practice, most of the time, nonlinear programs are constrained. We can not simply ignore those constraints treating the program as an unconstrained, and then try to find an solution by seeing whether it is feasible (or by trying to find a closest solution that is feasible). We need other tools to deal with constrained non-linear programs.

Lagrange Relaxation

Relaxing a nonlinear program directly (by ignoring all constraints) is an overkill, since it may result in the infeasible solutions, which is something we really don’t want. However we can design a new program that encourage feasibility. For an nonlinear program like this:

max_x∈Rⁿ f(x)
s.t. g_i(x) ≤ b_i,  ∀i = 1, ..., m

We could try to put these hard constraints g_i(x) and put them into the objective function f(x), and make them soft constraints. They are called soft, because that you are allowed to violate these constraints, but any violation will result in penalty. For the constraint i, we associate a unit reward for feasibility λ_i ≥ 0 to it (or penalty for in-feasibility). If a solution x^- satisfies constant i (so b_i - g_i(x^-) ≥ 0), we reward the solution by λ_i [b_i - g_i(x^-)]. We can add this reward into the relaxed nonlinear programs.

Original NLP:

z^* = max_x∈Rⁿ { f(x) | g_i(x) ≤ b_i,  ∀i = 1, ..., m }

Lagrange relaxed NLP:

z^L(λ) = max_x∈Rⁿ f(x) + Σ^m_i=1 λ_i [b_i - g_i(x)]
where λ_i ≥ 0
m: number of constraints

The new objective function L(x|λ) = f(x) + Σ^m_i=1 λ_i [b_i - g_i(x)] is called Lagrangian, given λ_i which is called Lagrange multipliers.

Convexity

The function z^L(λ) is convex over λ ∈ [0, ∞)ⁿ. This is amazing, isn’t it? Your primal nonlinear program can be non-convex, and even you might have no idea how to solve it due to its non-convexity. However Lagrange relaxation will give you a dual program that is always convex. In most practical applications, a Lagrange dual program is solved by numerical algorithms.

Lagrange Weak Duality

Recall in linear programming, if you have a primal linear program but no way to solve it directly, then you could try to solve its dual program, which is going to provide you some information:

Any feasible dual solution gives us a bound to the primal linear program.
Looking for an dual optimal solution that actually gives a tight bound (the smallest upper bound, or the biggest lower bound)

Similarly, Lagrange relaxation actually provides a bound for the original NLP. For example, when we are solving a maximization problem, Lagrange relaxation is going to give us an upper bound. For the two NLPs (the original z^* and the Lagrange relaxed z^L(λ)), we have z^L(λ) ≥ z^* for all λ ≥ 0. This is called weak duality.

So, in the case of nonlinear programming with constraints, when you can not solve it directly, Lagrange relaxation can provide some useful information (upper bound for maximization or lower bound for minimization). Then it is natural to define:

min_λ≥0 z^L(λ)
= min_λ≥0 { max_x∈Rⁿ f(x) + Σ^m_i=1 λ_i [b_i - g_i(x)] }

So we can try to find the best value of λ (the best solution of the Lagrange relaxed NLP) which in turn will give us the best bound in original NLP.

Lagrange Strong Duality

Let w^* = min_λ≥0 z^L(λ) be the optimized objective value to the Lagrange dual program. w^* = z^*, if the primal NLP is a “regular” convex program.

Linear program duality is actually a special case of Lagrange duality. If you apply Lagrange duality on a linear program, its Lagrange dual program will be the same to its dual linear program.

The KKT Condition

The KKT condition is named by three scholars: Karush, Kuhn, and Tucker. It is going to be something similar for the first-order condition (FOC) for unconstrained problems. You may consider that KKT condition as the “first-order condition” for constrained optimization. For a regular NLP:

max_x∈Rⁿ f(x)
s.t. g_i(x) ≤ b_i,  ∀i = 1, ..., m

if x^- is a local max, then there exists λ ∈ R^m, such that:

g_i(x^-) ≤ b_i for all i = 1, …, m
λ ≥ 0, and ∇f(x^-) = Σ^m_i=1 λ_i ∇g_i(x^-)
λ_i [b_i - g_i(x^-)] = 0 for all i = 1, …, m
- This is complimentary slackness, where λ_i is the dual variable and b_i - g_i(x) is the slack. This means if you have a nonbinding constraint, its corresponding Lagrange multiplier must be zero, or the other way: if λ_i is positive, its corresponding constraint must be binding.

The condition “if x^- is a local max” is necessary for all NLPs, but it will be sufficient for convex programs (both objective function f(x) and feasible region g_i(x) ≤ b_i are convex).

So if we want to have x^- to be a local maximum, there are 3 conditions that must be met:

Primal feasibility: g_i(x^-) ≤ b_i for all i = 1, …, m. It means x^- must be feasible.
Dual feasibility: λ ≥ 0, and ∇f(x^-) = Σ^m_i=1 λ_i ∇g_i(x^-)
- ∇f(x^-) is the first order condition (FOC) for the Lagrangian L(x^-|λ)
Complementary slackness: λ_i [b_i - g_i(x^-)] = 0 for all i = 1, …, m

Still, KKT condition is not the silver bullet, it cannot be used to solve any problem you like. Because finding all local maxima can be time-consuming – adding one additional λ will multiply the number of conditions by 2.

My Certificate

For more on Lagrangian Duality and KKT Condition, please refer to the wonderful course here https://www.coursera.org/learn/operations-research-theory

My 130th certificate from Coursera

Related Quick Recap

Convex Analysis

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai

Lagrange Relaxation

Convexity

Lagrange Weak Duality

Lagrange Strong Duality

The KKT Condition

My Certificate

Related Quick Recap

Related Posts

My 163rd course certificate from Coursera

Kubernetes Deployment and Networking

Cloud Computing: Law Enforcement, Competition and Tax