In the case of unconstrained non-linear programs, we may determine whether the objective function is convex and then use the first order condition (FOC) to find all local minima. However in practice, most of the time, nonlinear programs are constrained. We can not simply ignore those constraints treating the program as an unconstrained, and then try to find an solution by seeing whether it is feasible (or by trying to find a closest solution that is feasible). We need other tools to deal with constrained non-linear programs.

Lagrange Relaxation

Relaxing a nonlinear program directly (by ignoring all constraints) is an overkill, since it may result in the infeasible solutions, which is something we really don’t want. However we can design a new program that encourage feasibility. For an nonlinear program like this:

maxx∈Rn f(x)
s.t. gi(x) ≤ bi,  ∀i = 1, ..., m

We could try to put these hard constraints gi(x) and put them into the objective function f(x), and make them soft constraints. They are called soft, because that you are allowed to violate these constraints, but any violation will result in penalty. For the constraint i, we associate a unit reward for feasibility λi ≥ 0 to it (or penalty for in-feasibility). If a solution x- satisfies constant i (so bi - gi(x-) ≥ 0), we reward the solution by λi [bi - gi(x-)]. We can add this reward into the relaxed nonlinear programs.

Original NLP:

z* = maxx∈Rn { f(x) | gi(x) ≤ bi,  ∀i = 1, ..., m }

Lagrange relaxed NLP:

zL(λ) = maxx∈Rn f(x) + Σmi=1 λi [bi - gi(x)]
where λi ≥ 0
m: number of constraints

The new objective function L(x|λ) = f(x) + Σmi=1 λi [bi - gi(x)] is called Lagrangian, given λi which is called Lagrange multipliers.


The function zL(λ) is convex over λ ∈ [0, ∞)n. This is amazing, isn’t it? Your primal nonlinear program can be non-convex, and even you might have no idea how to solve it due to its non-convexity. However Lagrange relaxation will give you a dual program that is always convex. In most practical applications, a Lagrange dual program is solved by numerical algorithms.

Lagrange Weak Duality

Recall in linear programming, if you have a primal linear program but no way to solve it directly, then you could try to solve its dual program, which is going to provide you some information:

  1. Any feasible dual solution gives us a bound to the primal linear program.
  2. Looking for an dual optimal solution that actually gives a tight bound (the smallest upper bound, or the biggest lower bound)

Similarly, Lagrange relaxation actually provides a bound for the original NLP. For example, when we are solving a maximization problem, Lagrange relaxation is going to give us an upper bound. For the two NLPs (the original z* and the Lagrange relaxed zL(λ)), we have zL(λ) ≥ z* for all λ ≥ 0. This is called weak duality.

So, in the case of nonlinear programming with constraints, when you can not solve it directly, Lagrange relaxation can provide some useful information (upper bound for maximization or lower bound for minimization). Then it is natural to define:

minλ≥0 zL(λ)
= minλ≥0 { maxx∈Rn f(x) + Σmi=1 λi [bi - gi(x)] }

So we can try to find the best value of λ (the best solution of the Lagrange relaxed NLP) which in turn will give us the best bound in original NLP.

Lagrange Strong Duality

Let w* = minλ≥0 zL(λ) be the optimized objective value to the Lagrange dual program. w* = z*, if the primal NLP is a “regular” convex program.

Linear program duality is actually a special case of Lagrange duality. If you apply Lagrange duality on a linear program, its Lagrange dual program will be the same to its dual linear program.

The KKT Condition

The KKT condition is named by three scholars: Karush, Kuhn, and Tucker. It is going to be something similar for the first-order condition (FOC) for unconstrained problems. You may consider that KKT condition as the “first-order condition” for constrained optimization. For a regular NLP:

maxx∈Rn f(x)
s.t. gi(x) ≤ bi,  ∀i = 1, ..., m

if x- is a local max, then there exists λ ∈ Rm, such that:

  1. gi(x-) ≤ bi for all i = 1, …, m
  2. λ ≥ 0, and ∇f(x-) = Σmi=1 λi ∇gi(x-)
  3. λi [bi - gi(x-)] = 0 for all i = 1, …, m
    • This is complimentary slackness, where λi is the dual variable and bi - gi(x) is the slack. This means if you have a nonbinding constraint, its corresponding Lagrange multiplier must be zero, or the other way: if λi is positive, its corresponding constraint must be binding.

The condition “if x- is a local max” is necessary for all NLPs, but it will be sufficient for convex programs (both objective function f(x) and feasible region gi(x) ≤ bi are convex).

So if we want to have x- to be a local maximum, there are 3 conditions that must be met:

  1. Primal feasibility: gi(x-) ≤ bi for all i = 1, …, m. It means x- must be feasible.
  2. Dual feasibility: λ ≥ 0, and ∇f(x-) = Σmi=1 λi ∇gi(x-)
    • ∇f(x-) is the first order condition (FOC) for the Lagrangian L(x-|λ)
  3. Complementary slackness: λi [bi - gi(x-)] = 0 for all i = 1, …, m

Still, KKT condition is not the silver bullet, it cannot be used to solve any problem you like. Because finding all local maxima can be time-consuming – adding one additional λ will multiply the number of conditions by 2.

My Certificate

For more on Lagrangian Duality and KKT Condition, please refer to the wonderful course here

Related Quick Recap

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at

All of your support will be used for maintenance of this site and more great content. I am humbled and grateful for your generosity. Thank you!

Don't forget to sign up newsletter, don't miss any chance to learn.

Or share what you've learned with friends!