Compared with linear programs, non-linear programs (NLPs) are much more difficult to solve. In an NLP, a local minimum is not always a global minimum, so a greedy search method like gradient descent may be trapped at a local minimum. Also, even if an NLP has optimal solutions, these optimal solutions may lie on a boundary but NOT on an extreme point. This is quite different from linear programs, where optimal solutions can be found at extreme points.
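
As a quick illustration of that trap, here is a minimal Python sketch (the function f(x) = (x² - 1)² + 0.3x and the step size are chosen only for this example): plain gradient descent converges to whichever local minimum is downhill from its starting point.

def f(x):
    return (x**2 - 1)**2 + 0.3 * x       # non-convex: two local minima

def f_prime(x):
    return 4 * x * (x**2 - 1) + 0.3      # derivative of f

def gradient_descent(x0, lr=0.01, steps=5000):
    x = x0
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

for start in (-1.5, 1.5):
    x_hat = gradient_descent(start)
    print(f"start={start:+.1f} -> x={x_hat:+.3f}, f(x)={f(x_hat):+.3f}")

# Starting at -1.5 reaches the global minimum (near x ≈ -1.04);
# starting at +1.5 gets trapped at the inferior local minimum (near x ≈ +0.96).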



Actually, no one has invented an efficient algorithm for solving general NLPs. What we can do, however, is impose some conditions that:

  1. make a local minimum always a global minimum, or
  2. guarantee that an optimal solution resides on an extreme point.

Either condition helps us a lot. To establish them, we need convex analysis.

Convex Sets and Functions

A set F ⊆ Rⁿ is said to be convex if λx₁ + (1-λ)x₂ ∈ F for all λ ∈ [0, 1] and all x₁, x₂ ∈ F. For example, in a two-dimensional setting, the convex combinations (over all λ) of vectors x₁ and x₂ form a line segment. So for the set F to be convex, any such segment between two points of F must also lie entirely in F.

A function f: Rⁿ → R is said to be convex if its domain F ⊆ Rⁿ is convex and the following condition is met:

f( λx₁ + (1-λ)x₂ ) ≤ λf(x₁) + (1-λ)f(x₂)

for all λ ∈ [0, 1] and all x₁, x₂ ∈ F.

For example, in a two-dimensional setting again, a convex function f looks like it has an upward curvature: whenever you take two points f(x₁) and f(x₂) on the graph of f and connect them with a line segment (parameterized by λ), the segment lies on or above the function's values between x₁ and x₂.
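
As a minimal sketch of this definition (the test functions, sampling range, and tolerance are made up for illustration), one can spot-check the convexity inequality numerically; this cannot prove convexity, but a single violation disproves it:

import random

def is_convexity_violated(f, trials=10000, low=-10.0, high=10.0):
    """Spot-check f(λx1 + (1-λ)x2) <= λf(x1) + (1-λ)f(x2) on random samples."""
    for _ in range(trials):
        x1, x2 = random.uniform(low, high), random.uniform(low, high)
        lam = random.uniform(0.0, 1.0)
        lhs = f(lam * x1 + (1 - lam) * x2)
        rhs = lam * f(x1) + (1 - lam) * f(x2)
        if lhs > rhs + 1e-9:          # tolerance for floating-point error
            return True               # found a violation: f is not convex
    return False                      # no violation found (not a proof)

print(is_convexity_violated(lambda x: x**2))   # False: x^2 is convex
print(is_convexity_violated(lambda x: x**3))   # True (almost surely): x^3 is not convex on R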

A similar concept is concavity. For a convex domain F ⊆ Rⁿ, a function f: Rⁿ → R is concave over F if -f is convex.

It is not hard to see that linear programs fit into this convex framework as a special case:

  • The feasible region of a linear program (an intersection of several half-spaces) is convex.
  • A linear function is both convex and concave.


Global Optimality

Let's take a look at how convex functions may help us. First, convex functions guarantee global optimality: for a convex (concave) function f over a convex domain F, a local minimum (maximum) is a global minimum (maximum). Note that this actually requires two conditions:

  1. a convex function, and
  2. a convex set as the function’s domain.

When only one of the two conditions is met, global optimality is not guaranteed.

Extreme Points

We now know that when minimizing a convex function over a convex set (feasible region), a local minimum is guaranteed to be global. What happens when minimizing a concave function? In that case the minimum lies on the boundary of the feasible region.

Formally speaking, for any concave function that has a global minimum over a convex feasible region, there exists a global minimum that is an extreme point.
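
A tiny numerical illustration of this fact (the concave objective and the box feasible region below are chosen only for this sketch): when minimizing a concave function over a box, the best value among the corners matches the best value found by densely sampling the whole box.

import itertools

# Concave objective (negative of a convex function), minimized over the box [0, 1]^2.
f = lambda x1, x2: -(x1**2 + x2**2)

# Value at the extreme points (corners of the box).
corners = list(itertools.product([0.0, 1.0], repeat=2))
best_corner = min(f(*c) for c in corners)

# Value over a dense grid covering the whole feasible region.
grid = [i / 100 for i in range(101)]
best_grid = min(f(x1, x2) for x1 in grid for x2 in grid)

print(best_corner, best_grid)   # both are -2.0: the minimum is attained at corner (1, 1)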

Convex Programs

Suppose we have a non-linear program like this:

min_{x ∈ Rⁿ} { f(x) | gᵢ(x) ≤ bᵢ  ∀ i = 1, ..., m }

# the feasible region is the intersection
# of the sets {x | gᵢ(x) ≤ bᵢ}

If the feasible region F = {x ∈ Rⁿ | gᵢ(x) ≤ bᵢ ∀ i = 1, ..., m} is convex, and f(x) and all the gᵢ(x) are convex, then a local minimum is a global minimum. Non-linear programs like this are called convex programs. If f(x) is concave and the objective is to maximize it over a convex feasible region, the problem is also called a convex program. There are plenty of efficient algorithms for solving convex programs, which is much better than the general case, since people still cannot efficiently solve general non-linear programs.
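
For instance, a small convex program can be solved numerically with SciPy's general-purpose solver (a minimal sketch, assuming scipy is installed; the objective, constraints, and starting point below are made up for illustration):

from scipy.optimize import minimize

# Convex objective: f(x) = (x1 - 2)^2 + (x2 - 1)^2
objective = lambda x: (x[0] - 2)**2 + (x[1] - 1)**2

# Convex constraints written as g_i(x) <= b_i.
# SciPy's 'ineq' convention expects expressions that must stay >= 0,
# so each constraint is rewritten as b_i - g_i(x) >= 0.
constraints = [
    {"type": "ineq", "fun": lambda x: 2 - (x[0] + x[1])},        # x1 + x2 <= 2
    {"type": "ineq", "fun": lambda x: 4 - (x[0]**2 + x[1]**2)},  # x1^2 + x2^2 <= 4
]

result = minimize(objective, x0=[0.0, 0.0], method="SLSQP", constraints=constraints)
print(result.x, result.fun)   # roughly [1.5, 0.5] with objective value 0.5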

Unlike solving a problem with algorithms (where the solutions are usually numerical), solving a problem analytically means expressing the solution symbolically as a function of the problem parameters. The properties below are the main tool for doing that.

Twice Differentiable Functions

In general, for any function, we could use the definition of convexity to show the function's convexity. However, this is usually difficult and tedious. When the function is twice differentiable, i.e. its second-order derivatives exist, there are some useful properties.

Single-Variate

For a twice differentiable single-variate function f: R → R over an interval (a, b):

  • f is convex over (a, b) if and only if f''(x) ≥ 0 for all x ∈ (a, b)
  • x̄ is a local minimum over (a, b) only if f'(x̄) = 0
  • If f is convex over (a, b), then x* is a global minimum over (a, b) if and only if f'(x*) = 0

Note that the two boundary points a and b may need special consideration, which is why we only discuss an open interval here.

The condition f'(x) = 0 is called the first-order condition (FOC):

  1. For all (differentiable) functions, the FOC is necessary for a local minimum.
  2. For a convex function, the FOC is also sufficient for a global minimum (very useful), as illustrated below.
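
Both points can be checked symbolically with sympy (a short sketch, assuming sympy is available; the quadratic below is made up for illustration): we verify convexity via f''(x) ≥ 0, and the FOC then pins down the global minimum.

import sympy as sp

x = sp.symbols("x", real=True)
f = x**2 - 4*x + 7                 # a convex single-variate function

f1 = sp.diff(f, x)                 # first derivative  f'(x)  = 2*x - 4
f2 = sp.diff(f, x, 2)              # second derivative f''(x) = 2 >= 0, so f is convex

x_star = sp.solve(sp.Eq(f1, 0), x) # FOC: f'(x) = 0  ->  x* = 2
print(f2, x_star, f.subs(x, x_star[0]))   # 2, [2], 3 (the global minimum value)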


Multi-Variate

Unlike in the single-variate case, for a multi-variate function the condition ∂²f(x)/∂xᵢ² ≥ 0 is necessary but not sufficient for f(x) to be convex in all directions; it only implies that f(x) is convex along the direction of the axis xᵢ. We need other methods to test whether the function is convex in all directions.

Most of the time, the functions we encounter in practice have this property: for a twice differentiable multi-variate function f: Rⁿ → R, if its second-order derivatives are all continuous, then

∂²f(x)/∂xᵢ∂xⱼ = ∂²f(x)/∂xⱼ∂xᵢ

for all i = 1, ..., n and j = 1, ..., n.

Recall the definitions of the gradient ∇f(x) and the Hessian ∇²f(x) of a multi-variate function f(x):

∇f(x) = ⎡ ∂f(x)/∂x₁ ⎤
        ⎢ ∂f(x)/∂x₂ ⎥
        ⎢    ...    ⎥
        ⎣ ∂f(x)/∂xₙ ⎦

∇²f(x) = ⎡ ∂²f(x)/∂x₁∂x₁   ∂²f(x)/∂x₁∂x₂   ...   ∂²f(x)/∂x₁∂xₙ ⎤
         ⎢ ∂²f(x)/∂x₂∂x₁   ∂²f(x)/∂x₂∂x₂   ...   ∂²f(x)/∂x₂∂xₙ ⎥
         ⎢      ...              ...        ...        ...      ⎥
         ⎣ ∂²f(x)/∂xₙ∂x₁   ∂²f(x)/∂xₙ∂x₂   ...   ∂²f(x)/∂xₙ∂xₙ ⎦

So, most of the time, the Hessian is symmetric.
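
As a short sketch (assuming sympy is available; the two-variable function below is made up for illustration), the gradient and Hessian can be computed symbolically, and the mixed partial derivatives indeed coincide:

import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**2 + x1*x2 + 2*x2**2

gradient = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])   # ∇f(x)
hessian = sp.hessian(f, (x1, x2))                         # ∇²f(x)

print(gradient.T)            # [2*x1 + x2, x1 + 4*x2]
print(hessian)               # Matrix([[2, 1], [1, 4]])
print(hessian == hessian.T)  # True: the Hessian is symmetric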

For a twice differentiable multi-variate function f: Rⁿ → R over a domain F:

  • f is convex over F if ∇²f(x) is positive semi-definite for all x ∈ F
  • x̄ is an interior local minimum only if ∇f(x̄) = 0
  • If f is convex, x* is a global minimum if and only if ∇f(x*) = 0

Positive Semi-Definite

Positive semi-definite Hessians in Rⁿ are the generalization of non-negative second-order derivatives in R. A symmetric matrix A is positive semi-definite if xᵀAx ≥ 0 for all x ∈ Rⁿ.

It is usually not easy to tell directly whether a matrix is positive semi-definite, but there is a useful proposition. For a symmetric matrix A, the following statements are equivalent:

  • A is positive semi-definite
  • A's eigenvalues are all non-negative
  • A's principal minors are all non-negative
    • A level-k principal minor of A is the determinant of a k-by-k sub-matrix whose diagonal is part of A's diagonal
    • A sufficient condition (for positive definiteness, and hence positive semi-definiteness) is that A's leading (top-left) principal minors are all strictly positive. In practice we often check the leading principal minors first: if all of them are strictly positive, then all principal minors are positive (and hence non-negative). However, leading principal minors that are merely non-negative are not enough; this shortcut requires them to be strictly positive.

So the three statements are equivalent: if you can show that A's eigenvalues are all non-negative, or that all of A's principal minors are non-negative, then you have shown that the matrix is positive semi-definite.
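
A minimal numpy sketch of both tests (assuming numpy is installed; the matrix A below is made up for illustration):

import numpy as np
from itertools import combinations

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])    # a symmetric matrix to test

# Test 1: all eigenvalues non-negative.
eigenvalues = np.linalg.eigvalsh(A)   # eigvalsh is intended for symmetric matrices
print(eigenvalues, np.all(eigenvalues >= -1e-12))

# Test 2: all principal minors non-negative
# (determinants of sub-matrices whose diagonal lies on A's diagonal).
n = A.shape[0]
minors = [np.linalg.det(A[np.ix_(idx, idx)])
          for k in range(1, n + 1)
          for idx in combinations(range(n), k)]
print(np.all(np.array(minors) >= -1e-12))   # True: A is positive semi-definite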




For more on Convex Analysis, please refer to the wonderful course here https://www.coursera.org/learn/operations-research-theory

