Compared with linear programs, non-linear programs (NLPs) are much more difficult to solve. In an NLP, a local minimum is not always a global minimum, so a greedy search method like gradient descent may get trapped at a local minimum. Also, even if an NLP has optimal solutions, these optimal solutions may lie on the boundary but NOT at an extreme point. This is quite different from linear programs, whose optimal solutions often reside on extreme points.
Actually, no one has invented an efficient algorithm for solving general NLPs. What we can do, however, is impose some conditions that:
- make a local minimum always a global minimum, or
- guarantee that an optimal solution resides on an extreme point.
Either condition helps us a lot. To establish them, we need convex analysis.
Convex Sets and Functions
A set F ⊆ R^n is said to be convex if λ x1 + (1-λ) x2 ∈ F for all x1, x2 ∈ F and all λ ∈ [0, 1]. For example, in a 2-dimensional setting, the convex combination (using λ) of vector x1 and vector x2 can be viewed as the line segment between them. So for the set F to be a convex set, any such segment must also lie entirely in F.
A function f: R^n → R is said to be convex if its domain F ⊆ R^n is convex and the following condition is met:
f(λ x1 + (1-λ) x2) ≤ λ f(x1) + (1-λ) f(x2) for all λ ∈ [0, 1] and all x1, x2 ∈ F
For example, in a 2-dimensional setting again, a convex function f looks like it has upward curvature: whenever you take two points f(x1) and f(x2) on the function and connect them with a line segment (parameterized by λ), the segment lies above the function values between x1 and x2.
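To make the definition concrete, here is a minimal numerical sketch (assuming Python with NumPy); the sample function f(x) = x² and the random test points are illustrative choices, not from the course.

```python
import numpy as np

def f(x):
    # A sample convex function: f(x) = x^2.
    return x ** 2

rng = np.random.default_rng(0)
violations = 0
for _ in range(10_000):
    x1, x2 = rng.uniform(-10, 10, size=2)   # two arbitrary points
    lam = rng.uniform(0, 1)                  # lambda in [0, 1]
    lhs = f(lam * x1 + (1 - lam) * x2)       # f of the convex combination
    rhs = lam * f(x1) + (1 - lam) * f(x2)    # convex combination of f values
    if lhs > rhs + 1e-9:                     # small tolerance for floating point
        violations += 1

print("violations of the convexity inequality:", violations)  # expected: 0
```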
A similar concept is called concave. For a convex domain F ⊆ R^n, a function f: R^n → R is concave over F if -f is convex.
It is not hard to see that linear programs are a special case of non-linear programs:
- The feasible region of a linear program (an intersection of several half spaces) is convex.
- A linear function is both convex and concave.
Global Optimality
Let’s take a look at how convex functions may help us. First, convex functions guarantee global optimality: for a convex (concave) function f over a convex domain F, a local minimum (maximum) is a global minimum (maximum). Note that this actually requires two conditions:
- a convex function, and
- a convex set as the function’s domain.
When only one of the two conditions is met, global optimality is not guaranteed. For example, minimizing the convex function f(x) = x² over the non-convex set [-3, -1] ∪ [2, 4] has a local minimum at x = 2 (with value 4) that is not global, since x = -1 gives the smaller value 1.
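To see how the other failure mode (a non-convex function over a convex domain) breaks things, here is a minimal gradient-descent sketch; the function f(x) = x⁴ - 3x² + x, the step size, and the starting points are illustrative assumptions, not from the course.

```python
def f(x):
    # A non-convex function with two local minima (illustrative choice).
    return x**4 - 3*x**2 + x

def grad(x):
    # Derivative of f.
    return 4*x**3 - 6*x + 1

def gradient_descent(x0, step=0.01, iters=5000):
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

for start in (2.0, -2.0):
    x_final = gradient_descent(start)
    print(f"start {start:+.1f} -> x = {x_final:+.4f}, f(x) = {f(x_final):.4f}")

# Starting from +2.0 the search is trapped at the local (non-global) minimum
# near x = 1.13; starting from -2.0 it reaches the global minimum near x = -1.30.
```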
Extreme Points
Now we know that when minimizing a convex function over a convex set (feasible region), a local minimum is guaranteed to be global. What happens when minimizing a concave function? Certainly the minimum is on the boundary of the feasible region.
Formally speaking, for any concave function that has a global minimum over a convex feasible region, there exists a global minimum that is an extreme point.
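As a rough sketch of this fact (assuming Python with NumPy; the concave function f(x) = -x² and the interval [-1, 2] are arbitrary illustrative choices), checking the extreme points is enough to locate the minimum:

```python
import numpy as np

def f(x):
    # A concave function (the negative of a convex one); illustrative choice.
    return -x**2

# Convex feasible region: the interval [-1, 2]; its extreme points are -1 and 2.
extreme_points = [-1.0, 2.0]

# Evaluate on a fine grid to see where the minimum lies ...
grid = np.linspace(-1, 2, 1001)
x_min = grid[np.argmin(f(grid))]
print("minimum over the grid is attained near x =", x_min)   # -> 2.0

# ... and compare with simply checking the extreme points.
best = min(extreme_points, key=f)
print("best extreme point:", best, "with value", f(best))    # -> 2.0, -4.0
```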
Convex Programs
Suppose we have a non-linear program like this:
min over x ∈ R^n of { f(x) | gi(x) ≤ bi, ∀i = 1, ..., m }
# the feasible region is the intersection
# of the m constraint sets {x | gi(x) ≤ bi}
If the feasible region F = {x ∈ R^n | gi(x) ≤ bi, ∀i = 1, ..., m} is convex, and f(x) and all gi(x) are convex, then a local minimum is a global minimum. Non-linear programs like this are called convex programs. If f(x) is concave and the objective is to maximize it, the program is also called a convex program. There are plenty of efficient algorithms for solving convex programs, which is much better than the general case, since people still cannot efficiently solve general non-linear programs.
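As a rough numerical illustration (not the course’s own example), a small convex program can be solved with an off-the-shelf solver such as SciPy; the objective (x1 - 1)² + (x2 - 2)², the constraint x1 + x2 ≤ 2, and the starting point are assumptions made for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

# Convex objective: f(x) = (x1 - 1)^2 + (x2 - 2)^2.
def f(x):
    return (x[0] - 1.0)**2 + (x[1] - 2.0)**2

# Convex constraint x1 + x2 <= 2, written as 2 - x1 - x2 >= 0 for SciPy.
constraints = [{"type": "ineq", "fun": lambda x: 2.0 - x[0] - x[1]}]

result = minimize(f, x0=np.zeros(2), constraints=constraints)
print(result.x)    # expected to be close to (0.5, 1.5)
print(result.fun)  # optimal objective value, close to 0.5
```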
Unlike solving a problem using algorithms (where the solutions are usually numerical), solving a problem analytically means expressing the solution symbolically as a function of the problem parameters.
Twice Differentiable Functions
In general, for any function, we could use the definition of convexity to show that the function is convex. However, this is usually difficult and tedious. When the function is twice differentiable, i.e. its second-order derivative exists, there are some useful properties.
Single-Variate
For a twice differentiable single-variate function f: R → R over an interval (a, b):
- f is convex over (a, b) if and only if f''(x) ≥ 0 for all x ∈ (a, b).
- x̄ is a local minimum over (a, b) only if f'(x̄) = 0.
- If f is convex over (a, b), then x* is a global minimum over (a, b) if and only if f'(x*) = 0.
Note that the two boundary points a and b may need special consideration, which is why we only talk about an open interval here.
The condition f'(x) = 0 is called the first-order condition (FOC):
- For all functions, the FOC is necessary for a local minimum.
- For convex functions, the FOC is also sufficient for a global minimum (very useful); see the small sketch below.
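Here is a small symbolic sketch of the FOC at work (assuming SymPy; the convex function f(x) = x² - 4x + 7 is an arbitrary example, not from the course):

```python
import sympy as sp

x = sp.Symbol("x", real=True)
f = x**2 - 4*x + 7                    # an example convex function

foc = sp.Eq(sp.diff(f, x), 0)         # first-order condition f'(x) = 0
candidates = sp.solve(foc, x)         # -> [2]
print("FOC candidates:", candidates)

# Since f''(x) = 2 >= 0 everywhere, f is convex and the FOC is sufficient:
print("f''(x) =", sp.diff(f, x, 2))
print("global minimum value:", f.subs(x, candidates[0]))   # -> 3
```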
Multi-Variate
Unlike the single-variate case, for a multi-variate function the axis-wise second-order condition ∂²f(x)/∂xi² ≥ 0 is necessary but not sufficient to claim that f(x) is convex: it only implies that f(x) is convex along the direction of each axis xi. We need other methods to test whether the function is convex in all directions.
Most of the time, the functions we encounter in practice have the following property: for a twice differentiable multi-variate function f: R^n → R, if its second-order derivatives are all continuous, then
∂²f(x)/∂xi∂xj = ∂²f(x)/∂xj∂xi for all i = 1, ..., n and j = 1, ..., n
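A quick symbolic check of this symmetry (a sketch assuming SymPy; the smooth example function x²y + sin(xy) is an arbitrary choice):

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**2 * y + sp.sin(x * y)    # an arbitrary smooth example function

fxy = sp.diff(f, x, y)          # d^2 f / dx dy
fyx = sp.diff(f, y, x)          # d^2 f / dy dx
print(sp.simplify(fxy - fyx))   # -> 0, the mixed partials agree
```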
Recall the definitions of the gradient ∇f(x) and the Hessian ∇²f(x) of a multi-variate function f(x):
∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, ..., ∂f(x)/∂xn )ᵀ, a column vector, and
∇²f(x) is the n-by-n matrix whose (i, j) entry is ∂²f(x)/∂xi∂xj, with diagonal entries ∂²f(x)/∂x1∂x1, ..., ∂²f(x)/∂xn∂xn.
So, most of the time, the Hessian is symmetric.
For a twice differentiable multi-variate function f: R^n → R over a domain F:
- f is convex over F if ∇²f(x) is positive semi-definite for all x ∈ F.
- x̄ is an interior local minimum only if ∇f(x̄) = 0.
- If f is convex, then x* is a global minimum if and only if ∇f(x*) = 0.
A small sketch putting these pieces together follows below.
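Here is a minimal SymPy sketch (the quadratic f(x) = x1² + x1·x2 + 2x2² - 4x1 is an illustrative choice, not from the course): compute the Hessian, confirm it has non-negative eigenvalues (which, as explained in the next section, certifies positive semi-definiteness), and apply the FOC ∇f(x) = 0.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**2 + x1*x2 + 2*x2**2 - 4*x1      # an example quadratic function

grad = [sp.diff(f, v) for v in (x1, x2)]
hess = sp.hessian(f, (x1, x2))

# The Hessian [[2, 1], [1, 4]] has non-negative eigenvalues, so f is convex.
print(hess.eigenvals())                  # eigenvalues with multiplicities

# FOC: solve grad f(x) = 0; since f is convex, the solution is a global minimum.
stationary = sp.solve(grad, (x1, x2))
print(stationary)                        # -> {x1: 16/7, x2: -4/7}
```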
Positive Semi-Definite
Positive semi-definite Hessians in R^n are generalizations of non-negative second-order derivatives in R. A symmetric matrix A is positive semi-definite if xᵀAx ≥ 0 for all x ∈ R^n.
It is usually not easy to tell directly whether a matrix is positive semi-definite; however, there is a useful proposition. For a symmetric matrix A, the following statements are equivalent:
- A is positive semi-definite.
- A’s eigenvalues are all non-negative.
- A’s principal minors are all non-negative.
Here, a level-k principal minor of A is the determinant of a k-by-k sub-matrix whose diagonal entries are part of A’s diagonal. A sufficient condition is that A’s leading (top-left) principal minors are all strictly positive: in many cases we look at the leading principal minors first, because if all of them are strictly positive, then all principal minors are positive (and hence non-negative). Having all leading principal minors merely non-negative is not enough; they need to be strictly positive for this shortcut to apply.
So the three statements are equivalent: if you can show that A’s eigenvalues are all non-negative, or that A’s principal minors are all non-negative, then you have shown that the matrix is positive semi-definite.
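As a small numeric sketch of both tests (assuming Python with NumPy; the symmetric matrix below is an arbitrary example), we can check the eigenvalues and the leading principal minors directly:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 4.0]])    # an arbitrary symmetric example matrix

# Test 1: all eigenvalues non-negative.
eigenvalues = np.linalg.eigvalsh(A)      # eigvalsh assumes a symmetric matrix
print("eigenvalues:", eigenvalues, "->", bool(np.all(eigenvalues >= 0)))

# Test 2 (sufficient condition): all leading principal minors strictly positive.
leading_minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
print("leading principal minors:", leading_minors,
      "->", all(m > 0 for m in leading_minors))
```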
My Certificate
For more on Convex Analysis, please refer to the wonderful course here https://www.coursera.org/learn/operations-research-theory
I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai