Compared with linear programs, non-linear programs (NLPs) are much more difficult to solve. In an NLP, a local minimum is not always a global minimum. A greedy search method like gradient decent may be trapped at a local minimum. Also even the NLP has optimal solutions, these optimal solutions may exist on a boundary, but NOT on an extreme point. This is quite different from linear programs, whose optimal solutions often reside on extreme points.

Actually, no one has invented an efficient algorithm for solving general NLPs. What we can do however is to create some conditions that:

- make a local minimum always a global minimum, or
- guarantee that an optimal solution reside on an extreme point.

Either condition will help us a lot, we will need convex analysis.

## Convex Sets and Functions

A set `F ā R`

is said to be convex if for all ^{n}`Ī» ā [0, 1]`

such that `Ī» x`

for all _{1} + (1-Ī») x_{2} ā F`x`

. For example, in a 2 dimension setting, the convex combination (using _{1}, x_{2} ā F`Ī»`

) of vector `x`

and vecotr _{1}`x`

can be treated as a segment. So for the set _{2}`F`

to be a convex set, any segmentation must also be in `F`

.

A function `f: R`

is said to be convex, if its domain is convex ^{n} ā R`F ā R`

and this condition is met:^{n}

`f( Ī» x`_{1} + (1-Ī») x_{2} ) ā¤ Ī» f(x_{1}) + (1-Ī») f(x_{2})
for all Ī» ā [0, 1] and x_{1}, x_{2} ā F

For example, in a 2 dimension setting again, as long as the function `f`

looks like having an * upward curvature*, whenever you take two points on the function

`f(x`_{1})

and `f(x`_{2})

, and combine them to get a line segment (using Ī»), then the line segment should lie above functional values between `x`_{1}

and `x`_{2}

.A similar concept is called ** concave**. For a convex domain

`F ā R`^{n}

, a function `f: R`^{n} ā R

is **over**

*concave*`F`

if `-f`

is convex.It won’t be hard to understand that linear functions is a special case of non-linear functions:

- Feasible region (intersection of several half spaces) of a linear program is convex.
- Linear functions is both convex and concave.

### Global Optimality

Let’s take a look at how convex functions may help us. First convex functions guarantee global optimality. For a convex (concave) function `f`

over a convex domain `F`

, a local minimum (maximum) is a global minimum (maximum). Note, this actually requires two conditions:

- a convex function, and
- a convex set as the function’s domain.

When there is only 1 condition is met, you can not get global optimality.

### Extreme Points

Now we have known that minimizing a ** convex** function over a convex set (feasible region), a local minimum is guaranteed to be global. What happens when minimizing a

**function? Certainly the minimum is on the boundary of the feasible region.**

*concave*Formally speaking, for any ** concave** function that has a global minimum over a convex feasible region, there exists a global minimum that is an extreme point.

## Convex Programs

Suppose we have a linear program like this:

`min`_{xāRn} { f(x) | g_{i}(x) ā¤ b_{i} āi = 1,...,m }
# feasible region is the intersection
# of several half spaces g_{i}(x)

If the feasible region `F`

is convex, and _{i} = {x ā R^{n} | g_{i}(x) ā¤ b_{i} āi = 1, ..., m}`f(x)`

and `g`

are all convex, then a local minimum is global minimum. The non-linear programs like this, is called _{i}(x)** convex programs**. If the

`f(x)`

is **, and objective is to**

*concave***it, it is also called a convex program. There are plenty of efficient algorithms to solve convex programs. This is much better since people still can not efficiently solve general non-linear programs.**

*maximize*Unlike solving a problem using algorithms (where the solutions are usually numerical), analytically solving a problem means to express the solution as a function of problem parameters symbolically.

## Twice Differentiable Functions

In general for any function, we could use the definition of convex to show the function’s convexity. However, this is usually difficult and tedious. When the function is twice differentiable, i.e. the second-order derivative exists, there are some useful properties.

### Single-Variate

For a twice differentiable single-variate function `f: R ā R`

over an interval `(a, b)`

:

`f`

is convex over`(a, b)`

if and only if`f`

for all^{''}(x) ā„ 0`x ā (a, b)`

`x`

is a^{-}minimum over*local*`(a, b)`

only if`f`

^{'}(x^{-}) = 0- If
`f`

is convex over`(a, b)`

,`x`

is a^{*}minimum over*global*`(a, b)`

, if and only if`f`

^{'}(x^{*}) = 0

Note that the two boundary points a and b may need special considerations so that’s why here we only talk about an open interval.

The condition `f`

is called the ^{'}(x) = 0** first order condition** (FOC):

- For all functions, FOC is
for a local minimum.*necessary* - For convex function, FOC is also
for a global minimum (very useful).*sufficient*

### Multi-Variate

Unlike single-variate functions, for multi-variate functions, second-order derivative `ā`

is ^{2}f(x) / āx^{2}_{i} ā„ 0** necessary** but not

**to claim that**

*sufficient*`f(x)`

is convex in all directions, it only implies `f(x)`

is convex in the directions of axis `x`_{i}

, we need other methods to test whether the function is convex in all directions.Most of the time, functions we encounter in practice, we have this property: for a twice differentiable multivariate function `f: R`

, if its second order derivatives are all continuous, then^{n} ā R

`ā`^{2}f(x) / āx_{i}āx_{j} = ā^{2}f(x) / āx_{j}āx_{i}
for all i = 1...n and j = 1...n

Recall the definition of gradient `āf(x)`

and hessian `ā`

of a multi-variate function ^{2}f(x)`f(x)`

:

`āf(x) = ā” āf(x) / āx`_{1} ā¤
ā¢ āf(x) / āx_{2} ā„
ā¢ . . . ā„
ā£ āf(x) / āx_{n} ā¦
ā^{2}f(x) = ā” ā^{2}f(x) / āx_{1}āx_{1} ā2f(x) / āx_{1}āx_{2} ... ā¤
ā¢ ā2f(x) / āx_{2}āx_{1} ... ā„
ā£ ... ā^{2}f(x) / āx_{n}āx_{n} ā¦

So most of the time, Hessian is symmetric.

For a twice differentiable single-variate function `f: R ā R`

over a domain `F`

:

`f`

is convex over`F`

if`ā`

is positive semi-definite for all^{2}f(x)`x ā F`

`x`

is an interior^{-}minimum only if*local*`āf`

^{'}(x^{-}) = 0- If
`f`

is convex,`x`

is a^{*}minimum if and only if*global*`āf`

^{'}(x^{*}) = 0

#### Positive Semi-Definite

Positive semi-definite Hessians in `R`

are generalizations of non-negative second-order derivatives in ^{n}`R`

. A ** symmetric** matrix A is positive semi-definite if

`x`^{T} Ax ā„ 0

for all `x ā R`^{n}

.It is usually not easy to directly tell if a matrix is positive semi-definite, however there is a useful proposition. For a ** symmetric** matrix, following statements are equivalent:

- A is positive semi-definite
- A’s eigenvalues are all non-negative
- A’s principal minors are all non-negative
- A’s level-k principle minors is the determinant of a k-by-k sub-matrix whose diagonal is part of A’s diagonal
- A sufficient condition is that A’s leading (top-left) principal minors are all positive. In many cases we would look at leading principle minors first because if all of them are all strictly
, then all your principal minors will be positive (then they are of course non-negative). If all your leading principle minors are*positive*, then this is not enough, you need to have all your leading principle minors to be positive.*non-negative*

So the three things are equivalent, which means if you are able to show that A’s eigenvalues are all non-negative or if A’s principal minors are all non-negative, then you’ll have shown that this matrix is positive semi-definite.

## My Certificate

For more on ** Convex Analysis**, please refer to the wonderful course here https://www.coursera.org/learn/operations-research-theory

## Related Quick Recap

*I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai*

All of your support will be used for maintenance of this site and more great content. I am humbled and grateful for your generosity. Thank you!

Don't forget to sign up newsletter, don't miss any chance to learn.

Or share what you've learned with friends!

Tweet