Compared with linear programs, non-linear programs (NLPs) are much more difficult to solve. In an NLP, a local minimum is not always a global minimum, so a greedy search method like gradient descent may be trapped at a local minimum. Also, even if an NLP has optimal solutions, these optimal solutions may lie on a boundary but NOT on an extreme point. This is quite different from linear programs, where optimal solutions can be found at extreme points.
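
As a quick illustration of that trap, here is a minimal Python sketch (the function f(x) = (x² - 1)² + 0.3x and the step size are chosen only for this example): plain gradient descent converges to whichever local minimum is downhill from its starting point.

def f(x):
    return (x**2 - 1)**2 + 0.3 * x       # non-convex: two local minima

def f_prime(x):
    return 4 * x * (x**2 - 1) + 0.3      # derivative of f

def gradient_descent(x0, lr=0.01, steps=5000):
    x = x0
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x

for start in (-1.5, 1.5):
    x_hat = gradient_descent(start)
    print(f"start={start:+.1f} -> x={x_hat:+.3f}, f(x)={f(x_hat):+.3f}")

# Starting at -1.5 reaches the global minimum (near x ≈ -1.04);
# starting at +1.5 gets trapped at the inferior local minimum (near x ≈ +0.96).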



Actually, no one has invented an efficient algorithm for solving general NLPs. What we can do, however, is impose some conditions that:

  1. make a local minimum always a global minimum, or
  2. guarantee that an optimal solution resides on an extreme point.

Either condition helps us a lot. To establish them, we need convex analysis.

Convex Sets and Functions

A set F ⊆ Rⁿ is said to be convex if λx₁ + (1-λ)x₂ ∈ F for all λ ∈ [0, 1] and all x₁, x₂ ∈ F. For example, in a two-dimensional setting, the convex combinations (over all λ) of vectors x₁ and x₂ form a line segment. So for the set F to be convex, any such segment between two points of F must also lie entirely in F.

A function f: Rⁿ → R is said to be convex if its domain F ⊆ Rⁿ is convex and the following condition is met:

f( λx₁ + (1-λ)x₂ ) ≤ λf(x₁) + (1-λ)f(x₂)

for all λ ∈ [0, 1] and all x₁, x₂ ∈ F.

For example, in a two-dimensional setting again, a convex function f looks like it has an upward curvature: whenever you take two points f(x₁) and f(x₂) on the graph of f and connect them with a line segment (parameterized by λ), the segment lies on or above the function's values between x₁ and x₂.
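
As a minimal sketch of this definition (the test functions, sampling range, and tolerance are made up for illustration), one can spot-check the convexity inequality numerically; this cannot prove convexity, but a single violation disproves it:

import random

def is_convexity_violated(f, trials=10000, low=-10.0, high=10.0):
    """Spot-check f(λx1 + (1-λ)x2) <= λf(x1) + (1-λ)f(x2) on random samples."""
    for _ in range(trials):
        x1, x2 = random.uniform(low, high), random.uniform(low, high)
        lam = random.uniform(0.0, 1.0)
        lhs = f(lam * x1 + (1 - lam) * x2)
        rhs = lam * f(x1) + (1 - lam) * f(x2)
        if lhs > rhs + 1e-9:          # tolerance for floating-point error
            return True               # found a violation: f is not convex
    return False                      # no violation found (not a proof)

print(is_convexity_violated(lambda x: x**2))   # False: x^2 is convex
print(is_convexity_violated(lambda x: x**3))   # True (almost surely): x^3 is not convex on R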

A similar concept is concavity. For a convex domain F ⊆ Rⁿ, a function f: Rⁿ → R is concave over F if -f is convex.

It is not hard to see that linear programs fit into this convex framework as a special case:

  • The feasible region of a linear program (an intersection of several half-spaces) is convex.
  • A linear function is both convex and concave.


Global Optimality

Let's take a look at how convex functions may help us. First, convex functions guarantee global optimality: for a convex (concave) function f over a convex domain F, a local minimum (maximum) is a global minimum (maximum). Note that this actually requires two conditions:

  1. a convex function, and
  2. a convex set as the function’s domain.

When only one of the two conditions is met, global optimality is not guaranteed.

Extreme Points

We now know that when minimizing a convex function over a convex set (feasible region), a local minimum is guaranteed to be global. What happens when minimizing a concave function? In that case the minimum lies on the boundary of the feasible region.

Formally speaking, for any concave function that has a global minimum over a convex feasible region, there exists a global minimum that is an extreme point.
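
A tiny numerical illustration of this fact (the concave objective and the box feasible region below are chosen only for this sketch): when minimizing a concave function over a box, the best value among the corners matches the best value found by densely sampling the whole box.

import itertools

# Concave objective (negative of a convex function), minimized over the box [0, 1]^2.
f = lambda x1, x2: -(x1**2 + x2**2)

# Value at the extreme points (corners of the box).
corners = list(itertools.product([0.0, 1.0], repeat=2))
best_corner = min(f(*c) for c in corners)

# Value over a dense grid covering the whole feasible region.
grid = [i / 100 for i in range(101)]
best_grid = min(f(x1, x2) for x1 in grid for x2 in grid)

print(best_corner, best_grid)   # both are -2.0: the minimum is attained at corner (1, 1)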

Convex Programs

Suppose we have a non-linear program like this:

min_{x ∈ Rⁿ} { f(x) | gᵢ(x) ≤ bᵢ  ∀ i = 1, ..., m }

# the feasible region is the intersection
# of the sets {x | gᵢ(x) ≤ bᵢ}

If the feasible region F = {x ∈ Rⁿ | gᵢ(x) ≤ bᵢ ∀ i = 1, ..., m} is convex, and f(x) and all the gᵢ(x) are convex, then a local minimum is a global minimum. Non-linear programs like this are called convex programs. If f(x) is concave and the objective is to maximize it over a convex feasible region, the problem is also called a convex program. There are plenty of efficient algorithms for solving convex programs, which is much better than the general case, since people still cannot efficiently solve general non-linear programs.
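
For instance, a small convex program can be solved numerically with SciPy's general-purpose solver (a minimal sketch, assuming scipy is installed; the objective, constraints, and starting point below are made up for illustration):

from scipy.optimize import minimize

# Convex objective: f(x) = (x1 - 2)^2 + (x2 - 1)^2
objective = lambda x: (x[0] - 2)**2 + (x[1] - 1)**2

# Convex constraints written as g_i(x) <= b_i.
# SciPy's 'ineq' convention expects expressions that must stay >= 0,
# so each constraint is rewritten as b_i - g_i(x) >= 0.
constraints = [
    {"type": "ineq", "fun": lambda x: 2 - (x[0] + x[1])},        # x1 + x2 <= 2
    {"type": "ineq", "fun": lambda x: 4 - (x[0]**2 + x[1]**2)},  # x1^2 + x2^2 <= 4
]

result = minimize(objective, x0=[0.0, 0.0], method="SLSQP", constraints=constraints)
print(result.x, result.fun)   # roughly [1.5, 0.5] with objective value 0.5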

Unlike solving a problem with algorithms (where the solutions are usually numerical), solving a problem analytically means expressing the solution symbolically as a function of the problem parameters. The properties below are the main tool for doing that.

Twice Differentiable Functions

In general, for any function, we could use the definition of convexity to show the function's convexity. However, this is usually difficult and tedious. When the function is twice differentiable, i.e. its second-order derivatives exist, there are some useful properties.

Single-Variate

For a twice differentiable single-variate function f: R → R over an interval (a, b):

  • f is convex over (a, b) if and only if f''(x) ≥ 0 for all x ∈ (a, b)
  • x̄ is a local minimum over (a, b) only if f'(x̄) = 0
  • If f is convex over (a, b), then x* is a global minimum over (a, b) if and only if f'(x*) = 0

Note that the two boundary points a and b may need special consideration, which is why we only discuss an open interval here.

The condition f'(x) = 0 is called the first-order condition (FOC):

  1. For all (differentiable) functions, the FOC is necessary for a local minimum.
  2. For a convex function, the FOC is also sufficient for a global minimum (very useful), as illustrated below.
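
Both points can be checked symbolically with sympy (a short sketch, assuming sympy is available; the quadratic below is made up for illustration): we verify convexity via f''(x) ≥ 0, and the FOC then pins down the global minimum.

import sympy as sp

x = sp.symbols("x", real=True)
f = x**2 - 4*x + 7                 # a convex single-variate function

f1 = sp.diff(f, x)                 # first derivative  f'(x)  = 2*x - 4
f2 = sp.diff(f, x, 2)              # second derivative f''(x) = 2 >= 0, so f is convex

x_star = sp.solve(sp.Eq(f1, 0), x) # FOC: f'(x) = 0  ->  x* = 2
print(f2, x_star, f.subs(x, x_star[0]))   # 2, [2], 3 (the global minimum value)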


Multi-Variate

Unlike in the single-variate case, for a multi-variate function the condition ∂²f(x)/∂xᵢ² ≥ 0 is necessary but not sufficient for f(x) to be convex in all directions; it only implies that f(x) is convex along the direction of the axis xᵢ. We need other methods to test whether the function is convex in all directions.

Most of the time, the functions we encounter in practice have this property: for a twice differentiable multi-variate function f: Rⁿ → R, if its second-order derivatives are all continuous, then

∂²f(x)/∂xᵢ∂xⱼ = ∂²f(x)/∂xⱼ∂xᵢ

for all i = 1, ..., n and j = 1, ..., n.

Recall the definitions of the gradient ∇f(x) and the Hessian ∇²f(x) of a multi-variate function f(x):

∇f(x) = ⎡ ∂f(x)/∂x₁ ⎤
        ⎢ ∂f(x)/∂x₂ ⎥
        ⎢    ...    ⎥
        ⎣ ∂f(x)/∂xₙ ⎦

∇²f(x) = ⎡ ∂²f(x)/∂x₁∂x₁   ∂²f(x)/∂x₁∂x₂   ...   ∂²f(x)/∂x₁∂xₙ ⎤
         ⎢ ∂²f(x)/∂x₂∂x₁   ∂²f(x)/∂x₂∂x₂   ...   ∂²f(x)/∂x₂∂xₙ ⎥
         ⎢      ...              ...        ...        ...      ⎥
         ⎣ ∂²f(x)/∂xₙ∂x₁   ∂²f(x)/∂xₙ∂x₂   ...   ∂²f(x)/∂xₙ∂xₙ ⎦

So, most of the time, the Hessian is symmetric.
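
As a short sketch (assuming sympy is available; the two-variable function below is made up for illustration), the gradient and Hessian can be computed symbolically, and the mixed partial derivatives indeed coincide:

import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**2 + x1*x2 + 2*x2**2

gradient = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])   # ∇f(x)
hessian = sp.hessian(f, (x1, x2))                         # ∇²f(x)

print(gradient.T)            # [2*x1 + x2, x1 + 4*x2]
print(hessian)               # Matrix([[2, 1], [1, 4]])
print(hessian == hessian.T)  # True: the Hessian is symmetric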

For a twice differentiable multi-variate function f: Rⁿ → R over a domain F:

  • f is convex over F if ∇²f(x) is positive semi-definite for all x ∈ F
  • x̄ is an interior local minimum only if ∇f(x̄) = 0
  • If f is convex, x* is a global minimum if and only if ∇f(x*) = 0

Positive Semi-Definite

Positive semi-definite Hessians in Rⁿ are the generalization of non-negative second-order derivatives in R. A symmetric matrix A is positive semi-definite if xᵀAx ≥ 0 for all x ∈ Rⁿ.

It is usually not easy to tell directly whether a matrix is positive semi-definite, but there is a useful proposition. For a symmetric matrix A, the following statements are equivalent:

  • A is positive semi-definite
  • A's eigenvalues are all non-negative
  • A's principal minors are all non-negative
    • A level-k principal minor of A is the determinant of a k-by-k sub-matrix whose diagonal is part of A's diagonal
    • A sufficient condition (for positive definiteness, and hence positive semi-definiteness) is that A's leading (top-left) principal minors are all strictly positive. In practice we often check the leading principal minors first: if all of them are strictly positive, then all principal minors are positive (and hence non-negative). However, leading principal minors that are merely non-negative are not enough; this shortcut requires them to be strictly positive.

So the three statements are equivalent: if you can show that A's eigenvalues are all non-negative, or that all of A's principal minors are non-negative, then you have shown that the matrix is positive semi-definite.
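
A minimal numpy sketch of both tests (assuming numpy is installed; the matrix A below is made up for illustration):

import numpy as np
from itertools import combinations

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])    # a symmetric matrix to test

# Test 1: all eigenvalues non-negative.
eigenvalues = np.linalg.eigvalsh(A)   # eigvalsh is intended for symmetric matrices
print(eigenvalues, np.all(eigenvalues >= -1e-12))

# Test 2: all principal minors non-negative
# (determinants of sub-matrices whose diagonal lies on A's diagonal).
n = A.shape[0]
minors = [np.linalg.det(A[np.ix_(idx, idx)])
          for k in range(1, n + 1)
          for idx in combinations(range(n), k)]
print(np.all(np.array(minors) >= -1e-12))   # True: A is positive semi-definite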




For more on Convex Analysis, please refer to the wonderful course here https://www.coursera.org/learn/operations-research-theory

