Discrete Optimization: Linear Programming

Invented by George Dantzig in 1947, linear programming is one of the most fundamental tools in combinatorial optimization. It can be looked at in two ways, the geometrical view and the algebraic view, and there are beautiful connections between them.

A linear program minimizes a linear objective function subject to a set of linear inequality constraints:

min c₁ x₁ + ... + c_n x_n

such that
a₁₁ x₁ + ... + a_1n x_n ≤ b₁
...
a_m1 x₁ + ... + a_mn x_n ≤ b_m

x_i ≥ 0 (1 ≤ i ≤ n)

You can of course do maximization as well: maximizing an objective function is the same as minimizing its negation, and the optimal value simply changes sign:

max c₁ x₁ + ... + c_n x_n
⟹ min -(c₁ x₁ + ... + c_n x_n)

If x_i must be able to take negative values, you can replace it with the difference of two new variables x⁺_i and x⁻_i, both of which are required to be non-negative:

x_i
⟹ x⁺_i - x⁻_i

An equality constraint can be replaced with two inequality constraints (a x = b becomes a x ≤ b and -a x ≤ -b). However, since we are talking about linear programming, the variables x_i cannot be restricted to integer values, and the objective and the constraints must be linear.
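The three rewriting tricks above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and a linear program is assumed to be stored as a cost list `c` and a list of coefficient rows `A`.

```python
def maximize_to_minimize(c):
    """max c.x has the same optimal points as min (-c).x."""
    return [-ci for ci in c]


def split_free_variable(c, A, j):
    """Replace a free variable x_j by x_j_plus - x_j_minus (both >= 0).

    Column j keeps playing the role of x_j_plus; a negated copy of it is
    appended as the last column and plays the role of x_j_minus.
    """
    new_c = c + [-c[j]]
    new_A = [row + [-row[j]] for row in A]
    return new_c, new_A


def equality_to_inequalities(row, rhs):
    """a.x = rhs becomes the pair a.x <= rhs and -a.x <= -rhs."""
    return [(row, rhs), ([-a for a in row], -rhs)]


# Hypothetical example: x_2 (index 1) is allowed to be negative.
c2, A2 = split_free_variable([1, 2], [[1, 1], [1, -1]], 1)
pair = equality_to_inequalities([1, 2], 3)
```

After these rewrites every variable is non-negative and every constraint is a ≤ inequality, which is the form shown at the top of this post.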

Geometrical View

λ₁ v₁ + ... + λ_n v_n is a convex combination of points v₁, ..., v_n if

λ₁ + ... + λ_n = 1
λ_i ≥ 0 (1 ≤ i ≤ n)

A set S in Rⁿ is convex if it contains all the convex combinations of the points in S. The intersection of convex sets is a convex set. This is very useful, because a linear constraint a₁₁ x₁ + ... + a_1n x_n ≤ b₁ represents a half space, which is also a convex set. The intersection of a finite set of half spaces (a bunch of constraints) is therefore also convex, and is called a polyhedron. If it is bounded, it is called a polytope. Every point in a polytope is a convex combination of its vertices.
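As a small illustration of the definition, the sketch below computes a convex combination and checks that the result still satisfies the constraints. The helper name `convex_combination` and the example polytope (x₁ + x₂ ≤ 4, x₁ + 3x₂ ≤ 6, x ≥ 0) are assumptions made for this example only.

```python
def convex_combination(points, weights):
    """Return sum(w_i * v_i), after checking w_i >= 0 and sum(w_i) == 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1) > 1e-12:
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(points[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, points))
                 for k in range(dim))


# Vertices of the polytope x1 + x2 <= 4, x1 + 3*x2 <= 6, x1 >= 0, x2 >= 0.
vertices = [(0, 0), (4, 0), (3, 1), (0, 2)]
point = convex_combination(vertices, [0.25, 0.25, 0.25, 0.25])

# The combination stays inside the polytope.
inside = (point[0] + point[1] <= 4 and point[0] + 3 * point[1] <= 6
          and point[0] >= 0 and point[1] >= 0)
```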

We are obsessed with vertices because of a theorem: over a non-empty polytope, at least one of the points where the objective value is minimal is a vertex. So we can solve a linear program “geometrically” by enumerating all the vertices and selecting the one with the smallest objective value.
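For two variables, the geometric method can be sketched directly: intersect every pair of constraint boundaries, keep the intersection points that satisfy all constraints, and pick the best one. The code below is a minimal sketch under those assumptions; the function name and the example data are made up for illustration, and the non-negativity constraints are written as extra rows -x₁ ≤ 0 and -x₂ ≤ 0.

```python
from fractions import Fraction
from itertools import combinations


def enumerate_vertices(A, b):
    """Vertices of {(x, y) : A[i] . (x, y) <= b[i]} for a 2-variable problem."""
    vertices = set()
    for i, j in combinations(range(len(A)), 2):
        (a1, a2), (a3, a4) = A[i], A[j]
        det = a1 * a4 - a2 * a3
        if det == 0:          # parallel boundaries never meet in one point
            continue
        # Cramer's rule for the 2x2 system formed by the two boundaries.
        x = Fraction(b[i] * a4 - a2 * b[j], det)
        y = Fraction(a1 * b[j] - b[i] * a3, det)
        if all(r[0] * x + r[1] * y <= rhs for r, rhs in zip(A, b)):
            vertices.add((x, y))
    return vertices


# min -x1 - 2*x2  s.t.  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x1, x2 >= 0
c = [-1, -2]
A = [[1, 1], [1, 3], [-1, 0], [0, -1]]
b = [4, 6, 0, 0]

vertices = enumerate_vertices(A, b)
best = min(vertices, key=lambda v: c[0] * v[0] + c[1] * v[1])
best_value = c[0] * best[0] + c[1] * best[1]
```

On this example the four vertices are (0, 0), (4, 0), (3, 1) and (0, 2), and the minimum -5 is reached at (3, 1).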

Algebraic View

Exploring all the vertices of a polytope to find the optimal solution is hard and, for large problems, practically impossible. A more intelligent way of solving the problem is to connect the geometric view to the algebraic view, using the famous Simplex algorithm.

Finding Basic Feasible Solution

First let’s back up and look at how to find solutions to linear systems:

a₁₁ x₁ + ... + a_1n x_n = b₁
...
a_m1 x₁ + ... + a_mn x_n = b_m

x_i ≥ 0 (1 ≤ i ≤ n)

Gaussian elimination, usually taught in high school, is the standard way to solve such a system. You basically express some of the variables x₁, ..., x_m (the basic variables) in terms of the other ones x_m+1, ..., x_n (the non-basic variables).

x₁ = b₁ + ∑_{i=m+1}^{n} a_1i x_i
...
x_m = b_m + ∑_{i=m+1}^{n} a_mi x_i

As long as the constants b₁, ..., b_m obtained after elimination are all ≥ 0, setting every non-basic variable x_m+1, ..., x_n to zero gives a basic feasible solution (BFS), in which each basic variable x_i takes the value b_i.

BUT, recall that the constraints in a linear program are all inequalities, for example a₁₁ x₁ + ... + a_1n x_n ≤ b₁. This is not a big issue, because we can transform each inequality into an equality by adding one additional variable s_i, called a slack variable:

a₁₁ x₁ + ... + a_1n x_n + s₁ = b₁
...
a_m1 x₁ + ... + a_mn x_n + s_m = b_m

s₁, ..., s_m ≥ 0

To summarize, this is how to find a basic feasible solution:

Re-express the constraints as equations, by adding slack variables
Select m variables which will be the basic variables (m is the number of constraints)
Re-express them in terms of the non-basic variables only, using Gaussian elimination
If b₁, ..., b_m ≥ 0, then we have a basic feasible solution.
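The steps above can be sketched as follows. This is a minimal illustration, not a production routine: the function name `basic_solution` is hypothetical, exact arithmetic is done with `fractions.Fraction`, and the slack columns are assumed to have been appended to `A` already.

```python
from fractions import Fraction


def basic_solution(A, b, basis):
    """Gauss-Jordan elimination that turns the columns in `basis` into an
    identity matrix. Returns the solution with all non-basic variables set
    to 0, or None if the chosen columns are linearly dependent."""
    m, n = len(A), len(A[0])
    T = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(A, b)]
    for r, col in enumerate(basis):
        # Look for a usable pivot in column `col`, at or below row r.
        pivot_row = next((k for k in range(r, m) if T[k][col] != 0), None)
        if pivot_row is None:
            return None
        T[r], T[pivot_row] = T[pivot_row], T[r]
        piv = T[r][col]
        T[r] = [v / piv for v in T[r]]
        for k in range(m):
            if k != r and T[k][col] != 0:
                factor = T[k][col]
                T[k] = [vk - factor * vr for vk, vr in zip(T[k], T[r])]
    x = [Fraction(0)] * n
    for r, col in enumerate(basis):
        x[col] = T[r][-1]
    return x


def is_feasible(x):
    return x is not None and all(v >= 0 for v in x)


# x1 + x2 + s1 = 4 and x1 + 3*x2 + s2 = 6; columns are x1, x2, s1, s2.
A = [[1, 1, 1, 0], [1, 3, 0, 1]]
b = [4, 6]
slack_basis = basic_solution(A, b, [2, 3])   # s1 and s2 basic
vertex_basis = basic_solution(A, b, [0, 1])  # x1 and x2 basic
bad_basis = basic_solution(A, b, [0, 2])     # x1 and s1 basic
```

Choosing the slack variables as the basis gives the solution (0, 0, 4, 6), choosing x₁ and x₂ gives (3, 1, 0, 0), and choosing x₁ and s₁ gives (6, 0, -2, 0), which is basic but not feasible.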

An obvious idea is to generate all basic feasible solutions and select the one with the best objective function value. However, this is usually not possible in practice, because the number of ways to choose m basic variables out of n variables, n! / (m! (n-m)!), is huge. The Simplex algorithm offers a better way to find the optimum.
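To get a feeling for how quickly the count n! / (m! (n-m)!) grows, here is a tiny sketch using `math.comb` from the Python standard library (available since Python 3.8). The function name is made up for this example.

```python
from math import comb


def count_candidate_bases(n, m):
    """Number of ways to choose m basic variables out of n variables."""
    return comb(n, m)


small = count_candidate_bases(4, 2)     # 2 constraints, 4 variables
medium = count_candidate_bases(60, 30)  # 30 constraints, 60 variables
```

The small problem has only 6 candidate bases, while a still modest problem with 30 constraints and 60 variables already has more than 10¹⁷ of them.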

The Simplex Algorithm

The Simplex algorithm is essentially a local search algorithm: it moves from one basic feasible solution to another basic feasible solution. The beautiful thing is that it is guaranteed to find the global optimum, because of convexity. The key question is how to make a move:

Select a non-basic variable with a negative coefficient in the objective function (called the entering variable x_e).
Introduce the entering variable into the basis by removing an existing basic variable (called the leaving variable x_l).
Perform Gaussian elimination. It is possible to get negative b_i after doing the elimination.

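To avoid a negative b_i, we have to choose the leaving variable carefully. Only the rows in which the entering variable has a negative coefficient a_ie limit how far it can increase, and to maintain feasibility we pick among them the row with the smallest ratio between b_i and the negated coefficient of the entering variable: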

l = argmin_{i:a_ie<0} b_i/(-a_ie)

When you then do the Gaussian elimination, this choice keeps all b_i non-negative, which gives you another basic feasible solution. This entire sequence of operations is called pivoting in linear programming.
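The ratio test can be sketched in isolation. The helper below is hypothetical and assumes the rows are written as x_i = b_i + ... + a_ie x_e + ..., as in this post, so that only rows with a negative a_ie restrict the entering variable.

```python
from fractions import Fraction


def choose_leaving_row(b, column):
    """Ratio test: `column` holds the coefficients a_ie of the entering
    variable, one per row. Returns the index of the row with the smallest
    b_i / (-a_ie) among rows with a_ie < 0, or None if no row qualifies."""
    ratios = [(Fraction(b_i) / Fraction(-a_ie), i)
              for i, (b_i, a_ie) in enumerate(zip(b, column))
              if a_ie < 0]
    return min(ratios)[1] if ratios else None


# s1 = 4 - x1 - x2 and s2 = 6 - x1 - 3*x2, with x2 as the entering variable.
row = choose_leaving_row([4, 6], [-1, -3])
```

Here the ratios are 4/1 and 6/3, so the second row (index 1) is selected: x₂ can grow to 2 before s₂ reaches zero, while growing it to 4 would make s₂ negative.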

We are now able to move from one basic feasible solution to another, but when do we stop?

Take the objective function
Replace all the basic variables with their expressions in terms of the non-basic variables

A basic feasible solution is optimal if its objective function, after having eliminated all basic variables, is of the form:

c₀ + c₁ x₁ + ... + c_n x_n
with c_i ≥ 0 (1 ≤ i ≤ n)

Overall, the Simplex algorithm can be expressed in just 4 lines of pseudocode:

while ∃ 1 ≤ i ≤ n: c_i < 0 do
  choose e such that c_e < 0;
  l = argmin_{i:a_ie<0} b_i / (-a_ie);
  pivot(e, l);
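The loop above can be turned into a small runnable sketch. The code below is an illustration under several assumptions, not a reference implementation: the problem is given as min c x subject to A x ≤ b, x ≥ 0 with b ≥ 0 (so the slack variables form the first basis), rows are stored in the form x_i = b_i + ∑ a_ij x_j, arithmetic is exact via `fractions.Fraction`, and ties are broken by smallest variable index. The function name `simplex` and all variable names are made up for this example.

```python
from fractions import Fraction


def simplex(c, A, b):
    """Minimise c.x subject to A x <= b, x >= 0, assuming every b_i >= 0."""
    m, n = len(A), len(c)
    nonbasic = list(range(n))           # original variables start non-basic
    basic = list(range(n, n + m))       # slack variables start basic
    rhs = [Fraction(v) for v in b]
    rows = [[-Fraction(v) for v in row] for row in A]  # s_i = b_i - a_i.x
    obj = [Fraction(v) for v in c]
    z0 = Fraction(0)

    while True:
        negatives = [j for j in range(n) if obj[j] < 0]
        if not negatives:
            break                       # all c_j >= 0: optimal
        e = min(negatives, key=lambda j: nonbasic[j])
        ratios = [(rhs[i] / -rows[i][e], basic[i], i)
                  for i in range(m) if rows[i][e] < 0]
        if not ratios:
            raise ValueError("objective is unbounded below")
        l = min(ratios)[2]

        # pivot(e, l): solve row l for the entering variable ...
        piv = rows[l][e]
        new_row = [-v / piv for v in rows[l]]
        new_row[e] = 1 / piv
        new_rhs = -rhs[l] / piv
        # ... and substitute it into the other rows and the objective.
        for i in range(m):
            if i == l:
                continue
            coef = rows[i][e]
            rhs[i] += coef * new_rhs
            rows[i] = [coef * new_row[j] if j == e
                       else rows[i][j] + coef * new_row[j] for j in range(n)]
        coef = obj[e]
        z0 += coef * new_rhs
        obj = [coef * new_row[j] if j == e
               else obj[j] + coef * new_row[j] for j in range(n)]
        rows[l], rhs[l] = new_row, new_rhs
        basic[l], nonbasic[e] = nonbasic[e], basic[l]

    x = [Fraction(0)] * (n + m)
    for i, var in enumerate(basic):
        x[var] = rhs[i]
    return z0, x[:n]


# min -x1 - 2*x2  s.t.  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x1, x2 >= 0
value, solution = simplex([-1, -2], [[1, 1], [1, 3]], [4, 6])
```

On this example the sketch performs two pivots (x₁ enters and s₁ leaves, then x₂ enters and s₂ leaves) and stops at x = (3, 1) with objective value -5.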

When the Problem Is Unbounded Below

There is a nasty situation where c_e < 0 (the coefficient of the selected entering variable in the objective function) but all a_ie ≥ 0 (all the coefficients of the selected entering variable in the linear system). You are then unable to select a leaving variable, since that requires a negative a_ie.

The reason is that the entering variable can take an arbitrarily large positive value without violating any constraint, which makes the objective function arbitrarily low. The linear program is unbounded below, which usually means there is a mistake in the modeling.

When `b_i` Becomes Zero

When some b_i is zero and the entering variable has a negative coefficient in that row, the smallest ratio is zero, so the corresponding basic variable is selected as the leaving variable. When you do the pivoting, you stay at the same value of the objective function, without improvement, and a sequence of such pivots can cycle forever. We have to find another way to guarantee termination. There are a few useful ways:

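Bland's rule: Always select the first (lowest-index) variable with a negative coefficient in the objective function as the entering variable, and break ties for the leaving variable by lowest index as well.
Lexicographic pivoting rule: Break ties when selecting the leaving variable by using a lexicographic rule.
Perturbation methods: Perturb the problem slightly to remove the ties, and then go back later on.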

The First Basic Feasible Solution

We essentially use the Simplex algorithm itself to find the first basic feasible solution. We transform the original linear program as follows:

Add one artificial variable y_i to each constraint, giving y₁, ..., y_m
Change the objective function to y₁ + ... + y_m

min y₁ + ... + y_m

such that
a₁₁ x₁ + ... + a_1n x_n + y₁          = b₁
...
a_m1 x₁ + ... + a_mn x_n          + y_m = b_m

x_i ≥ 0 (1 ≤ i ≤ n)
y_j ≥ 0 (1 ≤ j ≤ m)

The y_i variables in this new linear program just give us an initial basis: provided every b_i ≥ 0 (multiply a constraint by -1 if needed), setting y_i = b_i and all x_i = 0 is a basic feasible solution. We first minimize the sum of these y_i variables, and only afterwards optimize the original objective function.

If the optimal value of this objective function is greater than zero, we know the original linear program has no feasible solution.
If the optimal value is zero, we know we have a basic feasible solution (in terms of variables other than y_i), and we can then do the optimization of the original linear program.
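Setting up this auxiliary problem can be sketched as below. The function name and the returned layout (rows written as y_i = b_i + ∑ a_ij x_j, plus the objective expressed in the x variables only) are assumptions made for this illustration; the sketch only builds the starting point and leaves the actual pivoting to a Simplex routine.

```python
def phase_one_dictionary(A, b):
    """Starting point of the auxiliary problem for A x = b, x >= 0.

    Returns (z0, obj, rows, rhs) where
      y_i = rhs[i] + sum_j rows[i][j] * x_j      (artificial variables basic)
      y_1 + ... + y_m = z0 + sum_j obj[j] * x_j  (objective to minimise)
    """
    flipped, rhs = [], []
    for row, b_i in zip(A, b):
        sign = -1 if b_i < 0 else 1   # make every right-hand side >= 0
        flipped.append([sign * v for v in row])
        rhs.append(sign * b_i)
    rows = [[-v for v in row] for row in flipped]
    z0 = sum(rhs)
    obj = [-sum(row[j] for row in flipped) for j in range(len(A[0]))]
    return z0, obj, rows, rhs


# x1 + x2 = 4 and x1 - x2 = -2 (the second row gets multiplied by -1).
z0, obj, rows, rhs = phase_one_dictionary([[1, 1], [1, -1]], [4, -2])
```

For this example the artificial variables start at y = (4, 2), the objective starts at 6, and the negative coefficient on x₂ in `obj` shows that increasing x₂ reduces the sum of the artificial variables.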

Matrix Notations

The linear program expressed in matrix form (after adding slack variables) is:

min c x
s.t. A x = b
     x ≥ 0

The Simplex algorithm can also be expressed using matrices:

A_B: The matrix for coefficients of basic variables.
x_B: Column vector of basic variables.
A_N: The matrix for coefficients for non-basic variables.
x_N: Column vector of non-basic variables.
b: Column vector of right hand side values.

Putting them together, we have:

A_B x_B + A_N x_N = b
⟹ A_B x_B = b - A_N x_N
⟹ x_B = A_B⁻¹ b - A_B⁻¹ A_N x_N
⟹ x_B = b' - A'_N x_N

The solution is feasible if b' ≥ 0.

The objective function can be re-expressed as:

c x = c_B x_B + c_N x_N
= c_B (A_B⁻¹ b - A_B⁻¹ A_N x_N) + c_N x_N
= c_B A_B⁻¹ b + (c_N - c_B A_B⁻¹ A_N) x_N
= c_B A_B⁻¹ b + (c_N - c_B A_B⁻¹ A_N) x_N + (c_B - c_B A_B⁻¹ A_B) x_B
= c_B A_B⁻¹ b + (c - c_B A_B⁻¹ A) x

We can define Π = c_B A_B⁻¹ (that is, c_B times the inverse of A_B), which is called the simplex multipliers. The objective function c x now becomes:

c x = Π b + (c - Π A)x

When c - Π A ≥ 0, the current basic feasible solution is optimal.
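The matrix formulas can be checked numerically on a small case. The sketch below is limited to problems with two constraints so that the inverse of the basis matrix can be written out by hand; the function names and the example data are assumptions for illustration.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2x2 matrix, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("basis matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def simplex_multipliers(c, A, b, basis):
    """For a 2-constraint problem, return (pi, x_B, reduced costs) where
    pi = c_B * inverse(A_B), x_B = inverse(A_B) * b and the reduced costs
    are c - pi * A."""
    A_B = [[row[j] for j in basis] for row in A]
    inv = inverse_2x2(A_B)
    c_B = [c[j] for j in basis]
    pi = [sum(c_B[k] * inv[k][i] for k in range(2)) for i in range(2)]
    x_B = [sum(inv[i][k] * b[k] for k in range(2)) for i in range(2)]
    reduced = [c[j] - sum(pi[i] * A[i][j] for i in range(2))
               for j in range(len(c))]
    return pi, x_B, reduced


# min -x1 - 2*x2 with x1 + x2 + s1 = 4 and x1 + 3*x2 + s2 = 6.
c = [-1, -2, 0, 0]
A = [[1, 1, 1, 0], [1, 3, 0, 1]]
b = [4, 6]
pi, x_B, reduced = simplex_multipliers(c, A, b, basis=[0, 1])
objective = sum(p * b_i for p, b_i in zip(pi, b))
```

With x₁ and x₂ in the basis, the multipliers are (-1/2, -1/2), the basic variables take the values (3, 1), and every entry of c - Π A is non-negative, so this basis is optimal with objective value Π b = -5.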

The Simplex algorithm is often presented with a tableau, which makes pivoting easier to carry out.

Duality

Duality theory is basically looking at linear programming in two different ways:

Primal: `min c^T x  s.t.  A x ≥ b,  x_j ≥ 0`
Dual: `max y^T b  s.t.  y^T A ≤ c^T,  y_i ≥ 0`

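If the primal has an optimal solution, then the dual has an optimal solution with the same objective function value. At an optimal basis, the simplex multipliers Π = c_B A_B⁻¹ (c_B times the inverse of A_B) satisfy Π A ≤ c, so they are a feasible solution to the dual of the equality-form program, and their value Π b equals the primal optimum. The dual of the dual is the primal.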
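The relationship between the two programs can be checked on a small hand-made instance. The sketch below only verifies feasibility and compares objective values; the helper names and the example data are assumptions for illustration, and the optimal points were found by hand rather than by a solver.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def primal_feasible(A, b, x):
    """Check A x >= b and x >= 0."""
    return (all(x_j >= 0 for x_j in x)
            and all(dot(row, x) >= b_i for row, b_i in zip(A, b)))


def dual_feasible(A, c, y):
    """Check y^T A <= c^T and y >= 0."""
    columns = list(zip(*A))
    return (all(y_i >= 0 for y_i in y)
            and all(dot(y, col) <= c_j for col, c_j in zip(columns, c)))


# Primal: min 2*x1 + 3*x2  s.t.  x1 + x2 >= 4,  x1 + 3*x2 >= 6,  x >= 0
# Dual:   max 4*y1 + 6*y2  s.t.  y1 + y2 <= 2,  y1 + 3*y2 <= 3,  y >= 0
c, A, b = [2, 3], [[1, 1], [1, 3]], [4, 6]

# Any feasible pair: the dual value never exceeds the primal value.
x_any, y_any = [6, 0], [2, 0]
gap = dot(c, x_any) - dot(y_any, b)

# The optimal pair: both objective values coincide.
x_opt, y_opt = [3, 1], [Fraction(3, 2), Fraction(1, 2)]
primal_value, dual_value = dot(c, x_opt), dot(y_opt, b)
```

The arbitrary feasible pair gives a primal value of 12 and a dual value of 8, while the optimal pair gives 9 on both sides.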

For more on Discrete Optimization: Linear Programming, please refer to the wonderful course here https://www.coursera.org/learn/discrete-optimization

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai
