Invented by George Dantzig in 1947, Linear Programming is one of the most fundamental tools in combinatorial optimization. You have two views: the geometrical view and the algebraic view. There are beautiful connection between them.

This is what a linear program looks like, which is minimizing a linear objective function and is subject to a set of inequality constraints.

min c1 x1 + ... + cn xn

such that
a11 x1 + ... + a1n xn ≤ b1
am1 x1 + ... + amn xn ≤ bm

xi ≥ 0 (1 ≤ i ≤ n)

You of course can do maximization, as long as you negate the entire objective function:

min c1 x1 + ... + cn xn
⟹ max -(c1 x1 + ... + cn xn)

If xi must be able to take negative values, you can replace it with 2 other variables x+i and x-i, but both of them meet the requirement that they shall be non-negative:

⟹ x+i - x-i

A equality constraint can be replace with two inequality constraints. However, since we are talking about linear programming, variable xi can not be integers, and constraints must be linear.

Geometrical View

λ1 v1 + ... + λn vn is a convex combination of points v1, ..., vn if

λ1 + ... + λn = 1
λi ≥ 0 (1 ≤ i ≤ n)

A set S in Rn is convex if it contains all the convex combinations of the points in S. The intersection of convex sets is a convex set. This is very useful, because a linear constraint a11 x1 + ... + a1n xn ≤ b1 represents a half space, which is also a convex set. The intersection of a set of half spaces (a bunch of constraints) is also convex, called polyhedron. If it is finite, it is called polytope. Every point in a polytope is a convex combination of its vertices.

We are obsessed with vertices, because the theorem: at least one of the points where the objective value is minimal is a vertex. We know the optimal solution is going to be at least one of the vertices. So we can solve an linear program “geometrically” by enumerating all the vertices, and select the one with the smallest objective value.

Algebraic View

Explore the vertices in ploytope to find the optimal solution seems hard and even impossible, a more intelligent way of solving the problem is to connect the geometric view to the algebraic view, using the famous Simplex algorithm.

Finding Basic Feasible Solution

First let’s back up and look at how to find solutions to linear systems:

a11 x1 + ... + a1n xn = b1
am1 x1 + ... + amn xn = bm

xi ≥ 0 (1 ≤ i ≤ n)

Usually high school will teach how to use Gaussian elimination to solve it. You basically express some of the variables x1, ..., xm (basic variables) in terms of the other ones xm+1, ..., xn (non-basic variables).

x1 = b1 + ∑ni=m+1 a1i xi
xm = bm + ∑ni=m+1 ami xi

As long as b1, ..., bm ≥ 0, you will have a basic feasible solution (BFS), where all non-basic variables xm+1, ..., xn can be zero.

BUT, recall that constraints in a linear program are all inequality for example a11 x1 + ... + a1n xn ≤ b1. This won’t be a big issue, because we can transform the inequality to equality, if we add one additional si, called slack variables:

a11 x1 + ... + a1n xn + s1 = b1
am1 x1 + ... + amn xn + sm = bm

s1, ..., sm ≥ 0

So, summarize on how to find the basic feasible solution:

  1. Re-express the constraints as equations, by adding slack variables
  2. Select m variables which will be the basic variables (m is the number of constraints)
  3. Re-express them in terms of the non-basic variable only using Gaussian elimination
  4. If b1, ..., bm ≥ 0, then we have a basic feasible solution.

You may immediately come up with an idea that is trying to generate all basic feasible solutions, and select the one with the best objective function value. However this is usually not possible in practice, because there are a huge number of solutions n! / (m! (n-m)!). The Simplex Algorithm offers a better way to find it.

The Simplex Algorithm

The Simplex algorithm is essentially a local search algorithm, it moves from one basic feasible solution to another basic feasible solution. The beautiful thing is it guaranteed to find the global optimum, because of convexity. The key is how to make move?

  1. Select a non-basic variable with a negative coefficient (called entering variable xl).
  2. Introduce the entering variable in the basis by removing an existing basic variable (called leaving variable xe).
  3. Perform Gaussian elimination. It is possible to get negative bi after doing the elimination.

To avoid negative bi, we have to choose the leaving variable carefully, we must maintain feasibility by finding the smallest ratio between the bi and the minus of the coefficient of the entering variable.

l = argmini:a_ie<0 bi/(-aie)

Then when you do the Gaussian elimination, this will help keep all bi positive, which will give you another basic feasible solution. This entire set of operation is called pivoting in linear programming.

We now are able to move from basic feasible solution, but when to stop?

  1. Take the objective function
  2. Replace all the basic variables with the non-basic variables

A basic feasible solution is optimal if its objective function, after having eliminated all basic variables, is of the form:

c0 + c1 x1 + ... + cn xn
with ci ≥ 0 (1 ≤ i ≤ n)

Overall, the Simplex algorithm can be expressed in just 4 lines of code:

while ∃ 1 ≤ i ≤ n: ci < 0 do
  choose e such that ce < 0;
  l = arg-mini:a_ie<0 bi / (-aie);
  pivot(e, l);

When Algorithm Unbounded by Below

There is a nasty situation that ce < 0 (the coefficient of the selected entering variable in objective function) but all aie > 0 (all the coefficients of the selected entering variable in the linear systems). It means you are not able to select leaving variable, since you need a negative aie.

For an entering variable, you can not select a leaving variable for it. The reason behind the scene is that the entering variable can be arbitrarily large positive value, which will make objective function arbitrarily low. It basically means that the algorithm here is not bounded by below, there is a mistake in the modeling.

When bi Becomes Zero

When a bi is zero, it will cause the corresponding variable xi is always selected as leaving variable. When you do the pivoting, you are going to stay at the same value of the objective function, without improvement. We have to find essentially another way to guarantee termination. There are a few useful ways:

Bland ruleAlways select the first entering variable with negative coefficient, in the objective function.
Pivoting ruleBreaking ties when selecting the leaving variable by using a lexicographic rule.
Perturbation methodsPerturb the basis, and then go back later on.

The First Basic Feasible Solution

We use essentially the Simplex algorithm to find the first basic feasible solution. We transform the original linear program by:

  1. Add artificial variables y1, ..., ym for each constraint
  2. Change the objective function to y1 + ... + ym
min y1 + ... + ym

such that
a11 x1 + ... + a1n xn + y1          = b1
am1 x1 + ... + amn xn          + ym = bm

xi ≥ 0 (1 ≤ i ≤ n)

The yi variables in this new linear program just give us a new basis. What we are going to do is basically minimize the sum of these yi variables, then we can optimize the objective function.

  • If the objective function is greater than zero, we know we don’t have a feasible solution.
  • If the objective function is zero, we know we have a basic feasible solution (in terms of variables other than yi), then we can do the optimization of the original linear program.

Matrix Notations

The linear program expressed in matrix is:

min c x
s.t. A x = b

The Simplex algorithm can also be expressed using matrices:

  • AB: The matrix for coefficients of basic variables.
  • xB: Column vector of basic variables.
  • AN: The matrix for coefficients for non-basic variables.
  • xN: Column vector of non-basic variables.
  • b: Column vector of right hand side values.

Putting them together, we have:

AB xB + AN xN = b
⟹ AB xB = b - AN xN
⟹ xB = A-1B b - A-1B AN xN
⟹ xB = b' - A'N xN

The solution is feasible if b' ≥ 0.

The objective function can be re-expressed as:

c x = cB xB + cN xN
= cB (A-1B b - A-1B AN xN) + cN xN
= cB A-1B b + (cN - cB A-1B AN) xN
= cB A-1B b + (cN - cB A-1B AN) xN + (cB - cB A-1B AB) xB
= cB A-1B b + (c - cB A-1B A) x

We can define cB A-1B as Π, which is called Simplex multiplier. Now objective function c x becomes:

c x = Π b + (c - Π A)x

When we have c - Π A ≥ 0, we have optimal solution.

Linear programming is often presented with a tableau, which is easier for pivoting.


Duality theory is basically looking at linear programming in two different ways:

min cT x
s.t. A x ≥ b
xj ≥ 0
max yT b
s.t. yT A ≤ c
yj ≥ 0

If the primal has an optimal solution, then the dual has an optimal solution with the same objective function value. The simplex multiplier Π = cB A-1B are a feasible solution to the dual. The dual of the dual is the primal.

For more on Discrete Optimization: Linear Programming, please refer to the wonderful course here

Related Quick Recap

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at

All of your support will be used for maintenance of this site and more great content. I am humbled and grateful for your generosity. Thank you!

Don't forget to sign up newsletter, don't miss any chance to learn.

Or share what you've learned with friends!