Discrete Optimization: Mixed Integer Program

Table of Contents

Mixed Integer Program is a big field. Essentially a mixed integer program is a linear program with m constraints but some of the n variables are integers.

min   c₁x₁ + ... + c_nx_n
s.t.  a₁₁x₁ + ... + a_1nx_n ≤ b₁
      ...
      a_m1x₁ + ... + a_mnx_n ≤ b_m

      x_i ≥ 0
      x_i is integer (i ∈ I)

The difference between having integer variables or not having integer variable is big. That’s the gap between being polynomial (P) and NP-complete. Having integers is what makes problems really complex. The knapsack problem is the most simple example of mixed integer program:

max  ∑_i∈I v_ix_i
s.t. ∑_i∈I w_ix_i ≤ K
     x_i ∈ {0,1} (i ∈ I)

Warehouse Location Problem

Another well-known MIP problem is the warehouse location problem. Say we have a bunch of customers and we also have a few locations to open warehouses for the customers. The goal is actually to choose which warehouses to open so that you can serve all the customers. Every one of the customer will be allocated to exactly one warehouse. What you want to minimize is the sum of:

the cost of maintaining these warehouses
the cost of transportation from the warehouses to the customers

In order to find a Mixed Integer Program model for the warehouse location problem, like finding a linear program, we need to decide what are the decision variables, the constraints, and the objective functions.

Decision variables	`x_w = 1` if warehouse w is open `y_wc = 1` if warehouse w serves customer c
Constraints	`y_wc ≤ x_w` a warehouse can serve a customer only if it is open `∑_w∈W y_wc = 1` a customer must be served by exactly one warehouse
Objective function	`∑_w∈W c_wx_w + ∑_w∈W,c∈C t_wc y_wc` `c_w` the fixed cost if warehouse w is open `t_wc` the transportation cost to serve customer c from warehouse w

Put everything together, this is the Mixed Integer Program model for the warehouse location problem:

min ∑_w∈W c_wx_w + ∑_w∈W,c∈C t_wc y_wc

s.t. y_wc ≤ x_w     (w∈W, c∈C)  # constraint per customer
     ∑_w∈W y_wc = 1  (c∈C)
     x_w ∈ {0, 1}  (w∈W)
     y_wc ∈ {0, 1}  (w∈W, c∈C)

MIP models like to have decision which are actually yes or no, so the integer variables are typically 0/1 variables. Once you have the 0/1 variables, expressing linear constraints is typically very simple.

Branch and Bound

How to solve Mixed Integer Program models? One of the basic techniques to solve them is to use branch and bound. In this technique, the two basic steps are:

Branching	Split the problem into sub-problems
Bounding	Find an optimistic relaxation

What is nice about Mixed Integer Program modes is that they have natural relaxation: Linear relaxation. The key idea is that you remove the integrality constraints on variables and you have a relaxation, which can be used as the bounding part of the Branch and Bound. So basically the branch and bound for MIP models is going to be solving the linear relaxation at a particular node.

If the linear relaxation is worse than the best solution found so far, prune this node. It basically means that the associated sub-problem to that node is proved sub-optimal. Get rid of the node.
If the linear relaxation is integral, we have found a feasible solution. Update the best feasible solution if appropriate.

If none of the two items above happens, then you need to do branching. How? Actually the simplest scheme is to use linear relaxation to drive branching:

Find an integer value x that has a fractional value f in the linear relaxation
Create two sub-problems x ≤ floor(f) and x ≥ ceiling(f).
Add the two sub-problems to the set of constraints.
Repeat the branch and bound on the sub-problems.

In the case of knapsack problem, the linear relaxation is basically replacing the x_i variables, allowing them to take any values between 0 and 1.

max  ∑_i∈I v_ix_i
s.t. ∑_i∈I w_ix_i ≤ K
     0 ≤ x_i ≤1 (i ∈ I)

The linear relaxation is the same as the greedy approach, i.e. order the item by the most value per weight and select them in that order, until you get a fractional value f and you take that fractional value to get a relaxation. Here we basically branch on the last item x_i that gives you this fractional value f. What is the implication of “taking the item x_i” and “not taking the item x_i” in the solution?

Taking the item `x_i`	Some of more valuable items before `x_i` will be removed.
Not taking the item `x`	You are still going to take more valuable items that were included before `x_i`.

When is branch and bound going to be effective? It will be really effective when the value of the linear relaxation is very, very close to the optimal value of the Mixed Integer Program problem. A good Mixed Integer Program model is the one with a good linear relaxation, such that you can prune the search space much better.

When there are lots of variable, which fractional variable should we branch on? One of the good heuristic is to branch on the most fractional value, i.e. close to 0.5.

Basic Modeling Techniques

There are a lot of the techniques for solving Mixed Integer Programs, that are based on refinements of branch and bound, i.e. pruning sub-optimal solutions. However, a necessary condition is their linear relaxation has to be precise and effective.

Stronger Linear Relaxation

The warehouse location problem revisited. Recall previously we have a constraint saying that “if a warehouse is closed, no customers can be assigned to it.”

y_wc ≤ x_w     (w∈W, c∈C)       # model 1

Of course this constraint is for each of customers. Can we model it in a different way? Definitely. We can replace all these similar constraints for customers y_wc with just one constraint for the warehouse. We are basically going to sum all the customers which can be assigned to a warehouse.

∑_c∈C y_wc ≤ |C|x_w     (w∈W)    # model 2

Now the question is which model has stronger linear relaxation? Model 1. Because any solution to the model 1, will also be a solution to model 2, but not vice-versa.

Big-M Transformation

The non-equal constraint x ≠ y is not a linear constraint. We can re-express it as a disjunction of two linear constraints: x ≤ y + 1 ∨ x ≥ y + 1. However Mixed Integer Problems do not support disjunction. Big-M Transformation is used extensively to remove disjunctions.

By introducing a 0/1 variable b and a big number M, the non-equal constraint can be re-written as a conjunction of two linear inequalities:

x ≤ y - 1 + b M         # trivial when b = 1
x ≥ y + 1 - (1-b) M     # trivial when b = 0

So in a sense this Big-M transformation is basically going to make sure that we can transform this disjunction into a conjunction. And essentially x being different from y, which can be represented as x < y (when b = 0), or x > y (when b = 1).

Now, what will linear relaxation do on these constraints? If b = 0.5, then M / 2 is still a big number. We will have these new constraints, which essentially will always be satisfied:

x ≤ a big number
x ≥ a negative big number.

In this particular case, what happens is that it’s like you removed these constraints from the linear relaxation. The objective is basically going to ignore these two constraints, assuming it can always minimize, which is the worst that can happen. The relaxation is probably going to be poor.

0/1 Variables

Mixed Integer Program loves binary values, another technique is to binarize all variables. For an integer variable, it has to take integer values and you know the range of that variable. What you do is:

Replace that integer value by plenty of binary variables
Use the binary variable to tell “is that variable taking that value?”

For instance, originally the integer variable x ∈ {0, 1, 2, 3} is replace by 4 binary variables:

b_x=0 b_x=1 b_x=2 b_x=3
b_x=i = 1 if variable x = i

As soon as I have the 0/1 variables, it’s really easy to express the constrains in Mixed Integer Program models.

Cutting Planes

The purpose of cutting planes is to add new constraints to the model that are going to improve the linear relaxation and to make the model easier to solve. There are many ways of generating these cuts.

Gomory’s Cuts

Gomory’s cuts is essentially to look at the Simplex tableau and derive the cut from the Simplex tableau. Recall that the simplex method algorithm is dealing with basic feasible solution, which is to express some of the basic variables x₁ ... x_m in terms of the non-basic variables x_m+1 ... x_n :

x₁ = b₁ - ∑ⁿ_j=m+1 a_1j x_j
...
x_m = b_m - ∑ⁿ_j=m+1 a_mj x_j

Obviously b₁ ... b_m has to be greater than or equal to zero. And an obvious basic feasible solution is:

x₁ = b₁
...
x_m = b_m
x_j = 0 (m < j ≤ n)

If all the b variables are integer, we are great that we have a solution. But in practice, in general, it’s not going to be the case, some of these b variables are going to be assigned fractional values, from where we can deduce a cut, i.e. a Gomory Cut.

Suppose b₁ is fractional:

x₁ + ∑ⁿ_j=m+1 a_1j x_j = b₁

the first step is to round coefficients a_1j down to integer values ⎣a_1j⎦, then we have a valid inequality and everything x₁, a_1j are integers, but b₁ is fractional:

x₁ + ∑ⁿ_j=m+1 ⎣a_1j⎦ x_j ≤ b₁

We can further round b₁ down to get:

x₁ + ∑ⁿ_j=m+1 ⎣a_1j⎦ x_j ≤ ⎣b₁⎦

This cut is valid, since this new constraint does not remove any feasible solution. But this cut prunes the current basic feasible solution, which violates the new constraint.

We can further re-scale the coefficients of this cut:

x₁ + ∑ⁿ_j=m+1 a_1j x_j = b₁
 -
x₁ + ∑ⁿ_j=m+1 ⎣a_1j⎦ x_j ≤ ⎣b₁⎦
 =
∑ⁿ_j=m+1 (a_1j - ⎣a_1j⎦) x_j ≥ b₁ - ⎣b₁⎦

What to do with this cut? We are going to put it back inside the tableau row by introducing a slack variable s:

∑ⁿ_j=m+1 (a_1j - ⎣a_1j⎦) x_j - s = b₁ - ⎣b₁⎦
⟹ s = -(b₁ - ⎣b₁⎦) + ∑ⁿ_j=m+1 (a_1j - ⎣a_1j⎦) x_j

Obviously this is primal infeasible but dual feasible. We can re-optimize the dual, and get another relaxation.

Put everything together, the Gomory’s Cuts can be summarized as:

Solve the linear relaxation
Choose a row whose constant b is fractional, and add the Gomory’s Cut
Apply the dual simplex to obtain feasibility
Iterate until
- The solution is integral, or
- There is no feasible solution

This algorithm was invented in 1958 by Gomory, even before the invention of the Complexity Theory in 1970s. So at that moment, we did not know the big gap between linear programs (polynomial time) and mixed integer programs (NP-completeness), this algorithm was mostly considered as a beautiful theoretical result, but not a practical one.

In 1990s, the Gomory’s Cuts were completed revived in the Branch-and-Bound algorithm. Nowadays, Gomory’s Cuts becomes practical again and are essentially integrated in all the commercial software for Mixed Integer Programs solvers.

Polyhedral Cuts

Convex hull, if imaged in a 2D or 3D graph, is the smallest convex set of all solutions (integer points) to an Mixed Integer Program. Particularly, in 3D graph, the convex hull is a polyhedral. The vertices of this polyhedral are the optimal solution to the Mixed Integer Program. We can use linear relaxation here to optimize Mixed Integer Program’s objective function, if we can optimize it to a vertex. which is going to be an integer solution, and that’s going to be the optimal solution to the Mixed Integer Program.

This method is called Polyhedral Cuts, which essentially cuts that represent the facets of the convex hull of the integer solutions. These cuts are valid since they don’t remove any solution. The cuts are as strong as possible, because if we have all of them, we could use linear programming t solve the problem. Another benefit is that they exploit the problem structure, completely orthogonal to the syntactic which are based on information in the tableau (Gomory’s Cuts).

Facets

To find a facet of the convex hull is essentially to find n affinely independent solutions (vertex points) satisfying the constraint at equality.

Affinely independence is not hard to understand: if you have a set of n points, they’re going to be affinely independent, if when you extend with one more coordinate, say putting a 1 for that coordinate for every one of them, then you can show that they are linearly independent:

x₁, ..., x_n are affinely independent if and only if (x₁, 1), ..., (x_n, 1) are linearly independent.

Linearly independence is more familiar to us:

x₁, ..., x_n are linearly independent if and only if α₁ x₁ + ... + α_n x_n = 0 implies that a_i = 0 for all i.

Warehouse Location

Take the warehouse location problem as example, we have a few constraints saying you can only serve a customer C with warehouse is W, if warehouse W is open:

y_wc ≤ x_w     (w∈W, c∈C)

Are these inequality constraints facets? You need to find n points that are affinely independent. Suppose we have only one warehouse w, and looking at customer 1, then you can generate these points here which satisfy these inequality constraints:

x_w  y_w1  y_w2  y_w3  ...  y_wc
0   0   0    0        0     # when x_w is zero, y_w1 must be zero
1   1   0    0        0     # when x_w is one, y_w1 can be zero
1   1   1    0        0     # other customers can also use the warehouse
1   1   0    1        0     # we can find different points
...                         # knowing they're going to be affinely independent
1   1   0    0        1

Of course you can generalize that if you have many warehouses by you know, finding the right values for these different warehouses. So, in a sense, this is the kind of ways you prove that something is a facet. You look for these points, trying to show that they satisfy the constraints of the inequality and they are independent of each other.

Node Packing

Another example is the Node Packing problem. Given a graph G with a variety of vertices V and then you have edges E between these vertices. And what you want to find the largest number of vertices, i.e. a subset W of V, such that any two of them is not connected by an edge.

For instance in the graph below, as soon as we take vertex 1, we can’t take anything else, because they are all connected to it. So vertex 1 itself is a packing. Another obvious example of packing is vertex 6 and 4, which are not directly connected by an edge.

          2
     /    |     \
6         |         3
|     \   |     /   |
|         1         |
|     /         \   |
5 - - - - - - - - - 4

How can we express the node packing problem as an Mixed Integer Program? Here is the linear relaxation of the Mixed Integer Program.

max x₁ + ... + x₆
s.t.  x₁ + x₂ ≤ 1
      x₁ + x₃ ≤ 1
      x₁ + x₄ ≤ 1
      x₁ + x₅ ≤ 1
      x₁ + x₆ ≤ 1
      x₂ + x₃ ≤ 1
      x₃ + x₄ ≤ 1
      x₄ + x₅ ≤ 1
      x₂ + x₆ ≤ 1
      x₅ + x₆ ≤ 1
      0 ≤ x₁ ... x₆ ≤ 1

The linear relaxation is basically going to assign 1/2 to everyone of the decision variables, the objective function is going to be 6 * 1/2 = 3. It’s basically telling you that the maximum you will ever be able to do is to pack 3 nodes on this simple example. Can we do better than that? Is there any kind of properties of the solution that would strengthen the relaxation?

There is a very useful concept that we find everywhere in optimization, which is cliques. All the vertices in a clique are connected to all the other vertices, how many vertices can you actually select in a pack? At most one, because if you select one, it’s connected to every vertices else. So as soon as you find a clique, you know exactly that you can at most, that you can select at most one of the vertices in that clique.

We’re going to look at the maximal clique. That is the largest clique that we can find, which is a clique that essentially cannot be extended further. As soon as you have a maximal clique, say for node 1 to c, you can state the constraint which says that most one of these nodes can be selected in the solution:

x₁ + ... x_c ≤ 1

At the same time, we can show that a maximal clique is a facet.

Now we can use the clique constraints to model the node packing problem again, and this is the new linear relaxation:

max x₁ + ... + x₆
s.t.  x₁ + x₂ + x₃ ≤ 1
      x₁      + x₃ + x₄ ≤ 1
      x₁           + x₄ + x₅ ≤ 1
      x₁                + x₅ + x₆ ≤ 1
      x₁ + x₂                + x₆ ≤ 1
      0 ≤ x₁ ... x₆ ≤ 1

This linear relaxation will ignore x₁, putting it equals to 0, and then give a completely fractional value for all of the other variables. The linear relaxation is giving you 5 * 1/2 = 2.5. Now we know already that the best packing we will ever be able to do is two. Notice we don’t have the convex hull of the node packing solution points, because the result given by this linear relaxation is still fractional. So what that means is that, these clique constraints are not completely characterizing the problem, you will need other kinds of cuts.

Cover Cuts

Cover Cuts is a method of generation of cuts, by looking at constraints which are linear inequalities:

∑ⁿ_j=1 a_j x_j ≤ b

Can we find facets for these constraints? The key to this question is the concept of cover. A cover is going to be a set of variables C ⊆ N = {1, ..., n} if you set them all to 1 you’re going to violate the constraints, i.e. ∑_j∈C a_j > b. A cover is minimal if C \ {j} is not a cover for any j ∈ C.

If C is a cover, obviously we have a valid inequality, at least one of the variables has to be zero, and sum of all variables has be smaller than the size of the cover minus 1.

∑_j∈C x_j ≤ |C| - 1

You can make cover stronger by extending it to take other variables, whose coefficients are greater than all of the coefficients of the cover.

E(C) = C ∪ {j | ∀i∈C: a_j ≥ a_i}

For example, consider this inequality, and two of its possible minimal covers:

11x₁ + 6x₂ + 6x₃ + 5x₄ + 5x₅ + 4x₆ + x₇ ≤ 19

x₁ + x₂ + x₃ ≤ 2                    # cover 1
x₃ + x₄ + x₅ + x₆ ≤ 3               # cover 2
x₁ + x₂ + x₃ + x₄ + x₅ + x₆ ≤ 3      # extended cover for 2

Branch and Cut is a powerful algorithm, the basic idea is:

Formulate the application as an Integer Mixed Program
Solve the linear relaxation if the linear relaxation is integer, you have an optimal solution. Finish.
Find a Polyhedral cut which prunes the linear relaxation and is a facet if possible
- If found, go to step 2
- If not found, branch and repeat the whole process

The third step is important, when we’ve gotten an solution to the linear relaxation, we are wondering “Can we generate cover cut on demand, and find a cover cut that’s going to remove this solution of the linear relaxation?”

The cover inequality ∑_j∈C x_j ≤ |C| - 1 can be rewritten into ∑_j∈C (1 - x_j) ≥ 1, because at least one of the variables must be 0. So essentially finding a cover will require two conditions:

It is a cover: ∑_j∈C a_j > b
The cover is violated by the solution of linear relaxation x^*_j: ∑_j∈C (1 - x^*_j) < 1

This is equivalent to a beautiful mathematical program called Separation of Cover Cuts:

min ∑_j∈N (1 - x^*_j)z_j
s.t.
     ∑_j∈N a_jz_j > b
     z_j ∈ {0,1}

We look for the variables z_j. If all the variables are assigned to 1, we have a cover. and if the objective function is lower than 1, then we have a cover cut!

Consider another example, the constraint and its linear relaxation gives you a fractional solution:

45x₁ + 46x₂ + 79x₃ + 54x₄ + 53x₅ + 125x₆ ≤ 178

x^* = (0, 0, 3/4, 1/2, 1, 0)

Its separation problem is quite simple to derive:

min    z₁ +   z₂ + 1/4 z₃ + 1/2 z₄         +    z₆
s.t. 45z₁ + 46z₂ +  79 z₃ + 54  z₄  + 53z₅ + 125z₆ > 178

This separation program is the Mixed Integer Program that we have to solve for finding a cover cut of original problem.

If we replade z_j by (1 - y_j), the objective function will turn to max, the constraint will be come less than or equal to something, it is quite interesting to see that the Separation program will actually be a Knapsack problem.

max ∑_j∈N (1 - x^*_j)(1 - y_j)
s.t.
     ∑_j∈N a_j(1 - y_j) ≤ b
     y_j ∈ {0,1}

So essentially, we are generating these Branch and Cut using a knapsack algorithm here.

Traveling Salesman Problem

Recall the Traveling Salesman Problem is to visit all vertices V once in a graph G = (V, E), meanwhile minimizing the objective function, say traveling cost or distance. We could try to model the Traveling Salesman Problem as an Mixed Integer Program. Like the past, we have to choose decision variables, constraints, and an objective function.

Decision variables	Decide whether an edge is part of the tour `x_e = 1` if edge `e` is in the solution
Constraints	The degree constraint: each vertex has exactly the degree of two, i.e each vertex has exactly two edges. `δ(v)`: edges adjacent to vertex `v` `δ(S)`: edges with exactly one vertex in `S ⊆ V` `γ(S)`: edges with both vertices in `S ⊆ V`
Objective function	Assume the cost related to edge `e` is `c_e`, the total cost of the tour is `∑_e∈E c_ex_e`

Now we can try to have an Mixed Integer Program for the Traveling Salesman Problem:

min ∑_e∈E c_ex_e
s.t.  x_δ(v) = 2          v ∈ V     # degree constraints
      x_e ∈ {0,1}        e ∈ E

where x_{{e_1, ..., e_n}} = x_{e_1} + ... + x_{e_n}

The degree constraints are basically telling you that you every time you go to a vertex you have to get out. However, instead of generating one tour, the model above will generate lots of sub-tours (circles in graph), and there is no connections among the sub-tours.

To prevent the generation of sub-tour, the number of edges in a sub-tour with n vertices has to be n-1. So we need to add one more constraint, forcing you to leave the subset of vertices before forming a sub-tour:

x_γ(S) ≤ |S| - 1    S ⊂ V     # subtour constraints

However, there are exponentially many of these sub-tours, you don’t want to list all of them. You want to generate them on demand. To do that, you need a separation problem. We can re-express the sub-tour constraint, saying: we have at least one edge coming to the set S and going out from S.

x_δ(S) ≥ 2     S ⊂ V     # subtour constraints

Once you re-express it that way, the separation problems become a really nice combinatorial problems that can be solved in polynomial time. The key idea here is that you will build a new graph

With the same set of vertices, the same set of edges, i.e. G^* = (V, E)
But the weight of the edges is going to be the value in the linear relaxation, c_e = x^*_e.

To find the sub-tour constraints that were violated, you try to:

Find a min cut in G^*
If the the weight of the cut is smaller than 2, i.e. x_δ(S) < 2 then we have isolated a subtour constraint violated by the linear relaxation. Finding such a cut only takes polynomial time.
Continue to find more cuts until you can’t actually find any sub-tour constraints which is violated.

Does that mean that finally you have a solution to the Traveling Salesman Problem? No. You probably still have a subtour (cycle) and have fractional value there. We need more other kinds of cuts, for example Come Cuts.

For more on Discrete Optimization: Mixed Integer Program, please refer to the wonderful course here https://www.coursera.org/learn/discrete-optimization

Related Quick Recap

Discrete Optimization: Linear Programming

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai