Mixed Integer Programming is a big field. Essentially, a mixed integer program is a linear program with m constraints over n variables, some of which are required to be integers.
min c_{1}x_{1} + ... + c_{n}x_{n}
s.t. a_{11}x_{1} + ... + a_{1n}x_{n} ≤ b_{1}
...
a_{m1}x_{1} + ... + a_{mn}x_{n} ≤ b_{m}
x_{i} ≥ 0
x_{i} is integer (i ∈ I)
The difference between having integer variables and not having them is big: it is the gap between being polynomial (P) and NP-complete. Having integers is what makes problems really complex. The knapsack problem is the simplest example of a mixed integer program:
max ∑_{i∈I} v_{i}x_{i}
s.t. ∑_{i∈I} w_{i}x_{i} ≤ K
x_{i} ∈ {0,1} (i ∈ I)
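For tiny instances, the knapsack model above can be checked by brute force over all 0/1 assignments. This is a minimal Python sketch; the instance numbers are made up:

```python
from itertools import product

def knapsack_brute_force(values, weights, capacity):
    """Enumerate all 0/1 assignments and keep the best feasible one."""
    best_value, best_x = 0, [0] * len(values)
    for x in product([0, 1], repeat=len(values)):
        if sum(w * xi for w, xi in zip(weights, x)) <= capacity:
            v = sum(vi * xi for vi, xi in zip(values, x))
            if v > best_value:
                best_value, best_x = v, list(x)
    return best_value, best_x

# Hypothetical instance: 4 items, capacity 11.
print(knapsack_brute_force([8, 10, 15, 4], [4, 5, 8, 3], 11))  # (19, [0, 0, 1, 1])
```

Enumeration is exponential in the number of items, which is exactly why the relaxation and branch-and-bound machinery below matters.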
Warehouse Location Problem
Another well-known MIP problem is the warehouse location problem. Say we have a bunch of customers and a few candidate locations where warehouses can be opened. The goal is to choose which warehouses to open so that all the customers can be served; every customer will be allocated to exactly one warehouse. What you want to minimize is the sum of:
- the cost of maintaining these warehouses
- the cost of transportation from the warehouses to the customers
In order to build a Mixed Integer Program model for the warehouse location problem, just like for a linear program, we need to decide on the decision variables, the constraints, and the objective function.
Decision variables:
- x_{w} = 1 if warehouse w is open
- y_{wc} = 1 if warehouse w serves customer c

Constraints:
- y_{wc} ≤ x_{w} : a warehouse can serve a customer only if it is open
- ∑_{w∈W} y_{wc} = 1 : a customer must be served by exactly one warehouse

Objective function:
- ∑_{w∈W} c_{w}x_{w} + ∑_{w∈W,c∈C} t_{wc} y_{wc}, where c_{w} is the fixed cost if warehouse w is open and t_{wc} is the transportation cost to serve customer c from warehouse w
Putting everything together, this is the Mixed Integer Program model for the warehouse location problem:
min ∑_{w∈W} c_{w}x_{w} + ∑_{w∈W,c∈C} t_{wc} y_{wc}
s.t. y_{wc} ≤ x_{w} (w∈W, c∈C) # constraint per customer
∑_{w∈W} y_{wc} = 1 (c∈C)
x_{w} ∈ {0, 1} (w∈W)
y_{wc} ∈ {0, 1} (w∈W, c∈C)
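To make the model concrete, here is a brute-force Python sketch with hypothetical data. It enumerates the x variables (which warehouses to open); given the open warehouses, each customer is simply assigned to its cheapest open warehouse, which is what the optimal y variables would do:

```python
from itertools import product

def warehouse_brute_force(fixed_cost, transport):
    """transport[w][c] is the cost of serving customer c from warehouse w."""
    W, C = len(fixed_cost), len(transport[0])
    best = None
    for x in product([0, 1], repeat=W):
        if not any(x):
            continue  # at least one warehouse must be open
        open_ws = [w for w in range(W) if x[w]]
        cost = sum(fixed_cost[w] for w in open_ws)
        # Each customer is served by its cheapest open warehouse.
        cost += sum(min(transport[w][c] for w in open_ws) for c in range(C))
        if best is None or cost < best[0]:
            best = (cost, list(x))
    return best

# Hypothetical instance: 2 warehouses, 3 customers.
print(warehouse_brute_force([5, 4], [[1, 2, 6], [4, 3, 1]]))  # (12, [0, 1])
```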
MIP models like to have decisions which are yes or no, so the integer variables are typically 0/1 variables. Once you have the 0/1 variables, expressing the linear constraints is typically very simple.
Branch and Bound
How do we solve Mixed Integer Program models? One of the basic techniques is branch and bound, which has two basic steps:
- Branching : split the problem into subproblems
- Bounding : find an optimistic relaxation
What is nice about Mixed Integer Program models is that they have a natural relaxation: the linear relaxation. The key idea is that you remove the integrality constraints on the variables and you obtain a relaxation, which can be used as the bounding part of Branch and Bound. So basically, branch and bound for MIP models solves the linear relaxation at each node.
- If the linear relaxation is worse than the best solution found so far, prune this node: the subproblem associated with the node is proved suboptimal, so get rid of it.
- If the linear relaxation is integral, we have found a feasible solution. Update the best feasible solution if appropriate.
If neither of the two cases above happens, then you need to branch. How? The simplest scheme is to use the linear relaxation to drive the branching:
- Find an integer variable x that takes a fractional value f in the linear relaxation.
- Create two subproblems, one with the added constraint x ≤ floor(f) and the other with x ≥ ceiling(f).
- Repeat branch and bound on the two subproblems.
In the case of the knapsack problem, the linear relaxation is obtained by relaxing the x_{i} variables, allowing them to take any value between 0 and 1.
max ∑_{i∈I} v_{i}x_{i}
s.t. ∑_{i∈I} w_{i}x_{i} ≤ K
0 ≤ x_{i} ≤1 (i ∈ I)
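This relaxation can be computed without a simplex solver: take items in decreasing value-per-weight order and, when an item no longer fits, take the fraction of it that does. A small sketch with made-up numbers:

```python
def knapsack_relaxation(values, weights, capacity):
    """Upper bound: take items in value-per-weight order; the last item may be fractional."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    bound, room = 0.0, capacity
    for i in order:
        take = min(1.0, room / weights[i])  # fraction of item i that fits
        bound += take * values[i]
        room -= take * weights[i]
        if room <= 0:
            break
    return bound

# Made-up instance; the integer optimum is 19, the relaxation gives 21.75.
print(knapsack_relaxation([8, 10, 15, 4], [4, 5, 8, 3], 11))  # 21.75
```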
Solving the linear relaxation is the same as the greedy approach: order the items by value per unit of weight and select them in that order, until the last item x_{i} no longer fits completely and is taken at a fractional value f; that fractional selection gives the relaxation. Here we branch on that last item x_{i}. What is the implication of “taking the item x_{i}” and “not taking the item x_{i}” in the solution?
- Taking the item x_{i} : some of the more valuable items before x_{i} will be removed.
- Not taking the item x_{i} : you are still going to take the more valuable items that were included before x_{i}.
When is branch and bound going to be effective? It will be really effective when the value of the linear relaxation is very close to the optimal value of the Mixed Integer Program. A good Mixed Integer Program model is one with a good linear relaxation, such that you can prune the search space much better.
When there are lots of variables, which fractional variable should we branch on? A good heuristic is to branch on the most fractional variable, i.e. the one whose value is closest to 0.5.
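Putting the pieces together, branch and bound for the knapsack can be sketched in a few lines of Python, using the greedy fractional relaxation as the optimistic bound (the instance is made up):

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Depth-first branch and bound; the bound is the fractional (greedy) relaxation."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    best = [0]  # incumbent value

    def relax(k, room):
        # Linear relaxation over the remaining items: greedy fill, last item fractional.
        total = 0.0
        for i in range(k, len(v)):
            take = min(1.0, room / w[i])
            total += take * v[i]
            room -= take * w[i]
            if room <= 0:
                break
        return total

    def search(k, value, room):
        best[0] = max(best[0], value)   # every node is a feasible partial solution
        if k == len(v) or value + relax(k, room) <= best[0]:
            return                      # leaf, or pruned by the relaxation bound
        if w[k] <= room:
            search(k + 1, value + v[k], room - w[k])  # branch: take item k
        search(k + 1, value, room)                    # branch: skip item k

    search(0, 0, capacity)
    return best[0]

# Same made-up instance as above.
print(knapsack_branch_and_bound([8, 10, 15, 4], [4, 5, 8, 3], 11))  # 19
```

Each node records its current value as an incumbent, prunes when the relaxation cannot beat the incumbent, and otherwise branches on taking or skipping the next item.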
Basic Modeling Techniques
There are a lot of techniques for solving Mixed Integer Programs that are based on refinements of branch and bound, i.e. pruning suboptimal solutions. However, a necessary condition is that the linear relaxation has to be precise and effective.
Stronger Linear Relaxation
Let us revisit the warehouse location problem. Recall that previously we had a constraint saying that “if a warehouse is closed, no customers can be assigned to it”:
y_{wc} ≤ x_{w} (w∈W, c∈C) # model 1
Of course, there is one such constraint for every customer. Can we model it in a different way? Definitely. We can replace all these per-customer constraints on y_{wc} with just one constraint per warehouse, by summing over all the customers which can be assigned to that warehouse (C below denotes the number of customers):
∑_{c∈C} y_{wc} ≤ Cx_{w} (w∈W) # model 2
Now the question is: which model has the stronger linear relaxation? Model 1, because any solution to the linear relaxation of model 1 is also a solution to that of model 2, but not vice versa.
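This dominance is easy to check numerically. The fractional point below (hypothetical: one warehouse half-open, two customers, customer 0 fully assigned to it) satisfies the linear relaxation of model 2 but violates that of model 1:

```python
def feasible_model1(x_w, y):
    # y_wc <= x_w for every customer c
    return all(yc <= x_w for yc in y)

def feasible_model2(x_w, y):
    # sum_c y_wc <= |C| * x_w
    return sum(y) <= len(y) * x_w

# Fractional point: warehouse half-open, customer 0 fully assigned to it.
x_w, y = 0.5, [1.0, 0.0]
print(feasible_model2(x_w, y), feasible_model1(x_w, y))  # True False
```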
Big-M Transformation
The not-equal constraint x ≠ y is not a linear constraint. For integer variables, we can re-express it as a disjunction of two linear constraints: x ≤ y - 1 ∨ x ≥ y + 1. However, Mixed Integer Programs do not support disjunctions. The Big-M transformation is used extensively to remove them.
By introducing a 0/1 variable b and a big number M, the not-equal constraint can be rewritten as a conjunction of two linear inequalities:
x ≤ y - 1 + bM # trivial when b = 1
x ≥ y + 1 - (1 - b)M # trivial when b = 0
So in a sense this Big-M transformation makes sure that we can turn the disjunction into a conjunction: x being different from y is represented as x < y (when b = 0), or x > y (when b = 1).
Now, what will the linear relaxation do with these constraints? If b = 0.5, then M / 2 is still a big number, and the two constraints essentially become:
x ≤ a big number
x ≥ a negative big number
These are always satisfied. In this particular case, it is as if the constraints were removed from the linear relaxation: the objective simply ignores them. The relaxation is probably going to be poor.
0/1 Variables
Mixed Integer Programming loves binary variables, so another technique is to binarize all variables. An integer variable has to take integer values, and you know the range of that variable. What you do is:
- replace that integer variable by plenty of binary variables;
- use each binary variable to tell “is that variable taking that value?”
For instance, the integer variable x ∈ {0, 1, 2, 3} is replaced by 4 binary variables b_{x=0}, b_{x=1}, b_{x=2}, b_{x=3}, where b_{x=i} = 1 if variable x = i, together with the constraint b_{x=0} + b_{x=1} + b_{x=2} + b_{x=3} = 1 stating that x takes exactly one value. As soon as you have the 0/1 variables, it is really easy to express the constraints in Mixed Integer Program models.
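A minimal Python sketch of this binarization (the function names are just illustrative); the recovery step uses the linear identity x = ∑ i · b_{x=i}:

```python
def binarize(x, domain):
    """One indicator b_{x=i} per value; exactly one of them is 1."""
    return [1 if x == i else 0 for i in domain]

def debinarize(bits, domain):
    # Recover x via the linear identity x = sum_i i * b_{x=i}.
    return sum(i * b for i, b in zip(domain, bits))

domain = [0, 1, 2, 3]
bits = binarize(2, domain)
print(bits, debinarize(bits, domain))  # [0, 0, 1, 0] 2
```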
Cutting Planes
The purpose of cutting planes is to add new constraints to the model that are going to improve the linear relaxation and to make the model easier to solve. There are many ways of generating these cuts.
Gomory’s Cuts
Gomory’s cuts are derived by looking at the simplex tableau. Recall that the simplex algorithm works with basic feasible solutions, expressing the basic variables x_{1} ... x_{m} in terms of the non-basic variables x_{m+1} ... x_{n}:
x_{1} = b_{1}  ∑^{n}_{j=m+1} a_{1j} x_{j}
...
x_{m} = b_{m}  ∑^{n}_{j=m+1} a_{mj} x_{j}
Obviously b_{1} ... b_{m} have to be greater than or equal to zero, and an obvious basic feasible solution is:
x_{1} = b_{1}
...
x_{m} = b_{m}
x_{j} = 0 (m < j ≤ n)
If all the b values are integers, we are in luck: we have an integer solution. But in practice, in general, it is not going to be the case; some of these b values are going to be assigned fractional values, from which we can deduce a cut, i.e. a Gomory cut.
Suppose b_{1} is fractional:
x_{1} + ∑^{n}_{j=m+1} a_{1j} x_{j} = b_{1}
The first step is to round the coefficients a_{1j} down to the integer values ⎣a_{1j}⎦. Because the x_{j} are non-negative, this gives a valid inequality whose left-hand side takes only integer values (x_{1} and the coefficients ⎣a_{1j}⎦ are integers), while b_{1} is fractional:
x_{1} + ∑^{n}_{j=m+1} ⎣a_{1j}⎦ x_{j} ≤ b_{1}
Since the left-hand side is integral, we can further round b_{1} down to get:
x_{1} + ∑^{n}_{j=m+1} ⎣a_{1j}⎦ x_{j} ≤ ⎣b_{1}⎦
This cut is valid, since this new constraint does not remove any feasible solution. But this cut prunes the current basic feasible solution, which violates the new constraint.
We can rewrite this cut in terms of fractional parts, by subtracting it from the original tableau row:
x_{1} + ∑^{n}_{j=m+1} a_{1j} x_{j} = b_{1}
minus
x_{1} + ∑^{n}_{j=m+1} ⎣a_{1j}⎦ x_{j} ≤ ⎣b_{1}⎦
gives
∑^{n}_{j=m+1} (a_{1j} - ⎣a_{1j}⎦) x_{j} ≥ b_{1} - ⎣b_{1}⎦
What do we do with this cut? We put it back into the tableau as a new row by introducing a slack variable s:
∑^{n}_{j=m+1} (a_{1j} - ⎣a_{1j}⎦) x_{j} - s = b_{1} - ⎣b_{1}⎦
⟹ s = -(b_{1} - ⎣b_{1}⎦) + ∑^{n}_{j=m+1} (a_{1j} - ⎣a_{1j}⎦) x_{j}
At the current basic solution all the non-basic x_{j} are 0, so s is negative: this row is primal infeasible but dual feasible. We can reoptimize with the dual simplex and get another relaxation.
Putting everything together, Gomory’s cuts can be summarized as:
- Solve the linear relaxation.
- Choose a row whose constant b is fractional, and add the Gomory cut.
- Apply the dual simplex to restore feasibility.
- Iterate until:
  - the solution is integral, or
  - there is no feasible solution.
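The cut derivation itself is mechanical: take the fractional part of every coefficient in the chosen tableau row. A small sketch, with a hypothetical tableau row:

```python
import math

def gomory_cut(coeffs, rhs):
    """Given a tableau row x_B + sum a_j x_j = b with fractional b, return the
    coefficients of the fractional cut sum (a_j - floor(a_j)) x_j >= b - floor(b)."""
    frac = lambda t: t - math.floor(t)
    return [frac(aj) for aj in coeffs], frac(rhs)

# Hypothetical tableau row: x1 + 2.5 x3 - 1.25 x4 = 3.75
cut_coeffs, cut_rhs = gomory_cut([2.5, -1.25], 3.75)
print(cut_coeffs, cut_rhs)  # [0.5, 0.75] 0.75
```

Note that the fractional part of a negative coefficient like -1.25 is 0.75, since floor(-1.25) = -2.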
This algorithm was invented in 1958 by Gomory, even before the development of complexity theory in the 1970s. At that time, the big gap between linear programs (polynomial time) and mixed integer programs (NP-completeness) was not yet known, and the algorithm was mostly considered a beautiful theoretical result, but not a practical one.
In the 1990s, Gomory’s cuts were completely revived inside the Branch-and-Bound algorithm. Nowadays, Gomory’s cuts are practical again and are integrated in essentially all commercial Mixed Integer Program solvers.
Polyhedral Cuts
The convex hull, if visualized in a 2D or 3D graph, is the smallest convex set containing all the solutions (integer points) of a Mixed Integer Program. In 3D, the convex hull is a polyhedron, and an optimal solution of the Mixed Integer Program lies at one of its vertices. If we could apply linear programming over the convex hull, the optimum would be reached at a vertex, which is an integer solution, and hence the optimal solution to the Mixed Integer Program.
This method is called polyhedral cuts: cuts that represent the facets of the convex hull of the integer solutions. These cuts are valid, since they do not remove any integer solution. They are as strong as possible, because if we had all of them, we could use linear programming to solve the problem. Another benefit is that they exploit the problem structure, completely orthogonal to syntactic cuts such as Gomory’s cuts, which are based on information in the tableau.
Facets
To show that a constraint defines a facet of the convex hull, you essentially find n affinely independent solutions (vertex points) satisfying the constraint at equality.
Affine independence is not hard to understand: a set of n points is affinely independent if, when you extend each point with one more coordinate, say putting a 1 in that coordinate for every one of them, the extended vectors are linearly independent:
x_{1}, ..., x_{n} are affinely independent if and only if (x_{1}, 1), ..., (x_{n}, 1) are linearly independent.
Linear independence is more familiar to us:
x_{1}, ..., x_{n} are linearly independent if and only if α_{1} x_{1} + ... + α_{n} x_{n} = 0 implies that α_{i} = 0 for all i.
Warehouse Location
Take the warehouse location problem as an example. We have constraints saying you can serve a customer c from warehouse w only if warehouse w is open:
y_{wc} ≤ x_{w} (w∈W, c∈C)
Are these inequality constraints facets? You need to find n affinely independent points. Suppose we have only one warehouse w, and look at customer 1; then you can generate these points, which all satisfy the inequality y_{w1} ≤ x_{w} at equality:
x_{w} y_{w1} y_{w2} y_{w3} ... y_{wc}
0 0 0 0 0 # when x_{w} is zero, y_{w1} must be zero
1 1 0 0 0 # when x_{w} is one, y_{w1} can be zero
1 1 1 0 0 # other customers can also use the warehouse
1 1 0 1 0 # we can find different points
... # knowing they're going to be affinely independent
1 1 0 0 1
Of course you can generalize this to many warehouses by finding the right values for the variables of the other warehouses. So, in a sense, this is how you prove that something is a facet: you look for points that satisfy the inequality at equality and show that they are affinely independent of each other.
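The affine-independence test can be mechanized exactly as defined above: append the extra 1-coordinate to every point and check that the vectors have full rank. A sketch using exact rational arithmetic (the rank routine is a plain Gaussian elimination):

```python
from fractions import Fraction

def rank(matrix):
    """Row rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def affinely_independent(points):
    # n points are affinely independent iff the vectors (p, 1) are linearly independent.
    return rank([list(p) + [1] for p in points]) == len(points)

# The five points from the table above: (x_w, y_w1, y_w2, y_w3, y_w4).
pts = [(0, 0, 0, 0, 0), (1, 1, 0, 0, 0), (1, 1, 1, 0, 0),
       (1, 1, 0, 1, 0), (1, 1, 0, 0, 1)]
print(affinely_independent(pts))  # True
```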
Node Packing
Another example is the Node Packing problem. Given a graph G with a set of vertices V and edges E between these vertices, you want to find the largest set of vertices, i.e. a subset W of V, such that no two of them are connected by an edge.
For instance, in the graph below, as soon as we take vertex 1, we cannot take anything else, because all the other vertices are connected to it. So vertex 1 by itself is a packing. Another obvious example of a packing is vertices 6 and 4, which are not connected by an edge.
2
/  \
6  3
 \  / 
 1 
 / \ 
5          4
How can we express the node packing problem as a Mixed Integer Program? Here is the linear relaxation of the Mixed Integer Program:
max x_{1} + ... + x_{6}
s.t. x_{1} + x_{2} ≤ 1
x_{1} + x_{3} ≤ 1
x_{1} + x_{4} ≤ 1
x_{1} + x_{5} ≤ 1
x_{1} + x_{6} ≤ 1
x_{2} + x_{3} ≤ 1
x_{3} + x_{4} ≤ 1
x_{4} + x_{5} ≤ 1
x_{2} + x_{6} ≤ 1
x_{5} + x_{6} ≤ 1
0 ≤ x_{1} ... x_{6} ≤ 1
The linear relaxation will assign 1/2 to every one of the decision variables, so the objective function is 6 * 1/2 = 3. It is basically telling you that the maximum you will ever be able to do is to pack 3 nodes in this simple example. Can we do better than that? Are there properties of the solution that would strengthen the relaxation?
There is a very useful concept that we find everywhere in optimization: cliques. In a clique, every vertex is connected to all the other vertices. How many vertices of a clique can you select in a packing? At most one, because each selected vertex is connected to every other vertex of the clique. So as soon as you find a clique, you know that you can select at most one of its vertices.
We are going to look at maximal cliques, i.e. cliques that cannot be extended any further. As soon as you have a maximal clique, say over nodes 1 to c, you can state the constraint that at most one of these nodes can be selected in the solution:
x_{1} + ... + x_{c} ≤ 1
At the same time, we can show that the constraint of a maximal clique defines a facet.
Now we can use the clique constraints to model the node packing problem again, and this is the new linear relaxation:
max x_{1} + ... + x_{6}
s.t. x_{1} + x_{2} + x_{3} ≤ 1
x_{1} + x_{3} + x_{4} ≤ 1
x_{1} + x_{4} + x_{5} ≤ 1
x_{1} + x_{5} + x_{6} ≤ 1
x_{1} + x_{2} + x_{6} ≤ 1
0 ≤ x_{1} ... x_{6} ≤ 1
This linear relaxation will ignore x_{1}, setting it to 0, and then give the completely fractional value 1/2 to all of the other variables, so the relaxation gives 5 * 1/2 = 2.5. Since the objective is integral, we now know that the best packing we will ever find has at most two nodes. Notice we still do not have the convex hull of the node packing solutions, because the result of this linear relaxation is fractional. This means the clique constraints do not completely characterize the problem; you will need other kinds of cuts.
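We can verify this fractional point against the clique model directly in Python (the five cliques are read off the graph above):

```python
# Maximal cliques of the example graph: vertex 1 plus each edge of the outer cycle.
cliques = [(1, 2, 3), (1, 3, 4), (1, 4, 5), (1, 5, 6), (1, 2, 6)]
x = {1: 0.0, 2: 0.5, 3: 0.5, 4: 0.5, 5: 0.5, 6: 0.5}

feasible = all(sum(x[v] for v in c) <= 1 for c in cliques)
objective = sum(x.values())
print(feasible, objective)  # True 2.5
```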
Cover Cuts
Cover cuts are generated by looking at constraints which are linear inequalities:
∑^{n}_{j=1} a_{j} x_{j} ≤ b
Can we find facets for these constraints? The key is the concept of a cover. A cover is a set of variables C ⊆ N = {1, ..., n} such that setting them all to 1 violates the constraint, i.e. ∑_{j∈C} a_{j} > b. A cover is minimal if C \ {j} is not a cover for any j ∈ C.
If C is a cover, we obviously have a valid inequality: at least one of the variables has to be zero, so the sum of the variables has to be at most the size of the cover minus 1.
∑_{j∈C} x_{j} ≤ |C| - 1
You can make a cover stronger by extending it with the other variables whose coefficients are greater than or equal to all of the coefficients in the cover:
E(C) = C ∪ {j ∈ N | ∀i∈C: a_{j} ≥ a_{i}}
For example, consider this inequality, and two of its possible minimal covers:
11x_{1} + 6x_{2} + 6x_{3} + 5x_{4} + 5x_{5} + 4x_{6} + x_{7} ≤ 19
x_{1} + x_{2} + x_{3} ≤ 2 # cover 1
x_{3} + x_{4} + x_{5} + x_{6} ≤ 3 # cover 2
x_{1} + x_{2} + x_{3} + x_{4} + x_{5} + x_{6} ≤ 3 # extended cover for 2
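A quick Python check of these two covers against the definition (indices 0..6 stand for x1..x7):

```python
a = [11, 6, 6, 5, 5, 4, 1]  # coefficients of x1..x7
b = 19

def is_cover(C):
    return sum(a[j] for j in C) > b

def is_minimal_cover(C):
    return is_cover(C) and all(not is_cover(C - {j}) for j in C)

print(is_minimal_cover({0, 1, 2}))     # cover 1: {x1, x2, x3} -> True
print(is_minimal_cover({2, 3, 4, 5}))  # cover 2: {x3, x4, x5, x6} -> True
```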
Branch and Cut is a powerful algorithm; the basic idea is:
1. Formulate the application as a Mixed Integer Program.
2. Solve the linear relaxation; if the linear relaxation is integral, you have an optimal solution: finish.
3. Find a polyhedral cut which prunes the linear relaxation, a facet if possible.
4. If found, go to step 2.
5. If not found, branch and repeat the whole process.
The third step is important: once we have a solution to the linear relaxation, we wonder “Can we generate a cover cut on demand, one that removes this solution of the linear relaxation?”
The cover inequality ∑_{j∈C} x_{j} ≤ |C| - 1 can be rewritten as ∑_{j∈C} (1 - x_{j}) ≥ 1, because at least one of the variables must be 0. So finding a violated cover cut requires two conditions:
- It is a cover: ∑_{j∈C} a_{j} > b
- The cover inequality is violated by the solution x^{*}_{j} of the linear relaxation: ∑_{j∈C} (1 - x^{*}_{j}) < 1
This is equivalent to a beautiful mathematical program called Separation of Cover Cuts:
min ∑_{j∈N} (1  x^{*}_{j})z_{j}
s.t.
∑_{j∈N} a_{j}z_{j} > b
z_{j} ∈ {0,1}
We look for the variables z_{j}: the variables assigned to 1 form a cover (the constraint guarantees it), and if the objective value is lower than 1, then we have a violated cover cut!
Consider another example: the constraint below, whose linear relaxation gives a fractional solution:
45x_{1} + 46x_{2} + 79x_{3} + 54x_{4} + 53x_{5} + 125x_{6} ≤ 178
x^{*} = (0, 0, 3/4, 1/2, 1, 0)
Its separation problem is quite simple to derive (z_{5} is omitted from the objective because its coefficient 1 - x^{*}_{5} is 0):
min z_{1} + z_{2} + 1/4 z_{3} + 1/2 z_{4} + z_{6}
s.t. 45z_{1} + 46z_{2} + 79z_{3} + 54z_{4} + 53z_{5} + 125z_{6} > 178
This separation program is the Mixed Integer Program that we have to solve to find a cover cut for the original problem.
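For an instance this small, the separation problem can be solved by brute force over subsets. The sketch below recovers the best cover for the example above; its objective value is below 1, so the corresponding cover cut is violated:

```python
from itertools import combinations

a = [45, 46, 79, 54, 53, 125]     # constraint coefficients
b = 178
x_star = [0, 0, 0.75, 0.5, 1, 0]  # fractional LP solution

best = None
for r in range(1, len(a) + 1):
    for C in combinations(range(len(a)), r):
        if sum(a[j] for j in C) > b:                    # C is a cover
            violation = sum(1 - x_star[j] for j in C)   # separation objective
            if best is None or violation < best[0]:
                best = (violation, C)

print(best)  # (0.75, (2, 3, 4)): the cover {x3, x4, x5} gives a violated cut
```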
If we replace z_{j} by (1 - y_{j}), the objective function turns into a maximization and the constraint becomes a less-than-or-equal inequality; it is quite interesting to see that the separation program is actually a knapsack problem (the last step uses the fact that the a_{j} are integers):
max ∑_{j∈N} (1 - x^{*}_{j}) y_{j}
s.t.
∑_{j∈N} a_{j} y_{j} ≤ ∑_{j∈N} a_{j} - b - 1
y_{j} ∈ {0,1}
So essentially, we can generate these cuts inside Branch and Cut using a knapsack algorithm.
Traveling Salesman Problem
Recall the Traveling Salesman Problem is to visit all vertices V
once in a graph G = (V, E)
, meanwhile minimizing the objective function, say traveling cost or distance. We could try to model the Traveling Salesman Problem as an Mixed Integer Program. Like the past, we have to choose decision variables, constraints, and an objective function.
Decision variables:
- decide whether an edge is part of the tour: x_{e} = 1 if edge e is in the solution

Constraints:
- the degree constraint: each vertex has degree exactly two, i.e. each vertex is incident to exactly two edges of the tour
- notation: δ(v) : edges adjacent to vertex v; δ(S) : edges with exactly one vertex in S ⊆ V; γ(S) : edges with both vertices in S ⊆ V

Objective function:
- assuming the cost of edge e is c_{e}, the total cost of the tour is ∑_{e∈E} c_{e}x_{e}
Now we can try to write a Mixed Integer Program for the Traveling Salesman Problem:
min ∑_{e∈E} c_{e}x_{e}
s.t. x_{δ(v)} = 2 v ∈ V # degree constraints
x_{e} ∈ {0,1} e ∈ E
where x_{{e_1, ..., e_n}} = x_{e_1} + ... + x_{e_n}
The degree constraints are basically telling you that every time you enter a vertex you have to leave it. However, instead of generating one tour, the model above may generate lots of subtours (cycles in the graph), with no connections between them.
To prevent subtours, note that a subtour over a set S of vertices would use |S| edges inside S, so we forbid it by allowing at most |S| - 1 edges inside S. We add one more family of constraints, forcing the tour to leave every proper subset of vertices:
x_{γ(S)} ≤ |S| - 1 S ⊂ V # subtour constraints
However, there are exponentially many of these subtour constraints; you do not want to list all of them, you want to generate them on demand. To do that, you need a separation problem. We can re-express the subtour constraint by saying: there must be at least two edges crossing the boundary of S, coming into and going out of S.
x_{δ(S)} ≥ 2 S ⊂ V # subtour constraints
Once you re-express it that way, the separation problem becomes a really nice combinatorial problem that can be solved in polynomial time. The key idea is to build a new graph:
- with the same set of vertices and the same set of edges, i.e. G^{*} = (V, E);
- but with the weight of each edge equal to its value in the linear relaxation, c_{e} = x^{*}_{e}.
To find violated subtour constraints:
- Find a minimum cut in G^{*}.
- If the weight of the cut is smaller than 2, i.e. x_{δ(S)} < 2, then we have isolated a subtour constraint violated by the linear relaxation. Finding such a cut only takes polynomial time.
- Continue finding cuts until no violated subtour constraint can be found.
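For a tiny graph, the minimum-cut separation can be mimicked by brute force over subsets S. The sketch below uses a hypothetical solution made of two disjoint triangles: every degree constraint holds, yet the subtour constraints for the triangles are violated:

```python
from itertools import combinations

def violated_subtour_cuts(n, x_star):
    """Brute-force separation: check x(delta(S)) >= 2 for every subset S
    with 2 <= |S| <= n - 2 (singletons are covered by the degree constraints)."""
    cuts = []
    for r in range(2, n - 1):
        for S in combinations(range(n), r):
            inside = set(S)
            weight = sum(v for (u, w), v in x_star.items()
                         if (u in inside) != (w in inside))
            if weight < 2:
                cuts.append((S, weight))
    return cuts

# Two disjoint triangles: each vertex has degree 2, but no edge crosses
# between {0,1,2} and {3,4,5}, so both subtour constraints are violated.
x = {(0, 1): 1, (1, 2): 1, (0, 2): 1, (3, 4): 1, (4, 5): 1, (3, 5): 1}
print(violated_subtour_cuts(6, x))  # [((0, 1, 2), 0), ((3, 4, 5), 0)]
```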
Does that mean that finally you have a solution to the Traveling Salesman Problem? No. The solution may still contain fractional values. We need other kinds of cuts, for example comb cuts.
For more on Discrete Optimization: Mixed Integer Program, please refer to the wonderful course here https://www.coursera.org/learn/discreteoptimization
I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai