Scheduling is one of the most fascinating areas of Discrete Optimization: the problems are scientifically beautiful, and at the same time they have many applications in practice. Constraint programming has been really successful in this area. The problem you usually deal with is to minimize some project duration, subject to precedence constraints (say, some tasks have to come before others) and/or disjunctive constraints (say, no two tasks scheduled on the same machine may overlap).

## Scheduling

The simplest scheduling problem can usually be modeled like this:

- A set of tasks `Ω`
- Each task `t` has a duration `d(t)`
- Each task `t` executes on machine `m(t)`, and a machine must handle its tasks sequentially: no two tasks scheduled on the same machine can overlap in time (disjunctive constraints)
- A set of precedence constraints `(b, a)` stating that task `a` must start after task `b` has completed

What you want to do is to find an ordering of the tasks on each machine, and schedule all these tasks so that you minimize the project completion time, i.e. finish the project as early as possible.

Behind the scenes, this model is compiled into decision variables and constraints:

- Every activity has 3 variables: starting date, ending date, and duration `(s, e, d)`
- A constraint is used to link these three variables: `s + d = e`
- Each precedence `(b, a)` gives a precedence constraint `s_a ≥ e_b`
- Each machine `m` gives a disjunctive constraint `disj(t_1, ..., t_n)`

The beautiful thing is that minimizing project duration under precedence constraints is a polynomial time problem.
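The precedence-only subproblem is just longest paths in a DAG. A minimal sketch (with hypothetical task names, not from the course) of computing the minimal makespan under precedence constraints alone:

```python
from collections import defaultdict, deque

def min_makespan(durations, precedences):
    """Minimal project duration under precedence constraints only:
    longest paths in the (acyclic) precedence graph."""
    succ = defaultdict(list)
    indegree = {t: 0 for t in durations}
    for b, a in precedences:           # a must start after b completes
        succ[b].append(a)
        indegree[a] += 1
    start = {t: 0 for t in durations}  # earliest start times
    queue = deque(t for t in durations if indegree[t] == 0)
    while queue:
        b = queue.popleft()
        for a in succ[b]:
            start[a] = max(start[a], start[b] + durations[b])
            indegree[a] -= 1
            if indegree[a] == 0:
                queue.append(a)
    return max(start[t] + durations[t] for t in durations)
```

Since each task and each precedence is processed once, this runs in linear time, which is exactly why this part of the problem is easy.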

### Feasibility of Disjunctive Constraints

However, disjunctive constraints are quite different: even for a single machine, detecting feasibility of a disjunctive constraint is NP-complete. So instead of solving this problem exactly, we will approximate it efficiently. First, some notation:

`s(Ω)` : the earliest time at which one of the tasks in the set `Ω` can start

`e(Ω)` : the latest time at which all tasks in the set `Ω` must finish

`d(Ω)` : the sum of the durations of all tasks in the set `Ω`

At this point we are trying to determine whether these disjunctive constraints are feasible. Let's first try a simple feasibility test: to check whether a set of tasks `T` can be scheduled on the machine, we check this little inequality:

`s(T) + d(T) ≤ e(T)`

In a sense, this is a terrible test: a set of tasks `T` may pass it easily and still turn out to be infeasible.

How can we improve it? Instead of the whole set `T`, let's consider the subsets `Ω` of the tasks inside `T`. We want to look at all possible subsets of `T` and check that each of them passes the test:

`s(Ω) + d(Ω) ≤ e(Ω) for all Ω ⊆ T`

Now the new problem is that there are **exponentially** many subsets, which does not sound good. Actually, in practice we only need to look at **quadratically** many of them. Suppose a set contains 3 tasks: we only need to consider the starting time of `t_1` and the ending time of `t_2`; tasks like `t_3` in between are only used to increase the total duration.

```
s(t_1)
|----t_1----
    ------t_3-------
    ...
     -------t_2-------|
                  e(t_2)
```

So we don't need to consider all possible combinations of the 3 tasks; we only need to pick a starting time and an ending time, and pack everything in between. This is usually called a task interval:

`S(t_1, t_2) = {t ∈ T | s(t) ≥ s(t_1) and e(t) ≤ e(t_2)}`

Now the feasibility test only has to consider task intervals. If we apply the test to all task intervals in `T`, the complexity is cubic, `O(|T|^3)`. We can do better and make it quadratic, or even `O(|T| log |T|)`.

The intuition behind the quadratic algorithm is to fix **one** ending time `e`, and, in one sweep from `e` back towards the starting times of all the tasks, compute all the task intervals for which `e` is the latest ending time.

So in this particular case, you first fix the ending time `e` and then look at `s_3`, `s_2`, and `s_1`. We compute the task intervals incrementally by adding the durations, and perform the feasibility test at each step.

```
-------+-----+-----------+---------+------> time
      s_1   s_2         s_3        e
 < - - - - - - - - - - - - - - - -
```

The pseudo-code of the algorithm:

```
d := 0;                    // fix ending time e, total duration is zero
for each task t in decreasing order of s_t
    if e_t <= e            // ending time of task is no later than e
        d := d + d_t;      // add the duration of task to total duration
        if s_t + d > e     // perform the feasibility test
            return failure;    // feasibility test failed
return success;            // feasibility test successful
```

If all tests pass, that particular value of `e` is fine, but we need to test all possible `e`'s. The sweep is linear for each `e`, and there are linearly many values of `e`, so the algorithm is quadratic overall.

So far we have been assuming that no task can be interrupted. If we can actually interrupt tasks, it is possible to do even better using preemptive scheduling, which is essentially another kind of relaxation: the one-machine preemptive feasibility test can be computed in `O(|T| log |T|)`.
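One way to realize this preemptive test (a sketch; this particular formulation is mine, not the course's) is to simulate earliest-deadline-first scheduling, which is optimal for the preemptive one-machine relaxation; with a heap this is `O(|T| log |T|)`:

```python
import heapq

def preemptive_feasible(tasks):
    """One-machine preemptive feasibility via earliest-deadline-first.
    tasks: list of (release, deadline, duration) triples."""
    tasks = sorted(tasks)              # by release time
    heap = []                          # (deadline, remaining work)
    time, i, n = 0, 0, len(tasks)
    while i < n or heap:
        if not heap:                   # machine idle: jump to next release
            time = max(time, tasks[i][0])
        while i < n and tasks[i][0] <= time:
            _, dl, du = tasks[i]       # release newly available tasks
            heapq.heappush(heap, (dl, du))
            i += 1
        dl, rem = heapq.heappop(heap)  # run the earliest-deadline task
        next_release = tasks[i][0] if i < n else float("inf")
        run = min(rem, next_release - time)
        time += run
        rem -= run
        if rem > 0:                    # preempted by a new release
            heapq.heappush(heap, (dl, rem))
        if time > dl:                  # a deadline was missed
            return False
    return True
```

Because EDF is optimal for this relaxation, a missed deadline in the simulation proves the task set infeasible.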

### Disjunctive Pruning: Edge Finding Rule

The Edge Finding Rule is one of the rules we can apply for pruning. The key idea is to select a set `Ω` of tasks and a task `i ∉ Ω`, and ask whether task `i` must **always** be scheduled after all the tasks in `Ω`.

Task `i` must start after all tasks in `Ω` if:

`s(Ω ∪ {i}) + d(Ω ∪ {i}) > e(Ω)`

Once you know that task `i` has to start after all the tasks in `Ω`, you can update the starting time of task `i` to `max_{γ ∈ Ω} (s(γ) + d(γ))`.
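A small sketch of this rule for one fixed pair `(Ω, i)`, with tasks again represented as `(s, e, d)` triples:

```python
def edge_find(omega, task_i):
    """Edge finding for one set Ω and one task i ∉ Ω.
    Tasks are (s, e, d) = earliest start, latest end, duration.
    Returns the (possibly tightened) earliest start of task i."""
    s_i, _, d_i = task_i
    s_union = min([s for s, _, _ in omega] + [s_i])   # s(Ω ∪ {i})
    d_union = sum(d for _, _, d in omega) + d_i       # d(Ω ∪ {i})
    e_omega = max(e for _, e, _ in omega)             # e(Ω)
    if s_union + d_union > e_omega:    # i must come after all of Ω
        return max(s_i, max(s + d for s, _, d in omega))
    return s_i
```

A full propagator would apply this over many `(Ω, i)` pairs; the point here is just the test and the resulting update of the starting time.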

The edge finding rule can be enforced in strongly polynomial time. This is basically the type of pruning that happens in scheduling algorithms. There are many other types of rules, but they all do essentially the same thing: pushing the starting date or pushing the ending date.

### Search Strategy for Disjunctive Scheduling

The search strategy, most of the time, is to choose a machine, order the tasks on that machine, and then repeat for the other machines. But:

- Which machine?
- The tightest machine.

- Which task?
- A task that can be scheduled first (or last)
- A task that is as tight as possible

## Large Neighborhood Search

Large Neighborhood Search is an amazing technique, which is a hybrid of Local Search and Constraint Programming (or Mixed Integer Programming).

Recall that Constraint Programming is very good at:

- Finding feasible solutions
- Optimizing small combinatorial spaces

When you combine Local Search and Constraint Programming, you exploit the strengths of Constraint Programming for finding high-quality solutions, and you exploit Local Search for scalability.

The first step is to find a feasible solution using Constraint Programming. The second step is to select a neighborhood around that feasible solution using Local Search. This neighborhood is going to be large, so you can use Constraint Programming again to optimize over the neighborhood, finding the best neighbor in it. Then you repeat this process forever, improving the quality of the solutions you have.

- Find a feasible solution (Constraint Programming)
- Select a neighborhood using the feasible solution in step 1 (Local Search)
- Find the best neighbor in the neighborhood (Constraint Programming)
- Go to step 2
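The loop above can be sketched on a toy 0/1 knapsack, where the "optimize the neighborhood" step is brute force over the relaxed variables (a hypothetical illustration, not an industrial LNS):

```python
import itertools
import random

def lns_knapsack(values, weights, capacity, iters=20, relax=4, seed=0):
    """Large Neighborhood Search sketch on a toy 0/1 knapsack.
    The 'find the best neighbor' step is brute force over the
    relaxed (unfixed) variables."""
    rng = random.Random(seed)
    n = len(values)

    def value(sol):
        return sum(v for v, b in zip(values, sol) if b)

    x = [0] * n  # step 1: a trivially feasible solution (empty knapsack)
    for _ in range(iters):
        # step 2: select a neighborhood -- relax a few random variables
        free = rng.sample(range(n), min(relax, n))
        fixed_weight = sum(w for j, (w, b) in enumerate(zip(weights, x))
                           if b and j not in free)
        # step 3: find the best neighbor over the free variables
        for bits in itertools.product([0, 1], repeat=len(free)):
            weight = fixed_weight + sum(weights[j] * b
                                        for j, b in zip(free, bits))
            if weight <= capacity:
                cand = list(x)
                for j, b in zip(free, bits):
                    cand[j] = b
                if value(cand) > value(x):
                    x = cand  # keep the improving neighbor
    return x, value(x)
```

For example, `lns_knapsack([6, 10, 12], [1, 2, 3], 5)` reaches the optimal value 22. In a real LNS the brute force would be replaced by a CP (or MIP) solver over the unfixed variables.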

What is the neighborhood? In the solution, there are:

- Some variables that you believe are nice, which you keep fixed.
- The remaining variables, for which you try to find better values.

The neighborhood is obtained by changing the values of the remaining variables, the ones you are not fixing, so as to improve the quality of the best solution found so far.

How to choose the fixed variables and the variables to fine-tune is problem-specific. In some particular cases a completely random neighborhood behaves very well. It depends.

Of course, you can generalize this to Mixed Integer Programming: you can find a feasible solution using Mixed Integer Programming, and you can explore the neighborhood using Mixed Integer Programming as well.

## Column Generation

Recall that when solving a Mixed Integer Program with exponentially many constraints, we generate constraints on demand, based on the value of the linear relaxation.

Column Generation is the same idea in reverse. When solving a Linear Program with exponentially many variables, which usually represent complex objects, we want to generate these variables on demand. Branch and Price is a good example: it is Branch and Bound using Column Generation at every one of the nodes.

### The Cutting Stock Problem

Suppose you have large wood boards of length `L`, and you want to cut them into small shelves of different sizes, which are required by your customers. You want to find how many wood boards you need to meet the demand.

```
|- - - - - - - - - - - - - - -|  wood board of length L
|- - -|                          shelf type 1 with size l_1, demand d_1
|- - - - -|                      shelf type 2 with size l_2, demand d_2
...
|- - - - - - - - -|              shelf type n with size l_n, demand d_n
```

Using binary decision variables for each wood board sounds like a good idea for building a model:

- `y_b = 1` if board `b` is used in the solution
- `x_sb` is the number of shelves of type `s` cut from board `b`

However, this leads to complicated constraints like:

- `x_sb ≤ y_b M` (using Big-M notation): a board is used if some shelf is cut from it
- `∑_{s∈S} l_s x_sb ≤ L`: the shelves cut from a board cannot exceed the capacity of the board
- `∑_{b∈B} x_sb ≥ d_s`: meet the demand from customers

Boards are actually interchangeable, so this model is terrible: it has lots of symmetries, which lead to a very bad linear relaxation.

Another way to model this problem is to reason about cutting configurations, i.e. specific ways to cut a board. We can find all these configurations; each configuration specifies the number of shelves of each type that it produces. For example, configuration 1 might produce 2 shelves of type 1 and 1 shelf of type 5.

Now the decision variables are the numbers of each configuration used: `x_c`. The new model is very simple, with just one constraint: meet the demand.

```
min  ∑_{c∈C} x_c
s.t. ∑_{c∈C} n_cs x_c ≥ d_s   (s ∈ S)
     x_c ∈ ℕ
where n_cs is the number of shelves of type s that configuration c provides
```

This new model has a very strong relaxation, and there are no symmetries.

Note the capacity constraint is actually built into the configurations: every configuration `c` must satisfy `∑_{s∈S} l_s n_cs ≤ L`. It might be impossible to enumerate all of them, so we cannot generate them all at the very beginning. Instead, we are going to generate these configurations one at a time, on demand.

So this is the tableau at a higher level. In this particular case, all the variables represent configurations: every column is a configuration, telling you how many shelves of each type it produces.

```
           x_1      x_2     ...  x_i
obj         1        1      ...   1
shelf_1    n_1,1    n_2,1   ...  n_i,1    ≥ d_1
...
shelf_|S|  n_1,|S|  n_2,|S| ...  n_i,|S|  ≥ d_|S|
                 new column added here ↑
```

Say that we have a bunch of configurations and we have solved the linear program, getting a good solution. Can we then improve this linear relaxation? What we do is basically:

- Find a new column (configuration), which is actually a knapsack problem, satisfying two conditions:
  - feasibility: `∑_{s∈S} l_s n_cs ≤ L`
  - quality: the reduced cost is negative, to ensure the column can enter the basis (of the Simplex method)
- Add that column into the tableau matrix, and
- Get a better linear relaxation
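The pricing step for cutting stock can be sketched as an unbounded knapsack DP over the board length (the dual values are assumed to come from the LP relaxation's solver; this is an illustration of one pricing call, not a full column-generation loop):

```python
def price_configuration(lengths, duals, L):
    """Pricing sketch for cutting stock: an unbounded knapsack DP that
    finds a cutting configuration maximizing the total dual value of
    its shelves.  The configuration is worth adding as a new column
    when its reduced cost, 1 - (sum of duals), is negative."""
    best = [0.0] * (L + 1)   # best[c] = max dual value within capacity c
    take = [None] * (L + 1)  # shelf type added at capacity c (None = waste)
    for c in range(1, L + 1):
        best[c] = best[c - 1]
        for s, (l, pi) in enumerate(zip(lengths, duals)):
            if l <= c and best[c - l] + pi > best[c]:
                best[c] = best[c - l] + pi
                take[c] = s
    # reconstruct the configuration n_cs from the DP choices
    config = [0] * len(lengths)
    c = L
    while c > 0:
        if take[c] is None:
            c -= 1                     # one unit of wasted board length
        else:
            config[take[c]] += 1
            c -= lengths[take[c]]
    return config, 1.0 - best[L]       # (n_cs, reduced cost)
```

For example, with shelf lengths `[3, 5]`, duals `[0.4, 0.7]`, and `L = 9`, the best configuration cuts three shelves of type 1, giving reduced cost `1 - 1.2 = -0.2`, so the column would enter the basis.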

#### Branch and Price

If you want to do branch and price, it is the same idea, except that column generation happens at each particular node of the tree: it gives you a good lower bound at every node by generating variables on the fly. As soon as you make a branching decision, you can start generating new columns to improve the value of the linear relaxation.

You basically iterate:

- branching, and then
- doing column generation to find a really nice lower bound.

It’s a very interesting setting when you know you have different kinds of complex objects that you have to manipulate.

For more on **Discrete Optimization: Scheduling, Column Generation**, please refer to the wonderful course here: https://www.coursera.org/learn/discrete-optimization


*I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai*