Constraint Programming is one of the main paradigms to actually tackle optimization problems. The idea is to look at the problem constraints, and try to use constraints to reduce the set of values that each variables can take, and remove the values that can not appear in any solution. Constraint programming can help model problems at very high level. It conveys the structure of the problem as explicit as possible. You give the structure to the solver, then the solver will explore it. Constraint programming is a complete method, it is NOT heuristic. Given enough time, it will find an optimal solution to an optimization problem.

Computational Paradigm

Unlike the Branch-and-Bound, which focuses on bounding, relaxation and optimistic evaluation, the Constraint Programming focuses on feasibility and is called Branch-and-Prune.

BranchingDecompose the problem into subproblems and explore the subproblems.
Try all the possible values of a variable until a solution is found or it can be proven that no solution exists.
PruningReduce the search space as much as possible, in other words, use constraints to remove, from the variable domains, values that cannot belong to any solution.
⎡        ⎤ ⟹   C1   ⟹   ⎡       Constraint store        ⎤
|        | ⟹   Cn   ⟹   |   C1, ⟷  ⎡              ⎤    |
| Search |                 |   C2, ⟷  | Domain store |    |      
|        | ⟸ success ⟸  |   ... ⟷  |              |    |
⎣        ⎦ ⟸ failure  ⟸ ⎣   Cn  ⟷  ⎣              ⎦    ⎦

The Search makes choices and send new constraints, the Constraint store is the guy who basically trying to see if the constraint is satisfied, and to derive as much information as it can, then send back to the Search about whether a constraint is feasible or not.

The Domain store inside the Constraint store is basically capturing the search space, which typically is the associated domain to every one of the variables. The constraints C1, C2, ... Cn do not know about each other, they have two goals in life.


A constraint can be arbitrarily complex, but essentially it has two goals: feasibility checking and pruning. Every one of the constraints in a constraint programming system, has a dedicated algorithm to do, these two tasks. Consider an example: there are 2 variables: x ∈ {0, 1, 2} and y ∈ {1, 2, 3}, which means the domain of x is {0, 1, 2}, and the domain of y is {1, 2, 3}. There is also a constraint that x ≠ y.

Feasibility checkingA constraint checks if it can be satisfied given the values in the domains of its variables.
– Am I feasible with respect to the domains?
– Can I find an assignment for the variables using these domains such that the constraint is satisfied?

In the example: the constraint x ≠ y is going to be feasible if the size of the union of the two domains are greater than 1.
| D(x) ∪ D(y) | > 1
PruningIf satisfied, a constraint determines which values in the domain can not be part of any solution.
– Can I knock down some values in these domains, such that I reduce the search space?

In the example: the pruning happens when
1. one variable, say x, can take only one value 1 (x ∈ {1}), and
2. the other variable, say y, can not take that value 1 (y ∈ {1, 2, 3})
D(x) = {1} ⟹ D(Y) \ {1}

The core of any constraint programming solver is a propagation engine. It is like a simple iterative algorithm:

propagate ()
    select a constraint;
    if the constraint is infeasible given the domain store then
      return failure;
      apply the pruning algorithm associated with the constraint;
  until NO constraint can remove any value from the domain of its variables;
  return success;

Only when the propagation engine returns success, there you reached the stable state, which is called a fixed point. And then you know that at this point you’ve inferred as much information as you can.

Consider a linear constraint:

a1 x1 + ... an xn ≥ b1 y1 + ... + bm ym

ai, bi ≥ 0 are constants
xi, yi are variables with domains D(xi) and D(yi)

What we are interested in is essentially propagating that constraint optimally, removing as many values as we can, from the domain of the variables.

First, for the feasibility test, we basically test if this inequality is satisfied or not:

a1 max(D(x1)) + ... an max(D(xn)) ≥ b1 min(D(y1)) + ... + bm min(D(ym))

Second, for pruning, assuming the feasibility test is satisfied:

l = a1 max(D(x1)) + ... + an max(D(xn))
r = b1 min(D(y1)) + ... + bm min(D(ym))

We have l ≥ r, and also have ai max(D(xi)) ≥ ai xi.

l ≥ r
ai max(D(xi)) + l ≥ ai xi + r
ai max(D(xi)) + l - r ≥ ai xi
xi ≤ ⎣ [ai max(D(xi)) + l - r] / ai

Similarly, we have bi yi ≥ bi min(D(yi)):

l ≥ r
bi yi + l ≥ bi min(D(yi)) + r
bi yi ≥ bi min(D(yi)) + r - l
yi ≥ ⎡ [bi min(D(yi)) + r - l] / bi

Modeling Language

One of the key aspects of constraint programming is the fact that the language is able to capture very complex model, very complex idiosyncratic constraints.

Reified constraint is a very useful concept. It’s basically the ability to transform a constraint into value 0 (if the constraint is false) or value 1 (if the constraint is true).

Element constraint is used in many application, it is the ability to index an array or a matrix with complex expressions involving variables. Then we have the ability to actually implement logical combination of constraints.

For instance, x and y are variables, c is an array of integers, also we have a constraint x = c[y].

x ∈ {3, 4, 5}
y ∈ {0, 1, 2, 3, 4, 5}
c = [3, 4, 5, 5, 4, 3]

If now we have the information that x ≠ 3, first we can remove value 3 from the domain of x. Using propagation, next we can prevent y from getting values that is going to give 3 to x. So the value 0 and 5 are also removed from the domain of y.

x ∈ {3, 4, 5}

c = [3, 4, 5, 5, 4, 3]
y ∈ {0, 1, 2, 3, 4, 5}

Global Constraints

Global constraints capture the combinatorial substructure arising in many applications. They are more natural and make modeling easier. We capture this structure and we give them directly to the solver. And then the solver has a lot of information, in particular, the ability to use dedicated algorithm for each one of these substructure.

One of the big impact of global constraints is that they can actually detect infeasibility earlier in the search space, and make it possible to prune the search space more. Therefore, they make your search space smaller, and your search more efficient. However this depends on the constraint, sometimes some global constraints deal with feasibility test and pruning very efficiently, but sometimes you will have to relax your standards.

⎡        ⎤ ⟹   C1   ⟹   ⎡       Constraint store          ⎤
|        |                 |   -----                         |
|        | ⟹   Cn   ⟹   |  | C1,| ⟷  ⎡              ⎤   |
| Search |                 |  | C2,| ⟷  | Domain store |   |      
|        | ⟸ success ⟸  |  | ...| ⟷  |              |   | 
|        |                 |  | Cn | ⟷  |              |   | 
⎣        ⎦ ⟸ failure  ⟸ ⎣   ----- ⟷  ⎣              ⎦   ⎦
                         Global constraints 

Recall in the computation paradigm of constraint programming, all the constraints talk to the domain independently, and they don’t communicate. Each of them may be satisfied and pass the feasibility test, however it is not true that all constrains will be feasible if they are considered as a whole. For instance: given x1, x2, x3 ∈ {1, 2}, the constrains x1 ≠ x2, x2 ≠ x3, x3 ≠ x1 are feasible if you think about them independently. However, the constraint all_different(x1, x2, x3) won’t pass the feasibility test.

The focus of constraint programming is the feasibility. How to use it to find an optimal solution?

  1. Solve a sequence of satisfaction problems
  2. Find a solution
  3. Impose a constraint that the next solution must be better

At least theoretically, constraint programming guarantees to find an optimal solution. In practice, it may take too long. It will be strong when the new constraint reduces the search space. Scheduling problems are good examples.

Symmetry Breaking

A lot of problems in practice are of natural symmetry. Exploring symmetrical parts of the search space is useless, you will want to actually explore that information to do search in a better fashion. There are many kinds of symmetries. Symmetry-breaking constraints is the tool that we can use to break these symmetries.

Variable Symmetry

A good example of variable symmetry is called Balance Incomplete Block Designs (BIBDs). Why BIBD is important? One of the reasons is that this is an example of combinatorial design. The input of BIBD is a set of five numbers (v, b, r, k, l), the output is a v-by-b matrix, whose elements values are either 0 or 1. This matrix will satisfy 3 constraints:

  1. The number of 1 on every row of the matrix has to be r.
  2. The number of 1 on every column of the matrix has to be k.
  3. The scalar product of every two rows has to be exactly l.

A simple example of BIBD is (3, 3, 2, 2, 1), now you’re going to see all the symmetries: swapping two rows or swapping two columns gives you another solution! Because essentially, there is nothing distinguishing these rows, or columns.

1 1 0       swap first row       0 1 1
0 1 1   ⟹ and second row  ⟹   1 1 0
1 0 1                            1 0 1

1 1 0       swap first col       1 1 0
0 1 1   ⟹ and second col  ⟹   1 0 1
1 0 1                            0 1 1

So what we want is to avoid exploring all these set of configurations. We want to find a solution without exploring them all. How to break the variable symmetries in BIBDs? A nice way is to impose a lexicographical ordering on both the rows and the columns, so the search space is reduced dramatically.

1 1 0      lexicographical       0 1 1
0 1 1      ⟹ ordering  ⟹      1 0 1
1 0 1                            1 1 0
Value Symmetry

Scene allocation for shooting a movie is an example of value symmetry. There are:

  1. a set of scenes
  2. a set of actors
  3. a set of days

Every actor plays in some of the scenes, get paid at a rate by the day. At most k scenes can be shot per day. The objective is to minimize the total cost. Essentially, for every scene, you have to find which day is going to be shot.

What are the symmetries that we have in this problem? There is no symmetries in actors, neither in scenes. However, there is no difference between Monday or Friday. The symmetries are the days on which the scenes are gonna be shot. So if days of s is a solution, then p(s) is also a solution where the days of s have been permuted by permutation p.

When we pick a day for a scene, we can pick either the days already used, or one additional new day.

Day(Scenek) = {1, 2, ..., max[Day(Scene1), ..., Day(Scenek-1)] + 1}

max[Day(Scene1), ..., Day(Scenek-1)]: existing days

Doing that, you remove all these values symmetries.

Golden Standard for Pruning

Recall that  there is two things that a constraint is to do:

  • Feasibility test: can we find a solution to the constraint?
  • Pruning: can we eliminate values from the domains?

The golden standard for pruning is after pruning, if value v is in the domain of variable x, then there exists a solution to the constraint with value v assigned to variable x, so in a sense, you know that every value for every variable is part of a solution to that constraint. If you have only the domains and the constraint, there is no way you can actually have a better pruning, then you have an optimal pruning.

Of course in practice, it may not be possible to implement this golden standard in polynomial time. If that’s the case, we’d have to relax the standards and do incomplete pruning, or we settle for exponential time.

Implementation: Knapsack

The constraint of Knapsack problem can be expressed as l ≤ ∑k∈R wk xk ≤ u, xk ∈ {0, 1}, where wk is the weight of item k.

Feasibility testThe basic algorithm that we’re going to use for feasibility is based on dynamic programming, which is pseudo-polynomial time is the numbers are small.
PruningAgain we exploit the dynamic programming table
1. Forward phase: keep dependency links
2. Backward phase: update dependency links to keep only feasible values

Implementation: alldifferent

The algorithm of alldifferent(x1, ..., xn) is based on graph theory.

Feasibility testCan we find values in the domains of the variables x1, ..., xn, so that all variables are assigned a different value?
PruningAre there any values in the domain of a variable xk, that the variable can not take? I.e. if the variable xk takes that value, there will be no solution.

A bipartite graph is used to implement feasibility test and pruning. A bipartite graph is a graph with two types of vertices:

  1. One type of vertices is for variables
  2. The other type of vertices is for values.
  3. Edges are only between variables and values.
 x1, x2, x3, ...., xn
  \  /     \
 1, 2, 3, 4, 5, 6, ...., m

A matching for a graph G = (V, E) is a set of edges in E such that no two edges in E share a vertex. A maximum matching M for a graph G is a matching with the largest number of edges.

Feasibility Test

Basically feasibility test amounts to finding the maximum matching. How to find a maximum matching?

  1. Start with any matching
  2. Improve the matching
    • Start from a free vertex x
    • If there is an edge (x, v) where v is not matched, then insert (x, v) into the matching
    • If all values are taken, x can not be assigned to any value. Just take a value, say v, which has already been matched to y. Remove edge (y, v) from the matching, add edge (x, v) into the matching, and restart this step (Improve the matching) with y (instead of x).
  3. When there is no improvement possible, we have a maximum matching

In the step 2, an alternating path is actually used. An alternating path P for a matching M is a path from variable vertex x in X to a value vertex v in V (both of x and v are free) such that the edges in the path are alternatively in E \ M (all edges except the edges in the matching) and M (the matching).

Create a directed graph:

  • Edges NOT in the matching E \ M are oriented from top to bottom
  • Edges in matching M are oriented from bottom to top.
 x1, x2, x3, ...., xn
  ⥣  ⥥    ⥣
 1, 2, 3, 4, 5, 6, ...., m

You can use depth-first or best-first search to find an alternative path. Its complexity is O( |X| + |E| ) where X is the set of vertices and E is the set of edges.


For pruning, value vertex v must be removed from the domain of variable vertex x if the edge (x, v) does NOT appears in any maximum matching. An naive and inefficient approach is to:

  1. Force the edge (x, v) in the matching, remove all other edges (x, w).
  2. Search for a maximum matching, using (x, v). If the maximum matching is smaller than the number of variables, then v must be removed from the domain of x.

The problem of this approach is we have to take all these edges and then find a maximum matching. It takes a lot of time. We can do better using the Basic Property by Berge:

An edge belongs to some but not all maximum matchings, if and only if, given a maximum matching M, it belongs to either:

  • An even alternating path starting at a free value vertex, or
  • An even alternating cycle.

The property tells us whether an edge belongs to at least one maximum matching, so we can NOT prune them. And we can prune the edges that are not in any maximum matching (which of course does not belong to all maximum matchings).

How to prune?

  1. Given a matching M, create a directed graph like what you did in feasibility test, but reverse the direction of the edges:
    • Edges NOT in the matching E \ M are oriented from bottom to top
    • Edges in matching M are oriented from top to bottom.
  2. Search for even alternating path P starting from a free variable vertex.
    • Edges found in this step can not be pruned.
  3. Search for all strongly connected components (cycles) C and collect all the edges belonging to them.
    • Edges found in this step can not be pruned.
  4. Remove all the edges that are not in M, P, C

The complexity of this approach is O( (|X| + |V|) * |E| ).

Other Constraints to Improve Communication

Redundant constraints can help speed up models tremendously. Essentially these are constraints that don’t add anything to your model from a semantics standpoint, i.e. the model will have the same exact number of solutions. However Redundant constraints express properties of the solutions not captured by the model, they are computationally significant, because they will help reduce the search space.

But redundant constraints’ role is not limited to expressing properties of the solutions and boosting the propagation of other constraints. Redundant constraints can also:

  • Provide a more global view
  • Combine existing constraints, and
  • Improve communication between various variables

Surrogate constraints are the new constraints that combine existing constraints.

Implied constraints are that you take two constraints and you derive a property from them.

Dual Modeling

Sometimes for a particular problem, you have different way of modeling it, using different decision variables, and you don’t express the constraints in the same way. However two models may have complementary strengths, and you don’t know which model is the best. The idea of dual modeling is to state multiple models of a problem and link the multiple models with new constraints.


In constraint programming, search is based on feasibility information. We have First-Fail Principle that we want to apply, which tells you “try first where you are the most likely to fail”. By starting with the hard stuff, what we hope is that we’ll have a very small search tree. And that easy stuff we can take care of at the end.

How to implement the search? In a lot of the search procedures, you start by:

  1. Iterate over the variables
  2. Non-deterministically choose a value for the variables, and
  3. Add constraints inside constraint store

Recall the computational paradigm, what the Search is doing is always adding constraints to the constraint store, which is pruning the search base. When adding a constraint to the constraint store returns a failure, the Search again non-deterministically choose another value that has not been tried before.

⎡        ⎤ ⟹   C1   ⟹   ⎡       Constraint store          ⎤
|        |                 |   -----                         |
|        | ⟹   Cn   ⟹   |  | C1,| ⟷  ⎡              ⎤   |
| Search |                 |  | C2,| ⟷  | Domain store |   |      
|        | ⟸ success ⟸  |  | ...| ⟷  |              |   | 
|        |                 |  | Cn | ⟷  |              |   | 
⎣        ⎦ ⟸ failure  ⟸ ⎣   ----- ⟷  ⎣              ⎦   ⎦
                         Global constraints 

Variable/Value Labeling

This is probably the most simple search procedure that you can design in a constraint programming system.

  1. Choose the variable with the smallest domain (First-fail principle)
    • If the number of domain ties:
      • Choose the most constrained variable (Color a map problem)
      • Choose the variable which is likely to remove more values (Queen’s problem)
  2. Choose a value that leaves as many options as possible to the other variables, which helps finding a solution.

Value/Variable Labeling

Instead of choosing a variable and then a value, we can look at the problem in a completely different fashion. We want to look at the value first, and then choose a variable to which we want to assign the value.

Domain Splitting

Sometimes Variable/Value or Value/Variable Labeling does work well, because taking a random value for a particular variable is going to be a very strong commitment. We don’t want that, we only want a weak commitment. This is why Domain Splitting can be helpful. In Domain Splitting:

  1. Select a variable
  2. Split its domain in two or more sets (a much weaker commitment than selecting a value)
  3. The variable only needs to be greater or smaller than the boundary of the sets.

Feasibility vs. Optimality

In constraint programming, most of the time, we try to use feasibility information for pruning. But sometimes we have an optimization problem, we may also focus on the objective function, which does not change the way search works. The critical aspect is you can use information on the objective function to drive the search.

Symmetry Breaking

Previously, we talked about implementing symmetry breaking in constraints. These constraints have the dramatic effect on search, dictating the way how search is done. Is that good? Well sometimes yes, sometimes no. It will essentially prevents you from using any kind of sophisticated characteristic because the search is basically constrained. These symmetry breaking constraints were imposing a particular order, preventing you from using First-fail principle.

Can we actually get rid of this? Yes. What we have to do is take the symmetry breaking constraints, instead of imposing them in the model, but introduce them dynamically during the search. The big difference is the ordering will be different, and its going to be discovered incrementally by looking at the variables which are the most difficult to assign, again the First-fail principle.

Randomization and Restarts

The basic key intuition here is that sometimes there is no obvious search ordering, but the structure of the problem is such that there is one, but you don’t know yet. How to find it? Brute force search:

  1. Randomization: try a random ordering
  2. Restarts: if no solution is found after some limit, restart the search.

For more on Discrete Optimization: Constraint Programming, please refer to the wonderful course here

Related Quick Recap

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at

Don't forget to sign up newsletter, don't miss any chance to learn.

Or share what you've learned with friends!