Discrete Optimization: Local Search

Table of Contents

Previously we talked about constraint programming, which actually works with partial assignments, we try to extend them, and to prune search space as much as possible.

Another method, called local search, is very different. Simply put, local search is moving the configuration from it’s current state to a state which is very close to it, i.e. changing values a little bit, in the hope of getting closer to a particular solution. Local search actually works with complete assignments to the decision variables and modify them. So local search is way more brutal, it will violate some of constraints for sure, but we will try to avoid and eliminate those violation as we proceed.

Type of Problems

The way of driving local search depends on the type of the problem:

Satisfaction problems: start with an invisible configuration, and move towards feasibility.
- Transform it into an optimization problem.
- Use the concept of violation: count how many constraints are violated by a configuration.
- Minimize violations: move towards configurations with fewer violations.
Pure optimization problems: start with sub-optimal solutions, move towards optimal configuration.
- There are no constraints at all, but there is an objective function, for instance to minimize the cost or maximize some other things.
Optimization with constraints problems
- You have both constraints and optimization. There will be different trade-offs in the complexity of the neighborhood, the complexity of the objective function, and the complexity of the search.

Formalization of Local Search

How to define local moves? There are many choices, however, the simplest thing is just to select a decision variable and change its value. Once you have defined all the moves that you can take, you have to design a strategy on deciding the move that you are really going to select and execute.

The strategy Max/Min Conflict Heuristic has been very successful in a variety of problems. You choose a variable that appear in the large numbers of violations. Then what we try to do is to change the value of that variable so that you decrease the number of violations the most.

In the case you have an objective function f that you want to optimize, then local moves defines a neighborhood, which essentially is the configurations that are close to the configuration you are considering right now. The key in local search is the concept of local minima. A configuration c is defined as local minima with respect to neighborhood N if:

∀ n ∈ N(c): f(n) ≥ f(c)

Local minima can be arbitrarily bad, or arbitrarily good. There is no guarantees for global optimality, and escaping local optima is a critical issue in local search.

Escape Local Minima

When we greedy move inside a neighborhood, we are guaranteed that the value of the configuration we found is a local minimum.

But there may be something really good somewhere else, and you don’t know how to get to it. Therefore, in practice, escaping these local minima of poor quality is a very important issue. The concept connectivity is used to avoid local minima, essentially what it means is that you want to be able go from any configuration S to at least one optimal solution O.

Connectivity is defined as a neighborhood N is connected, if from every configuration S, some optimal solution O can be reached by a sequence of moves, i.e.

S = S₀ → S₁ → ... → S_n = O

where S_i ∈ N(S_i-1)

Connectivity does not guarantee optimality, it basically means that you can apply local move from any configuration to get to the optimum. It means that there is a path, we need to find that path. In connected neighborhood, you may have more hope and in some cases guarantees to get the optimal solution.

Heuristic vs Metaheuristic

A heuristic is something that, essentially, tells you how to choose the next neighbor using local information (the state s and its neighborhood). It drives the search towards a local minimum.

Metaheuristic is different, its goal is to escape local minima, driving the search towards a global minimum, typically using some memory or learning.

General Pattern

This is the general pattern that we usually use in local search, the goal is to find suitable function L and S.

`s`	States, either solutions or configurations.
`N(s)`	Neighborhood of `s`.
`L(N(s),s)`	The set of legal neighbors that `s` can move to. It is usually conditioned on the value of the objective function. Local improvements: `L(N, s) = {n in N \| f(n) < f(s)}` No degradation: `L(N, s) = {n in N \| f(n) ≤ f(s)}` Potential degradation (consider all neighbors) `L(N, s) = N`
`S(L(N(s),s),s)`	Select one of the legal neighbors following some greedy selection: `S(L, s) = argmin(n in L) f(n)`
`f(s)`	Objective function.

function LocalSearch(f, N, L, S) {
  s := GenerateInitSolution();
  s_star := s;
  for k := 1 to MaxTrials do
    if IsSatisfied(s) or f(s) < f(s_star) then
      s_star := s;
    s := S(L(N(s),s),s);
  return s_star; 
}

Swaps, K-OPT and Traveling Salesman

Sometimes, instead of assign variables to different values, we swap two configurations. The reason we do this is that swapping can implicitly keep some constraints always feasible, for instance: demand of some kind of products. So we actually have two types of constraints:

Hard constraints: always feasible or satisfied during the search
Soft constraints: may be violated during the search

We sometimes don’t want just to resolve about whether a constraint is violated or not. But instead we want to find out how much the constraint is violated.

The idea of K-OPT is to replace the notion of one favorable swaps by a search of a favorable sequence of swaps. Do not search for the entire set of sequence, do not fix the number of moves k at the beginning, but build sequence incrementally, i.e. explore the sequence, find the sequence with 2 moves 2-OPT, with 3 moves 3-OPT, …, and so on. Compare these sequences and find out which sequence make the best improvement and execute it.

In the traveling salesman problem, this is how K-OPT works. Assume we already have a tour, but here we only show two segments: t1 → t2, and t4 → t3:

  t1 -------> t2
           
           t3
          ➚
       t4

Select one vertex t1, which is pointing to t2, so we actually selected the edge t1 → t2, which is long, maybe we can find something shorter than it.
From t2, we find the edge t2 → t3 is shorter than t1 → t2. We consider remove t1 → t2, and add t2 → t3.
But t4 already has an edge pointing to t3, we can not let 2 edges pointing to t3, so we consider remove edge t4 → t3.
To make the tour complete again, we add edge t1 → t4. Now we are have a 2-OPT, but we won’t do it, we want to compute the objective function at this moment, then we can continue to explore.

  t1          t2
   \         ↙︎
    \      t3
     ↘︎    
       t4

To further explore, we restart the process but this time using edge t1 → t4, trying to find if t4 is able to connect another vertex with smaller edge.

    t1          t2
     \         ↙︎
 t6   \      t3
  ↘︎    ↘︎    
  t5    t4

Say, we find t5, the edge t4 → t5, is smaller than t1 → t4. So we remove t1 → t4, add t4 → t5. The edge t6 → t5 is already there, we have to remove it, and finally add t1 → t6 to complete the tour again. Now we are have a 3-OPT, but we won’t do it, we want to compute the objective function at this moment, then we can continue to explore.

    t1          t2
  ⬃            ↙︎
 t6          t3
           
  t5  ← t4

Now, using t1 → t6, you can further explore to find 4-OPT, 5-OPT, etc, and compute objective function for each of them, until you can not continue this process. Then you can look back at the various tours that you have explored: 2-OPT, 3-OPT, 4-OPT, … and select the best one. The key point here is that you are not exploring the entire neighborhood of the sequence.

Optimization Under Constraints: Graph Coloring

Graph coloring is a very good example of optimization problem under constraints. In a graph with many vertices, there will be edges among some vertices. We want to color all vertices using smallest number colors, such that adjacent vertices are given a different color.

G = (N, E)

where:
N = 0 ... n-1 nodes
E: edges

There are two aspects:

Optimization: Minimize the number of colors.
Constraints: Two different vertices must be colored differently.

How to combine the two aspects in local search? There are generally 3 ways:

Reduce the optimization problem into a sequence of feasibility problems.
Stay only in the space of feasible solutions, so no violations of the constraints.
Consider both feasible and infeasible configurations.

Optimization as Feasibility

We are obsessed with feasibility, so we want to view the optimization problem as a sequence of feasibility problems. How to do it?

Find an initial solution with k colors, say using greedy algorithm.
Remove 1 color, say the k-th color, most likely we will have violations immediately.
Now the goal is to find a feasible solution with this k-1 colors. How? Try to minimize the number of violations.
Once we have a new solution, repeat the process with fewer colors, until we can not find more feasible solutions.

Stay in the Feasible Space

Another method is to assume that you are always in the feasible space, and focusing on the objective function only: we want to minimize the number of colors:

Original objective function:
min max_i∈0...n-1 c_i
s.t. c_i ≠ c_j (⟨i, j⟩ ∈ E)

where:
c_i ∈ N: the color assigned to vertex i

The search neighborhood is changing the color of a vertex. However changing the color of a vertex typically does not change the number of colors. We have to change the way we think about objective function.

The new way of thinking this problem is to introduce the concept Color classes C_i, which is the set of vertices colored with color i. Now the new objective function is to have as many vertices as possible in each class, so that the total number of Color classes will be as small as possible.

Indirect objective function:
max ∑^n-1_i=0 |C_i|²

One of the problems that you may have is that it may be actually very difficult to move from one feasible coloring to another one. You may be heavily constrained by the fact that you’re working in the space of feasible coloring. Once you are trying to keep the constraints feasible, you have to think of a neighborhood, that allows you to actually explore feasible coloring in a reasonable fashion. And not being stuck in a popular configuration where there is no way to move because you would violate at least one constraint.

Explore Both Feasible and Infeasible Configurations

Now we need an objective function that balances both the feasibility f and optimization O:

min w_f f + w_o O

where:
w_f and w_o are weights

Like the previous method, the search neighborhood is changing the color of a vertex. In order to express the new objective function, we need to introduce another concept called Bad edges. A bad edge is simply an edge whose adjacent vertices have the same color. B_i denotes the set of bad edges between vertices colored with i.

In the method, we will have to focus on both:

Decreasing the number of colors, like previous method, we have objective function max ∑^n-1_i=0 |C_i|²
Decreasing the violations on the constraints, we have objective function min ∑ⁿ_i=1 |B_i|

The combined objective function now becomes:

min ∑ⁿ_i=1 2 |C_i| |B_i| - ∑^n-1_i=0 |C_i|²

We multiply |B_i| with 2 |C_i|, it will ensure the local minima of this objective function will give you a feasible coloring. Why? Consider a coloring C₁, ..., C_k, also assume we have violation, i.e. B_i is not empty, so this particular configuration won’t be a local minima, because you can always improve it by changing the color of a single vertex.

Add an additional color k+1
Select an edge in B_i and change the color of one of its vertices from color i to color k+1

How does the objective function change?

	Before the move	After the move	Decreased by
Feasibility	`2 \|C_i\| \|B_i\|`	`2 (\|C_i\|-1) (\|B_i\|-1)`	`2 \|C_i\| + 2 \|B_i\| - 2 ≥ 2 \|C_i\|`
Optimization	`\|C_i\|²`	`(\|C_i\|-1)² + 1`	`2 \|C_i\| - 2`

Because 2 |C_i| - 2 is always smaller than 2 |C_i| by 2, so the objective function is always going to decrease by at least 2. So which basically means that if your coloring is not feasible (B_i is not empty), I always have a way to decrease the objective function by 2. Once you search terminates, you don’t have to worry. You will have a solution to your problem.

Heuristic

The best neighbor (randomization is very important)

function BestNeighbor(N, s)
  N_star := {n ∈ N | f(n) = min_s∈N f(s)}
  return n ∈ N_star with probability 1/|N_star|

The best improvement

function BestImprove(s)
  return LocalSearch(f, N, L, BestNeighbor);

First Neighbor

function FirstNeighbor(N, s)
  return n ∈ N in lexicographic order

First improvement

function FirstImprove(s)
  return LocalSearch(f, N, L, FirstNeighbor)

Multi-Stage

Multi-Stage is to avoid scanning entire (quadratic) neighborhood, but still keep a greedy flavor.

Max/Min-Conflict	Stage1: select the *variable* with the most violations Stage2: select the *value* with the fewest resulting violations
Min-Conflict	Stage1: randomly select a *variable* with some violations Stage2: select the *value* with the fewest resulting violations

Select a neighbor at random

function Random(N,s)
  select n∈N with probability 1/|N|
  if f(n)<f(s) then
    return n;
  else
    return s;

Metaheuristic

The goal of metaheuristic is to escape local minima, driving the search towards a global minimum.

Iterated Local Search is to execute multiple local search from different starting configuration, at the end of the day, you can get the configuration that give you the best solution you have found. This method can be combined with many other metaheuristics.

function IteratedLocalSearch(f, N, L, S)
  s := GenerateInitSolution();
  s_star := s;
  for k := 1 to MaxSearches do
    s := LocalSearch(f, N, L, S, s)
    if f(s) < f(s_star) then
      s_star := s;
    // you may want to have a new starting point
    s := GenerateNextSolution(s);
  return s_star

Metropolis Heuristic

Metropolis Heuristic is a method inspired by statistical physics. The idea is to accept a move if it improves the objective value, otherwise accept the move with some probability. The probability depends on how bad the move is.

function Metropolis(N, s)
  select n ∈ N with probability 1/|N|;
  if f(n) ≤ f(s) then
    return n;   # objective value improved
  else with probability exp[ -Δ / t]
    return n;
  else
    return s;

where
Δ: f(n)-f(s)
t: temperature

If Δ = f(n) - f(s) is very large, the probability of accepting a degrading move will be very small.
If the temperature t is large, the probability will be very large (but still ≤ 1), you are basically performing a random walk.

Simulated Annealing

Based on statistical physics, Annealing original means the heating and cooling schedules to produce crystals with few defects. In discrete optimization, Simulated Annealing is a method which uses Metropolis algorithm but adjust the temperature t dynamically:

Start with a high temperature (essentially a random walk)
Decrease the temperature progressively
When the temperature is low, you are essentially doing a local improvement search.

function SimulatedAnnealing(f, N)
  s := GenerateInitSolution();
  t := InitTemp(s);
  s_star := s;
  for k := 1 to MaxSearches do
    s := LocalSearch(f, N, L, Metropolis, s);
    if f(s) < f(s_star) then
      s_star := s;
    t := UpdateTemp(s, t);
  return s_star;

Simulated Annealing has a very interesting property: if neighborhood is connected, you are guaranteed to converge to the global optimum. But you need to use a very slow temperature cooling schedule, usually slower than exhaustive search. In practice, temperature cooling should be much faster, say linear progression t_k+1 = α t_k.

Tabu Search

The key abstract idea of Tabu Search is to maintain the sequence of nodes already visited, and you never want to go there again.

function LocalSearch(f, N, L, S, s_1)
  s_star := s_1;
  t := (s_1)     # the sequence of nodes visited
  for k := 1 to MaxTrials do
    if IsSatisfied(s) or f(s_k) < f(s_star) then
      s_star := s_k;
    # select best config that has not been visited
    s_k := S(L(N(s_k),t),t);
    t := (t, s_k)   # add node to the sequence
  return s_star;

But one of the basic issues is that it’s very expensive to maintain all these states that you been there. So what you can do is to use a Short-Term Memory, i.e. you only keep a small suffix of visited nodes (you visited recently). You may increase or decrease the size of the list dynamically:

Decrease when the selected nodes degrades the objective.
Increase when the selected nodes improves the objective.

When storing and comparing entire states are still too costly, you can store an abstraction of them, say swaps, moves or transitions. The abstraction can be designed either complex or simple.

At some point, you were in a really good state. And you thought you would find a very high quality solution. So what you can do is what is called Intensification: you keep very high-quality solutions and return them periodically. When you are stuck in this long walk where nothing really is improving, you can return to one of these good states and you restart the search from there.

If you have not seen this improvement for a long time, you can randomly change the value of some variables to diversify the current state, this is called Diversification.

In problems where you have both very difficult constraints and very difficult objective function, strategic oscillation is kind of scheme where you want to balance the time that you spend in the feasible and the infeasible region.

For more on Discrete Optimization: Local Search, please refer to the wonderful course here https://www.coursera.org/learn/discrete-optimization

Related Quick Recap

Discrete Optimization: Constraint Programming

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai