Previously we talked about constraint programming, which works with partial assignments: we try to extend them and to prune the search space as much as possible.
Another method, called local search, is very different. Simply put, local search moves a configuration from its current state to a state that is very close to it, i.e. it changes values a little bit, in the hope of getting closer to a solution. Local search works with complete assignments to the decision variables and modifies them. So local search is much more brutal: it will certainly violate some of the constraints, but we try to avoid and eliminate those violations as we proceed.
Types of Problems
The way of driving local search depends on the type of the problem:
- Satisfaction problems: start with an infeasible configuration, and move towards feasibility.
  - Transform the problem into an optimization problem.
  - Use the concept of violations: count how many constraints are violated by a configuration.
  - Minimize violations: move towards configurations with fewer violations.
- Pure optimization problems: start with sub-optimal solutions, and move towards an optimal configuration.
  - There are no constraints at all, but there is an objective function, for instance to minimize a cost or maximize some other quantity.
- Optimization problems with constraints:
  - You have both constraints and an objective. There will be different trade-offs in the complexity of the neighborhood, the complexity of the objective function, and the complexity of the search.
Formalization of Local Search
How do we define local moves? There are many choices; the simplest is to select a decision variable and change its value. Once you have defined all the moves you can take, you have to design a strategy for deciding which move to actually select and execute.
The Max/Min-Conflict heuristic has been very successful on a variety of problems: choose a variable that appears in a large number of violations, then change the value of that variable so that the number of violations decreases the most.
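As a concrete illustration, here is a minimal sketch of this heuristic on the classic n-queens problem. The one-queen-per-column encoding and all names are my own assumptions, not from the lecture; real implementations also add randomized tie-breaking to avoid cycling.

```python
import random

def conflicts(queens, col, row):
    """Count queens attacking square (row, col); one queen per column."""
    return sum(1 for c, r in enumerate(queens)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def max_min_conflict_search(n, max_steps=10000):
    """Max/Min-Conflict local search on n-queens."""
    queens = [random.randrange(n) for _ in range(n)]  # complete (likely infeasible) assignment
    for _ in range(max_steps):
        if all(conflicts(queens, c, queens[c]) == 0 for c in range(n)):
            return queens                             # no violations left: feasible solution
        # stage 1: pick the variable appearing in the most violations
        col = max(range(n), key=lambda c: conflicts(queens, c, queens[c]))
        # stage 2: pick the value that decreases the violations the most
        queens[col] = min(range(n), key=lambda r: conflicts(queens, col, r))
    return None
```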
When you have an objective function f that you want to optimize, the local moves define a neighborhood: the set of configurations that are close to the configuration you are considering right now. A key concept in local search is the local minimum. A configuration c is a local minimum with respect to a neighborhood N if:

∀ n ∈ N(c): f(n) ≥ f(c)
Local minima can be arbitrarily bad, or arbitrarily good. There are no guarantees of global optimality, and escaping local minima is a critical issue in local search.
Escaping Local Minima
When we move greedily inside a neighborhood, we are only guaranteed that the value of the configuration we find is a local minimum. But there may be something really good somewhere else, and you don't know how to get to it. Therefore, in practice, escaping local minima of poor quality is a very important issue. The concept of connectivity is used to reason about this: essentially, it means that you want to be able to go from any configuration S to at least one optimal solution O.
A neighborhood N is connected if, from every configuration S, some optimal solution O can be reached by a sequence of moves, i.e.

S = S0 → S1 → ... → Sn = O, where Si ∈ N(Si-1)
Connectivity does not guarantee optimality; it only means that there is a path of local moves from any configuration to an optimum, and we still need to find that path. In a connected neighborhood, you have more hope, and in some cases guarantees, of reaching the optimal solution.
Heuristic vs Metaheuristic
A heuristic essentially tells you how to choose the next neighbor using local information (the state s and its neighborhood). It drives the search towards a local minimum.
A metaheuristic is different: its goal is to escape local minima and drive the search towards a global minimum, typically using some memory or learning.
General Pattern
This is the general pattern we usually use in local search; the goal is to find suitable functions L and S.
| Symbol | Meaning |
| --- | --- |
| s | States, either solutions or configurations. |
| N(s) | Neighborhood of s. |
| L(N(s), s) | The set of legal neighbors that s can move to, usually conditioned on the value of the objective function. Local improvements: L(N, s) = {n ∈ N : f(n) < f(s)}. No degradation: L(N, s) = {n ∈ N : f(n) ≤ f(s)}. Potential degradation (consider all neighbors): L(N, s) = N. |
| S(L(N(s), s), s) | Selects one of the legal neighbors, e.g. greedy selection: S(L, s) = argmin_{n ∈ L} f(n). |
| f(s) | The objective function. |
function LocalSearch(f, N, L, S) {
s := GenerateInitSolution();
s_star := s;
for k := 1 to MaxTrials do
if IsSatisfied(s) or f(s) < f(s_star) then
s_star := s;
s := S(L(N(s),s),s);
return s_star;
}
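For concreteness, here is one way this pattern might look as runnable Python, instantiating L as local improvements and S as greedy selection. This is a sketch under my own assumptions; the toy objective and all names are illustrative.

```python
def local_search(f, neighbors, init, max_trials=1000):
    """Generic local search with L = improving neighbors, S = greedy selection."""
    s = init
    s_star = s                                             # best state found so far
    for _ in range(max_trials):
        legal = [n for n in neighbors(s) if f(n) < f(s)]   # L(N(s), s): local improvements
        if not legal:
            break                                          # no improving neighbor: local minimum
        s = min(legal, key=f)                              # S: greedy selection among legal neighbors
        if f(s) < f(s_star):
            s_star = s
    return s_star

# toy usage: minimize f(x) = (x - 3)^2 over the integers, moving by +/- 1
print(local_search(lambda x: (x - 3) ** 2, lambda x: [x - 1, x + 1], init=10))  # -> 3
```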
Swaps, K-OPT and Traveling Salesman
Sometimes, instead of assigning variables different values, we swap the values of two variables. The reason is that swapping can implicitly keep some constraints feasible at all times, for instance constraints on the demand for some kind of product. So we actually have two types of constraints:
- Hard constraints: always feasible or satisfied during the search
- Soft constraints: may be violated during the search
We sometimes don't just want to know whether a constraint is violated or not; instead we want to find out how much the constraint is violated.
The idea of K-OPT is to replace the notion of one favorable swap by the search for a favorable sequence of swaps. Do not search the entire set of sequences, and do not fix the number of moves k at the beginning; instead, build the sequence incrementally: explore the sequence with 2 moves (2-OPT), with 3 moves (3-OPT), and so on. Compare these sequences, find which one makes the best improvement, and execute it.
In the traveling salesman problem, this is how K-OPT works. Assume we already have a tour, but here we only show two of its segments, t1 → t2 and t4 → t3:
t1 -------> t2
            t3
            ↑
            t4
- Select one vertex, t1, which points to t2; so we actually selected the edge t1 → t2. This edge is long, so maybe we can find something shorter.
- From t2, we find that the edge t2 → t3 is shorter than t1 → t2. We consider removing t1 → t2 and adding t2 → t3.
- But t4 already has an edge pointing to t3, and we cannot let two edges point to t3, so we consider removing the edge t4 → t3.
- To make the tour complete again, we add the edge t1 → t4. Now we have a 2-OPT move, but we do not execute it yet: we compute the objective function at this point, and then continue to explore.
t1       t2
  \       ↙
   \     t3
    ↘
     t4
To explore further, we restart the process, this time using the edge t1 → t4, trying to find whether t4 can connect to another vertex with a smaller edge.
t1       t2
  \       ↙
t6  \    t3
 ↓    ↘
t5       t4
Say we find t5: the edge t4 → t5 is shorter than t1 → t4. So we remove t1 → t4 and add t4 → t5. The edge t6 → t5 is already in the tour, so we have to remove it, and finally we add t1 → t6 to complete the tour again. Now we have a 3-OPT move, but again we do not execute it yet: we compute the objective function at this point, and then continue to explore.
t1       t2
 ↓        ↙
t6       t3

t5  ←    t4
Now, using t1 → t6, you can explore further to find 4-OPT, 5-OPT, etc., computing the objective function for each of them, until you cannot continue the process. Then you look back at the various tours you have explored (2-OPT, 3-OPT, 4-OPT, ...) and select the best one. The key point is that you are not exploring the entire neighborhood of sequences.
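For comparison, here is the plain 2-OPT neighborhood as a runnable sketch: repeatedly reverse a segment of the tour whenever the two replaced edges are longer than the two new ones. The array-of-indices tour, Euclidean points, and names are my own assumptions; full K-OPT, as described above, builds sequences of such moves incrementally instead of executing the first favorable one.

```python
import math

def tour_length(tour, pts):
    """Total length of the closed tour over 2-D points."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    """Repeatedly apply improving 2-OPT moves until none is found."""
    improved = True
    while improved:
        improved = False
        for i in range(len(tour) - 1):
            for j in range(i + 2, len(tour)):
                # replace edges (a, b) and (c, d) by (a, c) and (b, d)
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % len(tour)]
                if (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                        < math.dist(pts[a], pts[b]) + math.dist(pts[c], pts[d])):
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])  # the 2-OPT move
                    improved = True
    return tour
```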
Optimization Under Constraints: Graph Coloring
Graph coloring is a very good example of an optimization problem under constraints. In a graph, there will be edges between some pairs of vertices. We want to color all vertices with the smallest number of colors, such that adjacent vertices are given different colors.
G = (N, E)
where:
  N = {0, ..., n-1} is the set of nodes
  E is the set of edges
There are two aspects:
- Optimization: Minimize the number of colors.
- Constraints: Two adjacent vertices must be colored differently.
How to combine the two aspects in local search? There are generally 3 ways:
- Reduce the optimization problem into a sequence of feasibility problems.
- Stay only in the space of feasible solutions, so no violations of the constraints.
- Consider both feasible and infeasible configurations.
Optimization as Feasibility
We are obsessed with feasibility, so we want to view the optimization problem as a sequence of feasibility problems. How to do it?
- Find an initial solution with k colors, say using a greedy algorithm.
- Remove one color, say the k-th color; most likely we will have violations immediately.
- Now the goal is to find a feasible solution with the remaining k-1 colors. How? Try to minimize the number of violations.
- Once we have a new feasible solution, repeat the process with fewer colors, until we cannot find a feasible solution anymore (see the sketch below).
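Here is a self-contained sketch of that loop. The edge-list graph representation, the simple min-conflict inner search, and all names are my own assumptions, not the course's implementation.

```python
import random

def violated(edges, coloring):
    """Edges whose two endpoints share a color."""
    return [(u, v) for u, v in edges if coloring[u] == coloring[v]]

def try_k_coloring(edges, nodes, k, max_steps=100000):
    """Min-conflict style search for a feasible k-coloring; None on failure."""
    coloring = {v: random.randrange(k) for v in nodes}
    for _ in range(max_steps):
        bad = violated(edges, coloring)
        if not bad:
            return coloring                        # feasible k-coloring found
        v = random.choice(random.choice(bad))      # endpoint of a random bad edge
        # value (color) with the fewest resulting violations
        coloring[v] = min(range(k), key=lambda col: sum(
            1 for a, b in edges if v in (a, b)
            and coloring[b if a == v else a] == col))
    return None

def solve_by_feasibility(edges, nodes, greedy_coloring):
    """View minimum coloring as a sequence of feasibility problems."""
    best = greedy_coloring                         # initial feasible solution
    k = len(set(best.values()))
    while k > 1:
        candidate = try_k_coloring(edges, nodes, k - 1)   # remove one color
        if candidate is None:
            return best                            # no feasible solution found: stop
        best, k = candidate, k - 1
    return best
```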
Stay in the Feasible Space
Another method is to assume that you always stay in the feasible space and focus on the objective function only: we want to minimize the number of colors.
Original objective function:

min max_{i ∈ 0...n-1} ci
s.t. ci ≠ cj for every edge (i, j) ∈ E

where:
  ci ∈ ℕ is the color assigned to vertex i
The search neighborhood is changing the color of a vertex. However, changing the color of one vertex typically does not change the number of colors, so we have to change the way we think about the objective function.
The new way of thinking about this problem is to introduce the concept of color classes: Ci is the set of vertices colored with color i. The new objective is to have as many vertices as possible in each class, so that the total number of color classes becomes as small as possible.

Indirect objective function:

max ∑i |Ci|²

(summing over all colors i). For instance, with 10 vertices, two classes of sizes 8 and 2 score 64 + 4 = 68, while two balanced classes of size 5 score only 50: squaring rewards large classes, which indirectly pushes towards fewer colors.
One problem you may have is that it can be very difficult to move from one feasible coloring to another; you are heavily constrained by working only in the space of feasible colorings. If you keep the constraints satisfied at all times, you have to design a neighborhood that lets you explore feasible colorings in a reasonable fashion, without getting stuck in a particular configuration where there is no way to move because every move would violate at least one constraint.
Explore Both Feasible and Infeasible Configurations
Now we need an objective function that balances both the feasibility part f and the optimization part O:

min wf · f + wo · O

where wf and wo are weights.

Like the previous method, the search neighborhood is changing the color of a vertex. To express the new objective function, we introduce another concept, bad edges: a bad edge is simply an edge whose two endpoints have the same color. Bi denotes the set of bad edges between vertices colored with color i.
In this method, we have to focus on both:

- Decreasing the number of colors; as in the previous method, we have the objective max ∑i |Ci|²
- Decreasing the violations of the constraints, with the objective min ∑i |Bi|

The combined objective function becomes:

min ∑i 2·|Ci|·|Bi| − ∑i |Ci|²
We multiply |Bi| by 2·|Ci|; this ensures that the local minima of this objective function are feasible colorings. Why? Consider a coloring C1, ..., Ck and assume it has a violation, i.e. some Bi is not empty. This configuration cannot be a local minimum, because you can always improve it by changing the color of a single vertex:
- Add an additional color k+1.
- Select an edge in Bi and change the color of one of its endpoints from color i to color k+1.
How does the objective function change?
|  | Before the move | After the move | Decreased by |
| --- | --- | --- | --- |
| Feasibility | 2·\|Ci\|·\|Bi\| | 2·(\|Ci\|−1)·(\|Bi\|−1) | 2\|Ci\| + 2\|Bi\| − 2 ≥ 2\|Ci\| |
| Optimization | \|Ci\|² | (\|Ci\|−1)² + 1 | 2\|Ci\| − 2 |
The optimization term decreases by 2·|Ci| − 2, which is smaller than the feasibility term's decrease of at least 2·|Ci|, so overall the objective function decreases by at least 2. This basically means that as long as your coloring is not feasible (some Bi is not empty), there is always a move that decreases the objective function by at least 2. So once the search terminates in a local minimum, you don't have to worry: you have a feasible solution to your problem.
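To make the bookkeeping concrete, here is a small sketch evaluating the combined objective. The edge-list and dict representations, and the tiny triangle instance, are my own illustrative choices.

```python
from collections import defaultdict

def combined_objective(edges, coloring):
    """min sum_i 2*|Ci|*|Bi| - sum_i |Ci|^2, over color classes i."""
    class_size = defaultdict(int)   # |Ci|: number of vertices with color i
    bad_edges = defaultdict(int)    # |Bi|: number of bad edges inside color i
    for v, c in coloring.items():
        class_size[c] += 1
    for u, v in edges:
        if coloring[u] == coloring[v]:
            bad_edges[coloring[u]] += 1
    return sum(2 * class_size[c] * bad_edges[c] for c in class_size) \
         - sum(class_size[c] ** 2 for c in class_size)

# triangle with two colors: one bad edge; moving an endpoint of the
# bad edge to a fresh color decreases the objective by at least 2
edges = [(0, 1), (1, 2), (0, 2)]
before = combined_objective(edges, {0: 0, 1: 0, 2: 1})   # bad edge (0, 1): -1
after  = combined_objective(edges, {0: 2, 1: 0, 2: 1})   # vertex 0 recolored: -3
assert after <= before - 2
```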
Heuristic
The best neighbor (randomization is very important)
function BestNeighbor(N, s)
  N_star := {n ∈ N | f(n) = min_{m ∈ N} f(m)};
  return n ∈ N_star with probability 1/|N_star|
The best improvement
function BestImprove(s)
return LocalSearch(f, N, L, BestNeighbor);
First Neighbor
function FirstNeighbor(N, s)
  return the first n ∈ N in lexicographic order
First improvement
function FirstImprove(s)
return LocalSearch(f, N, L, FirstNeighbor)
Multi-Stage
Multi-Stage heuristics avoid scanning the entire (often quadratic) neighborhood, while still keeping a greedy flavor.
| Heuristic | Stage 1 | Stage 2 |
| --- | --- | --- |
| Max/Min-Conflict | Select the variable with the most violations | Select the value with the fewest resulting violations |
| Min-Conflict | Randomly select a variable with some violations | Select the value with the fewest resulting violations |
Select a neighbor at random
function Random(N, s)
  select n ∈ N with probability 1/|N|;
  if f(n) < f(s) then
    return n;
  else
    return s;
Metaheuristic
The goal of a metaheuristic is to escape local minima and drive the search towards a global minimum.
Iterated Local Search executes multiple local searches from different starting configurations; at the end of the day, you keep the configuration that gave you the best solution found. This method can be combined with many other metaheuristics.
function IteratedLocalSearch(f, N, L, S)
s := GenerateInitSolution();
s_star := s;
for k := 1 to MaxSearches do
s := LocalSearch(f, N, L, S, s)
if f(s) < f(s_star) then
s_star := s;
// you may want to have a new starting point
s := GenerateNextSolution(s);
return s_star
Metropolis Heuristic
The Metropolis heuristic is a method inspired by statistical physics. The idea is to accept a move if it improves the objective value, and otherwise to accept it with some probability that depends on how bad the move is.
function Metropolis(N, s)
  select n ∈ N with probability 1/|N|;
  if f(n) ≤ f(s) then
    return n;                      # objective value improved (or unchanged)
  else with probability exp(-Δ / t)
    return n;                      # accept a degrading move
  else
    return s;

where:
  Δ = f(n) - f(s)
  t: the temperature
- If Δ = f(n) - f(s) is very large, the probability of accepting the degrading move is very small.
- If the temperature t is large, the probability is large (but still ≤ 1); you are basically performing a random walk. (A tiny numeric illustration follows.)
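To get a feel for these probabilities, here is a small numeric illustration; the values are my own, chosen only to show the two regimes.

```python
import math

def accept_probability(delta, t):
    """Probability of accepting a move that degrades the objective by delta at temperature t."""
    return math.exp(-delta / t)

print(accept_probability(2, 1))    # ~0.135: low temperature, degrading moves rarely accepted
print(accept_probability(2, 10))   # ~0.819: high temperature, close to a random walk
```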
Simulated Annealing
Rooted in statistical physics, annealing originally refers to the heating and cooling schedules used to produce crystals with few defects. In discrete optimization, Simulated Annealing is a method that uses the Metropolis algorithm but adjusts the temperature t dynamically:
- Start with a high temperature (essentially a random walk)
- Decrease the temperature progressively
- When the temperature is low, you are essentially doing a local improvement search.
function SimulatedAnnealing(f, N)
s := GenerateInitSolution();
t := InitTemp(s);
s_star := s;
for k := 1 to MaxSearches do
s := LocalSearch(f, N, L, Metropolis, s);
if f(s) < f(s_star) then
s_star := s;
t := UpdateTemp(s, t);
return s_star;
Simulated Annealing has a very interesting property: if the neighborhood is connected, you are guaranteed to converge to the global optimum. But this requires a very slow cooling schedule, usually slower than exhaustive search. In practice, the temperature should cool much faster, say with a geometric schedule t_{k+1} = α · t_k (0 < α < 1).
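A compact runnable sketch of the whole scheme, with Metropolis acceptance and a geometric cooling schedule; all parameter values and names are illustrative assumptions.

```python
import math
import random

def simulated_annealing(f, neighbors, s, t=10.0, alpha=0.95, steps=10000):
    """Metropolis acceptance with geometric cooling t_{k+1} = alpha * t_k."""
    s_star = s                                          # best state found so far
    for _ in range(steps):
        n = random.choice(neighbors(s))                 # random neighbor
        delta = f(n) - f(s)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            s = n                                       # accept improving or lucky degrading move
        if f(s) < f(s_star):
            s_star = s
        t *= alpha                                      # cool down progressively
    return s_star
```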
Tabu Search
The key abstract idea of Tabu Search is to maintain the sequence of nodes already visited, and never to visit them again.
function LocalSearch(f, N, L, S, s1)
  s := s1;
  s_star := s1;
  t := (s1);                       # the sequence of nodes visited so far
  for k := 1 to MaxTrials do
    if IsSatisfied(s) or f(s) < f(s_star) then
      s_star := s;
    # select the best configuration that has not been visited yet
    s := S(L(N(s), t), t);
    t := (t, s);                   # add the new node to the sequence
  return s_star;
But one of the basic issues is that it is very expensive to maintain all the states you have visited. What you can do instead is use a short-term memory, i.e. keep only a small suffix of the visited nodes (those visited recently). You may increase or decrease the size of this list dynamically:
- Decrease it when the selected node degrades the objective.
- Increase it when the selected node improves the objective.
When storing and comparing entire states is still too costly, you can store an abstraction of them instead, say swaps, moves or transitions. The abstraction can be designed to be complex or simple.
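One common concrete realization of this move-based short-term memory is a fixed-length list of recent moves. In the sketch below, the deque tenure and the moves/apply_move interface are my own assumptions.

```python
from collections import deque

def tabu_search(f, moves, apply_move, s, tenure=7, max_iters=1000):
    """Tabu search over move abstractions: recently applied moves are forbidden."""
    tabu = deque(maxlen=tenure)      # short-term memory: suffix of recent moves
    s_star = s
    for _ in range(max_iters):
        candidates = [m for m in moves(s) if m not in tabu]
        if not candidates:
            break
        m = min(candidates, key=lambda m: f(apply_move(s, m)))  # best non-tabu move
        s = apply_move(s, m)
        tabu.append(m)               # oldest move drops off automatically
        if f(s) < f(s_star):
            s_star = s
    return s_star
```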
At some point you were in a really good state and thought you would find a very high-quality solution nearby. What you can do then is called intensification: keep very high-quality solutions and return to them periodically. When you are stuck in a long walk where nothing improves, you can go back to one of these good states and restart the search from there.
If you have not seen an improvement for a long time, you can randomly change the values of some variables to diversify the current state; this is called diversification.
In problems with both very difficult constraints and a very difficult objective function, strategic oscillation is a scheme that balances the time spent in the feasible and infeasible regions.
For more on Discrete Optimization: Local Search, please refer to the wonderful course here https://www.coursera.org/learn/discrete-optimization
I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai