Filling a knapsack is an NP-hard optimization problem, it is widely believed that in the worst case it takes exponential time to solve. One of the targets in optimization, is to try to delay the exponential growth as late as possible, so we can solve the real practical problems. Often it is impossible to find the best solution is the humongous set of possibilities, we can lower our standard, we just want to find a high quality solution but only close to the best one.

Greedy Algorithms

To fill a knapsack, the simplest method is to choose the most valuable item and then the next most valuable items that can fit into the knapsack. The key idea on all the greedy algorithms is going to be the same – pick one item at a time in a greedy fashion. The only thing that’s going to differ is the meaning of greedy:

  1. More items – start with small ones and take as many as possible.
  2. Valuable items – start with the most valuable items.
  3. Value density – start with the items with the most dollar per kilogram.

The advantages of greedy algorithms are quick to design and implement, and fast to run. They can give you a first solution, which can be treated as a baseline. You can start doing better than it. However there are also shortcomings, for instance: there is no quality guarantees, and the quality of solution vary widely on the input.

Modeling the Problem

What you have to do first is always find a description of the problems that everybody can agree upon. The way of formalizing an optimization task as a mathematical model is the key for actually solving optimization problems.

Given a set of items I, each item i โˆˆ I is characterized by its weight wi and its value vi. Knapsack’s capacity is K. We want to find a subset of items in I, such that the subset has maximum value, meanwhile does not exceed the its capacity K.

To define an optimization model, 3 steps are needed:

  • First, we need to choose some decision variables, which tell you what are to decide.
  • Then we can express the problem constraints in terms of these decision variables. The constraints specify what is feasible.
  • Last we need to define the objective function, i.e. what you are trying to maximize or minimize.

The complete model of knapsack problem is:

max ฮฃiโˆˆI vi xi
s.t. ฮฃiโˆˆI wi xi โ‰ค K
     xi โˆˆ {0, 1}

xi : decision variables
wi : weight of item i
vi : value of item i
K : capability of knapsack

The number of all possible solution is 2|I|, when number of items is large, enumerating all possible solution and brute force search in the search space to find the best solution is impossible. We need other way to find optimal solution to the knapsack problems in reasonable time.

Dynamic Programming

Dynamic programming is a very widely used technique. It works very well for certain classes of problems, particularly computational biology, but sometimes it doesn’t work at all.

For the knapsack problem, let’s assume there are n items in the set I, so I = {1, 2, ..., n}. O(k, j) denotes the optimal solution for the knapsack problem of capacity k and first j items [1...j], and obviously we are interest in finding O(K, n). Now the knapsack problem can be expressed as:

max ฮฃiโˆˆ1...j vi xi
s.t. ฮฃiโˆˆi...j wi xi โ‰ค K
     xi โˆˆ {0, 1}

Now assume we know how to solve O(k, j-1) for all k in [0...K], we are considering one more item, and want to solve O(k, j). If the weight wj of that particular item j is lower than the capacity k, i.e. wj โ‰ค k, we have two choices:

  1. Do not select the item j, then the optimal solution is O(k, j-1).
  2. Select the item j, then the optimal solution is vj + O(k - wj, j-1).

So we can build this beautiful recurrence relationship:

O(k, j) = max( O(k, j-1) , vj + O(k - wj, j-1) ) if wj โ‰ค k
O(k, j) = O(k, j-1) otherwise

O(k, 0) = 0 for all k

If we write program in the top-down fashion, i.e. first compute O(k, j) and then compute O(k, j-1), the program will be very inefficient. However we can compute the recurrence bottom-up: O(k, 0) โ†’ O(k, 1) โ†’ O(k, 2) .... After calculating all of the optimal O(K, n), to get the solution, the only thing you’ve to do is trace back. If O(k, j) equals to O(k, j-1), you will know that item j was not selected.

Because in computer, only log(K) bits are required to represent K, this algorithm is in fact exponential in terms of input. This is algorithm is pseudo-polynomial, it is efficient only when K is small.

Branch and Bound

Branch and bound method is used in decision tree, trying to explore only a tiny part of it and finding the optimum solution. You will iterate two steps: branching and bounding.

BranchingSplit the problem into a number of subproblems, like in exhaustive search.
BoundingFind and optimistic estimate of the best solution to the subproblem.
Maximization: upper bound
Minimization: lower bound

How to find the optimistic estimate? We can linear relax the decision variables of the problem. Essentially the linear relaxation is not just very easy to compute, but also giving you a very good value for bounding. We will use linear relaxation in side the branch and bound algorithm. Linear relaxation can help prune a bigger part of decision tree, and the search space that you have to explore will be much smaller.

max ฮฃiโˆˆI vi xi
s.t. ฮฃiโˆˆI wi xi โ‰ค K
     xi โˆˆ {0, 1}

Relaxed to

max ฮฃiโˆˆI vi xi
s.t. ฮฃiโˆˆI wi xi โ‰ค K
     0 โ‰ค xi โ‰ค 1

Search Strategies

You can search that decision tree in many different ways: depth-first, best-first, least-discrepancy, and many others.

Depth-firstGo deep. When you have an optimistic evaluation of a note, you compare it to the best solutions that you have found so far, and if it’s worse, then you just throw the node’s entire sub-tree away.
Pretty memory efficient – you always have essentially one branch at any one time.
Best-frstExpend all the nodes, select the node with the best estimation.
In the worst case you have to store the entire tree in memory: exponential time + exponential space. On the other hand if you have the perfect evaluation, then
you will select the minimal number of notes.
Least-discrepancyAssuming you have a good heuristic, which makes few mistakes. The search tree is binary, and
1. branching left means following the heuristic
2. branching right means the heuristic is wrong

Least-discrepancy search wants to avoid mistake at all costs, we basically explore a search space by increasing number of mistakes, so what we are really doing is trusting the heuristic less and less, and exploring the search space in waves:
1. First wave, no mistake (always branching left)
2. Second wave, 1 mistake (always branching left, but 1 branching right)
3. Third wave, 2 mistakes (always branching left, but 2 branching right)

This method prunes search tree like the Depth-first, however, depending upon the way you implement the search, there is an trade off between space and time. It can be time efficient but take a lot of space. Or you can implement it by doing redundant work and it will take minimal space.

Relaxation and search is a match made in discrete optimization heaven. You have to find both the right relaxation, and the right part of the search tree that you will explore.

For more on Discrete Optimization: Knapsack Problems, please refer to the wonderful course here

Related Quick Recap

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at

Don't forget to sign up newsletter, don't miss any chance to learn.

Or share what you've learned with friends!