The Main Search and Sort Algorithms

Finding what you need, in the order you need it!

James Collerton
10 min read · Dec 29, 2021
Finding the right solutions!

Audience

This article is aimed at developers looking to spruce up their knowledge of the basic searching and sorting algorithms. It requires some knowledge of Java, and at least a rudimentary understanding of where these types of problems might arise.

We hope to cover:

  • Shortest Path Algorithms: Dijkstra’s and A* algorithm.
  • O(n^2) Sort Algorithms: Insertion, bubble and selection sort.
  • O(n log(n)) Sort Algorithms: Merge sort and quick sort.
  • Topological sort.
  • Complex sorting algorithms: Heapsort, radix sort, bucket sort and counting sort.

Each of these will be covered in a level of detail reflective of how commonly they are used. Note: the following are also important algorithms; however, they have been covered in my previous blog posts. The requisite links are below.

Argument

Shortest Path Algorithms

If you’re not familiar with graphs, then I would suggest reading the article here to give yourself a bit of a refresher.

Shortest path algorithms focus on finding the path between two nodes, while minimising the sum of the weights of the edges of the path. This can be done for directed, undirected or mixed graphs.

There are actually several types of shortest path algorithm:

  1. Single-source shortest path problem: Shortest path from a given node to all other nodes.
  2. Single-destination shortest path problem: Shortest path from all nodes to a given node.
  3. All-pairs shortest path problem: Shortest paths between all nodes.

We’re mainly going to focus on type 1.

Enter Dijkstra’s algorithm.

Pretend we have the graph below, where we are starting at A and trying to get to E.

An example graph

What we are looking to do is to generate a table similar to the one below.

Dijkstra’s Table

Our table has three columns: the vertex in question, the shortest known distance to it from our starting vertex, and the previous vertex on that shortest path.

It’s really important to understand that the shortest route to a node v goes through one of the nodes connected to v: it is the minimum, over those neighbours, of the shortest route to the neighbour plus the weight of the edge into v. For example the shortest route to get to D is equal to the minimum of:

  1. Shortest route to get to B plus the B-D edge weight
  2. Shortest route to get to C plus the C-D edge weight.

This is because B and C are the only nodes connected to D, other than E, which is our target!

Hopefully this will become clearer with an example. For the algorithm we need two lists: visited and unvisited. We also initialise our table as below.

Initialising our table of values

We repeat the below steps to carry out Dijkstra’s algorithm:

  1. Visit the unvisited node with the smallest known distance from the start node.
  2. For the current node examine its unvisited neighbours.
  3. Calculate the distance of each neighbour to the start node where the path goes through the current node (this is the same as the current node’s minimum distance to the start node + the edge weight to the neighbour node).
  4. If the calculated distance is less than the known distance, update the shortest distance in the neighbour node.
  5. If we have updated the node’s shortest distance, update the previous vertex of the updated node too. This is because we will later step back through the current node to find the shortest route!
  6. Add the current node to the list of visited nodes.

Let’s see how that might work in our example.

We start at node A (1), and examine B and C (2). We calculate each one’s distance to the start node (3): B’s distance is 2, C’s distance is 7. We update both of these in the table as they are less than infinite (4), and set the previous node to A (5). We then add A to our list of visited nodes (6). We start again from B (the closest unvisited node).

Let’s code this up (a useful article is here).
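
Something like the below Java sketch captures the idea. This is my own minimal version rather than battle-tested code: the Edge and NodeDist records and the adjacency-list graph representation are illustrative assumptions. The comments map back to the numbered steps above.

    import java.util.*;

    public class Dijkstra {

        // An outgoing edge: the neighbour it points to and its weight.
        record Edge(String to, int weight) {}

        // A queue entry: a node and a candidate distance from the start.
        record NodeDist(String node, int dist) {}

        static Map<String, Integer> shortestDistances(
                Map<String, List<Edge>> graph, String start) {

            Map<String, Integer> distance = new HashMap<>(); // shortest-distance column
            Map<String, String> previous = new HashMap<>();  // previous-vertex column
            Set<String> visited = new HashSet<>();

            PriorityQueue<NodeDist> queue =
                    new PriorityQueue<>(Comparator.comparingInt(NodeDist::dist));
            distance.put(start, 0);
            queue.add(new NodeDist(start, 0));

            while (!queue.isEmpty()) {
                NodeDist current = queue.poll();            // 1. closest unvisited node
                if (!visited.add(current.node())) continue; // skip stale queue entries

                for (Edge edge : graph.getOrDefault(current.node(), List.of())) {
                    if (visited.contains(edge.to())) continue;       // 2. unvisited neighbours
                    int viaCurrent = current.dist() + edge.weight(); // 3. distance via current
                    if (viaCurrent < distance.getOrDefault(edge.to(), Integer.MAX_VALUE)) {
                        distance.put(edge.to(), viaCurrent);         // 4. better distance found
                        previous.put(edge.to(), current.node());     // 5. update previous vertex
                        queue.add(new NodeDist(edge.to(), viaCurrent));
                    }
                }                                           // 6. current is now visited
            }
            return distance;
        }
    }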

A caveat to Dijkstra’s algorithm is that it only works when the edge weights are non-negative. If any are negative we need to use the Bellman–Ford algorithm.

For the eagle-eyed amongst you, you may notice an issue with Dijkstra. It has no idea whether you’re getting closer to your goal, only how far you have travelled. You might be covering only a small distance, but heading in entirely the wrong direction!

This is the motivation behind the A* algorithm. It is similar in its aims, but includes a heuristic element: it estimates how promising each route is so far, and uses that to decide on the best next step.

The heuristic that decides ‘best’ can be defined in a number of ways. Let’s revisit the old graph, but also add a measure of how close we are to the end.

Our previous graph, with added heuristic

We’ve defined our heuristic as how many nodes we are away from our end node, E. Now, instead of ordering nodes by the distance they have travelled so far, we order by that distance + our heuristic. This means that nodes further from the end node will be further down our queue (in our example we didn’t have a queue; we instead used the getNextNode method).
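
To make that concrete, here is a minimal Java sketch in the same style as the Dijkstra one. The heuristic map (estimating each node’s distance to the goal) is an assumption on my part, and the sketch assumes a consistent heuristic, so the first time we pull the goal off the queue we are done.

    import java.util.*;

    public class AStar {

        record Edge(String to, int weight) {}

        // A queue entry: the node, the real distance travelled so far (g),
        // and the priority f = g + h, where h is the heuristic estimate.
        record Entry(String node, int distSoFar, int priority) {}

        static int shortestDistance(Map<String, List<Edge>> graph,
                                    Map<String, Integer> heuristic,
                                    String start, String goal) {
            Set<String> visited = new HashSet<>();

            // Unlike Dijkstra, the queue orders on distance + heuristic.
            PriorityQueue<Entry> queue =
                    new PriorityQueue<>(Comparator.comparingInt(Entry::priority));
            queue.add(new Entry(start, 0, heuristic.getOrDefault(start, 0)));

            while (!queue.isEmpty()) {
                Entry current = queue.poll();
                // With a consistent heuristic, polling the goal is optimal.
                if (current.node().equals(goal)) return current.distSoFar();
                if (!visited.add(current.node())) continue;

                for (Edge edge : graph.getOrDefault(current.node(), List.of())) {
                    if (visited.contains(edge.to())) continue;
                    int g = current.distSoFar() + edge.weight();
                    queue.add(new Entry(edge.to(), g,
                            g + heuristic.getOrDefault(edge.to(), 0)));
                }
            }
            return -1; // goal unreachable
        }
    }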

n-squared Sort Algorithms

The previous section covered our searching algorithms; now we’re going to look into sorting.

The first one we will look at is ‘selection sort’. This is probably one of the most intuitive sorting algorithms. Let’s examine the below diagram.

Selection sort

The purpose of the pointer is to mark where we have sorted up to. In our first pass we find the minimum of the array. We then swap it out with the first element and move the pointer along. This is done as we know everything before the pointer is sorted!

We continually do this with the rest of the array. An implementation is below.
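
A minimal sketch, sorting an int array in place:

    // Selection sort: `i` marks where the sorted prefix ends. Each pass finds
    // the minimum of the unsorted remainder and swaps it into position.
    static void selectionSort(int[] array) {
        for (int i = 0; i < array.length - 1; i++) {
            int minIndex = i;
            for (int j = i + 1; j < array.length; j++) {
                if (array[j] < array[minIndex]) minIndex = j; // new minimum found
            }
            int temp = array[i];                              // swap into sorted prefix
            array[i] = array[minIndex];
            array[minIndex] = temp;
        }
    }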

The next algorithm is bubble sort. Again, let’s clarify with a diagram.

Bubble sort

Bubble sort is slightly different. We start off at the beginning and compare adjacent pairs. We swap a pair if the left hand item is larger than the right hand one. We can see the largest item ‘bubbles’ up to the top.

Once this is done we start again. However, now we can ignore the last item as we know it is in the right place! An implementation is below.
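
Again, a minimal in-place sketch:

    // Bubble sort: each pass bubbles the largest remaining item to the end,
    // so the comparison window shrinks by one every time.
    static void bubbleSort(int[] array) {
        for (int end = array.length - 1; end > 0; end--) {
            for (int i = 0; i < end; i++) {
                if (array[i] > array[i + 1]) { // left larger than right: swap
                    int temp = array[i];
                    array[i] = array[i + 1];
                    array[i + 1] = temp;
                }
            }
        }
    }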

Next we cover insertion sort. Although it has O(n^2) complexity, in practical situations it is often slightly more performant than selection and bubble sort.

Insertion sort works by dividing our list into two parts: sorted and unsorted. We will keep our sorted part of the list to the left, and our unsorted part to the right, using a pointer to track where in the list we are. We expand using a diagram below.

Insertion sort

Quick note: the initial order of the list has changed to better demonstrate the algorithm. With insertion sort we have a pointer that moves along the array, and a temporary variable to store the value at that pointer. We shift the elements greater than the temporary value (but before the pointer) one place along, before placing the temporary value in its rightful spot.
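
A minimal sketch of the idea:

    // Insertion sort: `temp` holds the value at the pointer. Larger elements
    // in the sorted prefix are shifted right, then temp drops into the gap.
    static void insertionSort(int[] array) {
        for (int pointer = 1; pointer < array.length; pointer++) {
            int temp = array[pointer];
            int i = pointer - 1;
            while (i >= 0 && array[i] > temp) { // shift larger elements along
                array[i + 1] = array[i];
                i--;
            }
            array[i + 1] = temp;                // place temp in its rightful spot
        }
    }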

n-log(n) Sort Algorithms

The next sort algorithms are slightly more complex, but also more efficient. We begin with merge sort.

Merge sort hinges on the idea that we can create a sorted array using two sorted halves of the same array. Consider the below diagram.

Merging two half arrays to make a complete one

Here we take the first element of the second array as it is the lowest. We then take the next three elements of the first array and insert them afterwards, iterating through both until we reach the end.

Given two sorted subarrays, we can make a sorted array. However, how do we sort the subarrays?

This is the role of recursion. We split the array continually until it contains only a single element. A single element is a sorted subarray! We can recurse back up, following the same pattern we defined earlier.

Splitting an original, unsorted array into subarrays for sorting!

At this point, the clearest explanation is through an implementation.
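
A minimal sketch, copying the two halves for clarity (these copies are where the O(n) space mentioned below comes from):

    import java.util.Arrays;

    public class MergeSort {

        static void mergeSort(int[] array) {
            if (array.length < 2) return;       // a single element is sorted
            int mid = array.length / 2;
            int[] left = Arrays.copyOfRange(array, 0, mid);
            int[] right = Arrays.copyOfRange(array, mid, array.length);
            mergeSort(left);                    // recursively sort each half
            mergeSort(right);
            merge(array, left, right);          // combine the sorted halves
        }

        // Merge two sorted halves by repeatedly taking the smaller head element.
        static void merge(int[] array, int[] left, int[] right) {
            int l = 0, r = 0, i = 0;
            while (l < left.length && r < right.length) {
                array[i++] = (left[l] <= right[r]) ? left[l++] : right[r++];
            }
            while (l < left.length) array[i++] = left[l++];
            while (r < right.length) array[i++] = right[r++];
        }
    }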

We now move onto quicksort. Merge sort has an O(n log(n)) worst case running time, with an O(n) space requirement. Quicksort has an O(n log(n)) average case running time, with a worst case running time of O(n^2). However, it sorts in place, so it only needs O(1) extra space (ignoring the recursion stack).

We can use a randomisation step to improve the likelihood that we get an O(n log(n)) runtime for quicksort.

This algorithm works in a similar fashion to merge sort, in that it uses recursion. We first select a pivot, then put all of the items lower than the pivot to the left, and all items greater than the pivot to the right.

Choosing a pivot and moving items to the left and the right.

We then recursively do this, choosing pivots to the left and the right. It’s easier to explain the technique through the implementation.
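
A minimal sketch, using the Lomuto partition scheme with a random pivot (one common way of doing the randomisation step mentioned above). Call it as quickSort(array, 0, array.length - 1):

    import java.util.concurrent.ThreadLocalRandom;

    public class QuickSort {

        static void quickSort(int[] array, int low, int high) {
            if (low >= high) return;
            int pivotIndex = partition(array, low, high);
            quickSort(array, low, pivotIndex - 1);   // recurse left of the pivot
            quickSort(array, pivotIndex + 1, high);  // recurse right of the pivot
        }

        // Move items smaller than the pivot to its left; return the pivot's index.
        static int partition(int[] array, int low, int high) {
            int random = ThreadLocalRandom.current().nextInt(low, high + 1);
            swap(array, random, high);               // randomisation step
            int pivot = array[high];
            int boundary = low;                      // first index of the >= pivot region
            for (int i = low; i < high; i++) {
                if (array[i] < pivot) swap(array, i, boundary++);
            }
            swap(array, boundary, high);             // pivot into its final position
            return boundary;
        }

        static void swap(int[] array, int a, int b) {
            int temp = array[a]; array[a] = array[b]; array[b] = temp;
        }
    }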

Topological sort

The final sort we will cover is for sorting graphs. A lot of this section is based on the excellent article here.

Many real-world problems can be represented using directed graphs. A popular example is school class prerequisites (if I have to take class A before class B). Let’s imagine we have the below prerequisites.

Class prerequisites

Imagine we want to take class E. To do so we need to take class C and class D. In order to take class C we need to take classes A and B. This gives us a sort of implicit ordering.

Topological sort gives us a non-unique topological ordering in O(V + E) time. Essentially this means we can align all of the nodes, such that the edges are always pointing to the right.

Visually demonstrating topological sort

Not all graphs have a topological ordering. Any graph with a directed cycle won’t have one, as there is no valid node to start from! We must have a directed acyclic graph (DAG).

The topological sort algorithm at a top level is:

  1. Find any unvisited node.
  2. Starting with this node, do a depth first search (DFS) through all unvisited nodes.
  3. As we recurse back up, add each finished node to the front of the ordering (building it in reverse).

How does this work for our example? We start with node D. We then visit E. As there are no more connected nodes we get E-D, then reverse it to add D-E to our topological ordering.

We pick another node, B. We then visit C, but don’t go through to E as we have seen it. C finishes first, so we get C-B, which we then reverse to get B-C and add to the front of our ordering, giving us B-C-D-E.

Finally we visit the only unvisited node, A, which is reversed (doing nothing), and added to the start of our ordering: A-B-C-D-E.

Our implementation is below:
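
This is a minimal DFS-based sketch of my own: it assumes the input is a DAG (no cycle detection) given as an adjacency list from each node to the nodes it points at.

    import java.util.*;

    public class TopologicalSort {

        static List<String> topologicalSort(Map<String, List<String>> graph) {
            Set<String> visited = new HashSet<>();
            Deque<String> ordering = new ArrayDeque<>();
            for (String node : graph.keySet()) {    // 1. find any unvisited node
                if (!visited.contains(node)) dfs(node, graph, visited, ordering);
            }
            return new ArrayList<>(ordering);
        }

        static void dfs(String node, Map<String, List<String>> graph,
                        Set<String> visited, Deque<String> ordering) {
            visited.add(node);
            for (String next : graph.getOrDefault(node, List.of())) { // 2. DFS
                if (!visited.contains(next)) dfs(next, graph, visited, ordering);
            }
            ordering.addFirst(node); // 3. prepend on the way back up (the reversal)
        }
    }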

An alternative is Kahn’s algorithm.

Complex sorting algorithms

There are more complex sorting algorithms which are probably beyond the scope of this article. However, we can at least list them and roughly how they work. I would recommend doing some googling of your own to find full implementations!

  • Heapsort: An improved selection sort that uses a heap, with a worst case of O(n log(n)).
  • Radix sort: Good for lexicographic sorting. Complexity is O(d*(n+b)), d being the maximum number of digits in the given list, n the number of elements in the list, and b the base used, which is normally base 10 for decimal representation. This makes more sense on looking at the algorithm.
  • Bucket sort: Divides unsorted arrays into several buckets. Each bucket is sorted by using another sorting algorithm, or by recursively applying the same bucket algorithm. Average complexity of O(n+k).
  • Counting sort: A technique based on keys in a known range. It counts the number of objects having each key and does some arithmetic to calculate their positions, giving a complexity of O(n+k), where n is the number of elements in the input array and k is the range of the input. A minimal sketch follows this list.
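
Counting sort is compact enough to sketch here, assuming non-negative integer keys and a known maximum value:

    // Counting sort: count each key, then prefix-sum the counts so counts[v]
    // holds the number of elements <= v, i.e. one past v's last position.
    static int[] countingSort(int[] array, int maxValue) {
        int[] counts = new int[maxValue + 1];
        for (int value : array) counts[value]++;      // count each key

        for (int v = 1; v <= maxValue; v++) counts[v] += counts[v - 1];

        int[] sorted = new int[array.length];
        for (int i = array.length - 1; i >= 0; i--) { // backwards keeps it stable
            sorted[--counts[array[i]]] = array[i];
        }
        return sorted;
    }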

Search and Sort Cheatsheet
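
Roughly, using the standard complexity figures for the algorithms we have covered:

    Algorithm          Average time      Worst time        Space
    Dijkstra / A*      O((v+e) log(v))   O((v+e) log(v))   O(v)
    Selection sort     O(n^2)            O(n^2)            O(1)
    Bubble sort        O(n^2)            O(n^2)            O(1)
    Insertion sort     O(n^2)            O(n^2)            O(1)
    Merge sort         O(n log(n))       O(n log(n))       O(n)
    Quicksort          O(n log(n))       O(n^2)            O(1) (in place)
    Heapsort           O(n log(n))       O(n log(n))       O(1)
    Radix sort         O(d*(n+b))        O(d*(n+b))        O(n+b)
    Bucket sort        O(n+k)            O(n^2)            O(n+k)
    Counting sort      O(n+k)            O(n+k)            O(n+k)
    Topological sort   O(v+e)            O(v+e)            O(v)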

* k is the number of buckets in bucket based algorithms.
* n is the number of array items.
* v, e are the number of vertices and edges in a graph.

Conclusion

In conclusion we have, admittedly somewhat randomly, covered a selection of searching and sorting algorithms!


James Collerton

Senior Software Engineer at Spotify, Ex-Principal Engineer at the BBC