How I Would Design… Suggested Search!

A System Design Demonstration

James Collerton
7 min read · Nov 12, 2021
Looking around

Audience

This article is the next in my series of how I would design popular applications. It is recommended (although not entirely necessary) to read the previous posts, which I’ve helpfully compiled in a list here. I will assume a basic familiarity with architecture principles and AWS, but hopefully this post is approachable for most engineers.

Argument

Initially, let’s look at our problem statement.

The System to Design

We are hoping to design a searching autosuggest feature. No doubt you will have seen these systems in practice on Google, or most websites these days. An example is below.

Answers on the back of a postcard!

The requirements of our system are:

  1. The service should return a list of the top 3 suggestions based on what is typed.
  2. The suggestions are ordered according to how recently and frequently they have previously been searched.

The usual non-functional criteria are also present. The system must be scalable, resilient and available.

The Approach

We have a standard approach to system design which is explained more thoroughly in the article here. However, the steps are summarised below:

  1. Requirements clarification: Making sure we have all the information before starting. This may include how many requests or users we are expecting.
  2. Back of the envelope estimation: Doing some quick calculations to gauge the necessary system performance. For example, how much storage or bandwidth do we need?
  3. System interface design: What will our system look like from the outside, how will people interact with it? Generally this is the API contract.
  4. Data model design: What our data will look like when we store it. At this point we could be thinking about relational vs non-relational models.
  5. Logical design: Fitting it together in a rough system! At this point I’m thinking at a level of ‘how would I explain my idea to someone who knows nothing about tech?’
  6. Physical design: Now we start worrying about servers, programming languages and the implementation details. We can superimpose these on top of the logical design.
  7. Identify and resolve bottlenecks: At this stage we will have a working system! We now refine the design.

With that said, let’s get stuck in!

Requirements Clarification

Initially I would be wondering how many people are going to be using our autosuggest system, the average length of a search, and the average length of a result. For argument’s sake let’s say a billion searches are made a day, with an average length of ten characters, returning results of twenty characters.

In this specific example it doesn’t matter too much what the exact figures are; the core issue is that we are dealing with a lot of traffic!

Back of the Envelope Estimation

If we have a billion searches a day, each ten characters long (at one byte per character), then we will receive 10GB of searches and return 20GB of results every 24 hours, which is a lot!
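
These figures can be sanity-checked with some quick arithmetic (assuming one byte per character, and traffic spread evenly across the day):

```python
# Rough back-of-the-envelope numbers, assuming one byte per character.
SEARCHES_PER_DAY = 1_000_000_000
AVG_SEARCH_CHARS = 10
AVG_RESULT_CHARS = 20

search_bytes = SEARCHES_PER_DAY * AVG_SEARCH_CHARS  # 10 GB of searches
result_bytes = SEARCHES_PER_DAY * AVG_RESULT_CHARS  # 20 GB of results

print(f"Searches: {search_bytes / 10**9:.0f} GB/day")
print(f"Results:  {result_bytes / 10**9:.0f} GB/day")

# Also useful: requests per second, assuming evenly spread traffic.
rps = SEARCHES_PER_DAY / (24 * 60 * 60)
print(f"Roughly {rps:,.0f} requests/second")
```

In reality traffic would peak at certain times of day, so we would provision for well above that average rate.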

System Interface Design

The main place we need to interface with the system is in making a search. Each time a key is pressed we can send a request to a /search?prefix=<typed search> or similar endpoint. We would then return the regular 2XX, 4XX and 5XX response codes.

However, as we will see later, it is also useful to have a separate endpoint for when the user hits enter and completes the search, which we will define as /search?term=<typed search> with the usual response codes.

Data Model Design

This is what the majority of this particular problem hinges on. To address it we need to introduce a new data structure, the trie. A lot of this section is copied from my previous article you can find here.

Tries are a type of tree data structure: like any tree, a trie contains nodes, edges, a root, and parent-child relationships. However, a trie acts as a key-value data store. The key is always a string, but the value can be anything. In fact, the value can be nothing at all! Tries can also be used to store a unique list of keys.

Let’s explore this concept with an example.

A dictionary of last names to first names

In the above diagram we have a trie for storing dictionaries of last names to first names. We can see it contains a single last name (Smith), and a single first name (John). Let’s add another two people, Jane Smith and Agon Smakaj.

Adding more names to our trie

We share the first two letters of the key, and from there branch out. It’s reasonably easy to see how this can be expanded into other applications.

One common use of this pattern is for primary keys in data stores. Additionally we may use it to search values by a prefix (looking useful?). If we were looking for names of people with last name beginning ‘Sm’, we could use this structure to retrieve all three of our entries. Often as tries expand we will limit these retrievals to a set number, say the first m.
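
A minimal sketch of such a trie in Python might look like the below. The class names are my own, and a production structure would also handle deletion, concurrency and so on:

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> child node
        self.values = []    # values stored at a complete key

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.values.append(value)

    def search_prefix(self, prefix, limit=None):
        """Return up to `limit` values whose keys start with `prefix`."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every value in the subtree below the prefix node.
        results, stack = [], [node]
        while stack:
            current = stack.pop()
            results.extend(current.values)
            stack.extend(current.children.values())
        return results if limit is None else results[:limit]

trie = Trie()
trie.insert("Smith", "John")
trie.insert("Smith", "Jane")
trie.insert("Smakaj", "Agon")
print(trie.search_prefix("Sm"))  # all three entries
```

The `limit` parameter is the “first m” cut-off mentioned above: without it, a short prefix in a large trie could return an enormous subtree.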

Hybrid Tries

From the previous diagrams we may notice that this structure is most useful when our keys share a lot of common letters. If this is not the case then our tree becomes too spread out and stops being helpful.

To circumvent this issue we may use a hybrid structure.

Using a hybrid structure to incrementally expand trees

We set a threshold for our tree, and continue adding strings to a node until we reach this threshold. At this point we expand the node to a more traditional trie structure and carry on as before.
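
A sketch of the hybrid idea, with an illustrative threshold of three keys per unexpanded node:

```python
class HybridNode:
    """Stores whole keys in a bucket until THRESHOLD is exceeded,
    then expands into child nodes keyed by the next character."""
    THRESHOLD = 3  # illustrative; tune for real workloads

    def __init__(self, depth=0):
        self.depth = depth
        self.bucket = []      # unexpanded keys
        self.children = None  # dict of char -> HybridNode once expanded

    def insert(self, key):
        if self.children is None:
            self.bucket.append(key)
            if len(self.bucket) > self.THRESHOLD:
                self._expand()
        else:
            self._insert_child(key)

    def _expand(self):
        self.children = {}
        pending, self.bucket = self.bucket, []
        for key in pending:
            self._insert_child(key)

    def _insert_child(self, key):
        if len(key) <= self.depth:
            self.bucket.append(key)  # key terminates at this node
            return
        ch = key[self.depth]
        child = self.children.setdefault(ch, HybridNode(self.depth + 1))
        child.insert(key)
```

Small clusters of keys stay cheap to store, and the trie only fans out where there is genuinely enough data to justify it.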

Using a Trie

So how can we employ this for our problem? Let’s examine the below.

An example search trie

Each of the leaves represents a complete word, along with its hit count. From left to right we have abacus, actor, acting and adder. Each incomplete word stores the top three words below it for quick retrieval.

If someone types the letter A we go to the first node, grabbing actor, acting and adder. If they then typed a C we would move to the next node and grab actor and acting. Easy!
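
To make this concrete, here is a small sketch of a trie that caches the top three (count, word) pairs at every prefix node. The class names and hit counts are illustrative:

```python
import heapq

class SuggestNode:
    def __init__(self):
        self.children = {}
        self.top = []  # up to 3 (count, word) pairs cached at this prefix

class SuggestTrie:
    TOP_K = 3

    def __init__(self):
        self.root = SuggestNode()

    def add(self, word, count):
        """Insert a word, refreshing the cached top-3 along its path."""
        node = self.root
        self._update_top(node, word, count)
        for ch in word:
            node = node.children.setdefault(ch, SuggestNode())
            self._update_top(node, word, count)

    def _update_top(self, node, word, count):
        node.top = [(c, w) for c, w in node.top if w != word]
        node.top.append((count, word))
        node.top = heapq.nlargest(self.TOP_K, node.top)

    def suggest(self, prefix):
        """Walk to the prefix node and read off its cached suggestions."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [word for _, word in node.top]

trie = SuggestTrie()
for word, hits in [("abacus", 10), ("actor", 25), ("acting", 15), ("adder", 12)]:
    trie.add(word, hits)

print(trie.suggest("a"))   # ['actor', 'acting', 'adder']
print(trie.suggest("ac"))  # ['actor', 'acting']
```

Because each node caches its own top three, a lookup is just a walk down the prefix: no subtree traversal is needed at read time.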

We also need a way of updating the trie depending on searches. We don’t want to track every time a user types a half-finished search, which is why we introduced a separate endpoint for finished searches. Whenever someone sends a completed search we can log it and increment a counter next to it.

This can then be used to update our trie structure periodically. We can collect the data over a day (or any other period), then each 24 hours use it to alter our trie so the most used terms are added or updated. We could also add a threshold after which we store a search, so we don’t store everything.
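
The aggregation step might look something like the below, with an illustrative threshold so one-off searches are dropped before the update:

```python
from collections import Counter

# Completed searches logged over the collection window (e.g. 24 hours).
completed_searches = ["actor", "acting", "actor", "adder", "abacus",
                      "actor", "acting", "zzz-rare-typo"]

MIN_HITS = 2  # illustrative threshold: ignore one-off searches

counts = Counter(completed_searches)
updates = {term: hits for term, hits in counts.items() if hits >= MIN_HITS}
print(updates)  # {'actor': 3, 'acting': 2}

# Each (term, hits) pair would then be sent to the trie service
# to add the term or re-weight its position in the suggestions.
```

The threshold keeps typos and rare queries from bloating the trie, while the window length controls how quickly new trends surface.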

Logical Design

So far we have established a model for how we will store, update and expose recommendations. We now need to outline how we can implement this design across different components.

A basic logical design for our autosuggest

This is our basic logical design. Our client sends requests to a gateway that forwards them on to both the service responsible for generating suggestions, as well as another service that stores all of the finished searches in order to generate new suggestions.

The final service runs periodically and updates the tries we have saved in order to improve our suggestions! It may use a technique such as weighting depending on how popular a search is over time. It may even use something really fancy like machine learning!

Physical Design

Now we have a rough logical design, we can turn this into something physical.

A basic physical design

In this basic physical design we implement our gateway using an AWS API Gateway, which forwards suggestion requests to a suggestion service running on an ECS cluster; this in turn queries a second ECS cluster holding the trie in memory. The trie service is backed by an S3 bucket, which stores tries permanently as JSON to be deserialised when needed (for example, if we bring up a new node in the cluster).

The finished searches (as well as being responded to) are sent to an AWS Kinesis stream which aggregates them over a period. These aggregations are sent to a service responsible for persisting them to a DynamoDb store. Periodically another set of ECS based services read from the store and send update requests to the trie service.

Identify and Resolve Bottlenecks

There are a number of bottlenecks. Initially, do we want to store all of our suggestions in a single trie? We can reasonably easily start splitting up our tries as below.

Splitting tries depending on the first letter of the alphabet

We can then redirect requests to the appropriate trie depending on the first letter of the prefix. This means we can spread the load over multiple services and scale them up independently (for example, I’m sure more suggestions start with ‘a’ than ‘x’).
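
The routing itself can be very simple. A sketch, with purely illustrative shard boundaries (real ones would be weighted by observed load):

```python
# Hypothetical mapping of first letters to trie shards.
SHARDS = {
    "shard-1": "abcde",
    "shard-2": "fghijklm",
    "shard-3": "nopqrstuvwxyz",
}

def route(prefix: str) -> str:
    """Pick the shard responsible for this prefix's first letter."""
    first = prefix[0].lower()
    for shard, letters in SHARDS.items():
        if first in letters:
            return shard
    return "shard-3"  # fallback for digits, punctuation, etc.

print(route("apple"))  # shard-1
print(route("xylo"))   # shard-3
```

Because popular letters get narrower ranges, each shard ends up with a roughly comparable share of the traffic.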

The other thing we can do is add a cache layer to the autosuggest service. We don’t necessarily need to hit the trie each time if something has been recently searched.
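
As a minimal in-process illustration of that cache (a real deployment would more likely use a shared cache such as Redis with a TTL, so entries expire as popularity shifts):

```python
from functools import lru_cache

def trie_lookup(prefix):
    # Stand-in for the comparatively expensive call to the trie service.
    data = {"a": ["actor", "acting", "adder"], "ac": ["actor", "acting"]}
    return tuple(data.get(prefix, ()))

@lru_cache(maxsize=10_000)
def cached_suggest(prefix):
    return trie_lookup(prefix)

cached_suggest("a")  # miss: hits the trie
cached_suggest("a")  # hit: served from the cache
print(cached_suggest.cache_info())  # hits=1, misses=1
```

Popular prefixes (single letters especially) will dominate traffic, so even a modest cache should absorb a large fraction of lookups.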

The final design may look something like the below.

A final potential autosuggest system

Conclusion

In conclusion we have shown how we might design and scale an autosuggest feature!

Written by James Collerton

Senior Software Engineer at Spotify, Ex-Principal Engineer at the BBC
