System Design Interview: Chapter 04
Disclaimer
The contents here were derived while reading the book "System Design Interview: An Insider’s Guide". Only the important concepts that may be needed for future reference are mentioned here. Please refer to the official book, System Design Interview, for more detail.
Chapter - 04: Designing a Rate Limiter
Limit the number of requests a client can send in a given period.
All requests in excess of the defined threshold are blocked.
The main benefit is preventing Denial of Service (DoS) attacks. Another benefit is reducing cost. It also protects server performance by preventing excess requests from unruly accounts/users from overloading the servers.
Implementation
Server-side rate limiting is preferred because we don't have control over the client-side implementation.
In the real world, most servers already sit behind an API gateway, which provides rate-limiting capability. Alternatively, when designing an API gateway, you can include rate limiting in its design.
Algorithms for Rate Limiting
- Token Bucket
- A token bucket is a container with a pre-defined capacity.
- Tokens are put into the bucket periodically at a preset rate. Once the bucket is full, no more tokens are added. For example, with a capacity of 4 and a refill rate of 2 tokens per second, the bucket is full after two seconds, and any additional tokens overflow.
- When a request arrives, we check if there are enough tokens in the bucket. Each request takes one token: if a token is available, we take one token out and the request goes through.
- If there are not enough tokens, the request is dropped.
Algorithm takes in two parameters: Bucket Size and Refill Rate.
- Bucket Size: the maximum number of tokens allowed in the bucket.
- Refill Rate: the number of tokens put into the bucket every second.
- It is usually necessary to have a different bucket for each API endpoint. If throttling per IP address is required, a bucket per IP address is needed. There is no specified number of buckets that is optimal for every design; it is case specific. A single global bucket can be used if a global rate limit is to be enforced.
- Pros: the algorithm is easy to implement and memory efficient. Amazon and Stripe use this algorithm to throttle their API requests.
- Cons: finding the right bucket size and refill rate might be a challenge.
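The token bucket steps above can be sketched in Python. This is a minimal, illustrative implementation, not the book's code; the class and method names are my own.

```python
import time

class TokenBucket:
    """Illustrative token bucket rate limiter (names are assumptions, not from the book)."""

    def __init__(self, bucket_size, refill_rate):
        self.bucket_size = bucket_size       # max tokens allowed in the bucket
        self.refill_rate = refill_rate       # tokens put into the bucket per second
        self.tokens = bucket_size            # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Add tokens for the elapsed time; extra tokens overflow (capped at bucket_size).
        self.tokens = min(self.bucket_size, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self):
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1   # each request takes one token
            return True
        return False           # not enough tokens: drop the request
```

In practice, one such bucket would be kept per endpoint, per IP address, or globally, depending on the throttling rules, as noted above.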
- Leaking Bucket
- Similar to the token bucket, except that requests are processed at a fixed rate. It is usually implemented with a FIFO queue.
- When a request arrives, the system checks if the queue is full.
- If not full, the request is added to the queue.
- If full, the request is dropped.
- Requests are pulled from the queue and processed at regular intervals.
Algorithm takes in two parameters: Bucket Size and Outflow Rate
- Bucket Size: the queue size. The queue holds the requests to be processed.
- Outflow Rate: how many requests can be processed at a fixed rate, usually in requests per second.
- Pros: memory efficient. Requests are processed at a fixed rate, thus suitable when a stable outflow rate is needed.
- Cons: a burst of traffic fills the queue with old requests, which might rate limit recent requests. Also, the parameters may be hard to tune.
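A minimal sketch of the leaking bucket in Python, assuming the leak step is driven externally at a fixed interval (names and the `leak` API are illustrative, not from the book):

```python
from collections import deque

class LeakingBucket:
    """Illustrative leaking bucket limiter backed by a FIFO queue."""

    def __init__(self, bucket_size):
        self.bucket_size = bucket_size  # max number of queued requests
        self.queue = deque()            # FIFO queue of requests waiting to be processed

    def allow(self, request):
        if len(self.queue) < self.bucket_size:
            self.queue.append(request)  # queue not full: add the request
            return True
        return False                    # queue full: drop the request

    def leak(self, outflow_rate):
        """Called at a fixed interval; pulls up to outflow_rate requests for processing."""
        processed = []
        for _ in range(min(outflow_rate, len(self.queue))):
            processed.append(self.queue.popleft())
        return processed
```

A scheduler (e.g., a timer firing once per second) would call `leak(outflow_rate)`, giving the stable outflow rate described above.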
- Fixed Window Counter
- The algorithm divides the timeline into fixed-size time windows.
- Each time window is assigned a counter.
- Each request increments the counter by one.
- Once the counter reaches the pre-defined threshold, any new requests are dropped until a new window starts.
- Cons: a major demerit is that at the edges of time windows, a burst of traffic can let through more requests than the allowed quota.
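The fixed window counter can be sketched as follows. The clock is injectable here purely to make the example testable; in a real limiter it would simply read the current time.

```python
import time

class FixedWindowCounter:
    """Illustrative fixed window counter; window_size is in seconds."""

    def __init__(self, threshold, window_size, clock=time.monotonic):
        self.threshold = threshold      # max requests allowed per window
        self.window_size = window_size
        self.clock = clock
        self.current_window = None
        self.counter = 0

    def allow(self):
        window = int(self.clock() // self.window_size)  # index of the current fixed window
        if window != self.current_window:
            self.current_window = window  # a new window starts: reset the counter
            self.counter = 0
        if self.counter < self.threshold:
            self.counter += 1
            return True
        return False  # threshold reached: drop until the next window
```

Note the edge problem mentioned above: up to `threshold` requests just before a window boundary plus `threshold` just after it can all go through within a span shorter than one window.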
- Sliding Window Log
- The algorithm keeps track of request timestamps, typically in a log such as a sorted set in a cache (e.g., Redis).
- When a new request arrives, outdated timestamps (older than the start of the current time window) are removed from the log.
- The new request's timestamp is added to the log.
- If the log size is within the allowed count, the request is accepted; otherwise, it is rejected. Note that the timestamps of rejected requests are still stored in the log.
- Pros: very accurate; the rate limit is enforced over any rolling window.
- Cons: consumes a lot of memory, because timestamps are stored even for rejected requests.
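A minimal sliding window log sketch, assuming an in-memory deque of timestamps instead of a Redis sorted set (names are illustrative; `now` is passed in to keep the example deterministic):

```python
from collections import deque

class SlidingWindowLog:
    """Illustrative sliding window log: at most max_requests per window_size seconds."""

    def __init__(self, max_requests, window_size):
        self.max_requests = max_requests
        self.window_size = window_size
        self.log = deque()  # timestamps of recent requests, oldest first

    def allow(self, now):
        # Remove timestamps older than the start of the rolling window.
        while self.log and self.log[0] <= now - self.window_size:
            self.log.popleft()
        # Log the new request's timestamp, even if the request will be rejected.
        self.log.append(now)
        return len(self.log) <= self.max_requests
```

A production version would keep one log per user or per IP address and store it in a shared cache so multiple rate limiter instances see the same log.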
- Sliding Window Counter
- A hybrid approach that combines the fixed window counter and the sliding window log.
- The number of requests in the rolling window is estimated as: requests in the current window + requests in the previous window × overlap percentage of the rolling window and the previous window.
- If the estimated count is below the threshold, the request is accepted; otherwise, it is rejected.
- Pros: memory efficient, and it smooths out traffic spikes at window edges because the rate is based on the average of the previous window.
- Cons: the count is only an approximation, since it assumes requests in the previous window were evenly distributed.
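The sliding window counter estimate can be sketched as below. This is an illustrative in-memory version (names are my own); `now` is passed in to keep the example deterministic.

```python
class SlidingWindowCounter:
    """Illustrative sliding window counter: threshold requests per window_size seconds."""

    def __init__(self, threshold, window_size):
        self.threshold = threshold
        self.window_size = window_size
        self.counts = {}  # fixed-window index -> request count

    def allow(self, now):
        window = int(now // self.window_size)
        prev_count = self.counts.get(window - 1, 0)
        curr_count = self.counts.get(window, 0)
        # Fraction of the rolling window that overlaps the previous fixed window.
        overlap = 1.0 - (now % self.window_size) / self.window_size
        # Estimated requests in the rolling window (assumes an even spread previously).
        estimated = curr_count + prev_count * overlap
        if estimated < self.threshold:
            self.counts[window] = curr_count + 1
            return True
        return False
```

For example, with a threshold of 3 per 60 s and 3 requests in the previous minute, a request 15 s into the new minute sees an estimate of 1 + 3 × 0.75 = 3.25 if one request was already accepted, so it is rejected.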