System Design Interview: Chapter 01
Disclaimer
The contents here are derived while reading the book "System Design Interview: An Insider’s Guide" . Only the important concepts that may be needed in future for reference will be mentioned here. Please go to the official book at System Design Interview Book for more detail.
Chapter - 01: From Zero to Million Users
Single User Request
A user may have a browser or an app on a mobile device to access a webserver. Following things happen when the user accesses the webserver.
- User inputs domain name. The domain name goes to the DNS. DNS returns an IP for the webserver.
- Once the IP is obtained, an HTTP request is made from the user to the webserver.
- The webserver responds with a HTML page or a JSON for the user to render.
Vertical Scaling & Horizonal Scaling
- 
     
      Vertical Scaling:
     
     
- Adding more power (CPU, RAM etc).
- Good when the traffic is low.
- There is a hard limit. Impossible to add CPU and memory after certain point to a single server.
- Does not offer redundancy.
- 
     
      Horizonal Scaling:
     
     
- Scale out by adding more servers.
- Can handle redundancy.
Load Balancers
When the volume of users and the number of requests increase, single server may be overloaded. A load balancer is a solution that balances the load and mitigates the problem users may face when single server is overloaded when many servers are available.
A load balancer sits between the user application or the web browser and the webserver. This means the request first go to the load balacer. The goal of load balancer is to evenly distribute incoming traffic among webservers that are defined in a load-balanced set .
One of the design practise is that only the load balancer is connected by a public IP address whereas the web servers are not connected through the public IP. The load balancers forwards the requests to the webservers to their private IP. Since the public IP of the web servers is not exposed, this is better from a security stand point.
If a web servers goes offline, the load balancer can automatically forward all the forth coming requests to remaining web servers.
Database and its replication
Its better to separate web server and the database infrastructure. This allows separate and independent scaling of the infrastructures by separating the scope of Data Tier and Web Tier .
Database replication is usually done with a master and slave relationship, where the (only one) master holds the original and the (multiple copies) slaves hold the copies.
Master generally only supports write operation. All the data modifying operations like insert, delete and update must be sent to the master.
Slaves handle the read operation. Since the ration of read to write operations is usually very high, there are multiple slave copies.
- 
     
      Failure handling
     
     
- In case a slave goes offline or is not available when only one slave is in the system, the read requests are temporarily handled by the master.
- In case a slave goes offline or is not available, availability of multiple slaves means another slave can handle the read request.
- 
      If a master goes offline, one of the slave is promoted to the role of the master. But there are other complicated techniques to handle the shortcomings that come with this approach.
      - 
       
        Shortcomings
       
       
- The slave that is promoted to master may not have the latest copy of data. Missing data needs to be updated by running some recovery steps.
 - 
       
        Other techniques when the master goes offline
       
       
- Multimaster technique
- Circular Replication
 
Caching
To serve the requests that are accessed more frequently, a cache tier is added into the system. A cache provides a temporary storage for frequently accessed data so that data is served more quickly. This also improves the performance of the database and/or webserver while also reducing load on them.
Cache Tier is a temporary, and is much faster than database. Having a separate cache tier allows independent scaling of the cache tier.
The Cache Tier sits between the Web Tier and Database Tier . When a request is received by the web server, the web server first checks the cache for the data. If the data is available, data is sent back to the client from the cache. If not, the database is quiried. Then, the content is stored in the cache, and sent to the client. This is the simplest read through caching. Other caching techniques are also available.
Content Delivery Network (CDNs)
- Geographically dispersed servers used to deliver static content, like images, videos, CSS, JS files by caching.
- When a user visits a website, the CDN server closest to the user will deliver the static content.
- If the contents are not in CDN, the contents are obtained from the server and cached in CDN. The same content is also provided to the user (similar to read through cache.) The content remains in CDN until the TTL in the HTTP header of the content expires.
- Any forthcoming requests are served from the CDN, if available.
- CDN providers charge for data in and out of CDN. Thus, caching infrequently used content has no significant benefit.
- Appropriate cache expiry, use of CDN provided APIs to version or remove data, and plan for CDN failures are some items that system designer should be aware of.
Stateless Architecture
- Stateful server keeps the state data whereas stateless server does not keep the state data in the web tier; rather it is stored in a persistent storage.
- When scaling the web tier horizontally with many web servers, the stateful design is ineffifient because the state info is restricted to a particular server. Any request that want access to the state date need to be routed to the particular server.
- Stateless desgin saves the state data in a shared data store. Any request to access the state data to any webserver can fetch the data from the shared data store.
- The shared data store can be any form of database; relational database, Memcached/Redis, NoSQL etc.
- Stateless design also allows for auto scaling in design, which means automatically adding or removing web servers based on traffic load.
- This design also allows for more flexibility in horizontal scaling.
Message Queue
- For asynchronous communication, for example, photo customization task including blurring, sharpening etc
- web servers publish jobs to the message queue. The job takes time to complete. workers pick up the job and asynchronously perform the job
- Message queue consits of producers or publishers that puglish or create message to the queue. Servers or services called consumers or subscribers pick up the job.
- producers can produce message even when the consumers are not available. consumers can pick up and work on message even when no producers are there.
- since asynchronous, producers and consumers can be scaled independently (and as needed).
Database Scaling
- 
     
      Vertical Scaling:
     
     
- Means add more computing power. Add more on CPU, RAM, DISK etc
- Going beyond certain point creates a single point of failure. Greater risk if things fail.
- 
     
      Horizonal Scaling:
     
     
- Adding more servers.
- Also called Sharding . Sharding separates large database into smaller and more easily managed parts called shards. Each shards share the same schema, but the data in each shard is unique to the shard.
- Sharding a database design consideration: choose the sharding key appropriately. Make sure the data can be evenly distributed, and is unique.
- 
      Some of the problems/challenges/complexities that sharding faces are:
      - Resharding Data: When a single shard cannot hold data because of increase in data volume OR when some or single shard face exhaustion faster than others due to uneven data distribution. This rquires updating the sharding function and moving data around. Consistent Hashing is another solution.
- Celebrity Problem: Excessive access of data in a single shard could overload the shard. For example, data related to many celebrities end up in a single shard. Solution is to allocate a shard for eacch celebrity, Or futher partition in each shard.
- Join and DeNormalization: Sharded data is hard to perform join operations on. A workaround is to de-normalize the database so that queires can be performed in a single table.
 
Conclusion
Its an iterative process to handle million or more users. Following items must be taken into account:
- serverless webservers.
- redundancy in each tier.
- cache as much as you can.
- support multiple data centers.
- host static assets in CDNs.
- scale data tier by sharding.
- split each tiers into individual services.
- monitor system for logs and metrics; automate