Raft Algorithm

Raft algorithm introduced by Diego Ongaro and John Ousterhout at USENIX 2014 link to paper , in a paper titled In Search of an Understandable Consensus Algorithm . The best description of Raft algo is at this github.io link .

Raft algorithm achieves consensus. Consensus here means multiple servers agreeing on the same information. This is an important requirement in distributed algorithm; and before the proposal of Raft Algorithm, Paxos was the algorithm of choice. Paxos was proposed by Lesli Lamport .

In any scenario where a client sends a request to a server, and the server responds back with a response. And for a client server situation, there can be two broad classes of systems: Single Server System and Multiple Server System .

In a Single Server System, there is only one server. Since there is a single source, there is no concern of concensus.

In a Multiple Server System, there is more than one server. For example, a kube cluster with multple api-servers, or a redis cluster with multiple masters. In cases like this, the concern of concensus is more important. Any server in symmetric multiple server system can be either a Leader , Follower or Candiadate . Only the leader can interact with the client, and all other nodes/servers sync with the leader. In an asymmetric system, any server can respond to the client, and all other server nodes are required to sync with the server that responded to the client request.

Following outlines the Raft algorithm:

Every server node starts as a follower;
After certain timeout (e.g. when any follower timesout to get a heartbeat from a master), an election is started to elect a new leader. Follower node becomes Candidate node.
- The follower node votes itself, and requests votes from other nodes by issuing RequestVotes RPC .
- If the term number of the candidate requesting the votes is less than other candidate nodes in the cluster, AppendEntries RPC is rejected and other nodes retain the candidate status. If the term number is greater, the candidate node is elected as the new leader.
If a candidate receives majority votes, it becomes a leader.
- Once leader, it sends out HeartBeat to all the follower with a new term-number.
If not elected a leader, a candidate node goes to follower state.
If no leader elected, contest the election. Election conducted again.
- Raft algorithm uses randomized timeout to ensure that split voltes are rare; and that they are resolved quickly. I.e., in most cases, only a single server will time out.
- In case of split voltes, each candidate again uses a random timeout before initiating next election. This again reduces the liklihood of another split vote in the new election.
Only the leader is allowed to interact with the client. At any term, a system can have at most 1 leader.

There is a good answer at this stack overflow link that describes the Raft algorithm.

The leader node uses AppendEntries RPC to all the followers to sync their logs with the current leader. This is done until all the followers safely replicate the entries in their logs. In case of inconsistencies, for example, when a master crashes, the conflicting logs are overwritten with the entries from the leader's log.
Leader also uses AppendEntries RPC with no entries as an heartbeat signal.

Some good references at: