The leader constitutes a communication bottleneck in leader-based state machine replication (SMR) protocols like Multi-Paxos and HotStuff. Specifically, in each phase of a consensus instance, a non-leader process (follower) receives one message from the leader and sends one message back; however, in a system with $n$ processes, the leader needs to send $n-1$ messages to the followers, and receive at least a quorum of messages back from them. Therefore, scaling such protocols requires reducing the disproportionate load on the leader. PigPaxos: Devouring the Communication Bottlenecks in Distributed Consensus addresses communication bottleneck imposed by leaders and improves the throughput of leader-based SMR protocols by introducing Pig, a communication primitive to form quorums. Before delving into Pig, let us see how quorums are formed in most SMR protocols, such as Multi-Paxos and HotStuff.
Traditional quorum formation protocols
Assume that the size of any quorum is $q$. Any process can form a quorum by executing the formQuorum function.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
// Code for process i
function formQuorum()
sq = generate a unique sequence number
send <FormQuorum, sq> to all processes
quorum = ∅
loop
wait until receiving a message <AckFormQuorum, sq> from j
quorum = quorum U {j}
until |quorum| >= q
return quorum
upon receiving a message <FormQuorum, sq> from process j
send <AckFormQuorum, sq> to j
We are now ready to see how Pig forms quorums.
Pig: a communication primitive to form quorums
Pig is a communication primitive to form quorums. Again, assume that the size of any quorum is $q$. Any process can form a quorum by executing the formQuorumPig function.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Code for process i
function formQuorumPig()
start a timer
sq = generate a unique sequence number
send <FormQuorumRelay, sq> to the members of i.Relays
quorum = ∅
loop
upon receiving a message <AckFormQuorumRelay, sub_quorum> from j
quorum = quorum U sub_quorum U {j}
until timer is expired or |quorum| >= q
return quorum
upon receiving a message <FormQuorumRelay, sq> from process j
start a timer
send <FormQuorumFollower, sq> to the members of i.Followers
sub_quorum = ∅
loop
upon receiving a message <AckFormQuorumFollower, sq> from k
sub_quorum = sub_quorum U {k}
until timer is expired or |sub_quorum| >= p
send <AckFormQuorumRelay, sub_quorum> to j
upon receiving a message <FormQuorumFollower, sq> from process j
send <AckFormQuorumFollower, sq> to j
Remarks
- Each process $i$ has a set of relay processes denoted by $i.Relays$. This set can be static (i.e., pre-configured) or dynamic (i.e., selected randomly during execution).
- Each relay process $i$ has a set of followers denoted by $i.Followers$. This set can also be static or dynamic.
- A process might follow multiple relay processes.
- This protocol has two important parameters: $q$ and $p$ that are the size of quorums and the number of responses that a relay process needs to receive in order to send a response to the process that wants to form a quorum.
- In the paper there is no analytical result for computing the probability of forming of a quorum based on the number of relays, the number of followers that each relay has, etc. Instead, the paper only presents some numerical evaluations.