ABSTRACT
We describe our experience in building a fault-tolerant data-base using the Paxos consensus algorithm. Despite the existing literature in the field, building such a database proved to be non-trivial. We describe selected algorithmic and engineering problems encountered, and the solutions we found for them. Our measurements indicate that we have built a competitive system.
- Burrows, M. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, pp. 335--350 Google ScholarDigital Library
- Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, pp. 205--218 Google ScholarDigital Library
- Cristian, F. Reaching agreement on processor-group membership in synchronous distributed systems. Distributed Computing 4, 4 (1991), 175--188.Google ScholarDigital Library
- Ghemawat, S., Gobioff, H., and Leung, S.-T. The Google file system. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (Dec. 2003), pp. 29--43. Google ScholarDigital Library
- Gray, C., Cheriton, D. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. In Proceedings of the 12th ACM Symposium on Operating Systems Principles (1989), pp. 202--210. Google ScholarDigital Library
- Johnson, S. C. Yacc: Yet another compiler-compiler.Google Scholar
- Lamport, Shostak, and Pease. The byzantine generals problem. In Advances in Ultra-Dependable Distributed Systems, N. Suri, C. J. Walter, and M. M. Hugue (Eds.), IEEE Computer Society Press. 1995.Google Scholar
- Lamport, L. The part-time parliament. ACM Transactions on Computer Systems 16, 2 (1998), 133--169. Google ScholarDigital Library
- Lamport, L. Paxos made simple. ACM SIGACT News 32, 4 (Dec. 2001), 18--25.Google Scholar
- Lampson, B. W. How to build a highly available system using consensus. In 10th International Workshop on Distributed Algorithms (WDAG 96) (1996), Babaoglu and Marzullo, Eds., vol. 1151, Springer-Verlag, Berlin Germany, pp. 1--17. Google ScholarDigital Library
- Lee, E. K., and Thekkath, C. A. Petal: Distributed virtual disks. In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (Cambridge, MA, 1996), pp. 84--92. Google ScholarDigital Library
- MacCormick, J., Murphy, N., Najork, M., Thekkath, C. A., and Zhou, L. Boxwood: Abstractions as the foundation for storage infrastructure. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (2004), pp. 105--120. Google ScholarDigital Library
- Moessenboeck, H. A generator for production quality compilers. In Proceedings of the 3rd International Workshop on Compiler Compilers - Lecture Notes in Computer Science 477 (Berlin, Heidelberg, New York, Tokyo, 1990), Springer-Verlag, pp. 42--55. Google ScholarDigital Library
- Oki, Brian M., and Liskov, Barbara H. Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems. In Proceedings of the 7th annual ACM Symposium on Principles of Distributed Computing (1988), pp. 8--17. Google ScholarDigital Library
- Parr, T. J., and QUONG, R. W. Antlr: A predicated-ll(k) parser generator. Software-Practice and Experience 25, 7 (JULY 1995), 789--810. Google ScholarDigital Library
- Prisco, R. D., Lampson, B. W., and Lynch, N. A. Revisiting the paxos algorithm. In 11th International Workshop on Distributed Algorithms (WDAG 96) (1997), pp. 111--125. Google ScholarDigital Library
- Schneider, F. B. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys 22, 4 (1990), 299--319. Google ScholarDigital Library
- von Neumann, J. Probabilistic logics and synthesis of reliable organisms from unreliable components. Automata Studies (1956), 43--98.Google Scholar
Index Terms
- Paxos made live: an engineering perspective
Recommendations
Relaxed Paxos: quorum intersection revisited (again)
PaPoC '22: Proceedings of the 9th Workshop on Principles and Practice of Consistency for Distributed DataDistributed consensus, the ability to reach agreement in the face of failures, is a fundamental primitive for constructing reliable distributed systems. The Paxos algorithm is synonymous with consensus and widely utilized in production. Paxos uses two ...
S-Paxos: Offloading the Leader for High Throughput State Machine Replication
SRDS '12: Proceedings of the 2012 IEEE 31st Symposium on Reliable Distributed SystemsImplementations of state machine replication are prevalently using variants of Paxos or other leader-based protocols. Typically these protocols are also leader-centric, in the sense that the leader performs more work than the non-leader replicas. Such ...
Fast Flexible Paxos: Relaxing Quorum Intersection for Fast Paxos
ICDCN '21: Proceedings of the 22nd International Conference on Distributed Computing and NetworkingPaxos, the de facto standard approach to solving distributed consensus, operates in two phases, each of which requires an intersecting quorum of nodes. Multi-Paxos reduces this to one phase by electing a leader but this leader is also a performance ...
Comments