前言
CAP理论时长会在面试中遇到,在中文互联网很难找到解释得很清楚的,于是在英文的维基百科上找到关于CAP理论的具体解释,我认为其解释得比较清楚,阅读过程中产生了本篇译文.
翻译形式为机翻+本人的理解,抛砖引玉,欢迎大家提出宝贵的意见和建议.
正文
CAP theorem
In theoretical computer science, the CAP theorem, also named Brewer’s theorem after computer scientist Eric Brewer, states that any distributed data store can only provide two of the following three guarantees:
- Consistency:
Every read receives the most recent write or an error.- Availability:
Every request receives a (non-error) response, without the guarantee that it contains the most recent write.- Partition tolerance
The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
CAP定理
在计算机理论中,CAP理论(计算机科学家Eric Brewer提出,所以也叫做Brewer’s理论),任何分布式存储系统只能如下三个保证中的两个:
- 一致性
每次读取能收到最近的写入或者错误. - 可用性
每一次请求都能收到一个(非错误)返回,但不能保证包含最近的写入. - 分区容错
尽管多个节点通过网络丢弃(或延迟)了任意个消息,系统也能够继续运行.
When a network partition failure happens, it must be decided whether to
- cancel the operation and thus decrease the availability but ensure consistency or to
- proceed with the operation and thus provide availability but risk inconsistency.
当网络分区发生错误,它必须做出决定:
- 取消操作确保一致性但因此降低可用性
- 继续操作确保可用性但有不一致的风险
Thus, if there is a network partition, one has to choose between consistency and availability. Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed in ACID database transactions
因此,作为一个网络分区必须在一致性和可用性之间做出选择.注意:CAP定理中的一致性定义与ACID数据库事务中的一致性保定有着很大的不同.
Eric Brewer argues that the often-used “two out of three” concept can be somewhat misleading because system designers only need to sacrifice consistency or availability in the presence of partitions, but that in many systems partitions are rare.
Eric Brewer 认为,经常被提及的"三分之二"概念可能会产生误导,因为系统设计师们只需要在存在分区时牺牲一致性或可用性,但是很多系统中分区是很少见的.
Explanation
No distributed system is safe from network failures, thus network partitioning generally has to be tolerated. In the presence of a partition, one is then left with two options: consistency or availability. When choosing consistency over availability, the system will return an error or a time out if particular information cannot be guaranteed to be up to date due to network partitioning. When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning.
In the absence of a partition, both availability and consistency can be satisfied.
Database systems designed with traditional ACID guarantees in mind such as RDBMS choose consistency over availability, whereas systems designed around the BASE philosophy, common in the NoSQL movement for example, choose availability over consistency.
解释
分布式系统都面临着网络故障的风险,所以网络分区通常都要容忍.在分区存在的情况下,只剩下两个选项:一致性或者可用性.
- 当选择一致性比可用性更重要时,如果网络分区无法保证特定信息是最新的.将返回一个错误或者超时.
- 当先择可用性比一致性更重要时,系统将总是处理这次查询并尝试返回近期可用的版本,即便网络分区无法保证信息是最新的.
在没有分区的情况下,一致性和可用性都能到很好的保证.
按照传统ACID原则是的的数据库系统(如RDBMS)选择一致性比可用性更重要,然而,围绕BASE哲学的系统(例如NoSQL系统)选择了可用性比一致性更重要.
History
According to University of California, Berkeley computer scientist Eric Brewer, the theorem first appeared in autumn 1998. It was published as the CAP principle in 1999 and presented as a conjecture by Brewer at the 2000 Symposium on Principles of Distributed Computing (PODC). In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer’s conjecture, rendering it a theorem.
In 2012, Brewer clarified some of his positions, including why the often-used “two out of three” concept can be somewhat misleading because system designers only need to sacrifice consistency or availability in the presence of partitions; partition management and recovery techniques exist. Brewer also noted the different definition of consistency used in the CAP theorem relative to the definition used in ACID.
A similar theorem stating the trade-off between consistency and availability in distributed systems was published by Birman and Friedman in 1996. Birman and Friedman’s result restricted this lower bound to non-commuting operations.
The PACELC theorem, introduced in 2010,[9] builds on CAP by stating that even in the absence of partitioning, there is another trade-off between latency and consistency.
Blockchain technology sacrifices immediate consistency for availability and partition tolerance, by requiring a specific number of “confirmations”, Blockchain consensus algorithms are basically reduced to eventual consistency.
历史
根据加州大学伯克利分校计算机科学家 Eric Brewer 的说法,该定理于 1998 年秋季首次出现。 它于 1999 年作为 CAP 原理发表,并作为 Brewer 在 2000 年分布式计算原理研讨会 (PODC) 上的一个猜想提出。 2002 年,麻省理工学院的赛斯吉尔伯特和南希林奇发表了布鲁尔猜想的正式证明,将其变成了一个定理
2012 年,Brewer 澄清了他的一些立场,包括为什么经常使用的“三分之二”概念可能会有些误导,因为系统设计人员只需要在存在分区的情况下牺牲一致性或可用性;存在分区管理和恢复技术。 Brewer 还注意到 CAP 定理中使用的一致性定义与 ACID 中使用的定义不同。
Birman 和 Friedman 在 1996 年发表了一个类似的定理,说明了分布式系统中一致性和可用性之间的权衡。 Birman 和 Friedman 的结果将这个下限限制为非通勤操作。
2010 年引入的 PACELC 定理建立在 CAP 的基础上,指出即使在没有分区的情况下,延迟和一致性之间也存在另一种权衡。
区块链技术为了可用性和分区容错牺牲了即时一致性,通过要求特定数量的“确认”,区块链共识算法基本上被简化为最终一致性。