



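In 1654, Pascal and Fermat jointly solved the problem of the points [1], creating an early theory of direct probabilistic reasoning. Thirty years later, Jacob Bernoulli extended probability theory to inductive reasoning. Bernoulli observed that, in practice, trying to enumerate every possibility in advance in order to determine which is more likely is futile.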



Here, however, another way for attaining the desired is really opening for us. And, what we are not given to derive a priori, we at least can obtain a posteriori, that is, can extract it from a repeated observation of the results of similar examples. [2, p. 18]

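To establish the validity of this approach, Bernoulli proved a version of the law of large numbers for the binomial distribution. Let X_1, X_2, … denote samples from a Bernoulli distribution with parameter r/t (r and t integers). If c is some positive integer, Bernoulli showed that for sufficiently large N:

$$P\left(\left|\frac{X_1 + \cdots + X_N}{N} - \frac{r}{t}\right| \le \frac{1}{t}\right) > c \cdot P\left(\left|\frac{X_1 + \cdots + X_N}{N} - \frac{r}{t}\right| > \frac{1}{t}\right)$$

In other words, the probability that the sample ratio from the binomial distribution lies between (r−1)/t and (r+1)/t is more than c times the probability that it lies outside that range. So by taking enough samples, we can "determine the parameter a posteriori almost as well as if it were known to us a priori".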

Bernoulli also derived the number of samples needed to reach a specified accuracy for given r and t. For example, if r = 30 and t = 50, he showed:

having made 25550 experiments, it will be more than a thousand times more likely that the ratio of the number of obtained fertile observations to their total number is contained within the limits 31/50 and 29/50 rather than beyond them [2, p. 30]


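Bernoulli's result had two limitations:

  1. Its bounds depend on the parameter being known, so it cannot quantify uncertainty about an unknown parameter.
  2. The number of experiments needed to reach high confidence is too large, which limits its practicality.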
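The second limitation can be made concrete by computing the exact binomial probabilities for Bernoulli's own example. The sketch below is an illustration, not Bernoulli's method: the function names are hypothetical, and it sums the exact Binomial(N, r/t) probabilities in log space for N = 25550, r = 30, t = 50.

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at k, via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def inside_outside_ratio(n: int, r: int, t: int) -> float:
    """P(ratio within [(r-1)/t, (r+1)/t]) divided by P(ratio outside), for p = r/t."""
    p = r / t
    inside = 0.0
    outside = 0.0
    for k in range(n + 1):
        prob = math.exp(log_binom_pmf(k, n, p))
        # Integer comparison of k/n against the limits avoids endpoint rounding.
        if (r - 1) * n <= t * k <= (r + 1) * n:
            inside += prob
        else:
            outside += prob
    return inside / outside

ratio = inside_outside_ratio(25550, 30, 50)
print(f"inside/outside ratio for N = 25550: {ratio:.3e}")
```

Under these assumptions the exact ratio comes out far above the factor of one thousand that Bernoulli guaranteed, which suggests his bound on the required number of experiments was very conservative.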

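De Moivre improved on Bernoulli's work in The Doctrine of Chances, deriving tighter bounds, but he still gave no way to quantify uncertainty when the parameter is unknown and offered only this qualitative guidance: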

if after taking a great number of Experiments, it should be perceived that the happenings and failings have been nearly in a certain proportion, such as of 2 to 1, it may safely be concluded that the Probabilities of happening or failing at any one time assigned will be very near that proportion, and that the greater the number of Experiments has been, so much nearer the Truth will the conjectures be that are derived from them. [3, p. 242]



Given the number of times in which an unknown event has happened and failed: Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named. [4, p. 4]


the case of an event concerning the probability of which we absolutely know nothing antecedently to any trials made concerning it [4, p. 11]

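Bayes proposed that having no information corresponds to a uniform prior distribution [5, p. 184–188]. Using a uniform prior and a geometric analogy, he was able to approximate integrals of the posterior distribution of the form

$$\frac{\Gamma(n+2)}{\Gamma(y+1)\,\Gamma(n-y+1)} \int_a^b \theta^{y} (1-\theta)^{n-y}\, d\theta$$

and so answer questions such as "if y successes and n − y failures of a binomial distribution are observed, what is the probability that the parameter θ lies between a and b?".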
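With a uniform prior, the posterior for θ after y successes and n − y failures is a Beta(y + 1, n − y + 1) distribution, so the question above reduces to a difference of two Beta CDF values. A minimal sketch, with hypothetical function names, that uses the standard identity between the Beta CDF with integer parameters and a binomial tail sum:

```python
from math import comb

def uniform_prior_posterior_cdf(x: float, y: int, n: int) -> float:
    """P(theta <= x | y successes in n trials) under a uniform prior.

    The posterior is Beta(y + 1, n - y + 1). For integer parameters its CDF
    at x equals the upper tail P(K >= y + 1) of K ~ Binomial(n + 1, x).
    """
    m = n + 1
    return sum(comb(m, j) * x**j * (1.0 - x)**(m - j) for j in range(y + 1, m + 1))

def posterior_prob_between(a: float, b: float, y: int, n: int) -> float:
    """P(a <= theta <= b | y successes in n trials) under a uniform prior."""
    return uniform_prior_posterior_cdf(b, y, n) - uniform_prior_posterior_cdf(a, y, n)

# One success in one trial: the posterior density is 2 * theta, so P(theta > 1/2) = 3/4.
print(posterior_prob_between(0.5, 1.0, y=1, n=1))  # 0.75
```

The exact sum is practical for small n; for large counts a log-space evaluation would be the safer choice.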


Bayes essay ’Towards solving a problem in the doctrine of chances’ is extremely difficult to read today–even when we know what to look for. [5, p. 179]


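Ten years after Bayes's death, and probably unaware of Bayes's discovery, Laplace took up similar problems and independently arrived at the same approach. He revisited the famous problem of the points, this time for a game of skill, modeling the probability that a player wins a round as a Bernoulli distribution with unknown parameter p. Laplace also chose a uniform prior, remarking only: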

because the probability that A will win a point is unknown, we may suppose it to be any unspecified number whatever between 0 and 1. [6]



De Moivre, nevertheless, did not discover the inverse method. This was first used by the Rev. T. Bayes, in Phil. Trans. liii. 370.; and the author, though now almost forgotten, deserves the most honourable rememberance from all who read the history of this science. [7, p. vii]



I know only one case in mathematics of a doctrine which has been accepted and developed by the most eminent men of their time, and is now perhaps accepted by men now living, which at the same time has appeared to a succession of sound writers to be fundamentally false and devoid of foundation. Yet that is quite exactly the position in respect of inverse probability [8]


Let p denote the unknown parameter of a binomial distribution.



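Therefore, a uniform prior on θ is equivalent to the prior (1/π) p^{−1/2} (1 − p)^{−1/2} on p. As an alternative, Fisher advocated maximum likelihood methods, p-values, and the frequentist definition of probability.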
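The step from a uniform prior on θ to the density on p quoted above is a change of variables. A worked version, assuming the reparametrization sin θ = 2p − 1 with θ ranging over [−π/2, π/2] (an assumption here, chosen because it reproduces the stated density):

```latex
\begin{aligned}
\sin\theta &= 2p - 1, \qquad \theta \in [-\pi/2,\, \pi/2], \\
\cos\theta \,\frac{d\theta}{dp} &= 2
\;\Longrightarrow\;
\frac{d\theta}{dp} = \frac{2}{\sqrt{1 - (2p - 1)^2}} = \frac{1}{\sqrt{p(1-p)}}, \\
\pi_p(p) &= \pi_\theta(\theta)\,\left|\frac{d\theta}{dp}\right|
= \frac{1}{\pi}\, p^{-1/2} (1-p)^{-1/2}.
\end{aligned}
```

The factor 1/π is the uniform density on an interval of length π. The point of the argument is that "uniform" is not preserved under reparametrization: a flat prior in θ is far from flat in p.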



frequentist definitions themselves lead to no results of the kind that we need until the notion of reasonable degree of belief is reintroduced, and that since the whole purpose of these definitions is to avoid this notion they necessarily fail in their object. [10, p. 34]


There is no more need for [the idea that the uniform distribution of the prior probability was a necessary part of the principle of inverse probability] than there is to say that an oven that has once cooked roast beef can never cook anything but roast beef. [10, p. 103]



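Later, Welch and Peers [11] evaluated the frequentist matching performance of priors by studying one-tailed credible sets from the posterior distribution. They showed that the prior proposed by Jeffreys is asymptotically optimal for single-parameter models, which gives further justification for that prior and agrees with how intuition suggests we might quantify Bayes's criterion of "absolutely knowing nothing".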


In my experience teaching many academic physicians, when physicians are presented with a single-sentence summary of a study that produced a surprising result with P = 0.05, the overwhelming majority will confidently state that there is a 95% or greater chance that the null hypothesis is incorrect. [12]


it is shown that actual evidence against a null (as measured, say, by posterior probability or comparative likelihood) can differ by an order of magnitude from the P value. For instance, data that yield a P value of .05, when testing a normal mean, result in a posterior probability of the null of at least .30 for any objective prior distribution. [15]


for testing “precise” hypotheses, p values should not be used directly, because they are too easily misinterpreted. The standard approach in teaching–of stressing the formal definition of a p value while warning against its misinterpretation–has simply been an abysmal failure. [16]




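Consider a single-parameter probability model with parameter θ. Suppose we have a prior distribution π(θ). How can we test whether the prior reasonably expresses the "ignorance" that Bayes called for?

We can pick a sample size n and a true value θ_true, then randomly sample observations y = (y_1, …, y_n)^T from the distribution P(· | θ_true). We then compute the two-tailed credible interval [θ_a, θ_b] that contains 95% of the posterior probability mass and record whether it contains θ_true. Repeating the experiment while varying n and θ_true shows the coverage performance of π(θ).

If π(θ) is a good prior, the fraction of trials in which the credible interval contains θ_true will settle near 95%.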


 function coverage-test(n, θ_true, α):
     cnt ← 0
     N ← a large number
     for i ← 1 to N do
         y ← sample from P(·|θ_true)
         t ← integrate_{-∞}^θ_true π(θ | y)dθ
         if (1 - α)/2 < t < 1 - (1 - α)/2:
             cnt ← cnt + 1
         end if
     end for
     return cnt / N
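The coverage test can be sketched in Python for one concrete case. The choices below are assumptions for illustration, not part of the pseudocode: a Bernoulli model, the uniform prior of Bayes and Laplace, a fixed random seed, and hypothetical function names. As in the pseudocode, α is the credible level (0.95), not a significance level.

```python
import random
from math import comb

def posterior_cdf(x: float, y: int, n: int) -> float:
    """P(theta <= x | y successes in n trials) under a uniform prior.

    The posterior is Beta(y + 1, n - y + 1); with integer parameters its CDF
    equals the binomial tail P(K >= y + 1) for K ~ Binomial(n + 1, x).
    """
    m = n + 1
    return sum(comb(m, j) * x**j * (1.0 - x)**(m - j) for j in range(y + 1, m + 1))

def coverage_test(n: int, theta_true: float, alpha: float = 0.95,
                  trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of how often the two-tailed credible set covers theta_true."""
    rng = random.Random(seed)
    low = (1.0 - alpha) / 2.0
    high = 1.0 - low
    hits = 0
    for _ in range(trials):
        # Draw n Bernoulli(theta_true) observations and count the successes.
        y = sum(rng.random() < theta_true for _ in range(n))
        # theta_true is covered exactly when its posterior CDF value is not in a tail.
        t = posterior_cdf(theta_true, y, n)
        if low < t < high:
            hits += 1
    return hits / trials

for theta in (0.1, 0.3, 0.5):
    print(theta, coverage_test(n=20, theta_true=theta, trials=2_000))
```

Checking the posterior CDF at θ_true avoids computing the interval endpoints: θ_true lies inside the equal-tailed credible interval exactly when that CDF value falls strictly between (1 − α)/2 and 1 − (1 − α)/2.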









s² = y′y/n. Substitute u = ns²/(2σ²).














 function binomial-coverage-test(n, θ_true, α):
     cov ← 0
     for y ← 0 to n do
         t ← integrate_0^θ_true π(θ | y)dθ
         if (1 - α)/2 < t < 1 - (1 - α)/2:
             cov ← cov + binomial_coefficient(n, y) * θ_true^y * (1 - θ_true)^(n-y)
         end if
     end for
     return cov
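For the binomial model the coverage can be computed exactly by enumerating every outcome y, as in the pseudocode. The sketch below is one possible implementation under stated assumptions: the prior is a Beta(a, b) with a, b ≥ 1/2 (Beta(1, 1) is the uniform prior, Beta(1/2, 1/2) is the Jeffreys prior for the binomial), the posterior CDF is evaluated by Simpson's rule rather than a library routine, and all function names are hypothetical.

```python
from math import asin, comb, cos, exp, lgamma, sin, sqrt

def beta_cdf(x: float, a: float, b: float, steps: int = 2000) -> float:
    """Beta(a, b) CDF at x for a, b >= 1/2, by Simpson's rule (steps must be even).

    Substituting theta = sin(phi)^2 turns the integrand into
    2 * sin(phi)^(2a - 1) * cos(phi)^(2b - 1), which stays bounded at the endpoints.
    """
    upper = asin(sqrt(x))
    h = upper / steps

    def integrand(phi: float) -> float:
        return 2.0 * sin(phi) ** (2 * a - 1) * cos(phi) ** (2 * b - 1)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    log_norm = lgamma(a) + lgamma(b) - lgamma(a + b)  # log of the Beta function B(a, b)
    return (total * h / 3.0) / exp(log_norm)

def binomial_coverage(n: int, theta_true: float, prior_a: float, prior_b: float,
                      alpha: float = 0.95) -> float:
    """Exact coverage of the two-tailed credible set under a Beta(prior_a, prior_b) prior."""
    low = (1.0 - alpha) / 2.0
    high = 1.0 - low
    cov = 0.0
    for y in range(n + 1):
        # Posterior after y successes in n trials is Beta(prior_a + y, prior_b + n - y).
        t = beta_cdf(theta_true, prior_a + y, prior_b + n - y)
        if low < t < high:
            cov += comb(n, y) * theta_true**y * (1.0 - theta_true) ** (n - y)
    return cov

for theta in (0.01, 0.1, 0.5):
    uniform = binomial_coverage(20, theta, 1.0, 1.0)
    jeffreys = binomial_coverage(20, theta, 0.5, 0.5)
    print(f"theta={theta}: uniform={uniform:.3f}, Jeffreys={jeffreys:.3f}")
```

A small case that can be checked by hand: with n = 1 and θ_true = 0.01, the uniform prior covers θ_true for neither outcome, while the Jeffreys prior covers it when y = 0, giving coverage 0.99.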








Let us then imagine a person present at the drawing of a lottery, who knows nothing of its scheme or of the proportion of Blanks to Prizes in it. Let it further be supposed, that he is obliged to infer this from the number of blanks he hears drawn compared with the number of prizes; and that it is enquired what conclusions in these circumstances he may reasonably make. [4, p. 19–20]


Let him first hear ten blanks drawn and one prize, and let it be enquired what chance he will have for being right if he guesses that the proportion of blanks to prizes in the lottery lies somewhere between the proportions of 9 to 1 and 11 to 1. [4, p. 20]
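This first case can be worked in closed form. Assuming a uniform prior and writing θ for the proportion of blanks, the proportions 9 to 1 and 11 to 1 correspond to θ = 9/10 and θ = 11/12, and ten blanks with one prize give a Beta(11, 2) posterior:

```latex
\begin{aligned}
\pi(\theta \mid y) &= 132\,\theta^{10}(1-\theta), \\
P\!\left(\tfrac{9}{10} \le \theta \le \tfrac{11}{12} \,\middle|\, y\right)
&= \int_{9/10}^{11/12} 132\,\theta^{10}(1-\theta)\,d\theta
= \Big[\,12\,\theta^{11} - 11\,\theta^{12}\,\Big]_{9/10}^{11/12}
\approx 0.0770.
\end{aligned}
```

So after only eleven draws the observer has roughly a 7.7% chance of being right with so narrow a guess.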








The consideration of the [influence of past events on the probability of future events] leads me to speak of births: as this matter is one of the most interesting in which we are able to apply the Calculus of probabilities, I manage so to treat with all care owing to its importance, by determining what is, in this case, the influence of the observed events on those which must take place, and how, by its multiplying, they uncover for us the true ratio of the possibilities of the births of a boy and of a girl. [18, p. 1]


When we have nothing given a priori on the possibility of an event, it is necessary to assume all the possibilities, from zero to unity, equally probable; thus, observation can alone instruct us on the ratio of the births of boys and of girls, we must, considering the thing only in itself and setting aside the events, to assume the law of possibility of the births of a boy or of a girl constant from zero to unity, and to start from this hypothesis into the different problems that we can propose on this object. [18, p. 26]


With a uniform prior, B = 251527, G = 241945, and θ denoting the probability that a birth is a boy, we obtain the posterior π(θ | y) ∝ θ^B (1 − θ)^G.
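Given a uniform prior, the posterior for θ is a Beta(B + 1, G + 1) distribution, and the posterior probability that boys are not more likely than girls, P(θ ≤ 1/2 | y), can be evaluated numerically. A minimal sketch with a hypothetical function name; it works in log space because the individual terms are far too small for a direct product:

```python
import math

def prob_theta_at_most_half(boys: int, girls: int) -> float:
    """P(theta <= 1/2 | data) under a uniform prior on theta.

    The posterior is Beta(boys + 1, girls + 1). For integer parameters its CDF
    at 1/2 equals P(K >= boys + 1) with K ~ Binomial(boys + girls + 1, 1/2).
    """
    m = boys + girls + 1
    # log of m! / 2^m, shared by every term of the binomial tail
    log_const = math.lgamma(m + 1) - m * math.log(2.0)
    return sum(
        math.exp(log_const - math.lgamma(k + 1) - math.lgamma(m - k + 1))
        for k in range(boys + 1, m + 1)
    )

p = prob_theta_at_most_half(251527, 241945)
print(f"P(theta <= 1/2 | B, G) = {p:.3e}")
```

With these counts the result is on the order of 10^-42, so under this model the data leave essentially no room for θ ≤ 1/2.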




Below is some simulated data, generated with p_true = B / (B + G), showing how the answer might evolve as more births are observed.
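A simulation of this kind could be sketched as follows. The checkpoints, the seed, and the function names are assumptions for illustration; the sketch draws births with p_true = B / (B + G) and reports P(θ > 1/2 | data) under a uniform prior at each checkpoint.

```python
import math
import random

B, G = 251527, 241945  # birth counts quoted in the text

def prob_theta_greater_half(boys: int, girls: int) -> float:
    """P(theta > 1/2 | data) under a uniform prior on theta.

    The posterior is Beta(boys + 1, girls + 1), and for integer parameters
    P(theta > 1/2) equals P(K <= boys) with K ~ Binomial(boys + girls + 1, 1/2).
    """
    m = boys + girls + 1
    log_const = math.lgamma(m + 1) - m * math.log(2.0)
    return sum(
        math.exp(log_const - math.lgamma(k + 1) - math.lgamma(m - k + 1))
        for k in range(boys + 1)
    )

def simulate(checkpoints, seed=0):
    """Simulate births one at a time and record the posterior probability at each checkpoint."""
    rng = random.Random(seed)
    p_true = B / (B + G)
    boys = births = 0
    rows = []
    for target in checkpoints:
        while births < target:
            boys += rng.random() < p_true  # True counts as one boy
            births += 1
        rows.append((births, boys, prob_theta_greater_half(boys, births - boys)))
    return rows

rows = simulate([100, 1_000, 10_000, 100_000])
for births, boys, prob in rows:
    print(f"births={births:>7}  boys={boys:>6}  P(theta > 1/2 | data)={prob:.4f}")
```

The exact values depend on the seed, so no particular output is claimed here; the qualitative expectation is that the probability moves toward 1 as the number of births grows.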







Inventing a new criterion for finding “the optimal objective prior” has proven to be a popular research pastime, and the result is that many competing priors are now available for many situations. This multiplicity can be bewildering to the casual user.

I have found the reference prior approach to be the most successful approach, sometimes complemented by invariance considerations as well as study of frequentist properties of resulting procedures. Through such considerations, a particular prior usually emerges as the clear winner in many scenarios, and can be put forth as the recommended objective prior for the situation. [20]




The p-value is 1 − 0.951 = 0.049.


The p-value is 1 − 0.979 = 0.021.




To reject the question, [how do we find the prior representing “complete ignorance”?], as some have done, on the grounds that the state of complete ignorance does not “exist” would be just as absurd as to reject Euclidean geometry on the grounds that a physical point does not exist. In the study of inductive inference, the notion of complete ignorance intrudes itself into the theory just as naturally and inevitably as the concept of zero in arithmetic.

If one rejects the consideration of complete ignorance on the grounds that the notion is vague and ill-defined, the reply is that the notion cannot be evaded in any full theory of inference. So if it is still ill-defined, then a major and immediate objective must be to find a precise definition which will agree with intuitive requirements and be of constructive use in a mathematical theory. [22]

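Moreover, systematic approaches such as reference priors can certainly do better than pseudo-Bayesian techniques, such as choosing a uniform prior over a truncated parameter space, or a vague proper prior such as a Gaussian over a region of parameter space that looks interesting. Even when subjective information is available, using reference priors as building blocks is often the best way to incorporate it. If we know that a parameter is restricted to a particular range but know nothing more, we can simply adapt the reference prior by restricting it to that range and renormalizing [14, p. 256].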
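Restricting and renormalizing is a small computation. A minimal sketch for the binomial case, assuming the prior being restricted is the Jeffreys prior (1/π) p^{−1/2} (1 − p)^{−1/2} and that the range satisfies 0 < lo < hi < 1; the function names are hypothetical:

```python
from math import asin, pi, sqrt

def jeffreys_cdf(p: float) -> float:
    """CDF of the binomial Jeffreys prior, F(p) = (2 / pi) * asin(sqrt(p))."""
    return (2.0 / pi) * asin(sqrt(p))

def restricted_jeffreys_density(p: float, lo: float, hi: float) -> float:
    """Jeffreys prior restricted to [lo, hi] and renormalized to integrate to one."""
    if not 0.0 < lo < hi < 1.0:
        raise ValueError("this sketch assumes 0 < lo < hi < 1")
    if p < lo or p > hi:
        return 0.0
    mass = jeffreys_cdf(hi) - jeffreys_cdf(lo)  # prior mass of the allowed range
    return 1.0 / (pi * sqrt(p * (1.0 - p)) * mass)

# Midpoint-rule check that the restricted density integrates to about one.
lo, hi, steps = 0.2, 0.6, 10_000
width = (hi - lo) / steps
total = sum(restricted_jeffreys_density(lo + (i + 0.5) * width, lo, hi) for i in range(steps)) * width
print(f"integral over [{lo}, {hi}] = {total:.6f}")
```

The shape of the prior inside the range is unchanged; only the normalizing constant differs, so the posterior is the unrestricted posterior truncated to the same range.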




We would argue that noninformative prior Bayesian analysis is the single most powerful method of statistical analysis, in the sense of being the ad hoc method most likely to yield a sensible answer for a given investment of effort. And the answers so obtained have the added feature of being, in some sense, the most “objective” statistical answers obtainable [23, p. 90]


[1]: Problem of the points: Suppose two players A and B each contribute an equal amount of money into a prize pot. A and B then agree to play repeated rounds of a game of chance, with the players having an equal probability of winning any round, until one of the players has won k rounds. The player that first reaches k wins takes the entirety of the prize pot. Now, suppose the game is interrupted with neither player reaching k wins. If A has w_A wins and B has w_B wins, what’s a fair way to split the pot?

[2]: Bernoulli, J. (1713). On the Law of Large Numbers, Part Four of Ars Conjectandi. Translated by Oscar Sheynin.

[3]: De Moivre, A. (1756). The Doctrine of Chances.

[4]: Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F. R. S. Communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S. Philosophical Transactions of the Royal Society of London 53, 370–418.

[5]: Stigler, S. (1990). The History of Statistics: The Measurement of Uncertainty before 1900. Belknap Press.

[6]: Laplace, P. (1774). Memoir on the probability of the causes of events. Translated by S. M. Stigler.

[7]: De Morgan, A. (1838). An Essay On Probabilities: And On Their Application To Life Contingencies And Insurance Offices.

[8]: Fisher, R. (1930). Inverse probability. Mathematical Proceedings of the Cambridge Philosophical Society 26(4), 528–535.

[9]: Fisher, R. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 222, 309–368.

[10]: Jeffreys, H. (1961). Theory of Probability (3 ed.). Oxford Classic Texts in the Physical Sciences.

[11]: Welch, B. L. and H. W. Peers (1963). On formulae for confidence points based on integrals of weighted likelihoods. Journal of the Royal Statistical Society, Series B (Methodological) 25, 318–329.

[12]: Goodman, S. (1999, June). Toward evidence-based medical statistics. 1: The p value fallacy. Annals of Internal Medicine 130 (12), 995–1004.

[13]: Berger, J. O., J. M. Bernardo, and D. Sun (2009). The formal definition of reference priors. The Annals of Statistics 37 (2), 905–938.

[14]: Berger, J., J. Bernardo, and D. Sun (2024). Objective Bayesian Inference. World Scientific.

[15]: Berger, J. and T. Sellke (1987). Testing a point null hypothesis: The irreconcilability of p values and evidence. Journal of the American Statistical Association 82(397), 112–22.

[16]: Sellke, T., M. J. Bayarri, and J. Berger (2001). Calibration of p values for testing precise null hypotheses. The American Statistician 55(1), 62–71.

[17]: Berger, J., J. Bernardo, and D. Sun (2022). Objective Bayesian inference and its relationship to frequentism.

[18]: Laplace, P. (1778). Mémoire sur les probabilités. Translated by Richard J. Pulskamp.

[19]: Berger, J. and J. Mortera (1999). Default Bayes factors for nonnested hypothesis testing. Journal of the American Statistical Association 94 (446), 542–554.

[20]: Berger, J. (2006). The case for objective Bayesian analysis. Bayesian Analysis 1(3), 385–402.

[21]: Berger, J. O. and D. A. Berry (1988). Statistical analysis and the illusion of objectivity. American Scientist 76(2), 159–165.

[22]: Jaynes, E. T. (1968). Prior probabilities. IEEE Transactions on Systems Science and Cybernetics 4(3), 227–241.

[23]: Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis. Springer.

[24]: The portrait of Thomas Bayes is in the public domain; the portrait of Pierre-Simon Laplace is by Johann Ernst Heinsius (1775) and licensed under Creative Commons Attribution-Share Alike 4.0 International; and use of Harold Jeffreys portrait qualifies for fair use.

[25]: Zabell, S. (1989). R. A. Fisher on the History of Inverse Probability. Statistical Science 4(3), 247–256.


Author: Ryan Burn





