scikit-learn linearRegression 1.1.9 Bayesian Regression

1.1.9. Bayesian Regression

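Bayesian regression techniques can be used to include regularization parameters in the estimation procedure: the regularization parameter is not fixed in advance but tuned to the data at hand.

This can be done by introducing uninformative priors over the hyperparameters of the model.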

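The $\ell_{2}$ regularization used in Ridge Regression is equivalent to finding a maximum a posteriori solution under a Gaussian prior over the parameters $w$ with precision $\lambda$ (that is, variance $\lambda^{-1}$). Instead of setting $\lambda$ manually, it is possible to treat it as a random variable to be estimated from the data.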

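To obtain a fully probabilistic model, the output $y$ is assumed to be Gaussian distributed around $X w$, with noise precision $\alpha$:

$p(y|X,w,\alpha) = \mathcal{N}(y|X w,\alpha^{-1})$

$\alpha$ is again treated as a random variable that is to be estimated from the data.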
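
The equivalence between the ridge penalty and a Gaussian prior can be checked numerically. The sketch below is only an illustration under stated assumptions: it fixes the noise precision and the weight precision by hand (the variable names `alpha` and `lam` and the synthetic data are made up for this example), omits the intercept, and compares the closed-form maximum a posteriori weights with `Ridge` fitted using the penalty `lam / alpha`.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data; all values here are illustrative assumptions.
rng = np.random.RandomState(0)
n_samples, n_features = 50, 3
X = rng.randn(n_samples, n_features)
w_true = np.array([1.0, -2.0, 0.5])
alpha = 25.0  # assumed noise precision
lam = 4.0     # assumed precision of the Gaussian prior on w
y = X @ w_true + rng.randn(n_samples) / np.sqrt(alpha)

# Negative log posterior (up to a constant):
#   (alpha / 2) * ||y - X w||^2 + (lam / 2) * ||w||^2
# Setting its gradient to zero gives the MAP solution in closed form.
w_map = np.linalg.solve(X.T @ X + (lam / alpha) * np.eye(n_features), X.T @ y)

# Gradient of the negative log posterior at w_map (should vanish).
grad = -alpha * X.T @ (y - X @ w_map) + lam * w_map

# Ridge minimizes ||y - X w||^2 + penalty * ||w||^2, so penalty = lam / alpha.
ridge = Ridge(alpha=lam / alpha, fit_intercept=False).fit(X, y)

print("MAP weights:  ", w_map)
print("Ridge weights:", ridge.coef_)
print("gradient norm at MAP:", np.linalg.norm(grad))
```

Note that the `alpha` argument of `Ridge` is the penalty strength, which corresponds to the ratio $\lambda / \alpha$ in the notation of this section, not to the noise precision itself.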

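The advantages of Bayesian regression are:

It adapts to the data at hand.
It can be used to include regularization parameters in the estimation procedure.

The disadvantages of Bayesian regression include:

Inference of the model can be time consuming.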

References

A good introduction to Bayesian methods is given in C. Bishop: Pattern Recognition and Machine Learning (the classic PRML book).
The original algorithm is described in detail in Bayesian learning for neural networks by Radford M. Neal.

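1.1.9.1. Bayesian Ridge Regression

BayesianRidge estimates a probabilistic model of the regression problem described above. The prior for the parameter $w$ is given by a spherical Gaussian: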

$p(w|\lambda) = \mathcal{N}(w|0,\lambda^{-1}\mathbf{I}_{p})$

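The priors over $\alpha$ and $\lambda$ are chosen to be gamma distributions, the conjugate prior for the precision of the Gaussian.

The resulting model is called Bayesian Ridge Regression, and is similar to the classical Ridge. The parameters $w$, $\alpha$ and $\lambda$ are estimated jointly during the fit of the model. The remaining hyperparameters are the parameters of the gamma priors over $\alpha$ and $\lambda$. These are usually chosen to be non-informative. The parameters are estimated by maximizing the marginal log likelihood.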

By default $\alpha_1 = \alpha_2 = \lambda_1 = \lambda_2 = 10^{-6}$.
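
To make the joint estimation concrete, the following sketch fits `BayesianRidge` on synthetic data generated with a known noise precision and a known weight precision, then reads the estimates back from the fitted attributes `alpha_` and `lambda_`. The data, the variable names `true_alpha` and `true_lambda`, and the sample sizes are assumptions chosen for illustration; the gamma hyperparameters are passed explicitly only to show where the defaults above enter.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic data with known precisions (illustrative values).
rng = np.random.RandomState(0)
n_samples, n_features = 200, 5
true_lambda, true_alpha = 4.0, 50.0
w = rng.randn(n_features) / np.sqrt(true_lambda)  # weights ~ N(0, 1 / lambda)
X = rng.randn(n_samples, n_features)
y = X @ w + rng.randn(n_samples) / np.sqrt(true_alpha)  # noise ~ N(0, 1 / alpha)

# Gamma prior hyperparameters, set to the documented defaults.
clf = BayesianRidge(alpha_1=1e-6, alpha_2=1e-6, lambda_1=1e-6, lambda_2=1e-6)
clf.fit(X, y)

# Precisions estimated from the data during the fit.
print("estimated noise precision alpha_:  ", clf.alpha_)
print("estimated weight precision lambda_:", clf.lambda_)
```

With only a handful of weights, the estimate of the weight precision is necessarily rough, while the noise precision is usually recovered more closely because every sample contributes to it.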

Bayesian Ridge Regression is used for regression:

>>> from sklearn import linear_model
>>> X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
>>> Y = [0., 1., 2., 3.]
>>> clf = linear_model.BayesianRidge()
>>> clf.fit(X, Y)
BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False, copy_X=True,
       fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300,
       normalize=False, tol=0.001, verbose=False)
 

After being fitted, the model can then be used to predict new values:

>>> clf.predict([[1, 0.]])
array([ 0.50000013])
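
Because the model is probabilistic, a prediction can also come with an uncertainty estimate. The sketch below assumes a scikit-learn version in which `BayesianRidge.predict` accepts `return_std=True`, and uses a small noisy dataset invented for this example rather than the exact data above.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Small noisy linear dataset (illustrative values).
rng = np.random.RandomState(0)
X_train = rng.uniform(0.0, 3.0, size=(30, 2))
y_train = 0.5 * X_train[:, 0] + 0.5 * X_train[:, 1] + 0.1 * rng.randn(30)

model = BayesianRidge().fit(X_train, y_train)

# return_std=True also returns the standard deviation of the
# predictive distribution at each query point.
X_new = np.array([[1.0, 0.0], [10.0, 10.0]])
y_mean, y_std = model.predict(X_new, return_std=True)

for x, m, s in zip(X_new, y_mean, y_std):
    print(f"x={x}: predicted mean={m:.3f}, std={s:.3f}")
```

The second query point lies far outside the training range, so its predictive standard deviation is expected to be larger than that of the first.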
 

The weights $w$ of the model can be accessed:

>>> clf.coef_
array([ 0.49999993,  0.49999993])
 

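Due to the Bayesian framework, the weights found are slightly different from the ones found by Ordinary Least Squares. However, Bayesian Ridge Regression is more robust to ill-posed problems.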
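
The toy data used above is itself an ill-posed problem: both feature columns are identical, so the least-squares weights are not uniquely determined. The following sketch, offered as an illustration rather than a proof, checks the rank of the design matrix and shows that BayesianRidge still returns a single stable solution that splits the weight evenly between the two columns.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Same toy data as in the example above: the two columns are identical.
X = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])
y = np.array([0., 1., 2., 3.])

# Rank 1 with 2 features: X.T @ X is singular, so infinitely many
# weight vectors fit the data equally well in the least-squares sense.
rank = np.linalg.matrix_rank(X)
print("rank of X:", rank, "for", X.shape[1], "features")

# The Gaussian prior on w selects one solution among them.
clf = BayesianRidge().fit(X, y)
print("BayesianRidge weights:", clf.coef_)
```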

Examples:

Bayesian Ridge Regression

References

More details can be found in the article Bayesian Interpolation by MacKay, David J. C.

1.1.9.2. Automatic Relevance Determination - ARD

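ARDRegression is very similar to Bayesian Ridge Regression, but can lead to sparser weights $w$ [1] [2]. ARDRegression poses a different prior over $w$, by dropping the assumption that the Gaussian is spherical.

Instead, the distribution over $w$ is assumed to be an axis-parallel (axis-aligned), elliptical Gaussian distribution.

This means each weight $w_{i}$ is drawn from a Gaussian distribution, centered on zero and with a precision $\lambda_{i}$: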

$p(w|\lambda) = \mathcal{N}(w|0,A^{-1})$

with $\operatorname{diag}(A) = \lambda = \{\lambda_{1},\ldots,\lambda_{p}\}$.

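In contrast to Bayesian Ridge Regression, each coordinate $w_{i}$ has its own precision $\lambda_i$ (equivalently, its own standard deviation $\lambda_i^{-1/2}$). The prior over all $\lambda_i$ is chosen to be the same gamma distribution, given by the hyperparameters $\lambda_1$ and $\lambda_2$.

Examples: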
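
The difference between the two priors shows up directly in the fitted attributes. The sketch below uses synthetic data invented for this example (20 features, of which only three carry signal) and compares `BayesianRidge`, which estimates one shared weight precision, with `ARDRegression`, which estimates one precision per weight.

```python
import numpy as np
from sklearn.linear_model import ARDRegression, BayesianRidge

# Synthetic sparse problem (illustrative values): 3 relevant features out of 20.
rng = np.random.RandomState(0)
n_samples, n_features = 100, 20
X = rng.randn(n_samples, n_features)
w = np.zeros(n_features)
w[[2, 7, 15]] = [2.0, -3.0, 1.5]
y = X @ w + 0.1 * rng.randn(n_samples)

br = BayesianRidge().fit(X, y)
ard = ARDRegression().fit(X, y)

# BayesianRidge: a single precision shared by all weights.
print("BayesianRidge lambda_:", br.lambda_)
# ARDRegression: one precision per weight.
print("ARDRegression lambda_ shape:", ard.lambda_.shape)

# Count weights that were driven exactly to zero by each model.
print("exact zeros, BayesianRidge:", int(np.sum(br.coef_ == 0)))
print("exact zeros, ARDRegression:", int(np.sum(ard.coef_ == 0)))
```

A weight whose estimated precision grows very large is effectively switched off, which is how ARD can produce sparser solutions than the spherical prior.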

Bayesian Ridge Regression

Computes a Bayesian Ridge Regression on a synthetic dataset.

See Bayesian Ridge Regression for more information on the regressor.

Compared to the OLS (ordinary least squares) estimator, the coefficient weights are slightly shifted toward zero, which stabilises them.

As the prior on the weights is a Gaussian prior, the histogram of the estimated weights is Gaussian.

The estimation of the model is done by iteratively maximizing the marginal log-likelihood of the observations.

Python source code: plot_bayesian_ridge.py

print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import BayesianRidge, LinearRegression
###############################################################################
# Generating simulated data with Gaussian weights
np.random.seed(0)
n_samples, n_features = 100, 100
X = np.random.randn(n_samples, n_features)  # Create Gaussian data
# Create weights with a precision lambda_ of 4.
lambda_ = 4.
w = np.zeros(n_features)
# Only keep 10 weights of interest
relevant_features = np.random.randint(0, n_features, 10)
for i in relevant_features:
    w[i] = stats.norm.rvs(loc=0, scale=1. / np.sqrt(lambda_))
# Create noise with a precision alpha of 50.
alpha_ = 50.
noise = stats.norm.rvs(loc=0, scale=1. / np.sqrt(alpha_), size=n_samples)
# Create the target
y = np.dot(X, w) + noise
###############################################################################
# Fit the Bayesian Ridge Regression and an OLS for comparison
clf = BayesianRidge(compute_score=True)
clf.fit(X, y)
ols = LinearRegression()
ols.fit(X, y)
###############################################################################
# Plot true weights, estimated weights and histogram of the weights
plt.figure(figsize=(6, 5))
plt.title("Weights of the model")
plt.plot(clf.coef_, 'b-', label="Bayesian Ridge estimate")
plt.plot(w, 'g-', label="Ground truth")
plt.plot(ols.coef_, 'r--', label="OLS estimate")
plt.xlabel("Features")
plt.ylabel("Values of the weights")
plt.legend(loc="best", prop=dict(size=12))
plt.figure(figsize=(6, 5))
plt.title("Histogram of the weights")
plt.hist(clf.coef_, bins=n_features, log=True)
plt.plot(clf.coef_[relevant_features], 5 * np.ones(len(relevant_features)),
         'ro', label="Relevant features")
plt.ylabel("Features")
plt.xlabel("Values of the weights")
plt.legend(loc="lower left")
plt.figure(figsize=(6, 5))
plt.title("Marginal log-likelihood")
plt.plot(clf.scores_)
plt.ylabel("Score")
plt.xlabel("Iterations")
plt.show()