This article walks through the derivation of the solution to the normal equation.
1. Preparation
1.1 Matrix transpose and derivative formulas
1.1.1 Transpose formulas:
- $(mA)^T = mA^T$, where $m$ is a constant
- $(A+B)^T = A^T + B^T$
- $(AB)^T = B^TA^T$
- $(A^T)^T = A$
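The four transpose rules are easy to sanity-check numerically. The snippet below is a minimal sketch using NumPy; the matrix shapes, the scalar value, the random seed, and the variable names are illustrative assumptions, not part of the original text. A third matrix `C` stands in for the second factor of the product rule so that the shapes are compatible.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, chosen arbitrarily for reproducibility
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))   # same shape as A, for the sum rule
C = rng.standard_normal((4, 2))   # shape compatible with A, for the product rule
m = 2.5                           # an arbitrary scalar

scalar_rule = np.allclose((m * A).T, m * A.T)       # (mA)^T = m A^T
sum_rule = np.allclose((A + B).T, A.T + B.T)        # (A + B)^T = A^T + B^T
product_rule = np.allclose((A @ C).T, C.T @ A.T)    # (AC)^T = C^T A^T
double_transpose = np.array_equal(A.T.T, A)         # (A^T)^T = A

print(scalar_rule, sum_rule, product_rule, double_transpose)
```

Each of the four checks should evaluate to `True`. Note the reversed order of the factors in the product rule, which is the rule the derivation later relies on.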
1.1.2 Derivative formulas:
$\frac{\partial X^T}{\partial X} = I$, where $I$ is the identity matrix
$\frac{\partial X^TA}{\partial X} = A$
$\frac{\partial AX^T}{\partial X} = A$
$\frac{\partial AX}{\partial X} = A^T$
$\frac{\partial XA}{\partial X} = A^T$
$\frac{\partial X^TAX}{\partial X} = (A + A^T)X$, for a general $A$ (not necessarily symmetric)
$\frac{\partial X^TAX}{\partial X} = 2AX$, when $A$ is symmetric
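The two quadratic-form rules can be checked against a finite-difference approximation. This is a rough sketch under stated assumptions: the variable is treated as a one-dimensional NumPy vector `x`, and the helper `numerical_gradient`, the test matrices, and the tolerances are hypothetical choices made for illustration.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # a general, non-symmetric matrix
x = rng.standard_normal(4)

# General case: d(x^T A x)/dx = (A + A^T) x
general_ok = np.allclose(
    numerical_gradient(lambda v: v @ A @ v, x), (A + A.T) @ x, atol=1e-5
)

# Symmetric case: d(x^T S x)/dx = 2 S x
S = A + A.T                       # symmetric by construction
symmetric_ok = np.allclose(
    numerical_gradient(lambda v: v @ S @ v, x), 2 * S @ x, atol=1e-5
)

print(general_ok, symmetric_ok)
```

The symmetric rule is just the general rule with $A^T = A$, which is why $X^TX$ (always symmetric) produces the factor of 2 in the derivation.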
2. Derivation
2.1 Deriving the normal-equation solution for $\theta$
- Expand using matrix multiplication
$J(\theta) = \frac{1}{2}(X\theta - y)^T(X\theta - y)$
$J(\theta) = \frac{1}{2}(\theta^TX^T - y^T)(X\theta - y)$ // since $(X\theta)^T = \theta^TX^T$
$J(\theta) = \frac{1}{2}(\theta^TX^TX\theta - \theta^TX^Ty - y^TX\theta + y^Ty)$
- Differentiate ($X$ and $y$ are known; $\theta$ is the variable)
$J'(\theta) = \frac{1}{2}(\theta^TX^TX\theta - \theta^TX^Ty - y^TX\theta + y^Ty)'$
- Apply the derivative formulas above ($\theta$ is the variable)
// Move the derivative inside the parentheses, term by term
$J'(\theta) = \frac{1}{2}(\frac{\partial \theta^TX^TX\theta}{\partial \theta} - \frac{\partial \theta^TX^Ty}{\partial \theta} - \frac{\partial y^TX\theta}{\partial \theta} + \frac{\partial y^Ty}{\partial \theta})$
// Apply the derivative formulas: the first term contributes $X^TX\theta + (\theta^TX^TX)^T$ (product rule over both occurrences of $\theta$), and $\frac{\partial y^Ty}{\partial \theta} = 0$
$J'(\theta) = \frac{1}{2}(X^TX\theta + (\theta^TX^TX)^T - X^Ty - (y^TX)^T)$
$J'(\theta) = \frac{1}{2}(X^TX\theta + X^TX\theta - X^Ty - X^Ty)$
$J'(\theta) = \frac{1}{2}(2X^TX\theta - 2X^Ty)$
$J'(\theta) = X^TX\theta - X^Ty$
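$J'(\theta) = X^T(X\theta - y)$ // distributive law of matrix multiplication
- Set the derivative $J'(\theta) = 0$ (Why zero? $J(\theta)$ is a quadratic function of $\theta$ whose Hessian $X^TX$ is positive semidefinite, so it is convex, and any point where the derivative is zero is a global minimum.)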
$0 = X^TX\theta - X^Ty$
$X^TX\theta = X^Ty$
// It is tempting to write $\theta = \frac{y}{X}$ here, but that is wrong: matrices have no division, so the inverse matrix must be used instead
- Multiply both sides on the left by the inverse $(X^TX)^{-1}$ (assuming $X^TX$ is invertible)
$(X^TX)^{-1}X^TX\theta = (X^TX)^{-1}X^Ty$
$I\theta = (X^TX)^{-1}X^Ty$
$\theta = (X^TX)^{-1}X^Ty$
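Two steps of this derivation lend themselves to a numerical check: that the gradient of $J(\theta)$ really is $X^T(X\theta - y)$, and that it vanishes at the closed-form $\theta$. The sketch below assumes randomly generated data (20 samples, 3 features) purely for illustration; the function names `cost` and `gradient` and the tolerances are hypothetical.

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = 1/2 (X theta - y)^T (X theta - y)."""
    residual = X @ theta - y
    return 0.5 * residual @ residual

def gradient(X, y, theta):
    """Analytic gradient from the derivation: X^T (X theta - y)."""
    return X.T @ (X @ theta - y)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))   # made-up data: 20 samples, 3 features
y = rng.standard_normal(20)
theta = rng.standard_normal(3)     # an arbitrary point at which to compare gradients

# Central differences along each coordinate direction.
eps = 1e-6
numeric = np.array([
    (cost(X, y, theta + eps * e) - cost(X, y, theta - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
gradient_matches = np.allclose(numeric, gradient(X, y, theta), atol=1e-5)

# Solve X^T X theta = X^T y; solve() avoids forming the inverse explicitly.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
gradient_vanishes = np.allclose(gradient(X, y, theta_hat), 0.0, atol=1e-8)

print(gradient_matches, gradient_vanishes)
```

Both checks should come out `True` as long as $X$ has full column rank, which is the condition under which $X^TX$ is invertible.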
2.2 That completes the derivation of the normal equation!
Formula:
$\theta = (X^TX)^{-1}X^Ty$
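As a usage illustration, the formula can be applied to a tiny made-up dataset. This is a sketch rather than a definitive implementation: the function name `normal_equation` and the data (four noiseless points on the line $y = 1 + 2x$) are assumptions introduced here. It solves the linear system $X^TX\theta = X^Ty$ with `np.linalg.solve` instead of computing the inverse, then compares the result with the literal formula and with NumPy's own least-squares routine.

```python
import numpy as np

def normal_equation(X, y):
    """Return theta solving X^T X theta = X^T y (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Made-up noiseless data on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])   # the column of ones models the intercept
y = 1.0 + 2.0 * x

theta = normal_equation(X, y)
theta_explicit = np.linalg.inv(X.T @ X) @ X.T @ y     # the formula taken literally
theta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]    # NumPy's least-squares solver

print(theta)   # approximately [1. 2.]
```

All three approaches should agree here. Solving the system is generally preferred over the explicit inverse for numerical stability, and `lstsq` also copes with the case where $X^TX$ is singular.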