This article covers Part 11: vector differentiation, matrix differentiation, and computing derivatives via the Jacobian matrix. Hopefully it provides a useful reference for developers tackling related programming problems; let's work through it together.
Contents
- 1. Matrix Derivatives by Hand for Machine Learning
- 1.1 Introduction
- 1.2 Why ML needs matrix derivatives
- 1.3 A first look at vector functions and matrix derivatives
- 1.4 Matrix derivatives: the YX stretching technique
- 1.5 Examples of common matrix derivative formulas
- 1.6 Additional details on differentiation
- 2. The Jacobian matrix
- 2.1 Jacobian matrix: the math
- 2.2 Jacobian matrix in PyTorch
- 2.3 Vector-by-vector and matrix-by-matrix derivatives
- 2.4 Summary
1. Matrix Derivatives by Hand for Machine Learning
Bilibili video link
1.1 Introduction
(1) Theory
- Why ML needs matrix derivatives
- A first look at vector functions and matrix derivatives
- Matrix derivatives: the YX stretching technique
(2) Practice
- Examples of common matrix derivative formulas
- Additional notes on matrix derivatives
- Least squares
1.2 Why ML needs matrix derivatives
Vectorized data makes computation simpler.
Consider the system of equations
$$y_1=W_1x_{11}+W_2x_{12}$$
$$y_2=W_1x_{21}+W_2x_{22}$$
Vectorized, this can be written more compactly as
$$\begin{bmatrix}y_1\\y_2\end{bmatrix}=\begin{bmatrix}x_{11}&x_{12}\\x_{21}&x_{22}\end{bmatrix}\begin{bmatrix}W_1\\W_2\end{bmatrix}\tag{1}$$
$$Y=XW\tag{2}$$
As the above shows, no matter how many x, y, and W entries we add, the whole system can still be written as Eq. (2); a small example follows below.
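As a quick sketch (the numbers here are made up purely for illustration), the two-equation system above collapses into a single matrix product:
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # rows are [x_i1, x_i2]
W = np.array([2.0, -1.0])    # [W_1, W_2]

Y = X @ W                    # computes y_i = W_1*x_i1 + W_2*x_i2 for all i at once
print(Y)                     # [0. 2.]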
For loops vs. NumPy matrix operations
Vectorized computation is fast.
Let's process the same data both ways and compare a plain Python for loop against NumPy's matrix operations:
# -*- coding: utf-8 -*-
# @Project: zc
# @Author: zc
# @File name: numpy_new_test
# @Create time: 2022/3/16 18:43
import numpy as np
import time

a = np.random.rand(10000000)
b = np.random.rand(10000000)

# Vectorized dot product with NumPy
time_cur = time.time()
c = a.dot(b)
time_later = time.time()
print(f"c={c}")
vec_time = 1000 * (time_later - time_cur)
print("vectorized is " + str(vec_time) + "ms")
print()

# Same computation with an explicit Python for loop
c = 0
time_cur = time.time()
for i in range(a.size):
    c += a[i] * b[i]
time_later = time.time()
print(f"c={c}")
loop_time = 1000 * (time_later - time_cur)
print("Loop is " + str(loop_time) + "ms")
print()
print("times is " + str(loop_time / vec_time))
# Time for the vectorized (NumPy) version
c=2499945.9800939467
vectorized is 7.472991943359375ms
# Time for the for-loop version
c=2499945.9800934764
Loop is 3543.708086013794ms
# NumPy turns out to be about 474x faster than the for loop
times is 474.2020482388974
1.3 A first look at vector functions and matrix derivatives
- Scalar function: a function whose output is a scalar
$$f(x)=x^2;\quad x\in\mathbb{R};\quad f(x)=x^2\in\mathbb{R}$$
$$f(x)=x_1^2+x_2^2;\quad x=[x_1,x_2]\in\mathbb{R}^2;\quad f(x)=x_1^2+x_2^2\in\mathbb{R}\tag{3}$$
- Scalar input, vector- or matrix-valued output
$$f(x)=\begin{bmatrix}f_1(x)=x\\f_2(x)=x^2\end{bmatrix};\quad x\in\mathbb{R};\quad\begin{bmatrix}f_1(x)\\f_2(x)\end{bmatrix}\in\mathbb{R}^2\tag{4}$$
$$f(x)=\begin{bmatrix}f_{11}(x)=x&f_{12}(x)=x^2\\f_{21}(x)=x^3&f_{22}(x)=x^4\end{bmatrix};\quad x\in\mathbb{R};\quad\begin{bmatrix}f_{11}(x)&f_{12}(x)\\f_{21}(x)&f_{22}(x)\end{bmatrix}\in\mathbb{R}^{2\times 2}\tag{5}$$
- Vector or matrix input, matrix-valued output
$$f(x_1,x_2)=\begin{bmatrix}f_{11}(x)=x_1+x_2&f_{12}(x)=x_1^2+x_2^2\\f_{21}(x)=x_1^3+x_2^3&f_{22}(x)=x_1^4+x_2^4\end{bmatrix};\quad x\in\mathbb{R}^2;\quad\begin{bmatrix}f_{11}(x)&f_{12}(x)\\f_{21}(x)&f_{22}(x)\end{bmatrix}\in\mathbb{R}^{2\times 2}\tag{6}$$
- The essence of differentiation
$$\frac{\partial A}{\partial B}:\ \text{every element of }A\text{ is differentiated with respect to every element of }B$$
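As a small sketch of this idea (the function below is a made-up stand-in patterned after Eq. (6)), PyTorch's torch.autograd.functional.jacobian (used again in Section 2 below) returns one partial derivative per (output element, input element) pair, so its shape is the output shape followed by the input shape:
import torch
from torch.autograd.functional import jacobian

def f(x):
    # matrix-valued function of x = (x1, x2), in the spirit of Eq. (6)
    return torch.stack([
        torch.stack([x[0] + x[1], x[0] ** 2 + x[1] ** 2]),
        torch.stack([x[0] ** 3 + x[1] ** 3, x[0] ** 4 + x[1] ** 4]),
    ])

x = torch.tensor([1.0, 2.0])
J = jacobian(f, x)
print(J.shape)   # torch.Size([2, 2, 2]): every element of f w.r.t. every element of x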
1.4 Matrix derivatives: the YX stretching technique
Scalars stay unchanged; vectors get stretched.
The one in front is stretched horizontally, the one behind is stretched vertically (YX: Y in front is stretched horizontally, X behind is stretched vertically).
(1) Suppose $f(x)$ is a scalar and x is a vector. Then:
$$f(x_1,x_2,...,x_n)=x_1+x_2+\dots+x_n\tag{7}$$
$$x=[x_1,x_2,...,x_n]^T\tag{8}$$
Keep the scalar f(x) unchanged and stretch the vector x. In $\frac{\partial f(x)}{\partial x}$ (read as YX), X is behind and is therefore stretched vertically, while the scalar f(x) stays put:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}\tag{9}$$
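A quick numerical check of Eq. (9) for the sum function of Eq. (7), using PyTorch autograd (the dimension 5 is arbitrary): every partial derivative is 1, so the gradient is a column of ones.
import torch

x = torch.randn(5, requires_grad=True)   # x in R^5
f = x.sum()                              # f(x) = x1 + x2 + ... + xn
f.backward()
print(x.grad)                            # tensor([1., 1., 1., 1., 1.])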
(2) Suppose $f(x)$ is a vector and x is a scalar. Since x is a scalar it stays unchanged; since Y comes first in YX, Y is stretched horizontally:
$$f(x)=\begin{bmatrix}f_1(x)\\f_2(x)\\\vdots\\f_n(x)\end{bmatrix}\tag{10}$$
- The scalar x stays unchanged, and Y = f(x), being in front, is stretched horizontally:
$$\frac{\partial f(x)}{\partial x}=\left[\frac{\partial f_1(x)}{\partial x},\frac{\partial f_2(x)}{\partial x},...,\frac{\partial f_n(x)}{\partial x}\right]\tag{11}$$
(3) Suppose $f(x)$ is a vector-valued function and x is a vector:
$$f(x)=\begin{bmatrix}f_1(x)\\f_2(x)\\\vdots\\f_n(x)\end{bmatrix};\quad x=\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}\tag{12}$$
- First stretch X: X is behind in YX, so it is stretched vertically, while $f(x)$ is left intact for now:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}\tag{13}$$
- Then stretch $Y=f(x)$: Y is in front, so it is stretched horizontally:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}=\begin{bmatrix}\frac{\partial f_1(x)}{\partial x_1}&\frac{\partial f_2(x)}{\partial x_1}&\dots&\frac{\partial f_n(x)}{\partial x_1}\\\frac{\partial f_1(x)}{\partial x_2}&\frac{\partial f_2(x)}{\partial x_2}&\dots&\frac{\partial f_n(x)}{\partial x_2}\\\vdots&\vdots&&\vdots\\\frac{\partial f_1(x)}{\partial x_n}&\frac{\partial f_2(x)}{\partial x_n}&\dots&\frac{\partial f_n(x)}{\partial x_n}\end{bmatrix}\tag{14}$$
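As a rough cross-check (the function f below is a made-up example), PyTorch's jacobian is laid out with rows indexed by the components of f and columns by the components of x, i.e. the numerator layout; the matrix written in Eq. (14) is its transpose:
import torch
from torch.autograd.functional import jacobian

def f(x):
    # made-up vector function from R^2 to R^3
    return torch.stack([x[0] * x[1], x[1] ** 2, x[0] + 3 * x[1]])

x = torch.tensor([2.0, 5.0])
J_numerator = jacobian(f, x)    # shape (3, 2): rows follow f, columns follow x
J_yx_layout = J_numerator.T     # shape (2, 3): the layout written in Eq. (14)
print(J_numerator)
print(J_yx_layout)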
1.5 Examples of common matrix derivative formulas
(1) $f(x)$ is a scalar, x is a vector
$$f(x)=A^TX\tag{15}$$
$$A=[a_1,a_2,...,a_n]^T;\quad X=[x_1,x_2,...,x_n]^T\tag{16}$$
- Because f(x) is a scalar it stays unchanged; X is behind in YX, so it is stretched vertically:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}\tag{17}$$
- Since $f(x)=\sum_{i=1}^n a_i x_i$, each partial derivative is:
$$\frac{\partial f(x)}{\partial x_i}=a_i\tag{18}$$
- So the derivative is:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}a_1\\a_2\\\vdots\\a_n\end{bmatrix}=A\tag{19}$$
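A small numerical sanity check of Eq. (19), with a random A and x of an arbitrary size (not part of the original derivation):
import torch

n = 4
A = torch.randn(n)
x = torch.randn(n, requires_grad=True)

f = A @ x          # the scalar A^T x
f.backward()
print(torch.allclose(x.grad, A))   # True: the gradient is exactly A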
(2) f(x) is a quadratic form, x is a column vector
$$f(x)=X^TAX=\sum_{i=1}^n\sum_{j=1}^na_{ij}x_ix_j\tag{20}$$
$$X=[x_1,x_2,...,x_n]^T;\quad A=\begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{n1}&a_{n2}&\dots&a_{nn}\end{bmatrix}\tag{21}$$
- f(x) is a scalar, so X (behind in YX) is stretched vertically:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}=\begin{bmatrix}\sum_{j=1}^na_{1j}x_j+\sum_{i=1}^na_{i1}x_i\\\sum_{j=1}^na_{2j}x_j+\sum_{i=1}^na_{i2}x_i\\\vdots\\\sum_{j=1}^na_{nj}x_j+\sum_{i=1}^na_{in}x_i\end{bmatrix}\tag{22}$$
- Splitting the sum into two column vectors:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\sum_{j=1}^na_{1j}x_j\\\sum_{j=1}^na_{2j}x_j\\\vdots\\\sum_{j=1}^na_{nj}x_j\end{bmatrix}+\begin{bmatrix}\sum_{i=1}^na_{i1}x_i\\\sum_{i=1}^na_{i2}x_i\\\vdots\\\sum_{i=1}^na_{in}x_i\end{bmatrix}\tag{23}$$
- Each of the two vectors is itself a matrix-vector product:
$$\begin{bmatrix}\sum_{j=1}^na_{1j}x_j\\\sum_{j=1}^na_{2j}x_j\\\vdots\\\sum_{j=1}^na_{nj}x_j\end{bmatrix}=\begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{n1}&a_{n2}&\dots&a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}=AX\tag{24}$$
$$\begin{bmatrix}\sum_{i=1}^na_{i1}x_i\\\sum_{i=1}^na_{i2}x_i\\\vdots\\\sum_{i=1}^na_{in}x_i\end{bmatrix}=\begin{bmatrix}a_{11}&a_{21}&\dots&a_{n1}\\a_{12}&a_{22}&\dots&a_{n2}\\\vdots&\vdots&&\vdots\\a_{1n}&a_{2n}&\dots&a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}=A^TX\tag{25}$$
- Therefore:
$$\frac{\partial f(x)}{\partial x}=\frac{\partial (X^TAX)}{\partial x}=AX+A^TX=(A+A^T)X\tag{26}$$
- When A is a symmetric matrix, $A^T=A$, and the result simplifies to:
$$\frac{\partial f(x)}{\partial x}=\frac{\partial (X^TAX)}{\partial x}=AX+A^TX=2AX\tag{27}$$
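A similar sanity check for Eq. (26), with a random A and x of arbitrary size: autograd's gradient of $x^TAx$ matches $(A+A^T)x$.
import torch

n = 4
A = torch.randn(n, n)
x = torch.randn(n, requires_grad=True)

f = x @ A @ x      # the scalar x^T A x
f.backward()
print(torch.allclose(x.grad, (A + A.T) @ x))   # True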
1.6 Additional details on differentiation
The difference between numerator layout and denominator layout:
For details, see this Zhihu post: explanation of numerator vs. denominator layout
- Denominator layout corresponds to the YX stretching technique; numerator layout corresponds to XY stretching. As a mnemonic, think of a fraction: with X in front (X/Y) it is the numerator layout, with X behind (Y/X) it is the denominator layout.
- The only difference is the direction in which the vectors are stretched; the stretching rule itself never changes.
- Rule of thumb: the one in front is stretched horizontally, the one behind is stretched vertically.
2. The Jacobian matrix
2.1 Jacobian matrix: the math
The Jacobian matrix is the matrix formed by the partial derivatives of a vector y with respect to a vector x; in essence, every element of y is differentiated with respect to every element of x.
$$y=[y_1,y_2,...,y_m];\quad x=[x_1,x_2,...,x_n]$$
$$\frac{\partial y}{\partial x}=\begin{bmatrix}\frac{\partial y_1}{\partial x_1}&\frac{\partial y_1}{\partial x_2}&\dots&\frac{\partial y_1}{\partial x_n}\\\frac{\partial y_2}{\partial x_1}&\frac{\partial y_2}{\partial x_2}&\dots&\frac{\partial y_2}{\partial x_n}\\\vdots&\vdots&&\vdots\\\frac{\partial y_m}{\partial x_1}&\frac{\partial y_m}{\partial x_2}&\dots&\frac{\partial y_m}{\partial x_n}\end{bmatrix}$$
2.2 Jacobian matrix in PyTorch
jacobian
PyTorch's jacobian (torch.autograd.functional.jacobian) is mainly used for vector-to-vector differentiation.
Define x and y as follows:
$$x=\begin{bmatrix}0&1&2&3\end{bmatrix};\quad y=x^2=\begin{bmatrix}0&1&4&9\end{bmatrix}$$
$$\frac{\partial y}{\partial x}=\begin{bmatrix}\frac{\partial y_1}{\partial x_1}&\frac{\partial y_1}{\partial x_2}&\frac{\partial y_1}{\partial x_3}&\frac{\partial y_1}{\partial x_4}\\\frac{\partial y_2}{\partial x_1}&\frac{\partial y_2}{\partial x_2}&\frac{\partial y_2}{\partial x_3}&\frac{\partial y_2}{\partial x_4}\\\frac{\partial y_3}{\partial x_1}&\frac{\partial y_3}{\partial x_2}&\frac{\partial y_3}{\partial x_3}&\frac{\partial y_3}{\partial x_4}\\\frac{\partial y_4}{\partial x_1}&\frac{\partial y_4}{\partial x_2}&\frac{\partial y_4}{\partial x_3}&\frac{\partial y_4}{\partial x_4}\end{bmatrix}=\begin{bmatrix}2x_1&0&0&0\\0&2x_2&0&0\\0&0&2x_3&0\\0&0&0&2x_4\end{bmatrix}=\begin{bmatrix}0&0&0&0\\0&2&0&0\\0&0&4&0\\0&0&0&6\end{bmatrix}$$
- Code:
import torch
from torch import nn

def f(x):
    # element-wise square: y_i = x_i^2
    return x.pow(2)

x = torch.arange(4, dtype=torch.float)
y = f(x)
print(f"x={x}")
print(f"x.shape={x.shape}")
print(f"y={y}")
print(f"y.shape={y.shape}")
jabobian_x = torch.autograd.functional.jacobian(f, x)
print(f"jabobian_x.shape={jabobian_x.shape}")
print(f"jabobian_x={jabobian_x}")
x=tensor([0., 1., 2., 3.])
x.shape=torch.Size([4])
y=tensor([0., 1., 4., 9.])
y.shape=torch.Size([4])
jabobian_x.shape=torch.Size([4, 4])
jabobian_x=tensor([[0., 0., 0., 0.],
        [0., 2., 0., 0.],
        [0., 0., 4., 0.],
        [0., 0., 0., 6.]])
2.3 Vector-by-vector and matrix-by-matrix derivatives
For backpropagation, PyTorch offers two ways to compute a gradient: the first is to call backward, and the second is to form the product $v^T$ @ jacobian.
import torch
from torch import nn
from torch.autograd.functional import jacobian

# 1. Derivative of the vector output with respect to vector a, via backward
a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

def func(a):
    return a + b

y = func(a)
y.backward(torch.ones_like(y))
a_grad = a.grad
print(f"a_grad={a_grad}")

# 2. The same derivative via jacobian: v^T @ J with v = ones
jacobian_a = torch.ones_like(func(a)) @ jacobian(func, a)
print(f"jacobian_a={jacobian_a}")

# 3. Derivative of matrix z = m @ n with respect to matrices m and n, via backward
m = torch.randn((2, 3), requires_grad=True)
n = torch.randn((3, 2), requires_grad=True)
z = m @ n
z.backward(torch.ones_like(z))
m_grad = m.grad
n_grad = n.grad
print(f"m_grad={m_grad}")
print(f"n_grad={n_grad}")

# 4. The same derivative via jacobian, applied to the first row of m
def func_m(m):
    return m @ n

jacobian_m = torch.ones_like(func_m(m[0])) @ jacobian(func_m, m[0])
print(f"jacobian_m={jacobian_m}")
a_grad=tensor([1., 1., 1.])
jacobian_a=tensor([1., 1., 1.])
m_grad=tensor([[ 1.3068, 1.3378, -1.5509],
        [ 1.3068, 1.3378, -1.5509]])
n_grad=tensor([[-1.4083, -1.4083],
        [-2.3474, -2.3474],
        [ 0.6330, 0.6330]])
jacobian_m=tensor([ 1.3068, 1.3378, -1.5509])
2.4 Summary
(1) From the code above we can see that to obtain the derivative with respect to a parameter, we do not have to call y.backward and read x.grad; we can also compute it as the product of v and the jacobian.
(2) To differentiate a matrix with respect to a matrix, we can split the matrix into row vectors and obtain the derivative row by row via jacobian; a sketch of this idea follows below.
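A loose sketch of point (2), mirroring the random m, n setup used above (the per-row loop and the allclose comparison are my own additions, not part of the original code): taking one jacobian per row of m and multiplying by the ones vector reproduces m.grad row by row.
import torch
from torch.autograd.functional import jacobian

m = torch.randn((2, 3), requires_grad=True)
n = torch.randn((3, 2), requires_grad=True)

z = m @ n
z.backward(torch.ones_like(z))            # fills m.grad via backward

def func_m(row):
    # one row of m -> the corresponding row of z
    return row @ n

# v^T @ jacobian for each row, then stack the rows back into a matrix
rows = [torch.ones(2) @ jacobian(func_m, m[i].detach()) for i in range(m.shape[0])]
print(torch.allclose(torch.stack(rows), m.grad))   # True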
This concludes the article on Part 11: vector differentiation, matrix differentiation, and derivatives via the Jacobian matrix. We hope it proves helpful to fellow programmers!