EM Algorithm Notes (3): Another Kind of Three-Coin Problem

This article gives a detailed analysis of Figure 1 in the classic paper *What is the expectation maximization algorithm*, including the derivation of the EM formulas.
How the observed data $\mathbf Y$ is generated
In the problems described in the first two parts, the observed data $\mathbf Y=\{ y_{1},y_{2},\cdots,y_{N}\}$ (or $\mathbf X=\{ \boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$) are generated by the following rule:

$1)$ First, with probability $P(z_{k}=1|\theta)=\pi_{k},\ k\in\{1,\cdots,K\}$, set $z_{k}=1,\ z_{j}=0\ (\forall j \neq k)$, i.e. select the $k$-th component distribution.

$2)$ Then draw a sample $y$ (or $\boldsymbol x$) from the $k$-th component $P(y|z_{k}=1,\theta)$ (or $P(\boldsymbol x|z_{k}=1,\theta)$).

If every observed sample $y_{n}$ (or $\boldsymbol x_{n}$) is generated independently, then each $y_{n}$ (or $\boldsymbol x_{n}$) depends only on its own hidden vector $\mathbf z_{n}$.
Suppose instead that the observed samples are drawn by the procedure shown in the figure below:

$1)$ First, with probability $P(z_{k}=1|\theta)=\pi_{k},\ k\in\{1,\cdots,K\}$, set $z_{k}=1,\ z_{j}=0\ (\forall j \neq k)$, i.e. select the $k$-th component distribution (the figure shows the case $K=2$).

$2)$ Then, from the $k$-th component $P(y|z_{k}=1,\theta)$, repeat the trial $N=10$ times, drawing one group of samples $\mathbf Y_{i}=\{ y_{i,1},y_{i,2},\cdots,y_{i,N}\},\ i\in\{1,\cdots,M\}$.

The figure shows the case $M=5,\ N=10$: after coin B or coin C is selected, it is tossed 10 times to produce 10 observations $y$ that form one group; the full dataset $\mathbf Y=\{\mathbf Y_{1},\mathbf Y_{2},\mathbf Y_{3},\mathbf Y_{4},\mathbf Y_{5}\}$ therefore contains 5 groups and 50 observations in total. The hidden variables $\mathbf Z=\{\mathbf Z_{1},\mathbf Z_{2},\mathbf Z_{3},\mathbf Z_{4},\mathbf Z_{5}\}$, however, number only 5 (not 50, at least as the procedure is stated).
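This two-level sampling procedure can be sketched in Python (a minimal sketch; the parameter values and the function name `sample_dataset` are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(pi=0.5, p=0.5, q=0.6, M=5, N=10):
    """Draw M groups of N coin tosses.

    For each group, coin A (bias pi) picks coin B (bias p) or coin C (bias q);
    the chosen coin is then tossed N times, so there is ONE hidden variable
    per group, not one per toss."""
    Y = np.empty((M, N), dtype=int)
    Z = np.empty(M, dtype=int)          # 0 -> coin B chosen, 1 -> coin C chosen
    for i in range(M):
        Z[i] = rng.random() >= pi       # one hidden variable per group
        bias = p if Z[i] == 0 else q
        Y[i] = (rng.random(N) < bias).astype(int)
    return Y, Z

Y, Z = sample_dataset()
print(Y.shape, Z.shape)                 # (5, 10) (5,) -- 50 observations, 5 hidden variables
```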
In panel (a), the hidden variables $\mathbf Z$ are known, and maximum-likelihood estimation proceeds exactly as described in Part 1.

In panel (b), the hidden variables $\mathbf Z$ are unknown, and the EM algorithm is used to estimate the parameters; the derivation of the EM formulas differs slightly.
Revised from Figure 1 of *What is the expectation maximization algorithm*
Explanation of panel (b) and derivation of the EM formulas

Assume, as before, that the three-coin model has parameters $\theta=(\pi,p,q)$.
$(1)$ The 1-of-K representation of the three-coin problem

The hidden variable $z_{1}=1$, i.e. the hidden vector $\mathbf z=[1,0]^{T}$, represents the event "coin A shows heads"; this event has probability $P(z_{1}=1|\theta)=\pi$.

The hidden variable $z_{2}=1$, i.e. the hidden vector $\mathbf z=[0,1]^{T}$, represents the event "coin A shows tails"; this event has probability $P(z_{2}=1|\theta)=1-\pi$.

Writing $P(z_{1}=1|\theta)=\pi_{1}=\pi$ and $P(z_{2}=1|\theta)=\pi_{2}=1-\pi$, the toss of coin A is described uniformly by $P(z_{k}=1|\theta)=\pi_{k},\ k\in\{1,2\}$.

In terms of the hidden vector $\mathbf z$:

$$P(\mathbf z|\theta)=\prod_{k=1}^{2} \pi_{k}^{z_{k}}=\pi_{1}^{z_{1}}\cdot\pi_{2}^{z_{2}},\qquad \mathbf z=\begin{bmatrix} z_{1}\\z_{2} \end{bmatrix}\in\left\{\begin{bmatrix} 1\\0 \end{bmatrix},\begin{bmatrix} 0\\1 \end{bmatrix}\right\}$$
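The product form above simply "selects" the prior of whichever component has $z_k=1$; a quick numerical check (a sketch, with $\pi=0.4$ as an arbitrary illustrative value):

```python
import numpy as np

pi = np.array([0.4, 0.6])                # pi_1 = pi, pi_2 = 1 - pi

def prior(z, pi):
    """P(z | theta) = prod_k pi_k^{z_k} for a 1-of-K vector z."""
    return np.prod(pi ** z)

print(prior(np.array([1, 0]), pi))       # selects pi_1 = 0.4
print(prior(np.array([0, 1]), pi))       # selects pi_2 = 0.6
```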
$(2)$ Expressing the probability of an observation $y$ with the hidden vector $\mathbf z$

Consider coins B and C separately, for an observation $y \in \{0,1\}$:

Coin B: $P(y|z_{1}=1,\theta)=p^{y}(1-p)^{(1-y)}$

Coin C: $P(y|z_{2}=1,\theta)=q^{y}(1-q)^{(1-y)}$

For notational convenience, write $\alpha_{1}=p$ and $\alpha_{2}=q$, so that:

Coin B: $P(y|z_{1}=1,\theta)=p^{y}(1-p)^{(1-y)}=\alpha_{1}^{y}(1-\alpha_{1})^{(1-y)}$

Coin C: $P(y|z_{2}=1,\theta)=q^{y}(1-q)^{(1-y)}=\alpha_{2}^{y}(1-\alpha_{2})^{(1-y)}$

The two coins are then described uniformly by:

$$P(y|z_{k}=1,\theta)=\alpha_{k}^{y}(1-\alpha_{k})^{(1-y)},\qquad k \in \{1,2\},\ y \in \{0,1\}$$

In terms of the hidden vector $\mathbf z$:

$$\begin{aligned} P(y|\mathbf z,\theta) &= \prod_{k=1}^{2}\left[P(y|z_{k}=1,\theta)\right]^{z_{k}} \\ &=\left[P(y|z_{1}=1,\theta)\right]^{z_{1}}\cdot\left[P(y|z_{2}=1,\theta)\right]^{z_{2}} \\ &=\left[\alpha_{1}^{y}(1-\alpha_{1})^{(1-y)}\right]^{z_{1}}\cdot\left[\alpha_{2}^{y}(1-\alpha_{2})^{(1-y)}\right]^{z_{2}} \\ &= \prod_{k=1}^{2}\left[\alpha_{k}^{y}(1-\alpha_{k})^{(1-y)}\right]^{z_{k}},\qquad \mathbf z=\begin{bmatrix} z_{1}\\z_{2} \end{bmatrix}\in\left\{\begin{bmatrix} 1\\0 \end{bmatrix},\begin{bmatrix} 0\\1 \end{bmatrix}\right\} \end{aligned}$$
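Again the exponent $z_k$ acts as a selector: a quick check that the product reduces to the chosen coin's Bernoulli term (a sketch; $\alpha_1=0.5$, $\alpha_2=0.6$ are illustrative values):

```python
import numpy as np

alpha = np.array([0.5, 0.6])   # alpha_1 = p, alpha_2 = q (illustrative values)

def bernoulli(y, a):
    """a^y * (1-a)^(1-y) for a single toss y in {0, 1}."""
    return a**y * (1 - a)**(1 - y)

def cond_prob(y, z, alpha):
    """P(y | z, theta) = prod_k [alpha_k^y (1-alpha_k)^(1-y)]^{z_k}."""
    return np.prod(bernoulli(y, alpha) ** z)

# z = [1, 0] reduces the product to the coin-B factor; z = [0, 1] to coin C's.
print(cond_prob(1, np.array([1, 0]), alpha))   # 0.5 = alpha_1
print(cond_prob(0, np.array([0, 1]), alpha))   # 0.4 = 1 - alpha_2
```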
$(3)$ The likelihood function of the incomplete-data sample set

For the full sample set $\mathbf Y=\{\mathbf Y_{1},\mathbf Y_{2},\cdots,\mathbf Y_{M}\}$, the corresponding hidden vectors are $\mathbf Z=\{ \mathbf z_{1},\mathbf z_{2},\cdots,\mathbf z_{M}\}$.

Here the $i$-th group $\mathbf Y_{i}=\{ y_{i,1},\cdots,y_{i,n},\cdots,y_{i,N}\}$ contains $N$ samples, with $i\in\{1,\cdots,M\}$, $n\in\{1,\cdots,N\}$, $y_{i,n}\in\{0,1\}$, and $\mathbf z_{i}=\begin{bmatrix} z_{i1}\\z_{i2} \end{bmatrix}\in\left\{\begin{bmatrix} 1\\0 \end{bmatrix},\begin{bmatrix} 0\\1 \end{bmatrix}\right\}$.
In panel (a):

$M=5$: five groups of trials in total, each group using a single coin (coin B or coin C).

$N=10$: within each group, the same coin is tossed 10 times independently, producing the observations $y_{i,n}\in\{0,1\}$, $i\in\{1,\cdots,5\}$, $n\in\{1,\cdots,10\}$.

The observations in panel (a) are:

$\mathbf Y=\{\mathbf Y_{1},\mathbf Y_{2},\mathbf Y_{3},\mathbf Y_{4},\mathbf Y_{5}\}$

where
$\mathbf Y_{1}=\{1,0,0,0,1,1,0,1,0,1\},\qquad\mathbf z_{1}=[1,0]^{T}\ (z_{11}=1)$
$\mathbf Y_{2}=\{1,1,1,1,0,1,1,1,1,1\},\qquad\mathbf z_{2}=[0,1]^{T}\ (z_{22}=1)$
$\mathbf Y_{3}=\{1,0,1,1,1,1,1,0,1,1\},\qquad\mathbf z_{3}=[0,1]^{T}\ (z_{32}=1)$
$\mathbf Y_{4}=\{1,0,1,0,0,0,1,1,0,0\},\qquad\mathbf z_{4}=[1,0]^{T}\ (z_{41}=1)$
$\mathbf Y_{5}=\{0,1,1,1,0,1,1,1,0,1\},\qquad\mathbf z_{5}=[0,1]^{T}\ (z_{52}=1)$
We then have:

$$\begin{aligned} P(\mathbf Y|\mathbf Z,\theta) &= P(\mathbf Y_{1},\mathbf Y_{2},\mathbf Y_{3},\mathbf Y_{4},\mathbf Y_{5}|\mathbf z_{1},\cdots,\mathbf z_{M},\theta) \\ &= \prod_{i=1}^{M} P(\mathbf Y_{i}|\mathbf z_{i},\theta) \\ &= \prod_{i=1}^{M}P(y_{i,1},y_{i,2},\cdots,y_{i,N}|\mathbf z_{i},\theta) \\ &= \prod_{i=1}^{M}\prod_{n=1}^{N} P(y_{i,n}|\mathbf z_{i},\theta) \qquad\text{using } P(y|\mathbf z_{i},\theta)=\prod_{k=1}^{2}P(y|z_{ik}=1,\theta)^{z_{ik}} \\ &= \prod_{i=1}^{M}\prod_{n=1}^{N}\prod_{k=1}^{2} P(y_{i,n}|z_{ik}=1,\theta)^{z_{ik}} \\ &= \prod_{i=1}^{M}\prod_{n=1}^{N}\prod_{k=1}^{2}\left[\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{1-y_{i,n}}\right]^{z_{ik}} \end{aligned}$$
Also, since

$$P(\mathbf z_{i}|\theta)=\prod_{k=1}^{2}\pi_{k}^{z_{ik}}=\pi_{1}^{z_{i1}}\cdot\pi_{2}^{z_{i2}},\qquad \mathbf z_{i}=\begin{bmatrix} z_{i1}\\z_{i2} \end{bmatrix}\in\left\{\begin{bmatrix} 1\\0 \end{bmatrix},\begin{bmatrix} 0\\1 \end{bmatrix}\right\}$$

it follows that

$$\begin{aligned} P(\mathbf Z|\theta) &= P(\mathbf z_{1},\cdots,\mathbf z_{M}|\theta) \\ &=\prod_{i=1}^{M}P(\mathbf z_{i}|\theta) \\ &=\prod_{i=1}^{M}\prod_{k=1}^{2}\pi_{k}^{z_{ik}} \end{aligned}$$
Because the hidden vectors $\mathbf Z$ cannot be observed, the EM algorithm first makes a guess for all of them, $\mathbf Z=\{\mathbf z_{1},\cdots,\mathbf z_{i},\cdots,\mathbf z_{M}\}$, turning the observed data from the incomplete form $\mathbf Y$ into the complete form $(\mathbf Y,\mathbf Z)$.
The likelihood of the complete data $(\mathbf Y,\mathbf Z)$ (if $\mathbf Z$ were known) is then:

$$\begin{aligned} P(\mathbf Y,\mathbf Z|\theta) &=P(\mathbf Y|\mathbf Z,\theta)P(\mathbf Z|\theta) \\ &=\prod_{i=1}^{M}\prod_{n=1}^{N}\prod_{k=1}^{2} P(y_{i,n}|z_{ik}=1,\theta)^{z_{ik}}\cdot\prod_{i=1}^{M}\prod_{k=1}^{2}\pi_{k}^{z_{ik}} \\ &=\prod_{i=1}^{M}\prod_{n=1}^{N}\prod_{k=1}^{2}\left[\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{1-y_{i,n}}\right]^{z_{ik}}\cdot\prod_{i=1}^{M}\prod_{k=1}^{2}\pi_{k}^{z_{ik}} \\ &=\prod_{i=1}^{M}\left\{\prod_{n=1}^{N}\prod_{k=1}^{2}\left[\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{1-y_{i,n}}\right]^{z_{ik}}\cdot\prod_{k=1}^{2}\pi_{k}^{z_{ik}}\right\} \end{aligned}$$
Taking logarithms gives the complete-data log-likelihood (whose value depends on $\mathbf Z$):

$$\ln P(\mathbf Y,\mathbf Z|\theta)=\sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2} z_{ik}\ln P(y_{i,n}|z_{ik}=1,\theta)+\sum_{i=1}^{M}\sum_{k=1}^{2} z_{ik}\ln\pi_{k}$$
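The double-sum form above can be checked numerically against a direct per-group computation (a sketch; it uses the panel (a) data with $\pi_{1}=\pi_{2}=0.5$ and $\alpha_{1}=0.5$, $\alpha_{2}=0.6$ as example parameter values):

```python
import numpy as np

Y = np.array([[1,0,0,0,1,1,0,1,0,1],
              [1,1,1,1,0,1,1,1,1,1],
              [1,0,1,1,1,1,1,0,1,1],
              [1,0,1,0,0,0,1,1,0,0],
              [0,1,1,1,0,1,1,1,0,1]])
Z = np.array([[1,0],[0,1],[0,1],[1,0],[0,1]])   # hidden vectors from panel (a)
pi = np.array([0.5, 0.5])                        # example prior
alpha = np.array([0.5, 0.6])                     # alpha_1 = p, alpha_2 = q (example values)

# ln P(Y,Z|theta) = sum_i sum_n sum_k z_ik ln P(y_in|z_ik=1) + sum_i sum_k z_ik ln pi_k
logB = Y[:, :, None]*np.log(alpha) + (1 - Y[:, :, None])*np.log(1 - alpha)  # (M, N, 2)
loglik = np.sum(Z[:, None, :] * logB) + np.sum(Z * np.log(pi))

# Direct computation: each group contributes ln pi_k plus its Bernoulli log terms
direct = 0.0
for i in range(5):
    k = int(np.argmax(Z[i]))
    direct += np.log(pi[k]) + np.sum(Y[i]*np.log(alpha[k]) + (1-Y[i])*np.log(1-alpha[k]))
print(np.isclose(loglik, direct))   # True
```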
$(4)$ Computing the expectation of the log-likelihood $\ln P(\mathbf Y,\mathbf Z|\theta)$

Define the function $Q(\theta,\theta^{(i)})$:

$$\begin{aligned}Q(\theta,\theta^{(i)})&=E_{P(\mathbf Z|\mathbf Y,\theta)}[\ln P(\mathbf Y,\mathbf Z|\theta)\ |\ \mathbf Y,\theta^{(i)}] \\ &= E_{P(\mathbf Z|\mathbf Y,\theta)}\left\{\sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2} z_{ik}\ln P(y_{i,n}|z_{ik}=1,\theta)+\sum_{i=1}^{M}\sum_{k=1}^{2} z_{ik}\ln\pi_{k}\right\} \\ &= \sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2} E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}]\ln P(y_{i,n}|z_{ik}=1,\theta)+\sum_{i=1}^{M}\sum_{k=1}^{2} E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}]\ln\pi_{k}\end{aligned}$$
To evaluate $Q(\theta,\theta^{(i)})$ we must first compute $E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}]=\sum\limits_{z_{ik}}z_{ik}P(\mathbf z_{i}|\mathbf Y_{i},\theta)$.

From

$$\begin{aligned}P(\mathbf z_{i}|\mathbf Y_{i},\theta)&=\dfrac{P(\mathbf z_{i},\mathbf Y_{i}|\theta)}{P(\mathbf Y_{i}|\theta)} \\ &=\dfrac{P(\mathbf Y_{i}|\mathbf z_{i},\theta)P(\mathbf z_{i}|\theta)}{\sum\limits_{\mathbf z_{i}}P(\mathbf Y_{i}|\mathbf z_{i},\theta)P(\mathbf z_{i}|\theta)},\qquad \mathbf z_{i}=\begin{bmatrix} z_{i1}\\z_{i2} \end{bmatrix}\in\left\{\begin{bmatrix} 1\\0 \end{bmatrix},\begin{bmatrix} 0\\1 \end{bmatrix}\right\} \\ &=\dfrac{P(\mathbf Y_{i}|z_{ik}=1,\theta)P(z_{ik}=1|\theta)}{P(\mathbf Y_{i}|z_{i1}=1,\theta)P(z_{i1}=1|\theta)+P(\mathbf Y_{i}|z_{i2}=1,\theta)P(z_{i2}=1|\theta)}\end{aligned}$$

we therefore have

$$\begin{aligned}E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}]&=\sum_{z_{ik}}z_{ik}P(\mathbf z_{i}|\mathbf Y_{i},\theta) \\ &=\dfrac{1\cdot P(\mathbf Y_{i}|z_{ik}=1,\theta)P(z_{ik}=1|\theta)+0\cdot P(\mathbf Y_{i}|z_{ik}=0,\theta)P(z_{ik}=0|\theta)}{P(\mathbf Y_{i}|z_{i1}=1,\theta)P(z_{i1}=1|\theta)+P(\mathbf Y_{i}|z_{i2}=1,\theta)P(z_{i2}=1|\theta)} \\ &=\dfrac{P(\mathbf Y_{i}|z_{ik}=1,\theta)P(z_{ik}=1|\theta)}{P(\mathbf Y_{i}|z_{i1}=1,\theta)P(z_{i1}=1|\theta)+P(\mathbf Y_{i}|z_{i2}=1,\theta)P(z_{i2}=1|\theta)} \\ &=\dfrac{\pi_{k}P(\mathbf Y_{i}|z_{ik}=1,\theta)}{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)+\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)}\end{aligned}$$
Moreover,

$$\begin{aligned}P(\mathbf Y_{i}|z_{ik}=1,\theta)&=P(y_{i,1},\cdots,y_{i,n},\cdots,y_{i,N}|z_{ik}=1,\theta) \\ &=\prod_{n=1}^{N}P(y_{i,n}|z_{ik}=1,\theta) \\ &=\prod_{n=1}^{N}\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{(1-y_{i,n})},\qquad k \in \{1,2\},\ y_{i,n} \in \{0,1\}\end{aligned}$$

which gives:

$$E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}]=\dfrac{\pi_{k}\prod_{n=1}^{N}\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{(1-y_{i,n})}}{\pi_{1}\prod_{n=1}^{N}\alpha_{1}^{y_{i,n}}(1-\alpha_{1})^{(1-y_{i,n})}+\pi_{2}\prod_{n=1}^{N}\alpha_{2}^{y_{i,n}}(1-\alpha_{2})^{(1-y_{i,n})}}$$
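This expectation can be packaged as a small function (a sketch; `responsibilities` is a hypothetical helper name, and the group shown is an arbitrary example):

```python
import numpy as np

def responsibilities(Yi, pi, alpha):
    """E[z_ik] = pi_k * prod_n alpha_k^y (1-alpha_k)^(1-y), normalized over k."""
    lik = np.array([np.prod(a**Yi * (1 - a)**(1 - Yi)) for a in alpha])
    w = pi * lik
    return w / w.sum()

Yi = np.array([1,0,1,1,0,1,1,1,0,1])     # an arbitrary example group
g = responsibilities(Yi, pi=np.array([0.5, 0.5]), alpha=np.array([0.5, 0.6]))
print(g)   # the two expectations; they always sum to 1
```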
Explanation of panel (b), part (1)

First assume the initial values $p=\alpha_{1}=\hat\theta_{B}^{(0)}=0.5$ and $q=\alpha_{2}=\hat\theta_{C}^{(0)}=0.6$, and analyse the trial results.

Group 1 (10 tosses: 5 heads, 5 tails):

If it was produced by coin C, the likelihood is

$$Likelihood_{coinC}=P(\mathbf Y_{1}|z_{12}=1,\theta)=\alpha_{2}^{5}(1-\alpha_{2})^{5}=0.6^{5}\cdot0.4^{5}=0.0007962624$$

If it was produced by coin B, the likelihood is

$$Likelihood_{coinB}=P(\mathbf Y_{1}|z_{11}=1,\theta)=\alpha_{1}^{5}(1-\alpha_{1})^{5}=0.5^{5}\cdot0.5^{5}=0.0009765625$$
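These two likelihood values are easy to verify with the initial parameters above:

```python
lik_B = 0.5**5 * 0.5**5   # coin B: alpha_1^5 * (1-alpha_1)^5
lik_C = 0.6**5 * 0.4**5   # coin C: alpha_2^5 * (1-alpha_2)^5
print(lik_B)              # 0.0009765625
print(lik_C)              # ~0.0007962624 (up to floating point)
```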
$(5)$ The E-step formula

Since in the data of panel (b) we do not know the value of the hidden vector $\mathbf z_{i}$, we instead compute, given the current model parameters $\theta$, the expectation of $z_{ik}$:

$$\begin{aligned}E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}]&=\dfrac{\pi_{k}P(\mathbf Y_{i}|z_{ik}=1,\theta)}{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)+\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)} \\ &=\dfrac{\pi_{k}\prod_{n=1}^{N}\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{(1-y_{i,n})}}{\pi_{1}\prod_{n=1}^{N}\alpha_{1}^{y_{i,n}}(1-\alpha_{1})^{(1-y_{i,n})}+\pi_{2}\prod_{n=1}^{N}\alpha_{2}^{y_{i,n}}(1-\alpha_{2})^{(1-y_{i,n})}}\end{aligned}$$

This expectation replaces $z_{ik}$ when evaluating the expected log-likelihood $Q(\theta,\theta^{(i)})$. Writing

$$\gamma(z_{ik})=\dfrac{\pi_{k}P(\mathbf Y_{i}|z_{ik}=1,\theta)}{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)+\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)},\qquad k\in\{1,2\}$$

the expected log-likelihood becomes:

$$Q(\theta,\theta^{(i)})=\sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2}\gamma(z_{ik})\ln P(y_{i,n}|z_{ik}=1,\theta)+\sum_{i=1}^{M}\sum_{k=1}^{2}\gamma(z_{ik})\ln\pi_{k}$$
In the three-coin problem $K=2$, so the analysis is relatively simple, and one can see that:

$\gamma(z_{i1})=\dfrac{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)}{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)+\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)}$, where $P(\mathbf Y_{i}|z_{i1}=1,\theta)$ is the likelihood of the observations $\mathbf Y_{i}=\{ y_{i,1},\cdots,y_{i,n},\cdots,y_{i,N}\}$ under the hypothesis that group $i$ was tossed with coin B $(z_{i1}=1)$;

$\gamma(z_{i2})=\dfrac{\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)}{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)+\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)}$, where $P(\mathbf Y_{i}|z_{i2}=1,\theta)$ is the likelihood of $\mathbf Y_{i}$ under the hypothesis that group $i$ was tossed with coin C $(z_{i2}=1)$.

Clearly, $\gamma(z_{i1})=1-\gamma(z_{i2})$.
Two observations:

(1) In panel (a), the hidden vector is exact: $\mathbf z_{i}=\begin{bmatrix} z_{i1}\\z_{i2} \end{bmatrix}\in\left\{\begin{bmatrix} 1\\0 \end{bmatrix},\begin{bmatrix} 0\\1 \end{bmatrix}\right\}$.

(2) In the EM algorithm of panel (b), however, the guess for the hidden vector is $\mathbf z_{i}=[\gamma(z_{i1}),\gamma(z_{i2})]^{T}$; that is, when estimating $Q(\theta,\theta^{(i)})$, the likelihoods of $\mathbf Y_{i}$ under both coin B and coin C are taken into account, with $\gamma(z_{ik})$ acting as a weight.
Explanation of panel (b), part (2)

Group 1 observations $\mathbf Y_{1}$ (10 tosses: 5 heads, 5 tails):

If produced by coin C, the likelihood is

$$P(\mathbf Y_{1}|z_{12}=1,\theta)=\alpha_{2}^{5}(1-\alpha_{2})^{5}=0.6^{5}\cdot0.4^{5}=0.0007962624$$

If produced by coin B, the likelihood is

$$P(\mathbf Y_{1}|z_{11}=1,\theta)=\alpha_{1}^{5}(1-\alpha_{1})^{5}=0.5^{5}\cdot0.5^{5}=0.0009765625$$

Since the paper assumes $\pi_{1}=\pi_{2}=0.5$, i.e. coin B and coin C are chosen with equal probability when producing the observations $y_{i,n}$, it follows that:

$$\gamma(z_{11})=\dfrac{\pi_{1}P(\mathbf Y_{1}|z_{11}=1,\theta)}{\pi_{1}P(\mathbf Y_{1}|z_{11}=1,\theta)+\pi_{2}P(\mathbf Y_{1}|z_{12}=1,\theta)}\approx 0.55$$

$$\gamma(z_{12})=\dfrac{\pi_{2}P(\mathbf Y_{1}|z_{12}=1,\theta)}{\pi_{1}P(\mathbf Y_{1}|z_{11}=1,\theta)+\pi_{2}P(\mathbf Y_{1}|z_{12}=1,\theta)}\approx 0.45=1-\gamma(z_{11})$$

This process amounts to guessing the hidden vector $\mathbf z_{1}=[0.55,0.45]^{T}$.
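The guessed values 0.55 and 0.45 can be reproduced directly from the two likelihoods:

```python
lik_B = 0.5**5 * 0.5**5                          # coin B likelihood for group 1
lik_C = 0.6**5 * 0.4**5                          # coin C likelihood for group 1
gamma11 = 0.5*lik_B / (0.5*lik_B + 0.5*lik_C)    # pi_1 = pi_2 = 0.5
gamma12 = 0.5*lik_C / (0.5*lik_B + 0.5*lik_C)
print(round(gamma11, 2), round(gamma12, 2))      # 0.55 0.45
```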
$(6)$ The M-step formula

Maximise $Q(\theta,\theta^{(i)})$ with respect to $\alpha_{1}$ and $\alpha_{2}$:

$$\begin{aligned}Q(\theta,\theta^{(i)}) &= \sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2}\gamma(z_{ik})\ln P(y_{i,n}|z_{ik}=1,\theta)+\sum_{i=1}^{M}\sum_{k=1}^{2}\gamma(z_{ik})\ln\pi_{k} \\ &= \sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2}\gamma(z_{ik})\ln\left[\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{(1-y_{i,n})}\right]+\sum_{i=1}^{M}\sum_{k=1}^{2}\gamma(z_{ik})\ln\pi_{k} \\ &= \sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2}\gamma(z_{ik})\left[y_{i,n}\ln\alpha_{k}+(1-y_{i,n})\ln(1-\alpha_{k})\right]+\sum_{i=1}^{M}\sum_{k=1}^{2}\gamma(z_{ik})\ln\pi_{k}\end{aligned}$$

Setting the derivative to zero:

$$\frac{\partial Q(\theta,\theta^{(i)})}{\partial\alpha_{k}}=\frac{\partial\left\{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\sum\limits_{k=1}^{2}\gamma(z_{ik})\left[y_{i,n}\ln\alpha_{k}+(1-y_{i,n})\ln(1-\alpha_{k})\right]\right\}}{\partial\alpha_{k}}=\sum_{i=1}^{M}\sum_{n=1}^{N}\gamma(z_{ik})\left[\frac{y_{i,n}}{\alpha_{k}}-\frac{1-y_{i,n}}{1-\alpha_{k}}\right]=0$$

which yields:

$$\hat\alpha_{k}=\dfrac{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{ik})y_{i,n}}{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{ik})}$$
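The update above is a responsibility-weighted head rate; a minimal sketch (`m_step` is a hypothetical helper name; `gamma` is an $M\times2$ array of responsibilities, `Y` an $M\times N$ 0/1 array):

```python
import numpy as np

def m_step(gamma, Y):
    """alpha_hat_k = sum_i sum_n gamma_ik * y_in / sum_i sum_n gamma_ik."""
    num = gamma.T @ Y.sum(axis=1)            # sum_i gamma_ik * (#heads in group i)
    den = gamma.sum(axis=0) * Y.shape[1]     # sum_i gamma_ik * N
    return num / den

# With hard (0/1) responsibilities, the update reduces to the ordinary
# per-coin head rate, as in panel (a).
Y = np.array([[1, 1, 0, 1], [0, 0, 0, 1]])
gamma = np.array([[1.0, 0.0], [0.0, 1.0]])   # group 1 -> coin B, group 2 -> coin C
print(m_step(gamma, Y))                      # [0.75 0.25]
```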
Explanation of panel (b), part (3)

In the three-coin problem $K=2$:

$$\hat\theta_{B}=\hat\alpha_{1}=\dfrac{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{i1})y_{i,n}}{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{i1})}=\dfrac{\sum\limits_{i=1}^{M}\sum\limits_{y_{i,n}=1}\gamma(z_{i1})}{\sum\limits_{i=1}^{M}\left[\sum\limits_{y_{i,n}=1}\gamma(z_{i1})+\sum\limits_{y_{i,n}=0}\gamma(z_{i1})\right]}$$

Here $\sum\limits_{y_{i,n}=1}\gamma(z_{i1})$ is "$\gamma(z_{i1})\times$ the number of heads attributed to coin B in group $i$", and $\sum\limits_{y_{i,n}=0}\gamma(z_{i1})$ is "$\gamma(z_{i1})\times$ the number of tails attributed to coin B in group $i$".

In panel (b):

$$\hat\theta_{B}=\hat\alpha_{1}=\dfrac{2.8+1.8+2.1+2.6+2.5}{(2.8+2.8)+(1.8+0.2)+(2.1+0.5)+(2.6+3.9)+(2.5+1.1)}=\dfrac{11.7}{11.7+8.4}\approx0.58$$
$$\hat\theta_{C}=\hat\alpha_{2}=\dfrac{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{i2})y_{i,n}}{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{i2})}=\dfrac{\sum\limits_{i=1}^{M}\sum\limits_{y_{i,n}=1}\gamma(z_{i2})}{\sum\limits_{i=1}^{M}\left[\sum\limits_{y_{i,n}=1}\gamma(z_{i2})+\sum\limits_{y_{i,n}=0}\gamma(z_{i2})\right]}$$

Here $\sum\limits_{y_{i,n}=1}\gamma(z_{i2})$ is "$\gamma(z_{i2})\times$ the number of heads attributed to coin C in group $i$", and $\sum\limits_{y_{i,n}=0}\gamma(z_{i2})$ is "$\gamma(z_{i2})\times$ the number of tails attributed to coin C in group $i$".

In panel (b):

$$\hat\theta_{C}=\hat\alpha_{2}=\dfrac{2.2+7.2+5.9+1.4+4.5}{(2.2+2.2)+(7.2+0.8)+(5.9+1.5)+(1.4+2.1)+(4.5+1.9)}=\dfrac{21.3}{21.3+8.6}\approx0.71$$
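The whole first iteration can be reproduced end to end, from the responsibilities to the two updates ($\pi_{1}=\pi_{2}=0.5$ and the initial values $\alpha_{1}=0.5$, $\alpha_{2}=0.6$ are as stated above; with equal priors, $\pi_k$ cancels in the responsibility ratio):

```python
import numpy as np

Y = np.array([[1,0,0,0,1,1,0,1,0,1],
              [1,1,1,1,0,1,1,1,1,1],
              [1,0,1,1,1,1,1,0,1,1],
              [1,0,1,0,0,0,1,1,0,0],
              [0,1,1,1,0,1,1,1,0,1]])
alpha = np.array([0.5, 0.6])             # initial (theta_B, theta_C)

heads = Y.sum(axis=1)
tails = Y.shape[1] - heads
lik = alpha**heads[:, None] * (1 - alpha)**tails[:, None]   # (5, 2) group likelihoods
gamma = lik / lik.sum(axis=1, keepdims=True)                # pi_1 = pi_2 cancels

theta_B = (gamma[:, 0] * heads).sum() / (gamma[:, 0] * (heads + tails)).sum()
theta_C = (gamma[:, 1] * heads).sum() / (gamma[:, 1] * (heads + tails)).sum()
print(round(theta_B, 2), round(theta_C, 2))   # 0.58 0.71
```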
Code implementation

A Python implementation of Figure 1 from *What is the expectation maximization algorithm*:
```python
import numpy as np

def cal_Likelihood(alpha1, alpha2, Yi):
    """Per-group E-step: responsibilities of coin B and coin C for group Yi.

    Assumes pi_1 = pi_2 = 0.5 (as in the paper), so the prior cancels."""
    headCount = np.sum(Yi==1)
    tailCount = np.sum(Yi==0)
    likelihoodB = np.power(alpha1,headCount)*np.power(1-alpha1,tailCount)
    likelihoodC = np.power(alpha2,headCount)*np.power(1-alpha2,tailCount)
    gamma1 = likelihoodB/(likelihoodB + likelihoodC)
    gamma2 = likelihoodC/(likelihoodB + likelihoodC)
    return gamma1,gamma2

def E_Step(alpha1, alpha2, Y):
    print('\n====== E-Step ======')
    gamma = np.zeros((5,2))
    print(('coin C, coin B ------ gamma(Zik)=E[Zik]'))
    for i in range(5):
        gamma1,gamma2 = cal_Likelihood(alpha1, alpha2, Y[i,:])
        print((' %2.2f,\t%2.2f') % (gamma2,gamma1))
        gamma[i,:] = np.array([gamma1,gamma2])
    return gamma

def M_Step(gamma,Y):
    print('\n====== M-Step ======')
    # Responsibility-weighted head/tail counts; the tiny constants avoid
    # division by zero when a coin's total responsibility vanishes.
    t1h = 0.000001
    t1t = 0.000001
    t2h = 0.000001
    t2t = 0.000001
    for n in range(Y.shape[0]):
        t1h = t1h + np.sum(gamma[n,0]*Y[n,:])
        t1t = t1t + np.sum(gamma[n,0]*(1-Y[n,:]))
        t2h = t2h + np.sum(gamma[n,1]*Y[n,:])
        t2t = t2t + np.sum(gamma[n,1]*(1-Y[n,:]))
    print('\nC-Head C-Tail B-Head B-Tail')
    print((' %2.1f \t %2.1f \t %2.1f \t %2.1f')%(t2h,t2t,t1h,t1t))
    newThetaB = t1h/(t1h+t1t)
    newThetaC = t2h/(t2h+t2t)
    return newThetaB, newThetaC

if __name__ == '__main__':
    observationY = np.array([[1,0,0,0,1,1,0,1,0,1],
                             [1,1,1,1,0,1,1,1,1,1],
                             [1,0,1,1,1,1,1,0,1,1],
                             [1,0,1,0,0,0,1,1,0,0],
                             [0,1,1,1,0,1,1,1,0,1]])
    for i in range(5):
        print('Y'+str(i+1)+': '+str(observationY[i,:]))
    thetaB = 0.5     # initial guess for coin B
    thetaC = 0.6     # initial guess for coin C
    flag = 1
    iter = 0
    while flag:
        print('\n#iteration\t'+str(iter+1))
        alpha1 = thetaB
        alpha2 = thetaC
        gamma = E_Step(alpha1, alpha2, observationY)
        thetaB,thetaC = M_Step(gamma,observationY)
        print(('ThetaB=%2.2f,ThetaC=%2.2f')%(thetaB,thetaC))
        iter = iter + 1
        if iter==10:
            flag = 0
```
Program output:
Y1: [1 0 0 0 1 1 0 1 0 1]
Y2: [1 1 1 1 0 1 1 1 1 1]
Y3: [1 0 1 1 1 1 1 0 1 1]
Y4: [1 0 1 0 0 0 1 1 0 0]
Y5: [0 1 1 1 0 1 1 1 0 1]
#iteration 1    (the numbers shown in panel (b) correspond to this first iteration)
coin C, coin B ------ gamma(Zik)=E[Zik]
0.45, 0.55
0.80, 0.20
0.73, 0.27
0.35, 0.65
0.65, 0.35
C-Head C-Tail B-Head B-Tail
21.3 8.6 11.7 8.4
ThetaB=0.58,ThetaC=0.71
#iteration 2
coin C, coin B ------ gamma(Zik)=E[Zik]
0.30, 0.70
0.81, 0.19
0.71, 0.29
0.19, 0.81
0.57, 0.43
C-Head C-Tail B-Head B-Tail
19.2 6.6 13.8 10.4
ThetaB=0.57,ThetaC=0.75
#iteration 3
coin C, coin B ------ gamma(Zik)=E[Zik]
0.22, 0.78
0.87, 0.13
0.75, 0.25
0.11, 0.89
0.58, 0.42
C-Head C-Tail B-Head B-Tail
19.4 5.9 13.6 11.1
ThetaB=0.55,ThetaC=0.77
#iteration 4
coin C, coin B ------ gamma(Zik)=E[Zik]
0.16, 0.84
0.91, 0.09
0.79, 0.21
0.07, 0.93
0.59, 0.41
C-Head C-Tail B-Head B-Tail
19.8 5.5 13.2 11.5
ThetaB=0.53,ThetaC=0.78
#iteration 5
coin C, coin B ------ gamma(Zik)=E[Zik]
0.13, 0.87
0.94, 0.06
0.82, 0.18
0.04, 0.96
0.59, 0.41
C-Head C-Tail B-Head B-Tail
20.0 5.3 13.0 11.7
ThetaB=0.53,ThetaC=0.79
#iteration 6
coin C, coin B ------ gamma(Zik)=E[Zik]
0.11, 0.89
0.95, 0.05
0.84, 0.16
0.04, 0.96
0.60, 0.40
C-Head C-Tail B-Head B-Tail
20.1 5.2 12.9 11.8
ThetaB=0.52,ThetaC=0.79
#iteration 7
coin C, coin B ------ gamma(Zik)=E[Zik]
0.11, 0.89
0.95, 0.05
0.84, 0.16
0.03, 0.97
0.60, 0.40
C-Head C-Tail B-Head B-Tail
20.1 5.2 12.9 11.8
ThetaB=0.52,ThetaC=0.80
#iteration 8
coin C, coin B ------ gamma(Zik)=E[Zik]
0.10, 0.90
0.95, 0.05
0.84, 0.16
0.03, 0.97
0.60, 0.40
C-Head C-Tail B-Head B-Tail
20.2 5.2 12.8 11.8
ThetaB=0.52,ThetaC=0.80
#iteration 9
coin C, coin B ------ gamma(Zik)=E[Zik]
0.10, 0.90
0.95, 0.05
0.84, 0.16
0.03, 0.97
0.60, 0.40
C-Head C-Tail B-Head B-Tail
20.2 5.1 12.8 11.9
ThetaB=0.52,ThetaC=0.80
#iteration 10
coin C, coin B ------ gamma(Zik)=E[Zik]
0.10, 0.90
0.95, 0.05
0.85, 0.15
0.03, 0.97
0.60, 0.40
C-Head C-Tail B-Head B-Tail
20.2 5.1 12.8 11.9
ThetaB=0.52,ThetaC=0.80