EM Algorithm Notes (3): Another Kind of Three-Coin Problem

2023-10-16 07:30



This post is a detailed analysis of Figure 1 of the classic paper "What is the expectation maximization algorithm", including the derivation of the EM update formulas.

How the observed data $\mathbf Y$ is generated

In the problem described in the first two parts, the observed data $\mathbf Y=\{y_{1},y_{2},\cdots,y_{N}\}$ or $\mathbf X=\{\boldsymbol x_{1},\boldsymbol x_{2},\cdots,\boldsymbol x_{N}\}$ is generated by the following rule:

1) First, with probability $P(z_{k}=1|\theta)=\pi_{k},\ k\in\{1,\cdots,K\}$, set $z_{k}=1,\ z_{j}=0\ (\forall j\neq k)$, i.e. select the $k$-th component distribution.
2) Then, draw a sample $y$ (or $\boldsymbol x$) from the $k$-th component distribution $P(y|z_{k}=1,\theta)$ (or $P(\boldsymbol x|z_{k}=1,\theta)$).

If each observed sample $y_{n}$ (or $\boldsymbol x_{n}$) is generated independently, then $y_{n}$ (or $\boldsymbol x_{n}$) depends only on its own hidden vector $\mathbf z_{n}$.

Suppose instead that the observed samples are drawn by the process in the figure below:

1) First, with probability $P(z_{k}=1|\theta)=\pi_{k},\ k\in\{1,\cdots,K\}$, set $z_{k}=1,\ z_{j}=0\ (\forall j\neq k)$, i.e. select the $k$-th component distribution (the figure shows the case $K=2$).
2) Then, from the selected $k$-th component distribution $P(y|z_{k}=1,\theta)$, repeat the trial $N=10$ times to draw one group of samples $\mathbf Y_{i}=\{y_{i,1},y_{i,2},\cdots,y_{i,N}\},\ i\in\{1,\cdots,M\}$.

The figure below shows the case $M=5,\ N=10$: after coin B or coin C is selected, it is tossed 10 times to produce 10 observations $y$ forming one group of data. The whole data set $\mathbf Y=\{\mathbf Y_{1},\mathbf Y_{2},\mathbf Y_{3},\mathbf Y_{4},\mathbf Y_{5}\}$ thus contains 5 groups, i.e. 50 observations in total. The hidden variables $\mathbf Z=\{\mathbf z_{1},\mathbf z_{2},\mathbf z_{3},\mathbf z_{4},\mathbf z_{5}\}$, however, number only 5 (not 50, at least as the procedure is stated).
[Figure: the group-sampling process]
In Figure (a), the hidden-variable set $\mathbf Z$ is known, and maximum-likelihood estimation proceeds exactly as described in Part 1.

In Figure (b), the hidden-variable set $\mathbf Z$ is unknown, and the EM algorithm is used to estimate the parameters; the derivation of the EM formulas differs slightly.

[Figure: revised from "What is the expectation maximization algorithm", Figure 1]
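The sampling scheme of the figure can be sketched in a few lines of NumPy. This is a minimal simulation under assumed "true" parameters (`pi_`, `p`, `q` below are illustrative values, not the paper's):

```python
import numpy as np

# Minimal sketch of the Figure 1 sampling scheme: one hidden coin choice per
# group, then N tosses of that coin; parameter values here are illustrative.
rng = np.random.default_rng(42)
M, N = 5, 10
pi_, p, q = 0.5, 0.5, 0.8      # P(choose coin B), P(heads | B), P(heads | C)

Z = rng.random(M) < pi_        # hidden: one indicator per group (True = coin B)
probs = np.where(Z, p, q)      # head-probability used by each group
Y = (rng.random((M, N)) < probs[:, None]).astype(int)
print(Y.shape)                 # 5 groups of 10 binary observations
```

Note that only $M=5$ hidden choices are made, even though 50 observations are produced.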


Explanation of Figure (b) and derivation of the EM formulas

Assume, as before, that the parameters of the three-coin model are $\theta=(\pi,p,q)$.

$(1)$ The 1-of-K representation of the three-coin problem
 
The hidden variable $z_{1}=1$, i.e. hidden vector $\mathbf z=[1,0]^{T}$, denotes the event "coin A lands heads"; this event has probability $P(z_{1}=1|\theta)=\pi$.

The hidden variable $z_{2}=1$, i.e. hidden vector $\mathbf z=[0,1]^{T}$, denotes the event "coin A lands tails"; this event has probability $P(z_{2}=1|\theta)=1-\pi$.

Writing
$P(z_{1}=1|\theta)=\pi_{1}=\pi$
$P(z_{2}=1|\theta)=\pi_{2}=1-\pi$
the toss of coin A can be described uniformly as $P(z_{k}=1|\theta)=\pi_{k}\,,\ k\in\{1,2\}$.
 
In terms of the hidden vector $\mathbf z$, this is:

$$P(\mathbf z|\theta)=\prod_{k=1}^{2}\pi_{k}^{z_{k}}=\pi_{1}^{z_{1}}\cdot\pi_{2}^{z_{2}}\,,\qquad \mathbf z=\begin{bmatrix} z_{1}\\ z_{2}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix}\right\}$$
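As a quick numerical check, the product form can be evaluated directly; $\pi=0.4$ below is an illustrative value, not one from the paper:

```python
import numpy as np

def p_z(z, pis):
    # P(z | theta) = prod_k pi_k^{z_k} for a 1-of-K indicator vector z
    return float(np.prod(np.power(pis, z)))

pi_ = 0.4                              # illustrative value for pi
pis = np.array([pi_, 1 - pi_])         # (pi_1, pi_2) = (pi, 1 - pi)
print(p_z(np.array([1, 0]), pis))      # 0.4  -> P(z_1 = 1 | theta) = pi
print(p_z(np.array([0, 1]), pis))      # 0.6  -> P(z_2 = 1 | theta) = 1 - pi
```

The exponent $z_k$ switches each factor on or off, so exactly one $\pi_k$ survives in the product.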

$(2)$ Expressing the probability of an observation $y$ via the hidden vector $\mathbf z$

Consider coins B and C separately for an observation $y\in\{0,1\}$:
coin B: $P(y|z_{1}=1,\theta)=p^{y}(1-p)^{1-y}$
coin C: $P(y|z_{2}=1,\theta)=q^{y}(1-q)^{1-y}$

For notational convenience, write $\alpha_{1}=p$ and $\alpha_{2}=q$, so that:
coin B: $P(y|z_{1}=1,\theta)=\alpha_{1}^{y}(1-\alpha_{1})^{1-y}$
coin C: $P(y|z_{2}=1,\theta)=\alpha_{2}^{y}(1-\alpha_{2})^{1-y}$

The two coins can then be described uniformly as:

$$P(y|z_{k}=1,\theta)=\alpha_{k}^{y}(1-\alpha_{k})^{1-y},\qquad k\in\{1,2\},\ y\in\{0,1\}$$

In terms of the hidden vector $\mathbf z$:

$$\begin{aligned} P(y|\mathbf z,\theta) &= \prod_{k=1}^{2}\left[P(y|z_{k}=1,\theta)\right]^{z_{k}} = \left[P(y|z_{1}=1,\theta)\right]^{z_{1}}\cdot\left[P(y|z_{2}=1,\theta)\right]^{z_{2}} \\ &= \prod_{k=1}^{2}\left[\alpha_{k}^{y}(1-\alpha_{k})^{1-y}\right]^{z_{k}},\qquad \mathbf z=\begin{bmatrix} z_{1}\\ z_{2}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix}\right\} \end{aligned}$$
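This unified form is easy to sanity-check numerically; a small sketch using the paper's initial values $p=0.5$, $q=0.6$:

```python
import numpy as np

def p_y_given_z(y, z, alphas):
    # P(y | z, theta) = prod_k [ alpha_k^y (1 - alpha_k)^(1-y) ]^{z_k}
    return float(np.prod((alphas**y * (1 - alphas)**(1 - y))**z))

alphas = np.array([0.5, 0.6])   # (alpha_1, alpha_2) = (p, q)

# z = [1, 0] selects coin B, so P(y=1 | z) must equal p = 0.5
print(p_y_given_z(1, np.array([1, 0]), alphas))
# z = [0, 1] selects coin C, so P(y=0 | z) must equal 1 - q = 0.4
print(p_y_given_z(0, np.array([0, 1]), alphas))
```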
$(3)$ Likelihood of the incomplete-data sample set

For the full sample set $\mathbf Y=\{\mathbf Y_{1},\mathbf Y_{2},\cdots,\mathbf Y_{M}\}$, the corresponding hidden-vector set is $\mathbf Z=\{\mathbf z_{1},\mathbf z_{2},\cdots,\mathbf z_{M}\}$.

Here the $i$-th group $\mathbf Y_{i}=\{y_{i,1},\cdots,y_{i,n},\cdots,y_{i,N}\}$ itself contains $N$ samples, with $i\in\{1,\cdots,M\}$, $n\in\{1,\cdots,N\}$, $y_{i,n}\in\{0,1\}$, and $\mathbf z_{i}=\begin{bmatrix} z_{i1}\\ z_{i2}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix}\right\}$.

In Figure (a):
$M=5$: five groups of experiments were run, each group using a single coin (coin B or coin C);
$N=10$: within each group, the same coin was tossed 10 times independently, producing the observations $y_{i,n}\in\{0,1\},\ i\in\{1,\cdots,5\},\ n\in\{1,\cdots,10\}$.

The observations of Figure (a) are $\mathbf Y=\{\mathbf Y_{1},\mathbf Y_{2},\mathbf Y_{3},\mathbf Y_{4},\mathbf Y_{5}\}$, where

$\mathbf Y_{1}=\{1,0,0,0,1,1,0,1,0,1\},\qquad \mathbf z_{1}=[1,0]^{T}\ (z_{11}=1)$
$\mathbf Y_{2}=\{1,1,1,1,0,1,1,1,1,1\},\qquad \mathbf z_{2}=[0,1]^{T}\ (z_{22}=1)$
$\mathbf Y_{3}=\{1,0,1,1,1,1,1,0,1,1\},\qquad \mathbf z_{3}=[0,1]^{T}\ (z_{32}=1)$
$\mathbf Y_{4}=\{1,0,1,0,0,0,1,1,0,0\},\qquad \mathbf z_{4}=[1,0]^{T}\ (z_{41}=1)$
$\mathbf Y_{5}=\{0,1,1,1,0,1,1,1,0,1\},\qquad \mathbf z_{5}=[0,1]^{T}\ (z_{52}=1)$

We then have:

$$\begin{aligned} P(\mathbf Y|\mathbf Z,\theta) &= P(\mathbf Y_{1},\mathbf Y_{2},\mathbf Y_{3},\mathbf Y_{4},\mathbf Y_{5}|\mathbf z_{1},\cdots,\mathbf z_{M},\theta) \\ &= \prod_{i=1}^{M} P(\mathbf Y_{i}|\mathbf z_{i},\theta) = \prod_{i=1}^{M} P(y_{i,1},y_{i,2},\cdots,y_{i,N}|\mathbf z_{i},\theta) \\ &= \prod_{i=1}^{M}\left\{\prod_{n=1}^{N} P(y_{i,n}|\mathbf z_{i},\theta)\right\} \qquad\text{since } P(y|\mathbf z_{i},\theta)=\prod_{k=1}^{2}P(y|z_{ik}=1,\theta)^{z_{ik}} \\ &= \prod_{i=1}^{M}\prod_{n=1}^{N}\prod_{k=1}^{2} P(y_{i,n}|z_{ik}=1,\theta)^{z_{ik}} = \prod_{i=1}^{M}\prod_{n=1}^{N}\prod_{k=1}^{2}\left[\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{1-y_{i,n}}\right]^{z_{ik}} \end{aligned}$$
Moreover, from

$$P(\mathbf z_{i}|\theta)=\prod_{k=1}^{2}\pi_{k}^{z_{ik}}=\pi_{1}^{z_{i1}}\cdot\pi_{2}^{z_{i2}}\,,\qquad \mathbf z_{i}=\begin{bmatrix} z_{i1}\\ z_{i2}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix}\right\}$$

it follows that

$$P(\mathbf Z|\theta)=P(\mathbf z_{1},\cdots,\mathbf z_{M}|\theta)=\prod_{i=1}^{M}P(\mathbf z_{i}|\theta)=\prod_{i=1}^{M}\prod_{k=1}^{2}\pi_{k}^{z_{ik}}$$

Since the hidden-vector set $\mathbf Z$ cannot be observed, the EM algorithm first makes a guess for all hidden vectors $\mathbf Z=\{\mathbf z_{1},\cdots,\mathbf z_{i},\cdots,\mathbf z_{M}\}$, turning the observed data from the incomplete form $\mathbf Y$ into the complete form $(\mathbf Y,\mathbf Z)$.
The likelihood of the complete data $(\mathbf Y,\mathbf Z)$ (assuming $\mathbf Z$ were known) is then:

$$\begin{aligned} P(\mathbf Y,\mathbf Z|\theta) &= P(\mathbf Y|\mathbf Z,\theta)P(\mathbf Z|\theta) \\ &= \prod_{i=1}^{M}\prod_{n=1}^{N}\prod_{k=1}^{2} P(y_{i,n}|z_{ik}=1,\theta)^{z_{ik}}\cdot\prod_{i=1}^{M}\prod_{k=1}^{2}\pi_{k}^{z_{ik}} \\ &= \prod_{i=1}^{M}\left\{\prod_{n=1}^{N}\prod_{k=1}^{2}\left[\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{1-y_{i,n}}\right]^{z_{ik}}\cdot\prod_{k=1}^{2}\pi_{k}^{z_{ik}}\right\} \end{aligned}$$

which gives the log-likelihood (its value depends on $\mathbf Z$):

$$\ln P(\mathbf Y,\mathbf Z|\theta)=\sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2} z_{ik}\ln P(y_{i,n}|z_{ik}=1,\theta)+\sum_{i=1}^{M}\sum_{k=1}^{2} z_{ik}\ln\pi_{k}$$
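The indicator trick behind this formula (the exponent $z_{ik}$ selects exactly one term per group) can be verified numerically. A small sketch with illustrative values, not the paper's data:

```python
import numpy as np

# Check: the complete-data log-likelihood formula equals a direct computation.
rng = np.random.default_rng(0)
M, N = 2, 4
Y = rng.integers(0, 2, size=(M, N))     # M groups of N Bernoulli observations
Z = np.array([[1, 0], [0, 1]])          # one 1-of-2 indicator per group
alphas = np.array([0.5, 0.6])           # (alpha_1, alpha_2) = (p, q)
pis = np.array([0.5, 0.5])              # (pi_1, pi_2)

# Formula: sum_i sum_n sum_k z_ik ln P(y_in | z_ik=1) + sum_i sum_k z_ik ln pi_k
lnP = Y[:, :, None] * np.log(alphas) + (1 - Y[:, :, None]) * np.log(1 - alphas)
formula = np.sum(Z[:, None, :] * lnP) + np.sum(Z * np.log(pis))

# Direct: plug each group's actual coin k_i into ln P(Y, Z | theta)
direct = 0.0
for i in range(M):
    k = int(np.argmax(Z[i]))
    direct += np.log(pis[k]) + np.sum(
        Y[i] * np.log(alphas[k]) + (1 - Y[i]) * np.log(1 - alphas[k]))

print(np.isclose(formula, direct))      # True
```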

$(4)$ Expectation of the log-likelihood $\ln P(\mathbf Y,\mathbf Z|\theta)$

Define the function $Q(\theta,\theta^{(i)})$:

$$\begin{aligned} Q(\theta,\theta^{(i)}) &= E_{P(\mathbf Z|\mathbf Y,\theta)}\left[\ln P(\mathbf Y,\mathbf Z|\theta)\ \middle|\ \mathbf Y,\theta^{(i)}\right] \\ &= E_{P(\mathbf Z|\mathbf Y,\theta)}\left\{\sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2} z_{ik}\ln P(y_{i,n}|z_{ik}=1,\theta)+\sum_{i=1}^{M}\sum_{k=1}^{2} z_{ik}\ln\pi_{k}\right\} \\ &= \sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2} E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}]\ln P(y_{i,n}|z_{ik}=1,\theta)+\sum_{i=1}^{M}\sum_{k=1}^{2} E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}]\ln\pi_{k} \end{aligned}$$

To evaluate $Q(\theta,\theta^{(i)})$, we must first compute $E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}]=\sum\limits_{z_{ik}} z_{ik}P(\mathbf z_{i}|\mathbf Y_{i},\theta)$, where

$$\begin{aligned} P(\mathbf z_{i}|\mathbf Y_{i},\theta) &= \dfrac{P(\mathbf z_{i},\mathbf Y_{i}|\theta)}{P(\mathbf Y_{i}|\theta)} = \dfrac{P(\mathbf Y_{i}|\mathbf z_{i},\theta)P(\mathbf z_{i}|\theta)}{\sum\limits_{\mathbf z_{i}}P(\mathbf Y_{i}|\mathbf z_{i},\theta)P(\mathbf z_{i}|\theta)},\qquad \mathbf z_{i}=\begin{bmatrix} z_{i1}\\ z_{i2}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix}\right\} \\ &= \dfrac{P(\mathbf Y_{i}|z_{ik}=1,\theta)P(z_{ik}=1|\theta)}{P(\mathbf Y_{i}|z_{i1}=1,\theta)P(z_{i1}=1|\theta)+P(\mathbf Y_{i}|z_{i2}=1,\theta)P(z_{i2}=1|\theta)} \end{aligned}$$

Therefore:

$$\begin{aligned} E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}] &= \sum_{z_{ik}} z_{ik}P(\mathbf z_{i}|\mathbf Y_{i},\theta) \\ &= \dfrac{1\cdot P(\mathbf Y_{i}|z_{ik}=1,\theta)P(z_{ik}=1|\theta)+0\cdot P(\mathbf Y_{i}|z_{ik}=0,\theta)P(z_{ik}=0|\theta)}{P(\mathbf Y_{i}|z_{i1}=1,\theta)P(z_{i1}=1|\theta)+P(\mathbf Y_{i}|z_{i2}=1,\theta)P(z_{i2}=1|\theta)} \\ &= \dfrac{P(\mathbf Y_{i}|z_{ik}=1,\theta)P(z_{ik}=1|\theta)}{P(\mathbf Y_{i}|z_{i1}=1,\theta)P(z_{i1}=1|\theta)+P(\mathbf Y_{i}|z_{i2}=1,\theta)P(z_{i2}=1|\theta)} \\ &= \dfrac{\pi_{k}P(\mathbf Y_{i}|z_{ik}=1,\theta)}{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)+\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)} \end{aligned}$$

Since

$$\begin{aligned} P(\mathbf Y_{i}|z_{ik}=1,\theta) &= P(y_{i,1},\cdots,y_{i,n},\cdots,y_{i,N}|z_{ik}=1,\theta) = \prod_{n=1}^{N}P(y_{i,n}|z_{ik}=1,\theta) \\ &= \prod_{n=1}^{N}\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{1-y_{i,n}},\qquad k\in\{1,2\},\ y_{i,n}\in\{0,1\} \end{aligned}$$

we obtain:

$$E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}]=\dfrac{\pi_{k}\prod_{n=1}^{N}\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{1-y_{i,n}}}{\pi_{1}\prod_{n=1}^{N}\alpha_{1}^{y_{i,n}}(1-\alpha_{1})^{1-y_{i,n}}+\pi_{2}\prod_{n=1}^{N}\alpha_{2}^{y_{i,n}}(1-\alpha_{2})^{1-y_{i,n}}}$$

Explanation of Figure (b), part 1

Assume the initial values $p=\alpha_{1}=\hat\theta_{B}^{(0)}=0.5$ and $q=\alpha_{2}=\hat\theta_{C}^{(0)}=0.6$, and consider the first group of trials (10 tosses: 5 heads, 5 tails):

If it was produced by coin C, the likelihood is
$Likelihood_{coinC}=P(\mathbf Y_{1}|z_{12}=1,\theta)=\alpha_{2}^{5}(1-\alpha_{2})^{5}=0.6^{5}\cdot 0.4^{5}=0.0007962624$
If it was produced by coin B, the likelihood is
$Likelihood_{coinB}=P(\mathbf Y_{1}|z_{11}=1,\theta)=\alpha_{1}^{5}(1-\alpha_{1})^{5}=0.5^{5}\cdot 0.5^{5}=0.0009765625$
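These two likelihood values are easy to reproduce (5 heads and 5 tails under each hypothesis):

```python
# Reproducing the two likelihoods for the first group (5 heads, 5 tails)
alpha1, alpha2 = 0.5, 0.6          # initial guesses for p and q
lik_B = alpha1**5 * (1 - alpha1)**5
lik_C = alpha2**5 * (1 - alpha2)**5
print(lik_B)                       # 0.0009765625
print(lik_C)                       # ≈ 0.0007962624
```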

$(5)$ The E-step formula

Since in the data drawn in Figure (b) the value of the hidden vector $\mathbf z_{i}$ is unknown, given the current parameter estimate $\theta^{(i)}$ we take the expectation of $z_{ik}$:

$$\begin{aligned} E_{P(\mathbf Z|\mathbf Y,\theta)}[z_{ik}] &= \dfrac{\pi_{k}P(\mathbf Y_{i}|z_{ik}=1,\theta)}{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)+\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)} \\ &= \dfrac{\pi_{k}\prod_{n=1}^{N}\alpha_{k}^{y_{i,n}}(1-\alpha_{k})^{1-y_{i,n}}}{\pi_{1}\prod_{n=1}^{N}\alpha_{1}^{y_{i,n}}(1-\alpha_{1})^{1-y_{i,n}}+\pi_{2}\prod_{n=1}^{N}\alpha_{2}^{y_{i,n}}(1-\alpha_{2})^{1-y_{i,n}}} \end{aligned}$$

This expectation replaces $z_{ik}$ when evaluating the log-likelihood $Q(\theta,\theta^{(i)})$. Writing

$$\gamma(z_{ik})=\dfrac{\pi_{k}P(\mathbf Y_{i}|z_{ik}=1,\theta)}{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)+\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)}\,,\qquad k\in\{1,2\}$$

the expected log-likelihood becomes:

$$Q(\theta,\theta^{(i)})=\sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2}\gamma(z_{ik})\ln P(y_{i,n}|z_{ik}=1,\theta)+\sum_{i=1}^{M}\sum_{k=1}^{2}\gamma(z_{ik})\ln\pi_{k}$$

In the three-coin problem $K=2$, so the analysis is relatively simple. Observe that:

$\gamma(z_{i1})=\dfrac{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)}{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)+\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)}$, where $P(\mathbf Y_{i}|z_{i1}=1,\theta)$ is the likelihood of the observed data $\mathbf Y_{i}=\{y_{i,1},\cdots,y_{i,n},\cdots,y_{i,N}\}$ under the hypothesis that group $i$ was tossed with coin B $(z_{i1}=1)$;

$\gamma(z_{i2})=\dfrac{\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)}{\pi_{1}P(\mathbf Y_{i}|z_{i1}=1,\theta)+\pi_{2}P(\mathbf Y_{i}|z_{i2}=1,\theta)}$, where $P(\mathbf Y_{i}|z_{i2}=1,\theta)$ is the likelihood of $\mathbf Y_{i}$ under the hypothesis that group $i$ was tossed with coin C $(z_{i2}=1)$.

Clearly, $\gamma(z_{i1})=1-\gamma(z_{i2})$.

Note that:
(1) In Figure (a), the hidden vector is exact: $\mathbf z_{i}=\begin{bmatrix} z_{i1}\\ z_{i2}\end{bmatrix}\in\left\{\begin{bmatrix}1\\0\end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix}\right\}$.
(2) In the EM algorithm of Figure (b), however, the guess for the hidden vector is $\mathbf z_{i}=[\,\gamma(z_{i1}),\gamma(z_{i2})\,]^{T}$: when estimating $Q(\theta,\theta^{(i)})$, the likelihoods of $\mathbf Y_{i}$ under both coin B and coin C are taken into account, with $\gamma(z_{ik})$ acting as a weight.

Explanation of Figure (b), part 2

The first group of observations is $\mathbf Y_{1}$ (10 tosses: 5 heads, 5 tails):

If it was produced by coin C, the likelihood is
$P(\mathbf Y_{1}|z_{12}=1,\theta)=\alpha_{2}^{5}(1-\alpha_{2})^{5}=0.6^{5}\cdot 0.4^{5}=0.0007962624$
If it was produced by coin B, the likelihood is
$P(\mathbf Y_{1}|z_{11}=1,\theta)=\alpha_{1}^{5}(1-\alpha_{1})^{5}=0.5^{5}\cdot 0.5^{5}=0.0009765625$

The paper assumes $\pi_{1}=\pi_{2}=0.5$, i.e. coin B or coin C is chosen with equal probability to produce the results $y_{i,n}$, so:

$\gamma(z_{11})=\dfrac{\pi_{1}P(\mathbf Y_{1}|z_{11}=1,\theta)}{\pi_{1}P(\mathbf Y_{1}|z_{11}=1,\theta)+\pi_{2}P(\mathbf Y_{1}|z_{12}=1,\theta)}\approx 0.55$
$\gamma(z_{12})=\dfrac{\pi_{2}P(\mathbf Y_{1}|z_{12}=1,\theta)}{\pi_{1}P(\mathbf Y_{1}|z_{11}=1,\theta)+\pi_{2}P(\mathbf Y_{1}|z_{12}=1,\theta)}\approx 0.45=1-\gamma(z_{11})$

This amounts to guessing the hidden vector $\mathbf z_{1}=[0.55,0.45]^{T}$.
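The guess above can be reproduced by the responsibility formula with equal priors:

```python
# gamma(z_11), gamma(z_12) for the first group, with pi1 = pi2 = 0.5
pi1 = pi2 = 0.5
lik_B = 0.5**5 * 0.5**5    # P(Y1 | z11 = 1, theta): 5 heads, 5 tails, p = 0.5
lik_C = 0.6**5 * 0.4**5    # P(Y1 | z12 = 1, theta): q = 0.6
denom = pi1 * lik_B + pi2 * lik_C
gamma11 = pi1 * lik_B / denom
gamma12 = pi2 * lik_C / denom
print(round(gamma11, 2), round(gamma12, 2))   # 0.55 0.45
```

With equal priors, $\pi_1$ and $\pi_2$ cancel, so the responsibilities depend only on the two likelihoods.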

$(6)$ The M-step formula

Maximize the expected log-likelihood $Q(\theta,\theta^{(i)})$ with respect to $\alpha_{1}$ and $\alpha_{2}$:

$$\begin{aligned} Q(\theta,\theta^{(i)}) &= \sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2}\gamma(z_{ik})\ln P(y_{i,n}|z_{ik}=1,\theta)+\sum_{i=1}^{M}\sum_{k=1}^{2}\gamma(z_{ik})\ln\pi_{k} \\ &= \sum_{i=1}^{M}\sum_{n=1}^{N}\sum_{k=1}^{2}\gamma(z_{ik})\left[y_{i,n}\ln\alpha_{k}+(1-y_{i,n})\ln(1-\alpha_{k})\right]+\sum_{i=1}^{M}\sum_{k=1}^{2}\gamma(z_{ik})\ln\pi_{k} \end{aligned}$$

Setting the partial derivative to zero:

$$\frac{\partial Q(\theta,\theta^{(i)})}{\partial\alpha_{k}}=\sum_{i=1}^{M}\sum_{n=1}^{N}\gamma(z_{ik})\left[\frac{y_{i,n}}{\alpha_{k}}-\frac{1-y_{i,n}}{1-\alpha_{k}}\right]=0$$

yields:

$$\hat\alpha_{k}=\dfrac{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{ik})\,y_{i,n}}{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{ik})}$$
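A vectorized sketch of this update, with illustrative responsibilities and data (not the paper's numbers):

```python
import numpy as np

# M-step: alpha_k = sum_i sum_n gamma(z_ik) y_in / sum_i sum_n gamma(z_ik);
# gamma(z_ik) is constant within a group, so the sums reduce to head counts.
Y = np.array([[1, 0, 1, 1],
              [0, 0, 1, 0]])            # M = 2 groups, N = 4 tosses each
gamma = np.array([[0.7, 0.3],
                  [0.2, 0.8]])          # gamma[i, k] = E[z_ik] (illustrative)
N = Y.shape[1]
heads = Y.sum(axis=1)                   # heads per group: [3, 1]
alpha_hat = (gamma * heads[:, None]).sum(axis=0) / (gamma.sum(axis=0) * N)
print(alpha_hat)                        # [2.3/3.6, 1.7/4.4] ≈ [0.639, 0.386]
```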

Explanation of Figure (b), part 3

In the three-coin problem, $K=2$:

$$\hat\theta_{B}=\hat\alpha_{1}=\dfrac{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{i1})\,y_{i,n}}{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{i1})}=\dfrac{\sum\limits_{i=1}^{M}\sum\limits_{y_{i,n}=1}\gamma(z_{i1})}{\sum\limits_{i=1}^{M}\left[\sum\limits_{y_{i,n}=1}\gamma(z_{i1})+\sum\limits_{y_{i,n}=0}\gamma(z_{i1})\right]}$$

where $\sum\limits_{y_{i,n}=1}\gamma(z_{i1})$ is "$\gamma(z_{i1})\times$ the number of heads attributed to coin B in group $i$", and $\sum\limits_{y_{i,n}=0}\gamma(z_{i1})$ is "$\gamma(z_{i1})\times$ the number of tails attributed to coin B in group $i$".

In Figure (b):
$$\hat\theta_{B}=\hat\alpha_{1}=\dfrac{2.8+1.8+2.1+2.6+2.5}{(2.8+2.8)+(1.8+0.2)+(2.1+0.5)+(2.6+3.9)+(2.5+1.1)}=\dfrac{11.7}{11.7+8.4}\approx 0.58$$

$$\hat\theta_{C}=\hat\alpha_{2}=\dfrac{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{i2})\,y_{i,n}}{\sum\limits_{i=1}^{M}\sum\limits_{n=1}^{N}\gamma(z_{i2})}=\dfrac{\sum\limits_{i=1}^{M}\sum\limits_{y_{i,n}=1}\gamma(z_{i2})}{\sum\limits_{i=1}^{M}\left[\sum\limits_{y_{i,n}=1}\gamma(z_{i2})+\sum\limits_{y_{i,n}=0}\gamma(z_{i2})\right]}$$

where $\sum\limits_{y_{i,n}=1}\gamma(z_{i2})$ is "$\gamma(z_{i2})\times$ the number of heads attributed to coin C in group $i$", and $\sum\limits_{y_{i,n}=0}\gamma(z_{i2})$ is "$\gamma(z_{i2})\times$ the number of tails attributed to coin C in group $i$".

In Figure (b):
$$\hat\theta_{C}=\hat\alpha_{2}=\dfrac{2.2+7.2+5.9+1.4+4.5}{(2.2+2.2)+(7.2+0.8)+(5.9+1.5)+(1.4+2.1)+(4.5+1.9)}=\dfrac{21.3}{21.3+8.6}\approx 0.71$$
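These estimates can be recomputed directly from the rounded per-group counts shown in the figure. (Because of rounding, the intermediate totals differ from the unrounded 11.7/8.4 and 21.3/8.6 in the last digit, but the estimates round to the same values.)

```python
# Coin B: expected heads / (expected heads + expected tails), rounded counts
b_heads = 2.8 + 1.8 + 2.1 + 2.6 + 2.5
b_tails = 2.8 + 0.2 + 0.5 + 3.9 + 1.1
theta_B = b_heads / (b_heads + b_tails)

# Coin C: same computation with the coin-C counts
c_heads = 2.2 + 7.2 + 5.9 + 1.4 + 4.5
c_tails = 2.2 + 0.8 + 1.5 + 2.1 + 1.9
theta_C = c_heads / (c_heads + c_tails)
print(round(theta_B, 2), round(theta_C, 2))   # 0.58 0.71
```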

Code implementation

A Python implementation of Figure 1 of "What is the expectation maximization algorithm":

import numpy as np

def cal_Likelihood(alpha1, alpha2, Yi):
    # E-step helper: responsibilities for one group of tosses Yi,
    # assuming equal priors pi1 = pi2 = 0.5 (they cancel in the ratio)
    headCount = np.sum(Yi == 1)
    tailCount = np.sum(Yi == 0)
    likelihoodB = np.power(alpha1, headCount) * np.power(1 - alpha1, tailCount)
    likelihoodC = np.power(alpha2, headCount) * np.power(1 - alpha2, tailCount)
    gamma1 = likelihoodB / (likelihoodB + likelihoodC)
    gamma2 = likelihoodC / (likelihoodB + likelihoodC)
    return gamma1, gamma2

def E_Step(alpha1, alpha2, Y):
    print('\n====== E-Step ======')
    gamma = np.zeros((5, 2))
    print('coin C, coin B  ------ gamma(Zik)=E[Zik]')
    for i in range(5):
        gamma1, gamma2 = cal_Likelihood(alpha1, alpha2, Y[i, :])
        print((' %2.2f,\t%2.2f') % (gamma2, gamma1))
        gamma[i, :] = np.array([gamma1, gamma2])
    return gamma

def M_Step(gamma, Y):
    print('\n====== M-Step ======')
    # expected head/tail counts for coin B (t1*) and coin C (t2*);
    # the tiny constants guard against division by zero
    t1h = 0.000001
    t1t = 0.000001
    t2h = 0.000001
    t2t = 0.000001
    for n in range(Y.shape[0]):
        t1h = t1h + np.sum(gamma[n, 0] * Y[n, :])
        t1t = t1t + np.sum(gamma[n, 0] * (1 - Y[n, :]))
        t2h = t2h + np.sum(gamma[n, 1] * Y[n, :])
        t2t = t2t + np.sum(gamma[n, 1] * (1 - Y[n, :]))
    print('\nC-Head  C-Tail  B-Head  B-Tail')
    print((' %2.1f \t %2.1f \t %2.1f \t %2.1f') % (t2h, t2t, t1h, t1t))
    newThetaB = t1h / (t1h + t1t)
    newThetaC = t2h / (t2h + t2t)
    return newThetaB, newThetaC

if __name__ == '__main__':
    observationY = np.array([[1,0,0,0,1,1,0,1,0,1],
                             [1,1,1,1,0,1,1,1,1,1],
                             [1,0,1,1,1,1,1,0,1,1],
                             [1,0,1,0,0,0,1,1,0,0],
                             [0,1,1,1,0,1,1,1,0,1]])
    for i in range(5):
        print('Y' + str(i + 1) + ':  ' + str(observationY[i, :]))
    thetaB = 0.5
    thetaC = 0.6
    flag = 1
    iter = 0
    while flag:
        print('\n#iteration\t' + str(iter + 1))
        alpha1 = thetaB
        alpha2 = thetaC
        gamma = E_Step(alpha1, alpha2, observationY)
        thetaB, thetaC = M_Step(gamma, observationY)
        print(('ThetaB=%2.2f,ThetaC=%2.2f') % (thetaB, thetaC))
        iter = iter + 1
        if iter == 10:
            flag = 0

程序运行结果
Y1: [1 0 0 0 1 1 0 1 0 1]
Y2: [1 1 1 1 0 1 1 1 1 1]
Y3: [1 0 1 1 1 1 1 0 1 1]
Y4: [1 0 1 0 0 0 1 1 0 0]
Y5: [0 1 1 1 0 1 1 1 0 1]

#iteration 1              图(b)中的数据为第一次迭代
coin C, coin B ------ gamma(Zik)=E[Zik]
0.45, 0.55
0.80, 0.20
0.73, 0.27
0.35, 0.65
0.65, 0.35

C-Head C-Tail B-Head B-Tail
21.3 8.6 11.7 8.4
ThetaB=0.58,ThetaC=0.71
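
类似地,图(b)中硬币 B 的估计也可以由第一次迭代 M 步输出的加权计数(B-Head=11.7,B-Tail=8.4)直接验证:

\qquad\hat{\theta}_{B}=\hat{\alpha}_{1}=\dfrac{11.7}{11.7+8.4}=\dfrac{11.7}{20.1}\approx 0.58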

#iteration 2
coin C, coin B ------ gamma(Zik)=E[Zik]
0.30, 0.70
0.81, 0.19
0.71, 0.29
0.19, 0.81
0.57, 0.43

C-Head C-Tail B-Head B-Tail
19.2 6.6 13.8 10.4
ThetaB=0.57,ThetaC=0.75

#iteration 3
coin C, coin B ------ gamma(Zik)=E[Zik]
0.22, 0.78
0.87, 0.13
0.75, 0.25
0.11, 0.89
0.58, 0.42

C-Head C-Tail B-Head B-Tail
19.4 5.9 13.6 11.1
ThetaB=0.55,ThetaC=0.77

#iteration 4
coin C, coin B ------ gamma(Zik)=E[Zik]
0.16, 0.84
0.91, 0.09
0.79, 0.21
0.07, 0.93
0.59, 0.41

C-Head C-Tail B-Head B-Tail
19.8 5.5 13.2 11.5
ThetaB=0.53,ThetaC=0.78

#iteration 5
coin C, coin B ------ gamma(Zik)=E[Zik]
0.13, 0.87
0.94, 0.06
0.82, 0.18
0.04, 0.96
0.59, 0.41

C-Head C-Tail B-Head B-Tail
20.0 5.3 13.0 11.7
ThetaB=0.53,ThetaC=0.79

#iteration 6
coin C, coin B ------ gamma(Zik)=E[Zik]
0.11, 0.89
0.95, 0.05
0.84, 0.16
0.04, 0.96
0.60, 0.40

C-Head C-Tail B-Head B-Tail
20.1 5.2 12.9 11.8
ThetaB=0.52,ThetaC=0.79

#iteration 7
coin C, coin B ------ gamma(Zik)=E[Zik]
0.11, 0.89
0.95, 0.05
0.84, 0.16
0.03, 0.97
0.60, 0.40

C-Head C-Tail B-Head B-Tail
20.1 5.2 12.9 11.8
ThetaB=0.52,ThetaC=0.80

#iteration 8
coin C, coin B ------ gamma(Zik)=E[Zik]
0.10, 0.90
0.95, 0.05
0.84, 0.16
0.03, 0.97
0.60, 0.40

C-Head C-Tail B-Head B-Tail
20.2 5.2 12.8 11.8
ThetaB=0.52,ThetaC=0.80

#iteration 9
coin C, coin B ------ gamma(Zik)=E[Zik]
0.10, 0.90
0.95, 0.05
0.84, 0.16
0.03, 0.97
0.60, 0.40

C-Head C-Tail B-Head B-Tail
20.2 5.1 12.8 11.9
ThetaB=0.52,ThetaC=0.80

#iteration 10
coin C, coin B ------ gamma(Zik)=E[Zik]
0.10, 0.90
0.95, 0.05
0.85, 0.15
0.03, 0.97
0.60, 0.40

C-Head C-Tail B-Head B-Tail
20.2 5.1 12.8 11.9
ThetaB=0.52,ThetaC=0.80
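
作为交叉检验,下面给出一个更紧凑的向量化实现(示意代码,变量名为笔者自拟;与上文一样隐含地假设两枚硬币的先验概率相等)。由于每组 10 次投掷的后验只取决于正面次数,可以先把各组的正面数统计出来,再整体做 E 步和 M 步;迭代 10 次后的结果应与上文输出一致:

```python
import numpy as np

# 5 组观测,每组 10 次投掷(1=正面,0=反面),与上文 observationY 相同
Y = np.array([[1, 0, 0, 0, 1, 1, 0, 1, 0, 1],
              [1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
              [1, 0, 1, 1, 1, 1, 1, 0, 1, 1],
              [1, 0, 1, 0, 0, 0, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 1, 1, 1, 0, 1]])

heads = Y.sum(axis=1)           # 每组正面数 h_i
tails = Y.shape[1] - heads      # 每组反面数 t_i

thetaB, thetaC = 0.5, 0.6       # 与上文相同的初值
for _ in range(10):
    # E 步:每组数据来自硬币 B / C 的后验概率(先验相等,归一化时约去)
    lB = thetaB ** heads * (1 - thetaB) ** tails
    lC = thetaC ** heads * (1 - thetaC) ** tails
    gB = lB / (lB + lC)
    gC = 1 - gB
    # M 步:用后验加权的正面数除以加权的总投掷数
    thetaB = (gB @ heads) / (gB @ (heads + tails))
    thetaC = (gC @ heads) / (gC @ (heads + tails))

print(round(thetaB, 2), round(thetaC, 2))   # 应与上文第 10 次迭代的 ThetaB/ThetaC 一致
```

这里省略了原实现中的 0.000001 平滑项(本例数据下分母不会为零),因此数值与上文输出在小数点后两位内一致。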
