HMM原理简述和使用说明

2024-05-05 07:18
文章标签 使用 原理 说明 简述 hmm

本文主要是介绍HMM原理简述和使用说明,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

1、原理简述   

      为了对GMM-HMM在语音识别上的应用有个宏观认识,花了些时间读了下HTK(用htk完成简单的孤立词识别)的部分源码,对该算法总算有了点大概认识,达到了预期我想要的。不得不说,网络上关于语音识别的通俗易懂教程太少,都是各种公式满天飞,很少有说具体细节的,当然了,那需要有实战经验才行。下面总结以下几点,对其有个宏观印象即可(以孤立词识别为例)。

  一、每个单词的读音都对应一个HMM模型,大家都知道HMM模型中有个状态集S,那么每个状态用什么来表示呢,数字?向量?矩阵?其实这个状态集中的状态没有具体的数学要求,只是一个名称而已,你可以用’1’, ’2’, ‘3’…表示,也可以用’a’, ‘b’, ’c ’表示。另外每个HMM模型中到底该用多少个状态,是通过先验知识人为设定的。

  二、HMM的每一个状态都对应有一个观察值,这个观察值可以是一个实数,也可以是个向量,且每个状态对应的观察值的维度应该相同。假设现在有一个单词的音频文件,首先需要将其进行采样得到数字信息(A/D转换),然后分帧进行MFCC特征提取,假设每一帧音频对应的MFCC特征长度为39,则每个音频文件就转换成了N个MFCC向量(不同音频文件对应的N可能不同),这就成了一个序列,而在训练HMM模型的参数时(比如用Baum-Welch算法),每次输入到HMM中的数据要求就是一个观测值序列。这时,每个状态对应的观测值为39维的向量,因为向量中元素的取值是连续的,需要用多维密度函数来模拟,通常情况下用的是多维高斯函数。在GMM-HMM体系中,这个拟合函数是用K个多维高斯混合得到的。假设知道了每个状态对应的K个多维高斯的所有参数,则该GMM生成该状态上某一个观察向量(一帧音频的MFCC系数)的概率就可以求出来了。

  三、对每个单词建立一个HMM模型,需要用到该单词的训练样本,这些训练样本是提前标注好的,即每个样本对应一段音频,该音频只包含这个单词的读音。当有了该单词的多个训练样本后,就用这些样本结合Baum-Welch算法和EM算法来训练出GMM-HMM的所有参数,这些参数包括初始状态的概率向量,状态之间的转移矩阵,每个状态对应的观察矩阵(这里对应的是GMM,即每个状态对应的K个高斯的权值,每个高斯的均值向量和方差矩阵)。

  四、在识别阶段,输入一段音频,如果该音频含有多个单词,则可以手动先将其分割开(考虑的是最简单的方法),然后提取每个单词的音频MFCC特征序列,将该序列输入到每个HMM模型(已提前训练好的)中,采用前向算法求出每个HMM模型生成该序列的概率,最后取最大概率对应的那个模型,而那个模型所表示的单词就是我们识别的结果。

  五、在建立声学模型时,可以用Deep Learning的方法来代替GMM-HMM中的GMM,因为GMM模拟任意函数的功能取决于混合高斯函数的个数,所以具有一定的局限性,属于浅层模型。而Deep Network可以模拟任意的函数,因而表达能力更强。注意,这里用来代替GMM的Deep Nets模型要求是产生式模型,比如DBN,DBM等,因为在训练HMM-DL网络时,需要用到HMM的某个状态产生一个样本的概率。

  六、GMM-HMM在具体实现起来还是相当复杂的。

  七、一般涉及到时间序列时才会使用HMM,比如这里音频中的语音识别,视频中的行为识别等。如果我们用GMM-HMM对静态的图片分类,因为这里没涉及到时间信息,所以HMM的状态数可设为1,那么此时的GMM-HMM算法就退化成GMM算法了。


2、使用说明

一、离散输出的隐马尔科夫模型(DHMM,HMM with discrete outputs)

最大似然参数估计EM(Baum Welch算法

The script dhmm_em_demo.m gives an example of how to learn an HMM with discrete outputs. Let there be Q=2 states and O=3 output symbols. We create random stochastic matrices as follows.

O = 3;

Q = 2;

prior0 = normalise(rand(Q,1));

transmat0 = mk_stochastic(rand(Q,Q));

obsmat0 = mk_stochastic(rand(Q,O)); 

Now we sample nex=20 sequences of length T=10 each from this model, to use as training data.

T=10;   %序列长度

nex=20;  %样本序列数目

data = dhmm_sample(prior0, transmat0, obsmat0, nex, T);      

Here data is 20x10. Now we make a random guess as to what the parameters are,

prior1 = normalise(rand(Q,1)); %初始状态概率

transmat1 = mk_stochastic(rand(Q,Q)); %初始状态转移矩阵

obsmat1 = mk_stochastic(rand(Q,O)); %初始观测状态到隐藏状态间的概率转移矩阵

and improve our guess using 5 iterations of EM...

[LL, prior2, transmat2, obsmat2] = dhmm_em(data, prior1, transmat1, obsmat1, 'max_iter', 5);

%prior2, transmat2, obsmat2 为训练好后的初始概率,状态转移矩阵及混合状态概率转移矩阵

LL(t) is the log-likelihood after iteration t, so we can plot the learning curve.

序列分类

To evaluate the log-likelihood of a trained model given test data, proceed as follows:

loglik = dhmm_logprob(data, prior2, transmat2, obsmat2)  %HMM测试

Note: the discrete alphabet is assumed to be {1, 2, ..., O}, where O = size(obsmat, 2). Hence data cannot contain any 0s.

To classify a sequence into one of k classes, train up k HMMs, one per class, and then compute the log-likelihood that each model gives to the test sequence; if the i'th model is the most likely, then declare the class of the sequence to be class i.

Computing the most probable sequence (Viterbi)

First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t,i:

B = multinomial_prob(data, obsmat);

Then you can use

[path] = viterbi_path(prior, transmat, B) 

 

二、具有高斯混合输出的隐马尔科夫模型(GHMM,HMM with mixture of Gaussians outputs)

Maximum likelihood parameter estimation using EM (Baum Welch)

Let us generate nex=50 vector-valued sequences of length T=50; each vector has size O=2.

O = 2;

T = 50;

nex = 50;

data = randn(O,T,nex);

Now let use fit a mixture of M=2 Gaussians for each of the Q=2 states using K-means.

M = 2;

Q = 2;

left_right = 0;

 

prior0 = normalise(rand(Q,1));

transmat0 = mk_stochastic(rand(Q,Q));

 

[mu0, Sigma0] = mixgauss_init(Q*M, reshape(data, [O T*nex]), cov_type);

mu0 = reshape(mu0, [O Q M]);

Sigma0 = reshape(Sigma0, [O O Q M]);

mixmat0 = mk_stochastic(rand(Q,M));

 

Finally, let us improve these parameter estimates using EM.

[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...

    mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 2);

Since EM only finds a local optimum, good initialisation is crucial. The initialisation procedure illustrated above is very crude, and is probably not adequate for real applications... Click here for a real-world example of EM with mixtures of Gaussians using BNT.

What to do if the log-likelihood becomes positive?

It is possible for p(x) > 1 if p(x) is a probability density function, such as a Gaussian. (The requirements for a density are p(x)>0 for all x and int_x p(x) = 1.) In practice this usually means your covariance is shrinking to a point/delta function, so you should increase the width of the prior (see below), or constrain the matrix to be spherical or diagonal, or clamp it to a large fixed constant (not learn it at all). It is also very helpful to ensure the components of the data vectors have small and comparable magnitudes (use e.g., KPMstats/standardize).

This is a well-known pathology of maximum likelihood estimation for Gaussian mixtures: the global optimum may place one mixture component on a single data point, and give it 0 covariance, and hence infinite likelihood. One usually relies on the fact that EM cannot find the global optimum to avoid such pathologies.

What to do if the log-likelihood decreases during EM?

Since I implicitly add a prior to every covariance matrix (see below), what increases is loglik + log(prior), but what I print is just loglik, which may occasionally decrease. This suggests that one of your mixture components is not getting enough data. Try a better initialization or fewer clusters (states).

What to do if the covariance matrix becomes singular?

Estimates of the covariance matrix often become singular if you have too little data, or if too few points are assigned to a cluster center due to a bad initialization of the means. In this case, you should constrain the covariance to be spherical or diagonal, or adjust the prior (see below), or try a better initialization.

How do I add a prior to the covariance matrix?

Buried inside of KPMstats/mixgauss_Mstep you will see that cov_prior is initialized to 0.01*I. This is added to the maximum likelihood estimate after every M step. To change this, you will need to modify the mhmm_em function so it calls mixgauss_Mstep with a different value.

Sequence classification

To classify a sequence (e.g., of speech) into one of k classes (e.g., the digits 0-9), proceed as in the DHMM case above, but use the following procedure to compute likelihood:

loglik = mhmm_logprob(data, prior, transmat, mu, Sigma, mixmat);

Computing the most probable sequence (Viterbi)

First you need to evaluate B(t,i) = P(y_t | Q_t=i) for all t,i:

B = mixgauss_prob(data(:,:,ex), mu, Sigma, mixmat);

where data(:,:,ex) is OxT where O is the size of the observation vector. Finally, use

[path] = viterbi_path(prior, transmat, B);

三、具有高斯输出的HMM

This is just like the mixture of Gaussians case, except we have M=1, and hence there is no mixing matrix.

Online EM for discrete HMMs/ POMDPs

For some applications (e.g., reinforcement learning/ adaptive control), it is necessary to learn a model online. The script dhmm_em_online_demo gives an example of how to do this.



  MFCC:

  MFCC的matlab实现教程可参考:张智星老师的网页教程mfcc. 最基本的12维特征。

function mfcc=frame2mfcc(frame, fs, filterNum, mfccNum, plotOpt)
% frame2mfcc: Frame to MFCC conversion.
%    Usage: mfcc=frame2mfcc(frame, fs, filterNum, mfccNum, plotOpt)
%
%    For example:
%        waveFile='what_movies_have_you_seen_recently.wav';
%        [y, fs, nbits]=wavReadInt(waveFile);
%        startIndex=12000;
%        frameSize=512;
%        frame=y(startIndex:startIndex+frameSize-1);
%        frame2mfcc(frame, fs, 20, 12, 1);%    Roger Jang 20060417if nargin<1, selfdemo; return; end
if nargin<2, fs=16000; end
if nargin<3, filterNum=20; end
if nargin<4, mfccNum=12; end
if nargin<5, plotOpt=0; endframeSize=length(frame);
% ====== Preemphasis should be done at wave level
%a=0.95;
%frame2 = filter([1, -a], 1, frame);
frame2=frame;
% ====== Hamming windowing
frame3=frame2.*hamming(frameSize);
% ====== FFT
[fftMag, fftPhase, fftFreq, fftPowerDb]=fftOneSide(frame3, fs);
% ====== Triangular band-pass filter bank
triFilterBankPrm=getTriFilterBankPrm(fs, filterNum);    % Get parameters for triangular band-pass filter bank
% Triangular bandpass filter.
for i=1:filterNumtbfCoef(i)=dot(fftPowerDb, trimf(fftFreq, triFilterBankPrm(:,i)));%得到filterNum个滤波系数
end
% ====== DCT
mfcc=zeros(mfccNum, 1); %DCT变换的前后个数也没有变
for i=1:mfccNumcoef = cos((pi/filterNum)*i*((1:filterNum)-0.5))'; %mfcc中的前mfccNum个系数mfcc(i) = sum(coef.*tbfCoef');%直接按照DCT公式
end
% ====== Log energy
%logEnergy=10*log10(sum(frame.*frame));
%mfcc=[logEnergy; mfcc];if plotOptsubplot(2,1,1);plot(frame, '.-');set(gca, 'xlim', [-inf inf]);title('Input frame');subplot(2,1,2);plot(mfcc, '.-');set(gca, 'xlim', [-inf inf]);title('MFCC vector');
end% ====== trimf.m (from fuzzy toolbox)
function y = trimf(x, prm) %由频率的横坐标算出三角形内的纵坐标,0~1
a = prm(1); b = prm(2); c = prm(3);
y = zeros(size(x));
% Left and right shoulders (y = 0)
index = find(x <= a | c <= x);
y(index) = zeros(size(index)); %只考虑三角波内的量
% Left slope
if (a ~= b)index = find(a < x & x < b);y(index) = (x(index)-a)/(b-a);
end
% right slope
if (b ~= c)index = find(b < x & x < c);y(index) = (c-x(index))/(c-b);
end
% Center (y = 1)
index = find(x == b);
y(index) = ones(size(index));% ====== Self demo
function selfdemo
waveFile='what_movies_have_you_seen_recently.wav';
[y, fs, nbits]=wavReadInt(waveFile);
startIndex=12000;
frameSize=512;
frame=y(startIndex:startIndex+frameSize-1);
feval(mfilename, frame, fs, 20, 12, 1);

ZCR:

  过0检测,用于判断每一帧中过零点的数量情况,最简单的版本可参考:zeros cross rate. 

waveFile='csNthu.wav';
frameSize=256;
overlap=0;
[y, fs, nbits]=wavread(waveFile);
frameMat=enframe(y, frameSize, overlap);
frameNum=size(frameMat, 2);
for i=1:frameNumframeMat(:,i)=frameMat(:,i)-mean(frameMat(:,i));    % mean justification
end
zcr=sum(frameMat(1:end-1, :).*frameMat(2:end, :)<0);
sampleTime=(1:length(y))/fs;
frameTime=((0:frameNum-1)*(frameSize-overlap)+0.5*frameSize)/fs;
subplot(2,1,1); plot(sampleTime, y); ylabel('Amplitude'); title(waveFile);
subplot(2,1,2); plot(frameTime, zcr, '.-');
xlabel('Time (sec)'); ylabel('Count'); title('ZCR');

EPD:

  端点检测,检测声音的起始点和终止点,可参考:EPD in Time Domain,在时域中的最简单检测方法。

waveFile='sunday.wav';
[wave, fs, nbits] = wavread(waveFile);
frameSize = 256;
overlap = 128;wave=wave-mean(wave);                % zero-mean substraction
frameMat=buffer2(wave, frameSize, overlap);    % frame blocking,每一列代表一帧
frameNum=size(frameMat, 2);            % no. of frames
volume=frame2volume(frameMat);        % volume,求每一帧的能量,绝对值或者平方和,volume为行向量
volumeTh1=max(volume)*0.1;            % volume threshold 1
volumeTh2=median(volume)*0.1;            % volume threshold 2
volumeTh3=min(volume)*10;            % volume threshold 3
volumeTh4=volume(1)*5;                % volume threshold 4
index1 = find(volume>volumeTh1); %找出volume大于阈值的那些帧序号
index2 = find(volume>volumeTh2);
index3 = find(volume>volumeTh3);
index4 = find(volume>volumeTh4);
%frame2sampleIndex()为从帧序号找到样本点的序号(即每一个采样点的序号)
%endPointX长度为2,包含了起点和终点的样本点序号
endPoint1=frame2sampleIndex([index1(1), index1(end)], frameSize, overlap);
endPoint2=frame2sampleIndex([index2(1), index2(end)], frameSize, overlap);
endPoint3=frame2sampleIndex([index3(1), index3(end)], frameSize, overlap);
endPoint4=frame2sampleIndex([index4(1), index4(end)], frameSize, overlap);subplot(2,1,1);
time=(1:length(wave))/fs;
plot(time, wave);
ylabel('Amplitude'); title('Waveform');
axis([-inf inf -1 1]);
line(time(endPoint1(  1))*[1 1], [-1, 1], 'color', 'm');%标起点终点线
line(time(endPoint2(  1))*[1 1], [-1, 1], 'color', 'g');
line(time(endPoint3(  1))*[1 1], [-1, 1], 'color', 'k');
line(time(endPoint4(  1))*[1 1], [-1, 1], 'color', 'r');
line(time(endPoint1(end))*[1 1], [-1, 1], 'color', 'm');
line(time(endPoint2(end))*[1 1], [-1, 1], 'color', 'g');
line(time(endPoint3(end))*[1 1], [-1, 1], 'color', 'k');
line(time(endPoint4(end))*[1 1], [-1, 1], 'color', 'r');
legend('Waveform', 'Boundaries by threshold 1', 'Boundaries by threshold 2', 'Boundaries by threshold 3', 'Boundaries by threshold 4');subplot(2,1,2);
frameTime=frame2sampleIndex(1:frameNum, frameSize, overlap);
plot(frameTime, volume, '.-');
ylabel('Sum of Abs.'); title('Volume');
axis tight;
line([min(frameTime), max(frameTime)], volumeTh1*[1 1], 'color', 'm');
line([min(frameTime), max(frameTime)], volumeTh2*[1 1], 'color', 'g');
line([min(frameTime), max(frameTime)], volumeTh3*[1 1], 'color', 'k');
line([min(frameTime), max(frameTime)], volumeTh4*[1 1], 'color', 'r');
legend('Volume', 'Threshold 1', 'Threshold 2', 'Threshold 3', 'Threshold 4');

 GMM: 

   GMM用在拟合数据分布上,本质上是先假设样本的概率分布为GMM,然后用多个样本去学习这些GMM的参数。GMM建模在语音中可用于某个单词的发音,某个人的音色等。其训练过程可参考:speaker recognition.

function [M, V, W, logProb] = gmmTrain(data, gaussianNum, dispOpt)
% gmmTrain: Parameter training for gaussian mixture model (GMM)
%    Usage: function [M, V, W, logProb] = gmm(data, gaussianNum, dispOpt)
%        data: dim x dataNum matrix where each column is a data point
%        gaussianNum: No. of Gaussians or initial centers
%        dispOpt: Option for displaying info during training
%        M: dim x meanNum matrix where each column is a mean vector
%        V: 1 x gaussianNum vector where each element is a variance for a Gaussian
%        W: 1 x gaussianNum vector where each element is a weighting factor for a Gaussian% Roger Jang 20000610if nargin==0, selfdemo; return; end
if nargin<3, dispOpt=0; endmaxLoopCount = 50;    % Max. iteration
minImprove = 1e-6;    % Min. improvement
minVariance = 1e-6;    % Min. variance
logProb = zeros(maxLoopCount, 1);   % Array for objective function
[dim, dataNum] = size(data);% Set initial parameters
% Set initial M
%M = data(1+floor(rand(gaussianNum,1)*dataNum),:);    % Randomly select several data points as the centers
if length(gaussianNum)==1,% Using vqKmeans to find initial centersfprintf('Start KMEANS to find the initial mu...\n');
%    M = vqKmeansMex(data, gaussianNum, 0);M = vqKmeans(data, gaussianNum, 0); %利用聚类的方法求均值,聚成gaussianNum类
%    M = vqLBG(data, gaussianNum, 0);fprintf('Start GMM training...\n');if any(any(~isfinite(M))); keyboard; end
else% gaussianNum is in fact the initial centersM = gaussianNum;gaussianNum = size(M, 2);
end
% Set initial V as the distance to the nearest center
if gaussianNum==1V=1;
elsedistance=pairwiseSqrDist(M);%pairwiseSqrDist是dll%distance=pairwiseSqrDist2(M);distance(1:(gaussianNum+1):gaussianNum^2)=inf;    % Diagonal elements are inf[V, index]=min(distance);    % Initial variance for each Gaussian
end
% Set initial W
W = ones(1, gaussianNum)/gaussianNum;    % Weight for each Gaussian,初始化时是均分权值if dispOpt & dim==2, displayGmm(M, V, data); end
for i = 1:maxLoopCount  %开始迭代训练参数,EM算法% Expectation step:% P(i,j) is the probability of data(:,j) to the i-th Gaussian% Prob为每个样本在GMM下的概率[prob, P]=gmmEval(data, M, V, W);logProb(i)=sum(log(prob)); %所有样本的联合概率if dispOptfprintf('i = %d, log prob. = %f\n',i-1, logProb(i));endPW = diag(W)*P;BETA=PW./(ones(gaussianNum,1)*sum(PW));    % BETA(i,j) is beta_i(x_j)sumBETA=sum(BETA,2);% Maximization step:  eqns (2.96) to (2.98) from Bishop p.67:M = (data*BETA')./(ones(dim,1)*sumBETA');DISTSQ = pairwiseSqrDist(M, data);                    % Distance of M to data%DISTSQ = pairwiseSqrDist2(M, data);                    % Distance of M to dataV = max((sum(BETA.*DISTSQ, 2)./sumBETA)/dim, minVariance);    % (2.97)W = (1/dataNum)*sumBETA;                    % (2.98)if dispOpt & dim==2, displayGmm(M, V, data); endif i>1, if logProb(i)-logProb(i-1)<minImprove, break; end; end
end
[prob, P]=gmmEval(data, M, V, W);
logProb(i)=sum(log(prob));
fprintf('Iteration count = %d, log prob. = %f\n',i, logProb(i));
logProb(i+1:maxLoopCount) = [];% ====== Self Demo ======
function selfdemo
%[data, gaussianNum] = dcdata(2);
data = rand(1000,2);
gaussianNum = 8;
data=data';
plotOpt=1;
[M, V, W, lp] = feval(mfilename, data, gaussianNum, plotOpt);pointNum = 40;
x = linspace(min(data(1,:)), max(data(1,:)), pointNum);
y = linspace(min(data(2,:)), max(data(2,:)), pointNum);
[xx, yy] = meshgrid(x, y);
data = [xx(:) yy(:)]';
z = gmmEval(data, M, V, W);
zz = reshape(z, pointNum, pointNum);
figure; mesh(xx, yy, zz); axis tight; box on; rotate3d on
figure; contour(xx, yy, zz, 30); axis image% ====== Other subfunctions ======
function displayGmm(M, V, data)
% Display function for EM algorithm
figureH=findobj(0, 'tag', mfilename);
if isempty(figureH)figureH=figure;set(figureH, 'tag', mfilename);colordef blackplot(data(1,:), data(2,:),'.r'); axis imagetheta=linspace(-pi, pi, 21);x=cos(theta); y=sin(theta);sigma=sqrt(V);for i=1:length(sigma)circleH(i)=line(x*sigma(i)+M(1,i), y*sigma(i)+M(2,i), 'color', 'y');endset(circleH, 'tag', 'circleH', 'erasemode', 'xor');
elsecircleH=findobj(figureH, 'tag', 'circleH');theta=linspace(-pi, pi, 21);x=cos(theta); y=sin(theta);sigma=sqrt(V);for i=1:length(sigma)set(circleH(i), 'xdata', x*sigma(i)+M(1,i), 'ydata', y*sigma(i)+M(2,i));enddrawnow
end

 Speaker identification:

   给N个人的语音资料,用GMM可以训练这N个人的声音模型,然后给定一段语音,判断该语音与这N个人中哪个最相似。方法是求出该语音在N个GMM模型下的概率,选出概率最大的那个。可参考:speaker recognition.

function [recogRate, confusionMatrix, speakerData]=speakerIdentify(speakerData, speakerGmm, useIntGmm)
% speakerIdentify: speaker identification using GMM parameters
%    Usage: [recogRate, confusionMatrix, speakerData]=speakerIdentify(speakerData, speakerGmm, useIntGmm)
%        speakerData: structure array generated by speakerDataRead.m
%        speakerGmm: speakerGmm(i).gmmPrm is the GMM parameters for speaker i.
%        useIntGmm: use fixed-point GMM%    Roger Jang, 20070517, 20080726if nargin<3, useIntGmm=0; end% ====== Speaker identification using GMM parameters
speakerNum=length(speakerData);
for i=1:speakerNum
%    fprintf('%d/%d: Recognizing wave files by %s\n', i, speakerNum, speakerData(i).name);for j=1:length(speakerData(i).sentence)
%        fprintf('\tSentece %d...\n', j);frameNum=size(speakerData(i).sentence(j).fea, 2);logProb=zeros(speakerNum, frameNum); %logProb(i,m)表示第i个人第j个句子中第m帧在GMM模型下的log概率%找出一个句子,看它属于哪个speakerfor k=1:speakerNum,
%            fprintf('\t\tSpeaker %d...\n', k);%    logProb(k, :)=gmmEval(speakerData(i).sentence(j).fea, speakerGmm(k).gmmPrm);if ~useIntGmm%    logProb(k, :)=gmmEvalMex(speakerData(i).sentence(j).fea, gmm(k).mean, gmm(k).covariance, gmm(k).weight);logProb(k, :)=gmmEval(speakerData(i).sentence(j).fea, speakerGmm(k).gmmPrm);else%    logProb(k, :)=gmmEvalIntMex(speakerData(i).sentence(j).fea, gmm(k).mean, gmm(k).covariance, gmm(k).weight);logProb(k, :)=gmmEvalIntMex(speakerData(i).sentence(j).fea, speakerGmm(i).gmmPrm);endendcumLogProb=sum(logProb, 2);[maxProb, index]=max(cumLogProb);speakerData(i).sentence(j).predictedSpeaker=index; %找出身份speakerData(i).sentence(j).logProb=logProb;end
end% ====== Compute confusion matrix and recognition rate
confusionMatrix=zeros(speakerNum);
for i=1:speakerNum,predictedSpeaker=[speakerData(i).sentence.predictedSpeaker];[index, count]=elementCount(predictedSpeaker);confusionMatrix(i, index)=count;
end
recogRate=sum(diag(confusionMatrix))/sum(sum(confusionMatrix));

GMM-HMM:

  训练阶段:给出HMM的k个状态,每个状态下的观察样本的生成可以用一个概率分布来拟合,这里是采用GMM拟合的。其实,可以把GMM-HMM整体看成是一个生成模型。给定该模型的5个初始参数(结合随机和训练样本获得),启动EM算法的E步:获得训练样本分布,即计算训练样本在各个状态下的概率。M步:用这些训练样本重新评估那5个参数。

  测试阶段:(以孤立词识别为例)给定每个词发音的frame矩阵,取出某一个GMM-HMM模型,算出该发音每一帧数据在取出的GMM-HMM模型各个state下的概率,结合模型的转移概率和初始概率,获得对应的clique tree,可用图模型的方法inference出生成该语音的概率。比较多个GMM-HMM模型,取最大概率的模型对应的词。

   参考资料:

     机器学习&数据挖掘笔记_13(用htk完成简单的孤立词识别)

     http://htk.eng.cam.ac.uk/extensions/

     张智星老师的网页教程mfcc.


这篇关于HMM原理简述和使用说明的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/961103

相关文章

Zookeeper安装和配置说明

一、Zookeeper的搭建方式 Zookeeper安装方式有三种,单机模式和集群模式以及伪集群模式。 ■ 单机模式:Zookeeper只运行在一台服务器上,适合测试环境; ■ 伪集群模式:就是在一台物理机上运行多个Zookeeper 实例; ■ 集群模式:Zookeeper运行于一个集群上,适合生产环境,这个计算机集群被称为一个“集合体”(ensemble) Zookeeper通过复制来实现

中文分词jieba库的使用与实景应用(一)

知识星球:https://articles.zsxq.com/id_fxvgc803qmr2.html 目录 一.定义: 精确模式(默认模式): 全模式: 搜索引擎模式: paddle 模式(基于深度学习的分词模式): 二 自定义词典 三.文本解析   调整词出现的频率 四. 关键词提取 A. 基于TF-IDF算法的关键词提取 B. 基于TextRank算法的关键词提取

使用SecondaryNameNode恢复NameNode的数据

1)需求: NameNode进程挂了并且存储的数据也丢失了,如何恢复NameNode 此种方式恢复的数据可能存在小部分数据的丢失。 2)故障模拟 (1)kill -9 NameNode进程 [lytfly@hadoop102 current]$ kill -9 19886 (2)删除NameNode存储的数据(/opt/module/hadoop-3.1.4/data/tmp/dfs/na

Hadoop数据压缩使用介绍

一、压缩原则 (1)运算密集型的Job,少用压缩 (2)IO密集型的Job,多用压缩 二、压缩算法比较 三、压缩位置选择 四、压缩参数配置 1)为了支持多种压缩/解压缩算法,Hadoop引入了编码/解码器 2)要在Hadoop中启用压缩,可以配置如下参数

Makefile简明使用教程

文章目录 规则makefile文件的基本语法:加在命令前的特殊符号:.PHONY伪目标: Makefilev1 直观写法v2 加上中间过程v3 伪目标v4 变量 make 选项-f-n-C Make 是一种流行的构建工具,常用于将源代码转换成可执行文件或者其他形式的输出文件(如库文件、文档等)。Make 可以自动化地执行编译、链接等一系列操作。 规则 makefile文件

深入探索协同过滤:从原理到推荐模块案例

文章目录 前言一、协同过滤1. 基于用户的协同过滤(UserCF)2. 基于物品的协同过滤(ItemCF)3. 相似度计算方法 二、相似度计算方法1. 欧氏距离2. 皮尔逊相关系数3. 杰卡德相似系数4. 余弦相似度 三、推荐模块案例1.基于文章的协同过滤推荐功能2.基于用户的协同过滤推荐功能 前言     在信息过载的时代,推荐系统成为连接用户与内容的桥梁。本文聚焦于

使用opencv优化图片(画面变清晰)

文章目录 需求影响照片清晰度的因素 实现降噪测试代码 锐化空间锐化Unsharp Masking频率域锐化对比测试 对比度增强常用算法对比测试 需求 对图像进行优化,使其看起来更清晰,同时保持尺寸不变,通常涉及到图像处理技术如锐化、降噪、对比度增强等 影响照片清晰度的因素 影响照片清晰度的因素有很多,主要可以从以下几个方面来分析 1. 拍摄设备 相机传感器:相机传

hdu4407(容斥原理)

题意:给一串数字1,2,......n,两个操作:1、修改第k个数字,2、查询区间[l,r]中与n互质的数之和。 解题思路:咱一看,像线段树,但是如果用线段树做,那么每个区间一定要记录所有的素因子,这样会超内存。然后我就做不来了。后来看了题解,原来是用容斥原理来做的。还记得这道题目吗?求区间[1,r]中与p互质的数的个数,如果不会的话就先去做那题吧。现在这题是求区间[l,r]中与n互质的数的和

pdfmake生成pdf的使用

实际项目中有时会有根据填写的表单数据或者其他格式的数据,将数据自动填充到pdf文件中根据固定模板生成pdf文件的需求 文章目录 利用pdfmake生成pdf文件1.下载安装pdfmake第三方包2.封装生成pdf文件的共用配置3.生成pdf文件的文件模板内容4.调用方法生成pdf 利用pdfmake生成pdf文件 1.下载安装pdfmake第三方包 npm i pdfma

零基础学习Redis(10) -- zset类型命令使用

zset是有序集合,内部除了存储元素外,还会存储一个score,存储在zset中的元素会按照score的大小升序排列,不同元素的score可以重复,score相同的元素会按照元素的字典序排列。 1. zset常用命令 1.1 zadd  zadd key [NX | XX] [GT | LT]   [CH] [INCR] score member [score member ...]