SAO (Sample Adaptive Offset) in HEVC: Analysis and Walkthrough

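Original article: http://blog.csdn.net/feixiang_john/article/details/8258452

SAO (Sample Adaptive Offset) in HEVC. This article has three parts: 1. how Sample Adaptive Offset works, 2. an analysis of the SAO processing flow, 3. why SAO matters.

For the SAO paper, see "Sample Adaptive Offset in the HEVC Standard" (searchable on Google Scholar).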

a) How SAO works

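SAO runs after deblocking (DB). Its inputs are the reconstructed frame and the original frame; its outputs are the SAO parameters and the SAO-filtered reconstructed frame. It is an adaptive selection process applied after the deblocking filter. In the overall HEVC encoder block diagram (shown in the original post), SAO is performed once the whole frame has been encoded and the reconstructed frame is available, so it works at the slice (frame) level.

The frame is first divided into LCUs (largest coding units), and SAO is then applied to every pixel of each LCU. For each LCU, one compensation mode is chosen according to the characteristics of its pixels so as to reduce the distortion between the source picture and the reconstructed picture. The modes fall into two groups: band offset (BO) and edge offset (EO).

Band offset divides the range of pixel intensities into bands, and all pixels within one band share the same offset. During compensation, the offset applied to a reconstructed pixel is the one that belongs to the band its value falls into.

Edge offset mainly compensates along image contours. The current pixel is compared with two neighbouring pixels, and the pair of neighbours is taken from one of four patterns (horizontal, vertical, 135° diagonal and 45° diagonal; illustrated in a figure in the original post). The comparison yields the category of the pixel. The decoder reads the pattern and the per-category offsets from the bitstream, derives the category of each pixel in the same way, and applies the matching offset.

Within a pattern, the category of each pixel is given by the table below, where c is the current pixel and a and b are its two neighbours.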

Category	Condition
1	c < a && c < b
2	(c < a && c == b) || (c == a && c < b)
3	(c > a && c == b) || (c == a && c > b)
4	c > a && c > b
0	None of the above
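The table can be written directly as a small classification function. This is only a sketch for illustration: `eoCategory` is a hypothetical helper name, not a function from the HM code base, and it takes the two neighbours a and b and the current sample c as plain integers.

```cpp
// Hypothetical helper (not part of HM): classify the current sample c against
// its two neighbours a and b along the selected edge-offset pattern.
int eoCategory(int a, int c, int b)
{
  if (c < a && c < b)                          return 1; // below both neighbours
  if ((c < a && c == b) || (c == a && c < b))  return 2; // below one, equal to the other
  if ((c > a && c == b) || (c == a && c > b))  return 3; // above one, equal to the other
  if (c > a && c > b)                          return 4; // above both neighbours
  return 0;                                              // none of the above: no offset
}
```

For example, `eoCategory(5, 3, 5)` falls into category 1 because the sample is lower than both neighbours, while a monotonic run such as `eoCategory(3, 5, 7)` falls into category 0.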

b) SAO processing flow

The main SAO function is structured as follows; three function calls do all the work.

 Void TEncSampleAdaptiveOffset::SAOProcess()  
 { …  
 rdoSaoUnitAll(pcSaoParam, dLambdaLuma, dLambdaChroma, depth);  
 …  
 if (pcSaoParam->bSaoFlag[0])  
  processSaoUnitAll(saoLcuParam[0], oneUnitFlag[0], 0);  
 if (pcSaoParam->bSaoFlag[1])  
 {  
 processSaoUnitAll(saoLcuParam[1], oneUnitFlag[1], 1);  
 processSaoUnitAll(saoLcuParam[2], oneUnitFlag[2], 2);  
 }  
 }  

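TEncSampleAdaptiveOffset::rdoSaoUnitAll walks over the Y, Cb and Cr components of every LCU in the frame: it resets the statistics, calls calcSaoStatsCu, uses saoComponentParamDist (and sao2ChromaParamDist for chroma) to pick the best SAO type, and finally calls encodeSaoOffset.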

 Void TEncSampleAdaptiveOffset::rdoSaoUnitAll()  
 { …  
  for (idxY = 0; idxY< frameHeightInCU; idxY++)  
  {  
    for (idxX = 0; idxX< frameWidthInCU; idxX++)  
 { …  
 // reset stats Y, Cb, Cr  
 …  
      calcSaoStatsCu(addr, compIdx,  compIdx);  
      saoComponentParamDist();  
      sao2ChromaParamDist();  
 …  
      for ( compIdx=0;compIdx<3;compIdx++)  
       encodeSaoOffset(&saoLcuParam[compIdx][addr], compIdx);  
        // Cost of Merge  
    }  
  }  
 }  

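The sub-functions are described below.

calcSaoStatsCu in practice means TEncSampleAdaptiveOffset::calcSaoStatsCuOrg.

It gathers, for all pixels in the LCU, the statistics needed for every SAO type.

For each SAO type (SAO_EO_0, SAO_EO_1, SAO_EO_2, SAO_EO_3, SAO_BO), each class within that type (SAOTypeLen), and each of the Y, U and V components, it records the corresponding m_iOffsetOrg and m_iCount.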

 iStats = m_iOffsetOrg[compIdx][typeIdx]; // compIdx = 0, 1, 2 (Y, U, V); typeIdx = 0..4 (SAO type)
 iCount = m_iCount[compIdx][typeIdx];

For BO, iClassIdx = m_lumaTableBo[pRec[x]] determines which band the pixel belongs to, and the statistics are updated with iStats[iClassIdx] += (pOrg[x] - pRec[x]); iCount[iClassIdx]++;. Here iClassIdx lies in the range 0 to 32.
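The BO bookkeeping can be sketched in a few lines. This is a simplified illustration, not HM code: it assumes 8-bit samples, so the band number is 1 + (sample >> 3) as in the lookup table built in the create function quoted later, and `BandStats`, `bandIndex8bit` and `accumulateBandStats` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not an HM type): per-band error sum and sample count.
struct BandStats
{
  std::int64_t sum[33]   = {};  // index 0 is unused; bands are numbered 1..32
  std::int64_t count[33] = {};
};

// Band number of an 8-bit sample: 32 bands of width 8, numbered from 1.
inline int bandIndex8bit(int sample)
{
  assert(sample >= 0 && sample <= 255);
  return 1 + (sample >> 3);
}

// Accumulate (orig - rec) and the sample count per band, keyed by the
// reconstructed sample value.
BandStats accumulateBandStats(const std::vector<int>& org, const std::vector<int>& rec)
{
  assert(org.size() == rec.size());
  BandStats s;
  for (std::size_t i = 0; i < rec.size(); ++i)
  {
    const int band = bandIndex8bit(rec[i]);
    s.sum[band]   += org[i] - rec[i];
    s.count[band] += 1;
  }
  return s;
}
```

The band is selected by the reconstructed value, not the original one, because the decoder only has the reconstructed samples when it applies the offsets.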

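For EO, iClassIdx = m_auiEoTable[uiEdgeType] determines the edge category, and the statistics are updated with iStats[iClassIdx] += (pOrg[x] - pRec[x]); iCount[iClassIdx]++;.

Here iClassIdx lies in the range 0 to 4.

The mapping is a plain table lookup: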

 const UInt TComSampleAdaptiveOffset::m_auiEoTable[9] =  
 { 1, //0   
 2, //1  
 0, //2  
 3, //3  
 4, //4  
  0, //5   
 0, //6  
 0, //7  
 0};  
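To see how the table is indexed, the edge type can be computed as sign(c - a) + sign(c - b) + 2, where c is the current sample and a and b are its neighbours. The sketch below uses hypothetical names (`kEoTable`, `signOf`, `eoCategoryByTable`) and a portable sign function in place of HM's xSign. Since the sum of two signs lies between -2 and 2, only entries 0 to 4 of the table are reached with this formula.

```cpp
// Copy of the lookup table quoted above.
static const unsigned kEoTable[9] = {1, 2, 0, 3, 4, 0, 0, 0, 0};

// Portable sign function: returns -1, 0 or +1.
inline int signOf(int x) { return (x > 0) - (x < 0); }

// Hypothetical helper: edge category of sample c with neighbours a and b.
unsigned eoCategoryByTable(int a, int c, int b)
{
  const int edgeType = signOf(c - a) + signOf(c - b) + 2;  // always in 0..4
  return kEoTable[edgeType];
}
```

A sample lower than both neighbours gives edge type 0 and category 1; a flat or monotonic neighbourhood gives edge type 2 and category 0, which receives no offset.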

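Next, TEncSampleAdaptiveOffset::saoComponentParamDist selects the best SAO type, producing the best cost dCostPartBest, the corresponding typeIdx, and related information.

The following statements derive the offset for each BO band or EO category.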

 m_iOffset[compIdx][typeIdx][classIdx] = m_iOffsetOrg[compIdx][typeIdx][classIdx]/m_iCount[compIdx][typeIdx][classIdx];  
 m_iOffset[compIdx][typeIdx][classIdx] = Clip3(-m_iOffsetTh+1, m_iOffsetTh-1, (Int)m_iOffset[compIdx][typeIdx][classIdx]);  
 m_iOffset[compIdx][typeIdx][classIdx] = estIterOffset();  
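The three steps (average, clip, iterative refinement) can be illustrated with a self-contained sketch. It is not the HM implementation: `deltaDistortion` and `chooseOffset` are hypothetical names, the rate model of |h| + 1 bits is an assumption made purely for the example, and the refinement simply steps the offset toward zero and keeps the cheapest candidate. The distortion term follows from expanding the squared error: adding offset h to N samples whose summed error (orig - rec) is E changes the distortion by N*h*h - 2*h*E.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Change in squared-error distortion when offset h is added to n samples
// whose summed error (orig - rec) is e.
inline std::int64_t deltaDistortion(std::int64_t n, std::int64_t h, std::int64_t e)
{
  return n * h * h - 2 * h * e;
}

// Hypothetical sketch: start from the clipped mean error and step toward 0,
// keeping the offset with the lowest cost deltaD + lambda * rate.
// Assumed rate model (illustration only): |h| + 1 bits.
int chooseOffset(std::int64_t e, std::int64_t n, int offsetTh, double lambda)
{
  if (n == 0) return 0;  // no samples in this class

  int h = static_cast<int>(std::lround(static_cast<double>(e) / static_cast<double>(n)));
  h = std::max(-offsetTh + 1, std::min(offsetTh - 1, h));  // same range as the Clip3 above

  int best = 0;
  double bestCost = lambda * 1.0;  // h = 0: no distortion change, 1 bit assumed
  while (h != 0)
  {
    const double rate = std::abs(h) + 1;
    const double cost = static_cast<double>(deltaDistortion(n, h, e)) + lambda * rate;
    if (cost < bestCost) { bestCost = cost; best = h; }
    h += (h > 0) ? -1 : 1;  // move one step toward zero
  }
  return best;
}
```

With lambda set to 0 the function returns the clipped mean error; as lambda grows, smaller offsets (and eventually 0) win because their rate is lower.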

Finally, the costs are compared to obtain the best dCostPartBest and typeIdx.

    if(m_dCost[yCbCr][typeIdx] < dCostPartBest)  
    {  
      dCostPartBest = m_dCost[yCbCr][typeIdx];  
 // saoLcuParam keeps the best candidate so far.  
      copySaoUnit(saoLcuParam, &saoLcuParamRdo );       
 bestDist = estDist;        
 }  

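In the same way, TEncSampleAdaptiveOffset::sao2ChromaParamDist makes the selection for the chroma components.

processSaoUnitAll, called from TEncSampleAdaptiveOffset::SAOProcess(), performs the decoding-side SAO operation, that is, it adds the SAO offsets to the reconstructed frame.

Another function with the same name, SAOProcess, lives in the common (COM) library. It is used mainly by the decoder and essentially just calls processSaoUnitAll, which applies the SAO offsets.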

 Void TComSampleAdaptiveOffset::SAOProcess(TComPic* pcPic, SAOParam* pcSaoParam)  
 { …  
 if (pcSaoParam->bSaoFlag[0] || pcSaoParam->bSaoFlag[1])  
  { if (pcSaoParam->bSaoFlag[0])  
    {  
      processSaoUnitAll(saoLcuParam[iY], oneUnitFlag[iY], iY);  
    }  
    if(pcSaoParam->bSaoFlag[1])  
    {  
       processSaoUnitAll(saoLcuParam[1], oneUnitFlag[1], 1);//Cb  
       processSaoUnitAll(saoLcuParam[2], oneUnitFlag[2], 2);//Cr  
    }  
  }  
 }  
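What "adding the SAO offsets to the reconstructed frame" means for band offset can be sketched as follows. This is a simplified illustration rather than the body of processSaoUnitAll: it assumes 8-bit samples, four consecutive bands starting at a band position in the range 0 to 28 (the range searched by the encoder code quoted later), and a hypothetical function name `applyBandOffset`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch for 8-bit samples: add offset[k] to every sample that
// falls into band (bandPosition + k), k = 0..3, and clip to [0, 255].
void applyBandOffset(std::vector<int>& rec, int bandPosition, const int (&offset)[4])
{
  assert(bandPosition >= 0 && bandPosition <= 28);  // assumption: no wrap-around
  for (int& s : rec)
  {
    const int band = s >> 3;             // 0-based band number, 0..31
    const int k    = band - bandPosition;
    if (k >= 0 && k < 4)
    {
      s = std::min(255, std::max(0, s + offset[k]));
    }
  }
}
```

Samples outside the four selected bands are left untouched, which is why only four offsets need to be transmitted for a BO block.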

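c) Why SAO matters

According to extensive simulations and published material, SAO saves 2% to 6% of the bit rate on average while increasing encoder and decoder complexity by only about 2%. Its main purpose is to reduce the distortion between the source picture and the reconstructed picture. Taken in isolation, this would seem to increase the bit rate of each coded frame, because the SAO syntax elements and offset values have to be coded as well. That is not the whole picture. The current frame does grow by a few bits or bytes, but those extra bits lower the distortion of the reconstructed picture. Because that picture serves as a reference for subsequent prediction, the prediction residuals become smaller and the overall bit rate drops.

I wonder whether a lossless variant of HEVC could be added: with some extra syntax and semantics, the distortion between the source and the reconstructed picture would be coded separately with another lossless coding technique. When a very high-fidelity picture is needed, the decoder would combine the regular stream with this lossless stream to rebuild the picture exactly. Just a personal thought; comments are welcome.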

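Code annotations:

(1) http://blog.csdn.net/hevc_cjl/article/details/8288479

The principle and flow of SAO are already explained fairly clearly in a post I reposted (a detailed analysis of SAO, sample adaptive offset, in HEVC), so this post does not repeat them and concentrates on how the individual functions are implemented. One of my own posts, "HEVC Study Notes (8): Tracing Code, with SAO as an Example", laid some groundwork as well, with the emphasis on how to trace the code. Building on that, this post analyses the important functions. Where they are called from is not covered here; please refer to the two posts mentioned above.

We begin with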

 m_cEncSAO.create( getSourceWidth(), getSourceHeight(), g_uiMaxCUWidth, g_uiMaxCUHeight, g_uiMaxCUDepth );  

This function allocates memory for the SAO-related parameters and initializes them.

 /** create SampleAdaptiveOffset memory. 
  * \param  
  */  
 Void TComSampleAdaptiveOffset::create( UInt uiSourceWidth, UInt uiSourceHeight, UInt uiMaxCUWidth, UInt uiMaxCUHeight, UInt uiMaxCUDepth)  
 {  
   m_iPicWidth  = uiSourceWidth;  
   m_iPicHeight = uiSourceHeight;  
   
   m_uiMaxCUWidth  = uiMaxCUWidth;  
   m_uiMaxCUHeight = uiMaxCUHeight;  
   
   m_iNumCuInWidth  = m_iPicWidth / m_uiMaxCUWidth;  
   m_iNumCuInWidth += ( m_iPicWidth % m_uiMaxCUWidth ) ? 1 : 0;  
   
   m_iNumCuInHeight  = m_iPicHeight / m_uiMaxCUHeight;  
   m_iNumCuInHeight += ( m_iPicHeight % m_uiMaxCUHeight ) ? 1 : 0;  
   //! The next four statements derive the maximum split depth from the number of CUs across the picture  
   Int iMaxSplitLevelHeight = (Int)(logf((Float)m_iNumCuInHeight)/logf(2.0));  
   Int iMaxSplitLevelWidth  = (Int)(logf((Float)m_iNumCuInWidth )/logf(2.0));  
   
   m_uiMaxSplitLevel = (iMaxSplitLevelHeight < iMaxSplitLevelWidth)?(iMaxSplitLevelHeight):(iMaxSplitLevelWidth);  
   m_uiMaxSplitLevel = (m_uiMaxSplitLevel< m_uiMaxDepth)?(m_uiMaxSplitLevel):(m_uiMaxDepth);  
   /* various structures are overloaded to store per component data. 
    * m_iNumTotalParts must allow for sufficient storage in any allocated arrays */  
   /* 
   const Int TComSampleAdaptiveOffset::m_aiNumCulPartsLevel[5] = 
   { 
   1,   //level 0 
   5,   //level 1 
   21,  //level 2 
   85,  //level 3 
   341, //level 4 
   }; 
   */  
   m_iNumTotalParts  = max(3,m_aiNumCulPartsLevel[m_uiMaxSplitLevel]);  
   
   UInt uiPixelRangeY = 1 << g_bitDepthY; //!< pixel value range of the Y component (256 for 8-bit depth)  
   UInt uiBoRangeShiftY = g_bitDepthY - SAO_BO_BITS; //!< shift that sets the width of each band in Band Offset (BO) mode (3 for 8-bit depth, i.e. 1<<3 = 8) (Y component)  
   
   m_lumaTableBo = new Pel [uiPixelRangeY]; //!< BO index table for the Y component: maps a pixel value to its band number  
   for (Int k2=0; k2<uiPixelRangeY; k2++)  
   {  
     m_lumaTableBo[k2] = 1 + (k2>>uiBoRangeShiftY); //!< 32 bands in total (1~32)  
   }  
   
   UInt uiPixelRangeC = 1 << g_bitDepthC; //!< pixel value range of the Cb/Cr components (256 for 8-bit depth)  
   UInt uiBoRangeShiftC = g_bitDepthC - SAO_BO_BITS; //!< shift that sets the width of each band in Band Offset (BO) mode (3 for 8-bit depth, i.e. 1<<3 = 8) (Cb/Cr components)  
   
   m_chromaTableBo = new Pel [uiPixelRangeC]; //!< BO index table for the Cb/Cr components: maps a pixel value to its band number  
   for (Int k2=0; k2<uiPixelRangeC; k2++)  
   {  
     m_chromaTableBo[k2] = 1 + (k2>>uiBoRangeShiftC); //!< 32 bands in total (1~32)  
   }  
   
   m_iUpBuff1 = new Int[m_iPicWidth+2];  
   m_iUpBuff2 = new Int[m_iPicWidth+2];  
   m_iUpBufft = new Int[m_iPicWidth+2];  
   
   m_iUpBuff1++;  
   m_iUpBuff2++;  
   m_iUpBufft++;  
   Pel i;  
   
   UInt uiMaxY  = (1 << g_bitDepthY) - 1; //!< maximum Y pixel value (255 for 8-bit depth)  
   UInt uiMinY  = 0;  
   
   Int iCRangeExt = uiMaxY>>1; //!< extend the range on both sides  
   
   m_pClipTableBase = new Pel[uiMaxY+2*iCRangeExt];  
   m_iOffsetBo      = new Int[uiMaxY+2*iCRangeExt];  
   
   for(i=0;i<(uiMinY+iCRangeExt);i++)  
   {  
     m_pClipTableBase[i] = uiMinY;  
   }  
   
   for(i=uiMinY+iCRangeExt;i<(uiMaxY+  iCRangeExt);i++)  
   {  
     m_pClipTableBase[i] = i-iCRangeExt;  
   }  
   
   for(i=uiMaxY+iCRangeExt;i<(uiMaxY+2*iCRangeExt);i++)  
   {  
     m_pClipTableBase[i] = uiMaxY;  
   }  
   //! 0 0 0 ... 0 1 2 3 4 ... 255 255 ... 255//!  
   m_pClipTable = &(m_pClipTableBase[iCRangeExt]); //!< clipping lookup table  
   
   UInt uiMaxC  = (1 << g_bitDepthC) - 1; //!< maximum Cb/Cr pixel value (255 for 8-bit depth)  
   UInt uiMinC  = 0;  
   
   Int iCRangeExtC = uiMaxC>>1; //!< extend the range on both sides  
   
   m_pChromaClipTableBase = new Pel[uiMaxC+2*iCRangeExtC];  
   m_iChromaOffsetBo      = new Int[uiMaxC+2*iCRangeExtC];  
   
   for(i=0;i<(uiMinC+iCRangeExtC);i++)  
   {  
     m_pChromaClipTableBase[i] = uiMinC;  
   }  
   
   for(i=uiMinC+iCRangeExtC;i<(uiMaxC+  iCRangeExtC);i++)  
   {  
     m_pChromaClipTableBase[i] = i-iCRangeExtC;  
   }  
   
   for(i=uiMaxC+iCRangeExtC;i<(uiMaxC+2*iCRangeExtC);i++)  
   {  
     m_pChromaClipTableBase[i] = uiMaxC;  
   }  
   //! 0 0 0 ... 0 ! 1 2 3 4 ... 255 255 ... 255 //!  
   m_pChromaClipTable = &(m_pChromaClipTableBase[iCRangeExtC]); //!< clipping lookup table; in practice identical to the Y one  
   
   m_iLcuPartIdx = new Int [m_iNumCuInHeight*m_iNumCuInWidth];  
   m_pTmpL1 = new Pel [m_uiMaxCUHeight+1];  
   m_pTmpL2 = new Pel [m_uiMaxCUHeight+1];  
   m_pTmpU1 = new Pel [m_iPicWidth];  
   m_pTmpU2 = new Pel [m_iPicWidth];  
 }  
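The clipping table built in create trades a little memory for branch-free clipping: an index that undershoots or overshoots the valid pixel range by up to half the range still lands inside the array. The sketch below reproduces that layout in a small wrapper; `ClipTable` is a hypothetical class written for illustration, not an HM type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical wrapper (not an HM class) around a clip table with the same
// layout as above: 0 0 ... 0 | 0 1 2 ... max | max ... max
class ClipTable
{
public:
  explicit ClipTable(int bitDepth)
    : m_max((1 << bitDepth) - 1), m_ext(m_max >> 1), m_table(m_max + 2 * m_ext)
  {
    for (int i = 0; i < m_max + 2 * m_ext; ++i)
    {
      const int v = i - m_ext;  // value represented by table index i
      m_table[i] = (v < 0) ? 0 : ((v > m_max) ? m_max : v);
    }
  }

  // Clip v to [0, max] with a single lookup; v may leave the valid range by
  // up to the extension on either side.
  int clip(int v) const
  {
    assert(v >= -m_ext && v < m_max + m_ext);
    return m_table[v + m_ext];
  }

private:
  int m_max;                 // largest valid sample value (255 for 8 bits)
  int m_ext;                 // range extension on each side (127 for 8 bits)
  std::vector<int> m_table;  // size m_max + 2 * m_ext
};
```

For 8-bit samples the table has 509 entries and accepts inputs from -127 to 381, which is ample headroom for a pixel value plus an SAO offset.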

(2) http://blog.csdn.net/hevc_cjl/article/details/8288572

 /** rate distortion optimization of all SAO units 
  * \param saoParam SAO parameters 
  * \param lambda  
  * \param lambdaChroma 
  */  
 #if SAO_ENCODING_CHOICE  
 Void TEncSampleAdaptiveOffset::rdoSaoUnitAll(SAOParam *saoParam, Double lambda, Double lambdaChroma, Int depth)  
 #else  
 Void TEncSampleAdaptiveOffset::rdoSaoUnitAll(SAOParam *saoParam, Double lambda, Double lambdaChroma)  
 #endif  
 {  
   
   Int idxY;  
   Int idxX;  
   Int frameHeightInCU = saoParam->numCuInHeight;  
   Int frameWidthInCU  = saoParam->numCuInWidth;  
   Int j, k;  
   Int addr = 0;  
   Int addrUp = -1;  
   Int addrLeft = -1;  
   Int compIdx = 0;  
   SaoLcuParam mergeSaoParam[3][2];  
   Double compDistortion[3];  
   
   saoParam->bSaoFlag[0] = true;  
   saoParam->bSaoFlag[1] = true;  
   saoParam->oneUnitFlag[0] = false;  
   saoParam->oneUnitFlag[1] = false;  
   saoParam->oneUnitFlag[2] = false;  
   
 #if SAO_ENCODING_CHOICE  
 #if SAO_ENCODING_CHOICE_CHROMA  
   Int numNoSao[2];  
   numNoSao[0] = 0;// Luma   
   numNoSao[1] = 0;// Chroma   
   if( depth > 0 && m_depthSaoRate[0][depth-1] > SAO_ENCODING_RATE )  
   {  
     saoParam->bSaoFlag[0] = false;  
   }  
   if( depth > 0 && m_depthSaoRate[1][depth-1] > SAO_ENCODING_RATE_CHROMA )  
   {  
     saoParam->bSaoFlag[1] = false;  
   }  
 #else  
   Int numNoSao = 0;  
   
   if( depth > 0 && m_depth0SaoRate > SAO_ENCODING_RATE )  
   {  
     saoParam->bSaoFlag[0] = false;  
     saoParam->bSaoFlag[1] = false;  
   }  
 #endif  
 #endif  
   //!< iterate over every LCU of the picture, one LCU at a time  
   for (idxY = 0; idxY< frameHeightInCU; idxY++)  
   {  
     for (idxX = 0; idxX< frameWidthInCU; idxX++)  
     {  
       addr     = idxX  + frameWidthInCU*idxY; //!< address of the current LCU  
       addrUp   = addr < frameWidthInCU ? -1:idxX   + frameWidthInCU*(idxY-1); //!< address of the LCU above the current one  
       addrLeft = idxX == 0               ? -1:idxX-1 + frameWidthInCU*idxY; //!< address of the LCU to the left of the current one  
       Int allowMergeLeft = 1;  
       Int allowMergeUp   = 1;  
       UInt rate;  
       Double bestCost, mergeCost;  
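       if (idxX!=0) //!< not the first column  
       {   
         // check tile id and slice id //! check whether the current LCU and its left neighbour belong to the same tile and the same slice; if not, the neighbour is unavailable  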
         if ( (m_pcPic->getPicSym()->getTileIdxMap(addr-1) != m_pcPic->getPicSym()->getTileIdxMap(addr)) || (m_pcPic->getCU(addr-1)->getSlice()->getSliceIdx() != m_pcPic->getCU(addr)->getSlice()->getSliceIdx()))  
         {  
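           allowMergeLeft = 0; //!< left neighbour unavailable  
         }  
       }  
       else  
       {  
         allowMergeLeft = 0; //!< LCUs in the first column have no left neighbour  
       }  
       if (idxY!=0) //!< not the first row  
       {//! check whether the current LCU and its upper neighbour belong to the same tile and the same slice; if not, the neighbour is unavailable  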
         if ( (m_pcPic->getPicSym()->getTileIdxMap(addr-m_iNumCuInWidth) != m_pcPic->getPicSym()->getTileIdxMap(addr)) || (m_pcPic->getCU(addr-m_iNumCuInWidth)->getSlice()->getSliceIdx() != m_pcPic->getCU(addr)->getSlice()->getSliceIdx()))  
         {  
           allowMergeUp = 0; //!< upper neighbour unavailable  
         }  
       }  
       else  
       {  
         allowMergeUp = 0; //!< LCUs in the first row have no upper neighbour  
       }  
   
       compDistortion[0] = 0; //!< Y distortion  
       compDistortion[1] = 0; //!< Cb distortion  
       compDistortion[2] = 0; //!< Cr distortion  
       m_pcRDGoOnSbacCoder->load(m_pppcRDSbacCoder[0][CI_CURR_BEST]);  
       if (allowMergeLeft)  
       {  
         m_pcEntropyCoder->m_pcEntropyCoderIf->codeSaoMerge(0); //!< code the syntax element sao_merge_left_flag  
       }  
       if (allowMergeUp)  
       {  
         m_pcEntropyCoder->m_pcEntropyCoderIf->codeSaoMerge(0); //!< code the syntax element sao_merge_up_flag  
       }  
       m_pcRDGoOnSbacCoder->store( m_pppcRDSbacCoder[0][CI_TEMP_BEST] );  
       // reset stats Y, Cb, Cr  
       for ( compIdx=0;compIdx<3;compIdx++)  
       {  
         for ( j=0;j<MAX_NUM_SAO_TYPE;j++)  
         {  
           for ( k=0;k< MAX_NUM_SAO_CLASS;k++)  
           {  
             m_iOffset   [compIdx][j][k] = 0;  
             if( m_saoLcuBasedOptimization && m_saoLcuBoundary ){ //!< true && false  
               m_iCount    [compIdx][j][k] = m_count_PreDblk    [addr][compIdx][j][k];  
               m_iOffsetOrg[compIdx][j][k] = m_offsetOrg_PreDblk[addr][compIdx][j][k];  
             }  
             else  
             {  
               m_iCount    [compIdx][j][k] = 0;  
               m_iOffsetOrg[compIdx][j][k] = 0;  
             }  
           }    
         }  
         saoParam->saoLcuParam[compIdx][addr].typeIdx       =  -1;  
         saoParam->saoLcuParam[compIdx][addr].mergeUpFlag   = 0;  
         saoParam->saoLcuParam[compIdx][addr].mergeLeftFlag = 0;  
         saoParam->saoLcuParam[compIdx][addr].subTypeIdx    = 0;  
 #if SAO_ENCODING_CHOICE  
   if( (compIdx ==0 && saoParam->bSaoFlag[0])|| (compIdx >0 && saoParam->bSaoFlag[1]) )  
 #endif  
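         {//! for every BO and EO mode and every classIdx, accumulate the sum of the differences between the original pixels and the pre-filter reconstructed pixels, and count the samples per classIdx  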
           calcSaoStatsCu(addr, compIdx,  compIdx);  
         }  
       }  
       //!< select the best filtering mode for the Y component  
       saoComponentParamDist(allowMergeLeft, allowMergeUp, saoParam, addr, addrUp, addrLeft, 0,  lambda, &mergeSaoParam[0][0], &compDistortion[0]);  
       //!< select the best filtering mode for the Cb/Cr components  
       sao2ChromaParamDist(allowMergeLeft, allowMergeUp, saoParam, addr, addrUp, addrLeft, lambdaChroma, &mergeSaoParam[1][0], &mergeSaoParam[2][0], &compDistortion[0]);  
      if( saoParam->bSaoFlag[0] || saoParam->bSaoFlag[1] )  
       {  
         // Cost of new SAO_params  
         m_pcRDGoOnSbacCoder->load(m_pppcRDSbacCoder[0][CI_CURR_BEST]);  
         m_pcRDGoOnSbacCoder->resetBits();  
         if (allowMergeLeft)  
         {  
           m_pcEntropyCoder->m_pcEntropyCoderIf->codeSaoMerge(0);   
         }  
         if (allowMergeUp)  
         {  
           m_pcEntropyCoder->m_pcEntropyCoderIf->codeSaoMerge(0);  
         }  
         for ( compIdx=0;compIdx<3;compIdx++)  
         {  
         if( (compIdx ==0 && saoParam->bSaoFlag[0]) || (compIdx >0 && saoParam->bSaoFlag[1]))  
           {  
            m_pcEntropyCoder->encodeSaoOffset(&saoParam->saoLcuParam[compIdx][addr], compIdx);  
           }  
         }  
   
         rate = m_pcEntropyCoder->getNumberOfWrittenBits();  
         bestCost = compDistortion[0] + (Double)rate;  
         m_pcRDGoOnSbacCoder->store(m_pppcRDSbacCoder[0][CI_TEMP_BEST]);  
   
         // Cost of Merge  
         for(Int mergeUp=0; mergeUp<2; ++mergeUp)  
         {  
           if ( (allowMergeLeft && (mergeUp==0)) || (allowMergeUp && (mergeUp==1)) )  
           {  
             m_pcRDGoOnSbacCoder->load(m_pppcRDSbacCoder[0][CI_CURR_BEST]);  
             m_pcRDGoOnSbacCoder->resetBits();  
             if (allowMergeLeft)  
             {  
               m_pcEntropyCoder->m_pcEntropyCoderIf->codeSaoMerge(1-mergeUp);   
             }  
             if ( allowMergeUp && (mergeUp==1) )  
             {  
               m_pcEntropyCoder->m_pcEntropyCoderIf->codeSaoMerge(1);   
             }  
   
             rate = m_pcEntropyCoder->getNumberOfWrittenBits();  
             mergeCost = compDistortion[mergeUp+1] + (Double)rate;  
             if (mergeCost < bestCost)  
             {  
               bestCost = mergeCost;  
               m_pcRDGoOnSbacCoder->store(m_pppcRDSbacCoder[0][CI_TEMP_BEST]);                
               for ( compIdx=0;compIdx<3;compIdx++)  
               {  
                 mergeSaoParam[compIdx][mergeUp].mergeLeftFlag = 1-mergeUp;  
                 mergeSaoParam[compIdx][mergeUp].mergeUpFlag = mergeUp;  
                 if( (compIdx==0 && saoParam->bSaoFlag[0]) || (compIdx>0 && saoParam->bSaoFlag[1]))  
                 {  
                   copySaoUnit(&saoParam->saoLcuParam[compIdx][addr], &mergeSaoParam[compIdx][mergeUp] );               
                 }  
               }  
             }  
           }  
         }  
 #if SAO_ENCODING_CHOICE  
 #if SAO_ENCODING_CHOICE_CHROMA  
 if( saoParam->saoLcuParam[0][addr].typeIdx == -1) //!< no SAO parameters for the Y component  
 {  
   numNoSao[0]++;  
 }  
 if( saoParam->saoLcuParam[1][addr].typeIdx == -1) //!< no SAO parameters for the Cb/Cr components  
 {  
   numNoSao[1]+=2;  
 }  
 #else  
         for ( compIdx=0;compIdx<3;compIdx++)  
         {  
           if( depth == 0 && saoParam->saoLcuParam[compIdx][addr].typeIdx == -1)  
           {  
             numNoSao++;  
           }  
         }  
 #endif  
 #endif  
         m_pcRDGoOnSbacCoder->load(m_pppcRDSbacCoder[0][CI_TEMP_BEST]);  
         m_pcRDGoOnSbacCoder->store(m_pppcRDSbacCoder[0][CI_CURR_BEST]);  
       } //!< if( saoParam->bSaoFlag[0] || saoParam->bSaoFlag[1] )         
     } //!< for (idxX = 0; idxX< frameWidthInCU; idxX++)       
   } //!< for (idxY = 0; idxY< frameHeightInCU; idxY++)     
 #if SAO_ENCODING_CHOICE  
 #if SAO_ENCODING_CHOICE_CHROMA  
 #if SAO_ENCODING_CHOICE_CHROMA_BF  
   if( !saoParam->bSaoFlag[0])   
   {  
     m_depthSaoRate[0][depth] = 1.0;  
   }  
   else  
   {  
     m_depthSaoRate[0][depth] = numNoSao[0]/((Double) frameHeightInCU*frameWidthInCU);  
   }  
   if( !saoParam->bSaoFlag[1])   
   {  
     m_depthSaoRate[1][depth] = 1.0;  
   }  
   else   
   {  
     m_depthSaoRate[1][depth] = numNoSao[1]/((Double) frameHeightInCU*frameWidthInCU*2);  
   }  
 #else  
 m_depthSaoRate[0][depth] = numNoSao[0]/((Double) frameHeightInCU*frameWidthInCU);  
 m_depthSaoRate[1][depth] = numNoSao[1]/((Double) frameHeightInCU*frameWidthInCU*2);  
 #endif  
 #else  
   if( depth == 0)  
   {  
     // update SAO Rate  
     m_depth0SaoRate = numNoSao/((Double) frameHeightInCU*frameWidthInCU*3);  
   }  
 #endif  
 #endif  
   
 }  
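The decision at the end of the loop body compares three candidates per LCU: coding new SAO parameters, merging with the left LCU, or merging with the upper LCU. Because the distortions in compDistortion are already divided by lambda, the cost is simply normalized distortion plus bits. A minimal sketch of that comparison, with hypothetical names (`SaoChoice`, `chooseSaoMode`) and the rates passed in rather than measured with the entropy coder:

```cpp
// Hypothetical sketch of the per-LCU decision; index 0 = new parameters,
// 1 = merge left, 2 = merge up.
enum class SaoChoice { NewParams, MergeLeft, MergeUp };

// normDist[k]: distortion change already divided by lambda.
// rateBits[k]: number of bits needed to signal candidate k.
SaoChoice chooseSaoMode(const double (&normDist)[3], const unsigned (&rateBits)[3],
                        bool allowMergeLeft, bool allowMergeUp)
{
  SaoChoice best = SaoChoice::NewParams;
  double bestCost = normDist[0] + rateBits[0];

  if (allowMergeLeft)
  {
    const double cost = normDist[1] + rateBits[1];
    if (cost < bestCost) { bestCost = cost; best = SaoChoice::MergeLeft; }
  }
  if (allowMergeUp)
  {
    const double cost = normDist[2] + rateBits[2];
    if (cost < bestCost) { bestCost = cost; best = SaoChoice::MergeUp; }
  }
  return best;
}
```

Merging costs only a flag or two, so it can win even when its distortion reduction is slightly smaller than that of freshly coded parameters.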

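(3) http://blog.csdn.net/hevc_cjl/article/details/8288584

This part computes the values of N and E in equation (7) of Section V-D of the SAO paper,

where E corresponds to the variable iStats and N to the array elements that iCount points to.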

 /** Calculate SAO statistics for current LCU 
  * \param  iAddr,  iPartIdx,  iYCbCr 
  */  
 Void TEncSampleAdaptiveOffset::calcSaoStatsCu(Int iAddr, Int iPartIdx, Int iYCbCr)  
 {  
   if(!m_bUseNIF) //!< true for performing non-cross slice boundary ALF  
   {  
     calcSaoStatsCuOrg( iAddr, iPartIdx, iYCbCr);  // the case without multiple slices; the else branch handles the multi-slice case  
   }  
   else   
   {  
     Int64** ppStats = m_iOffsetOrg[iPartIdx]; //!< [MAX_NUM_SAO_PART][MAX_NUM_SAO_TYPE][MAX_NUM_SAO_CLASS] ??  
     Int64** ppCount = m_iCount    [iPartIdx]; //!< [MAX_NUM_SAO_PART][MAX_NUM_SAO_TYPE][MAX_NUM_SAO_CLASS]  
   
     //parameters  
     Int  isChroma = (iYCbCr != 0)? 1:0;  
     Int  stride   = (iYCbCr != 0)?(m_pcPic->getCStride()):(m_pcPic->getStride());  
     Pel* pPicOrg = getPicYuvAddr (m_pcPic->getPicYuvOrg(), iYCbCr);  
     Pel* pPicRec  = getPicYuvAddr(m_pcYuvTmp, iYCbCr);  
   
     std::vector<NDBFBlockInfo>& vFilterBlocks = *(m_pcPic->getCU(iAddr)->getNDBFilterBlocks());  
   
     //variables  
     UInt  xPos, yPos, width, height;  
     Bool* pbBorderAvail;  
     UInt  posOffset;  
   
     for(Int i=0; i< vFilterBlocks.size(); i++)  
     {  
       xPos        = vFilterBlocks[i].posX   >> isChroma;  
       yPos        = vFilterBlocks[i].posY   >> isChroma;  
       width       = vFilterBlocks[i].width  >> isChroma;  
       height      = vFilterBlocks[i].height >> isChroma;  
       pbBorderAvail = vFilterBlocks[i].isBorderAvailable;  
   
       posOffset = (yPos* stride) + xPos;  
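       //! fill ppStats and ppCount: for each filter mode, the accumulated difference between original and reconstructed pixels, and the number of reconstructed samples falling into each classIdx  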
       calcSaoStatsBlock(pPicRec+ posOffset, pPicOrg+ posOffset, stride, ppStats, ppCount,width, height, pbBorderAvail, iYCbCr);  
     }  
   }  
   
 }  

 /** Calculate SAO statistics for non-cross-slice or non-cross-tile processing 
  * \param  pRecStart to-be-filtered block buffer pointer 
  * \param  pOrgStart original block buffer pointer 
  * \param  stride picture buffer stride 
  * \param  ppStat statistics buffer 
  * \param  ppCount counter buffer 
  * \param  width block width 
  * \param  height block height 
  * \param  pbBorderAvail availabilities of block border pixels 
  */  
 Void TEncSampleAdaptiveOffset::calcSaoStatsBlock( Pel* pRecStart, Pel* pOrgStart, Int stride, Int64** ppStats, Int64** ppCount, UInt width, UInt height, Bool* pbBorderAvail, Int iYCbCr)  
 {  
   Int64 *stats, *count;  
   Int classIdx, posShift, startX, endX, startY, endY, signLeft,signRight,signDown,signDown1;  
   Pel *pOrg, *pRec;  
   UInt edgeType;  
   Int x, y;  
   Pel *pTableBo = (iYCbCr==0)?m_lumaTableBo:m_chromaTableBo; //!< band offset index table, 32 bands in total  
   
   //--------- Band offset (the BO described in Section III of the SAO paper) -----------//  
   stats = ppStats[SAO_BO];  
   count = ppCount[SAO_BO];  
   pOrg   = pOrgStart;  
   pRec   = pRecStart;  
   for (y=0; y< height; y++)  
   {  
     for (x=0; x< width; x++)  
     {  
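       classIdx = pTableBo[pRec[x]]; //!< classIdx is the band number obtained from the table (1~32)  
       if (classIdx)  
       {  
         stats[classIdx] += (pOrg[x] - pRec[x]); //!< accumulate the difference between the original and the reconstructed pixel  
         count[classIdx] ++; //!< increment the sample count for this classIdx  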
       }  
     }  
     pOrg += stride;  
     pRec += stride;  
   }  
   //---------- Edge offset 0 (the first EO class described in Section III of the SAO paper; SAO_EO_1 etc. below are analogous) --------------//  
   stats = ppStats[SAO_EO_0];  
   count = ppCount[SAO_EO_0];  
   pOrg   = pOrgStart;  
   pRec   = pRecStart;  
   
   //!< set the start and end positions  
   startX = (pbBorderAvail[SGU_L]) ? 0 : 1;  
   endX   = (pbBorderAvail[SGU_R]) ? width : (width -1);  
   for (y=0; y< height; y++)  
   {  
     signLeft = xSign(pRec[startX] - pRec[startX-1]); //!< sign of p - n0  
     for (x=startX; x< endX; x++)  
     {  
       signRight =  xSign(pRec[x] - pRec[x+1]); //!< sign of p - n1  
       edgeType =  signRight + signLeft + 2; //!< compute the edge type, used below to look up the EdgeIdx in the table  
       signLeft  = -signRight;  // moving right, the next sample reuses this comparison as its left-neighbour sign, which saves computation  
   
       /* 
       const UInt TComSampleAdaptiveOffset::m_auiEoTable[9] = { 
       1, //0     
       2, //1    
       0, //2 
       3, //3 
       4, //4 
       0, //5   
       0, //6   
       0, //7  
       0}; 
       */  
       stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]); //!< the table lookup yields the actual EdgeIdx, so the statistics go to the matching category  
       count[m_auiEoTable[edgeType]] ++;  
     }  
     pRec  += stride;  
     pOrg += stride;  
   }  
   
   //---------- Edge offset 1--------------//  
   stats = ppStats[SAO_EO_1];  
   count = ppCount[SAO_EO_1];  
   pOrg   = pOrgStart;  
   pRec   = pRecStart;  
   
   startY = (pbBorderAvail[SGU_T]) ? 0 : 1;  
   endY   = (pbBorderAvail[SGU_B]) ? height : height-1;  
   if (!pbBorderAvail[SGU_T]) //!< if the row above is unavailable, move down one row  
   {  
     pRec  += stride;  
     pOrg  += stride;  
   }  
   
   for (x=0; x< width; x++) //!< first compute the differences between the first row and the row above it  
   {  
     m_iUpBuff1[x] = xSign(pRec[x] - pRec[x-stride]); //!< store the sign of p - n0 for the whole row  
   }  
   for (y=startY; y<endY; y++)  
   {  
     for (x=0; x< width; x++)  
     {  
       signDown     =  xSign(pRec[x] - pRec[x+stride]); //!< sign of p - n1  
       edgeType    =  signDown + m_iUpBuff1[x] + 2;  
       m_iUpBuff1[x] = -signDown; //!< -signDown is the sign of p - n0 for the next row, so store it  
   
       stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
       count[m_auiEoTable[edgeType]] ++;  
     }  
     pOrg += stride;  
     pRec += stride;  
   }  
   //---------- Edge offset 2--------------//  
   stats = ppStats[SAO_EO_2];  
   count = ppCount[SAO_EO_2];  
   pOrg   = pOrgStart;  
   pRec   = pRecStart;  
   
   posShift= stride + 1;  
   
   startX = (pbBorderAvail[SGU_L]) ? 0 : 1 ;  
   endX   = (pbBorderAvail[SGU_R]) ? width : (width-1);  
   
   //prepare 2nd line upper sign  
   pRec += stride;  
   for (x=startX; x< endX+1; x++) //!< first compute and store the sign of p - n0 for the second row  
   {  
     m_iUpBuff1[x] = xSign(pRec[x] - pRec[x- posShift]);  
   }  
   
   //1st line  
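   pRec -= stride; //!< back to the first row  
   if(pbBorderAvail[SGU_TL]) //!< Top Left available  
   {  
     x= 0;  
     edgeType =  xSign(pRec[x] - pRec[x- posShift]) - m_iUpBuff1[x+1] + 2; //!< -m_iUpBuff1[x+1] is used because p - n0 on the second row equals n1 - p on the first row, and position x+1 on the second row corresponds to position x on the first row  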
     stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
     count[m_auiEoTable[edgeType]] ++;  
   }  
   if(pbBorderAvail[SGU_T]) //!< Top available  
   {  
     for(x= 1; x< endX; x++)  
     {  
       edgeType      =  xSign(pRec[x] - pRec[x- posShift]) - m_iUpBuff1[x+1] + 2;  
       stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
       count[m_auiEoTable[edgeType]] ++;  
     }  
   }  
   pRec   += stride;  
   pOrg   += stride;  
   
   //middle lines  
   for (y= 1; y< height-1; y++) //!< all rows except the first and the last  
   {  
     for (x=startX; x<endX; x++)  
     {  
       signDown1      =  xSign(pRec[x] - pRec[x+ posShift]) ; //!< sign of p - n1  
       edgeType      =  signDown1 + m_iUpBuff1[x] + 2; //!< at this point m_iUpBuff1[x] is exactly the sign of p - n0  
       stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
       count[m_auiEoTable[edgeType]] ++;  
   
       m_iUpBufft[x+1] = -signDown1; //!< p - n1 on the current row is the negative of p - n0 on the next row, and x on the current row corresponds to x+1 on the next row  
     }  
     m_iUpBufft[startX] = xSign(pRec[stride+startX] - pRec[startX-1]); //!< sign of p - n0 at position startX of the next row  
     //!< swap m_iUpBuff1 and m_iUpBufft; afterwards m_iUpBuff1 holds the signs of p - n0 for the next row  
     ipSwap     = m_iUpBuff1;  
     m_iUpBuff1 = m_iUpBufft;  
     m_iUpBufft = ipSwap;  
   
     pRec  += stride;  
     pOrg  += stride;  
   }  
   
   //last line  
   if(pbBorderAvail[SGU_B]) //!< last row, Bottom available  
   {  
     for(x= startX; x< width-1; x++)  
     {  
       edgeType =  xSign(pRec[x] - pRec[x+ posShift]) + m_iUpBuff1[x] + 2; //!< m_iUpBuff1 now holds the signs of p - n0 for the last row  
       stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
       count[m_auiEoTable[edgeType]] ++;  
     }  
   }  
   if(pbBorderAvail[SGU_BR]) //!< Bottom Right available  
   {  
     x= width -1;  
     edgeType =  xSign(pRec[x] - pRec[x+ posShift]) + m_iUpBuff1[x] + 2;  
     stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
     count[m_auiEoTable[edgeType]] ++;  
   }  
   
   //---------- Edge offset 3--------------//  
   
   stats = ppStats[SAO_EO_3];  
   count = ppCount[SAO_EO_3];  
   pOrg   = pOrgStart;  
   pRec   = pRecStart;  
   
   posShift     = stride - 1;  
   startX = (pbBorderAvail[SGU_L]) ? 0 : 1;  
   endX   = (pbBorderAvail[SGU_R]) ? width : (width -1);  
   
   //prepare 2nd line upper sign  
   pRec += stride;  
   for (x=startX-1; x< endX; x++) //!< first compute the sign of p - n0 for the second row and store it in m_iUpBuff1  
   {  
     m_iUpBuff1[x] = xSign(pRec[x] - pRec[x- posShift]);  
   }  
   
   
   //first line  
   pRec -= stride; //!< back to the first row  
   if(pbBorderAvail[SGU_T]) //!< Top available  
   {  
     for(x= startX; x< width -1; x++)  
     {  
       edgeType = xSign(pRec[x] - pRec[x- posShift]) -m_iUpBuff1[x-1] + 2;//!< -m_iUpBuff1[x-1] is used because p - n0 on the second row equals n1 - p on the first row, and position x-1 on the second row corresponds to position x on the first row  
       stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
       count[m_auiEoTable[edgeType]] ++;  
     }  
   }  
   if(pbBorderAvail[SGU_TR]) //!< Top Right available  
   {  
     x= width-1;  
     edgeType = xSign(pRec[x] - pRec[x- posShift]) -m_iUpBuff1[x-1] + 2;  
     stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
     count[m_auiEoTable[edgeType]] ++;  
   }  
   pRec  += stride;  
   pOrg  += stride;  
   
   //middle lines  
   for (y= 1; y< height-1; y++) //!< all rows except the first and the last  
   {  
     for(x= startX; x< endX; x++)  
     {  
       signDown1      =  xSign(pRec[x] - pRec[x+ posShift]) ;  
       edgeType      =  signDown1 + m_iUpBuff1[x] + 2; //!< m_iUpBuff1 now holds the sign of p - n0 for the current row  
   
       stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
       count[m_auiEoTable[edgeType]] ++;  
       m_iUpBuff1[x-1] = -signDown1; //!< p - n1 on the current row is the negative of p - n0 on the next row, and x on the current row corresponds to x-1 on the next row  
   
     }  
     m_iUpBuff1[endX-1] = xSign(pRec[endX-1 + stride] - pRec[endX]); //!< store the sign of p - n0 at position endX-1 of the next row  
   
     pRec  += stride;  
     pOrg  += stride;  
   }  
   
   //last line  
   if(pbBorderAvail[SGU_BL]) //!< Bottom Left available  
   {  
     x= 0;  
     edgeType = xSign(pRec[x] - pRec[x+ posShift]) + m_iUpBuff1[x] + 2; //!< m_iUpBuff1 now holds the sign of p - n0 at position x of the last row  
     stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
     count[m_auiEoTable[edgeType]] ++;  
   
   }  
   if(pbBorderAvail[SGU_B]) //!< Bottom available  
   {  
     for(x= 1; x< endX; x++)  
     {  
       edgeType = xSign(pRec[x] - pRec[x+ posShift]) + m_iUpBuff1[x] + 2;  
       stats[m_auiEoTable[edgeType]] += (pOrg[x] - pRec[x]);  
       count[m_auiEoTable[edgeType]] ++;  
     }  
   }  
 }  
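The sign-reuse trick is easiest to see on the horizontal pattern (Edge offset 0). The sketch below processes a single row and is a simplification of the code above, not a replacement for it: border availability is reduced to skipping the first and last samples, the names `EoStats`, `signOf` and `accumulateEo0Row` are hypothetical, and a portable sign function stands in for xSign.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not an HM type): error sum and count per EO category 0..4.
struct EoStats
{
  std::int64_t sum[5]   = {};
  std::int64_t count[5] = {};
};

inline int signOf(int x) { return (x > 0) - (x < 0); }

// Horizontal edge-offset statistics for one row. The first and last samples
// are skipped because they lack a left or right neighbour inside the row.
EoStats accumulateEo0Row(const std::vector<int>& org, const std::vector<int>& rec)
{
  static const unsigned table[5] = {1, 2, 0, 3, 4};  // reachable part of m_auiEoTable
  assert(org.size() == rec.size());
  EoStats s;
  if (rec.size() < 3) return s;

  int signLeft = signOf(rec[1] - rec[0]);
  for (std::size_t x = 1; x + 1 < rec.size(); ++x)
  {
    const int signRight = signOf(rec[x] - rec[x + 1]);
    const unsigned cat  = table[signRight + signLeft + 2];
    s.sum[cat]   += org[x] - rec[x];
    s.count[cat] += 1;
    signLeft = -signRight;  // equals sign(rec[x+1] - rec[x]) for the next sample
  }
  return s;
}
```

Each comparison between two adjacent samples is evaluated once and reused with its sign flipped, so a row costs one sign computation per sample instead of two.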

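(4) http://blog.csdn.net/hevc_cjl/article/details/8288601

Selection of the best SAO parameters for the luma component. The chroma flow is essentially the same, so the original author did not annotate it.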

 Void TEncSampleAdaptiveOffset::saoComponentParamDist(Int allowMergeLeft, Int allowMergeUp, SAOParam *saoParam, Int addr, Int addrUp, Int addrLeft, Int yCbCr, Double lambda, SaoLcuParam *compSaoParam, Double *compDistortion)  
 {  
   Int typeIdx;  
   
   Int64 estDist;  
   Int classIdx;  
   Int shift = 2 * DISTORTION_PRECISION_ADJUSTMENT(((yCbCr==0)?g_bitDepthY:g_bitDepthC)-8); //!< 0 for 8bit-depth  
   Int64 bestDist;  
   
   SaoLcuParam*  saoLcuParam = &(saoParam->saoLcuParam[yCbCr][addr]);  
   SaoLcuParam*  saoLcuParamNeighbor = NULL;   
   
   resetSaoUnit(saoLcuParam);  
   resetSaoUnit(&compSaoParam[0]); //!< SAO parameters of the left neighbour  
   resetSaoUnit(&compSaoParam[1]); //!< SAO parameters of the upper neighbour  
   
   
   Double dCostPartBest = MAX_DOUBLE;  
   
   Double  bestRDCostTableBo = MAX_DOUBLE;  
   Int     bestClassTableBo    = 0;  
   Int     currentDistortionTableBo[MAX_NUM_SAO_CLASS];  
   Double  currentRdCostTableBo[MAX_NUM_SAO_CLASS];  
   
   
   SaoLcuParam   saoLcuParamRdo;     
   Double   estRate = 0;  
   
   resetSaoUnit(&saoLcuParamRdo);  
   
   m_pcRDGoOnSbacCoder->load(m_pppcRDSbacCoder[0][CI_TEMP_BEST]);  
   m_pcRDGoOnSbacCoder->resetBits();  
  m_pcEntropyCoder->encodeSaoOffset(&saoLcuParamRdo, yCbCr);  
     
   dCostPartBest = m_pcEntropyCoder->getNumberOfWrittenBits()*lambda ;   
   copySaoUnit(saoLcuParam, &saoLcuParamRdo );  
   bestDist = 0;  
     
   
   
   for (typeIdx=0; typeIdx<MAX_NUM_SAO_TYPE; typeIdx++) //!< iterate over all filter types  
   {  
     estDist = estSaoTypeDist(yCbCr, typeIdx, shift, lambda, currentDistortionTableBo, currentRdCostTableBo);//!< distortion of the current filter type  
   
     if( typeIdx == SAO_BO )  
     {  
       // Estimate Best Position  
       Double currentRDCost = 0.0;  
   
       for(Int i=0; i< SAO_MAX_BO_CLASSES -SAO_BO_LEN +1; i++)  
       {  
         currentRDCost = 0.0;  
         for(UInt uj = i; uj < i+SAO_BO_LEN; uj++) //!< compute the RD cost over groups of 4 consecutive bands  
         {  
           currentRDCost += currentRdCostTableBo[uj];  
         }  
   
         if( currentRDCost < bestRDCostTableBo) //!< update the best value  
         {  
           bestRDCostTableBo = currentRDCost;  
           bestClassTableBo  = i;  
         }  
       }  
   
       // Re code all Offsets  
       // Code Center  
       estDist = 0;  
       for(classIdx = bestClassTableBo; classIdx < bestClassTableBo+SAO_BO_LEN; classIdx++)   
       {  
         estDist += currentDistortionTableBo[classIdx];  
       }  
     } //!< if( typeIdx == SAO_BO )  
     resetSaoUnit(&saoLcuParamRdo);  
     saoLcuParamRdo.length = m_iNumClass[typeIdx];  
     saoLcuParamRdo.typeIdx = typeIdx;  
     saoLcuParamRdo.mergeLeftFlag = 0;  
     saoLcuParamRdo.mergeUpFlag   = 0;  
     saoLcuParamRdo.subTypeIdx = (typeIdx == SAO_BO) ? bestClassTableBo : 0;  
     for (classIdx = 0; classIdx < saoLcuParamRdo.length; classIdx++)  
     {  
       saoLcuParamRdo.offset[classIdx] = (Int)m_iOffset[yCbCr][typeIdx][classIdx+saoLcuParamRdo.subTypeIdx+1];  
     }  
     m_pcRDGoOnSbacCoder->load(m_pppcRDSbacCoder[0][CI_TEMP_BEST]);  
     m_pcRDGoOnSbacCoder->resetBits();  
     m_pcEntropyCoder->encodeSaoOffset(&saoLcuParamRdo, yCbCr);  
   
     estRate = m_pcEntropyCoder->getNumberOfWrittenBits();  
     m_dCost[yCbCr][typeIdx] = (Double)((Double)estDist + lambda * (Double) estRate);  
   
     if(m_dCost[yCbCr][typeIdx] < dCostPartBest) //!< update the best value  
     {  
       dCostPartBest = m_dCost[yCbCr][typeIdx];  
       copySaoUnit(saoLcuParam, &saoLcuParamRdo );  
       bestDist = estDist;         
     }  
   } //!< for (typeIdx=0; typeIdx<MAX_NUM_SAO_TYPE; typeIdx++)  
   compDistortion[0] += ((Double)bestDist/lambda);  
   m_pcRDGoOnSbacCoder->load(m_pppcRDSbacCoder[0][CI_TEMP_BEST]);  
  m_pcEntropyCoder->encodeSaoOffset(saoLcuParam, yCbCr);  
   m_pcRDGoOnSbacCoder->store( m_pppcRDSbacCoder[0][CI_TEMP_BEST] );  
   
   
   // merge left or merge up  
   
   for (Int idxNeighbor=0;idxNeighbor<2;idxNeighbor++)   
   {  
     saoLcuParamNeighbor = NULL;  
     if (allowMergeLeft && addrLeft>=0 && idxNeighbor ==0) //!< left neighbour available  
     {  
       saoLcuParamNeighbor = &(saoParam->saoLcuParam[yCbCr][addrLeft]); //!< take the SAO parameters of the left neighbour  
     }  
     else if (allowMergeUp && addrUp>=0 && idxNeighbor ==1) //!< upper neighbour available  
     {  
       saoLcuParamNeighbor = &(saoParam->saoLcuParam[yCbCr][addrUp]); //!< take the SAO parameters of the upper neighbour  
     }  
     if (saoLcuParamNeighbor!=NULL)  
     {  
       estDist = 0;  
       typeIdx = saoLcuParamNeighbor->typeIdx;  
       if (typeIdx>=0) //!< typeIdx is valid   
       {  
         Int mergeBandPosition = (typeIdx == SAO_BO)?saoLcuParamNeighbor->subTypeIdx:0;  
         Int   merge_iOffset;  
         for(classIdx = 0; classIdx < m_iNumClass[typeIdx]; classIdx++)  
         {  
           merge_iOffset = saoLcuParamNeighbor->offset[classIdx];  
           estDist   += estSaoDist(m_iCount [yCbCr][typeIdx][classIdx+mergeBandPosition+1], merge_iOffset, m_iOffsetOrg[yCbCr][typeIdx][classIdx+mergeBandPosition+1],  shift);  
         }  
       }  
       else  
       {  
         estDist = 0;  
       }  
   
       copySaoUnit(&compSaoParam[idxNeighbor], saoLcuParamNeighbor );  
       compSaoParam[idxNeighbor].mergeUpFlag   = idxNeighbor;  
       compSaoParam[idxNeighbor].mergeLeftFlag = !idxNeighbor;  
   
       compDistortion[idxNeighbor+1] += ((Double)estDist/lambda);  
     }   
   }   
 }  
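The "Estimate Best Position" step for band offset is a sliding-window search: among all runs of SAO_BO_LEN consecutive bands, keep the start index with the smallest summed RD cost. A minimal standalone sketch, with a hypothetical function name `bestBandPosition` and the per-band costs passed in as a vector (the window length of 4 matches the comment in the code above):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: return the start index of the windowLen consecutive
// bands whose summed RD cost is smallest. Ties keep the earliest window,
// matching the strict '<' comparison in the code above.
int bestBandPosition(const std::vector<double>& bandCost, std::size_t windowLen = 4)
{
  assert(windowLen > 0 && bandCost.size() >= windowLen);
  std::size_t best = 0;
  double bestCost = 0.0;
  for (std::size_t i = 0; i + windowLen <= bandCost.size(); ++i)
  {
    double cost = 0.0;
    for (std::size_t j = i; j < i + windowLen; ++j)
    {
      cost += bandCost[j];
    }
    if (i == 0 || cost < bestCost)
    {
      bestCost = cost;
      best = i;
    }
  }
  return static_cast<int>(best);
}
```

With 32 bands and a window of 4 there are 29 candidate positions, which is the same count as the loop bound SAO_MAX_BO_CLASSES - SAO_BO_LEN + 1.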

(5) An explanation of the xSign function: http://blog.csdn.net/hevc_cjl/article/details/8290370

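 inline Int xSign(Int x) //!< returns the sign of x: +1 if x is greater than 0, -1 if x is less than 0  
 {//! When x is 0 the result is 0. When x is less than 0, x is an int, so x>>31 (arithmetic right shift) gives 0xffffffff, i.e. -1, while -x is positive (0x00000001 for x = -1) and becomes 0 after shifting right by 31 bits,  
   //! so the return value is -1. When x is greater than 0, x>>31 gives 0x00000000, i.e. 0, while -x is a negative number in two's complement (top bit set); after conversion to unsigned int,  
   //! the right shift by 31 bits does not preserve the sign bit, so the result is 0x00000001, i.e. 1  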
   return ((x >> 31) | ((Int)( (((UInt) -x)) >> 31)));  
 }
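The shift-based version relies on a 32-bit int and on an arithmetic right shift of negative values, which was implementation-defined before C++20, and negating the most negative int is undefined. For comparison, here is a sketch of a portable formulation next to a fixed-width copy of the shift trick; both function names (`signPortable`, `signByShift`) are hypothetical and only meant for illustration.

```cpp
#include <cstdint>

// Portable sign: -1, 0 or +1 for any int, without relying on shift behaviour.
inline int signPortable(int x)
{
  return (x > 0) - (x < 0);
}

// Shift-based variant as in xSign, written with fixed-width types.
// Assumes an arithmetic right shift for negative values and x != INT32_MIN.
inline int signByShift(std::int32_t x)
{
  return (x >> 31) | static_cast<std::int32_t>(static_cast<std::uint32_t>(-x) >> 31);
}
```

On mainstream compilers both give the same results for ordinary inputs; modern optimizers typically compile the portable form to branch-free code as well, so the bit trick is mostly of historical interest.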