本文主要是介绍【x264】码率控制模块的简单分析—宏块级码控工具Mbtree和AQ,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
【x264】码率控制模块的简单分析—宏块级码控工具Mbtree和AQ
- 1. 宏块树(mbtree)
- 1.1 计算当前帧的帧内和帧间cost(slicetype_frame_cost)
- 1.2 计算宏块的传播cost(macroblock_tree_propagate)
- 1.3 mbtree的测试
- 2.自适应量化参数(Adaptive Quantization,AQ)
- 2.1 自适应量化一帧(x264_adaptive_quant_frame)
- 2.2 量化参数的调整(x264_ratecontrol_mb_qp)
- 2.3 宏块分析(x264_macroblock_analyse)
- 2.4 AQ的测试
- 3.小结
参考:
帧类型决策-slicetype_frame/slice/mb_cost()
参数分析:
【x264】x264编码器参数配置
流程分析:
【x264】x264编码主流程简单分析
【x264】编码核心函数(x264_encoder_encode)的简单分析
【x264】分析模块(analyse)的简单分析—帧内预测
【x264】分析模块(analyse)的简单分析—帧间预测
1. 宏块树(mbtree)
mbtree是一种宏块级别的码率控制工具,根据帧间预测相关信息,来有效的降低编码的码率。mbtree的工作原理可以简单描述为,在lookahead当中通过前向预测,来获得当前mb的重要程度,重要程度由被参考的情况来衡量。如果被参考的权值很高,说明这个mb很重要,此时这个mb会按照较低的qp编码,否则按照较高的qp编码。这个权值的衡量由当前mb的intra_cost、inter_cost和前面mb赋予当前mb的传播cost决定
在mbtree中,主要的计算过程位于macroblock_tree中,函数位于slicetype.c中,函数主要的工作流程如下:
- 如果当前帧为intra帧,则直接计算cost
- 为propagate_cost赋初始值
- 循环处理每一帧,计算cost
- 计算当前帧的帧内和帧间cost(slicetype_frame_cost)
- 计算宏块的传播cost(macroblock_tree_propagate),之后根据帧内、帧间cost和宏块传播cost来计算qp_offset(macroblock_tree_finish),从而调整当前mb的qp
static void macroblock_tree( x264_t *h, x264_mb_analysis_t *a, x264_frame_t **frames, int num_frames, int b_intra )
{int idx = !b_intra;int last_nonb, cur_nonb = 1;int bframes = 0;x264_emms();float total_duration = 0.0;for( int j = 0; j <= num_frames; j++ )total_duration += frames[j]->f_duration;float average_duration = total_duration / (num_frames + 1);int i = num_frames;// ----- 1.当前帧为intra帧,直接计算cost ----- //if( b_intra )slicetype_frame_cost( h, a, frames, 0, 0, 0 );while( i > 0 && IS_X264_TYPE_B( frames[i]->i_type ) )i--;last_nonb = i;/* Lookaheadless MB-tree is not a theoretically distinct case; the same extrapolation could* be applied to the end of a lookahead buffer of any size. However, it's most needed when* lookahead=0, so that's what's currently implemented. */// ----- 2.为propagate_cost赋初始值 ----- //if( !h->param.rc.i_lookahead ){if( b_intra ){memset( frames[0]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );memcpy( frames[0]->f_qp_offset, frames[0]->f_qp_offset_aq, h->mb.i_mb_count * sizeof(float) );return;}XCHG( uint16_t*, frames[last_nonb]->i_propagate_cost, frames[0]->i_propagate_cost );memset( frames[0]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );}else{if( last_nonb < idx )return;memset( frames[last_nonb]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );}// ----- 3.循环处理每一帧,计算cost ----- //while( i-- > idx ){cur_nonb = i;while( IS_X264_TYPE_B( frames[cur_nonb]->i_type ) && cur_nonb > 0 )cur_nonb--;if( cur_nonb < idx )break;// ---- 4.计算当前帧的帧内和帧间cost ----- //slicetype_frame_cost( h, a, frames, cur_nonb, last_nonb, last_nonb );memset( frames[cur_nonb]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );bframes = last_nonb - cur_nonb - 1;if( h->param.i_bframe_pyramid && bframes > 1 ) // B帧{int middle = (bframes + 1)/2 + cur_nonb;slicetype_frame_cost( h, a, frames, cur_nonb, last_nonb, middle );memset( frames[middle]->i_propagate_cost, 0, h->mb.i_mb_count * sizeof(uint16_t) );while( i > cur_nonb ){int p0 = i > middle ? middle : cur_nonb;int p1 = i < middle ? middle : last_nonb;if( i != middle ){slicetype_frame_cost( h, a, frames, p0, p1, i );// ----- 5.计算宏块的传播cost ----- //macroblock_tree_propagate( h, frames, average_duration, p0, p1, i, 0 );}i--;}macroblock_tree_propagate( h, frames, average_duration, cur_nonb, last_nonb, middle, 1 );}else{while( i > cur_nonb ){slicetype_frame_cost( h, a, frames, cur_nonb, last_nonb, i );macroblock_tree_propagate( h, frames, average_duration, cur_nonb, last_nonb, i, 0 );i--;}}macroblock_tree_propagate( h, frames, average_duration, cur_nonb, last_nonb, last_nonb, 1 );last_nonb = cur_nonb;}if( !h->param.rc.i_lookahead ){slicetype_frame_cost( h, a, frames, 0, last_nonb, last_nonb );macroblock_tree_propagate( h, frames, average_duration, 0, last_nonb, last_nonb, 1 );XCHG( uint16_t*, frames[last_nonb]->i_propagate_cost, frames[0]->i_propagate_cost );}// ----- 6.计算量化参数传播系数 ----- //macroblock_tree_finish( h, frames[last_nonb], average_duration, last_nonb );if( h->param.i_bframe_pyramid && bframes > 1 && !h->param.rc.i_vbv_buffer_size )macroblock_tree_finish( h, frames[last_nonb+(bframes+1)/2], average_duration, 0 );
}
1.1 计算当前帧的帧内和帧间cost(slicetype_frame_cost)
计算帧的帧内和帧间cost,其主要的工作流程为:
- 检查是否已经评估了该帧
- 多线程处理
- 单线程处理
- 计算slice的cost(slicetype_slice_cost),其中会计算mb的cost(slicetype_mb_cost),会进行帧内预测和帧间预测,计算其最佳的cost
- 将计算结果累加起来
/*
将帧b以slice为单位计算其开销,统计其总开销
其中p0表示b的前向参考帧,p1表示b的后向参考帧
若p0 = p1 = b,则表示没有参考帧,即I帧
若p1 = b,则表示只有前向参考帧,即P帧作为I帧,所有宏块的cost = intra cost
作为P帧,所有宏块的cost = min( intra cost, inter cost)
作为B帧,所有宏块的cost = inter cost其中每一个帧都带有开销矩阵i_cost_est[b-p0][p1-b]
表示帧b以p0为前向参考,p1为后向参考时的帧cost
*/
static int slicetype_frame_cost( x264_t *h, x264_mb_analysis_t *a,x264_frame_t **frames, int p0, int p1, int b )
{int i_score = 0;int do_search[2];const x264_weight_t *w = x264_weight_none;x264_frame_t *fenc = frames[b];/* Check whether we already evaluated this frame* If we have tried this frame as P, then we have also tried* the preceding frames as B. (is this still true?) */// ----- 1.检查是否已经评估了该帧,如果我们已经把该帧当作P帧计算,那我们仍然需要将前面的帧当作B帧来计算 ----- ///* Also check that we already calculated the row SATDs for the current frame. */// 同样检查我们是否对该帧进行了行SATD计算if( fenc->i_cost_est[b-p0][p1-b] >= 0 && (!h->param.rc.i_vbv_buffer_size || fenc->i_row_satds[b-p0][p1-b][0] != -1) ) // 如果帧b已经计算了前向参考p0后向参考p1的cost,则直接得到开销i_score = fenc->i_cost_est[b-p0][p1-b];else{int dist_scale_factor = 128;/* For each list, check to see whether we have lowres motion-searched this reference frame before. */// 对于每个列表,检查我们之前是否对这个参考帧进行了低分辨率运动搜索do_search[0] = b != p0 && fenc->lowres_mvs[0][b-p0-1][0][0] == 0x7FFF;do_search[1] = b != p1 && fenc->lowres_mvs[1][p1-b-1][0][0] == 0x7FFF;if( do_search[0] ) // 前向{if( h->param.analyse.i_weighted_pred && b == p1 ){x264_emms();x264_weights_analyse( h, fenc, frames[p0], 1 );w = fenc->weight[0];}fenc->lowres_mvs[0][b-p0-1][0][0] = 0;}if( do_search[1] ) fenc->lowres_mvs[1][p1-b-1][0][0] = 0; // 后向if( p1 != p0 )dist_scale_factor = ( ((b-p0) << 8) + ((p1-p0) >> 1) ) / (p1-p0);int output_buf_size = h->mb.i_mb_height + (NUM_INTS + PAD_SIZE) * h->param.i_lookahead_threads;int *output_inter[X264_LOOKAHEAD_THREAD_MAX+1];int *output_intra[X264_LOOKAHEAD_THREAD_MAX+1];output_inter[0] = h->scratch_buffer2;output_intra[0] = output_inter[0] + output_buf_size;#if HAVE_OPENCLif( h->param.b_opencl ){x264_opencl_lowres_init(h, fenc, a->i_lambda );if( do_search[0] ){x264_opencl_lowres_init( h, frames[p0], a->i_lambda );x264_opencl_motionsearch( h, frames, b, p0, 0, a->i_lambda, w );}if( do_search[1] ){x264_opencl_lowres_init( h, frames[p1], a->i_lambda );x264_opencl_motionsearch( h, frames, b, p1, 1, a->i_lambda, NULL );}if( b != p0 )x264_opencl_finalize_cost( h, a->i_lambda, frames, p0, p1, b, dist_scale_factor );x264_opencl_flush( h );i_score = fenc->i_cost_est[b-p0][p1-b];}else
#endif{ // ----- 2.多线程处理 ----- //if( h->param.i_lookahead_threads > 1 ) // 多个lookahead线程{ x264_slicetype_slice_t s[X264_LOOKAHEAD_THREAD_MAX];for( int i = 0; i < h->param.i_lookahead_threads; i++ ) {x264_t *t = h->lookahead_thread[i];/* FIXME move this somewhere else */// 将搜索方法赋值t->mb.i_me_method = h->mb.i_me_method;t->mb.i_subpel_refine = h->mb.i_subpel_refine;t->mb.b_chroma_me = h->mb.b_chroma_me;s[i] = (x264_slicetype_slice_t){ t, a, frames, p0, p1, b, dist_scale_factor, do_search, w,output_inter[i], output_intra[i] };// 计算起始行t->i_threadslice_start = ((h->mb.i_mb_height * i + h->param.i_lookahead_threads/2) / h->param.i_lookahead_threads);// 计算结束行t->i_threadslice_end = ((h->mb.i_mb_height * (i+1) + h->param.i_lookahead_threads/2) / h->param.i_lookahead_threads);int thread_height = t->i_threadslice_end - t->i_threadslice_start;int thread_output_size = thread_height + NUM_INTS;memset( output_inter[i], 0, thread_output_size * sizeof(int) );memset( output_intra[i], 0, thread_output_size * sizeof(int) );output_inter[i][NUM_ROWS] = output_intra[i][NUM_ROWS] = thread_height;output_inter[i+1] = output_inter[i] + thread_output_size + PAD_SIZE;output_intra[i+1] = output_intra[i] + thread_output_size + PAD_SIZE;// 运行线程x264_threadpool_run( h->lookaheadpool, (void*)slicetype_slice_cost, &s[i] );}// 等待线程结束for( int i = 0; i < h->param.i_lookahead_threads; i++ )x264_threadpool_wait( h->lookaheadpool, &s[i] );}else{ // ----- 3.单个lookahead线程 ----- //h->i_threadslice_start = 0;h->i_threadslice_end = h->mb.i_mb_height;memset( output_inter[0], 0, (output_buf_size - PAD_SIZE) * sizeof(int) );memset( output_intra[0], 0, (output_buf_size - PAD_SIZE) * sizeof(int) );output_inter[0][NUM_ROWS] = output_intra[0][NUM_ROWS] = h->mb.i_mb_height;x264_slicetype_slice_t s = (x264_slicetype_slice_t){ h, a, frames, p0, p1, b, dist_scale_factor, do_search, w,output_inter[0], output_intra[0] };// ---- 4.计算slice的cost ---- //slicetype_slice_cost( &s );}// ----- 5.将计算结果累加起来 ----- ///* Sum up accumulators */if( b == p1 )fenc->i_intra_mbs[b-p0] = 0;if( !fenc->b_intra_calculated ){fenc->i_cost_est[0][0] = 0;fenc->i_cost_est_aq[0][0] = 0;}fenc->i_cost_est[b-p0][p1-b] = 0;fenc->i_cost_est_aq[b-p0][p1-b] = 0;int *row_satd_inter = fenc->i_row_satds[b-p0][p1-b];int *row_satd_intra = fenc->i_row_satds[0][0];for( int i = 0; i < h->param.i_lookahead_threads; i++ ){if( b == p1 )fenc->i_intra_mbs[b-p0] += output_inter[i][INTRA_MBS];if( !fenc->b_intra_calculated ){fenc->i_cost_est[0][0] += output_intra[i][COST_EST];fenc->i_cost_est_aq[0][0] += output_intra[i][COST_EST_AQ];}// 累计intra的cost和aq_cost到开销矩阵i_cost_est[0][0]中fenc->i_cost_est[b-p0][p1-b] += output_inter[i][COST_EST];fenc->i_cost_est_aq[b-p0][p1-b] += output_inter[i][COST_EST_AQ];if( h->param.rc.i_vbv_buffer_size ){int row_count = output_inter[i][NUM_ROWS];memcpy( row_satd_inter, output_inter[i] + NUM_INTS, row_count * sizeof(int) );if( !fenc->b_intra_calculated )memcpy( row_satd_intra, output_intra[i] + NUM_INTS, row_count * sizeof(int) );row_satd_inter += row_count;row_satd_intra += row_count;}}i_score = fenc->i_cost_est[b-p0][p1-b];if( b != p1 )i_score = (uint64_t)i_score * 100 / (120 + h->param.i_bframe_bias);elsefenc->b_intra_calculated = 1;fenc->i_cost_est[b-p0][p1-b] = i_score;x264_emms();}}return i_score;
}
1.2 计算宏块的传播cost(macroblock_tree_propagate)
该函数的主要功能是计算宏块的传播cost(或者叫做遗传cost),主要的工作流程为:
- 遍历所有的mb
- 计算传播cost(mbtree_propagate_cost)
- 将计算地传播cost分给参考mb(mbtree_propagate_list)
- 根据前述的计算结果计算qp_offset(macroblock_tree_finish)
static void macroblock_tree_propagate( x264_t *h, x264_frame_t **frames, float average_duration, int p0, int p1, int b, int referenced )
{// ...float fps_factor = CLIP_DURATION(frames[b]->f_duration) / (CLIP_DURATION(average_duration) * 256.0f) * MBTREE_PRECISION;/* For non-reffed frames the source costs are always zero, so just memset one row and re-use it. */// 对于没有被参考的帧,cost永远是0,因此直接赋值为0即可if( !referenced )memset( frames[b]->i_propagate_cost, 0, h->mb.i_mb_width * sizeof(uint16_t) );// ----- 1.遍历所有的mb ----- //for( h->mb.i_mb_y = 0; h->mb.i_mb_y < h->mb.i_mb_height; h->mb.i_mb_y++ ){int mb_index = h->mb.i_mb_y*h->mb.i_mb_stride;// ----- 2.计算传播cost ----- //h->mc.mbtree_propagate_cost( buf, propagate_cost,frames[b]->i_intra_cost+mb_index, lowres_costs+mb_index,frames[b]->i_inv_qscale_factor+mb_index, &fps_factor, h->mb.i_mb_width );if( referenced )propagate_cost += h->mb.i_mb_width;// ----- 3.将计算地传播cost分给参考mb ----- //h->mc.mbtree_propagate_list( h, ref_costs[0], &mvs[0][mb_index], buf, &lowres_costs[mb_index],bipred_weights[0], h->mb.i_mb_y, h->mb.i_mb_width, 0 );if( b != p1 ){h->mc.mbtree_propagate_list( h, ref_costs[1], &mvs[1][mb_index], buf, &lowres_costs[mb_index],bipred_weights[1], h->mb.i_mb_y, h->mb.i_mb_width, 1 );}}// ----- 4.根据前述的计算结果调整qp ----- //if( h->param.rc.i_vbv_buffer_size && h->param.rc.i_lookahead && referenced )macroblock_tree_finish( h, frames[b], average_duration, b == p1 ? b - p0 : 0 );
}
其中,计算传播cost的核心函数mbtree_propagate_cost的实现方式为
/* Estimate the total amount of influence on future quality that could be had if we* were to improve the reference samples used to inter predict any given macroblock. */
// 本函数评估的内容是,如果提升任何用于帧间预测的样本,会对未来编码的质量带来多大的性能增益
static void mbtree_propagate_cost( int16_t *dst, uint16_t *propagate_in, uint16_t *intra_costs,uint16_t *inter_costs, uint16_t *inv_qscales, float *fps_factor, int len )
{// fps因子,在观看时,如果一个视频帧持续的时间较长,对观看者的主观影响就越大,因此将fps因子作为一个参数加进来进行调整// float fps_factor = CLIP_DURATION(frames[b]->f_duration) / (CLIP_DURATION(average_duration) * 256.0f) * MBTREE_PRECISION;float fps = *fps_factor;for( int i = 0; i < len; i++ ){int intra_cost = intra_costs[i];// 如果inter_cost > intra_cost,将其修正为intra_costint inter_cost = X264_MIN(intra_costs[i], inter_costs[i] & LOWRES_COST_MASK);// qscale是拉格朗日乘子,或者也可叫做lambda,其计算方式为// qscale = 0.85f * powf( 2.0f, ( qp - (12.0f + QP_BD_OFFSET) ) / 6.0f );// /* 我对inv_qscale初步的理解如下:(1)inv_qscale的计算方式为:frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8( frame->f_qp_offset[mb_xy] );或 frame->i_inv_qscale_factor[mb_xy] = 256;或 frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8(qp_adj); 使用aq_mode时会执行的操作由于默认会使用aq_mode,所以默认是将qp_adj转换成为inv_qscale,即aq转换过来的qscale,所以可以理解为是一个aq的系数(2)aq模式下,会对每一个mb进行qp的调整,所以将aq的系数也作为一个衡量重要程度的一个因子*/float propagate_intra = intra_cost * inv_qscales[i];// propagate_in是前面的帧中计算的传播cost赋值到当前mb的// 加上当前帧本身的cost,得到总体的costfloat propagate_amount = propagate_in[i] + propagate_intra*fps;// inter_cost相对于intra_cost而言,如果越小,说明当前mb的传播cost越多,即当前mb蕴含的信息越多,越重要float propagate_num = intra_cost - inter_cost;float propagate_denom = intra_cost;// 计算出来的propagate cost后续会被分到参考的mb中,用以表示该参考mb的重要性dst[i] = X264_MIN((int)(propagate_amount * propagate_num / propagate_denom + 0.5f), 32767);}
}
将计算出来的传播cost分发给参考mb
static void mbtree_propagate_list( x264_t *h, uint16_t *ref_costs, int16_t (*mvs)[2],int16_t *propagate_amount, uint16_t *lowres_costs,int bipred_weight, int mb_y, int len, int list )
{// ...int x = mvs[i][0];int y = mvs[i][1];unsigned mbx = (unsigned)((x>>5)+i);unsigned mby = (unsigned)((y>>5)+mb_y);unsigned idx0 = mbx + mby * stride;unsigned idx2 = idx0 + stride;x &= 31;y &= 31;/* + ---- + ---- +| 0 | 1 |+ ---- + ---- +| 2 | 3 |+ ---- + ---- +运动搜索的结果不一定让mv刚好处于某一个Mb上,可能处于多个Mb宏块上,则将当前的遗传代价根据所占块的面积来分配给各个宏块分成了4个小块,每个mb都是16*16*/int idx0weight = (32-y)*(32-x); // 0号块的面积int idx1weight = (32-y)*x; // 1号块的面积int idx2weight = y*(32-x); // 2号块的面积int idx3weight = y*x; // 3号块的面积idx0weight = (idx0weight * listamount + 512) >> 10; // 0号块的权重idx1weight = (idx1weight * listamount + 512) >> 10; // 1号块的权重idx2weight = (idx2weight * listamount + 512) >> 10; // 2号块的权重idx3weight = (idx3weight * listamount + 512) >> 10; // 3号块的权重// 更新当前mb传播出去的代价if( mbx < width-1 && mby < height-1 ){MC_CLIP_ADD( ref_costs[idx0+0], idx0weight );MC_CLIP_ADD( ref_costs[idx0+1], idx1weight );MC_CLIP_ADD( ref_costs[idx2+0], idx2weight );MC_CLIP_ADD( ref_costs[idx2+1], idx3weight );}// ...}
}
根据传播cost调整qp,实现的函数macroblock_tree_finish如下
static void macroblock_tree_finish( x264_t *h, x264_frame_t *frame, float average_duration, int ref0_distance )
{int fps_factor = round( CLIP_DURATION(average_duration) / CLIP_DURATION(frame->f_duration) * 256 / MBTREE_PRECISION );float weightdelta = 0.0;if( ref0_distance && frame->f_weighted_cost_delta[ref0_distance-1] > 0 )weightdelta = (1.0 - frame->f_weighted_cost_delta[ref0_distance-1]);/* Allow the strength to be adjusted via qcompress, since the two* concepts are very similar. */// 允许通过qcompress来调整strengthfloat strength = 5.0f * (1.0f - h->param.rc.f_qcompress);for( int mb_index = 0; mb_index < h->mb.i_mb_count; mb_index++ ){int intra_cost = (frame->i_intra_cost[mb_index] * frame->i_inv_qscale_factor[mb_index] + 128) >> 8;if( intra_cost ){int propagate_cost = (frame->i_propagate_cost[mb_index] * fps_factor + 128) >> 8;// intra_cost : 当前mb的代价,propagate_cost : 前面的mb计算出来的传播cost赋予到当前的mb,两者相加表示了当前mb所蕴含的信息量float log2_ratio = x264_log2(intra_cost + propagate_cost) - x264_log2(intra_cost) + weightdelta;// 我对这里的理解是,如果log2_ratio越大,右边整体的值就越小,此时qp_offset就越小// log2_ratio表示的是传入进来的cost和自身cost的比值// 如果传入进来propagate_cost比自身的intra_cost要大,则log2_ratio较大// 此时,下面等式右边的值就会偏小,即qp_offset偏小,当前mb的qp会比较小(当前mb很重要)// 反之,qp_offset会偏大,当前mb的qp会较大(当前mb不那么重要)frame->f_qp_offset[mb_index] = frame->f_qp_offset_aq[mb_index] - strength * log2_ratio;}}
}
1.3 mbtree的测试
从上面的分析来看,对于mbtree这一项优化工具而言,利用了lookahead中的一系列参考帧,计算每个mb在这些参考关系中的重要程度,如果一个mb很重要,则进行较高质量编码(低qp),否则进行较低质量编码(高qp)。
重要程度的衡量内容包括:
(1)当前mb的intra_cost、inter_cost(由自己计算)
(2)当前mb的propagate_cost(由其他mb计算并且赋值到当前mb)
mbtree这一工具所提出的论文为:《Garrett-Glaser J. A novel macroblock-tree algorithm for high-performance optimization of dependent video coding in H.264/AVC[J]. Tech. Rep., 2009.》,新的宏块树方法在很低的运算消耗下比已有的宏块树算法在PSNR性能上提高1.2db,在SSIM性能上提高2.3db。我自己用自己的测试序列进行测试100帧,分辨率为1290x1200,是否启用mbtree可以用x264的命令行参数–no-mbtree指定:
(1)不使用mbtree
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:8 Avg QP:23.88 size:186467
x264 [info]: frame P:39 Avg QP:26.54 size:108471
x264 [info]: frame B:53 Avg QP:27.15 size: 63502
x264 [info]: consecutive B-frames: 19.0% 22.0% 27.0% 32.0%
x264 [info]: mb I I16..4: 5.4% 79.7% 14.9%
x264 [info]: mb P I16..4: 6.3% 32.9% 11.4% P16..4: 23.3% 14.6% 7.3% 0.0% 0.0% skip: 4.2%
x264 [info]: mb B I16..4: 2.0% 7.0% 2.6% B16..8: 40.0% 17.7% 4.6% direct: 7.4% skip:18.7% L0:41.7% L1:45.6% BI:12.8%
x264 [info]: 8x8 transform intra:67.6% inter:65.4%
x264 [info]: coded y,uvDC,uvAC intra: 72.1% 32.0% 5.5% inter: 41.5% 18.1% 0.1%
x264 [info]: i16 v,h,dc,p: 64% 16% 6% 14%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 19% 23% 5% 5% 4% 6% 4% 7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 10% 4% 5% 4% 6% 4% 4%
x264 [info]: i8c dc,h,v,p: 64% 19% 16% 1%
x264 [info]: Weighted P-Frames: Y:23.1% UV:12.8%
x264 [info]: ref P L0: 66.4% 19.8% 9.3% 3.7% 0.8%
x264 [info]: ref B L0: 94.4% 4.4% 1.1%
x264 [info]: ref B L1: 99.1% 0.9%
x264 [info]: kb/s:18175.45encoded 100 frames, 8.24 fps, 18175.44 kb/s
(2)使用mbtree
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:7 Avg QP:23.59 size:195961
x264 [info]: frame P:39 Avg QP:26.23 size:110076
x264 [info]: frame B:54 Avg QP:28.41 size: 50786
x264 [info]: consecutive B-frames: 19.0% 18.0% 27.0% 36.0%
x264 [info]: mb I I16..4: 4.4% 79.4% 16.2%
x264 [info]: mb P I16..4: 5.3% 36.6% 12.0% P16..4: 22.8% 13.9% 6.6% 0.0% 0.0% skip: 2.8%
x264 [info]: mb B I16..4: 2.1% 6.0% 2.4% B16..8: 41.2% 16.4% 3.9% direct: 6.0% skip:22.1% L0:42.4% L1:47.2% BI:10.5%
x264 [info]: 8x8 transform intra:68.4% inter:66.7%
x264 [info]: coded y,uvDC,uvAC intra: 73.8% 32.0% 5.2% inter: 37.2% 15.2% 0.1%
x264 [info]: i16 v,h,dc,p: 64% 16% 6% 14%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 19% 23% 5% 5% 4% 6% 4% 7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 10% 4% 5% 5% 6% 4% 4%
x264 [info]: i8c dc,h,v,p: 65% 19% 15% 1%
x264 [info]: Weighted P-Frames: Y:33.3% UV:17.9%
x264 [info]: ref P L0: 65.4% 19.2% 10.6% 3.7% 1.2%
x264 [info]: ref B L0: 94.3% 4.5% 1.2%
x264 [info]: ref B L1: 98.8% 1.2%
x264 [info]: kb/s:16814.29encoded 100 frames, 8.43 fps, 16814.29 kb/s
从测试结果来看,使用mbtree能够显著降低bitrate(7.5%左右),且带来的编码速度降低不十分明显(2.3%左右),另外速度的测试还存在误差,所以这里差异比较小。使用了mbtree之后,每个帧编码的比特数显著降低,这说明使用了mbtree之后编码的残差变小了,佐证了该算法的有效性
2.自适应量化参数(Adaptive Quantization,AQ)
x264当中,还有一个重要mb级别的码控工具,自适应量化参数(Adaptive Quantization,AQ),它的定义如下
#define X264_AQ_NONE 0 // 不使用AQ
#define X264_AQ_VARIANCE 1 // 使用方差AQ(默认配置)
#define X264_AQ_AUTOVARIANCE 2 // 使用自动方差AQ
#define X264_AQ_AUTOVARIANCE_BIASED 3 // 使用带有偏置项的自动方差AQ
AQ通过考虑一帧之内mb的复杂程度来调整编码的量化参数,从而获得更好的编码质量,不同的模式对应着不同的自适应方式:
- X264_AQ_NONE:不使用AQ,不进行qp的调整
- X264_AQ_VARIANCE:使用方差AQ,仅考虑自身mb的复杂度,不考虑当前帧当中其他mb的复杂度
- X264_AQ_AUTOVARIACNE:使用自动方差AQ,不仅考虑自身mb的复杂度,还考虑当前帧中其他mb的复杂度
- X264_AQ_AUTOVARIACNE_BIASED:使用带有偏置项的自动方差AQ,考虑自身和其他mb的复杂度,计算qp_offset还需要加上一个偏置项
对于AQ的理解,假设每个宏块只有一个频率系数。这个系数可以大也可以小如果一个宏块的系数是1.4,它被量化为1。误差为28.5%,如果一个宏块的系数是9.4,它被量化为9。误差为4.3%。那么,当使用线性量化器时,较大系数的编码比较小系数的编码更精确。从视觉上看,这显然是错误的;仅仅因为某个细节在X块中的对比度比Y块高10倍,并不意味着Y块应该被完全删除。在这种情况下,使用AQ是比较合适的,根据宏块的系数对量化参数进行自适应调整,能够对具有不同系数的宏块更好地编码,实现更精准的mb级别的码控
另外,AQ是mbtree的必要前提,即如果要开启mbtree,则必须要开启AQ,如果没有开启则会强制开启。与AQ有关的另一个紧密相关的变量是aq_strength,它控制了AQ的强度,如果编码的场景复杂,应该使用较小的aq_strength,反之应该用较大的aq_strength。aq_strength默认配置为1,1表示认为当前的场景比较简单,调控的力度比较大
/* MB-tree requires AQ to be on, even if the strength is zero. */// MB-tree要求AQ打开,即便aq_strength为0if( !h->param.rc.i_aq_mode && h->param.rc.b_mb_tree ){h->param.rc.i_aq_mode = 1;h->param.rc.f_aq_strength = 0;}
自适应量化参数AQ的主要实现流程为:
- 自适应量化一帧(x264_adaptive_quant_frame)
- 量化参数的调整(x264_ratecontrol_mb_qp)
- 宏块分析(x264_macroblock_analyse)
2.1 自适应量化一帧(x264_adaptive_quant_frame)
该函数是AQ工具实现的具体函数,通过区分不同的模式,为mb计算qp_adj,即qp的偏移量。主要工作流程为:
- 如果不使用自适应量化(AQ模式),aq_mode为X264_AQ_NONE或者aq_strength为0,则考虑为了使用Mbtree进行qp_offset和inv_qscale_factor的初始化
- 如果使用AQ,则需要计算qp_adj, strength等信息
(1)如果使用AUTOVARIANCE或者是AUTOVARIANCE_BIASED,则要考虑调整qp和strength,具体地做法是计算每一个mb应该调整的qp量,随后取均值,然后计算avg_adj和strength。如果使用VARIANCE,则仅考虑strength
(2)根据不同的AQ模式对应的计算公式,来计算qp_adj
void x264_adaptive_quant_frame( x264_t *h, x264_frame_t *frame, float *quant_offsets )
{/* Initialize frame stats */for( int i = 0; i < 3; i++ ){frame->i_pixel_sum[i] = 0;frame->i_pixel_ssd[i] = 0;}/* Degenerate cases */// ----- 1.不使用自适应量化 ----- //if( h->param.rc.i_aq_mode == X264_AQ_NONE || h->param.rc.f_aq_strength == 0 ){/* Need to init it anyways for MB tree */// 无论如何都要初始化,为了Mb tree的使用if( h->param.rc.i_aq_mode && h->param.rc.f_aq_strength == 0 ){if( quant_offsets ){for( int mb_xy = 0; mb_xy < h->mb.i_mb_count; mb_xy++ )frame->f_qp_offset[mb_xy] = frame->f_qp_offset_aq[mb_xy] = quant_offsets[mb_xy];if( h->frames.b_have_lowres )for( int mb_xy = 0; mb_xy < h->mb.i_mb_count; mb_xy++ )frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8( frame->f_qp_offset[mb_xy] );}else{memset( frame->f_qp_offset, 0, h->mb.i_mb_count * sizeof(float) );memset( frame->f_qp_offset_aq, 0, h->mb.i_mb_count * sizeof(float) );if( h->frames.b_have_lowres )for( int mb_xy = 0; mb_xy < h->mb.i_mb_count; mb_xy++ )frame->i_inv_qscale_factor[mb_xy] = 256;}}/* Need variance data for weighted prediction */if( h->param.analyse.i_weighted_pred ){for( int mb_y = 0; mb_y < h->mb.i_mb_height; mb_y++ )for( int mb_x = 0; mb_x < h->mb.i_mb_width; mb_x++ )ac_energy_mb( h, mb_x, mb_y, frame );}elsereturn;}/* Actual adaptive quantization */// ----- 2.实际的自适应量化 -----else{/* constants chosen to result in approximately the same overall bitrate as without AQ.* FIXME: while they're written in 5 significant digits, they're only tuned to 2. */float strength;float avg_adj = 0.f;float bias_strength = 0.f;// 如果是AUTOVARIANCE或者AUTOVARIANCE_BIASED,则需要考虑调整qp和strengthif( h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE || h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE_BIASED ){float bit_depth_correction = 1.f / (1 << (2*(BIT_DEPTH-8)));float avg_adj_pow2 = 0.f;// 计算每一个mb应该调整的qp量,随后取均值for( int mb_y = 0; mb_y < h->mb.i_mb_height; mb_y++ )for( int mb_x = 0; mb_x < h->mb.i_mb_width; mb_x++ ){// 在AQ模式计算中,将能量定义为方差。能量(方差)越大,qp调整值越大// E[X^2] - E[X]^2 = ssd - (sum * sum >> shift)uint32_t energy = ac_energy_mb( h, mb_x, mb_y, frame );float qp_adj = powf( energy * bit_depth_correction + 1, 0.125f );frame->f_qp_offset[mb_x + mb_y*h->mb.i_mb_stride] = qp_adj;// 计算avg_adjavg_adj += qp_adj;avg_adj_pow2 += qp_adj * qp_adj;}avg_adj /= h->mb.i_mb_count;avg_adj_pow2 /= h->mb.i_mb_count;// 计算strength,根据所有mb的平均qp_adj来调整strength = h->param.rc.f_aq_strength * avg_adj;// 计算avg_adj,根据所有mb的平均avg_adj = avg_adj - 0.5f * (avg_adj_pow2 - 14.f) / avg_adj;// 计算bias_strengthbias_strength = h->param.rc.f_aq_strength;}else// 如果是VARIANCE,则仅调整strengthstrength = h->param.rc.f_aq_strength * 1.0397f;for( int mb_y = 0; mb_y < h->mb.i_mb_height; mb_y++ )for( int mb_x = 0; mb_x < h->mb.i_mb_width; mb_x++ ){float qp_adj;int mb_xy = mb_x + mb_y*h->mb.i_mb_stride;if( h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE_BIASED ) // AUTOVARIANCE_BIASED的qp调整{qp_adj = frame->f_qp_offset[mb_xy];qp_adj = strength * (qp_adj - avg_adj) + bias_strength * (1.f - 14.f / (qp_adj * qp_adj)); // 增加的偏置项强度}else if( h->param.rc.i_aq_mode == X264_AQ_AUTOVARIANCE ) // AUTOVARIANCE的qp调整{qp_adj = frame->f_qp_offset[mb_xy];qp_adj = strength * (qp_adj - avg_adj);}else // VARIANCE的qp调整{uint32_t energy = ac_energy_mb( h, mb_x, mb_y, frame );qp_adj = strength * (x264_log2( X264_MAX(energy, 1) ) - (14.427f + 2*(BIT_DEPTH-8)));}if( quant_offsets )qp_adj += quant_offsets[mb_xy];// 最后赋值的地方frame->f_qp_offset[mb_xy] =frame->f_qp_offset_aq[mb_xy] = qp_adj;if( h->frames.b_have_lowres )frame->i_inv_qscale_factor[mb_xy] = x264_exp2fix8(qp_adj);}}// ...
}
2.2 量化参数的调整(x264_ratecontrol_mb_qp)
计算之后的qp,实际调整的地方位于x264_ratecontrol_mb_qp之中,该函数给出了一个待编码mb的初始qp
int x264_ratecontrol_mb_qp( x264_t *h )
{x264_emms();float qp = h->rc->qpm;if( h->param.rc.i_aq_mode ){/* MB-tree currently doesn't adjust quantizers in unreferenced frames. */// 如果当前帧会作为一个参考帧,则使用qp_offset,否则使用qp_offset_aqfloat qp_offset = h->fdec->b_kept_as_ref ? h->fenc->f_qp_offset[h->mb.i_mb_xy] : h->fenc->f_qp_offset_aq[h->mb.i_mb_xy];/* Scale AQ's effect towards zero in emergency mode. */if( qp > QP_MAX_SPEC )qp_offset *= (QP_MAX - qp) / (QP_MAX - QP_MAX_SPEC);qp += qp_offset;}return x264_clip3( qp + 0.5f, h->param.rc.i_qp_min, h->param.rc.i_qp_max );
}
2.3 宏块分析(x264_macroblock_analyse)
在进行宏块分析时,如果启用了aq并且像素精度的等级小于10,则会进行qp的调整。如果当前mb的qp和前面mb的qp的差值为1,则将当前mb的qp调整为和前面的qp一样,否则使用当前qp
void x264_macroblock_analyse( x264_t *h )
{x264_mb_analysis_t analysis;int i_cost = COST_MAX;h->mb.i_qp = x264_ratecontrol_mb_qp( h );/* If the QP of this MB is within 1 of the previous MB, code the same QP as the previous MB,* to lower the bit cost of the qp_delta. Don't do this if QPRD is enabled. */// 如果使用aq模式,并且像素精度的等级小于10// 如果当前mb的qp和前面mb的qp的差值为1,则将当前mb的qp调整为和前面的qp一样,否则使用当前qpif( h->param.rc.i_aq_mode && h->param.analyse.i_subpel_refine < 10 )h->mb.i_qp = abs(h->mb.i_qp - h->mb.i_last_qp) == 1 ? h->mb.i_last_qp : h->mb.i_qp;// ...
2.4 AQ的测试
使用自测的序列做简单的测试,分辨率为1920x1200,在这里还关闭了mbtree,因为如果启用了mbtree则会强制开启aq。对比测试配置的参数为开启–no-mbtree 和关闭AQ --no-mbtree --aq-mode 0
(1)使用默认配置的AQ(X264_AQ_VARIANCE)
yuv [info]: 1920x1200p 0:0 @ 25/1 fps (cfr)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:7 Avg QP:23.59 size:195961
x264 [info]: frame P:39 Avg QP:26.23 size:110076
x264 [info]: frame B:54 Avg QP:28.41 size: 50786
x264 [info]: consecutive B-frames: 19.0% 18.0% 27.0% 36.0%
x264 [info]: mb I I16..4: 4.4% 79.4% 16.2%
x264 [info]: mb P I16..4: 5.3% 36.6% 12.0% P16..4: 22.8% 13.9% 6.6% 0.0% 0.0% skip: 2.8%
x264 [info]: mb B I16..4: 2.1% 6.0% 2.4% B16..8: 41.2% 16.4% 3.9% direct: 6.0% skip:22.1% L0:42.4% L1:47.2% BI:10.5%
x264 [info]: 8x8 transform intra:68.4% inter:66.7%
x264 [info]: coded y,uvDC,uvAC intra: 73.8% 32.0% 5.2% inter: 37.2% 15.2% 0.1%
x264 [info]: i16 v,h,dc,p: 64% 16% 6% 14%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 19% 23% 5% 5% 4% 6% 4% 7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 10% 4% 5% 5% 6% 4% 4%
x264 [info]: i8c dc,h,v,p: 65% 19% 15% 1%
x264 [info]: Weighted P-Frames: Y:33.3% UV:17.9%
x264 [info]: ref P L0: 65.4% 19.2% 10.6% 3.7% 1.2%
x264 [info]: ref B L0: 94.3% 4.5% 1.2%
x264 [info]: ref B L1: 98.8% 1.2%
x264 [info]: kb/s:16814.29encoded 100 frames, 8.43 fps, 16814.29 kb/s
(2)关闭AQ
yuv [info]: 1920x1200p 0:0 @ 25/1 fps (cfr)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile High, level 5.0, 4:2:0, 8-bit
x264 [info]: frame I:7 Avg QP:25.00 size:195532
x264 [info]: frame P:39 Avg QP:27.79 size:114287
x264 [info]: frame B:54 Avg QP:29.52 size: 54980
x264 [info]: consecutive B-frames: 19.0% 18.0% 27.0% 36.0%
x264 [info]: mb I I16..4: 20.0% 66.9% 13.0%
x264 [info]: mb P I16..4: 13.7% 28.8% 10.8% P16..4: 16.6% 12.6% 6.5% 0.0% 0.0% skip:11.0%
x264 [info]: mb B I16..4: 3.2% 4.9% 2.6% B16..8: 35.2% 15.6% 4.3% direct: 4.5% skip:29.7% L0:40.8% L1:45.1% BI:14.2%
x264 [info]: 8x8 transform intra:55.4% inter:61.7%
x264 [info]: coded y,uvDC,uvAC intra: 61.5% 33.5% 7.1% inter: 33.4% 10.1% 0.1%
x264 [info]: i16 v,h,dc,p: 57% 24% 10% 8%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 29% 18% 20% 5% 5% 4% 6% 5% 7%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 45% 17% 11% 4% 5% 5% 6% 4% 4%
x264 [info]: i8c dc,h,v,p: 68% 17% 14% 1%
x264 [info]: Weighted P-Frames: Y:33.3% UV:17.9%
x264 [info]: ref P L0: 63.2% 22.2% 10.6% 3.1% 0.9%
x264 [info]: ref B L0: 94.7% 4.2% 1.1%
x264 [info]: ref B L1: 98.9% 1.1%
x264 [info]: kb/s:17589.66encoded 100 frames, 9.00 fps, 17589.66 kb/s
从上面的对比来看,关闭AQ之后,码率增加了4.5%,并且编码的帧率也增加了6.3%。从数据来看,关闭AQ之后,各个帧的类型的AvgQP都增加了,其中B帧的编码比特增加较为明显。这里猜测是因为B帧的帧间编码比较复杂,利用帧内的信息进行了AQ的调整,使得编码qp更加合理,这样编码的比特数更小
3.小结
本节讨论了x264当中宏块级别的两个工具Mbtree和AQ,其中Mbtree基于帧间,利用前后帧当中mb互相参考的重要程度来调控mb级别的qp,AQ基于帧内,利用当前帧的mb(或当前帧所有mb)的复杂度信息来调控mb级别的qp。两者的差异是应用在帧内和帧间两个模块之中,关联是Mbtree依赖于AQ,即要使用Mbtree时必须开启AQ,如没有开启AQ,则会强制打开
值得思考的问题是:
(1)这里只考虑了mb级别的码控,没有考虑帧级别的,帧级别的码控是如何处理的?
(2)Mbtree使用了lookahead,这个lookahead实现的细节是怎样的?
CSDN : https://blog.csdn.net/weixin_42877471
Github : https://github.com/DoFulangChen
这篇关于【x264】码率控制模块的简单分析—宏块级码控工具Mbtree和AQ的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!