【计算机视觉】Gaussian Splatting源码解读补充（三）

本文主要是介绍【计算机视觉】Gaussian Splatting源码解读补充（三），希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

第一部分
第二部分

本文是对学习笔记之——3D Gaussian Splatting源码解读的补充，并订正了一些错误。

五、反向传播

原文第6节：

During the backward pass, we must therefore recover the full sequence of blended points per-pixel in the forward pass. One solution would be to store arbitrarily long lists of blended points per-pixel in global memory [Kopanas et al. 2021]. To avoid the implied dynamic memory management overhead, we instead choose to traverse the per-tile lists again; we can reuse the sorted array of Gaussians and tile ranges from the forward pass. To facilitate gradient computation, we now traverse them back-to-front.

The traversal starts from the last point that affected any pixel in the tile, and loading of points into shared memory again happens collaboratively. Additionally, each pixel will only start (expensive) overlap testing and processing of points if their depth is lower than or equal to the depth of the last point that contributed to its color during the forward pass. Computation of the gradients described in Sec. 4 requires the accumulated opacity values at each step during the original blending process. Rather than trasversing an explicit list of progressively shrinking opacities in the backward pass, we can recover these intermediate opacities by storing only the total accumulated opacity at the end of the forward pass. Specifically, each point stores the final accumulated opacity 𝛼 in the forward process; we divide this by each point’s 𝛼 in our back-to-front traversal to obtain the required coefficients for gradient computation.

反向传播是根据loss对每个像素颜色的导求出loss对Gaussian的2维中心坐标、3维中心坐标、椭圆二次型矩阵、不透明度、相机看到的Gaussian颜色、球谐系数、三维协方差矩阵、缩放和旋转的导数。

代码中出现的类似于dL_dx的变量的意思是loss对参数x的导数，即 $\frac{\partial L}{\partial x}$ 。

1. `rasterizer_impl.cu`: `CudaRasterizer::Rasterizer::backward`

// Produce necessary gradients for optimization, corresponding
// to forward render pass
void CudaRasterizer::Rasterizer::backward(const int P, // Gaussian个数int D, // 球谐度数int M, // 三通道球谐系数个数int R,// R对应CudaRasterizer::Rasterizer::forward函数中的num_rendered// 即排序数组的个数（等于每个Gaussian覆盖的tile的个数之和）const float* background, // 背景颜色const int width, int height,const float* means3D,const float* shs,const float* colors_precomp,const float* scales,const float scale_modifier,const float* rotations,const float* cov3D_precomp,const float* viewmatrix,const float* projmatrix,const float* campos,const float tan_fovx, float tan_fovy,const int* radii,char* geom_buffer,char* binning_buffer,char* img_buffer,// 上面的参数不解释const float* dL_dpix, // loss对每个像素颜色的导数float* dL_dmean2D, // loss对Gaussian二维中心坐标的导数float* dL_dconic, // loss对椭圆二次型矩阵的导数float* dL_dopacity, // loss对不透明度的导数float* dL_dcolor, // loss对Gaussian颜色的导数（颜色是从相机中心看向Gaussian的颜色）float* dL_dmean3D, // loss对Gaussian三维中心坐标的导数float* dL_dcov3D, // loss对Gaussian三维协方差矩阵的导数float* dL_dsh, // loss对Gaussian的球谐系数的导数float* dL_dscale, // loss对Gaussian的缩放参数的导数float* dL_drot, // loss对Gaussian旋转四元数的导数bool debug)
{GeometryState geomState = GeometryState::fromChunk(geom_buffer, P);BinningState binningState = BinningState::fromChunk(binning_buffer, R);ImageState imgState = ImageState::fromChunk(img_buffer, width * height);// 上面这些缓冲区都是在前向传播的时候存下来的，现在拿出来用if (radii == nullptr){radii = geomState.internal_radii;}const float focal_y = height / (2.0f * tan_fovy);const float focal_x = width / (2.0f * tan_fovx);const dim3 tile_grid((width + BLOCK_X - 1) / BLOCK_X, (height + BLOCK_Y - 1) / BLOCK_Y, 1);const dim3 block(BLOCK_X, BLOCK_Y, 1);// Compute loss gradients w.r.t. 2D mean position, conic matrix,// opacity and RGB of Gaussians from per-pixel loss gradients.// If we were given precomputed colors and not SHs, use them.const float* color_ptr = (colors_precomp != nullptr) ? colors_precomp : geomState.rgb;CHECK_CUDA(BACKWARD::render(tile_grid,block,imgState.ranges,binningState.point_list,width, height,background,geomState.means2D,geomState.conic_opacity,color_ptr,imgState.accum_alpha,imgState.n_contrib,dL_dpix,(float3*)dL_dmean2D,(float4*)dL_dconic,dL_dopacity,dL_dcolor), debug)// Take care of the rest of preprocessing. Was the precomputed covariance// given to us or a scales/rot pair? If precomputed, pass that. If not,// use the one we computed ourselves.const float* cov3D_ptr = (cov3D_precomp != nullptr) ? cov3D_precomp : geomState.cov3D;// 因为是反向传播，所以preprocess放在后面了(❁´◡`❁)CHECK_CUDA(BACKWARD::preprocess(P, D, M,(float3*)means3D,radii,shs,geomState.clamped,(glm::vec3*)scales,(glm::vec4*)rotations,scale_modifier,cov3D_ptr,viewmatrix,projmatrix,focal_x, focal_y,tan_fovx, tan_fovy,(glm::vec3*)campos,(float3*)dL_dmean2D,dL_dconic,(glm::vec3*)dL_dmean3D,dL_dcolor,dL_dcov3D,dL_dsh,(glm::vec3*)dL_dscale,(glm::vec4*)dL_drot), debug)
}

2. `backward.cu`: `renderCUDA`

这个函数中每个线程负责一个像素，计算loss对Gaussian的二维中心坐标、椭圆二次型矩阵、不透明度和Gaussian颜色的导。

这里求导利用的公式是颜色的递推公式，与前向传播有所不同。设 $\boldsymbol{b}_i$ 是第 $i$ 个即其后的Gaussians渲染出来的颜色（三个通道）， $\boldsymbol{g}_i$ 是第 $i$ 个Gaussian的颜色，则有递推公式： $\boldsymbol{b}_i=\alpha_i \boldsymbol{g}_i+(1-\alpha_i)\boldsymbol{b}_{i+1}$ 令 $T_i=(1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_{i-1})$ 为第 $i$ 个Gaussian对像素点的透光率， $\boldsymbol{C}$ 是像素点的颜色，则有 $\begin{aligned} \cfrac{\partial L}{\partial\boldsymbol{b}_i}&=\cfrac{\partial L}{\partial\boldsymbol{C}}\cdot\cfrac{\partial \boldsymbol{b}_1}{\partial\boldsymbol{b}_2}\cdot\cfrac{\partial \boldsymbol{b}_2}{\partial\boldsymbol{b}_3}\cdots\cfrac{\partial \boldsymbol{b}_{i-1}}{\partial\boldsymbol{b}_i}\\ &=\cfrac{\partial L}{\partial\boldsymbol{C}}\cdot(1-\alpha_1)I\cdot(1-\alpha_2)I\cdots(1-\alpha_{i-1})I\\ &=T_i\cfrac{\partial L}{\partial\boldsymbol{C}}\\ \end{aligned}$ 故 $\cfrac{\partial L}{\partial \alpha_i}=T_i\cfrac{\partial L}{\partial\boldsymbol{C}}(\boldsymbol{g}_i-\boldsymbol{b}_{i+1})\\ \cfrac{\partial L}{\partial \boldsymbol{g}_i}=T_i\alpha_i\cfrac{\partial L}{\partial\boldsymbol{C}}$ 令 $\alpha=\min(0.99,\sigma G)$ ，其中 $\sigma$ 是“opacity”， $G$ 是正态分布给出的“exponential falloff”，则 $\frac{\partial L}{\partial \sigma}=G\frac{\partial L}{\partial \alpha}\\\ \\ \frac{\partial L}{\partial G}=\sigma\frac{\partial L}{\partial \alpha}$ （注：代码中似乎不考虑 $\alpha>0.99$ 的情况；也许是因为想让 $\sigma$ 过大时也有梯度从而变小？）
而 $G=\exp\{-\frac{1}{2}\boldsymbol{d}^T A\boldsymbol{d}\}$ ， $\boldsymbol{d}=\boldsymbol{p}-\boldsymbol\mu$ ，其中 $\boldsymbol p$ 为像素坐标， $\boldsymbol\mu$ 为Gausian中心的2D坐标， $\boldsymbol d$ 为它们之间的位移向量， $A$ 为椭圆二次型的矩阵，因此 $\frac{\partial L}{\partial \boldsymbol d}=-\cfrac{1}{2}G\frac{\partial L}{\partial G}[\boldsymbol d^T(A^T+A)]\\ \cfrac{\partial L}{\partial \boldsymbol \mu}=-\cfrac{\partial L}{\partial \boldsymbol d}\\ \frac{\partial L}{\partial \boldsymbol A_{00}}=-\cfrac{1}{2}G\frac{\partial L}{\partial G}d_x^2\\ \frac{\partial L}{\partial \boldsymbol A_{11}}=-\cfrac{1}{2}G\frac{\partial L}{\partial G}d_y^2\\ \frac{\partial L}{\partial \boldsymbol A_{01}}=-\cfrac{1}{2}G\frac{\partial L}{\partial G}d_x d_y$ 注意计算 $\frac{\partial L}{\partial \boldsymbol \mu}$ 时要乘以像素坐标对像平面坐标的导。

公式中出现的变量和代码中变量的关系如下表：

公式变量	代码变量
$\boldsymbol{b}_{i+1}$	`accum_rec`
$\boldsymbol{g}_i$	`c`
$T_i$	`T`
$\cfrac{\partial L}{\partial\boldsymbol{C}}$	`dL_dpixel`
$\sigma_i$	`con_o.w`
$\boldsymbol{d}$	`d`
$\boldsymbol\mu$	`xy`
$\boldsymbol p$	`pixf`
$A_{00},A_{01},A_{11}$	`con_o.x`,`con_o.y`,`con_o.z`

// Backward version of the rendering procedure.
template <uint32_t C>
__global__ void __launch_bounds__(BLOCK_X * BLOCK_Y)
renderCUDA(const uint2* __restrict__ ranges,const uint32_t* __restrict__ point_list,int W, int H,const float* __restrict__ bg_color,const float2* __restrict__ points_xy_image,const float4* __restrict__ conic_opacity,const float* __restrict__ colors,const float* __restrict__ final_Ts,const uint32_t* __restrict__ n_contrib,const float* __restrict__ dL_dpixels,float3* __restrict__ dL_dmean2D,float4* __restrict__ dL_dconic2D,float* __restrict__ dL_dopacity,float* __restrict__ dL_dcolors)
{// We rasterize again. Compute necessary block info.auto block = cg::this_thread_block();const uint32_t horizontal_blocks = (W + BLOCK_X - 1) / BLOCK_X;const uint2 pix_min = { block.group_index().x * BLOCK_X, block.group_index().y * BLOCK_Y };const uint2 pix_max = { min(pix_min.x + BLOCK_X, W), min(pix_min.y + BLOCK_Y , H) };const uint2 pix = { pix_min.x + block.thread_index().x, pix_min.y + block.thread_index().y };const uint32_t pix_id = W * pix.y + pix.x;const float2 pixf = { (float)pix.x, (float)pix.y };// 上面这些就是当前线程负责的像素点的信息const bool inside = pix.x < W&& pix.y < H;const uint2 range = ranges[block.group_index().y * horizontal_blocks + block.group_index().x];const int rounds = ((range.y - range.x + BLOCK_SIZE - 1) / BLOCK_SIZE);// 轮数，和前向传播一样bool done = !inside;int toDo = range.y - range.x;__shared__ int collected_id[BLOCK_SIZE];__shared__ float2 collected_xy[BLOCK_SIZE];__shared__ float4 collected_conic_opacity[BLOCK_SIZE];__shared__ float collected_colors[C * BLOCK_SIZE];// In the forward, we stored the final value for T, the// product of all (1 - alpha) factors. const float T_final = inside ? final_Ts[pix_id] : 0;float T = T_final; // 最后的透光度// We start from the back. The ID of the last contributing// Gaussian is known from each pixel from the forward.uint32_t contributor = toDo;const int last_contributor = inside ? n_contrib[pix_id] : 0;float accum_rec[C] = { 0 };float dL_dpixel[C]; // loss对该像素RGB通道的导数if (inside)for (int i = 0; i < C; i++)dL_dpixel[i] = dL_dpixels[i * H * W + pix_id];float last_alpha = 0;float last_color[C] = { 0 };// Gradient of pixel coordinate w.r.t. normalized // screen-space viewport corrdinates (-1 to 1)// 像素坐标对像平面坐标的导（参见ndc2Pix函数）const float ddelx_dx = 0.5 * W;const float ddely_dy = 0.5 * H;// Traverse all Gaussiansfor (int i = 0; i < rounds; i++, toDo -= BLOCK_SIZE){// Load auxiliary data into shared memory, start in the BACK// and load them in revers order.block.sync();const int progress = i * BLOCK_SIZE + block.thread_rank();if (range.x + progress < range.y){const int coll_id = point_list[range.y - progress - 1];collected_id[block.thread_rank()] = coll_id;collected_xy[block.thread_rank()] = points_xy_image[coll_id];collected_conic_opacity[block.thread_rank()] = conic_opacity[coll_id];for (int i = 0; i < C; i++)collected_colors[i * BLOCK_SIZE + block.thread_rank()] = colors[coll_id * C + i];}block.sync();// Iterate over Gaussiansfor (int j = 0; !done && j < min(BLOCK_SIZE, toDo); j++){// Keep track of current Gaussian ID. Skip, if this one// is behind the last contributor for this pixel.contributor--;if (contributor >= last_contributor)continue;// 之所以不直接把contributor设置成last_contributor，// 是因为排在我的last_contributor后面的Gaussian可能// 对于其他像素来说是可见的// Compute blending values, as before.const float2 xy = collected_xy[j];const float2 d = { xy.x - pixf.x, xy.y - pixf.y };const float4 con_o = collected_conic_opacity[j];const float power = -0.5f * (con_o.x * d.x * d.x + con_o.z * d.y * d.y) - con_o.y * d.x * d.y;if (power > 0.0f)continue;const float G = exp(power);const float alpha = min(0.99f, con_o.w * G); // 先把alpha算出来if (alpha < 1.0f / 255.0f)continue;T = T / (1.f - alpha); // 一点一点把T除掉（T原本是累乘）const float dchannel_dcolor = alpha * T;// Propagate gradients to per-Gaussian colors and keep// gradients w.r.t. alpha (blending factor for a Gaussian/pixel// pair).float dL_dalpha = 0.0f;const int global_id = collected_id[j];for (int ch = 0; ch < C; ch++){const float c = collected_colors[ch * BLOCK_SIZE + j];// Gaussian的color// Update last color (to be used in the next iteration)accum_rec[ch] = last_alpha * last_color[ch] + (1.f - last_alpha) * accum_rec[ch];// 更新我以及后面所有Gaussian的颜色贡献last_color[ch] = c; // 我后面的Gaussian的颜色const float dL_dchannel = dL_dpixel[ch];dL_dalpha += (c - accum_rec[ch]) * dL_dchannel;// Update the gradients w.r.t. color of the Gaussian. // Atomic, since this pixel is just one of potentially// many that were affected by this Gaussian.atomicAdd(&(dL_dcolors[global_id * C + ch]), dchannel_dcolor * dL_dchannel);}dL_dalpha *= T;// Update last alpha (to be used in the next iteration)last_alpha = alpha;// Account for fact that alpha also influences how much of// the background color is added if nothing left to blendfloat bg_dot_dpixel = 0;for (int i = 0; i < C; i++)bg_dot_dpixel += bg_color[i] * dL_dpixel[i];dL_dalpha += (-T_final / (1.f - alpha)) * bg_dot_dpixel;// Helpful reusable temporary variablesconst float dL_dG = con_o.w * dL_dalpha;const float gdx = G * d.x;const float gdy = G * d.y;const float dG_ddelx = -gdx * con_o.x - gdy * con_o.y;const float dG_ddely = -gdy * con_o.z - gdx * con_o.y;// 下面累积这些参数的梯度// Update gradients w.r.t. 2D mean position of the GaussianatomicAdd(&dL_dmean2D[global_id].x, dL_dG * dG_ddelx * ddelx_dx);atomicAdd(&dL_dmean2D[global_id].y, dL_dG * dG_ddely * ddely_dy);// Update gradients w.r.t. 2D covariance (2x2 matrix, symmetric)atomicAdd(&dL_dconic2D[global_id].x, -0.5f * gdx * d.x * dL_dG);atomicAdd(&dL_dconic2D[global_id].y, -0.5f * gdx * d.y * dL_dG);atomicAdd(&dL_dconic2D[global_id].w, -0.5f * gdy * d.y * dL_dG);// Update gradients w.r.t. opacity of the GaussianatomicAdd(&(dL_dopacity[global_id]), G * dL_dalpha);}}
}