Instruct-NeRF2NeRF: Editing NeRF 3D Scenes with User Instructions


Haque A, Tancik M, Efros A A, et al. Instruct-nerf2nerf: Editing 3d scenes with instructions[J]. arXiv preprint arXiv:2303.12789, 2023.

Instruct-NeRF2NeRF is an ICCV 2023 Oral paper and the first to lift the instruction-based image editing task from 2D to 3D.

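Instruct-NeRF2NeRF edits a 3D scene represented as a NeRF according to a user instruction. It uses a pretrained InstructPix2Pix model to edit the NeRF's training data (the multi-view images) and then continues training the NeRF on the edited views, which edits the 3D scene itself. To keep the edited scene 3D-consistent, training follows the Iterative Dataset Update (Iterative DU) scheme.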


I. Research Idea

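The goal of Instruct-NeRF2NeRF is to edit a NeRF-represented 3D scene according to a human instruction, so the method needs only the edit instruction and the NeRF scene. As DreamFusion puts it, a 3D scene is essentially one scene observed from many viewpoints ¹. Instruct-NeRF2NeRF therefore edits the NeRF's multi-view training images with InstructPix2Pix and uses the edited images to optimize the NeRF's 3D representation. To make editing possible, the scene's training data (images, camera poses, and so on) is kept alongside its NeRF representation.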

II. The Instruct-NeRF2NeRF Model

Instruct-NeRF2NeRF uses InstructPix2Pix to fine-tune a 3D scene represented as a NeRF:

Input: a NeRF scene, its training data, and an edit instruction;
Output: the edited NeRF scene.

III. Training Method

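Editing the training images of the different viewpoints directly makes the 3D scene inconsistent (inconsistent edits across viewpoints), because the edits of different views are independent of one another:
[Figure: inconsistent edits across viewpoints]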

Instruct-NeRF2NeRF therefore trains with Iterative Dataset Update (Iterative DU): it alternates between editing the NeRF training images and updating the NeRF 3D scene.

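This is also why the method does not edit every training image up front and then train a NeRF from scratch: the original NeRF training data guarantees a 3D-consistent scene, whereas the views edited independently by InstructPix2Pix are likely to describe an inconsistent one.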


1. Editing the NeRF training images

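To edit a NeRF training image, the original image $c_I$ at viewpoint $v$, the edit instruction $c_T$, and the noisy input $z_t$ are fed to InstructPix2Pix. Let $I_{i}^{v}$ denote the image at viewpoint $v$ in round $i$, with $I_{0}^{v}=c_I$, and let $t$ be the diffusion noise level. The image is then updated over the iterations as: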
$I_{i+1}^{v} \leftarrow U_{\theta}(I_{i}^{v},t;I_{0}^{v},c_T)$
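
The update rule can be read as a function call. The following is a minimal sketch under stated assumptions, not the authors' implementation: images are stood in for by single floats, toy_editor is a hypothetical stand-in for InstructPix2Pix ($U_\theta$), the instruction is a hypothetical brightness shift, and the noise-level bounds t_min and t_max are assumed defaults chosen for illustration.

```python
import random


def toy_editor(current, t, original, instruction):
    """Hypothetical stand-in for U_theta(I_i^v, t; I_0^v, c_T).

    Images are single floats here. The edit target is derived from the
    original image and the instruction; the noise level t in [0, 1] controls
    how much of the current image is discarded in favor of that target.
    """
    target = original + instruction["brightness_shift"]
    return (1.0 - t) * current + t * target


def update_image(current, original, instruction, editor,
                 t_min=0.02, t_max=0.98, rng=None):
    """One update I_{i+1}^v <- U_theta(I_i^v, t; I_0^v, c_T).

    t_min and t_max are assumed defaults for this sketch.
    """
    rng = rng or random.Random()
    t = rng.uniform(t_min, t_max)  # sample a noise level for this update
    return editor(current, t, original, instruction)


if __name__ == "__main__":
    instruction = {"brightness_shift": 1.0}  # hypothetical c_T
    original = 0.0                           # I_0^v = c_I
    image = original
    rng = random.Random(0)
    for i in range(5):
        image = update_image(image, original, instruction, toy_editor, rng=rng)
        print(f"round {i + 1}: image = {image:.3f}")
```

Note that the original image stays fixed as the conditioning signal in every round, while the image being edited changes from round to round.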

2. Updating the NeRF training set

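The core of Instruct-NeRF2NeRF is this alternation between editing the NeRF training images and updating the NeRF 3D scene, which is what Iterative DU refers to. Before training, an order is fixed over the multi-view training images. Each round then first updates $d$ images and afterwards runs $n$ NeRF updates on sampled rays:

Image update: a randomly chosen subset of the views is edited, and each edited view replaces the corresponding image in the training set;
NeRF training: views are sampled from the training set, which now mixes old and new images, and used to train the NeRF.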
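
The alternation described above can be sketched as a short loop. This is a toy illustration under loud assumptions, not the paper's code: the scene is a single scalar, View, render_fn, edit_fn and nerf_step_fn are hypothetical names, the defaults for d and n are arbitrary, and each edit is assumed to start from the current render of the scene at that view.

```python
import random
from dataclasses import dataclass


@dataclass
class View:
    original: float  # captured image I_0^v (a single float in this toy)
    current: float   # dataset image for this view, replaced when edited


def iterative_dataset_update(views, render_fn, edit_fn, nerf_step_fn,
                             num_rounds, d=1, n=10, seed=0):
    """Alternate d image updates with n NeRF updates per round."""
    rng = random.Random(seed)
    order = list(range(len(views)))
    rng.shuffle(order)  # fix an order over the views before training
    cursor = 0
    for _ in range(num_rounds):
        # Image updates: edit the next d views and replace them in the dataset.
        for _ in range(d):
            v = order[cursor % len(order)]
            cursor += 1
            views[v].current = edit_fn(render_fn(v), views[v].original)
        # NeRF updates: supervise on views drawn from the mixed dataset,
        # which holds both edited and not-yet-edited images.
        for _ in range(n):
            v = rng.randrange(len(views))
            nerf_step_fn(v, views[v].current)
    return views


def run_toy_example(num_views=4, num_rounds=40):
    """Toy scene: one scalar that every view renders identically."""
    scene = {"value": 0.0}
    views = [View(original=0.0, current=0.0) for _ in range(num_views)]

    def render_fn(v):
        return scene["value"]

    def edit_fn(render, original):
        # Hypothetical editor: move halfway from the render to original + 1.
        return 0.5 * render + 0.5 * (original + 1.0)

    def nerf_step_fn(v, target):
        # Hypothetical NeRF update: step halfway toward the supervising image.
        scene["value"] += 0.5 * (target - scene["value"])

    iterative_dataset_update(views, render_fn, edit_fn, nerf_step_fn, num_rounds)
    return scene["value"], [view.current for view in views]


if __name__ == "__main__":
    value, images = run_toy_example()
    print(f"scene value: {value:.4f}")
    print("dataset images:", [f"{x:.4f}" for x in images])
```

In this toy, the scene value and every dataset image drift toward the same edited value, because each edit is conditioned on what the shared scene currently renders rather than being produced in isolation.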


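This training scheme can still produce an inconsistent 3D scene early in training, but as the iterations continue it converges to a consistent one:
[Figure: convergence to a consistent 3D scene over the iterations]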

IV. Experimental Results

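The 3D scenes are represented with the Nerfstudio framework ², and every edit requires training on the 3D scene again. The training process is visualized below:

[Figure: visualization of the training process]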

A comparison of the different methods:
[Figure: comparison of results across methods]

V. Summary

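Instruct-NeRF2NeRF edits the NeRF's training data with a pretrained InstructPix2Pix model and then continues training the NeRF on the edited views in an Iterative DU fashion. This edits the 3D scene while preserving its coherence and realism. ³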

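Instruct-NeRF2NeRF in fact relies on a trick to keep the 3D scene consistent. Since all of the NeRF's training data is kept, why not edit all of it and then train the NeRF? Because the original training data guarantees a 3D-consistent scene, whereas the views edited by InstructPix2Pix are likely to describe an inconsistent one. Training with an iteratively updated dataset instead lets the NeRF gradually converge to a consistent 3D scene.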

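Instruct-NeRF2NeRF also has some limitations:

It can edit only one view at a time, so artifacts may appear;
InstructPix2Pix sometimes produces a poor edit, in which case the Instruct-NeRF2NeRF edit fails as well;
Even when InstructPix2Pix edits successfully, the Instruct-NeRF2NeRF edit may still be unsatisfactory.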

VI. Reproduction

Instruct-NeRF2NeRF is built on Nerfstudio:

Platform: AutoDL
GPU: RTX 4090 24GB
Image: PyTorch 2.0.0, Python 3.8 (ubuntu20.04), CUDA 11.8
Source code: https://github.com/ayaanzhaque/instruct-nerf2nerf

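Experiment log:

First follow the tutorial to create the nerfstudio environment and install the dependencies, stopping after conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit;
Then clone the Instruct-NeRF2NeRF repository and update the components and packages;
At this point, running ns-train -h to check the installation raises a TypeError.

Nerfstudio must first be installed inside the instruct-nerf2nerf folder ⁴, after which the check succeeds: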

(nerfstudio) root@autodl-container-9050458ceb-3f1684be:~/instruct-nerf2nerf/nerfstudio# ns-train -h
usage: ns-train [-h]
                {depth-nerfacto,dnerf,gaussian-splatting,generfacto,in2n,in2n-small,in2n-tiny,
                instant-ngp,instant-ngp-bounded,mipnerf,nerfacto,nerfacto-big,nerfacto-huge,neus,
                neus-facto,phototourism,semantic-nerfw,tensorf,vanilla-nerf,kplanes,kplanes-dynamic,
                lerf,lerf-big,lerf-lite,nerfplayer-nerfacto,nerfplayer-ngp,tetra-nerf,
                tetra-nerf-original,volinga}

Train a radiance field with nerfstudio. For real captures, we recommend using the nerfacto model.

Nerfstudio allows for customizing your training and eval configs from the CLI in a powerful way, but there are some things to understand.

The most demonstrative and helpful example of the CLI structure is the difference in output between the following commands:

    ns-train -h
    ns-train nerfacto -h nerfstudio-data
    ns-train nerfacto nerfstudio-data -h

In each of these examples, the -h applies to the previous subcommand (ns-train, nerfacto, and nerfstudio-data).

In the first example, we get the help menu for the ns-train script. In the second example, we get the help menu for the nerfacto model. In the third example, we get the help menu for the nerfstudio-data dataparser.

With our scripts, your arguments will apply to the preceding subcommand in your command, and thus where you put your arguments matters! Any optional arguments you discover from running

    ns-train nerfacto -h nerfstudio-data

need to come directly after the nerfacto subcommand, since these optional arguments only belong to the nerfacto subcommand:

    ns-train nerfacto {nerfacto optional args} nerfstudio-data

╭─ arguments ─────────────────────────────────────────────────────────────╮
│ -h, --help        show this help message and exit                       │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ subcommands ──────────────────────────────────────────────────────────╮
│ {depth-nerfacto,dnerf,gaussian-splatting,generfacto,in2n,in2n-small,i… │
│     depth-nerfacto                                                     │
│                   Nerfacto with depth supervision.                     │
│     dnerf         Dynamic-NeRF model. (slow)                           │
│     gaussian-splatting                                                 │
│                   Gaussian Splatting model                             │
│     generfacto    Generative Text to NeRF model                        │
│     in2n          Instruct-NeRF2NeRF primary method: uses LPIPS, IP2P  │
│                   at full precision                                    │
│     in2n-small    Instruct-NeRF2NeRF small method, uses LPIPs, IP2P at │
│                   half precision                                       │
│     in2n-tiny     Instruct-NeRF2NeRF tiny method, does not use LPIPs,  │
│                   IP2P at half precision                               │
│     instant-ngp   Implementation of Instant-NGP. Recommended real-time │
│                   model for unbounded scenes.                          │
│     instant-ngp-bounded                                                │
│                   Implementation of Instant-NGP. Recommended for       │
│                   bounded real and synthetic scenes                    │
│     mipnerf       High quality model for bounded scenes. (slow)        │
│     nerfacto      Recommended real-time model tuned for real captures. │
│                   This model will be continually updated.              │
│     nerfacto-big                                                       │
│     nerfacto-huge                                                      │
│     neus          Implementation of NeuS. (slow)                       │
│     neus-facto    Implementation of NeuS-Facto. (slow)                 │
│     phototourism  Uses the Phototourism data.                          │
│     semantic-nerfw                                                     │
│                   Predicts semantic segmentations and filters out      │
│                   transient objects.                                   │
│     tensorf       tensorf                                              │
│     vanilla-nerf  Original NeRF model. (slow)                          │
│     kplanes       [External] K-Planes model tuned to static blender    │
│                   scenes                                               │
│     kplanes-dynamic                                                    │
│                   [External] K-Planes model tuned to dynamic DNeRF     │
│                   scenes                                               │
│     lerf          [External] LERF with OpenCLIP ViT-B/16, used in      │
│                   paper                                                │
│     lerf-big      [External] LERF with OpenCLIP ViT-L/14               │
│     lerf-lite     [External] LERF with smaller network and less LERF   │
│                   samples                                              │
│     nerfplayer-nerfacto                                                │
│                   [External] NeRFPlayer with nerfacto backbone         │
│     nerfplayer-ngp                                                     │
│                   [External] NeRFPlayer with instang-ngp-bounded       │
│                   backbone                                             │
│     tetra-nerf    [External] Tetra-NeRF. Different sampler - faster    │
│                   and better                                           │
│     tetra-nerf-original                                                │
│                   [External] Tetra-NeRF. Official implementation from  │
│                   the paper                                            │
│     volinga       [External] Real-time rendering model from Volinga.   │
│                   Directly exportable to NVOL format at                │
│                   https://volinga.ai/                                  │
╰────────────────────────────────────────────────────────────────────────╯

Once Nerfstudio is installed, training can begin. Train on the bear dataset with: ns-train nerfacto --data data/bear

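During training, open https://viewer.nerf.studio/versions/23-05-15-1/?websocket_url=ws://localhost:7007 to monitor the result in real time ⁵. Note that when training on a remote server, the viewer port must be forwarded to the local machine before the training process can be watched ⁶ ⁷.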
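
A quick way to confirm that the forwarding works is to test whether the viewer port accepts connections on the local machine. The helper below is a small sketch using only the standard library; the function name viewer_port_reachable is hypothetical, and port 7007 is taken from the websocket URL above. Forwarding itself is typically set up with something like ssh -L 7007:localhost:7007 <user>@<server>, where the user and server are placeholders for your own connection details.

```python
import socket


def viewer_port_reachable(host="localhost", port=7007, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if viewer_port_reachable():
        print("Port 7007 is reachable; the viewer URL should be able to connect.")
    else:
        print("Port 7007 is not reachable; check that training runs and the port is forwarded.")
```

This only checks that something is listening on the port. It does not verify that the listener is the Nerfstudio viewer.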
After the NeRF scene has been trained, it can be edited: ns-train in2n --data data/bear --load-dir outputs/bear/nerfacto/2023-12-17_230904/nerfstudio_models --pipeline.prompt "Turn the bear into a polar bear" --pipeline.guidance-scale 7.5 --pipeline.image-guidance-scale 1.5. However, GPU memory is limited, and loading the full model runs out of memory ⁸ ⁹ ¹⁰.
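
Before launching an edit, it can help to see how much GPU memory is actually free. The sketch below parses the CSV output of nvidia-smi; it assumes nvidia-smi is on the PATH, the function names are hypothetical, and the sample numbers in the usage comment are made up for illustration rather than measured.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'total, used' rows (in MiB) into (total, used, free) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        rows.append((total, used, total - used))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory; assumes nvidia-smi is installed."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample row, not a real measurement.
    print(parse_gpu_memory("24564, 1200\n"))
    # On a machine with an NVIDIA GPU, use: print(query_gpu_memory())
```

If little memory is free before training even starts, another process is probably holding it, and freeing that memory may be worth trying before switching models.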

The authors anticipated this and provide two variants that use less memory at the cost of quality, in2n-small and in2n-tiny: ns-train in2n-small --data data/bear --load-dir outputs/bear/nerfacto/2023-12-17_230904/nerfstudio_models --pipeline.prompt "Turn the bear into a polar bear" --pipeline.guidance-scale 7.5 --pipeline.image-guidance-scale 1.5.
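
Since the three variants take the same arguments, switching between them is easy to script. The following sketch builds the command shown above as an argument list; build_edit_command is a hypothetical helper, and it uses only the flags that appear in this post.

```python
import shlex

# Method names as listed in the ns-train help output.
IN2N_VARIANTS = ("in2n", "in2n-small", "in2n-tiny")


def build_edit_command(variant, data, load_dir, prompt,
                       guidance_scale=7.5, image_guidance_scale=1.5):
    """Build the ns-train argument list for an Instruct-NeRF2NeRF edit."""
    if variant not in IN2N_VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return [
        "ns-train", variant,
        "--data", data,
        "--load-dir", load_dir,
        "--pipeline.prompt", prompt,
        "--pipeline.guidance-scale", str(guidance_scale),
        "--pipeline.image-guidance-scale", str(image_guidance_scale),
    ]


if __name__ == "__main__":
    cmd = build_edit_command(
        "in2n-small",
        data="data/bear",
        load_dir="outputs/bear/nerfacto/2023-12-17_230904/nerfstudio_models",
        prompt="Turn the bear into a polar bear",
    )
    print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
    # To launch from Python instead: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with the prompt, which contains spaces.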

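Results:

The original NeRF scene was trained first; 30,000 iterations take about 1 h.
To visualize the result after training, load the viewer page with ns-viewer --load-config outputs/bear/nerfacto/2023-12-17_230904/config.yml ¹¹. In the viewer, choose final-path under LOAD PATH and click RENDER to copy the command: ns-render camera-path --load-config outputs/bear/nerfacto/2023-12-17_230904/config.yml --camera-path-filename data/bear/camera_paths/2023-12-17_230904.json --output-path renders/bear/2023-12-17_230904.mp4. The original scene was trained with the full NeRF model, whose parameter count is too large for the available GPU memory, so it could not be rendered to video; only a screenshot was taken for reference.
Editing the 3D scene with the in2n-small model had fully converged by 4,000 iterations, so there was no need to train further (a complete edit would run to 60,000 steps, which is unnecessary); this took about 2 h.
ns-viewer can again be used to visualize the edited 3D scene, and ns-render to render it to video. Because of the GPU memory limit, the video could not be rendered; only a screenshot was taken for reference.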
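
The timings reported above imply very different per-iteration costs for plain NeRF training and for editing. The short calculation below makes that explicit; the timings are the approximate ones from this post, and the extrapolation to 60,000 steps assumes the editing rate stays constant, which is only a rough assumption.

```python
def iterations_per_second(iterations, hours):
    """Average throughput for a run of the given length."""
    return iterations / (hours * 3600.0)


def summarize_timings():
    nerfacto_rate = iterations_per_second(30_000, 1.0)  # original scene
    edit_rate = iterations_per_second(4_000, 2.0)       # in2n-small edit
    return {
        "nerfacto_it_per_s": nerfacto_rate,
        "edit_it_per_s": edit_rate,
        "slowdown": nerfacto_rate / edit_rate,
        # Rough extrapolation, assuming a constant editing rate.
        "full_edit_hours": 60_000 / edit_rate / 3600.0,
    }


if __name__ == "__main__":
    for name, value in summarize_timings().items():
        print(f"{name}: {value:.2f}")
```

On these numbers, an editing iteration is roughly 15 times slower than a plain training iteration, and the full 60,000-step run would take on the order of 30 h, which is why stopping once the edit has converged is attractive.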

¹ MAV3D: Generating 3D Dynamic Scenes from Text Descriptions
² Tancik M, Weber E, Ng E, et al. Nerfstudio: A modular framework for neural radiance field development[C]//ACM SIGGRAPH 2023 Conference Proceedings. 2023: 1-12.
³ 3D face swapping from a single line of text! UC Berkeley proposes "Chat-NeRF": say a sentence and get blockbuster-quality rendering
⁴ Fresh install error #72
⁵ nerfstudio-project | nerfstudio # 2-training-your-first-model
⁶ nerfstudio | Using the viewer # training-on-a-remote-machine
⁷ AutoDL Help Documentation | VSCode Remote Development
⁸ RuntimeError: CUDA out of memory. Tried to allocate 12.50 MiB (GPU 0; 10.92 GiB total capacity; 8.57 MiB already allocated; 9.28 GiB free; 4.68 MiB cached) #16417
⁹ How to avoid "CUDA out of memory" in PyTorch
¹⁰ How to avoid "CUDA out of memory" in PyTorch
¹¹ nerfstudio-project | nerfstudio # Visualize existing run