HQ-Edit、MaxFusion、Ctrl-Adapter、EdgeRelight360、Video2Game

2024-04-23 01:20

本文主要是介绍HQ-Edit、MaxFusion、Ctrl-Adapter、EdgeRelight360、Video2Game,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

本文首发于公众号:机器感知

HQ-Edit、MaxFusion、Ctrl-Adapter、EdgeRelight360、Video2Game

图片

Taming Latent Diffusion Model for Neural Radiance Field Inpainting

图片

Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Despite some recent work showing preliminary success in editing a reconstructed NeRF with diffusion prior, they remain struggling to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic contents from the diffusion model, which hinders the radiance field from converging to a crisp and deterministic geometry. Moreover, applying latent diffusion models on real data often yields a textural shift incoherent to the image condition due to auto-encoding errors. These two problems are further reinforced with the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model's stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. During the analyses, we also found the commonly used pixel and perceptual losses are harmful in the NeRF inpaintin......

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

图片

This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches relying on attribute guidance or human feedback on building datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure its high quality, diverse examples are first collected online, expanded, and then used to create high-quality diptychs featuring input and output images with detailed text prompts, followed by precise alignment ensured through post-processing. In addition, we propose two evaluation metrics, Alignment and Coherence, to quantitatively assess the quality of image edit pairs using GPT-4V. HQ-Edits high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing models. For example, an HQ-Edit finetuned InstructPix2Pix can attain state-of-the-art image editing performan......

MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion  Models

图片

Large diffusion-based Text-to-Image (T2I) models have shown impressive generative powers for text-to-image generation as well as spatially conditioned image generation. For most applications, we can train the model end-toend with paired data to obtain photorealistic generation quality. However, to add an additional task, one often needs to retrain the model from scratch using paired data across all modalities to retain good generation performance. In this paper, we tackle this issue and propose a novel strategy to scale a generative model across new tasks with minimal compute. During our experiments, we discovered that the variance maps of intermediate feature maps of diffusion models capture the intensity of conditioning. Utilizing this prior information, we propose MaxFusion, an efficient strategy to scale up text-to-image generation models to accommodate new modality conditions. Specifically, we combine aligned features of multiple models, hence bringing a compositional ef......

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse  Controls to Any Diffusion Model

图片

ControlNets are widely used for adding spatial control in image generation with different conditions, such as depth maps, canny edges, and human poses. However, there are several challenges when leveraging the pretrained image ControlNets for controlled video generation. First, pretrained ControlNet cannot be directly plugged into new backbone models due to the mismatch of feature spaces, and the cost of training ControlNets for new backbones is a big burden. Second, ControlNet features for different frames might not effectively handle the temporal consistency. To address these challenges, we introduce Ctrl-Adapter, an efficient and versatile framework that adds diverse controls to any image/video diffusion models, by adapting pretrained ControlNets (and improving temporal alignment for videos). Ctrl-Adapter provides diverse capabilities including image control, video control, video control with sparse frames, multi-condition control, compatibility with different backbones, a......

EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for  Real-Time On-Device Video Portrait Relighting

图片

In this paper, we present EdgeRelight360, an approach for real-time video portrait relighting on mobile devices, utilizing text-conditioned generation of 360-degree high dynamic range image (HDRI) maps. Our method proposes a diffusion-based text-to-360-degree image generation in the HDR domain, taking advantage of the HDR10 standard. This technique facilitates the generation of high-quality, realistic lighting conditions from textual descriptions, offering flexibility and control in portrait video relighting task. Unlike the previous relighting frameworks, our proposed system performs video relighting directly on-device, enabling real-time inference with real 360-degree HDRI maps. This on-device processing ensures both privacy and guarantees low runtime, providing an immediate response to changes in lighting conditions or user inputs. Our approach paves the way for new possibilities in real-time video applications, including video conferencing, gaming, and augmented reality, ......

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible  Environment from a Single Video

图片

Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes. In this paper, we present Video2Game, a novel approach that automatically converts videos of real-world scenes into realistic and interactive game environments. At the heart of our system are three core components:(i) a neural radiance fields (NeRF) module that effectively captures the geometry and visual appearance of the scene; (ii) a mesh module that distills the knowledge from NeRF for faster rendering; and (iii) a physics module that models the interactions and physical dynamics among the objects. By following the carefully designed pipeline, one can construct an interactable and actionable digital replica of the real world. We benchmark our system on both indoor and large-scale outdoor scenes. We show that we can not only produce highly-realistic renderings in real-time, but also build interactive games on top. ......

Equipping Diffusion Models with Differentiable Spatial Entropy for  Low-Light Image Enhancement

图片

Image restoration, which aims to recover high-quality images from their corrupted counterparts, often faces the challenge of being an ill-posed problem that allows multiple solutions for a single input. However, most deep learning based works simply employ l1 loss to train their network in a deterministic way, resulting in over-smoothed predictions with inferior perceptual quality. In this work, we propose a novel method that shifts the focus from a deterministic pixel-by-pixel comparison to a statistical perspective, emphasizing the learning of distributions rather than individual pixel values. The core idea is to introduce spatial entropy into the loss function to measure the distribution difference between predictions and targets. To make this spatial entropy differentiable, we employ kernel density estimation (KDE) to approximate the probabilities for specific intensity values of each pixel with their neighbor areas. Specifically, we equip the entropy with diffusion model......

这篇关于HQ-Edit、MaxFusion、Ctrl-Adapter、EdgeRelight360、Video2Game的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/927399

相关文章

MFC中Spin Control控件使用,同时数据在Edit Control中显示

实现mfc spin control 上下滚动,只需捕捉spin control 的 UDN_DELTAPOD 消息,如下:  OnDeltaposSpin1(NMHDR *pNMHDR, LRESULT *pResult) {  LPNMUPDOWN pNMUpDown = reinterpret_cast(pNMHDR);  // TODO: 在此添加控件通知处理程序代码    if

禁止 CTRL+ALT+DEL

在win98 里可以用   Public Declare Function SystemParametersInfo Lib "user32" Alias "SystemParametersInfoA" (ByVal uAction As Long, ByVal uParam As Long, ByRef lpvParam As Any, ByVal fuWinIni As Long)

超越IP-Adapter!阿里提出UniPortrait,可通过文本定制生成高保真的单人或多人图像。

阿里提出UniPortrait,能根据用户提供的文本描述,快速生成既忠实于原图又能灵活调整的个性化人像,用户甚至可以通过简单的句子来描述多个不同的人物,而不需要一一指定每个人的位置。这种设计大大简化了用户的操作,提升了个性化生成的效率和效果。 UniPortrait以统一的方式定制单 ID 和多 ID 图像,提供高保真身份保存、广泛的面部可编辑性、自由格式的文本描述,并且无需预先确定的布局。

【扩散模型(十)】IP-Adapter 源码详解 4 - 训练细节、具体训了哪些层?

系列文章目录 【扩散模型(一)】中介绍了 Stable Diffusion 可以被理解为重建分支(reconstruction branch)和条件分支(condition branch)【扩散模型(二)】IP-Adapter 从条件分支的视角,快速理解相关的可控生成研究【扩散模型(三)】IP-Adapter 源码详解1-训练输入 介绍了训练代码中的 image prompt 的输入部分,即 i

《GOF设计模式》—适配器(ADAPTER)—Delphi源码示例:可插入的Adapter(参数化的适配器)

 示例:可插入的Adapter(参数化的适配器) 实现: c)、参数化的适配器 用一个或多个模块对适配器进行参数化。模块构造支持无子类化的适配。一个模块可以匹配一个请求,并且适配器可以为每个请求存储一个模块。 在本例中意味着,TreeDisplay存储的一个模块用来将一个节点转化成为一个GraphicNode,另外一个模块用来存取一个节点的子节点。   例如,当对一个目录

《GOF设计模式》—适配器(ADAPTER)—Delphi源码示例:可插入的Adapter(使用代理对象)

 示例:可插入的Adapter(使用代理对象) 实现: b)、使用代理对象 在这种方法中,TreeDisplay将访问树结构的请求转发到代理对象。TreeDisplay的客户进行一些选择,并将这些选择提供给代理对象,这样客户就可以对适配加以控制,如下图所示。 例如,有一个DirectoryBrowser,它像前面一样使用TreeDisplay。DirectoryBrows

《GOF设计模式》—适配器(ADAPTER)—Delphi源码示例:可插入的Adapter(使用抽象操作)

 示例:可插入的Adapter(使用抽象操作) 说明: 当其他的类(如A)使用一个类(如C)时,如果所需的假定条件越少,这个类(如C)就更具可复用性。如果将接口匹配构建为一个类(如B),就不需要假定对其他的类可见的是一个相同的接口(如C接口)。也就是说,接口匹配使得我们可以将自己的类(如C)加入到一些现有的系统中去,而这些系统对这个类(如C)的接口可能会有所不同。 A  =〉 C

《GOF设计模式》—适配器(ADAPTER)—Delphi源码示例:绘图编辑器

 示例:绘图编辑器 说明: 有时,为复用而设计的工具箱类不能够被复用的原因仅仅是因为它的接口与专业应用领域所需要的接口不匹配。 例如,有一个绘图编辑器,这个编辑器允许用户绘制和排列基本图元(线、多边型和正 文等)、生成图片和图表。这个绘图编辑器的关键抽象是图形对象。图形对象有一个可编辑的形状,并可以绘制自身。图形对象的接口由一个称为Shape的抽象类定义。绘图编辑器为每一种图形对

《GOF设计模式》—适配器(ADAPTER)—Delphi源码示例:适配器接口

 示例:适配器接口 说明: (1)、定义 将一个类的接口转换成客户希望的另外一个接口。Adapter模式使得原本由于接口不兼容而不能一起工作的那些类可以一起工作。 (2)、结构 对象匹配器依赖于对象组合,如下图所示。 目标Target:定义Client使用的与特定领域相关的接口。 适配器Adapter:对Adaptee的接口与Target接口进行适配。 被适配者A

R-Adapter:零样本模型微调新突破,提升鲁棒性与泛化能力 | ECCV 2024

大规模图像-文本预训练模型实现了零样本分类,并在不同数据分布下提供了一致的准确性。然而,这些模型在下游任务中通常需要微调优化,这会降低对于超出分布范围的数据的泛化能力,并需要大量的计算资源。论文提出新颖的Robust Adapter(R-Adapter),可以在微调零样本模型用于下游任务的同时解决这两个问题。该方法将轻量级模块集成到预训练模型中,并采用新颖的自我集成技术以提高超出分布范围的鲁棒性