PyTorch Pyro Bayesian Neural Networks (BNN), SVI: Customizing SVI Objectives and Training Loops (Variational Inference)

Customizing SVI objectives and training loops

Pyro provides support for various optimization-based approaches to Bayesian inference, with Trace_ELBO serving as the basic implementation of SVI (stochastic variational inference). See the docs for more information on the various SVI implementations and SVI tutorials I, II, and III for background on SVI.

In this tutorial we show how advanced users can modify and/or augment the variational objectives (alternatively: loss functions) and the training step implementation provided by Pyro to support special use cases.

  1. Basic SVI Usage

    1. A Lower-Level Pattern

  2. Example: Custom Regularizer

  3. Example: Clipping Gradients

  4. Example: Scaling the Loss

  5. Example: Beta VAE

  6. Example: Mixing Optimizers

  7. Example: Custom ELBO

  8. Example: KL Annealing

Basic SVI Usage

We first review the basic usage pattern of SVI objects in Pyro. We assume that the user has defined a model and a guide. The user then creates an optimizer and an SVI object:

import pyro

optimizer = pyro.optim.Adam({"lr": 0.001, "betas": (0.90, 0.999)})
svi = pyro.infer.SVI(model, guide, optimizer, loss=pyro.infer.Trace_ELBO())

Gradient steps can then be taken by calling svi.step(...). The arguments to step() are passed on to both model and guide.

A Lower-Level Pattern

The nice thing about the above pattern is that it allows Pyro to take care of various details for us, for example:

  • pyro.optim.Adam dynamically creates a new torch.optim.Adam optimizer whenever a new parameter is encountered

  • SVI.step() zeros gradients between gradient steps

If we want more control, we can directly call the differentiable_loss method of the various ELBO classes. For example, this optimization loop:

svi = pyro.infer.SVI(model, guide, optimizer, loss=pyro.infer.Trace_ELBO())
for i in range(n_iter):
    loss = svi.step(X_train, y_train)

is equivalent to this low-level pattern:

import pyro
import torch

loss_fn = lambda model, guide: pyro.infer.Trace_ELBO().differentiable_loss(model, guide, X_train, y_train)
# run the loss once so that the parameters are created and captured
with pyro.poutine.trace(param_only=True) as param_capture:
    loss = loss_fn(model, guide)
# use a list (not a set) so the optimizer sees the parameters in a fixed order
params = [site["value"].unconstrained()
          for site in param_capture.trace.nodes.values()]
optimizer = torch.optim.Adam(params, lr=0.001, betas=(0.90, 0.999))
for i in range(n_iter):
    # compute loss
    loss = loss_fn(model, guide)
    loss.backward()
    # take a step and zero the parameter gradients
    optimizer.step()
    optimizer.zero_grad()

Example: Custom Regularizer

Suppose we want to add a custom regularization term to the SVI loss. With the low-level pattern above, this is easy to do. First we define our regularizer:

def my_custom_L2_regularizer(my_parameters):
    reg_loss = 0.0
    for param in my_parameters:
        reg_loss = reg_loss + param.pow(2.0).sum()
    return reg_loss

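Then the only change we need to make to the training loop is the following, where my_parameters is the collection of parameters to regularize (for example, params from the low-level pattern):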

- loss = loss_fn(model, guide)
+ loss = loss_fn(model, guide) + my_custom_L2_regularizer(my_parameters)

Example: Clipping Gradients

For some models the loss gradient can explode during training, leading to overflow and NaN values. One way to protect against this is gradient clipping. The optimizers in pyro.optim take an optional dictionary of clip_args. It allows clipping either the gradient norm (clip_norm) or the gradient value (clip_value) to fall within the given limit.

To change the basic example above:

- optimizer = pyro.optim.Adam({"lr": 0.001, "betas": (0.90, 0.999)})
+ optimizer = pyro.optim.Adam({"lr": 0.001, "betas": (0.90, 0.999)}, {"clip_norm": 10.0})

Further variants of gradient clipping can also be implemented manually by modifying the low-level pattern described above.
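One manual variant is to call PyTorch's own utilities between loss.backward() and optimizer.step():

- torch.nn.utils.clip_grad_norm_ rescales all gradients so their joint L2 norm is at most max_norm.
- torch.nn.utils.clip_grad_value_ clamps each gradient entry.

The parameter w and the quadratic loss below are hypothetical stand-ins for the params list and loss_fn(model, guide) of the low-level pattern. They are chosen so that the raw gradient [300, 400] has norm 500.

```python
import torch

# Hypothetical stand-in for the unconstrained parameters collected in the low-level pattern
w = torch.tensor([3.0, 4.0], requires_grad=True)
params = [w]
optimizer = torch.optim.SGD(params, lr=0.01)

# Stand-in for loss_fn(model, guide); its gradient 100 * w equals [300, 400] here
loss = 50.0 * (w ** 2).sum()
loss.backward()

# Rescale gradients so their total L2 norm is at most 10.0; returns the norm before clipping
total_norm = torch.nn.utils.clip_grad_norm_(params, max_norm=10.0)
clipped_norm = w.grad.norm().item()

# Alternative: clamp every gradient entry to [-1.0, 1.0] instead
# torch.nn.utils.clip_grad_value_(params, clip_value=1.0)

optimizer.step()
optimizer.zero_grad()
print(total_norm.item(), clipped_norm)
```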

Example: Scaling the Loss

Depending on the optimization algorithm, the scale of the loss may or may not matter. For example, Adam is largely insensitive to a constant rescaling of the loss, whereas plain SGD is not. Suppose we want to scale our loss function by the number of datapoints before we differentiate it. This is easily done:

- loss = loss_fn(model, guide)
+ loss = loss_fn(model, guide) / N_data

Note that in the case of SVI, where each term in the loss function is a log probability from the model or guide, this same effect can be achieved using poutine.scale. For example, we can use the poutine.scale decorator to scale both the model and guide:

from pyro import poutine

@poutine.scale(scale=1.0/N_data)
def model(*args, **kwargs):
    ...  # same signature and body as before

@poutine.scale(scale=1.0/N_data)
def guide(*args, **kwargs):
    ...  # same signature and body as before

Example: Beta VAE

We can also use poutine.scale to construct non-standard ELBO variational objectives in which, for example, the KL divergence is scaled differently relative to the expected log likelihood. In particular, for the Beta VAE the KL divergence is scaled by a factor beta:

import pyro
import pyro.distributions as dist

def model(data, beta=0.5):
    z_loc, z_scale = ...
    with pyro.poutine.scale(scale=beta):
        z = pyro.sample("z", dist.Normal(z_loc, z_scale))
    pyro.sample("obs", dist.Bernoulli(...), obs=data)

def guide(data, beta=0.5):
    with pyro.poutine.scale(scale=beta):
        z_loc, z_scale = ...
        z = pyro.sample("z", dist.Normal(z_loc, z_scale))

With this choice of model and guide, the log densities corresponding to the latent variable z that enter into constructing the variational objective via

svi = pyro.infer.SVI(model, guide, optimizer, loss=pyro.infer.Trace_ELBO())

will be scaled by a factor of beta, resulting in a KL divergence that is likewise scaled by beta.

Example: Mixing Optimizers

The various optimizers in pyro.optim allow the user to specify optimization settings (e.g. learning rates) on a per-parameter basis. But what if we want to use different optimization algorithms for different parameters? We can do this using Pyro’s MultiOptimizer (see below), but we can also achieve the same thing if we directly manipulate differentiable_loss:

# adam_parameters and sgd_parameters: disjoint lists of unconstrained parameter tensors
adam = torch.optim.Adam(adam_parameters, lr=0.001, betas=(0.90, 0.999))
sgd = torch.optim.SGD(sgd_parameters, lr=0.0001)
loss_fn = pyro.infer.Trace_ELBO().differentiable_loss
# compute loss
loss = loss_fn(model, guide)
loss.backward()
# take a step and zero the parameter gradients
adam.step()
sgd.step()
adam.zero_grad()
sgd.zero_grad()

For completeness, we also show how we can do the same thing using MultiOptimizer, which allows us to combine multiple Pyro optimizers. Note that since MultiOptimizer uses torch.autograd.grad under the hood (instead of torch.Tensor.backward()), it has a slightly different interface; in particular the step() method also takes parameters as inputs.

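from pyro.optim.multi import MixedMultiOptimizer

def model():
    pyro.param('a', ...)
    pyro.param('b', ...)
    ...

adam = pyro.optim.Adam({'lr': 0.1})
sgd = pyro.optim.SGD({'lr': 0.01})
optim = MixedMultiOptimizer([(['a'], adam), (['b'], sgd)])
elbo = pyro.infer.Trace_ELBO()
with pyro.poutine.trace(param_only=True) as param_capture:
    loss = elbo.differentiable_loss(model, guide)
# step() expects a dict mapping each param name to its unconstrained value
params = {name: pyro.param(name).unconstrained()
          for name in param_capture.trace.nodes.keys()}
optim.step(loss, params)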

Example: Custom ELBO

In the previous three examples we bypassed creating an SVI object and directly manipulated the differentiable loss function provided by an ELBO implementation. Another thing we can do is create custom ELBO implementations and pass those into the SVI machinery. For example, a simplified version of a Trace_ELBO loss function might look as follows:

from pyro import poutine
from pyro.infer import SVI

# note that simple_elbo takes a model, a guide, and their respective arguments as inputs
def simple_elbo(model, guide, *args, **kwargs):
    # run the guide and trace its execution
    guide_trace = poutine.trace(guide).get_trace(*args, **kwargs)
    # run the model and replay it against the samples from the guide
    model_trace = poutine.trace(
        poutine.replay(model, trace=guide_trace)).get_trace(*args, **kwargs)
    # construct the elbo loss function
    return -1*(model_trace.log_prob_sum() - guide_trace.log_prob_sum())

svi = SVI(model, guide, optim, loss=simple_elbo)

Note that this is basically what the elbo implementation in “mini-pyro” looks like.

Example: KL Annealing

In the Deep Markov Model Tutorial, the ELBO variational objective is modified during training. In particular, the various KL-divergence terms between latent random variables are scaled downward (i.e. annealed) relative to the log probabilities of the observed data. In the tutorial this is accomplished using poutine.scale. We can accomplish the same thing by defining a custom loss function. This latter option is not a very elegant pattern, but we include it anyway to show the flexibility we have at our disposal.

from pyro import poutine
from pyro.infer import SVI

def simple_elbo_kl_annealing(model, guide, *args, **kwargs):
    # get the annealing factor and latents to anneal from the keyword
    # arguments passed to the model and guide
    annealing_factor = kwargs.pop('annealing_factor', 1.0)
    latents_to_anneal = kwargs.pop('latents_to_anneal', [])
    # run the guide and replay the model against the guide
    guide_trace = poutine.trace(guide).get_trace(*args, **kwargs)
    model_trace = poutine.trace(
        poutine.replay(model, trace=guide_trace)).get_trace(*args, **kwargs)

    elbo = 0.0
    # loop through all the sample sites in the model and guide traces and
    # construct the loss; note that we scale all the log probabilities of
    # sample sites in `latents_to_anneal` by the factor `annealing_factor`
    for site in model_trace.nodes.values():
        if site["type"] == "sample":
            factor = annealing_factor if site["name"] in latents_to_anneal else 1.0
            elbo = elbo + factor * site["fn"].log_prob(site["value"]).sum()
    for site in guide_trace.nodes.values():
        if site["type"] == "sample":
            factor = annealing_factor if site["name"] in latents_to_anneal else 1.0
            elbo = elbo - factor * site["fn"].log_prob(site["value"]).sum()
    return -elbo

svi = SVI(model, guide, optim, loss=simple_elbo_kl_annealing)
svi.step(other_args, annealing_factor=0.2, latents_to_anneal=["my_latent"])
