PyTorch LSTM (Part 4)


1. Sequence Models and Long Short-Term Memory Networks

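So far, we have seen various feed-forward networks, that is, networks that maintain no state at all. This may not be the behavior we want. Sequence models are central to NLP: they are models in which there is some dependence through time between the inputs. The classic example of a sequence model is the hidden Markov model for part-of-speech tagging; another is the conditional random field.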

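A recurrent neural network is a network that maintains some kind of state. For example, its output can be used as part of the next input, so that information propagates along as the network passes over the sequence. In the case of an LSTM, each element of the sequence has a corresponding hidden state $h_t$, which in principle can contain information from arbitrary earlier points in the sequence. We can use the hidden state to predict words in a language model, part-of-speech tags, slot labels, and more.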
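The carried-state idea can be seen without any library. Below is a minimal pure-Python sketch of a toy scalar recurrent cell. Note that it is a plain (Elman-style) RNN, not an LSTM, and the weights w_x, w_h, b and the helper names rnn_step and run are illustrative assumptions.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    # One step of a toy scalar recurrent cell: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run(sequence, h0=0.0):
    # Carry the hidden state from one element to the next and collect every h_t
    h = h0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h)
        states.append(h)
    return states

# Two sequences that differ only in their first element
a = run([1.0, 0.0, 0.0])
b = run([-1.0, 0.0, 0.0])
print(a[-1], b[-1])
```

Because the two runs differ only in their first input, the fact that their final states differ shows that information from the first step is still present in the hidden state two steps later.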

2. LSTM in PyTorch

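PyTorch's LSTM expects a 3D tensor as input. By default (batch_first=False), the first dimension is the sequence length (the number of words), the second is the batch size, and the third is the dimension of each word vector. For the theory behind LSTMs, see the blog post on recurrent neural networks. If we want to run a sequence model over the sentence "The cow jumped", the input is the three word vectors stacked in order along the first dimension: for a single sentence (batch size 1), a tensor of shape (3, 1, d), where d is the word-vector dimension.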
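As a sketch of that layout, the snippet below stacks three word vectors for "The cow jumped" into a (3, 1, 3) tensor and feeds it to a 2-layer nn.LSTM. The word vectors are random stand-ins (an assumption; in practice they would come from an embedding layer), and the sizes mirror the nn.LSTM(3, 4, 2) used in the code that follows.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical 3-dimensional word vectors for "The", "cow", "jumped"
q_the, q_cow, q_jumped = torch.randn(3), torch.randn(3), torch.randn(3)

# Stack along dim 0 (sequence), then add a batch dimension of size 1 at dim 1:
# shape (seq_len=3, batch=1, input_size=3), the default layout (batch_first=False)
inputs = torch.stack([q_the, q_cow, q_jumped]).unsqueeze(1)
print(inputs.shape)  # torch.Size([3, 1, 3])

lstm = nn.LSTM(input_size=3, hidden_size=4, num_layers=2)
out, (h_n, c_n) = lstm(inputs)  # (h_0, c_0) default to zeros

print(out.shape)  # torch.Size([3, 1, 4]): h_t of the last layer at every step
print(h_n.shape)  # torch.Size([2, 1, 4]): final h_t of each of the 2 layers

# The last time step of out is the final hidden state of the top layer
print(torch.allclose(out[-1], h_n[-1]))  # True

# With batch_first=True the same sentence would be laid out as (batch, seq_len, input_size)
print(inputs.transpose(0, 1).shape)  # torch.Size([1, 3, 3])
```

Note that out only holds the top layer, while h_n and c_n hold the final state of every layer.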

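# nn.LSTM: the underlying math and a detailed explanation of its input/output arguments
import torch
import torch.nn as nn

lstm = nn.LSTM(3, 4, 2)  # input size 3, hidden (output) size 4, 2 layers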
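r'''
nn.LSTM applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:

.. math::
    \begin{array}{ll} \\
        i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
        f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
        g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
        o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
        c_t = f_t \odot c_{t-1} + i_t \odot g_t \\
        h_t = o_t \odot \tanh(c_t) \\
    \end{array}

where :math:`h_t` is the hidden state at time `t`, :math:`c_t` is the cell state at time `t`,
:math:`x_t` is the input at time `t`, :math:`h_{t-1}` is the hidden state at time `t-1`
(or the initial hidden state at time 0), and :math:`i_t`, :math:`f_t`, :math:`g_t`, :math:`o_t`
are the input, forget, cell, and output gates, respectively. :math:`\sigma` is the sigmoid
function and :math:`\odot` is the Hadamard (elementwise) product.

Args:
    input_size: the number of features in the input `x`
    hidden_size: the number of features in the hidden state `h`
    num_layers: number of recurrent layers. E.g., setting ``num_layers=2`` stacks two LSTMs
        to form a "stacked LSTM", where the second LSTM takes the outputs of the first and
        computes the final results. Default: 1
    bias: if ``False``, the layer does not use the bias weights `b_ih` and `b_hh`. Default: ``True``
    batch_first: if ``True``, the input and output tensors are laid out as (batch, seq, feature)
        instead of (seq, batch, feature); this does not apply to the hidden and cell states.
        Default: ``False``
    dropout: if non-zero, adds a Dropout layer on the outputs of each LSTM layer except the
        last one, with dropout probability equal to :attr:`dropout`. Default: 0
    bidirectional: if ``True``, the LSTM becomes bidirectional. Default: ``False``

Inputs: input, (h_0, c_0)
    - input of shape `(seq_len, batch, input_size)`: the features of the input sequence.
    - h_0 of shape `(num_layers * num_directions, batch, hidden_size)`: the initial hidden
      state for each element in the batch. num_directions is 2 for a bidirectional LSTM,
      otherwise 1.
    - c_0 of shape `(num_layers * num_directions, batch, hidden_size)`: the initial cell
      state for each element in the batch.
    If `(h_0, c_0)` is not provided, both default to zero tensors.

Outputs: output, (h_n, c_n)
    - output of shape `(seq_len, batch, num_directions * hidden_size)`: the output features
      `h_t` from the last layer of the LSTM, for each `t`. If a
      :class:`torch.nn.utils.rnn.PackedSequence` is given as input, the output is also a
      packed sequence. In the unpacked case, the directions can be separated with
      ``output.view(seq_len, batch, num_directions, hidden_size)``, where forward and
      backward are direction 0 and 1, respectively.
    - h_n of shape `(num_layers * num_directions, batch, hidden_size)`: the hidden state at
      `t = seq_len` for every layer (and direction); the layers can be separated with
      ``h_n.view(num_layers, num_directions, batch, hidden_size)``.
    - c_n of shape `(num_layers * num_directions, batch, hidden_size)`: the cell state at
      `t = seq_len` for every layer (and direction).
'''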
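inputs = [torch.randn(1, 3) for _ in range(5)]  # one sentence of 5 words, each a 3-dim vector (a batch of one)
# Initialize the hidden and cell states; each has shape (num_layers * num_directions, batch, hidden_size)
hidden0 = (torch.randn(2, 1, 4), torch.randn(2, 1, 4))

# Step through the sequence one element at a time; after each step,
# hidden holds the (h, c) states of both layers
hidden = hidden0
for i, input_x in enumerate(inputs):
    input_x = input_x.view(1, 1, -1)  # (seq_len=1, batch=1, input_size=3)
    if i == 0:
        input_all = input_x
    else:
        input_all = torch.cat((input_all, input_x), dim=0)  # grows to (i + 1, 1, 3)
    out, hidden = lstm(input_x, hidden)
    print("after:", i, out, hidden)
print(input_all.size())  # torch.Size([5, 1, 3])

# Sample printed output from the author's run at step 1 (values are random, since no seed is set):
# before: 1 tensor([[[ 0.0645, -0.1050,  0.1427, -0.0289]]], grad_fn=<StackBackward>) (tensor([[[ 0.0465,  0.0638,  0.3326, -0.2273]],[[ 0.0645, -0.1050,  0.1427, -0.0289]]], grad_fn=<StackBackward>), tensor([[[ 0.0992,  0.2878,  0.9721, -0.4657]],[[ 0.1024, -0.3557,  0.6139, -0.0391]]], grad_fn=<StackBackward>))
# after: 1 tensor([[[ 0.1458, -0.0954,  0.1286,  0.1728]]], grad_fn=<StackBackward>) (tensor([[[ 0.0157,  0.2103,  0.2820, -0.2021]],[[ 0.1458, -0.0954,  0.1286,  0.1728]]], grad_fn=<StackBackward>), tensor([[[ 0.0302,  0.6760,  0.6505, -0.4757]],[[ 0.2552, -0.2182,  0.3291,  0.3289]]], grad_fn=<StackBackward>))

# Alternatively, feed the whole sequence at once. Start again from the same initial state;
# reusing the hidden left over from the loop would continue from the end of the sentence.
out, hidden = lstm(input_all, hidden0)
print('all seq', out, hidden)  # out: (seq_len, batch, hidden_size) = (5, 1, 4); out[-1] matches the last step above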
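To connect the equations in the docstring with what nn.LSTM actually computes, the sketch below recomputes one time step by hand from the layer's parameters and compares it with the module's result. It relies on PyTorch storing the four gate weights stacked in the order (i, f, g, o) in weight_ih_l0, weight_hh_l0, bias_ih_l0 and bias_hh_l0; the sizes and the seed are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 3, 4
lstm = nn.LSTM(input_size, hidden_size)  # one layer, one direction

x = torch.randn(1, 1, input_size)        # a single time step: (seq_len=1, batch=1, input_size)
h0 = torch.randn(1, 1, hidden_size)
c0 = torch.randn(1, 1, hidden_size)

with torch.no_grad():
    out, (h1, c1) = lstm(x, (h0, c0))

    # Split the stacked parameters into the per-gate matrices and biases, order (i, f, g, o)
    W_ii, W_if, W_ig, W_io = lstm.weight_ih_l0.chunk(4, dim=0)  # each (hidden_size, input_size)
    W_hi, W_hf, W_hg, W_ho = lstm.weight_hh_l0.chunk(4, dim=0)  # each (hidden_size, hidden_size)
    b_ii, b_if, b_ig, b_io = lstm.bias_ih_l0.chunk(4)
    b_hi, b_hf, b_hg, b_ho = lstm.bias_hh_l0.chunk(4)

    x_t, h_prev, c_prev = x[0, 0], h0[0, 0], c0[0, 0]
    i_t = torch.sigmoid(W_ii @ x_t + b_ii + W_hi @ h_prev + b_hi)  # input gate
    f_t = torch.sigmoid(W_if @ x_t + b_if + W_hf @ h_prev + b_hf)  # forget gate
    g_t = torch.tanh(W_ig @ x_t + b_ig + W_hg @ h_prev + b_hg)     # cell candidate
    o_t = torch.sigmoid(W_io @ x_t + b_io + W_ho @ h_prev + b_ho)  # output gate
    c_t = f_t * c_prev + i_t * g_t                                 # new cell state
    h_t = o_t * torch.tanh(c_t)                                    # new hidden state

print(torch.allclose(h_t, h1[0, 0], atol=1e-6), torch.allclose(c_t, c1[0, 0], atol=1e-6))  # True True
```

The check also makes clear that output contains $h_t$, not the output gate $o_t$: for a single layer, out[0, 0] is exactly the hand-computed h_t.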

3. Using an LSTM for Part-of-Speech Tagging

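Here is an example of part-of-speech tagging with an LSTM. For now it does not use algorithms such as Viterbi or Forward-Backward. It does use word embeddings; for how they are implemented, see the previous post, PyTorch Word Embeddings (Part 3).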
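The embedding step used by the tagger can be sketched on its own: words are mapped to integer ids, nn.Embedding turns the ids into vectors, and a view adds the batch dimension that nn.LSTM expects. This mirrors the word2id helper and the forward method of the tagging code; the embedding size of 6 is chosen to match EMBEDDING_DIM there, and the weights are random. Note that "The" and "the" get different ids because no lowercasing is done.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
sentence = "The dog ate the apple".split()

# Map each distinct word to an integer index, in order of first appearance
w2i = {}
for w in sentence:
    if w not in w2i:
        w2i[w] = len(w2i)

embedding = nn.Embedding(num_embeddings=len(w2i), embedding_dim=6)
ids = torch.tensor([w2i[w] for w in sentence], dtype=torch.long)  # shape (5,)
embeds = embedding(ids)                                           # shape (5, 6): one vector per word
lstm_input = embeds.view(len(sentence), 1, -1)                    # shape (5, 1, 6): (seq_len, batch, input_size)
print(ids.tolist(), lstm_input.shape)  # [0, 1, 2, 3, 4] torch.Size([5, 1, 6])
```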

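The setup is as follows: the input sentence is $w_1, \dots, w_M$, where $w_i \in V$, our vocabulary. $T$ is the tag set, $y_i$ is the tag of word $w_i$, and $\hat{y}_i$ denotes our prediction of the tag of $w_i$.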

The output is the predicted tag sequence $\hat{y}_1, \dots, \hat{y}_M$, where $\hat{y}_i \in T$.

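To make the prediction, run the LSTM over the sentence and denote the hidden state at step $i$ by $h_i$. Assign each tag a unique index. The prediction rule for $\hat{y}_i$ is then

$$\hat{y}_i = \operatorname{argmax}_j \left( \log \operatorname{Softmax}(A h_i + b) \right)_j$$

that is, apply an affine map to the hidden state, take the log softmax, and pick the tag with the largest value. In the code, $A$ and $b$ are the weights of the lstm2tag linear layer, and the model is trained with the negative log-likelihood loss (nn.NLLLoss).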
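Once the model outputs log-probabilities, decoding is a row-wise argmax. The scores below are made-up values standing in for $A h_i + b$ for a three-word sentence, and i2t is a hypothetical inverse of the t2i dictionary used in the code.

```python
import torch
import torch.nn.functional as F

t2i = {"DET": 0, "NN": 1, "V": 2}
i2t = {i: t for t, i in t2i.items()}  # inverse map: index -> tag name

# Hypothetical scores A h_i + b for a 3-word sentence (one row per word, one column per tag)
scores = torch.tensor([[2.0, 0.1, -1.0],
                       [0.0, 3.0, 0.5],
                       [-0.5, 0.2, 1.5]])
log_probs = F.log_softmax(scores, dim=1)  # log Softmax over the tag dimension
pred = log_probs.argmax(dim=1)            # y_hat[i] = argmax_j log_probs[i, j]
print([i2t[i] for i in pred.tolist()])    # ['DET', 'NN', 'V']
```

Because log softmax is monotonic, the argmax of the raw scores gives the same tags; the log softmax matters for the NLLLoss during training.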

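# Example: an LSTM for part-of-speech tagging
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

def word2id(data, w2i):
    # Give every new word in the training data the next free index
    for ws, tag in data:
        for w in ws:
            if w not in w2i:
                w2i[w] = len(w2i)

# Part-of-speech tagging data
train = [("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
         ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])]
w2i = {}
t2i = {"DET": 0, "NN": 1, "V": 2}
word2id(train, w2i)
EMBEDDING_DIM = 6
HIDDEN_DIM = 6

class LstmTag(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tag_size):
        super(LstmTag, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.lstm2tag = nn.Linear(hidden_dim, tag_size)  # the affine map A h_i + b

    def forward(self, x):
        embeds = self.embedding(x)                       # (seq_len, embedding_dim)
        o, h = self.lstm(embeds.view(len(x), 1, -1))     # o: (seq_len, 1, hidden_dim)
        tags = self.lstm2tag(o.view(len(x), -1))         # (seq_len, tag_size)
        tags_prob = F.log_softmax(tags, dim=1)           # log-probabilities over the tags
        return tags_prob

model = LstmTag(EMBEDDING_DIM, HIDDEN_DIM, len(w2i), len(t2i))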
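loss_function = nn.NLLLoss()  # expects log-probabilities, which is why forward() uses log_softmax
opt = optim.SGD(model.parameters(), lr=0.1)
for input_x, y in train:
    ids = torch.tensor([w2i[w] for w in input_x], dtype=torch.long)
    targets = torch.tensor([t2i[t] for t in y], dtype=torch.long)
    opt.zero_grad()  # clear gradients left over from the previous sentence
    probs = model(ids)  # (seq_len, tag_size) log-probabilities
    print(probs)
    loss = loss_function(probs, targets)
    loss.backward()
    opt.step()
    print(loss)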
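The loop above makes a single pass over the data. Below is a minimal sketch of training for several epochs and then tagging a sentence. To keep it self-contained it re-declares the same architecture (embedding, LSTM, linear, log softmax) under the hypothetical name Tagger; the 300 epochs, the seed, and the helpers to_ids and total_loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
train = [("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
         ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])]
t2i = {"DET": 0, "NN": 1, "V": 2}
w2i = {}
for words, _ in train:
    for w in words:
        w2i.setdefault(w, len(w2i))  # next free index for unseen words

class Tagger(nn.Module):
    # Same structure as LstmTag: embedding -> LSTM -> linear -> log_softmax
    def __init__(self, vocab_size, tag_size, embedding_dim=6, hidden_dim=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.lstm2tag = nn.Linear(hidden_dim, tag_size)

    def forward(self, ids):
        out, _ = self.lstm(self.embedding(ids).view(len(ids), 1, -1))
        return F.log_softmax(self.lstm2tag(out.view(len(ids), -1)), dim=1)

def to_ids(seq, mapping):
    return torch.tensor([mapping[s] for s in seq], dtype=torch.long)

model = Tagger(len(w2i), len(t2i))
loss_function = nn.NLLLoss()
opt = optim.SGD(model.parameters(), lr=0.1)

def total_loss():
    # Sum of the NLL loss over the training set, without tracking gradients
    with torch.no_grad():
        return sum(loss_function(model(to_ids(x, w2i)), to_ids(y, t2i)).item() for x, y in train)

loss_before = total_loss()
for epoch in range(300):  # many passes over the tiny training set
    for x, y in train:
        opt.zero_grad()
        loss = loss_function(model(to_ids(x, w2i)), to_ids(y, t2i))
        loss.backward()
        opt.step()
loss_after = total_loss()

# Predict tags for the first training sentence
i2t = {i: t for t, i in t2i.items()}
with torch.no_grad():
    pred = model(to_ids(train[0][0], w2i)).argmax(dim=1)
print(loss_before, loss_after)
print([i2t[i] for i in pred.tolist()])
```

On such a tiny dataset the model usually ends up reproducing the training tags (DET NN V DET NN for the first sentence), but the exact result depends on the random initialization.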
