Variable Sequence Lengths in TensorFlow


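This post is a translation of: https://danijar.com/variable-sequence-lengths-in-tensorflow/
The gist: when using an RNN for NLP tasks, sentences differ in length. With static_rnn unrolled to a fixed maximum sentence length, every sentence is processed for the same number of steps, which is a poor fit. When a sentence's actual length m is smaller than the maximum length n, we want the output at step m, not at step n (the sentence has already ended at step m), yet static_rnn keeps computing through step n. This wastes a lot of computation and also distorts the output. So we use dynamic_rnn instead. Its advantage is that for steps beyond a sentence's actual length it returns zero outputs directly and does no further computation.
All we need to add is the sequence_length argument: a 1-D vector of size batch size that holds the actual length of each sentence.
Each input has the shape: batch size x max length x features.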

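import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.contrib)

def length(sequence):
    # tf.sign returns 1 for values above 0, -1 for values below 0, and 0 for 0.
    # Sentences shorter than max_length are padded with zero vectors, so
    # reduce_max over the feature axis of the absolute values is 0 for padded
    # steps and positive for real steps (real word vectors are assumed to be
    # non-zero). After sign, real steps are 1 and padded steps are 0.
    used = tf.sign(tf.reduce_max(tf.abs(sequence), 2))
    # Summing over the step axis gives the length of each sentence in the batch.
    length = tf.reduce_sum(used, 1)
    length = tf.cast(length, tf.int32)
    return length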
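
To make the sign / reduce_max / reduce_sum arithmetic concrete, here is a minimal plain-Python sketch of the same idea on nested lists. The function name sequence_lengths and the small batch are hypothetical, chosen only for illustration; it does not use TensorFlow.

```python
def sequence_lengths(batch):
    """Count the non-padded steps of each sequence in a zero-padded batch."""
    lengths = []
    for sequence in batch:
        # A step counts as used if any of its features is non-zero.
        used = [1 if max(abs(x) for x in frame) > 0 else 0 for frame in sequence]
        lengths.append(sum(used))
    return lengths


# Hypothetical batch: 2 sequences, max length 4, 2 features per step.
batch = [
    [[0.5, -1.0], [0.2, 0.0], [0.0, 0.0], [0.0, 0.0]],  # real length 2
    [[1.0, 1.0], [-0.3, 0.4], [0.0, 0.7], [0.0, 0.0]],  # real length 3
]
lengths = sequence_lengths(batch)
print(lengths)  # [2, 3]
```

Note that this counts non-zero steps, so an all-zero frame inside the real part of a sentence would make the result too short. The approach relies on real frames never being entirely zero.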

With that, we can build the RNN with the following code:

max_length = 100
frame_size = 64
num_hidden = 200

sequence = tf.placeholder(tf.float32, [None, max_length, frame_size])
output, state = tf.nn.dynamic_rnn(
    tf.contrib.rnn.GRUCell(num_hidden),
    sequence,
    dtype=tf.float32,
    sequence_length=length(sequence),
)
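
The effect of sequence_length can be illustrated with a toy recurrence in plain Python. This is only a sketch of the behaviour (zero outputs past the real length, with the state carried through unchanged), not how TensorFlow implements it; the function run_toy_rnn and its scalar "cell" are hypothetical.

```python
def run_toy_rnn(sequence, length):
    """Run a toy scalar recurrence that stops updating after `length` steps."""
    state = 0.0
    outputs = []
    for step, frame in enumerate(sequence):
        if step < length:
            # Hypothetical cell: decay the old state and add the frame's sum.
            state = 0.5 * state + sum(frame)
            outputs.append(state)
        else:
            # Past the real length: emit zero and keep the state unchanged.
            outputs.append(0.0)
    return outputs, state


# One zero-padded sequence with 2 real steps out of 4.
sequence = [[1.0, 1.0], [2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

outputs, state = run_toy_rnn(sequence, length=2)
print(outputs, state)  # [2.0, 3.0, 0.0, 0.0] 3.0

# Ignoring the real length lets the padding steps alter the final state.
padded_outputs, padded_state = run_toy_rnn(sequence, length=4)
print(padded_outputs, padded_state)  # [2.0, 3.0, 1.5, 0.75] 0.75
```

With the length supplied, the final state is the one from the last real step (3.0). Without it, the two padding steps keep decaying the state down to 0.75.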

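Masking the Cost Function:
Even with sequence_length set, output still has the shape batch_size x max_length x out_size; the only difference is that steps beyond a sentence's actual length are zero. When computing the loss, reduce_mean is therefore no longer appropriate, because it divides by max_length rather than by the actual length.
Instead, we can compute the loss with the following code: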

def cost(output, target):
    # Compute cross entropy for each frame.
    cross_entropy = target * tf.log(output)
    # Loss for each step.
    cross_entropy = -tf.reduce_sum(cross_entropy, 2)
    # 1 for every real step, 0 for every padded step.
    mask = tf.sign(tf.reduce_max(tf.abs(target), 2))
    # My understanding of this step: if the per-step values computed above are
    # already 0 on the padded steps, the multiplication is not needed; if they
    # are not, multiplying the padded part by 0 sets it to 0.
    cross_entropy *= mask
    # Average over actual sequence lengths.
    cross_entropy = tf.reduce_sum(cross_entropy, 1)
    cross_entropy /= tf.reduce_sum(mask, 1)
    return tf.reduce_mean(cross_entropy)
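
As a worked check of why the divisor matters, the plain-Python sketch below compares averaging over the real steps with averaging over max_length. The helper names masked_cost and naive_cost and the tiny one-hot example are assumptions made for illustration; padded steps are skipped here, which has the same effect as multiplying by a 0/1 mask.

```python
import math


def step_loss(out_frame, tgt_frame):
    """Cross entropy of one step; zero when the target frame is all zeros."""
    return -sum(t * math.log(o) for o, t in zip(out_frame, tgt_frame))


def masked_cost(output, target):
    """Average each sequence's loss over its real steps, then over the batch."""
    per_sequence = []
    for out_seq, tgt_seq in zip(output, target):
        real = [(o, t) for o, t in zip(out_seq, tgt_seq) if any(x != 0 for x in t)]
        per_sequence.append(sum(step_loss(o, t) for o, t in real) / len(real))
    return sum(per_sequence) / len(per_sequence)


def naive_cost(output, target):
    """Same loss, but divided by max_length instead of the real length."""
    per_sequence = []
    for out_seq, tgt_seq in zip(output, target):
        total = sum(step_loss(o, t) for o, t in zip(out_seq, tgt_seq))
        per_sequence.append(total / len(out_seq))
    return sum(per_sequence) / len(per_sequence)


# One sequence, max length 4, 2 real steps; the true class gets probability 0.5.
target = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]]
output = [[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]

masked = masked_cost(output, target)  # log(2), about 0.693
naive = naive_cost(output, target)    # log(2) / 2, about 0.347
print(masked, naive)
```

The naive version halves the loss only because half of the steps are padding, so sentences of different lengths would be weighted inconsistently. Like the original, this sketch divides by the real length and would fail on a sequence that consists of padding only.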

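Selecting the last output of a sentence:
Because the steps after a sentence's actual length are set to zero, we can no longer simply take output[:, -1, :] as before. And unlike NumPy, TensorFlow does not support indexing with an index array, which would have let us write output[:, length - 1] directly. So we use the following code instead: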

def last_relevant(output, length):
    batch_size = tf.shape(output)[0]
    max_length = tf.shape(output)[1]
    out_size = int(output.get_shape()[2])
    index = tf.range(0, batch_size) * max_length + (length - 1)
    # Flatten output to 2-D so that a single gather with the index picks out
    # the last real step of every sentence, similar to embedding_lookup.
    flat = tf.reshape(output, [-1, out_size])
    relevant = tf.gather(flat, index)
    return relevant
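
The flat-index arithmetic is easier to follow on a small example. The sketch below mirrors the reshape-and-gather trick with plain Python lists; the function name last_relevant_rows and the sample values are hypothetical and only serve to illustrate the indexing.

```python
def last_relevant_rows(output, lengths):
    """Pick the last real step of each sequence via a flattened index."""
    batch_size = len(output)
    max_length = len(output[0])
    # Flatten [batch, step, features] into [batch * step, features].
    flat = [frame for sequence in output for frame in sequence]
    # Step (length - 1) of sequence b sits at position b * max_length + (length - 1).
    index = [b * max_length + (lengths[b] - 1) for b in range(batch_size)]
    return [flat[i] for i in index]


# Hypothetical RNN output: 2 sequences, max length 3, 2 output features.
output = [
    [[1, 1], [2, 2], [0, 0]],  # real length 2, last real step is [2, 2]
    [[3, 3], [4, 4], [5, 5]],  # real length 3, last real step is [5, 5]
]
last = last_relevant_rows(output, [2, 3])
print(last)  # [[2, 2], [5, 5]]
```

Here the computed flat indices are 0 * 3 + 1 = 1 and 1 * 3 + 2 = 5, which select the second row of the first sequence and the third row of the second.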

Prediction:

num_classes = 10

# last_relevant takes the sequence lengths as its second argument.
last = last_relevant(output, length(sequence))
weight = tf.Variable(tf.truncated_normal([num_hidden, num_classes], stddev=0.1))
bias = tf.Variable(tf.constant(0.1, shape=[num_classes]))
prediction = tf.nn.softmax(tf.matmul(last, weight) + bias)
