Machine Translation -- Neural Machine Translation

2024-08-28 21:48


This article is based on the programming exercise from Course 5, Week 3 of Andrew Ng's Deep Learning specialization.

0. Background

To explore how machine translation works, we start with date translation. The third-party libraries, dataset, and helper code needed by this program can be downloaded here.

from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import numpy as np
from faker import Faker
import random
from tqdm import tqdm
from babel.dates import format_date
from nmt_utils import *
import matplotlib.pyplot as plt

1. Converting human-readable dates into machine-readable dates

1.1 Dataset

The dataset used to train the model contains 10,000 human-readable dates together with their equivalent machine-readable dates.

m = 10000
dataset, human_vocab, machine_vocab, inv_machine_vocab = load_dataset(m)
dataset[:10]
[('9 may 1998', '1998-05-09'), ('10.09.70', '1970-09-10'), ('4/28/90', '1990-04-28'), ('thursday january 26 1995', '1995-01-26'), ('monday march 7 1983', '1983-03-07'), ('sunday may 22 1988', '1988-05-22'), ('tuesday july 8 2008', '2008-07-08'), ('08 sep 1999', '1999-09-08'), ('1 jan 1981', '1981-01-01'), ('monday may 22 1995', '1995-05-22')]

Here, dataset is a list of tuples (human_readable_date, machine_readable_date);

human_vocab is a dictionary mapping the characters of human_readable_date to integer indices;

machine_vocab is a dictionary mapping the characters of machine_readable_date to integer indices;

inv_machine_vocab is the inverse mapping of machine_vocab.
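To get a feel for these mappings, you can inspect them directly. The sizes below follow from the one-hot shapes printed in the next step; the exact character sets live inside nmt_utils:

print(len(human_vocab))       # 37 entries (characters of the human-readable dates plus special tokens)
print(len(machine_vocab))     # 11 entries: the digits '0'-'9' and '-'
print(inv_machine_vocab[1])   # maps an index back to its character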

Next, we preprocess the raw text data, setting the maximum length of a human_readable_date to Tx = 30 and the maximum length of a machine_readable_date to Ty = 10.

Tx = 30
Ty = 10
X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)
print("X.shape:", X.shape)
print("Y.shape:", Y.shape)
print("Xoh.shape:", Xoh.shape)
print("Yoh.shape:", Yoh.shape)X.shape: (10000, 30)
Y.shape: (10000, 10)
Xoh.shape: (10000, 30, 37)
Yoh.shape: (10000, 10, 11)
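preprocess_data and string_to_int come from nmt_utils, so their code is not shown here. As a rough sketch of what the preprocessing does (an assumption inferred from the shapes above, not the actual helper):

def string_to_int_sketch(s, length, vocab):
    # map each character to its index in vocab, truncating/padding to a fixed length
    s = s.lower().replace(',', '')
    rep = [vocab.get(ch, vocab['<unk>']) for ch in s[:length]]
    rep += [vocab['<pad>']] * (length - len(rep))   # '<unk>'/'<pad>' keys are assumed
    return rep

# X:   (m, Tx)                       integer-encoded human-readable dates
# Y:   (m, Ty)                       integer-encoded machine-readable dates
# Xoh: (m, Tx, len(human_vocab))     one-hot encoding of X
# Yoh: (m, Ty, len(machine_vocab))   one-hot encoding of Y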

2. The attention mechanism for NMT

If we want to translate a passage from English to French, we do not read the whole passage and then translate it in one go; instead we translate it piece by piece, taking the surrounding context into account. The attention mechanism tells the NMT algorithm which parts of the input to pay particular attention to at each step.

2.1 Attention mechanism

The attention mechanism implemented in this subsection is shown in the figures below; the direct output of the one_step_attention function is the context variable.

[Figure 1: one step of the attention mechanism]

[Figure 2: the full NMT model with attention]

As the figures show, before implementing one_step_attention() we need a few operations that process the input X; under the Keras framework each of these steps can be abstracted as a layer. The underlying computation is explained in Andrew Ng's video lectures.
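Concretely, these layers implement the attention computation from the lectures: for each output step t, a small network (densor1 followed by densor2) turns the pair (s⟨t−1⟩, a⟨t′⟩) into an energy e⟨t,t′⟩, a softmax over the input positions t′ turns the energies into attention weights, and the context is the weighted sum of the Bi-LSTM states:

$$
\alpha^{\langle t, t' \rangle} = \frac{\exp\big(e^{\langle t, t' \rangle}\big)}{\sum_{t''=1}^{T_x} \exp\big(e^{\langle t, t'' \rangle}\big)},
\qquad
\mathrm{context}^{\langle t \rangle} = \sum_{t'=1}^{T_x} \alpha^{\langle t, t' \rangle}\, a^{\langle t' \rangle}
$$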

repeator = RepeatVector(Tx)
concatenator = Concatenate(axis = -1)
densor1 = Dense(10, activation = 'tanh')
densor2 = Dense(1, activation = 'relu')
activator = Activation(softmax, name = 'attention_weights')
dotor = Dot(axes = 1)

Interested readers can consult the documentation for RepeatVector(), Concatenate(), Dense(), Activation(), and Dot().
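Note that the softmax passed to Activation above is not the built-in 'softmax' string but a function pulled in by "from nmt_utils import *": it applies the softmax over the time axis (axis = 1), so that for each output step the Tx attention weights sum to 1. A sketch of such a helper (the real one lives in nmt_utils; treat this as an assumption about its behavior):

def softmax(x, axis=1):
    """Softmax over a chosen axis, for tensors with more than two dimensions."""
    ndim = K.ndim(x)
    if ndim == 2:
        return K.softmax(x)
    elif ndim > 2:
        e = K.exp(x - K.max(x, axis=axis, keepdims=True))
        s = K.sum(e, axis=axis, keepdims=True)
        return e / s
    else:
        raise ValueError('Cannot apply softmax to a tensor that is 1D')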

def one_step_attention(a, s_prev):
    """
    Performs one step of attention: outputs a context vector computed as a dot product of the attention
    weights "alphas" and the hidden states "a" of the Bi-LSTM.

    Arguments:
    a -- hidden state output of the Bi-LSTM, numpy-array of shape (m, Tx, 2*n_a)
    s_prev -- previous hidden state of the (post-attention) LSTM, numpy-array of shape (m, n_s)

    Returns:
    context -- context vector, input of the next (post-attention) LSTM cell
    """
    ### START CODE HERE ###
    # Use repeator to repeat s_prev to be of shape (m, Tx, n_s) so that you can concatenate it with all hidden states "a" (≈ 1 line)
    s_prev = repeator(s_prev)
    # Use concatenator to concatenate a and s_prev on the last axis (≈ 1 line)
    concat = concatenator([a, s_prev])
    # Use densor1 to propagate concat through a small fully-connected neural network to compute the "intermediate energies" variable e. (≈ 1 line)
    e = densor1(concat)
    # Use densor2 to propagate e through a small fully-connected neural network to compute the "energies" variable energies. (≈ 1 line)
    energies = densor2(e)
    # Use "activator" on "energies" to compute the attention weights "alphas" (≈ 1 line)
    alphas = activator(energies)
    # Use dotor together with "alphas" and "a" to compute the context vector to be given to the next (post-attention) LSTM-cell (≈ 1 line)
    context = dotor([alphas, a])
    ### END CODE HERE ###
    return context

Next, before building the model shown in Figure 2, initialize some global variables along with the shared post-attention LSTM cell and output layer.

n_a = 32
n_s = 64
post_activation_LSTM_cell = LSTM(n_s, return_state = True)
output_layer = Dense(len(machine_vocab), activation = softmax)

Define the model:

def model(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size):
    """
    Arguments:
    Tx -- length of the input sequence
    Ty -- length of the output sequence
    n_a -- hidden state size of the Bi-LSTM
    n_s -- hidden state size of the post-attention LSTM
    human_vocab_size -- size of the python dictionary "human_vocab"
    machine_vocab_size -- size of the python dictionary "machine_vocab"

    Returns:
    model -- Keras model instance
    """
    # Define the inputs of your model with a shape (Tx, human_vocab_size)
    # Define s0 and c0, initial hidden states for the decoder LSTM, of shape (n_s,)
    X = Input(shape=(Tx, human_vocab_size))
    s0 = Input(shape=(n_s,), name='s0')
    c0 = Input(shape=(n_s,), name='c0')
    s = s0
    c = c0

    # Initialize empty list of outputs
    outputs = []

    ### START CODE HERE ###
    # Step 1: Define your pre-attention Bi-LSTM. Remember to use return_sequences=True. (≈ 1 line)
    a = Bidirectional(LSTM(n_a, return_sequences=True))(X)

    # Step 2: Iterate for Ty steps
    for t in range(Ty):
        # Step 2.A: Perform one step of the attention mechanism to get back the context vector at step t (≈ 1 line)
        context = one_step_attention(a, s)
        # Step 2.B: Apply the post-attention LSTM cell to the "context" vector.
        # Don't forget to pass: initial_state = [hidden state, cell state] (≈ 1 line)
        s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])
        # Step 2.C: Apply Dense layer to the hidden state output of the post-attention LSTM (≈ 1 line)
        out = output_layer(s)
        # Step 2.D: Append "out" to the "outputs" list (≈ 1 line)
        outputs.append(out)

    # Step 3: Create model instance taking three inputs and returning the list of outputs. (≈ 1 line)
    model = Model(inputs=[X, s0, c0], outputs=outputs)
    ### END CODE HERE ###

    return model

Create the model; afterwards you can call model.summary() to inspect the model structure and its parameters.

model = model(Tx, Ty, n_a, n_s, len(human_vocab), len(machine_vocab))
model.summary()

Compile the model, using categorical_crossentropy as the loss and Adam as the optimizer.

opt = Adam(lr=0.0005, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss='categorical_crossentropy', optimizer = opt, metrics=['accuracy'])

Finally, define the shapes of the inputs and outputs and train the model. Since training takes a long time, we directly load a set of pretrained weights.

s0 = np.zeros((m, n_s))
c0 = np.zeros((m, n_s))
outputs = list(Yoh.swapaxes(0,1))
model.load_weights('datasets/model.h5')
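If you prefer to train from scratch rather than load the pretrained weights, the training call looks roughly like the following (a sketch; the epoch count and batch size are illustrative, and the pretrained weights correspond to a longer run):

# Train instead of (or after) loading weights; one epoch is only illustrative.
model.fit([Xoh, s0, c0], outputs, epochs=1, batch_size=100)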

Test the model on some new examples.

EXAMPLES = ['3 May 1979', '5 April 09', '21th of August 2016', 'Tue 10 Jul 2007',
            'Saturday May 9 2018', 'March 3 2001', 'March 3rd 2001', '1 March 2001']
for example in EXAMPLES:
    # encode the example as a (1, Tx, len(human_vocab)) one-hot array
    source = string_to_int(example, Tx, human_vocab)
    source = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), source))).swapaxes(0, 1)
    source = source.T
    source = source[np.newaxis, :]
    # note: s0 and c0 defined above have shape (m, n_s); depending on your Keras
    # version you may need to pass zero states of shape (1, n_s) instead to match
    # the single-example batch
    prediction = model.predict([source, s0, c0])
    prediction = np.argmax(prediction, axis=-1)
    output = [inv_machine_vocab[int(i)] for i in prediction]
    print("source:", example)
    print("output:", ''.join(output))
source: 3 May 1979
output: 1979-05-03
source: 5 April 09
output: 2009-05-05
source: 21th of August 2016
output: 2016-08-21
source: Tue 10 Jul 2007
output: 2007-07-10
source: Saturday May 9 2018
output: 2018-05-09
source: March 3 2001
output: 2001-03-03
source: March 3rd 2001
output: 2001-03-03
source: 1 March 2001
output: 2001-03-01

3. Visualizing the attention values

Because the problem we are solving has a fixed-length output, we can use 10 different softmax units to produce the 10 characters of the output. A nice property of the attention model is that each part of the output knows it depends on only a small part of the input. We can visualize which parts of the input each part of the output depends on. Below we analyze the activations α⟨t,t′⟩ for the task "Saturday 9 May 2018" → "2018-05-09".

As the attention map shows, the output pays almost no attention to the "Saturday" part, while the year, month, and day receive the largest attention weights and are converted correctly.

3.1 Getting the attention activations from the network

To see where the attention values come from, we print the model's summary (the "Connected to" column repeats the same shared layers once per output step; the repeated entries are abbreviated below):

____________________________________________________________________________________________________
Layer (type)                     Output Shape                          Param #   Connected to
====================================================================================================
input_1 (InputLayer)             (None, 30, 37)                        0
s0 (InputLayer)                  (None, 64)                            0
bidirectional_1 (Bidirectional)  (None, 30, 64)                        17920     input_1[0][0]
repeat_vector_1 (RepeatVector)   (None, 30, 64)                        0         s0[0][0], lstm_1[0][0] ... lstm_1[8][0]
concatenate_1 (Concatenate)      (None, 30, 128)                       0         bidirectional_1[0][0] with repeat_vector_1[0][0] ... repeat_vector_1[9][0]
dense_1 (Dense)                  (None, 30, 10)                        1290      concatenate_1[0][0] ... concatenate_1[9][0]
dense_2 (Dense)                  (None, 30, 1)                         11        dense_1[0][0] ... dense_1[9][0]
attention_weights (Activation)   (None, 30, 1)                         0         dense_2[0][0] ... dense_2[9][0]
dot_1 (Dot)                      (None, 1, 64)                         0         attention_weights[0][0] ... attention_weights[9][0], each with bidirectional_1[0][0]
c0 (InputLayer)                  (None, 64)                            0
lstm_1 (LSTM)                    [(None, 64), (None, 64), (None, 64)]  33024     dot_1[0][0], s0[0][0], c0[0][0]; then dot_1[t][0] with the previous step's lstm_1 states for t = 1 ... 9
dense_3 (Dense)                  (None, 11)                            715       lstm_1[0][0] ... lstm_1[9][0]
====================================================================================================
Total params: 52,960
Trainable params: 52,960
Non-trainable params: 0

From the parameter table above we can see that the output alphas of the attention_weights layer has shape (m, 30, 1). We use the plot_attention_map() function to draw the attention values.

attention_map = plot_attention_map(model, human_vocab, inv_machine_vocab, "Tuesday 09 Oct 1993", num = 7, n_s = 64)
#plt.savefig('fig.png', bbox_inches='tight')
plt.show()
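plot_attention_map is another helper from nmt_utils. If you want to extract the attention weights yourself, one possible approach (a sketch, not the helper's actual implementation) is to build a second Model that exposes the outputs of the shared 'attention_weights' layer at each of the Ty steps:

# The Activation layer named 'attention_weights' is called Ty times, once per
# output step, so it has Ty output nodes; get_output_at(t) selects node t.
att_layer = model.get_layer('attention_weights')
att_model = Model(inputs=model.inputs,
                  outputs=[att_layer.get_output_at(t) for t in range(Ty)])

text = "Tuesday 09 Oct 1993"
src = string_to_int(text, Tx, human_vocab)
src = np.array([to_categorical(x, num_classes=len(human_vocab)) for x in src])
src = src[np.newaxis, :]                                  # (1, Tx, len(human_vocab))
alphas = att_model.predict([src, np.zeros((1, n_s)), np.zeros((1, n_s))])
alphas = np.concatenate(alphas, axis=-1)                  # (1, Tx, Ty) attention map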

The plot shows the attention weights of each predicted character over the input; the attention mechanism makes the network focus on the parts of the input most relevant to the correct output. With attention, the model can convert an input of length Tx into an output of length Ty.
