Applying Bayesian Inference in a Hybrid CNN-LSTM Model for Time Series Prediction

2024-09-07 15:12


The paper, titled "Applying Bayesian Inference in a Hybrid CNN-LSTM Model for Time Series Prediction," is by Thi-Lich Nghiem, Viet-Duc Le, Thi-Lan Le, Pierre Maréchal, Daniel Delahaye, and Andrija Vidosavljevic. It was presented at the International Conference on Multimedia Analysis and Pattern Recognition (MAPR), held in October 2022 in Phu Quoc, Vietnam.

The abstract notes that convolutional neural networks (CNN) and long short-term memory networks (LSTM) deliver state-of-the-art performance on a wide range of tasks. However, these models are prone to overfitting on small datasets and cannot measure uncertainty, which hurts their generalization ability. Moreover, forecasting faces many challenges because of the complex long-term fluctuations in time series datasets. Recently, Bayesian inference has been applied in deep learning to estimate the uncertainty of model predictions; this approach is highly robust to overfitting and allows uncertainty to be estimated. The authors propose a new method that applies Bayesian inference inside a hybrid CNN-LSTM model, called CNN-Bayes LSTM, for time series prediction. Experiments were conducted on two real time series datasets, sunspot and weather. The results show that the proposed CNN-Bayes LSTM model is more effective than other forecasting models in terms of root mean square error (RMSE) and mean absolute error (MAE), as well as for uncertainty quantification.

Keywords: Bayesian inference, time series datasets, uncertainty quantification.

The paper's main contributions are:
- Applying Bayesian inference in a hybrid forecasting method that combines CNN and LSTM to update the weights of the hyper-parameters.
- Using a 1D convolutional layer of the CNN to extract spatial features and an LSTM to extract temporal features from the sunspot and weather datasets.
- Comparing the proposed model with statistical models (SARIMA and Prophet) and deep learning models (LSTM, GRU, Transformer, and Informer), and illustrating how to compute a model's uncertainty on time series datasets.

The paper is organized as a brief review of related work, a presentation of the proposed model, experimental results on sunspot and weather prediction, and a summary of conclusions and future work.

In the related-work section, the paper notes that, to improve prediction performance on time series datasets, many researchers have introduced statistical and deep learning methods for handling uncertain, complex time series. Statistical methods such as SARIMA and Prophet can predict time series precisely by mapping the relationship between original and predicted data. Deep learning methods such as RNN, LSTM, GRU, and Transformer are also widely used to model time series data, thanks to their ability to capture temporal information.

The proposed-method section details the LSTM network and the process of applying Bayesian inference in the CNN-LSTM model. The LSTM network is an advanced variant of the RNN that can learn both short- and long-term dependencies. Applying Bayesian inference in the CNN-LSTM model involves using CNN layers to extract spatial features, followed by an LSTM to extract long-term temporal features. The paper also explains how to quantify uncertainty, covering both model uncertainty and the noise inherent in the observations.

The experimental results section evaluates the proposed model on the sunspot and weather datasets and compares it with other models. The results show that the proposed CNN-Bayes LSTM model outperforms existing methods in RMSE and MAE and can quantify the model's uncertainty.

The conclusions and future work section summarizes the main findings and outlines future directions, including testing the model on higher-dimensional datasets and incorporating more factors.

The reference list covers research related to time series prediction, including statistical methods, deep learning methods, applications of Bayesian inference to time series forecasting, and uncertainty quantification.

Note that this is a brief overview of the paper; consult the original (reproduced below) for the full mathematical models, experimental details, and data analysis.

HAL Id: hal-04056437
https://hal.science/hal-04056437
Submitted on 3 Apr 2023
Applying Bayesian inference in a hybrid CNN-LSTM model for time series prediction
Thi-Lich Nghiem, Viet-Duc Le, Thi-Lan Le, Pierre Maréchal, Daniel Delahaye, Andrija Vidosavljevic
To cite this version:
Thi-Lich Nghiem, Viet-Duc Le, Thi-Lan Le, Pierre Maréchal, Daniel Delahaye, et al. Applying Bayesian inference in a hybrid CNN-LSTM model for time series prediction. International Conference on Multimedia Analysis and Pattern Recognition (MAPR), Oct 2022, Phu Quoc, Vietnam. hal-04056437.
Applying Bayesian inference in a hybrid CNN-LSTM model for time series prediction
Abstract—Convolutional neural networks (CNN) and long short-term memory (LSTM) networks provide state-of-the-art performance in various tasks. However, these models face overfitting on small data and cannot measure uncertainty, which has a negative effect on their generalization abilities. In addition, the prediction task can face many challenges because of complex long-term fluctuations, especially in time series datasets. Recently, applying Bayesian inference in deep learning to estimate the uncertainty in model predictions was introduced. This approach can be highly robust to overfitting and allows uncertainty to be estimated. In this paper, we propose a novel approach using Bayesian inference in a hybrid CNN-LSTM model, called CNN-Bayes LSTM, for time series prediction. The experiments have been conducted on two real time series datasets, namely the sunspot and weather datasets. The experimental results show that the proposed CNN-Bayes LSTM model is more effective than other forecasting models in terms of Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), as well as for uncertainty quantification.
Index Terms—Bayesian inference; time series dataset; uncertainty quantification
I. INTRODUCTION
Time series prediction is a field of research with increasing interest that is broadly used in various applications such as economy, bio-medicine, engineering, astronomy, weather forecasting, and air traffic management. The purpose of time series prediction is to predict the future state of a dynamic system from observations of previous states [1]. However, in a significant number of prediction problems, we have to face uncertainty, non-linearity, chaotic behavior, and non-stationarity, which deteriorate the prediction accuracy of the model.
To deal with these issues, many approaches have been proposed. They can generally be categorized into two types: statistical approaches and deep learning approaches. Statistical approaches such as SARIMA [2] and Prophet [3] can predict time series precisely by exploiting the relationship between the original data and the predicted states, while deep learning approaches such as LSTM and Transformer can model data with rich temporal patterns and learn high-level representations of features and associated nonlinear functions without relying on experts to select which manually-crafted features to employ [1], [4].
Besides evaluating prediction performance, quantification of uncertainty is considered one of the most important aspects of the decision-making process [5]. To quantify a model's uncertainty, many researchers use Bayesian inference to estimate the uncertainty in the prediction model from probability distributions. As a result, it can be highly robust to overfitting and can easily learn from small datasets. In the Bayesian framework, the posterior distribution provides all information about the unknown parameters. Bayesian inference with different techniques, such as Markov Chain Monte Carlo, Laplace approximation, expectation propagation, and variational inference, has been used to quantify the uncertainty in time series data prediction, for example on sunspot datasets [6], [7] and weather datasets [8], [9].
In this study, we propose to use Bayesian inference in a hybrid model combining CNN and LSTM. We test on two real datasets, namely the sunspot and weather datasets. In addition, we compare the proposed model to statistical models and deep learning models, as well as in terms of uncertainty quantification. The main contributions of this paper are summarized as follows:
• We apply Bayesian inference to update the weights of the hyper-parameters in a hybrid prediction method that combines CNN and LSTM. We use a 1D convolutional layer of the CNN to extract the spatial features and an LSTM to extract the temporal features of the sunspot and weather datasets.
• We compare the prediction performance of the proposed model with statistical models (SARIMA and Prophet) and deep learning models (LSTM, GRU, Transformer, and Informer).
• Finally, we illustrate the way to calculate the model's uncertainty on time series datasets.
The rest of our paper is structured as follows: Section II provides a brief review of relevant work on time series prediction. Section III introduces our proposed model, while Section IV describes the experimental results of two studies on sunspot and weather prediction. The conclusions and future work are summarized in Section V.
II. RELATED WORKS
To improve models' prediction performance on time series datasets, many researchers have introduced several statistical and deep learning methods that attack uncertain, complex time series.
Statistical approaches can predict time series precisely by mapping the relationship between the original data and the predicted data. These models include the ARIMA family of methods such as AR, ARMA, ARIMA, Random Walk, SARIMA [2], and Prophet [3]. While SARIMA describes the current value in a time series based on previously observed data by adding three new hyper-parameters to determine the AR, moving average, and differencing terms, as well as an additional parameter for the seasonal interval, Prophet is a more recent time series prediction method. Although this approach has some similarities to SARIMA, it models the trend and seasonality of time series with more configurable flexibility. In the Prophet approach, the trend, seasonality, and holidays are the three main features, and holidays are selected to adjust predictions.
Deep learning has proven to be extremely effective in computer vision, computer gaming, multimedia, and big-data-related challenges. Deep learning approaches are also widely used to model time series data. Because of their capacity to capture temporal information, RNNs have proven useful in forecasting time series [10]. Many researchers have used deep learning approaches such as RNN, LSTM, GRU [11], [12], Transformer [13], or CNN models to forecast temporal information in time series datasets. [14] proposed to use a recursive Levenberg-Marquardt Bayesian RNN to forecast electricity spot prices as well as compute the uncertainty of the model. Other researchers used CNNs to predict wind power [15], LSTMs to predict wind speed [16], weather [8], [9], or sunspots [10], [17], [18], or combined CNN and LSTM [19], [20] or RNN and LSTM [21] to forecast the output in time series datasets. Recently, in 2021, Zhou proposed a novel approach called Informer to deal with the heavy memory usage of long input sequences [22]. This approach is an improvement of the Transformer approach [13]. The main idea of Informer is to use a ProbSparse technique that selects only the most crucial queries using the Kullback-Leibler divergence, so it can decrease time complexity and memory usage.
III. PROPOSED METHOD
A. Long Short-Term Memory
The LSTM network is an advanced version of the RNN, proposed by Hochreiter and Schmidhuber in 1997 [23]. It is applied very effectively thanks to its capability of learning short- and long-term dependencies. The default behavior of the network (and so of the RNN) is to remember information for a long time. RNNs take the form of a repeating sequence of NN modules. In an RNN, these modules have a very simple structure, just a tanh layer. But the issue is that RNNs cannot handle long-term dependencies; the LSTM is intended to prevent this problem. LSTMs also have a chain structure. Instead of a single NN layer, an LSTM has four layers that interact with each other (see Figure 1).
The main idea of the LSTM is that the cell state is depicted by the horizontal line (red line) at the top, running from Ct−1 to Ct. The cell state is like a carousel running straight through the whole chain with only a few small linear interactions, so it is relatively easy for information to remain unaltered.
LSTMs have the ability to remove or add information to the cell state, carefully regulated by structures called gates. A gate is an optional way for information to pass through. Gates are composed of a sigmoid NN layer and a point-wise multiplication operator. The output of the sigmoid layer is a number in [0, 1] that describes the throughput of each component: values of 0 and 1 mean "let nothing through" and "let everything through", respectively. An LSTM has three sigmoid gates to protect and control the cell state: the forget, input, and output gates. Hence the LSTM allows long-term memory to be reset and overcomes the vanishing and exploding gradient problems.
B. Bayesian inference in a CNN-LSTM model
The proposed model, named CNN-Bayes LSTM and illustrated in Fig. 2, has two main parts: a CNN (to extract the spatial data) and a Bayes LSTM (to extract long-term temporal data). After data preparation, high-level spatial features can be extracted by using a CNN layer.

Fig. 1: The LSTM architecture [23].
Then, these extracted low-dimensional features are linked together via an LSTM to extract the temporal features from the data. In this part, we use Bayesian inference to optimize the hyper-parameters and architecture of our model, as well as to quantify the model's uncertainty over its weights by sampling them from a distribution parameterized by trainable variables on each feed-forward operation. The detailed process in the Bayesian LSTM architecture is illustrated in Figure 3. Finally, a fully-connected layer is added to determine the output.
Fig. 2: The CNN-Bayes LSTM framework proposed for time series prediction.
In this paper, we denote by X and Y the input and output of the framework.
• Input: X = (Xt, Xt+1, ..., Xt+N−1), where Xt is the observed sample at time t and N is the number of samples.
• Output: Y = (Yt+N, Yt+N+1, ..., Yt+2N−1) are the predicted values.
For example, we use the temperature of the five previous months to predict the future, so N = 5. At the first step, t = 1, we have X = (X1, X2, ..., X5), corresponding to the five months from January to May; the output Y = (Y6, Y7, ..., Y10) then corresponds to the five months from June to October.
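A minimal sketch of this windowing scheme; the stride of N (non-overlapping windows) is an assumption for illustration, since the paper does not state how consecutive windows overlap.

```python
def make_windows(series, n):
    """Pair n observed samples with the next n values to predict.

    With n = 5 this reproduces the paper's example: X = (X1..X5) for
    January-May maps to Y = (Y6..Y10) for June-October.
    """
    xs, ys = [], []
    for t in range(0, len(series) - 2 * n + 1, n):
        xs.append(series[t:t + n])          # observed window
        ys.append(series[t + n:t + 2 * n])  # target window
    return xs, ys

temps = list(range(1, 21))  # 20 hypothetical monthly temperature values
X, Y = make_windows(temps, n=5)
# X[0] == [1, 2, 3, 4, 5] and Y[0] == [6, 7, 8, 9, 10]
```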
The purpose of the Bayesian neural network is, rather than having deterministic weights, to sample them from a probability distribution and then optimize the distribution parameters. By this approach, it is possible to measure confidence and uncertainty over predictions.
Fig. 3: The Bayesian LSTM flow chart.
In the Bayesian LSTM, we can calculate the weight and bias sampling as follows:
• The weight sampled at the i-th time at position n of the layer is given by the formula:

    W_n^(i) = N(0, 1) * log(1 + ρ_w^(i)) + µ_w^(i)    (1)

• The bias sampled at the i-th time at position n of the layer is given by the formula:

    b_n^(i) = N(0, 1) * log(1 + ρ_b^(i)) + µ_b^(i)    (2)

where ρ and µ are the input feature standard deviation and the input feature mean, respectively.
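Equation (1) can be sketched directly. Note that many "Bayes by backprop" implementations use the softplus log(1 + exp(ρ)) for the scale, whereas the formula as printed uses log(1 + ρ); the sketch follows the printed formula, with mu and rho as placeholder values.

```python
import math
import random

def sample_weight(mu, rho):
    """Draw one weight per Eq. (1): W = N(0, 1) * log(1 + rho) + mu.

    Resampling on every feed-forward pass is what makes repeated
    predictions of the Bayesian LSTM differ from one another.
    """
    eps = random.gauss(0.0, 1.0)           # standard normal draw
    return eps * math.log(1.0 + rho) + mu  # scale by log(1 + rho), shift by mu

random.seed(0)
draws = [sample_weight(mu=0.2, rho=0.1) for _ in range(20000)]
mean = sum(draws) / len(draws)  # concentrates near mu = 0.2
```

Optimizing (mu, rho) instead of a point weight is what turns training into learning a distribution over weights.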
At this stage, Bayesian optimization calculates the posterior distribution of the objective function using Bayesian inference, and the next hyper-parameter combination is selected from this distribution. The previous sampling information is used to relate the objective function to the hyper-parameters in order to maximize the target output. If the process does not perform well, we return to reconsider the hyper-parameters as well as the architecture of the LSTM model. Otherwise, we use these weights to predict the future and evaluate the model. In our proposed model, Bayesian optimization is used to identify vital LSTM hyper-parameter values.
C. Uncertainty quantification
Before evaluating forecasting uncertainty, it is necessary to identify the two types of uncertainty (aleatoric and epistemic) and the appropriate solutions to decrease them. The first type of uncertainty is epistemic. It refers to the model's uncertainty due to the model's lack of knowledge in regions of the input space where there is little data, caused for example by data sparsity or bias [24]. It can be reduced by gathering enough data. We can obtain a model's confidence interval by estimating its epistemic uncertainty. The second type of uncertainty is aleatoric. It is essentially noise inherent in the observations, either input-dependent (due to sensor noise) or motion noise that is uniform along the dataset. It cannot be decreased even when more data is collected. We may calculate the prediction interval by estimating the aleatoric and epistemic uncertainty together [25], [26]. The confidence interval may be narrower than the prediction interval.
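One common way to estimate the epistemic part in practice is to run many stochastic forward passes and take the spread of the outputs. The sketch below uses a single Eq. (1)-style random weight (mu = 0.8, rho = 0.1, both assumed) as a stand-in for a full Bayesian model.

```python
import math
import random
import statistics

def stochastic_forward(x):
    """Stand-in for one feed-forward pass of a Bayesian model:
    the weight is resampled on every call (mu = 0.8, rho = 0.1 assumed)."""
    w = random.gauss(0.0, 1.0) * math.log(1.0 + 0.1) + 0.8
    return w * x

random.seed(1)
x = 2.0
preds = [stochastic_forward(x) for _ in range(5000)]
mean_pred = statistics.mean(preds)        # point prediction
epistemic_sd = statistics.stdev(preds)    # spread caused by weight uncertainty
ci_low = mean_pred - 1.96 * epistemic_sd  # ~95% confidence interval
ci_high = mean_pred + 1.96 * epistemic_sd
```

A prediction interval would additionally widen this band by the aleatoric noise variance, which is why it is never narrower than the confidence interval.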
IV. EXPERIMENTAL RESULTS
A. Dataset
To evaluate the performance of our proposed
model, we test on two real time series datasets,
namely sunspot and weather datasets.
1) Sunspot dataset: The sunspot dataset was collected from January 1749 to February 2022 by researchers working at the Royal Observatory of Belgium. This data is available at the World Data Center SILSO website [27]. The dataset used in our research includes 3278 samples of the averaged total sunspot number per month, with the dates and the monthly mean number of sunspots. It is divided into two sets of 2294 and 984 samples for training and testing, respectively. For forecasting, the data can be divided into either fixed time periods or solar cycles. Solar cycles and ordinary years are not distinguished in the dataset; as a result, the dataset only uses the averaged number of sunspots seen in each month.
2) Weather dataset: The weather dataset used in our research includes 1380 samples of the mean monthly temperature in Bangladesh from January 1901 to December 2015. This data is available at the Kaggle website [28]. It is divided into two sets of 965 and 415 samples for training and testing, respectively.
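The splits can be reproduced with a chronological (unshuffled) cut, which is assumed here since the paper does not say how the split is made, but it is the standard choice for time series:

```python
def chrono_split(series, n_train):
    """Chronological train/test split, preserving temporal order."""
    return series[:n_train], series[n_train:]

sunspots = list(range(3278))                # stand-in for the 3278 monthly samples
train, test = chrono_split(sunspots, 2294)  # 2294 train / 984 test, as in the paper
```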
Figure 4 illustrates the trend of the monthly mean sunspot number and mean temperature from 1901 to 1905. Figure 5 illustrates the monthly mean total sunspot number and temperature over the entire two datasets.
B. Evaluation
To evaluate the performance of the model's predictions, we use two evaluation metrics common in forecasting tasks: RMSE and MAE. RMSE is used to measure the magnitude of errors in the prediction and is calculated as the quadratic mean of the differences between the predicted and observed values, called the prediction errors. MAE is a measure of a model's performance in relation to a test set: it is the average of the absolute values of the individual prediction errors across all instants in the test set.
RMSE and MAE are defined as follows:

    RMSE = sqrt( (1/n) * Σ_{i=1}^{n} (ŷ_i − y_i)^2 );  MAE = (1/n) * Σ_{i=1}^{n} |ŷ_i − y_i|    (3)

where ŷ_i and y_i are the predicted and observed values at time step i, and n is the length of the sample data.
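Equation (3) translates directly into code; the sample values below are hypothetical.

```python
import math

def rmse(y_obs, y_pred):
    """Root mean square error: quadratic mean of the prediction errors."""
    n = len(y_obs)
    return math.sqrt(sum((yh - y) ** 2 for y, yh in zip(y_obs, y_pred)) / n)

def mae(y_obs, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    n = len(y_obs)
    return sum(abs(yh - y) for y, yh in zip(y_obs, y_pred)) / n

y_obs = [3.0, 5.0, 2.0]   # hypothetical observed values
y_pred = [2.0, 5.0, 4.0]  # hypothetical predictions
# rmse = sqrt((1 + 0 + 4) / 3) ≈ 1.291, mae = (1 + 0 + 2) / 3 = 1.0
```

RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging.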
C. Empirical Results
Table I shows the results obtained by the proposed method. Furthermore, to show the robustness of the proposed model, we compare it with other models, namely the SARIMA [2], Prophet [3], Transformer [13], Informer [22], LSTM [23], and GRU [11] models. The results show that our proposed model has outperformed the others, with an RMSE of 26.10 and an MAE of 18.74 on the sunspot dataset. On the weather dataset, the values obtained by the proposed method are 2.23 for RMSE and 1.64 for MAE, respectively. It can be clearly seen that there is a big gap in RMSE values between the statistical models and the deep learning models. On the sunspot dataset, while all the deep learning models used have RMSE values under 50 and MAE values from over 22 to under 40 (in particular, the Informer model obtained 29.90 in RMSE and 22.35 in MAE), the statistical models have over 50 in both RMSE and MAE: the SARIMA
Fig. 4: The trend of the monthly mean sunspot number (a) and mean temperature (b) from 1901 to 1905 in the two datasets.
(a) The monthly sunspot number from 1749 to February 2022. (b) The monthly mean temperature from 1901 to December 2015.
Fig. 5: The trend of the monthly mean sunspot number (a) and mean temperature (b) over the entire two datasets.
TABLE I: Comparison of the proposed method and the state-of-the-art methods on the two datasets. The two best results are in bold.

Forecasting model  | Sunspot dataset | Weather dataset
                   | RMSE    MAE     | RMSE    MAE
SARIMA [2]         | 54.11   45.51   | -       -
Prophet [3]        | 60.15   56.09   | -       -
Transformer [13]   | 33.99   25.26   | 2.10    1.43
Informer [22]      | 29.90   22.35   | 2.32    1.82
LSTM [23]          | 46.14   39.44   | 2.32    1.75
GRU [11]           | 37.14   26.77   | 4.44    3.43
Proposed model     | 26.10   18.74   | 2.23    1.64
and Prophet models obtained 54.11 and 60.15 in RMSE and 45.51 and 56.09 in MAE, respectively. Interestingly, on the sunspot dataset the proposed method had an outstanding performance regarding RMSE and MAE values; it is much better than the well-known Informer model (26.10 versus 29.90 in RMSE and 18.74 versus 22.35 in MAE, respectively). On the weather dataset, the RMSE and MAE values are slightly worse than those of the Transformer (2.23 versus 2.10 in RMSE and 1.64 versus 1.43 in MAE, respectively), but our result is still better than the other models, such as the Informer, LSTM, and GRU models. In addition, the proposed model can calculate the epistemic uncertainty. There are some differences between the GRU and the proposed model when using Bayesian inference. Figure 6 presents the models' epistemic variance estimation on the two datasets. We compare the GRU and proposed models in three aspects: the real data, the predicted data, and the epistemic uncertainty, corresponding to the red line, the green line, and the light blue line, respectively. The 95% confidence intervals for the sunspot number and the temperature of the two models, obtained from numerous predictions, are illustrated in this figure. Figure 6 shows that our proposed model captures the variation of the predicted normalized value across both datasets, whereas the GRU sometimes fails to capture this measure (the red circles).
V. CONCLUSIONS AND FUTURE WORKS
In this paper, we have proposed a novel approach using Bayesian inference in a hybrid CNN-LSTM model, called CNN-Bayes LSTM, for time series prediction. We evaluated the prediction performance and uncertainty quantification of our proposed model and compared it with six models from the literature (SARIMA, Prophet, Transformer, Informer, LSTM, and GRU) on time series dataset forecasting. Experimental results have shown that the proposed CNN-Bayes LSTM achieves better performance than existing methods in terms of RMSE and MAE values, as well as in the uncertainty quantification of the model. However, we only used a 1D CNN and a single factor, namely the sunspot number or the temperature. It would be interesting to test on many factors in a higher-dimensional dataset (3D or 4D).

(a) sunspot data (b) weather data
Fig. 6: Models' uncertainty quantification on the two datasets.
REFERENCES
[1] Li, F. Zhang, L. Gao, Y. Liu, and X. Ren, “A novel
model for chaotic complex time series with large of data
forecasting,” Knowledge-Based Systems, vol. 222, 2021.
[2] Box and Jenkins, “Time series analysis: Forecasting and
control,” Holden-Day Series in Time Series Analysis, pp.
161–215, 1976.
[3] S. Taylor and B. Letham, “Forecasting at scale,” The American Statistician, vol. 72, no. 1, pp. 37–45, 2018.
[4] Y. Dang, Z. Chen, H. Li, and H. Shu, “A comparative study
of non-deep learning, deep learning, and ensemble learning
methods for sunspot number prediction,” arXiv, 2022.
[5] M. Abdar, F. Pourpanah, S. Hussain, et al., “A review of uncertainty quantification in deep learning: Techniques, applications and challenges,” CoRR, 2020.
[6] M. Sophie, R. Sachs, C. Ritter, V. Delouille, and L. Laure,
“Uncertainty quantification in sunspot counts,” The Astrophysical Journal, vol. 886, no. 1, pp. 1–14, 2019.
[7] M. Atencia, R. Stoean, and G. Joya, “Uncertainty quantification through dropout in time series prediction by echo
state networks,” Mathematics, vol. 8, no. 8, 2020.
[8] A. Shafin, “Machine learning approach to forecast average
weather temperature of bangladesh,” Global Journal of
Computer Science and Technology: Neural & Artificial
Intelligence, vol. 19, pp. 39–48, 2019.
[9] T. Siddique, S. Mahmud, A. Keesee, C. Ngwira, and
H. Connor, “A survey of uncertainty quantification in machine learning for space weather prediction,” Geosciences,
vol. 12, no. 1, 2022.
[10] R. Chandra, S. Goyal, and R. Gupta, “Evaluation of deep
learning models for multi-step ahead time series prediction,”
arXiv:2103.14250, 2021.
[11] K. Cho, B. Merrienboer, D. Bahdanau, and Y. Bengio,
“On the properties of neural machine translation: Encoderdecoder approaches,” CoRR, 2014.
[12] J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” CoRR, 2014.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones,
A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is
all you need,” CoRR, 2017.
[14] D. Mirikitani and N. Nikolaev, “Recursive bayesian recurrent neural networks for time-series modeling,” Transactions on Neural Networks, vol. 21, no. 2, pp. 262–274, 2010.
[15] K. Amarasinghe, D. L. Marino, and M. Manic, “Deep neural
networks for energy load forecasting,” in International
Symposium on Industrial Electronics, 2017, pp. 1483–1488.
[16] J. Wang and Y. Li, “Multi-step ahead wind speed prediction
based on optimal feature extraction, long short term memory neural network and error correction strategy,” Applied
Energy, vol. 230, pp. 429–443, 2018.
[17] Z. Pala and R. Atici, “Forecasting sunspot time series using
deep learning methods,” Solar Physics, pp. 1–14, 2019.
[18] T. Khan, F. Arafat, U. Mojumdar, A. Rajbongshi, T. Siddiquee, and R. Chakraborty, “A machine learning approach
for predicting the sunspot of solar cycle,” in International
Conference on Computing, Communication and Networking
Technologies, 2020, pp. 1–4.
[19] X. Shi, Z. Chen, H. Wang, D. Yeung, W. Wong, and
W. Woo, “Convolutional LSTM network: A machine learning approach for precipitation nowcasting,” CoRR, 2015.
[20] C.-J. Huang and P.-H. Kuo, “A deep CNN-LSTM model
for particulate matter (PM2.5) forecasting in smart cities,”
Sensors, vol. 18, no. 7, 2018.
[21] Y. Sudriani, I. Ridwansyah, and H. A. Rustini, “Long short
term memory (LSTM) recurrent neural network (RNN) for
discharge level prediction and forecast in cimandiri river, indonesia,” IOP Conference Series: Earth and Environmental
Science, vol. 299, 2019.
[22] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and
W. Zhang, “Informer: Beyond efficient transformer for long
sequence time-series forecasting,” CoRR, 2021.
[23] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780,
1997.
[24] Y. Dar, V. Muthukumar, and R. G. Baraniuk, “A farewell
to the bias-variance tradeoff? an overview of the theory of
over-parameterized machine learning,” arxiv, 2021.
[25] B. Kappen and S. Gielen, “Practical confidence and prediction intervals for prediction tasks,” Prog. Neural Process,
vol. 8, pp. 128–135, 1997.
[26] A. Kendall and Y. Gal, “What uncertainties do we need in
bayesian deep learning for computer vision?” CoRR, 2017.
[27] SILSO World Data Center, “The international sunspot number,” International Sunspot Number Monthly Bulletin and
online catalogue, 1749–2022.
[28] www.kaggle.com/yakinrubaiat/bangladeshweather-dataset.
