LLMs: Translation and Commentary on "A Decoder-Only Foundation Model For Time-Series Forecasting"

2024-06-17 08:04



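Background and pain points: In recent years, deep learning models have become the mainstream approach to time-series forecasting when sufficient training data is available, but they usually have to be trained separately on each dataset. Meanwhile, large pretrained language models in NLP keep improving on downstream tasks. Time-series data, however, cannot match NLP text data in volume, and it has no well-defined vocabulary or grammar. The question is whether a single foundation model can be trained whose zero-shot forecasts on new, unseen datasets rival those of models trained specifically for each dataset.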


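>> Builds a large-scale time-series pretraining corpus that combines real-world data (Google search trends, Wikipedia page views, etc.) with synthetic data.

>> Proposes TimesFM, pretrained with a decoder-style attention architecture and input patching.

>> The model has 200M parameters and is pretrained on the order of 100B timepoints, far smaller than large NLP models.

>> Across multiple unseen forecasting tasks, TimesFM's zero-shot accuracy approaches or exceeds that of task-specific trained baselines.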
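
To make "decoder-style attention" over patches concrete, here is a minimal sketch of a causal mask, in which each patch position may attend only to itself and to earlier patches. The function name `causal_patch_mask` is made up for this illustration and is not taken from the paper or from any TimesFM code.

```python
from typing import List


def causal_patch_mask(num_patches: int) -> List[List[bool]]:
    """Return mask[i][j] == True when patch i may attend to patch j (j <= i).

    Hypothetical helper for illustration only.
    """
    return [[j <= i for j in range(num_patches)] for i in range(num_patches)]


# Print the mask for 4 patches: "x" = attention allowed, "." = blocked.
for row in causal_patch_mask(4):
    print("".join("x" if allowed else "." for allowed in row))
# x...
# xx..
# xxx.
# xxxx
```

The lower-triangular shape is what lets the prediction at each position depend only on the past.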


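>> Input patches play the role that tokens play in NLP and improve computational efficiency.

>> Decoder-style training supports inputs of arbitrary length.

>> The output patch is longer than the input patch, which reduces the number of autoregressive steps.

>> A sampled masking strategy during training covers every possible input window length.

>> Synthetic data increases the diversity of the training corpus.

>> A small model achieving good results shows that pretraining can also pay off for time series.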
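
The patching ideas in the list above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the patch sizes are arbitrary example values, and the masking rule (hide the first r points of the first patch, with 0 <= r < patch length) is one plausible reading of "covers every possible input window length".

```python
import math
from typing import List, Set


def split_into_patches(series: List[float], patch_len: int) -> List[List[float]]:
    """Cut a series into consecutive, non-overlapping patches ("tokens").

    Trailing points that do not fill a whole patch are dropped in this sketch.
    """
    usable = len(series) - len(series) % patch_len
    return [series[i:i + patch_len] for i in range(0, usable, patch_len)]


def autoregressive_steps(horizon: int, output_patch_len: int) -> int:
    """Number of decoding steps needed to cover `horizon` future points."""
    return math.ceil(horizon / output_patch_len)


def covered_context_lengths(num_patches: int, patch_len: int) -> Set[int]:
    """Context lengths seen if the first r points (0 <= r < patch_len) are masked.

    After k patches the model has seen k * patch_len - r real points.
    """
    return {
        k * patch_len - r
        for k in range(1, num_patches + 1)
        for r in range(patch_len)
    }


history = [float(t) for t in range(10)]
print(split_into_patches(history, patch_len=4))  # two patches of 4 points each

# Example sizes only: a longer output patch means fewer decoding steps.
print(autoregressive_steps(horizon=256, output_patch_len=32))   # 8
print(autoregressive_steps(horizon=256, output_patch_len=128))  # 2

# With 3 patches of length 4, every context length from 1 to 12 is covered.
print(covered_context_lengths(num_patches=3, patch_len=4) == set(range(1, 13)))  # True
```

The step count shows the trade-off directly: for a fixed horizon, quadrupling the output patch length cuts the number of autoregressive passes by a factor of four.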


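>> Handles different application scenarios, forecast horizons, and granularities in a uniform way.

>> Works in zero-shot mode: it can be applied directly with no additional training.

>> Low adoption cost and low compute consumption.

>> Provides a workable example of a time-series foundation model.

>> Helps increase the adoption of deep learning for time series in real applications.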



Translation and Commentary on "A Decoder-Only Foundation Model For Time-Series Forecasting"






Google Research


Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a decoder style attention model with input patching, using a large time-series corpus comprising both real-world and synthetic datasets. Experiments on a diverse set of previously unseen forecasting datasets suggest that the model can yield accurate zero-shot forecasts across different domains, forecasting horizons and temporal granularities.


1 Introduction

Time-series data is ubiquitous in various domains such as retail, finance, manufacturing, healthcare and natural sciences. In many of these domains, one of the most important use-cases of time-series data is forecasting. Time-series forecasting is critical to several scientific and industrial applications, like retail supply chain optimization, energy and traffic prediction, and weather forecasting. In recent times, Deep learning models [SFGJ20, OCCB19] have emerged as a popular approach for forecasting rich, multivariate, time-series data, often outperforming classical statistical approaches such as ARIMA or GARCH [BJ68]. In several forecasting competitions such as the M5 competition [MSA22] and IARAI Traffic4cast contest [KKN+21] deep network based solutions performed very well.

At the same time, we are witnessing a rapid progress in the Natural Language Processing (NLP) domain on large foundation models for downstream NLP tasks. Large language models (LLMs) are growing in popularity because they can be used to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way [RWC+19]. They are trained on massive amounts of data, which allows them to learn the patterns of human language. This makes them very powerful tools that can be used for a variety of downstream tasks, often in a zero-shot learning mode.

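Translation: Time-series data is everywhere, across retail, finance, manufacturing, healthcare, and the natural sciences, and forecasting is one of its most important uses in these fields. Forecasting is critical to many scientific and industrial applications, such as retail supply chain optimization, energy and traffic prediction, and weather forecasting. In recent years, deep learning models (e.g. [SFGJ20, OCCB19]) have become a popular way to handle rich, multivariate time-series data, often outperforming classical statistical methods such as ARIMA or GARCH [BJ68]. In several forecasting competitions, including the M5 competition [MSA22] and the IARAI Traffic4cast contest [KKN+21], deep-network-based solutions performed very well.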


This motivates the question: “Can large pretrained models trained on massive amounts of time-series data learn temporal patterns that can be useful for time-series forecasting on previously unseen datasets?” In particular, can we design a time-series foundation model that obtains good zero-shot out-of-the-box forecasting performance? Such a pretrained time-series foundation model, if possible, would bring significant benefits for downstream forecasting users in terms of no additional training burden and significantly reduced compute requirements.

It is not immediately obvious that such a foundation model for time-series forecasting is possible. Unlike in NLP, there is no well defined vocabulary or grammar for time-series. Additionally, such a model would need to support forecasting with varying history lengths (context), prediction lengths (horizon) and time granularities. Furthermore, unlike the huge volume of public text data for pretraining language models, vast amounts of time-series data is not readily available. In spite of these issues, we provide evidence to answer the above question in the affirmative.

In particular, we design TimesFM, a single foundation model for time-series forecasting that, when applied to a variety of previously-unseen forecasting datasets across different domains, obtains close to state-of-the-art zero-shot accuracy (compared to the best supervised models trained individually for these datasets). Our model can work well across different forecasting history lengths, prediction lengths and time granularities at inference time.

The key elements of our foundation model are twofold: 1) a large-scale time-series corpus built using both real-world (mostly time-series data from web search queries and Wikipedia page visits) and synthetic data, which meets the volume and diversity of data needed for training our foundation model, and 2) a decoder style attention architecture with input patching, that can be efficiently pre-trained on this time-series corpus.
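
As a rough illustration of how a patch-based decoder can serve any context length and any horizon at inference time, the sketch below decodes one output patch at a time and feeds each predicted patch back in as context. Everything here is an assumption made for the example: `forecast` and `mean_patch_predictor` are hypothetical names, and the predictor is a toy stand-in (the mean of the last few points), not TimesFM or any trained model.

```python
from typing import Callable, List


def mean_patch_predictor(context: List[float], output_patch_len: int) -> List[float]:
    """Toy stand-in for a trained model: repeat the mean of the last 4 points."""
    recent = context[-4:]
    level = sum(recent) / len(recent)
    return [level] * output_patch_len


def forecast(
    context: List[float],
    horizon: int,
    output_patch_len: int,
    predict_patch: Callable[[List[float], int], List[float]] = mean_patch_predictor,
) -> List[float]:
    """Decode patch by patch, appending each predicted patch to the history."""
    history = list(context)  # copy so the caller's list is not modified
    predictions: List[float] = []
    while len(predictions) < horizon:
        patch = predict_patch(history, output_patch_len)
        predictions.extend(patch)
        history.extend(patch)
    return predictions[:horizon]  # trim any overshoot from the final patch


print(forecast([1.0, 2.0, 3.0, 4.0], horizon=5, output_patch_len=2))
# [2.5, 2.5, 3.0, 3.0, 2.75]
```

The loop itself never depends on the context length or the horizon, which is the property the paragraph above asks of a foundation model; only the predictor would change in a real system.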


Compared to the latest large language models, our time-series foundation model is much smaller in both parameter size (200M parameters) and pretraining data size (O(100B) timepoints); yet we show that even at such scales, it is possible to pretrain a practical foundation model for forecasting whose zero-shot performance comes close to the accuracy of fully-supervised approaches on a diverse set of time-series data. Our work also suggests that unlike recent work [GFQW23] that recommends Large Language Models such as GPT-3 and Llama-2 as out-of-the-box zero-shot forecasters, foundation models trained from scratch exclusively on time-series data can obtain much better zero-shot performance at a tiny fraction of its costs.


7 Conclusion

In this paper, we presented TimesFM, a practical foundation model for forecasting whose zero-shot performance comes close to the accuracy of fully-supervised forecasting models on a diverse set of time-series data. This model is pretrained on real-world and synthetic datasets comprising O(100B) timepoints. We discuss limitations and future work in more detail in Appendix A.1.

