1 Dataset construction
The raw data is as follows:
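Next, a sliding window is used to build multiple input columns X, as shown in the figure below: the first column is the original series shifted back (lagged) by 6 time steps, the second column by 5 steps, and so on.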
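A minimal sketch of this lagging step, assuming a toy series of ten values in place of the real birth counts. `DataFrame.shift(i)` moves each value down by `i` rows, so in row `t` the column `t-i` holds the observation from `i` steps earlier, and the first rows are padded with NaN because no earlier data exists.

```python
import pandas as pd

# toy series standing in for the daily birth counts (illustrative values only)
series = pd.Series(range(1, 11), name="t")

n_in = 6  # number of lagged inputs, matching n_in=6 in the full code
# shift(i) pushes values down by i rows, so row t receives the value from t-i
lagged = {f"t-{i}": series.shift(i) for i in range(n_in, 0, -1)}
lagged["t"] = series  # the current value becomes the target column
frame = pd.DataFrame(lagged)

print(frame)
```

The first six rows contain NaN in at least one lag column; only rows from index 6 onward have a complete six-step history.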
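After dropping the rows that contain missing values (the first six rows, which lack a complete history), the final dataset is:
Here X consists of the first six columns (the features), and the last column, y, is the value to be predicted.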
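To make the X/y layout concrete, here is a sketch that builds the same rows directly with NumPy's `sliding_window_view` (NumPy 1.20 or later) instead of shift-and-concat. This is a swapped-in technique used only for illustration, on a toy array rather than the real data: each window of length 7 holds six inputs followed by the target, so no NaN rows are produced in the first place.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.arange(1, 11)  # toy stand-in for the birth counts
n_in = 6                   # lag inputs per row

# each window holds n_in inputs plus 1 target, so windows have length n_in + 1
rows = sliding_window_view(values, window_shape=n_in + 1)

X, y = rows[:, :-1], rows[:, -1]  # first six columns are features, last is the target
print(X.shape, y.shape)  # (4, 6) (4,)
print(X[0], y[0])        # [1 2 3 4 5 6] 7
```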
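Predicting future female births
The Daily Female Births dataset records the number of female births per day in California in 1959 (365 daily observations).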
Download link: https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv
Full code
```python
from numpy import asarray
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor


# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values


# keep the last n_test rows as the test set
def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]


# fit an XGBoost model on the history and make a one-step prediction
def xgboost_forecast(train, testX):
    # transform list into array
    train = asarray(train)
    # split into input and output columns
    trainX, trainy = train[:, :-1], train[:, -1]
    # fit model
    model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
    model.fit(trainX, trainy)
    # make a one-step prediction
    yhat = model.predict(asarray([testX]))
    return yhat[0]


# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # split test row into input and output columns
        testX, testy = test[i, :-1], test[i, -1]
        # fit model on history and make a prediction
        yhat = xgboost_forecast(history, testX)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
        # summarize progress
        print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
    # estimate prediction error
    error = mean_absolute_error(test[:, -1], predictions)
    return error, test[:, -1], predictions


# load the CSV downloaded from the link above
series = read_csv('data/daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning (6 lag inputs, 1 output)
data = series_to_supervised(values, n_in=6)
# evaluate on the last 12 rows
mae, y, yhat = walk_forward_validation(data, 12)
print('MAE: %.3f' % mae)
```
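The evaluation in the code above is walk-forward validation: for each of the last 12 rows, the model is refit on every row seen so far, predicts one step ahead, and only then is that row's true value added to the history. The sketch below isolates that loop, with the model passed in as a function and the mean absolute error computed by hand as $\frac{1}{n}\sum_i |y_i - \hat{y}_i|$. The `mean_of_inputs` predictor (the average of the six lag values) is a hypothetical stand-in for XGBoost so the sketch runs instantly; it is not part of the original code, and the rows are toy data.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def walk_forward(rows: List[Row], n_test: int,
                 predict: Callable[[List[Row], Row], float]):
    """Walk-forward validation; the last element of each row is the target."""
    history = list(rows[:-n_test])  # seed the history with the training rows
    test = rows[-n_test:]
    predictions, actuals = [], []
    for row in test:
        x, y_true = row[:-1], row[-1]
        predictions.append(predict(history, x))  # model sees only past rows
        actuals.append(y_true)
        history.append(row)  # then the real observation is revealed
    errors = [abs(a - p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors), predictions


# hypothetical stand-in model: ignores history and averages the lag inputs
def mean_of_inputs(history: List[Row], x: Row) -> float:
    return sum(x) / len(x)


# toy supervised rows: six lags followed by the next value of 1, 2, 3, ...
series = list(range(1, 21))
rows = [series[i:i + 7] for i in range(len(series) - 6)]
mae, preds = walk_forward(rows, n_test=3, predict=mean_of_inputs)
print(mae, preds)  # 3.5 [14.5, 15.5, 16.5]
```

Because each prediction only uses rows earlier than the one being scored, the reported MAE reflects how the model would behave when forecasting one day at a time.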