Tianchi AI Earth Task02

2024-01-01 18:58
Tags: ai task02 earth Tianchi

This article walks through Task02 of the Tianchi AI Earth competition, in the hope that it provides a useful reference for developers working on the same problem.

Table of Contents

  • Baseline walkthrough
    • Importing libraries
    • Reading the data
      • 1. Processing CMIP(SODA)_label
      • 2. Processing SODA_train
      • 3. Processing CMIP_train
    • Modeling
      • 1. Importing packages
      • 2. Reading the label data
      • 3. Reading the raw feature data
      • 4. Building the neural network
      • 5. Splitting training and validation sets
      • 6. Training the model
      • 7. Making predictions
      • 8. Computing the score
    • Model inference
      • 1. Loading the model
      • 2. Making predictions
      • 3. Packaging the results

Baseline walkthrough

Importing libraries

import pandas as pd
import numpy  as np
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
import scipy 
from netCDF4 import Dataset
import netCDF4 as nc
import gc

Reading the data

1. Processing CMIP(SODA)_label

The label data is the Nino3.4 SST anomaly index, with dimensions (year, month).
The label corresponding to CMIP(SODA)_train.nc is the three-month running mean of the Nino3.4 SST anomaly index at the current time step.
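As a quick, hypothetical illustration of that three-month running mean (the values below are made up; this is not the official label-generation code):

```python
import numpy as np

# Hypothetical monthly Nino3.4 SST anomaly values (illustrative only)
nino_monthly = np.array([0.2, 0.5, 0.8, 1.1, 0.9, 0.6])

# Three-month running mean: each output averages 3 consecutive months
running_mean = np.convolve(nino_monthly, np.ones(3) / 3, mode='valid')

print(running_mean)  # 4 values: means of months (1-3), (2-4), (3-5), (4-6)
```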

The following code reads SODA_label.nc and converts it to a CSV file:

label_path       = './data/SODA_label.nc'
label_trans_path = './data/'
nc_label         = Dataset(label_path, 'r')
years            = np.array(nc_label['year'][:])
months           = np.array(nc_label['month'][:])

year_month_index = []
vs               = []
for i, year in enumerate(years):
    for j, month in enumerate(months):
        year_month_index.append('year_{}_month_{}'.format(year, month))
        vs.append(np.array(nc_label['nino'][i, j]))

df_SODA_label          = pd.DataFrame({'year_month': year_month_index})
df_SODA_label['label'] = vs
df_SODA_label.to_csv(label_trans_path + 'df_SODA_label.csv', index=None)

The following code reads CMIP_label.nc and converts it to a CSV file:

label_path       = './data/CMIP_label.nc'
label_trans_path = './data/'
nc_label         = Dataset(label_path, 'r')
years            = np.array(nc_label['year'][:])
months           = np.array(nc_label['month'][:])

year_month_index = []
vs               = []
for i, year in enumerate(years):
    for j, month in enumerate(months):
        year_month_index.append('year_{}_month_{}'.format(year, month))
        vs.append(np.array(nc_label['nino'][i, j]))

df_CMIP_label          = pd.DataFrame({'year_month': year_month_index})
df_CMIP_label['label'] = vs
df_CMIP_label.to_csv(label_trans_path + 'df_CMIP_label.csv', index=None)

2. Processing SODA_train

In SODA_train.nc, [0, 0:36, :, :] holds the monthly historical observations for years 1-3;

[1, 0:36, :, :] holds the monthly observations for years 2-4; …;
[99, 0:36, :, :] holds the monthly observations for years 100-102.
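The sliding-window layout described above can be sketched as follows (a minimal illustration, not part of the competition code): each sample starts one year after the previous one, so consecutive 36-month windows overlap by 24 months.

```python
def window_months(sample_idx):
    """Absolute month indices covered by sample `sample_idx`
    (36 months, shifted by 12 months per sample)."""
    start = sample_idx * 12
    return set(range(start, start + 36))

# Consecutive samples share 24 of their 36 months
overlap = window_months(0) & window_months(1)
print(len(overlap))  # 24
```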

Define a helper function that extracts the corresponding data and converts it to a CSV:

def trans_df(df, vals, lats, lons, years, months):
    '''vals has shape (100, 36, 24, 72) -- (year, month, lat, lon)'''
    for j, lat_ in enumerate(lats):
        for i, lon_ in enumerate(lons):
            c = 'lat_lon_{}_{}'.format(int(lat_), int(lon_))
            v = []
            for y in range(len(years)):
                for m in range(len(months)):
                    v.append(vals[y, m, j, i])
            df[c] = v
    return df

Main code:

SODA_path = './data/SODA_train.nc'
nc_SODA   = Dataset(SODA_path, 'r')

years  = np.array(nc_SODA['year'][:])
months = np.array(nc_SODA['month'][:])
lats   = np.array(nc_SODA['lat'][:])
lons   = np.array(nc_SODA['lon'][:])

year_month_index = []
for year in years:
    for month in months:
        year_month_index.append('year_{}_month_{}'.format(year, month))

# build one DataFrame per variable
df_sst  = pd.DataFrame({'year_month': year_month_index})
df_t300 = pd.DataFrame({'year_month': year_month_index})
df_ua   = pd.DataFrame({'year_month': year_month_index})
df_va   = pd.DataFrame({'year_month': year_month_index})

# extract the gridded values for each variable
df_sst  = trans_df(df = df_sst,  vals = np.array(nc_SODA['sst'][:]),  lats = lats, lons = lons, years = years, months = months)
df_t300 = trans_df(df = df_t300, vals = np.array(nc_SODA['t300'][:]), lats = lats, lons = lons, years = years, months = months)
df_ua   = trans_df(df = df_ua,   vals = np.array(nc_SODA['ua'][:]),   lats = lats, lons = lons, years = years, months = months)
df_va   = trans_df(df = df_va,   vals = np.array(nc_SODA['va'][:]),   lats = lats, lons = lons, years = years, months = months)

# write out to the data directory
label_trans_path = './data/'
df_sst.to_csv(label_trans_path  + 'df_sst_SODA.csv', index = None)
df_t300.to_csv(label_trans_path + 'df_t300_SODA.csv', index = None)
df_ua.to_csv(label_trans_path   + 'df_ua_SODA.csv', index = None)
df_va.to_csv(label_trans_path   + 'df_va_SODA.csv', index = None)

3. Processing CMIP_train

In CMIP_train.nc, [0, 0:36, :, :] holds the monthly historical simulations for years 1-3 from the first CMIP6 model;
…;
[150, 0:36, :, :] holds years 151-153 from the first CMIP6 model;
[151, 0:36, :, :] holds years 1-3 from the second CMIP6 model; …;
[2265, 0:36, :, :] holds years 1-3 from the first CMIP5 model; …;
[2405, 0:36, :, :] holds years 1-3 from the second CMIP5 model; …;
[4644, 0:36, :, :] holds years 140-142 from the 17th CMIP5 model.

The third and fourth dimensions of each sample are latitude and longitude (55°S-60°N, 0°-360°E); the lat/lon range is the same for all data.
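Given the 24 x 72 spatial dimensions over 55°S-60°N and 0°-360°E, the grid works out to 5-degree spacing in both directions. A sketch inferred from these numbers (not read from the official grid file):

```python
import numpy as np

# 24 latitudes from 55S to 60N and 72 longitudes from 0E to 355E,
# both at 5-degree spacing (assumed from the stated shapes and ranges)
lats = np.linspace(-55, 60, 24)   # -55, -50, ..., 60
lons = np.arange(0, 360, 5)       # 0, 5, ..., 355

print(lats.shape, lons.shape)  # (24,) (72,)
```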

Main code:

CMIP_path       = './data/CMIP_train.nc'
CMIP_trans_path = './data/'
nc_CMIP         = Dataset(CMIP_path, 'r')

years  = np.array(nc_CMIP['year'][:])
months = np.array(nc_CMIP['month'][:])
lats   = np.array(nc_CMIP['lat'][:])
lons   = np.array(nc_CMIP['lon'][:])

# because of memory limits, we only keep the last 1000 years of data
last_thre_years  = 1000
year_month_index = []
for year in years:
    if year >= 4645 - last_thre_years:
        for month in months:
            year_month_index.append('year_{}_month_{}'.format(year, month))

df_CMIP_sst  = pd.DataFrame({'year_month': year_month_index})
df_CMIP_t300 = pd.DataFrame({'year_month': year_month_index})
df_CMIP_ua   = pd.DataFrame({'year_month': year_month_index})
df_CMIP_va   = pd.DataFrame({'year_month': year_month_index})

df_CMIP_sst = trans_thre_df(df = df_CMIP_sst, vals = np.array(nc_CMIP['sst'][:]), lats = lats, lons = lons, years = years, months = months)
df_CMIP_sst.to_csv(CMIP_trans_path + 'df_CMIP_sst.csv', index = None)
del df_CMIP_sst
gc.collect()

df_CMIP_t300 = trans_thre_df(df = df_CMIP_t300, vals = np.array(nc_CMIP['t300'][:]), lats = lats, lons = lons, years = years, months = months)
df_CMIP_t300.to_csv(CMIP_trans_path + 'df_CMIP_t300.csv', index = None)
del df_CMIP_t300
gc.collect()

df_CMIP_ua = trans_thre_df(df = df_CMIP_ua, vals = np.array(nc_CMIP['ua'][:]), lats = lats, lons = lons, years = years, months = months)
df_CMIP_ua.to_csv(CMIP_trans_path + 'df_CMIP_ua.csv', index = None)
del df_CMIP_ua
gc.collect()

df_CMIP_va = trans_thre_df(df = df_CMIP_va, vals = np.array(nc_CMIP['va'][:]), lats = lats, lons = lons, years = years, months = months)
df_CMIP_va.to_csv(CMIP_trans_path + 'df_CMIP_va.csv', index = None)
del df_CMIP_va
gc.collect()

Define the CSV-conversion function (note that it must be defined before running the main code above):

def trans_thre_df(df, vals, lats, lons, years, months, last_thre_years = 1000):
    '''vals has shape (4645, 36, 24, 72) -- (year, month, lat, lon)'''
    for j, lat_ in enumerate(lats):
        for i, lon_ in enumerate(lons):
            c = 'lat_lon_{}_{}'.format(int(lat_), int(lon_))
            v = []
            for y_, y in enumerate(years):
                # keep only the last `last_thre_years` years (memory limits)
                if y >= 4645 - last_thre_years:
                    for m_, m in enumerate(months):
                        v.append(vals[y_, m_, j, i])
            df[c] = v
    return df

Modeling

1. Importing packages

import pandas as pd
import numpy  as np
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
import scipy 
import joblib
from netCDF4 import Dataset
import netCDF4 as nc 
from tensorflow.keras.callbacks import LearningRateScheduler, Callback
import tensorflow.keras.backend as K
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras.optimizers import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.layers import Input 
import gc

2. Reading the label data

label_path       = './data/SODA_label.nc'
nc_label         = Dataset(label_path,'r')
tr_nc_labels     = nc_label['nino'][:]

3. Reading the raw feature data

SODA_path = './data/SODA_train.nc'
nc_SODA   = Dataset(SODA_path, 'r')

nc_sst  = np.array(nc_SODA['sst'][:])
nc_t300 = np.array(nc_SODA['t300'][:])
nc_ua   = np.array(nc_SODA['ua'][:])
nc_va   = np.array(nc_SODA['va'][:])

4. Building the neural network

def RMSE(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))

def RMSE_fn(y_true, y_pred):
    return np.sqrt(np.mean(np.power(np.array(y_true, float).reshape(-1, 1) - np.array(y_pred, float).reshape(-1, 1), 2)))

def build_model():
    inp = Input(shape=(12, 24, 72, 4))
    # collapse the channel, longitude and latitude axes one at a time
    x_4 = Dense(1, activation='relu')(inp)
    x_3 = Dense(1, activation='relu')(tf.reshape(x_4, [-1, 12, 24, 72]))
    x_2 = Dense(1, activation='relu')(tf.reshape(x_3, [-1, 12, 24]))
    x_1 = Dense(1, activation='relu')(tf.reshape(x_2, [-1, 12]))

    x = Dense(64, activation='relu')(x_1)
    x = Dropout(0.25)(x)
    x = Dense(32, activation='relu')(x)
    x = Dropout(0.25)(x)
    output = Dense(24, activation='linear')(x)

    model = Model(inputs=inp, outputs=output)
    adam  = tf.optimizers.Adam(learning_rate=1e-3, beta_1=0.99, beta_2=0.99)
    model.compile(optimizer=adam, loss=RMSE)
    return model

5. Splitting training and validation sets

### training features, consistent with the training-set layout
tr_features = np.concatenate([nc_sst[:, :12, :, :].reshape(-1, 12, 24, 72, 1),
                              nc_t300[:, :12, :, :].reshape(-1, 12, 24, 72, 1),
                              nc_ua[:, :12, :, :].reshape(-1, 12, 24, 72, 1),
                              nc_va[:, :12, :, :].reshape(-1, 12, 24, 72, 1)], axis=-1)

### training labels: the last 24 months of each sample
tr_labels = tr_nc_labels[:, 12:]

### train/validation split (80/20)
tr_len    = int(tr_features.shape[0] * 0.8)
tr_fea    = tr_features[:tr_len, :].copy()
tr_label  = tr_labels[:tr_len, :].copy()
val_fea   = tr_features[tr_len:, :].copy()
val_label = tr_labels[tr_len:, :].copy()

6. Training the model

Build the model and store its weights under model_baseline:

#### build the model
model_mlp = build_model()

#### where the weights are stored
model_weights = './model_baseline/model_mlp_baseline.h5'

checkpoint     = ModelCheckpoint(model_weights, monitor='val_loss', verbose=0,
                                 save_best_only=True, mode='min', save_weights_only=True)
plateau        = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5,
                                   verbose=1, min_delta=1e-4, mode='min')
early_stopping = EarlyStopping(monitor='val_loss', patience=20)

history = model_mlp.fit(tr_fea, tr_label,
                        validation_data=(val_fea, val_label),
                        batch_size=4096, epochs=200,
                        callbacks=[plateau, checkpoint, early_stopping],
                        verbose=2)

7. Making predictions

prediction = model_mlp.predict(val_fea)

8. Computing the score
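
The scoring code below implements the competition metric, which can be written as

\[
\text{Score} = \frac{2}{3}\sum_{i=1}^{24} a_i \,\ln(i)\, \mathrm{cor}_i \;-\; \sum_{i=1}^{24} \mathrm{RMSE}_i
\]

where cor_i is the Pearson correlation (across validation samples) between predictions and truth at lead month i, RMSE_i is the root mean squared error at that month, and the weights are a = [1.5]*4 + [2]*7 + [3]*7 + [4]*6, so later lead months count more.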

from sklearn.metrics import mean_squared_error

def rmse(y_true, y_preds):
    return np.sqrt(mean_squared_error(y_pred = y_preds, y_true = y_true))

def score(y_true, y_preds):
    accskill_score = 0
    rmse_scores    = 0
    a = [1.5] * 4 + [2] * 7 + [3] * 7 + [4] * 6
    y_true_mean = np.mean(y_true, axis=0)
    y_pred_mean = np.mean(y_preds, axis=0)
    for i in range(24):
        # Pearson correlation across samples at lead month i
        fenzi = np.sum((y_true[:, i] - y_true_mean[i]) * (y_preds[:, i] - y_pred_mean[i]))
        fenmu = np.sqrt(np.sum((y_true[:, i] - y_true_mean[i]) ** 2) * np.sum((y_preds[:, i] - y_pred_mean[i]) ** 2))
        cor_i = fenzi / fenmu
        accskill_score += a[i] * np.log(i + 1) * cor_i
        rmse_scores += rmse(y_true[:, i], y_preds[:, i])
    return 2 / 3.0 * accskill_score - rmse_scores

Model inference

With the model trained above, the next step is to submit it and run inference online. This breaks down into three steps:
1. load the model;
2. read the test data and make predictions;
3. package the results for submission.

1. Loading the model

import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras.optimizers import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.layers import Input
import numpy as np
import os
import zipfile

def RMSE(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))

def build_model():
    inp = Input(shape=(12, 24, 72, 4))
    x_4 = Dense(1, activation='relu')(inp)
    x_3 = Dense(1, activation='relu')(tf.reshape(x_4, [-1, 12, 24, 72]))
    x_2 = Dense(1, activation='relu')(tf.reshape(x_3, [-1, 12, 24]))
    x_1 = Dense(1, activation='relu')(tf.reshape(x_2, [-1, 12]))
    x = Dense(64, activation='relu')(x_1)
    x = Dropout(0.25)(x)
    x = Dense(32, activation='relu')(x)
    x = Dropout(0.25)(x)
    output = Dense(24, activation='linear')(x)
    model = Model(inputs=inp, outputs=output)
    adam  = tf.optimizers.Adam(learning_rate=1e-3, beta_1=0.99, beta_2=0.99)
    model.compile(optimizer=adam, loss=RMSE)
    return model

model = build_model()
model.load_weights('./user_data/model_data/model_mlp_baseline.h5')

2. Making predictions

test_path = './tcdata/enso_round1_test_20210201/'

### 1. read the test data
files = os.listdir(test_path)
test_feas_dict = {}
for file in files:
    test_feas_dict[file] = np.load(test_path + file)

### 2. predict
test_predicts_dict = {}
for file_name, val in test_feas_dict.items():
    # test_predicts_dict[file_name] = model.predict(val).reshape(-1,)
    test_predicts_dict[file_name] = model.predict(val.reshape([-1, 12])[0, :])

### 3. save the predictions
for file_name, val in test_predicts_dict.items():
    np.save('./result/' + file_name, val)

3. Packaging the results

# zip the result directory (without compression)
def make_zip(source_dir='./result/', output_filename='result.zip'):
    zipf = zipfile.ZipFile(output_filename, 'w')
    pre_len = len(os.path.dirname(source_dir))
    for parent, dirnames, filenames in os.walk(source_dir):
        for filename in filenames:
            if '.npy' not in filename:
                continue
            pathfile = os.path.join(parent, filename)
            arcname  = pathfile[pre_len:].strip(os.path.sep)  # relative path inside the zip
            zipf.write(pathfile, arcname)
    zipf.close()

make_zip()

Running the model-building and training code succeeds, but the prediction step currently raises an error that is still being investigated.

That concludes this walkthrough of Tianchi AI Earth Task02; I hope it is a helpful reference.



http://www.chinasem.cn/article/560186
