深度学习_数据读取到model模型存储

2024-09-01 01:20

本文主要是介绍深度学习_数据读取到model模型存储,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

概要

应用场景:用户流失
本文将介绍模型调用预测的步骤,这里深度学习模型使用的是自定义的deepfm,并用机器学习lgb做比较

代码

导包

import pandas as pd
import numpy as npimport matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict  
from scipy import stats
from scipy import signal
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, f1_score
from scipy.spatial.distance import cosineimport lightgbm as lgbfrom sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler
from tensorflow.keras.layers import *
import tensorflow.keras.backend as K
import tensorflow as tf
from tensorflow.keras.models import Modelimport os,gc,re,warnings,sys,math
warnings.filterwarnings("ignore")pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)

读取数据

data = pd.read_csv('df_03m.csv')

区分稀疏及类别变量

sparse_cols = ['shop_id','sex']
dense_cols  = [c for c in data.columns if c not in sparse_cols + ['customer_id', 'flag', 'duartion_is_lm']]

dense特征处理

def process_dense_feats(data, cols):d = data.copy()for f in cols:d[f] = d[f].fillna(0)ss=StandardScaler()d[f] = ss.fit_transform(d[[f]])return ddata = process_dense_feats(data, dense_cols)

sparse稀疏特征处理

def process_sparse_feats(data, cols):d = data.copy()for f in cols:d[f] = d[f].fillna('-1').astype(str)label_encoder = LabelEncoder()d[f] = label_encoder.fit_transform(d[f])return ddata = process_sparse_feats(data, sparse_cols)

切分训练及测试集

X_train, X_test, _, _ = train_test_split(data, data, test_size=0.3, random_state=2024)y_train = X_train['flag']
y_test = X_test['flag']X_train1 = X_train.drop(['customer_id', 'flag', 'duartion_is_lm'], axis = 1)
X_test1 = X_test.drop(['customer_id', 'flag', 'duartion_is_lm'], axis = 1)

模型定义

def deepfm_model(sparse_columns, dense_columns, train, test):####### sparse features ##########sparse_input = []lr_embedding = []fm_embedding = []for col in sparse_columns:## lr_embedding_input = Input(shape=(1,))sparse_input.append(_input)nums = pd.concat((train[col], test[col])).nunique() + 1embed = Flatten()(Embedding(nums, 1, embeddings_regularizer=tf.keras.regularizers.l2(0.5))(_input))lr_embedding.append(embed)## fm_embeddingembed = Embedding(nums, 10, input_length=1, embeddings_regularizer=tf.keras.regularizers.l2(0.5))(_input)reshape = Reshape((10,))(embed)fm_embedding.append(reshape)####### fm layer ##########fm_square = Lambda(lambda x: K.square(x))(Add()(fm_embedding)) # square_fm = Add()([Lambda(lambda x:K.square(x))(embed)for embed in fm_embedding])snd_order_sparse_layer = subtract([fm_square, square_fm])snd_order_sparse_layer  = Lambda(lambda x: x * 0.5)(snd_order_sparse_layer)####### dense features ##########dense_input = []for col in dense_columns:_input = Input(shape=(1,))dense_input.append(_input)concat_dense_input = concatenate(dense_input)fst_order_dense_layer = Dense(4, activation='relu')(concat_dense_input)#     #######  NFM  ##########
#     inner_product = []
#     for i in range(field_cnt):
#         for j in range(i + 1, field_cnt):
#             tmp = dot([fm_embedding[i], fm_embedding[j]], axes=1)
#             # tmp = multiply([fm_embedding[i], fm_embedding[j]])
#             inner_product.append(tmp)
#     add_inner_product = add(inner_product)#     #######  PNN  ##########
#     for i in range(field_cnt):
#         for j in range(i+1,field_cnt):
#             tmp = dot([lr_embedding[i],lr_embedding[j]],axes=1)
#             product_list.append(temp)
#     inp = concatenate(lr_embedding+product_list)####### linear concat ##########fst_order_sparse_layer = concatenate(lr_embedding)linear_part = concatenate([fst_order_dense_layer, fst_order_sparse_layer])#     #######  DCN  ##########
#     linear_part = concatenate([fst_order_dense_layer, fst_order_sparse_layer])
#     x0 = linear_part
#     xl = x0
#     for i in range(3):
#         embed_dim = xl.shape[-1]
#         w = tf.Variable(tf.random.truncated_normal(shape=(embed_dim,), stddev=0.01))
#         b = tf.Variable(tf.zeros(shape=(embed_dim,)))
#         x_lw = tf.tensordot(tf.reshape(xl, [-1, 1, embed_dim]), w, axes=1)
#         cross = x0 * x_lw 
#         xl = cross + b + xl#######dnn layer##########concat_fm_embedding = concatenate(fm_embedding, axis=-1) # (None, 10*26)fc_layer = Dropout(0.2)(Activation(activation="relu")(BatchNormalization()(Dense(128)(concat_fm_embedding))))fc_layer = Dropout(0.2)(Activation(activation="relu")(BatchNormalization()(Dense(64)(fc_layer))))fc_layer = Dropout(0.2)(Activation(activation="relu")(BatchNormalization()(Dense(32)(fc_layer))))######## output layer ##########output_layer = concatenate([linear_part, snd_order_sparse_layer, fc_layer]) # (None, )output_layer = Dense(1, activation='sigmoid')(output_layer)model = Model(inputs=sparse_input+dense_input, outputs=output_layer)return model
model = deepfm_model(sparse_cols, dense_cols, X_train1, X_test1)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["binary_crossentropy", tf.keras.metrics.AUC(name='auc')])
train_sparse_x = [X_train1[f].values for f in sparse_cols]
train_dense_x = [X_train1[f].values for f in dense_cols]
train_label = [y_train.values]test_sparse_x = [X_test1[f].values for f in sparse_cols]
test_dense_x = [X_test1[f].values for f in dense_cols]
test_label = [y_test.values]
test_sparse_x

训练模型

from keras.callbacks import *
# 回调函数
file_path = "deepfm_model_data.h5"
earlystopping = EarlyStopping(monitor="val_loss", patience=3)
checkpoint = ModelCheckpoint(file_path, save_weights_only=True, verbose=1, save_best_only=True)
callbacks_list = [earlystopping, checkpoint]hist = model.fit(train_sparse_x+train_dense_x, train_label,batch_size=4096,epochs=20,validation_data=(test_sparse_x+test_dense_x, test_label),callbacks=callbacks_list,shuffle=False)

模型存储

model.save('deepfm_model.h5')
loaded_model = tf.keras.models.load_model('deepfm_model.h5')
print("np.min(hist.history['val_loss']):", np.min(hist.history['val_loss']))
#np.min(hist.history['val_loss']):0.19
print("np.max(hist.history['val_auc']):", np.max(hist.history['val_auc']))
#np.max(hist.history['val_auc']):0.95

模型预测

deepfm_prob = model.predict(test_sparse_x+test_dense_x, batch_size=4096*4, verbose=1)
deepfm_prob.shape
deepfm_prob
df_submit          = pd.DataFrame()
df_submit          = X_test
df_submit['prob']  = deepfm_prob 
df_submit.head(3)
df_submit.shape
df_submit['y_pre'] = ''
df_submit['y_pre'].loc[(df_submit['prob']>=0.5)] = 1
df_submit['y_pre'].loc[(df_submit['prob']<0.5)]  = 0
df_submit.head(3)
df_submit = df_submit.reset_index()
df_submit.head(1)
df_submit = df_submit.drop('index', axis = 1)
df_submit.head(1)
df_submit.groupby(['flag', 'y_pre'])['customer_id'].count()

根据上述结果打印召回及精准

precision = 
recall  = 

查看lgb结果做比较

from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import f1_score, confusion_matrix, recall_score, precision_scoreparams = {'n_estimators': 1500, 'learning_rate': 0.1,'max_depth': 15,'metric': 'auc','verbose': -1, 'seed: 2023,'n_jobs':-1model=LGBMClarsifier(**params) 
model.fit(X_train, y_train,eval_set=[(X_train1, y_train), (X_test1, y_test)], eval_metric = 'auc', verbose=50,early_stopping_rounds = 100)
y_pred = model.predict(X_test1, num_iteration = model.best_iteration_)y_pred = model.predict(X_test1)
y_pred_proba = model.predict_proba(X_test1)
lgb_acc = model.score(X_test1, y_test) * 100
lgb_recall = recall_score(y_test, y_pred) * 100
lgb_precision = precision_score(y_test, y_pred) * 100 I 
lgb_f1 = f1_score(y_test, y_pred, pos_label=1) * 100
print("1gb 准确率:{:.2f}%".format(lgb_acc))
print("lgb 召回率:{:.2f}%".fornat(lgb_recall))
print("lgb 精准率:{:.2f}%".format(lgb_precision))
print("lgb F1分数:{:.2f}%".format(lgb_f1))#from sklearn.metrics import classification_report
#printf(classification_report(y_test, y_pred))# 混淆矩阵
plt.title("混淆矩阵", fontsize=21)
data_confusion_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(data_confusion_matrix, annot=True, cmap='Blues', fmt='d', cbar='False', annot_kws={'size': 28})
plt.xlabel('Predicted label') 
plt.ylabel('True label')from sklearn.metrics import roc_curve, auc
probs = model.predict_proba(X_test1)
preds = probs[:, 1]
fpr, tpr, threshold = roc_curve(y_test, preds)
# 绘制ROC曲线
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive(TPR)')
plt.xlabel('False Positive(FPR)')
plt.title('ROC')
plt.legend(loc='lower right')
plt.show()

参考资料:自己琢磨将资料整合

这篇关于深度学习_数据读取到model模型存储的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1125544

相关文章

SpringCloud动态配置注解@RefreshScope与@Component的深度解析

《SpringCloud动态配置注解@RefreshScope与@Component的深度解析》在现代微服务架构中,动态配置管理是一个关键需求,本文将为大家介绍SpringCloud中相关的注解@Re... 目录引言1. @RefreshScope 的作用与原理1.1 什么是 @RefreshScope1.

C# WinForms存储过程操作数据库的实例讲解

《C#WinForms存储过程操作数据库的实例讲解》:本文主要介绍C#WinForms存储过程操作数据库的实例,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录一、存储过程基础二、C# 调用流程1. 数据库连接配置2. 执行存储过程(增删改)3. 查询数据三、事务处

Java利用JSONPath操作JSON数据的技术指南

《Java利用JSONPath操作JSON数据的技术指南》JSONPath是一种强大的工具,用于查询和操作JSON数据,类似于SQL的语法,它为处理复杂的JSON数据结构提供了简单且高效... 目录1、简述2、什么是 jsONPath?3、Java 示例3.1 基本查询3.2 过滤查询3.3 递归搜索3.4

Java的IO模型、Netty原理解析

《Java的IO模型、Netty原理解析》Java的I/O是以流的方式进行数据输入输出的,Java的类库涉及很多领域的IO内容:标准的输入输出,文件的操作、网络上的数据传输流、字符串流、对象流等,这篇... 目录1.什么是IO2.同步与异步、阻塞与非阻塞3.三种IO模型BIO(blocking I/O)NI

MySQL大表数据的分区与分库分表的实现

《MySQL大表数据的分区与分库分表的实现》数据库的分区和分库分表是两种常用的技术方案,本文主要介绍了MySQL大表数据的分区与分库分表的实现,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有... 目录1. mysql大表数据的分区1.1 什么是分区?1.2 分区的类型1.3 分区的优点1.4 分

Mysql删除几亿条数据表中的部分数据的方法实现

《Mysql删除几亿条数据表中的部分数据的方法实现》在MySQL中删除一个大表中的数据时,需要特别注意操作的性能和对系统的影响,本文主要介绍了Mysql删除几亿条数据表中的部分数据的方法实现,具有一定... 目录1、需求2、方案1. 使用 DELETE 语句分批删除2. 使用 INPLACE ALTER T

Python 中的异步与同步深度解析(实践记录)

《Python中的异步与同步深度解析(实践记录)》在Python编程世界里,异步和同步的概念是理解程序执行流程和性能优化的关键,这篇文章将带你深入了解它们的差异,以及阻塞和非阻塞的特性,同时通过实际... 目录python中的异步与同步:深度解析与实践异步与同步的定义异步同步阻塞与非阻塞的概念阻塞非阻塞同步

Python Dash框架在数据可视化仪表板中的应用与实践记录

《PythonDash框架在数据可视化仪表板中的应用与实践记录》Python的PlotlyDash库提供了一种简便且强大的方式来构建和展示互动式数据仪表板,本篇文章将深入探讨如何使用Dash设计一... 目录python Dash框架在数据可视化仪表板中的应用与实践1. 什么是Plotly Dash?1.1

基于Flask框架添加多个AI模型的API并进行交互

《基于Flask框架添加多个AI模型的API并进行交互》:本文主要介绍如何基于Flask框架开发AI模型API管理系统,允许用户添加、删除不同AI模型的API密钥,感兴趣的可以了解下... 目录1. 概述2. 后端代码说明2.1 依赖库导入2.2 应用初始化2.3 API 存储字典2.4 路由函数2.5 应

GORM中Model和Table的区别及使用

《GORM中Model和Table的区别及使用》Model和Table是两种与数据库表交互的核心方法,但它们的用途和行为存在著差异,本文主要介绍了GORM中Model和Table的区别及使用,具有一... 目录1. Model 的作用与特点1.1 核心用途1.2 行为特点1.3 示例China编程代码2. Tab