Building a Cancer Dependency Map of Tumors with Deep Learning

2024-01-31 16:30


Source paper:

Predicting and characterizing a cancer dependency map of tumors with deep learning

Code: Code Ocean

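Hi everyone! Today I'd like to introduce a paper published in Science Advances in 2021.

Genome-wide loss-of-function screens have revealed genes that are essential for cancer cell proliferation; these are called cancer dependencies.

However, linking cancer dependencies to the molecular makeup of cancer cells, and from there to tumors, remains a major challenge.

In this study, the authors propose DeepDEP, a deep learning model built on the TensorFlow framework.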

Version requirements:

  • tensorflow 1.4.0
  • python 3.5.2
  • cuda 8.0.61
  • cudnn 6.0.21
  • h5py 2.7.1
  • keras 1.2.2

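First, the authors pretrain the model without supervision on unlabeled tumor genomic profiles (TCGA) and save the resulting weights.

Unsupervised pretraining (the training targets are the inputs themselves, and every layer has an activation function)

[Figure: model flowchart]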

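The authors validate DeepDEP's performance on three independent datasets. Through systematic model interpretation, they extend the current cancer dependency map. By applying DeepDEP to pan-cancer tumor genomic data, they build the first clinically relevant pan-cancer dependency map. Overall, DeepDEP is a new tool for studying cancer dependencies.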

Unsupervised pretraining

# Pretrain an autoencoder (AE) of tumor genomics (TCGA) to be used to initialize DeepDEP model training
print("\n\nStarting to run PretrainAE.py with a demo example of gene mutation data of 50 TCGA tumors...")

import pickle
from keras import models
from keras.layers import Dense, Merge
from keras.callbacks import EarlyStopping
import numpy as np
import time


def load_data(filename):
    data = []
    gene_names = []
    data_labels = []
    lines = open(filename).readlines()  # read the whole file as a list of lines
    sample_names = lines[0].replace('\n', '').split('\t')[1:]  # strip the newline, then split the header on tabs
    dx = 1

    for line in lines[dx:]:
        values = line.replace('\n', '').split('\t')
        gene = str.upper(values[0])  # convert the gene name to upper case
        gene_names.append(gene)
        data.append(values[1:])
    data = np.array(data, dtype='float32')
    data = np.transpose(data)  # rows become samples, columns become genes

    return data, data_labels, sample_names, gene_names


def AE_dense_3layers(input_dim, first_layer_dim, second_layer_dim, third_layer_dim, activation_func, init='he_uniform'):
    print('input_dim = ', input_dim)
    print('first_layer_dim = ', first_layer_dim)
    print('second_layer_dim = ', second_layer_dim)
    print('third_layer_dim = ', third_layer_dim)
    print('init = ', init)
    model = models.Sequential()
    # encoder: input -> first -> second -> third (bottleneck)
    model.add(Dense(output_dim = first_layer_dim, input_dim = input_dim, activation = activation_func, init = init))
    model.add(Dense(output_dim = second_layer_dim, input_dim = first_layer_dim, activation = activation_func, init = init))
    model.add(Dense(output_dim = third_layer_dim, input_dim = second_layer_dim, activation = activation_func, init = init))
    # decoder: third -> second -> first -> input
    model.add(Dense(output_dim = second_layer_dim, input_dim = third_layer_dim, activation = activation_func, init = init))
    model.add(Dense(output_dim = first_layer_dim, input_dim = second_layer_dim, activation = activation_func, init = init))
    model.add(Dense(output_dim = input_dim, input_dim = first_layer_dim, activation = activation_func, init = init))

    return model


def save_weight_to_pickle(model, file_name):
    print('saving weights')
    weight_list = []
    for layer in model.layers:
        weight_list.append(layer.get_weights())
    with open(file_name, 'wb') as handle:
        pickle.dump(weight_list, handle)


if __name__ == '__main__':
    # load TCGA mutation data, substitute here with other genomics
    data_mut_tcga, data_labels_mut_tcga, sample_names_mut_tcga, gene_names_mut_tcga = load_data(r"D:\DEPOI\data/tcga_mut_data_paired_with_ccl.txt")
    print("\n\nDatasets successfully loaded.")

    samples_to_predict = np.arange(0, 50)
    # predict the first 50 samples for DEMO ONLY, for all samples please substitute 50 by data_mut_tcga.shape[0]
    # prediction results of all 8238 TCGA samples can be found in /data/premodel_tcga_*.pickle
    print()

    input_dim = data_mut_tcga.shape[1]
    first_layer_dim = 1000
    second_layer_dim = 100
    third_layer_dim = 50
    batch_size = 64
    epoch_size = 100
    activation_function = 'relu'
    init = 'he_uniform'
    model_save_name = "premodel_tcga_mut_%d_%d_%d" % (first_layer_dim, second_layer_dim, third_layer_dim)

    t = time.time()
    model = AE_dense_3layers(input_dim = input_dim, first_layer_dim = first_layer_dim, second_layer_dim=second_layer_dim, third_layer_dim=third_layer_dim, activation_func=activation_function, init=init)
    model.compile(loss = 'mse', optimizer = 'adam')
    # the input is also the target: the network learns to reconstruct its own input
    model.fit(data_mut_tcga[samples_to_predict], data_mut_tcga[samples_to_predict], nb_epoch=epoch_size, batch_size=batch_size, shuffle=True)

    cost = model.evaluate(data_mut_tcga[samples_to_predict], data_mut_tcga[samples_to_predict], verbose = 0)
    print('\n\nAutoencoder training completed in %.1f mins.\n with testloss:%.4f' % ((time.time()-t)/60, cost))

    save_weight_to_pickle(model, r'D:\DEPOI/results/autoencoders/' + model_save_name + '_demo.pickle')
    print("\nResults saved in /results/autoencoders/%s_demo.pickle\n\n" % model_save_name)
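
The autoencoder built by AE_dense_3layers is symmetric: three encoder layers narrow the input down to a bottleneck, and three decoder layers mirror them back to the input width. As a rough illustration, the output widths of the six Dense layers can be listed as follows. The helper name ae_layer_widths and the input width of 4539 are made up for this sketch and do not come from the original code; the real input width is data_mut_tcga.shape[1].

```python
def ae_layer_widths(input_dim, first, second, third):
    """Output widths of the six Dense layers, in the order they are added."""
    encoder = [first, second, third]      # input -> bottleneck
    decoder = [second, first, input_dim]  # bottleneck -> reconstruction
    return encoder + decoder


# hypothetical input width, chosen only for illustration
widths = ae_layer_widths(4539, 1000, 100, 50)
print(widths)  # [1000, 100, 50, 100, 1000, 4539]
```

Only the first three layers (the encoder half) are reused later; the decoder exists so that the reconstruction loss can be computed during pretraining.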

After unsupervised pretraining, the weights are saved to a pickle file so that they can be loaded into the training model later.
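
save_weight_to_pickle stores one entry per layer, each being whatever layer.get_weights() returns (for a Dense layer with a bias, a kernel matrix and a bias vector). The main training script then passes the first three entries to its own Dense layers via weights=. Here is a small sketch of that round trip. It uses nested Python lists as stand-ins for the NumPy arrays and an in-memory buffer instead of a file; both are assumptions made to keep the example self-contained.

```python
import io
import pickle

# stand-in for [layer.get_weights() for layer in model.layers]:
# six layers, each a [kernel, bias] pair with tiny made-up shapes
weight_list = [
    [[[0.1, 0.2], [0.3, 0.4]], [0.0, 0.0]]  # 2x2 kernel, bias of length 2
    for _ in range(6)
]

buffer = io.BytesIO()            # plays the role of the .pickle file
pickle.dump(weight_list, buffer)

buffer.seek(0)
premodel = pickle.load(buffer)

encoder_weights = premodel[:3]   # the entries reused as weights=premodel[i]
input_dim = len(premodel[0][0])  # rows of the first kernel, like premodel_mut[0][0].shape[0]
print(len(premodel), len(encoder_weights), input_dim)  # 6 3 2
```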

Main training

# Train, validate, and test single-, 2-, and full 4-omics DeepDEP models
print("\n\nStarting to run TrainNewModel.py with a demo example of 28 CCLs x 1298 DepOIs...")

import pickle
from keras import models
from keras.layers import Dense, Merge
from keras.callbacks import EarlyStopping
import numpy as np
import time
from matplotlib import pyplot as plt

if __name__ == '__main__':
    with open(r'D:\DEPOI/data/ccl_complete_data_278CCL_1298DepOI_360844samples.pickle', 'rb') as f:
        data_mut, data_exp, data_cna, data_meth, data_dep, data_fprint = pickle.load(f)
    # The complete data used in the paper (278 CCLs x 1298 DepOIs = 360844 samples) is loaded here;
    # a link to it can be found in README.md.
    # The demo pickle (28 CCLs x 1298 DepOIs = 36344 samples) is
    # 'ccl_complete_data_28CCL_1298DepOI_36344samples_demo.pickle'.
    # First 1298 samples correspond to 1298 DepOIs of the first CCL, and so on.

    # Load autoencoders of each genomics that were pre-trained using 8238 TCGA samples
    # New autoencoders can be pretrained using PretrainAE.py
    premodel_mut = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_mut_1000_100_50.pickle', 'rb'))
    premodel_exp = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_exp_500_200_50.pickle', 'rb'))
    premodel_cna = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_cna_500_200_50.pickle', 'rb'))
    premodel_meth = pickle.load(open(r'D:\DEPOI/data/premodel_tcga_meth_500_200_50.pickle', 'rb'))
    print("\n\nDatasets successfully loaded.")

    activation_func = 'relu'  # for all middle layers
    activation_func2 = 'linear'  # for output layer to output unbounded gene-effect scores
    init = 'he_uniform'
    dense_layer_dim = 250
    batch_size = 10000
    num_epoch = 100
    num_DepOI = 1298  # 1298 DepOIs as defined in our paper
    num_ccl = int(data_mut.shape[0]/num_DepOI)

    # 90% CCLs for training/validation, and 10% for testing
    id_rand = np.random.permutation(num_ccl)
    id_cell_train = id_rand[np.arange(0, round(num_ccl*0.9))]
    id_cell_test = id_rand[np.arange(round(num_ccl*0.9), num_ccl)]
    # print(id_cell_train)

    # prepare sample indices (selected CCLs x 1298 DepOIs)
    id_train = np.arange(0, 1298) + id_cell_train[0]*1298
    for y in id_cell_train:
        id_train = np.union1d(id_train, np.arange(0, 1298) + y*1298)
    id_test = np.arange(0, 1298) + id_cell_test[0] * 1298
    for y in id_cell_test:
        id_test = np.union1d(id_test, np.arange(0, 1298) + y*1298)

    print("\n\nTraining/validation on %d samples (%d CCLs x %d DepOIs) and testing on %d samples (%d CCLs x %d DepOIs).\n\n" % (
        len(id_train), len(id_cell_train), num_DepOI, len(id_test), len(id_cell_test), num_DepOI))

    # Full 4-omic DeepDEP model, composed of 6 sub-networks:
    # model_mut, model_exp, model_cna, model_meth: to learn data embedding of each omics
    # model_gene: to learn data embedding of gene fingerprints (involvement of a gene in 3115 functions)
    # model_final: to merge the above 5 sub-networks and predict gene-effect scores
    t = time.time()

    # subnetwork of mutations
    model_mut = models.Sequential()
    model_mut.add(Dense(output_dim=1000, input_dim=premodel_mut[0][0].shape[0], activation=activation_func,
                        weights=premodel_mut[0], trainable=True))
    model_mut.add(Dense(output_dim=100, input_dim=1000, activation=activation_func, weights=premodel_mut[1],
                        trainable=True))
    model_mut.add(Dense(output_dim=50, input_dim=100, activation=activation_func, weights=premodel_mut[2],
                        trainable=True))

    # subnetwork of expression
    model_exp = models.Sequential()
    model_exp.add(Dense(output_dim=500, input_dim=premodel_exp[0][0].shape[0], activation=activation_func,
                        weights=premodel_exp[0], trainable=True))
    model_exp.add(Dense(output_dim=200, input_dim=500, activation=activation_func, weights=premodel_exp[1],
                        trainable=True))
    model_exp.add(Dense(output_dim=50, input_dim=200, activation=activation_func, weights=premodel_exp[2],
                        trainable=True))

    # subnetwork of copy number alterations
    model_cna = models.Sequential()
    model_cna.add(Dense(output_dim=500, input_dim=premodel_cna[0][0].shape[0], activation=activation_func,
                        weights=premodel_cna[0], trainable=True))
    model_cna.add(Dense(output_dim=200, input_dim=500, activation=activation_func, weights=premodel_cna[1],
                        trainable=True))
    model_cna.add(Dense(output_dim=50, input_dim=200, activation=activation_func, weights=premodel_cna[2],
                        trainable=True))

    # subnetwork of DNA methylations
    model_meth = models.Sequential()
    model_meth.add(Dense(output_dim=500, input_dim=premodel_meth[0][0].shape[0], activation=activation_func,
                         weights=premodel_meth[0], trainable=True))
    model_meth.add(Dense(output_dim=200, input_dim=500, activation=activation_func, weights=premodel_meth[1],
                         trainable=True))
    model_meth.add(Dense(output_dim=50, input_dim=200, activation=activation_func, weights=premodel_meth[2],
                         trainable=True))

    # subnetwork of gene fingerprints (no pretrained weights, initialized from scratch)
    model_gene = models.Sequential()
    model_gene.add(Dense(output_dim=1000, input_dim=data_fprint.shape[1], activation=activation_func, init=init,
                         trainable=True))
    model_gene.add(Dense(output_dim=100, input_dim=1000, activation=activation_func, init=init, trainable=True))
    model_gene.add(Dense(output_dim=50, input_dim=100, activation=activation_func, init=init, trainable=True))

    # prediction network: the five 50-dimensional embeddings are concatenated into a 250-dimensional input
    model_final = models.Sequential()
    model_final.add(Merge([model_mut, model_exp, model_cna, model_meth, model_gene], mode='concat'))
    model_final.add(Dense(output_dim=dense_layer_dim, input_dim=250, activation=activation_func, init=init,
                          trainable=True))
    model_final.add(Dense(output_dim=dense_layer_dim, input_dim=dense_layer_dim, activation=activation_func, init=init,
                          trainable=True))
    model_final.add(Dense(output_dim=1, input_dim=dense_layer_dim, activation=activation_func2, init=init,
                          trainable=True))

    # training with early stopping (patience is set to 100 here)
    history = EarlyStopping(monitor='val_loss', min_delta=0, patience=100, verbose=0, mode='min')
    model_final.compile(loss='mse', optimizer='adam')
    model_final.fit([data_mut[id_train], data_exp[id_train], data_cna[id_train], data_meth[id_train], data_fprint[id_train]],
                    data_dep[id_train], nb_epoch=num_epoch, validation_split=1/9, batch_size=batch_size, shuffle=True,
                    callbacks=[history])
    cost_testing = model_final.evaluate([data_mut[id_test], data_exp[id_test], data_cna[id_test], data_meth[id_test], data_fprint[id_test]],
                                        data_dep[id_test], verbose=0, batch_size=batch_size)

    # note: if early stopping never triggers, stopped_epoch stays 0 and the
    # loss/valloss printed below are those of the first epoch
    print("\n\nFull DeepDEP model training completed in %.1f mins.\nloss:%.4f valloss:%.4f testloss:%.4f" % (
        (time.time() - t)/60,
        history.model.model.history.history['loss'][history.stopped_epoch],
        history.model.model.history.history['val_loss'][history.stopped_epoch],
        cost_testing))

    model_final.save(r'D:\DEPOI\results_cai/models/model_demo.h5')
    print("\n\nFull DeepDEP model saved in /results/models/model_demo.h5\n\n")

    ########################################################################
    # plot the training and validation loss curves
    loss = history.model.model.history.history['loss']
    val_loss = history.model.model.history.history['val_loss']
    fig = plt.figure()
    plt.plot(loss, label="Training Loss")
    plt.plot(val_loss, label="Validation Loss")
    plt.title("Training and Validation Loss")
    plt.legend()
    fig.savefig("loss.png")
    plt.show()
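
The index bookkeeping in the training script relies on the sample layout described in its comments: rows are grouped by cell line, so DepOI j of cell line c sits at row c * 1298 + j. The sketch below reproduces the effect of the union1d loops in plain Python for a toy case. The 3 DepOIs, the 10 cell lines, the fixed seed and the helper name rows_for_cells are all illustrative assumptions and are not part of the original code.

```python
import random

NUM_DEPOI = 3   # 1298 in the article; kept tiny here
NUM_CCL = 10    # toy number of cell lines


def rows_for_cells(cell_ids, num_depoi=NUM_DEPOI):
    """Sorted row indices of all (cell line, DepOI) pairs for the given cell lines."""
    return sorted(c * num_depoi + j for c in cell_ids for j in range(num_depoi))


rng = random.Random(0)
cells = list(range(NUM_CCL))
rng.shuffle(cells)                       # plays the role of np.random.permutation(num_ccl)

n_train = round(NUM_CCL * 0.9)           # 90% of cell lines for training/validation
id_cell_train, id_cell_test = cells[:n_train], cells[n_train:]

id_train = rows_for_cells(id_cell_train)
id_test = rows_for_cells(id_cell_test)

# no row is shared, and every row is used exactly once
assert not set(id_train) & set(id_test)
assert sorted(id_train + id_test) == list(range(NUM_CCL * NUM_DEPOI))
print(len(id_train), len(id_test))       # 27 3
```

Splitting by cell line rather than by row keeps every DepOI of a held-out cell line out of training. The validation_split=1/9 passed to fit then sets aside one ninth of the training rows, which is about 10% of all rows, for validation.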

Prediction: checking model performance

# Predict TCGA (or other new) samples using a trained model
print("\n\nStarting to run PredictNewSamples.py with a demo example of 10 TCGA tumors...")

import numpy as np
import pandas as pd
from keras import models
import time
import tensorflow as tf
import pickle


def MSE(y, t):
    # sum of squared errors; it is divided by the number of samples below to give the mean
    return np.sum((y - t) ** 2)


if __name__ == '__main__':
    model_name = "model_demo"
    model_saved = models.load_model(r"D:\DEPOI\results_cai/models/%s.h5" % model_name)
    # model_paper is the full 4-omics DeepDEP model used in the paper
    # user can choose from single-omics, 2-omics, or full DeepDEP models from the
    # /data/full_results_models_paper/models/ directory

    # here the saved model is evaluated on the demo CCL data, whose measured scores (data_dep) are known
    with open(r'D:\DEPOI/data/ccl_complete_data_28CCL_1298DepOI_36344samples_demo.pickle', 'rb') as f:
        data_mut, data_exp, data_cna, data_meth, data_dep, data_fprint = pickle.load(f)
    print("\n\nDatasets successfully loaded.\n\n")

    batch_size = 500
    # predict the first 10 samples for DEMO ONLY, for all samples please substitute 10 by data_mut_tcga.shape[0]
    # prediction results of all 8238 TCGA samples can be found in /data/full_results_models_paper/predictions/
    # t = time.time()
    y = data_dep
    data_pred_tmp = model_saved.predict([data_mut, data_exp, data_cna, data_meth, data_fprint],
                                        batch_size=batch_size, verbose=0)

    T = y[:, 0]              # measured gene-effect scores
    P = data_pred_tmp[:, 0]  # predicted gene-effect scores
    X = MSE(P, T) / data_mut.shape[0]  # mean squared error over all samples
    print(X)
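
The number printed at the end is the mean squared error between predicted and measured gene-effect scores: the helper named MSE returns the sum of squared differences, and dividing by the number of samples turns that sum into a mean. Here is a minimal pure-Python sketch of the same arithmetic. The function names and all the values are made up for illustration.

```python
def sum_squared_error(pred, true):
    """Sum of squared differences, like the MSE helper in the script."""
    return sum((p - t) ** 2 for p, t in zip(pred, true))


def mean_squared_error(pred, true):
    """Sum of squared differences divided by the number of samples."""
    return sum_squared_error(pred, true) / len(true)


true_scores = [0.0, -1.0, 0.5, -0.5]   # made-up measured gene-effect scores
pred_scores = [0.5, -1.0, 0.0, -1.0]   # made-up predictions

print(sum_squared_error(pred_scores, true_scores))   # 0.75
print(mean_squared_error(pred_scores, true_scores))  # 0.1875
```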



Original post: https://blog.csdn.net/caihaihua0572/article/details/125095426