This article shows how to compute per-position amino acid probability features from a multiple sequence alignment (MSA).
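For each position in the protein sequence, the probability of each amino acid (22 classes, including X and gap) is computed from the protein['msa'] feature, which yields the protein['hhblits_profile'] feature. protein['msa'] is first one-hot encoded along a new last axis of size 22, and the result is then averaged over the sequence axis (axis 0) with reduce_mean, so each row of the profile is the observed frequency of every class at that position.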
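To make the one-hot-then-mean step concrete, here is a minimal, dependency-free sketch on a toy alignment. The helper names `one_hot` and `msa_profile`, the 3-class alphabet, and the toy data are illustrative assumptions, not part of the original code; the article's TensorFlow version does the same thing with 22 classes.

```python
NUM_CLASSES = 3  # toy alphabet (assumption); the article uses 22 classes


def one_hot(index, num_classes):
    """Return a one-hot vector of length num_classes for an integer class id."""
    return [1.0 if k == index else 0.0 for k in range(num_classes)]


def msa_profile(msa, num_classes):
    """Average the one-hot vectors over all sequences, position by position.

    msa is a list of sequences (rows), each a list of integer class ids,
    i.e. the same (num_sequences, num_positions) layout as protein['msa'].
    """
    num_seqs = len(msa)
    num_pos = len(msa[0])
    profile = []
    for pos in range(num_pos):
        column_sum = [0.0] * num_classes
        for seq in msa:
            for k, value in enumerate(one_hot(seq[pos], num_classes)):
                column_sum[k] += value
        # Mean over the sequence axis (axis 0 in the TensorFlow version).
        profile.append([s / num_seqs for s in column_sum])
    return profile


# Toy MSA: 4 sequences, 2 positions.
toy_msa = [
    [0, 1],
    [0, 2],
    [1, 2],
    [0, 2],
]
toy_profile = msa_profile(toy_msa, NUM_CLASSES)
print(toy_profile)  # [[0.75, 0.25, 0.0], [0.0, 0.25, 0.75]]
```

At position 0, class 0 appears in 3 of 4 sequences, so its probability is 0.75. The output has shape (num_positions, num_classes), matching the (144, 22) shape of protein['hhblits_profile'] in the article.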
import pickle
import tensorflow as tf


def make_hhblits_profile(protein):
    """Compute the HHblits MSA profile if not already present."""
    if 'hhblits_profile' in protein:
        return protein

    # Probability of each amino acid occurring at each position.
    # Compute the profile for every residue (over all MSA sequences).
    protein['hhblits_profile'] = tf.reduce_mean(
        tf.one_hot(protein['msa'], 22), axis=0)
    return protein


with open("Human_HBB_tensor_dict.pkl", 'rb') as f:
    Human_HBB_tensor_dict = pickle.load(f)

protein = Human_HBB_tensor_dict

# protein['msa'] has shape (771, 144)
print("protein['msa']")
print(protein['msa'])

# tf.one_hot(protein['msa'], 22) has shape (771, 144, 22)
#print("tf.one_hot(protein['msa'], 22)")
#print(tf.one_hot(protein['msa'], 22))

protein = make_hhblits_profile(protein)

# protein['hhblits_profile'] has shape (144, 22)
print("protein['hhblits_profile']")
print(protein['hhblits_profile'])
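A quick sanity check on the result can be sketched as follows. It assumes the profile has been converted to a nested Python list (in TensorFlow 2 eager mode, something like `protein['hhblits_profile'].numpy().tolist()`); the helpers `rows_sum_to_one` and `consensus_ids` and the small example profile are hypothetical, introduced only for illustration. The check is useful because tf.one_hot returns an all-zero vector for any index outside the range 0 to 21, so a row that does not sum to 1 points to unexpected ids in protein['msa'].

```python
import math


def rows_sum_to_one(profile, tol=1e-6):
    """Return True if every per-position distribution sums to 1 within tol."""
    return all(math.isclose(sum(row), 1.0, abs_tol=tol) for row in profile)


def consensus_ids(profile):
    """Return the most probable class id at each position."""
    return [max(range(len(row)), key=row.__getitem__) for row in profile]


# Hypothetical 2-position, 3-class profile standing in for the real (144, 22) one.
example_profile = [[0.75, 0.25, 0.0], [0.0, 0.25, 0.75]]

rows_ok = rows_sum_to_one(example_profile)
consensus = consensus_ids(example_profile)
print(rows_ok)    # True
print(consensus)  # [0, 2]
```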