Paper: CPI prediction using a GNN (for compounds) and a CNN (for proteins)



GitHub repository
Title: Compound-protein interaction (CPI) prediction using a GNN for compounds and a CNN for proteins
Furthermore, by using the obtained weights, the neural attention mechanism provides clear visualizations, which makes models easier to analyze (Fig. 9) even when modeling is performed using real-valued vector representations rather than discrete features.

Overall, the molecular features used in this paper are:

1. An ID for each atom type, and whether each atom is aromatic.
2. For each atom, which atoms surround it and the type of bond to each of them.

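3. Molecular fingerprints computed with the Weisfeiler-Lehman (WL) algorithm, which aggregates each atom's neighborhood up to a given radius, much like a convolution; I'm not entirely clear on the exact principle.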

4. The adjacency matrix.

5.
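The four features listed above can be sketched without RDKit as follows. This is a minimal sketch under stated assumptions, not the paper's own implementation: the toy molecule (acetaldehyde, CH3-CH=O, hydrogens omitted) is hand-written as element symbols, aromatic flags and a bond list, and the function names are hypothetical. The fingerprint function only handles radius 1, a simplification of the radius-based WL fingerprints.

```python
from collections import defaultdict

import numpy as np

# Shared vocabularies: every unseen key gets the next integer ID (0, 1, 2, ...).
atom_dict = defaultdict(lambda: len(atom_dict))
bond_dict = defaultdict(lambda: len(bond_dict))
fingerprint_dict = defaultdict(lambda: len(fingerprint_dict))


def create_atoms(symbols, aromatic):
    """Feature 1 (hypothetical helper): map (element, aromatic?) to an atom ID."""
    keys = [(s, 'aromatic') if a else s for s, a in zip(symbols, aromatic)]
    return np.array([atom_dict[k] for k in keys])


def create_ij_bond_dict(bonds):
    """Feature 2 (hypothetical helper): atom i -> list of (neighbor j, bond ID)."""
    i_jbond = defaultdict(list)
    for i, j, bond_type in bonds:
        b = bond_dict[bond_type]
        i_jbond[i].append((j, b))
        i_jbond[j].append((i, b))
    return i_jbond


def extract_fingerprints_r1(atoms, i_jbond):
    """Feature 3, radius 1 only: ID of (atom, sorted (neighbor atom, bond) pairs)."""
    fingerprints = []
    for i, a in enumerate(atoms):
        neighbors = tuple(sorted((int(atoms[j]), b) for j, b in i_jbond[i]))
        fingerprints.append(fingerprint_dict[(int(a), neighbors)])
    return np.array(fingerprints)


def adjacency_matrix(n_atoms, bonds):
    """Feature 4: symmetric 0/1 adjacency matrix built from the bond list."""
    adj = np.zeros((n_atoms, n_atoms), dtype=int)
    for i, j, _ in bonds:
        adj[i, j] = adj[j, i] = 1
    return adj


# Hypothetical toy input: acetaldehyde, CH3-CH=O (hydrogens omitted).
symbols = ['C', 'C', 'O']
aromatic = [False, False, False]
bonds = [(0, 1, 'SINGLE'), (1, 2, 'DOUBLE')]

atoms = create_atoms(symbols, aromatic)
i_jbond = create_ij_bond_dict(bonds)
print(atoms)                                   # [0 0 1]
print(extract_fingerprints_r1(atoms, i_jbond))  # [0 1 2]
print(adjacency_matrix(len(symbols), bonds))
```

Both carbons share atom ID 0, but they get different fingerprints because their neighborhoods differ; that extra neighborhood information is the point of feature 3.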

Example molecule:

Concepts I picked up along the way

1. from collections import defaultdict
Understanding Python's defaultdict()

Using defaultdict with a lambda
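A small sketch of both ideas: a plain dict raises KeyError for a missing key, while defaultdict calls its factory, stores the result, and returns it. The `lambda: len(d)` factory is the trick the paper's code relies on to hand out consecutive integer IDs. The variable names here are illustrative.

```python
from collections import defaultdict

# Counting: int() returns 0, so a missing key starts at 0 before += 1.
counts = defaultdict(int)
for ch in 'banana':
    counts[ch] += 1
print(dict(counts))   # {'b': 1, 'a': 3, 'n': 2}

# ID assignment: the factory returns the dict's current size, which is
# evaluated before the new key is inserted, so new keys get 0, 1, 2, ...
vocab = defaultdict(lambda: len(vocab))
ids = [vocab[w] for w in ['C', 'O', 'C', 'N']]
print(ids)            # [0, 1, 0, 2]
print(dict(vocab))    # {'C': 0, 'O': 1, 'N': 2}
```

A key that is already present (the second 'C') just returns its stored ID, so the same token always maps to the same integer.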

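2. What are the Weisfeiler-Lehman (WL) algorithm and the WL test?
It is used in this paper, but I don't understand the exact principle; for now I think of it as a kind of graph convolution.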
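To make the "graph convolution" intuition concrete, here is a minimal sketch of the textbook WL test (1-WL color refinement), written from the general algorithm rather than from the paper's code. The graph representation (adjacency lists as dicts) and the function names are assumptions. Each round replaces a node's label with a new ID for the pair (own label, sorted neighbor labels), so after r rounds a label summarizes the radius-r neighborhood, which is the same idea as the radius-based fingerprints.

```python
from collections import Counter


def wl_histograms(graphs, labels, iterations=2):
    """WL relabeling of several graphs with ONE shared label dictionary.

    graphs: list of adjacency lists (dict: node -> list of neighbors)
    labels: list of dicts (node -> initial label, e.g. element symbol)
    Returns a list (one entry per round) of per-graph label Counters.
    """
    compress = {}  # (old label, sorted neighbor labels) -> new label ID
    history = [[Counter(lab.values()) for lab in labels]]
    for _ in range(iterations):
        new_labels = []
        for adj, lab in zip(graphs, labels):
            new = {}
            for v in adj:
                signature = (lab[v], tuple(sorted(lab[u] for u in adj[v])))
                new[v] = compress.setdefault(signature, len(compress))
            new_labels.append(new)
        labels = new_labels
        history.append([Counter(lab.values()) for lab in labels])
    return history


def wl_test(g1, l1, g2, l2, iterations=2):
    """False: certainly not isomorphic. True: WL cannot tell them apart."""
    for hists in wl_histograms([g1, g2], [l1, l2], iterations):
        if hists[0] != hists[1]:
            return False
    return True


path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
c3 = {v: 'C' for v in range(3)}
print(wl_test(path, c3, triangle, c3))          # False: neighborhoods differ

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
c6 = {v: 'C' for v in range(6)}
print(wl_test(two_triangles, c6, hexagon, c6))  # True, yet not isomorphic
```

Different label histograms prove two graphs are not isomorphic, but equal histograms do not prove they are: every node in both the two triangles and the hexagon has two neighbors, so WL cannot separate them.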

3. An advanced use of the for loop (a list comprehension):

    words = [word_dict[sequence[i:i+ngram]]for i in range(len(sequence)-ngram+1)]

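    from collections import defaultdict

    import numpy as np

    # Each new n-gram gets the next unused integer ID.
    word_dict = defaultdict(lambda: len(word_dict))

    def split_sequence(sequence, ngram):
        # '-' and '=' mark the start and end of the sequence.
        sequence = '-' + sequence + '='
        words = [word_dict[sequence[i:i+ngram]]
                 for i in range(len(sequence) - ngram + 1)]
        return np.array(words)

    split_sequence('ab=aaabb=ababba=', 2)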

output>>> array([0, 1, 2, 3, 4, 4, 1, 5, 2, 3, 1, 6, 1, 5, 6, 7, 8])

4. Usage of sys.argv[]
I really couldn't make sense of this one...
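A minimal sketch of what sys.argv holds: it is a list of strings, where element 0 is the script name and the real command-line arguments start at index 1. Training scripts commonly read hyperparameters this way; the parameter names below (DATASET, RADIUS, NGRAM) and the helper `parse_args` are hypothetical. The function accepts an explicit list so it can be tried without a real command line.

```python
import sys


def parse_args(argv=None):
    """Read three positional arguments (hypothetical parameter set).

    argv[0] is the script name; the actual arguments are argv[1:].
    Every element is a str, so numbers must be converted explicitly.
    """
    argv = sys.argv if argv is None else argv
    if len(argv) != 4:
        raise ValueError(f'usage: {argv[0]} DATASET RADIUS NGRAM')
    dataset, radius, ngram = argv[1:]
    return dataset, int(radius), int(ngram)


# Running `python train.py human 2 3` would make
# sys.argv == ['train.py', 'human', '2', '3'].
print(parse_args(['train.py', 'human', '2', '3']))  # ('human', 2, 3)
```

Forgetting the int() conversion is a common bug: '2' + 1 raises TypeError because the argument is still a string.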
5. Using Python's map function (in detail)
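A short sketch of map: it applies a function to each element and returns a lazy iterator, so the results are usually wrapped in list(). The SMILES strings are just sample data.

```python
# map(func, iterable) calls func on every element, lazily.
lengths = list(map(len, ['CCO', 'c1ccccc1', 'O']))
print(lengths)  # [3, 8, 1]

# With several iterables, func receives one element from each,
# and map stops at the shortest iterable.
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20]))
print(sums)     # [11, 22]

# Common parsing idiom: turn a whitespace-separated line into ints.
values = list(map(int, '1 0 1 1'.split()))
print(values)   # [1, 0, 1, 1]
```

For simple cases a list comprehension such as [len(s) for s in smiles] does the same job; map is handy when the function already exists, like len or int.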
6. Summary of Python's enumerate
For an iterable object (such as a list or string), enumerate pairs each element with its index, so you get both the index and the value at the same time.
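A minimal sketch of enumerate in place of the range(len(...)) pattern; the SMILES list is sample data.

```python
smiles = ['CCO', 'c1ccccc1', 'O']

# Each iteration yields an (index, value) tuple, unpacked into i and s.
for i, s in enumerate(smiles):
    print(i, s)

# start= changes only the first index; the values are unaffected.
pairs = list(enumerate('ab', start=1))
print(pairs)           # [(1, 'a'), (2, 'b')]

# Common idiom: build a value -> index lookup table.
index_of = {s: i for i, s in enumerate(smiles)}
print(index_of['O'])   # 2
```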
7. How to use strip()
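1. Default usage: removing whitespace
str.strip(): removes whitespace from both ends of the string
str.lstrip(): removes whitespace from the left end
str.rstrip(): removes whitespace from the right end
Note: "whitespace" here includes '\n', '\r', '\t' and ' ' (in fact any character for which str.isspace() is True).
2. Removing specified characters
str.strip('do'): removes the specified characters from both ends
str.lstrip('do'): removes them from the left end
str.rstrip('do'): removes them from the right end
The argument is treated as a set of characters, not as a substring: 'do' strips any run of 'd' and 'o' characters, in any order.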
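A quick check of both usages; repr() is used so the remaining whitespace is visible. The sample strings are made up for illustration.

```python
s = '  \t hello world \n'
print(repr(s.strip()))    # 'hello world'
print(repr(s.lstrip()))   # 'hello world \n'
print(repr(s.rstrip()))   # '  \t hello world'

# The argument is a SET of characters: any mix of 'd' and 'o'
# is removed from the ends until a different character appears.
t = 'dodo_model_od'
print(repr(t.strip('do')))    # '_model_'
print(repr(t.lstrip('do')))   # '_model_od'
print(repr(t.rstrip('do')))   # 'dodo_model_'

# Typical use when reading data files: drop the newline, then split.
fields = 'CCO 1\n'.strip().split()
print(fields)             # ['CCO', '1']
```

To remove an exact prefix or suffix string instead of a character set, Python 3.9+ provides str.removeprefix() and str.removesuffix().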
8. Reviewing the structure of a PyTorch project