Paper reading (34): Protein−Ligand Scoring with Convolutional Neural Networks
Paper title: Protein−Ligand Scoring with Convolutional Neural Networks
Citations (Google Scholar): 121
Pages: 16
Published: 2017.04
Journal: Journal of Chemical Information and Modeling
Authors: Matthew Ragoza, Joshua Hochuli, Elisa Idrobo, Jocelyn Sunseri, and David Ryan Koes
Abstract:
Computational approaches to drug discovery can reduce the time and cost associated with experimental assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amount of protein-ligand binding and structural data enables the use of deep machine learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive three-dimensional (3D) representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses and known binders and nonbinders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.
Discussion:
- By several metrics, our CNN models outperform alternative approaches, in particular the AutoDock Vina empirical scoring function and the RF-Score and NNScore machine learning scoring functions.
- In intertarget evaluations of pose prediction, both using crossvalidation and an independent test set, CNN models can perform substantially better.
- CNN models can do well in virtual screening evaluations.
- Although the CNN models performed well in an intertarget pose prediction evaluation, they performed worse at intratarget pose ranking, which is more relevant to molecular docking. Possible remedy: changing the training protocol to more faithfully represent this task.
- Incorporating the binding affinity as a component of training, or performing relation classification, which assesses the ability of the network to rank rather than score poses, may significantly improve intratarget performance of CNN models.
- We believe that our use of clustered cross-validation, which not only avoids training on ligands of the same target but also all similar targets, should mitigate some of the artificial enrichment issues inherent in DUD-E.
- Our independent test sets both used an entirely different method of data set construction than the DUD-E set.
- Our models’ ability to generalize beyond the task inherent in the training data, while present, is limited.
- We have not yet observed instances where including multitask training data resulted in a synergistic effect, improving the performance of all tasks.
- Open source: all of our code and models are available under an open source license as part of our gnina molecular docking software at https://github.com/gnina.
Introduction:
- Protein−ligand scoring is a keystone of structure-based drug design.
- Three goals: accurately predicting the binding affinity of the complex; selecting the correct binding mode (pose prediction); distinguishing between binders and nonbinders (virtual screening).
- Structure-based scoring functions that use neural networks were recently shown to be competitive with empirical scoring in retrospective virtual screening exercises while also being effective in a prospective screen of estrogen receptor ligands.
- The impressive performance of CNNs at the image recognition task suggests that they are well-suited for learning from other types of spatial data, such as protein−ligand structures.
- The paper describes a CNN model for protein−ligand scoring that is trained to classify compound poses as binders or nonbinders using a 3D grid representation of protein−ligand structures generated through docking.
- Our CNN scoring method outperforms the AutoDock Vina scoring function that is used to generate the poses both when selecting poses for pose prediction and for virtual screening tasks.
- Our method outperforms other machine learning approaches in our virtual screening evaluation even when it is also trained to perform well at pose-sensitive pose prediction.
- We illustrate how our CNN score can be decomposed into individual atomic contributions to generate informative visualizations.
Paper outline:
1. Introduction
2. Methods
2.1 Training Sets
2.2 Input Format
2.3 Training
2.4 Model Evaluation
2.5 Optimization
2.6 Visualization
3. Results
3.1 Optimization
3.2 Pose Prediction
3.3 Virtual Screening
3.4 Combined Training
3.5 Independent Test Sets
3.6 Visualization
4. Discussion
Excerpts from the main text:
2. Methods
2.1 Training Sets
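- There are two training sets, one focused on pose prediction and the other on virtual screening.
- Virtual screening, also called in silico screening: before biological activity screening is carried out, molecular docking software is used to simulate the interaction between the target and candidate drugs and to estimate the affinity between them. This reduces the number of compounds that have to be screened experimentally and improves the efficiency of lead compound discovery.
- Pose prediction: predicting the binding mode.
- Molecular docking is commonly used to (1) predict the binding mode (pose), (2) predict the binding affinity, and (3) perform virtual screening.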
- We use docked poses, even for active compounds with a known crystal structure, because (1) these are the types of poses the model will ultimately have to score and (2) to avoid the model simply learning to distinguish between docked poses and crystal structures (which were likely optimized with different force fields).
- Pose Prediction: CSAR. The training set is based on the CSAR-NRC HiQ data set. The final training set consists of 745 positive examples from 327 distinct targets and 3251 negative examples from 300 distinct targets (some targets produce only low or high RMSD poses).
- Virtual Screening: DUD-E. The training set is based on the Database of Useful Decoys: Enhanced (DUD-E) data set. The final training set contains 22 645 positive examples and 1 407 145 negative examples.
2.2 Input Format
- To handle our 3D structural data, we discretize a protein−ligand structure into a grid.
- Only smina atom types that were present in the ligands and proteins of the training set were retained.
- We represent each atom as a function A(d, r), where d is the distance from the atom center and r is the van der Waals radius.
- We generate these grids of atom density using a custom, GPU-accelerated layer, MolGridDataLayer, of the Caffe deep learning framework.
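The gridding idea above can be sketched in a few lines of plain Python. The piecewise form of A(d, r) used here (a Gaussian up to the van der Waals radius, then a quadratic tail that reaches zero at 1.5r) is recalled from the paper rather than quoted in these notes, so it should be checked against the original. The function names, the naive triple loop, the 24 Å box default, and the 1.7 Å example radius are illustrative assumptions and not the MolGridDataLayer implementation.

```python
import math

def atom_density(d, r):
    """Density of one atom at distance d from its center (radius r).

    Assumed piecewise form: Gaussian for d < r, quadratic tail for
    r <= d < 1.5r, and zero from 1.5r outward.
    """
    if d < 0 or r <= 0:
        raise ValueError("d must be >= 0 and r must be > 0")
    if d < r:
        return math.exp(-2.0 * d * d / (r * r))
    if d < 1.5 * r:
        e2 = math.exp(2.0)
        return (4.0 * d * d) / (e2 * r * r) - (12.0 * d) / (e2 * r) + 9.0 / e2
    return 0.0

def grid_channel(atoms, center, size=24.0, resolution=0.5):
    """Sum atom densities on a cubic grid for a single atom-type channel.

    atoms: list of ((x, y, z), radius) tuples, all of the same atom type.
    Returns a sparse dict mapping (i, j, k) indices to positive densities.
    """
    n = int(round(size / resolution))  # grid points per dimension
    # Coordinates of the first voxel center along each axis.
    start = [c - size / 2.0 + resolution / 2.0 for c in center]
    grid = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                point = (start[0] + i * resolution,
                         start[1] + j * resolution,
                         start[2] + k * resolution)
                total = sum(atom_density(math.dist(point, xyz), r)
                            for xyz, r in atoms)
                if total > 0.0:
                    grid[(i, j, k)] = total
    return grid

# Toy example: one atom with a hypothetical 1.7 Å radius in a small 4 Å box.
example = grid_channel([((0.0, 0.0, 0.0), 1.7)], center=(0.0, 0.0, 0.0),
                       size=4.0, resolution=0.5)
print(len(example), "voxels carry density")
```

A full input would stack one such channel per retained atom type, for the ligand and for the protein.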
2.3 Training
- Our CNN models were defined and trained using the Caffe deep learning framework.
- Training minimized the multinomial logistic loss of the network using a variant of stochastic gradient descent (SGD) and backpropagation.
- The same parameters for the SGD solver (batch_size = 10, base_lr = 0.01, momentum = 0.9), for learning rate decay (lr_policy = inverse, power = 1, gamma = 0.001), and for regularization (weight_decay = 0.001, dropout_ratio = 0.5) were used to train all models.
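To make the solver settings above concrete, here is a small sketch of the two formulas involved. It assumes that "lr_policy = inverse" refers to Caffe's `inv` policy, which computes base_lr * (1 + gamma * iter)^(-power), and that the multinomial logistic loss is applied to a two-class softmax output. The function names are illustrative and are not taken from the paper's code.

```python
import math

def inv_learning_rate(iteration, base_lr=0.01, gamma=0.001, power=1.0):
    """Learning rate under the assumed Caffe 'inv' decay policy."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)

def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multinomial_logistic_loss(scores, label):
    """Negative log-probability that the softmax assigns to the true class."""
    return -math.log(softmax(scores)[label])

for it in (0, 1000, 9000):
    print(it, inv_learning_rate(it))
print(multinomial_logistic_loss([0.0, 0.0], label=1))
```

With the listed values, the learning rate halves after 1000 iterations and falls to a tenth of base_lr after 9000, so the decay is gentle compared with step schedules.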
2.4 Model Evaluation
- The performance of trained CNN models was evaluated by 3-fold cross-validation for both the pose prediction and virtual screening tasks.
- Besides the Vina scoring function, the CNN models were also compared with RF-Score and NNScore on their ability to separate actives from inactives in the virtual screening evaluation.
- Independent Test Sets.
- To evaluate pose prediction performance, we utilized the 2013 PDBbind core set. The core set is a representative, nonredundant subset of the database and is composed of 195 protein−ligand complexes in 65 families.
- To assess virtual screening performance, we utilized two data sets created from assay results. One was generated from ChEMBL by Riniker and Landrum. Our other virtual screening data set is a subset of the maximum unbiased validation (MUV) data set, which is based on PubChem bioactivity data.
- To avoid artificially enhancing our performance on these test sets, we enforced a maximum similarity between targets included in the test sets and targets from DUD-E and CSAR used for training.
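The Discussion describes the cross-validation as clustered, meaning that similar targets are kept in the same fold. A minimal sketch of that idea follows, assuming the target-to-cluster assignment has already been computed by some similarity criterion. The greedy balancing rule, the function name, and the target names are hypothetical and are not the paper's procedure.

```python
from collections import defaultdict

def clustered_folds(target_to_cluster, n_folds=3):
    """Assign whole clusters of similar targets to folds.

    Every target in a cluster lands in the same fold, so a model is never
    tested on a target that is similar to one it was trained on.
    """
    clusters = defaultdict(list)
    for target, cluster in target_to_cluster.items():
        clusters[cluster].append(target)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest clusters first, each into the
    # currently smallest fold.
    for members in sorted(clusters.values(), key=len, reverse=True):
        smallest = min(folds, key=len)
        smallest.extend(sorted(members))
    return folds

# Hypothetical targets and cluster labels, for illustration only.
assignment = {"t1": "A", "t2": "A", "t3": "B", "t4": "C", "t5": "C", "t6": "D"}
for index, fold in enumerate(clustered_folds(assignment)):
    print(index, fold)
```

Each fold then serves once as the test set while the other two are used for training.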
2.5 Optimization
- Several model parameters were explored: Atom Types, Occupancy Type, Atomic Radius Multiplier, Resolution, Layer Width, Model Depth, Pooling Type, Fully Connected Layer.
- Atom Types: In addition to the default smina atom types, we evaluated two simpler atom typing schemes: element-only and ligand/receptor only.
- Occupancy Type: In addition to a smoothed Gaussian distribution of atom density, we also evaluated a Boolean representation, where grid point values are one if they overlap an atom and zero otherwise. (Reader's note: this whole series of options feels like hand-designing a lot of features again?)
- Atomic Radius Multiplier: By default, we extend atom densities beyond the van der Waals radius by a multiple of 1.5. Additionally, we evaluated multiples of 1.0, 1.25, 1.75, and 2.0.
- Resolution: The default grid resolution is 0.5 Å, resulting in 48 grid points per dimension. We also evaluated higher (0.25 Å) and lower (0.75, 1.0, and 1.5 Å) resolution grids.
- Layer Width: We also evaluate models that double, halve, and quarter the width of these layers. Wider layers allow for a more expressive model, but at the cost of more computation.
- Model Depth: Our initial model contained five convolution layers. We also evaluate models with more (up to 8) and fewer (as few as 1) convolution layers. (Reader's note: how were these hyperparameter ranges chosen?)
- Pooling Type: In our initial model we use max pooling with a kernel size of 2. We additionally evaluate average pooling and kernels of size 4.
- Fully Connected Layer: Our initial model contains a single fully connected layer with no hidden nodes. Additionally, we evaluate alternative models with a single hidden layer with anywhere from 6 to 50 nodes.
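The search described above varies one parameter at a time around a reference model. The sketch below enumerates such variants for a subset of the listed options and works out the grid size implied by each resolution. The dictionary keys, the function names, and the 24 Å box edge are assumptions made for illustration; the notes only state that 0.5 Å corresponds to 48 grid points.

```python
REFERENCE = {
    "atom_types": "smina",
    "occupancy": "gaussian",
    "radius_multiplier": 1.5,
    "resolution": 0.5,
    "pooling": "max",
}

ALTERNATIVES = {
    "atom_types": ["element-only", "ligand/receptor-only"],
    "occupancy": ["boolean"],
    "radius_multiplier": [1.0, 1.25, 1.75, 2.0],
    "resolution": [0.25, 0.75, 1.0, 1.5],
    "pooling": ["average"],
}

def one_at_a_time(reference, alternatives):
    """Yield copies of the reference model with exactly one parameter changed."""
    for name, values in alternatives.items():
        for value in values:
            variant = dict(reference)
            variant[name] = value
            yield variant

def grid_points_per_dim(resolution, box=24.0):
    """Grid points along one axis for an assumed cubic box edge (in Å)."""
    return int(round(box / resolution))

variants = list(one_at_a_time(REFERENCE, ALTERNATIVES))
print(len(variants), "variants")
for res in (0.25, 0.5, 0.75, 1.0, 1.5):
    n = grid_points_per_dim(res)
    print(res, n, n ** 3)
```

Under this assumption, halving the spacing from 0.5 Å to 0.25 Å multiplies the number of voxels per channel by 8, which helps explain why finer grids are much more expensive to train and evaluate.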
2.6 Visualization
- In order to better understand the features that the neural network learns, we implemented a visualization algorithm based on masking.
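The masking idea can be illustrated as follows: remove one piece of the input, rescore, and attribute the change in score to the removed piece. The notes do not give the details of the paper's algorithm, so this sketch masks single atoms and uses a toy scoring function. Both choices, and all names, are assumptions.

```python
def atom_contributions(atoms, score_fn):
    """Estimate each atom's contribution by masking it and rescoring.

    atoms: list of atom records (any type that score_fn understands).
    score_fn: callable mapping a list of atoms to a single score.
    Returns one value per atom; positive means the atom raised the score.
    """
    baseline = score_fn(atoms)
    contributions = []
    for i in range(len(atoms)):
        masked = atoms[:i] + atoms[i + 1:]  # input with atom i removed
        contributions.append(baseline - score_fn(masked))
    return contributions

# Toy stand-in for a trained model: the score is a sum of per-atom weights.
def toy_score(atoms):
    return sum(weight for _name, weight in atoms)

toy_atoms = [("C1", 0.5), ("O1", -0.25), ("N1", 1.0)]
print(atom_contributions(toy_atoms, toy_score))
```

With a real CNN the score is not additive, so the contributions would not sum to the baseline score, and each one would require a separate forward pass.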
3. Results
We evaluated the optimized network architecture for performance in pose prediction, virtual screening, and affinity prediction, while also considering the importance of the training set used to create the model.
3.1 Optimization
- Two rounds of model optimization were performed. In each round, parameters of a reference model were individually varied.
- The final model reached an AUC of 0.82 with a training time of 120 ms per iteration.
- The final reference model further reduced the depth to three convolutional layers.
- The best AUCs are achieved using smina atom types.
- Although the overall AUCs were similar, smina and element-only atom types result in better early enrichment (the initial slope of the ROC curve is steeper).
- The models do not seem to need the additional distance information provided by a Gaussian atomic density.
- The default radius multiplier of 1.5 provided the best AUC.
- We decided against using higher resolution grids since the small increase in AUC (0.02) in increasing the resolution from 0.5 to 0.25 Å was accompanied by a more than 4× increase in per-iteration training time, which directly correlates with the evaluation time of the final model.
- Reducing the width improved both the AUC and training time up to a limit.
- Model depth behaved similarly to the layer width parameter.
- Somewhat surprisingly, the use of average pooling instead of max pooling obliterated predictive performance and prevented the model from learning. Alternative kernel sizes did not improve the AUC.
- Changes to the fully connected layer made little difference, suggesting that most of the learning is taking place in the convolutional layers.
3.2 Pose Prediction
- Pose prediction assesses the ability of a scoring function to distinguish between low RMSD and high RMSD poses of the same compound.
- The CNN model performed substantially better than the AutoDock Vina scoring function in its ability to perform intertarget ranking of CSAR poses. The CNN model achieves an AUC of 0.815 while the Vina scoring function has an AUC of 0.645.
- In intratarget ranking, the CNN model performed substantially worse than AutoDock Vina.
- The CNN scores weakly correlate with RMSD, with higher RMSD poses exhibiting lower scores as expected (a more positive CNN score is more favorable).
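The difference between intertarget and intratarget ranking is easier to see in code. The sketch below computes a ROC AUC over poses pooled across targets, and, as one plausible intratarget measure, the fraction of targets whose top-scored pose is a low RMSD pose. The second metric is an assumption for illustration and not necessarily the paper's exact definition. All names and the toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a
    random negative, with ties counted as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def top_pose_success(poses_by_target):
    """Fraction of targets whose highest-scored pose is a low RMSD pose.

    poses_by_target: dict mapping target -> list of (score, is_low_rmsd).
    """
    hits = 0
    for poses in poses_by_target.values():
        best = max(poses, key=lambda pose: pose[0])
        if best[1]:
            hits += 1
    return hits / len(poses_by_target)

# Toy data: scores are comparable across targets only in the pooled view.
toy = {
    "a": [(0.9, True), (0.2, False)],
    "b": [(0.4, False), (0.3, True)],
}
pooled = [pose for poses in toy.values() for pose in poses]
print(roc_auc([s for s, _ in pooled], [int(ok) for _, ok in pooled]))
print(top_pose_success(toy))
```

A scoring function can do well on the pooled AUC while still picking the wrong pose within individual targets, which matches the pattern reported above.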
3.3 Virtual Screening
- Structure-based virtual screening assesses the ability of a scoring function to distinguish between active and inactive compounds using docked structures.
- In terms of early enrichment, the CNN model is 2−4 times better than Vina on average.
- The CNN models and Vina both outperform the alternative machine learning scoring functions evaluated.
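Early enrichment asks how many actives appear near the top of the ranked list. The sketch below uses a common generic definition, the enrichment factor at a given fraction of the ranked compounds. The notes do not say which enrichment measure the paper reports, so this is an illustrative assumption and not the paper's metric. The function name and toy data are hypothetical.

```python
def enrichment_factor(scores, labels, fraction):
    """Hit rate among the top-scored fraction divided by the overall hit rate.

    scores: higher means more likely active.
    labels: 1 for active, 0 for inactive.
    fraction: portion of the ranked list to inspect, in (0, 1].
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("need at least one active compound")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(fraction * len(ranked))))
    top_actives = sum(label for _score, label in ranked[:n_top])
    return (top_actives / n_top) / (total_actives / len(ranked))

print(enrichment_factor([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0], 0.5))
print(enrichment_factor([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0], 0.5))
```

A value of 1 means the top of the list is no richer in actives than a random selection, and larger values mean better early enrichment.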
3.4 Combined Training
- CNN models trained on one kind of data do not generalize particularly well to another.
- Generated poses may be highly inaccurate.
- Similar trade-offs between learning nonstructural cheminformatic information and enforcing structural constraints likely explain the difference in performance between the DUD-E and combined models.
3.5 Independent Test Sets
- To evaluate CNN scoring performance on our independent test sets, we trained three models using all folds of the available training data: a pose prediction model trained only on CSAR data, a virtual screening model trained only on DUD-E data, and a combined model trained on DUD-E and CSAR data at a 2:1 ratio.
- For all methods, the highest performance is achieved with the ChEMBL actives.
- This suggests, for this target at least, that the method used to construct the decoys is not the cause of the observed poor performance and that the performance observed on the ChEMBL set is not due to artificial enrichment.
3.6 Visualization
- Visualization is intended to provide a qualitative and easy to interpret indication of the atomic features that are driving the CNN model’s output.
- In order to more quantitatively assess the utility of our visualization approach, we considered single-residue protein mutation data and partially aligned poses.
- The fact that critical residues are highlighted suggests that the model is learning some general underlying model of the key features of protein−ligand interactions.
- For all five poses, the CNN model ranks the crystal pose higher than the docked pose.
- (Reader's note: I did not fully understand many of the figures.)