Paper reading (34): Protein−Ligand Scoring with Convolutional Neural Networks

Paper title: Protein−Ligand Scoring with Convolutional Neural Networks

Scholar citations: 121

Pages: 16

Publication date: 2017.04

Journal: Journal of Chemical Information and Modeling

Authors: Matthew Ragoza, Joshua Hochuli, Elisa Idrobo, Jocelyn Sunseri, and David Ryan Koes

Abstract:

Computational approaches to drug discovery can reduce the time and cost associated with experimental assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amount of protein-ligand binding and structural data enables the use of deep machine learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive three-dimensional (3D) representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses and known binders and nonbinders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.

Discussion:

By several metrics, our CNN models outperform alternative approaches, in particular the AutoDock Vina empirical scoring function and the RF-Score and NNScore machine learning scoring functions.
In intertarget evaluations of pose prediction, both using cross-validation and an independent test set, CNN models can perform substantially better.
CNN models can do well in virtual screening evaluations.
Although the CNN models performed well in an intertarget pose prediction evaluation, they performed worse at intratarget pose ranking, which is more relevant to molecular docking. Possible remedy: changing the training protocol to more faithfully represent this task.
Incorporating the binding affinity as a component of training, or performing relation classification, which assesses the ability of the network to rank rather than score poses, may significantly improve intratarget performance of CNN models.
We believe that our use of clustered cross-validation, which not only avoids training on ligands of the same target but also all similar targets, should mitigate some of the artificial enrichment issues inherent in DUD-E.
Our independent test sets both used an entirely different method of data set construction than the DUD-E set.
Our models’ ability to generalize beyond the task inherent in the training data, while present, is limited.
We have not yet observed instances where including multitask training data resulted in a synergistic effect, improving the performance of all tasks.
Open source: all of our code and models are available under an open source license as part of our gnina molecular docking software at https://github.com/gnina.
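
The clustered cross-validation mentioned in the excerpts above keeps all ligands of a target, and of similar targets, out of the training data for the fold that tests on them. A minimal sketch of such a split follows. It assumes the targets have already been grouped into similarity clusters; the `clustered_folds` function, the `target_cluster` mapping, the fold count, and the greedy balancing are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def clustered_folds(examples, target_cluster, n_folds=3):
    """Assign examples to folds so that no cluster of similar targets spans two folds.

    examples: list of (target_name, ligand_id) pairs.
    target_cluster: dict mapping target_name -> cluster id (hypothetical input;
        how targets are clustered is outside the scope of this sketch).
    """
    # Group examples by the cluster their target belongs to.
    by_cluster = defaultdict(list)
    for target, ligand in examples:
        by_cluster[target_cluster[target]].append((target, ligand))

    # Greedy balancing: place the largest clusters first into the smallest fold.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(by_cluster, key=lambda c: (-len(by_cluster[c]), str(c)))
    for cluster_id in ordered:
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_cluster[cluster_id])
    return folds

# Toy data: kinA and kinB are treated as similar targets (same cluster).
examples = [("kinA", "l1"), ("kinA", "l2"), ("kinB", "l3"),
            ("protX", "l4"), ("gpcrY", "l5"), ("gpcrY", "l6")]
target_cluster = {"kinA": 0, "kinB": 0, "protX": 1, "gpcrY": 2}
folds = clustered_folds(examples, target_cluster, n_folds=3)
```

Each fold then serves once as the test set while the remaining folds are used for training, so a model is never evaluated on a target whose cluster it saw during training.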

Introduction:

Protein−ligand scoring is a keystone of structure-based drug design.
Three goals: accurately predicting the binding affinity of the complex; selecting the correct binding mode (pose prediction); distinguishing between binders and nonbinders (virtual screening).
Structure-based scoring functions that use neural networks were recently shown to be competitive with empirical scoring in retrospective virtual screening exercises while also being effective in a prospective screen of estrogen receptor ligands.
The impressive performance of CNNs at the image recognition task suggests that they are well-suited for learning from other types of spatial data, such as protein−ligand structures.
The paper describes a CNN model for protein−ligand scoring that is trained to classify compound poses as binders or nonbinders using a 3D grid representation of protein−ligand structures generated through docking.
Our CNN scoring method outperforms the AutoDock Vina scoring function that is used to generate the poses both when selecting poses for pose prediction and for virtual screening tasks.
Our method outperforms other machine learning approaches in our virtual screening evaluation even when it is also trained to perform well at pose-sensitive pose prediction.
We illustrate how our CNN score can be decomposed into individual atomic contributions to generate informative visualizations.
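
To make the "3D grid representation" in the excerpts above concrete, here is a rough sketch of how atoms could be turned into a multi-channel voxel grid that a CNN can consume. It is a simplified illustration under stated assumptions: the `voxelize` function, the plain Gaussian density, the `sigma` and cutoff values, and the default box size and resolution are all choices made for this example and may differ from what the paper and gnina actually do.

```python
import math

def voxelize(atoms, center, channels, box_size=24.0, resolution=0.5, sigma=1.0):
    """Return grid[channel][i][j][k] holding summed Gaussian atom densities.

    atoms: list of (atom_type, x, y, z) tuples, coordinates in angstroms.
    center: (x, y, z) of the grid center, e.g. the binding-site center.
    channels: ordered list of atom types, one grid channel per type.
    box_size, resolution, sigma: illustrative defaults (assumptions).
    """
    n = int(round(box_size / resolution))
    origin = [c - box_size / 2.0 for c in center]
    grid = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in channels]
    cutoff = 3.0 * sigma  # ignore density beyond three standard deviations
    for atom_type, x, y, z in atoms:
        if atom_type not in channels:
            continue
        channel = grid[channels.index(atom_type)]
        pos = (x, y, z)
        # Index range of voxels that can lie within the cutoff, clipped to the box.
        lo = [max(0, math.floor((p - cutoff - o) / resolution)) for p, o in zip(pos, origin)]
        hi = [min(n - 1, math.ceil((p + cutoff - o) / resolution)) for p, o in zip(pos, origin)]
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    # Coordinates of the voxel center.
                    voxel = [o + (idx + 0.5) * resolution for o, idx in zip(origin, (i, j, k))]
                    d2 = sum((v - p) ** 2 for v, p in zip(voxel, pos))
                    if d2 <= cutoff ** 2:
                        channel[i][j][k] += math.exp(-d2 / (2.0 * sigma ** 2))
    return grid

# Small toy example: one carbon inside a 4 A box, one oxygen far outside it.
atoms = [("C", 0.25, 0.25, 0.25), ("O", 10.0, 10.0, 10.0)]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0), channels=["C", "O"], box_size=4.0)
```

Using one channel per atom type mirrors how RGB channels are used in image recognition, which is the analogy the excerpts draw on; atoms outside the box simply contribute nothing.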

Paper outline:

1. Introduction

2. Methods

2.1 Training Sets

2.2 Input Format

2.3 Training

2.4 Model Evaluation

2.5 Optimization

2.6 Visualization

3. Results

3.1 Optimization

3.2 Pose Prediction

3.3 Virtual Screening

3.4 Combined Training

3.5 Independent Test Sets

3.6 Visualization

4. Discussion

Excerpts from the main text: