# Bibliography of Computer-Aided Drug Design

Updated on 7/18/2014. Currently 2130 references

## Screening / Benchmarks

2014 / 2013 / 2012 / 2011 / 2010 / 2009 / 2008 / 2007 / 2006 / 2005 / 2004 / 2003

## 2014

• Combining in silico and in cerebro approaches for virtual screening and pose prediction in SAMPL4.
Voet, Arnout R D and Kumar, Ashutosh and Berenger, Francois and Zhang, Kam Y J
Journal of computer-aided molecular design, 2014
PMID: 24446075     doi: 10.1007/s10822-013-9702-2

The SAMPL challenges provide an ideal opportunity for unbiased evaluation and comparison of different approaches used in computational drug design. During the fourth round of this SAMPL challenge, we participated in the virtual screening and binding pose prediction on inhibitors targeting the HIV-1 integrase enzyme. For virtual screening, we used well known and widely used in silico methods combined with personal in cerebro insights and experience. Regular docking only performed slightly better than random selection, but the performance was significantly improved upon incorporation of additional filters based on pharmacophore queries and electrostatic similarities. The best performance was achieved when logical selection was added. For the pose prediction, we utilized a similar consensus approach that amalgamated the results of the Glide-XP docking with structural knowledge and rescoring. The pose prediction results revealed that docking displayed reasonable performance in predicting the binding poses. However, prediction performance can be improved utilizing scientific experience and rescoring approaches. In both the virtual screening and pose prediction challenges, the top performance was achieved by our approaches. Here we describe the methods and strategies used in our approaches and discuss the rationale of their performances.

## 2013

• Comparing neural-network scoring functions and the state of the art: applications to common library screening.
Durrant, Jacob D and Friedman, Aaron J and Rogers, Kathleen E and McCammon, J Andrew
Journal of chemical information and modeling, 2013, 53(7), 1726-1735
PMID: 23734946     doi: 10.1021/ci400042y

We compare established docking programs, AutoDock Vina and Schrödinger's Glide, to the recently published NNScore scoring functions. As expected, the best protocol to use in a virtual-screening project is highly dependent on the target receptor being studied. However, the mean screening performance obtained when candidate ligands are docked with Vina and rescored with NNScore 1.0 is not statistically different than the mean performance obtained when docking and scoring with Glide. We further demonstrate that the Vina and NNScore docking scores both correlate with chemical properties like small-molecule size and polarizability. Compensating for these potential biases leads to improvements in virtual screen performance. Composite NNScore-based scoring functions suited to a specific receptor further improve performance. We are hopeful that the current study will prove useful for those interested in computer-aided drug design.

• Comparison of confirmed inactive and randomly selected compounds as negative training examples in support vector machine-based virtual screening.
Heikamp, Kathrin and Bajorath, Jürgen
Journal of chemical information and modeling, 2013, 53(7), 1595-1601
PMID: 23799269     doi: 10.1021/ci4002712

The choice of negative training data for machine learning is a little explored issue in chemoinformatics. In this study, the influence of alternative sets of negative training data and different background databases on support vector machine (SVM) modeling and virtual screening has been investigated. Target-directed SVM models have been derived on the basis of differently composed training sets containing confirmed inactive molecules or randomly selected database compounds as negative training instances. These models were then applied to search background databases consisting of biological screening data or randomly assembled compounds for available hits. Negative training data were found to systematically influence compound recall in virtual screening. In addition, different background databases had a strong influence on the search results. Our findings also indicated that typical benchmark settings lead to an overestimation of SVM-based virtual screening performance compared to search conditions that are more relevant for practical applications.

• Evaluation and Optimization of Virtual Screening Workflows with DEKOIS 2.0 - A Public Library of Challenging Docking Benchmark Sets.
Bauer, Matthias R and Ibrahim, Tamer M and Vogel, Simon M and Boeckler, Frank M
Journal of chemical information and modeling, 2013, 53(6), 1447-1462
PMID: 23705874     doi: 10.1021/ci400115b

The application of molecular benchmarking sets helps to assess the actual performance of virtual screening (VS) workflows. To improve the efficiency of structure-based VS approaches, the selection and optimization of various parameters can be guided by benchmarking. With the DEKOIS 2.0 library, we aim to further extend and complement the collection of publicly available decoy sets. Based on BindingDB bioactivity data, we provide 81 new and structurally diverse benchmark sets for a wide variety of different target classes. To ensure a meaningful selection of ligands, we address several issues that can be found in bioactivity data. We have improved our previously introduced DEKOIS methodology with enhanced physicochemical matching, now including the consideration of molecular charges, as well as a more sophisticated elimination of latent actives in the decoy set (LADS). We evaluate the docking performance of Glide, GOLD, and AutoDock Vina with our data sets and highlight existing challenges for VS tools. All DEKOIS 2.0 benchmark sets will be made accessible at http://www.dekois.com .

• Are predicted protein structures of any value for binding site prediction and virtual ligand screening?
Skolnick, Jeffrey and Zhou, Hongyi and Gao, Mu
Current Opinion in Structural Biology, 2013
PMID: 23415854     doi: 10.1016/j.sbi.2013.01.009

The recently developed field of ligand homology modeling (LHM) that extends the ideas of protein homology modeling to the prediction of ligand binding sites and for use in virtual ligand screening has emerged as a powerful new approach. Unlike traditional docking methodologies, LHM can be applied to low-to-moderate resolution predicted as well as experimental structures with little if any diminution in performance; thereby enabling ∼75% of an average proteome to have potentially significant virtual screening predictions. In large scale benchmarking, LHM is able to predict off-target ligand binding. Thus, despite the widespread belief to the contrary, low-to-moderate resolution predicted structures have considerable utility for biochemical function prediction.

## 2012

• FRED and HYBRID docking performance on standardized datasets.
McGann, Mark
Journal of computer-aided molecular design, 2012, 26(8), 897-906
PMID: 22669221     doi: 10.1007/s10822-012-9584-8

The docking performance of the FRED and HYBRID programs is evaluated on two standardized datasets from the Docking and Scoring Symposium of the ACS Spring 2011 national meeting. The evaluation includes cognate docking and virtual screening performance. FRED docks 70 % of the structures to within 2 Å in the cognate docking test. In the virtual screening test, FRED is found to have a mean AUC of 0.75. The HYBRID program uses a modified version of FRED's algorithm that uses both ligand- and structure-based information to dock molecules, which increases its mean AUC to 0.78. HYBRID can also implicitly account for protein flexibility by making use of multiple crystal structures. Using multiple crystal structures improves HYBRID's performance (mean AUC 0.80) with a negligible increase in docking time (~15 %).

• A comparative analysis of pharmacophore screening tools.
Sanders, Marijn P A and Barbosa, Arménio J M and Zarzycka, Barbara and Nicolaes, Gerry A F and Klomp, Jan P G and de Vlieg, Jacob and Del Rio, Alberto
Journal of chemical information and modeling, 2012, 52(6), 1607-1620
PMID: 22646988     doi: 10.1021/ci2005274

The pharmacophore concept is of central importance in computer-aided drug design (CADD) mainly due to its successful application in medicinal chemistry and, in particular, high-throughput virtual screening (HTVS). The simplicity of the pharmacophore definition enables the complexity of molecular interactions between ligand and receptor to be reduced to a handful of features. With many pharmacophore screening software packages available, it is of the utmost interest to explore the behavior of these tools when applied to different biological systems. In this work we present a comparative analysis of eight pharmacophore screening algorithms (Catalyst, Unity, LigandScout, Phase, Pharao, MOE, Pharmer and POT) for their use in typical HTVS campaigns against four different biological targets by using default settings. The results herein presented show how the performance of each pharmacophore screening tool might be specifically related to factors such as the characteristics of the binding pocket, the use of specific pharmacophore features and the use of these techniques in specific steps/contexts of the drug discovery pipeline. Algorithms with RMSD-based scoring functions are able to predict more compound poses correctly than overlay-based scoring functions. However, the ratio of correctly predicted compound poses versus incorrectly predicted poses is better for overlay-based scoring functions, which also ensure better performance in compound library enrichments. While the ensemble of these observations can be used to choose the most appropriate class of algorithm for specific virtual screening projects, we remarked that pharmacophore algorithms are often equally good and in this respect we also analyzed how pharmacophore algorithms can be combined together in order to increase the success of hit compound identification. This study provides a valuable benchmark set for further developments in the field of pharmacophore search algorithms, e.g. by using pose predictions and compound library enrichment criteria.

• Performance Evaluation of 2D Fingerprint and 3D Shape Similarity Methods in Virtual Screening.
Hu, Guoping and Kuang, Guanglin and Xiao, Wen and Li, Weihua and Liu, Guixia and Tang, Yun
Journal of chemical information and modeling, 2012, 52(5), 1103-1113
PMID: 22551340     doi: 10.1021/ci300030u

Virtual screening (VS) can be accomplished in either ligand- or structure-based methods. In recent times, an increasing number of 2D fingerprint and 3D shape similarity methods have been used in ligand-based VS. To evaluate the performance of these ligand-based methods, retrospective VS was performed on a tailored directory of useful decoys (DUD). The VS performances of 14 2D fingerprints and four 3D shape similarity methods were compared. The results revealed that 2D fingerprints ECFP_2 and FCFP_4 yielded better performance than the 3D Phase Shape methods. These ligand-based methods were also compared with structure-based methods, such as Glide docking and Prime molecular mechanics generalized Born surface area rescoring, which demonstrated that both 2D fingerprint and 3D shape similarity methods could yield higher enrichment during early retrieval of active compounds. The results demonstrated the superiority of ligand-based methods over the docking-based screening in terms of both speed and hit enrichment. Therefore, considering ligand-based methods first in any VS workflow would be a wise option.

• Evaluation of DOCK 6 as a pose generation and database enrichment tool.
Brozell, Scott R and Mukherjee, Sudipto and Balius, Trent E and Roe, Daniel R and Case, David A and Rizzo, Robert C
Journal of computer-aided molecular design, 2012, 26(6), 749-773
PMID: 22569593     doi: 10.1007/s10822-012-9565-y

In conjunction with the recent American Chemical Society symposium titled "Docking and Scoring: A Review of Docking Programs" the performance of the DOCK6 program was evaluated through (1) pose reproduction and (2) database enrichment calculations on a common set of organizer-specified systems and datasets (ASTEX, DUD, WOMBAT). Representative baseline grid score results averaged over five docking runs yield a relatively high pose identification success rate of 72.5 % (symmetry corrected rmsd) and sampling rate of 91.9 % for the multi site ASTEX set (N

• Lead Finder docking and virtual screening evaluation with Astex and DUD test sets.
Novikov, Fedor N and Stroylov, Viktor S and Zeifman, Alexey A and Stroganov, Oleg V and Kulkov, Val and Chilov, Ghermes G
Journal of computer-aided molecular design, 2012, 26(6), 725-735
PMID: 22569592     doi: 10.1007/s10822-012-9549-y

Lead Finder is a molecular docking software. Sampling uses an original implementation of the genetic algorithm that involves a number of additional optimization procedures. Lead Finder's scoring functions employ a set of semi-empiric molecular mechanics functionals that have been parameterized independently for docking, binding energy predictions and rank-ordering for virtual screening. Sampling and scoring both utilize a staged approach, moving from fast but less accurate algorithm versions to computationally more intensive but more accurate versions. Lead Finder includes tools for the preparation of full atom protein and ligand models. In this exercise, Lead Finder achieved 72.9% docking success rate on the Astex test set when the original author-prepared full atom models were used, and 74.1% success rate when the structures were prepared by Lead Finder. The major cause of docking failures was scoring errors resulting from the use of imperfect solvation models. In many cases, docking errors could be corrected by the proper protonation and the use of correct cyclic conformations of ligands. In virtual screening experiments on the DUD test set the early enrichment factor of several tens was achieved on average. However, the area under the ROC curve ("AUC ROC") ranged from 0.70 to 0.74 depending on the screening protocol used, and the separation from the null model was not perfect (0.12-0.15 units of AUC ROC). We assume that effective virtual screening in the whole range of enrichment curve and not just at the early enrichment stages requires more accurate solvation modeling and accounting for the protein backbone flexibility.

• Surflex-Dock: Docking benchmarks and real-world application.
Spitzer, Russell and Jain, Ajay N
Journal of computer-aided molecular design, 2012, 26(6), 687-699
PMID: 22569590     doi: 10.1007/s10822-011-9533-y

Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by symposium organizers. The re-prepared data sets yielded results not significantly different than previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate ligand re-docking for pose prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, produced substantially better performance for pose prediction. Performance on virtual screening performance was shown to benefit by employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.

• Docking performance of the glide program as evaluated on the Astex and DUD datasets: a complete set of glide SP results and selected results for a new scoring function integrating WaterMap and glide.
Repasky, Matthew P and Murphy, Robert B and Banks, Jay L and Greenwood, Jeremy R and Tubert-Brohman, Ivan and Bhat, Sathesh and Friesner, Richard A
Journal of computer-aided molecular design, 2012, 26(6), 787-799
PMID: 22576241     doi: 10.1007/s10822-012-9575-9

Glide SP mode enrichment results for two preparations of the DUD dataset and native ligand docking RMSDs for two preparations of the Astex dataset are presented. Following a best-practices preparation scheme, an average RMSD of 1.140 Å for native ligand docking with Glide SP is computed. Following the same best-practices preparation scheme for the DUD dataset an average area under the ROC curve (AUC) of 0.80 and average early enrichment via the ROC (0.1 %) metric of 0.12 were observed. 74 and 56 % of the 39 best-practices prepared targets showed AUC over 0.7 and 0.8, respectively. Average AUC was greater than 0.7 for all best-practices protein families demonstrating consistent enrichment performance across a broad range of proteins and ligand chemotypes. In both Astex and DUD datasets, docking performance is significantly improved employing a best-practices preparation scheme over using minimally-prepared structures from the PDB. Enrichment results for WScore, a new scoring function and sampling methodology integrating WaterMap and Glide, are presented for four DUD targets, hivrt, hsp90, cdk2, and fxa. WScore performance in early enrichment is consistently strong and all systems examined show AUC > 0.9 and superior early enrichment to DUD best-practices Glide SP results.

• Pose prediction and virtual screening performance of GOLD scoring functions in a standardized test.
Liebeschuetz, John W and Cole, Jason C and Korb, Oliver
Journal of computer-aided molecular design, 2012, 26(6), 737-748
PMID: 22371207     doi: 10.1007/s10822-012-9551-4

The performance of all four GOLD scoring functions has been evaluated for pose prediction and virtual screening under the standardized conditions of the comparative docking and scoring experiment reported in this Edition. Excellent pose prediction and good virtual screening performance was demonstrated using unmodified protein models and default parameter settings. The best performing scoring function for both pose prediction and virtual screening was demonstrated to be the recently introduced scoring function ChemPLP. We conclude that existing docking programs already perform close to optimally in the cognate pose prediction experiments currently carried out and that more stringent pose prediction tests should be used in the future. These should employ cross-docking sets. Evaluation of virtual screening performance remains problematic and much remains to be done to improve the usefulness of publically available active and decoy sets for virtual screening. Finally we suggest that, for certain target/scoring function combinations, good enrichment may sometimes be a consequence of 2D property recognition rather than a modelling of the correct 3D interactions.

• Cheminformatics Meets Molecular Mechanics: A Combined Application of Knowledge-Based Pose Scoring and Physical Force Field-Based Hit Scoring Functions Improves the Accuracy of Structure-Based Virtual Screening.
Hsieh, Jui-Hua and Yin, Shuangye and Wang, Xiang S and Liu, Shubin and Dokholyan, Nikolay V and Tropsha, Alexander
Journal of chemical information and modeling, 2012, 52(1), 16-28
PMID: 22017385     doi: 10.1021/ci2002507

• Virtual screening data fusion using both structure- and ligand-based methods.
Svensson, Fredrik and Karlén, Anders and Sköld, Christian
Journal of chemical information and modeling, 2012, 52(1), 225-232
PMID: 22148635     doi: 10.1021/ci2004835

Virtual screening is widely applied in drug discovery, and significant effort has been put into improving current methods. In this study, we have evaluated the performance of compound ranking in virtual screening using five different data fusion algorithms on a total of 16 data sets. The data were generated by docking, pharmacophore search, shape similarity, and electrostatic similarity, spanning both structure- and ligand-based methods. The algorithms used for data fusion were sum rank, rank vote, sum score, Pareto ranking, and parallel selection. None of the fusion methods require any prior knowledge or input other than the results from the single methods and, thus, are readily applicable. The results show that compound ranking using data fusion improves the performance and consistency of virtual screening compared to the single methods alone. The best performing data fusion algorithm was parallel selection, but both rank voting and Pareto ranking also have good performance.
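
As an illustration of two of the fusion schemes named in this abstract, here is a minimal Python sketch of sum rank and parallel selection. It is not the authors' implementation: the function names, the "higher score is better" convention, and the alphabetical tie-breaking are all assumptions made for this example, and the paper's exact definitions may differ.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank compounds 1..N; assumes a higher score means a better compound."""
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return {c: i + 1 for i, c in enumerate(ordered)}


def sum_rank(methods: List[Dict[str, float]]) -> List[str]:
    """Order compounds by the sum of their per-method ranks (lowest sum first)."""
    rank_tables = [ranks_from_scores(m) for m in methods]
    compounds = sorted(rank_tables[0])
    totals = {c: sum(t[c] for t in rank_tables) for c in compounds}
    return sorted(compounds, key=lambda c: (totals[c], c))


def parallel_selection(methods: List[Dict[str, float]], n: int) -> List[str]:
    """Pick compounds round-robin: each method in turn contributes its
    best-ranked compound that has not been selected yet."""
    orderings = [sorted(m, key=lambda c: (-m[c], c)) for m in methods]
    positions = [0] * len(orderings)
    selected: List[str] = []
    while len(selected) < n:
        progressed = False
        for i, order in enumerate(orderings):
            # Skip compounds another method has already contributed.
            while positions[i] < len(order) and order[positions[i]] in selected:
                positions[i] += 1
            if positions[i] < len(order):
                selected.append(order[positions[i]])
                positions[i] += 1
                progressed = True
                if len(selected) == n:
                    break
        if not progressed:  # every method is exhausted
            break
    return selected


# Hypothetical scores for four compounds from two methods.
docking = {"a": 9.0, "b": 7.0, "c": 5.0, "d": 1.0}
shape = {"a": 0.2, "b": 0.9, "c": 0.8, "d": 0.1}
print(sum_rank([docking, shape]))
print(parallel_selection([docking, shape], 3))
```

Both functions need only the per-method results as input, which matches the abstract's point that these fusion schemes require no prior knowledge beyond the single-method outputs.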

• Kinase-Kernel Models: Accurate In silico Screening of 4 Million Compounds Across the Entire Human Kinome.
Martin, Eric and Mukherjee, Prasenjit
Journal of chemical information and modeling, 2012, 52(1), 156-170
PMID: 22133092     doi: 10.1021/ci200314j

Reliable in silico prediction methods promise many advantages over experimental high-throughput screening (HTS): vastly lower time and cost, affinity magnitude estimates, no requirement for a physical sample, and a knowledge-driven exploration of chemical space. For the specific case of kinases, given several hundred experimental IC50 training measurements, the empirically parametrized profile-quantitative structure-activity relationship (profile-QSAR) and surrogate AutoShim methods developed at Novartis can predict IC50 with a reliability approaching experimental HTS. However, in the absence of training data, prediction is much harder. The most common a priori prediction method is docking, which suffers from many limitations: It requires a protein structure, is slow, and cannot predict affinity. (1) Highly accurate profile-QSAR (2) models have now been built for roughly 100 kinases covering most of the kinome. Analyzing correlations among neighboring kinases shows that near neighbors share a high degree of SAR similarity. The novel chemogenomic kinase-kernel method reported here predicts activity for new kinases as a weighted average of predicted activities from profile-QSAR models for nearby neighbor kinases. Three different factors for weighting the neighbors were evaluated: binding site sequence identity to the kinase neighbors, similarity of the training set for each neighbor model to the compound being predicted, and accuracy of each neighbor model. Binding site sequence identity was by far most important, followed by chemical similarity. Model quality had almost no relevance. The median R²

• Do crystal structures obviate the need for theoretical models of GPCRs for structure-based virtual screening?
Tang, Hao and Wang, Xiang Simon and Hsieh, Jui-Hua and Tropsha, Alexander
Proteins, 2012, 80(6), 1503-1521
PMID: 22275072     doi: 10.1002/prot.24035

Recent highly expected structural characterizations of agonist-bound and antagonist-bound beta-2 adrenoreceptor (β2AR) by X-ray crystallography have been widely regarded as critical advances to enable more effective structure-based discovery of GPCRs ligands. It appears that this very important development may have undermined many previous efforts to develop 3D theoretical models of GPCRs. To address this question directly we have compared several historical β2AR models versus the inactive state and nanobody-stabilized active state of β2AR crystal structures in terms of their structural similarity and effectiveness of use in virtual screening for β2AR specific agonists and antagonists. Theoretical models, including both homology and de novo types, were collected from five different groups who have published extensively in the field of GPCRs modeling; all models were built before X-ray structures became available. In general, β2AR theoretical models differ significantly from the crystal structure in terms of TMH definition and the global packing. Nevertheless, surprisingly, several models afforded hit rates resulting from virtual screening of large chemical library enriched by known β2AR ligands that exceeded those using X-ray structures; the hit rates were particularly higher for agonists. Furthermore, the screening performance of models is associated with local structural quality such as the RMSDs for binding pocket residues and the ability to capture accurately most if not all critical protein/ligand interactions. These results suggest that carefully built models of GPCRs could capture critical chemical and structural features of the binding pocket and thus may be even more useful for practical structure-based drug discovery than X-ray structures.

• Computational fragment-based screening using RosettaLigand: the SAMPL3 challenge.
Kumar, Ashutosh and Zhang, Kam Y J
Journal of computer-aided molecular design, 2012, 26(5), 603-616
PMID: 22246345     doi: 10.1007/s10822-011-9523-0

SAMPL3 fragment based virtual screening challenge provides a valuable opportunity for researchers to test their programs, methods and screening protocols in a blind testing environment. We participated in SAMPL3 challenge and evaluated our virtual fragment screening protocol, which involves RosettaLigand as the core component by screening a 500 fragments Maybridge library against bovine pancreatic trypsin. Our study reaffirmed that the real test for any virtual screening approach would be in a blind testing environment. The analyses presented in this paper also showed that virtual screening performance can be improved, if a set of known active compounds is available and parameters and methods that yield better enrichment are selected. Our study also highlighted that to achieve accurate orientation and conformation of ligands within a binding site, selecting an appropriate method to calculate partial charges is important. Another finding is that using multiple receptor ensembles in docking does not always yield better enrichment than individual receptors. On the basis of our results and retrospective analyses from SAMPL3 fragment screening challenge we anticipate that chances of success in a fragment screening process could be increased significantly with careful selection of receptor structures, protein flexibility, sufficient conformational sampling within binding pocket and accurate assignment of ligand and protein partial charges.

## 2011

• Virtual decoy sets for molecular docking benchmarks.
Wallach, Izhar and Lilien, Ryan
Journal of chemical information and modeling, 2011, 51(2), 196-202
PMID: 21207928     doi: 10.1021/ci100374f

Virtual docking algorithms are often evaluated on their ability to separate active ligands from decoy molecules. The current state-of-the-art benchmark, the Directory of Useful Decoys (DUD), minimizes bias by including decoys from a library of synthetically feasible molecules that are physically similar yet chemically dissimilar to the active ligands. We show that by ignoring synthetic feasibility, we can compile a benchmark that is comparable to the DUD and less biased with respect to physical similarity.

• FRED pose prediction and virtual screening accuracy.
McGann, Mark
Journal of chemical information and modeling, 2011, 51(3), 578-596
PMID: 21323318     doi: 10.1021/ci100436p

Results of a previous docking study are reanalyzed and extended to include results from the docking program FRED and a detailed statistical analysis of both structure reproduction and virtual screening results. FRED is run both in a traditional docking mode and in a hybrid mode that makes use of the structure of a bound ligand in addition to the protein structure to screen molecules. This analysis shows that most docking programs are effective overall but highly inconsistent, tending to do well on one system and poorly on the next. Comparing methods, the difference in mean performance on DUD is found to be statistically significant (95% confidence) 61% of the time when using a global enrichment metric (AUC). Early enrichment metrics are found to have relatively poor statistical power, with 0.5% early enrichment only able to distinguish methods to 95% confidence 14% of the time.
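
For readers unfamiliar with the two kinds of metric contrasted in this abstract, the following Python sketch computes a global metric (ROC AUC) and an early enrichment metric (enrichment factor at a chosen fraction of the ranked list). It is an illustrative sketch, not the code used in the paper: the function names, the "higher score is better" convention, and the rounding of the top-fraction cutoff are assumptions.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC as the probability that a random active outscores a random
    decoy (ties count one half). Assumes higher score = more likely active."""
    actives = [s for s, y in zip(scores, labels) if y]
    decoys = [s for s, y in zip(scores, labels) if not y]
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores: List[float], labels: List[int], fraction: float) -> float:
    """Hit rate in the top `fraction` of the ranked list divided by the
    hit rate in the whole list."""
    n = len(scores)
    n_top = max(1, round(fraction * n))  # assumed cutoff convention
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / n)


# Hypothetical screen: 10 compounds, 2 actives (label 1), 8 decoys (label 0).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(roc_auc(scores, labels))
print(enrichment_factor(scores, labels, 0.2))
```

Because an early enrichment value depends on only the few compounds above the cutoff, it is computed from far fewer data points than the AUC, which is consistent with the lower statistical power reported above.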

• Evaluation of docking performance in a blinded virtual screening of fragment-like trypsin inhibitors.
Surpateanu, Georgiana and Iorga, Bogdan I
Journal of computer-aided molecular design, 2011, 26(5), 595-601
PMID: 22180049     doi: 10.1007/s10822-011-9526-x

In this study, we have "blindly" assessed the ability of several combinations of docking software and scoring functions to predict the binding of a fragment-like library of bovine trypsin inhibitors. The most suitable protocols (involving Gold software and GoldScore scoring function, with or without rescoring) were selected for this purpose using a training set of compounds with known biological activities. The selected virtual screening protocols provided good results with the SAMPL3-VS dataset, showing enrichment factors of about 10 for Top 20 compounds. This methodology should be useful in difficult cases of docking, with a special emphasis on the fragment-based virtual screening campaigns.

• Ligand and Decoy Sets for Docking to G Protein-Coupled Receptors.
Gatica, Edgar A and Cavasotto, Claudio N
Journal of chemical information and modeling, 2011, 52(1), 1-6
PMID: 22168315     doi: 10.1021/ci200412p

We compiled a G protein-coupled receptor (GPCR) ligand library (GLL) for 147 targets, selecting for each ligand 39 decoy molecules, collected in the GPCR Decoy Database (GDD). Decoys were chosen ensuring a ligand-decoy similarity of six physical properties, while enforcing ligand-decoy chemical dissimilarity. The performance in docking of the GDD was evaluated on 19 GPCRs, showing a marked decrease in enrichment compared to bias-uncorrected decoy sets. Both the GLL and GDD are freely available for the scientific community.

• DEKOIS: Demanding Evaluation Kits for Objective in Silico Screening - A Versatile Tool for Benchmarking Docking Programs and Scoring Functions.
Vogel, Simon M and Bauer, Matthias R and Boeckler, Frank M
Journal of chemical information and modeling, 2011, 51(10), 2650-2665
PMID: 21774552     doi: 10.1021/ci2001549

For widely applied in silico screening techniques success depends on the rational selection of an appropriate method. We herein present a fast, versatile, and robust method to construct demanding evaluation kits for objective in silico screening (DEKOIS). This automated process enables creating tailor-made decoy sets for any given sets of bioactives. It facilitates a target-dependent validation of docking algorithms and scoring functions helping to save time and resources. We have developed metrics for assessing and improving decoy set quality and employ them to investigate how decoy embedding affects docking. We demonstrate that screening performance is target-dependent and can be impaired by latent actives in the decoy set (LADS) or enhanced by poor decoy embedding. The presented method allows extending and complementing the collection of publicly available high quality decoy sets toward new target space. All present and future DEKOIS data sets will be made accessible at www.dekois.com .

• PLS-DA - Docking Optimized Combined Energetic Terms (PLSDA-DOCET) protocol: a brief evaluation.
Avram, Sorin and Pacureanu, Liliana Mioara and Seclaman, Edward and Bora, Alina and Kurunczi, Ludovic G
Journal of chemical information and modeling, 2011, 51(12), 3169-3179
PMID: 22066983     doi: 10.1021/ci2002268

Docking studies have become popular approaches in drug design, where the binding energy of the ligand in the active site of the protein is estimated by a scoring function. Many promising techniques were developed to enhance the performance of scoring functions including the fusion of multiple scoring functions outcomes into a so-called consensus scoring function. Hereby, we evaluated the target oriented consensus technique using the energetic terms of several scoring functions. The approach was denoted PLSDA-DOCET. Optimization strategies for consensus energetic terms and scoring functions based on ROC metric were compared to classical rigid docking and to ligand-based similarity search methods comprising 2D fingerprints and ROCS. The ROCS results indicate large performance variations depending on the biological target. The AUC-based strategy of PLSDA-DOCET outperformed the other docking approaches regarding simple retrieval and scaffold-hopping. The superior performance of PLSDA-DOCET protocol relative to single and combined scoring functions was validated on an external test set. We found a relative low mean correlation of the ranks of the chemotypes retrieved by the PLSDA-DOCET protocol and all the other methods employed here.

• REPROVIS-DB: a benchmark system for ligand-based virtual screening derived from reproducible prospective applications.
Ripphausen, Peter and Wassermann, Anne Mai and Bajorath, Jürgen
Journal of chemical information and modeling, 2011, 51(10), 2467-2473
PMID: 21902278     doi: 10.1021/ci200309j

Benchmark calculations are essential for the evaluation of virtual screening (VS) methods. Typically, classes of known active compounds taken from the medicinal chemistry literature are divided into reference molecules (search templates) and potential hits that are added to background databases assumed to consist of compounds not sharing this activity. Then VS calculations are carried out, and the recall of known active compounds is determined. However, conventional benchmarking is affected by a number of problems that reduce its value for method evaluation. In addition to often insufficient statistical validation and the lack of generally accepted evaluation standards, the artificial nature of typical benchmark settings is often criticized. Retrospective benchmark calculations generally overestimate the potential of VS methods and do not scale with their performance in prospective applications. In order to provide additional opportunities for benchmarking that more closely resemble practical VS conditions, we have designed a publicly available compound database (DB) of reproducible virtual screens (REPROVIS-DB) that organizes information from successful ligand-based VS applications including reference compounds, screening databases, compound selection criteria, and experimentally confirmed hits. Using the currently available 25 hand-selected compound data sets, one can attempt to reproduce successful virtual screens with other than the originally applied methods and assess their potential for practical applications.

• Computational screening for active compounds targeting protein sequences: methodology and experimental validation.
Wang, Fei and Liu, Dongxiang and Wang, Heyao and Luo, Cheng and Zheng, Mingyue and Liu, Hong and Zhu, Weiliang and Luo, Xiaomin and Zhang, Jian and Jiang, Hualiang
Journal of chemical information and modeling, 2011, 51(11), 2821-2828
PMID: 21955088     doi: 10.1021/ci200264h

Because the three-dimensional (3D) structures of most protein targets have not been determined so far, with many of them not even having a known ligand, a truly general method to predict ligand-protein interactions in the absence of three-dimensional information would be of great potential value in drug discovery. Using the support vector machine (SVM) approach, we constructed a model for predicting ligand-protein interaction based only on the primary sequence of proteins and the structural features of small molecules. The model, trained by using 15,000 ligand-protein interactions between 626 proteins and over 10,000 active compounds, was successfully used in discovering nine novel active compounds for four pharmacologically important targets (i.e., GPR40, SIRT1, p38, and GSK-3$\beta$). To our knowledge, this is the first example of a successful sequence-based virtual screening campaign, demonstrating that our approach has the potential to discover, with a single model, active ligands for any protein.

• Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision.
Holliday, John D and Kanoulas, Evangelos and Malim, Nurul and Willett, Peter
Journal of cheminformatics, 2011, 3(1), 29
PMID: 21824430     doi: 10.1186/1758-2946-3-29

## 2010

• Comprehensive comparison of ligand-based virtual screening tools against the DUD data set reveals limitations of current 3D methods.
Venkatraman, Vishwesh and Pérez-Nueno, Violeta I and Mavridis, Lazaros and Ritchie, David W
Journal of chemical information and modeling, 2010, 50(12), 2079-2093
PMID: 21090728     doi: 10.1021/ci100263p

In recent years, many virtual screening (VS) tools have been developed that employ different molecular representations and have different speed and accuracy characteristics. In this paper, we compare ten popular ligand-based VS tools using the publicly available Directory of Useful Decoys (DUD) data set comprising over 100 000 compounds distributed across 40 protein targets. The DUD was developed initially to evaluate docking algorithms, but our results from an operational correlation analysis show that it is also well suited for comparing ligand-based VS tools. Although it is conventional wisdom that 3D molecular shape is an important determinant of biological activity, our results based on permutational significance tests of several commonly used VS metrics show that the 2D fingerprint-based methods generally give better VS performance than the 3D shape-based approaches for surprisingly many of the DUD targets. To help understand this finding, we have analyzed the nature of the scoring functions used and the composition of the DUD data set itself. We propose that to improve the VS performance of current 3D methods, it will be necessary to devise screening queries that can represent multiple possible conformations and which can exploit knowledge of known actives that span multiple scaffold families.

• Comparative evaluation of 3D virtual ligand screening methods: impact of the molecular alignment on enrichment.
Giganti, David and Guillemain, Hélène and Spadoni, Jean-Louis and Nilges, Michael and Zagury, Jean-François and Montes, Matthieu
Journal of chemical information and modeling, 2010, 50(6), 992-1004
PMID: 20527883     doi: 10.1021/ci900507g

In the early stage of drug discovery programs, when the structure of a complex involving a target and a small molecule is available, structure-based virtual ligand screening methods are generally preferred. However, ligand-based strategies like shape-similarity search methods can also be applied. Shape-similarity search methods consist in exploring a pseudo-binding-site derived from the known small molecule used as a reference. Several of these methods use conformational sampling algorithms which are also shared by corresponding docking methods: for example Surflex-dock/Surflex-sim, FlexX/FlexS, ICM, and OMEGA-FRED/OMEGA-ROCS. Using 11 systems issued from the challenging "own" subsets of the Directory of Useful Decoys (DUD-own), we evaluated and compared the performance of the above-cited programs in terms of molecular alignment accuracy, enrichment in active compounds, and enrichment in different chemotypes (scaffold-hopping). Since molecular alignment is a crucial aspect of performance for the different methods, we have assessed its impact on enrichment. We have also illustrated the paradox of retrieving active compounds with good scores even if they are inaccurately positioned. Finally, we have highlighted possible positive aspects of using shape-based approaches in drug-discovery protocols when the structure of the target in complex with a small molecule is known.

• Comparison of Structure- and Ligand-Based Virtual Screening Protocols Considering Hit List Complementarity and Enrichment Factors
Krueger, Dennis M. and Evers, Andreas
Chemmedchem, 2010, 5(1), 148-158
PMID: 19908272     doi: 10.1002/cmdc.200900314

Structure- and ligand-based virtual-screening methods (docking, 2D- and 3D-similarity searching) were analysed for their effectiveness in virtual screening against four different targets: angiotensin-converting enzyme (ACE), cyclooxygenase 2 (COX-2), thrombin and human immunodeficiency virus 1 (HIV-1) protease. The relative performance of the tools was compared by examining their ability to recognise known active compounds from a set of actives and nonactives. Furthermore, we investigated whether the application of different virtual-screening methods in parallel provides complementary or redundant hit lists. Docking was performed with GOLD, Glide, FlexX and Surflex. The obtained docking poses were rescored by using nine different scoring functions in addition to the scoring functions implemented as objective functions in the docking algorithms. Ligand-based virtual screening was done with ROCS (3D-similarity searching), Feature Trees and Scitegic Functional Fingerprints (2D-similarity searching). The results show that structure- and ligand-based virtual-screening methods provide comparable enrichments in detecting active compounds. Interestingly, the hit lists that are obtained from different virtual-screening methods are generally highly complementary. These results suggest that a parallel application of different structure- and ligand-based virtual-screening methods increases the chance of identifying more (and more diverse) active compounds from a virtual-screening campaign.
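
The enrichment factor named in this entry's title can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `enrichment_factor`, the "higher score is better" convention, and the rounding of the cutoff are all assumptions made for illustration.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at a given fraction of the ranked database.

    EF = (actives in top fraction / compounds in top fraction)
         / (actives overall / compounds overall)

    Assumption: a higher score means a better-ranked compound.
    """
    n = len(scores)
    total_actives = sum(is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Size of the top slice; at least one compound is always inspected.
    n_top = max(1, int(round(n * fraction)))
    # Indices of compounds sorted from best to worst score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_actives = sum(is_active[i] for i in order[:n_top])
    return (top_actives / n_top) / (total_actives / n)


# Toy data: 10 compounds, 2 actives, one of them ranked first.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
actives = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(scores, actives, fraction=0.2)
```

An EF of 1 corresponds to random selection. For docking scores where lower is better, the scores would need to be negated before calling this sketch.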

• FLAP: GRID molecular interaction fields in virtual screening. validation using the DUD data set.
Cross, Simon and Baroni, Massimo and Carosati, Emanuele and Benedetti, Paolo and Clementi, Sergio
Journal of chemical information and modeling, 2010, 50(8), 1442-1450
PMID: 20690627     doi: 10.1021/ci100221g

The performance of FLAP (Fingerprints for Ligands and Proteins) in virtual screening is assessed using a subset of the DUD (Directory of Useful Decoys) benchmarking data set containing 13 targets each with more than 15 different chemotype classes. A variety of ligand and receptor-based virtual screening approaches are examined, using combinations of individual templates: 2D structures of known actives, a cocrystallized ligand, a receptor structure, or a cocrystallized ligand-biased receptor structure. We examine several data fusion approaches to combine the results of the individual virtual screens. In doing so, we show that excellent chemotype enrichment is achieved in both single target ligand-based and receptor-based approaches, of approximately 17-fold over random on average at a false positive rate of 1%. We also show that using as much starting knowledge as possible improves chemotype enrichment, and that data fusion using Pareto ranking is an effective method to do this giving up to 50% improvement in enrichment over the single methods. Finally we show that if inactivity or decoy data is incorporated, automatically training the scoring function in FLAP improves recovery still further, with almost 2-fold improvement over the enrichments shown by the single methods. The results clearly demonstrate the utility of FLAP for virtual screening when either a limited or wide range of prior knowledge is available.

## 2009

• Comparison of ligand- and structure-based virtual screening on the DUD data set.
von Korff, Modest and Freyss, Joel and Sander, Thomas
Journal of chemical information and modeling, 2009, 49(2), 209-231
PMID: 19434824     doi: 10.1021/ci800303k

Several in-house developed descriptors and our in-house docking tool ActDock were compared with virtual screening on the data set of useful decoys (DUD). The results were compared with the chemical fingerprint descriptor from ChemAxon and with the docking results of the original DUD publication. The DUD is the first published data set providing active molecules, decoys, and references for crystal structures of ligand-target complexes. The DUD was designed for the purpose of evaluating docking programs. It contains 2950 active compounds against a total of 40 target proteins. Furthermore, for every ligand the data set contains 36 structurally dissimilar decoy compounds with similar physicochemical properties. We extracted the ligands from the target proteins to extend the applicability of the data set to include ligand based virtual screening. From the 40 target proteins, 37 contained ligands that we used as query molecules for virtual screening evaluation. With this data set a large comparison was done between four different chemical fingerprints, a topological pharmacophore descriptor, the Flexophore descriptor, and ActDock. The Actelion docking tool relies on a MM2 forcefield and a pharmacophore point interaction statistic for scoring; the details are described in this publication. In terms of enrichment rates the chemical fingerprint descriptors performed better than the Flexophore and the docking tool. After removing molecules chemically similar to the query molecules the Flexophore descriptor outperformed the chemical descriptors and the topological pharmacophore descriptors. With the similarity matrix calculations used in this study it was shown that the Flexophore is well suited to find new chemical entities via "scaffold hopping". The Flexophore descriptor can be explored with a Java applet at http://www.cheminformatics.ch in the submenu Tools->Flexophore. Its usage is free of charge and does not require registration.

• Pharmacophore-based virtual screening versus docking-based virtual screening: a benchmark comparison against eight targets.
Chen, Zhi and Li, Hong-lin and Zhang, Qi-jun and Bao, Xiao-guang and Yu, Kun-qian and Luo, Xiao-min and Zhu, Wei-liang and Jiang, Hua-liang
Acta pharmacologica Sinica, 2009, 30(12), 1694-1708
PMID: 19935678     doi: 10.1038/aps.2009.159

AIM: This study was conducted to compare the efficiencies of two virtual screening approaches, pharmacophore-based virtual screening (PBVS) and docking-based virtual screening (DBVS) methods.

• APIF: a new interaction fingerprint based on atom pairs and its application to virtual screening.
Pérez-Nueno, Violeta I and Rabal, Obdulia and Borrell, José I and Teixidó, Jordi
Journal of chemical information and modeling, 2009, 49(5), 1245-1260
PMID: 19364101     doi: 10.1021/ci900043r

A new interaction fingerprint (IF) called APIF (atom-pairs-based interaction fingerprint) has been developed for postprocessing protein-ligand docking results. Unlike other existing fingerprints which employ absolute locations of individual interactions, APIF considers the relative positions of pairs of interacting atoms. Docking-based virtual screening was performed with GOLD using the crystal structures of trypsin, rhinovirus, HIV protease, carboxypeptidase, and estrogen receptor-alpha as targets. A score derived from the similarity of the bit strings for each docking solution to that of a known reference binding mode was obtained. Comparisons between APIF, GoldScore function, and standard interaction fingerprint (CHIF) scores were performed using enrichment plots. Superior recovery rates were observed in the IF score cases. Comparable results were achieved by using either of the two interaction fingerprints, substantially improving GoldScore function enrichment factors. Binding mode analyses were also carried out in order to study the best method for selecting conformations with a binding mode similar to that of the reference crystallized complex. These showed that the first conformations retrieved by interaction fingerprint scores had a more similar binding mode to the reference complex than those retrieved by the GoldScore function.

• Validation of molecular docking programs for virtual screening against dihydropteroate synthase.
Hevener, Kirk E and Zhao, Wei and Ball, David M and Babaoglu, Kerim and Qi, Jianjun and White, Stephen W and Lee, Richard E
Journal of chemical information and modeling, 2009, 49(2), 444-460
PMID: 19434845     doi: 10.1021/ci800293n

Dihydropteroate synthase (DHPS) is the target of the sulfonamide class of antibiotics and has been a validated antibacterial drug target for nearly 70 years. The sulfonamides target the p-aminobenzoic acid (pABA) binding site of DHPS and interfere with folate biosynthesis and ultimately prevent bacterial replication. However, widespread bacterial resistance to these drugs has severely limited their effectiveness. This study explores the second and more highly conserved pterin binding site of DHPS as an alternative approach to developing novel antibiotics that avoid resistance. In this study, five commonly used docking programs, FlexX, Surflex, Glide, GOLD, and DOCK, and nine scoring functions, were evaluated for their ability to rank-order potential lead compounds for an extensive virtual screening study of the pterin binding site of B. anthracis DHPS. Their performance in ligand docking and scoring was judged by their ability to reproduce a known inhibitor conformation and to efficiently detect known active compounds seeded into three separate decoy sets. Two other metrics were used to assess performance: enrichment at 1% and 2% and Receiver Operating Characteristic (ROC) curves. The effectiveness of postdocking relaxation prior to rescoring and consensus scoring were also evaluated. Finally, we have developed a straightforward statistical method of including the inhibition constants of the known active compounds when analyzing enrichment results to more accurately assess scoring performance, which we call the 'sum of the sum of log rank' or SSLR. Of the docking and scoring functions evaluated, Surflex with Surflex-Score and Glide with GlideScore were the best overall performers for use in virtual screening against the DHPS target, with neither combination showing statistically significant superiority over the other in enrichment studies or pose selection. Postdocking ligand relaxation and consensus scoring did not improve overall enrichment.

• Comparison of several molecular docking programs: pose prediction and virtual screening accuracy.
Cross, Jason B and Thompson, David C and Rai, Brajesh K and Baber, J Christian and Fan, Kristi Yi and Hu, Yongbo and Humblet, Christine
Journal of chemical information and modeling, 2009, 49(6), 1455-1474
PMID: 19476350     doi: 10.1021/ci900056c

Molecular docking programs are widely used modeling tools for predicting ligand binding modes and structure based virtual screening. In this study, six molecular docking programs (DOCK, FlexX, GLIDE, ICM, PhDOCK, and Surflex) were evaluated using metrics intended to assess docking pose and virtual screening accuracy. Cognate ligand docking to 68 diverse, high-resolution X-ray complexes revealed that ICM, GLIDE, and Surflex generated ligand poses close to the X-ray conformation more often than the other docking programs. GLIDE and Surflex also outperformed the other docking programs when used for virtual screening, based on mean ROC AUC and ROC enrichment values obtained for the 40 protein targets in the Directory of Useful Decoys (DUD). Further analysis uncovered general trends in accuracy that are specific for particular protein families. Modifying basic parameters in the software was shown to have a significant effect on docking and virtual screening results, suggesting that expert knowledge is critical for optimizing the accuracy of these methods.
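
The ROC AUC used above to compare programs can be computed directly from ranked scores. The sketch below is an illustration only, not the evaluation code of the study: the function name `roc_auc` and the "higher score is better" convention are assumptions. It uses the pairwise definition, in which the AUC is the probability that a randomly chosen active outscores a randomly chosen decoy, with ties counted as one half.

```python
def roc_auc(scores, is_active):
    """ROC AUC via pairwise comparison of actives against decoys.

    Assumption: a higher score means a better-ranked compound.
    """
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0   # active ranked above decoy
            elif a == d:
                wins += 0.5   # tie counts as half a win
    return wins / (len(actives) * len(decoys))


# Toy example: two actives, two decoys, one active ranked below a decoy.
auc = roc_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

The double loop is quadratic in the number of compounds; for large libraries a rank-based formulation would be the usual choice, but the pairwise form keeps the definition visible.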

• Ultrafast shape recognition: evaluating a new ligand-based virtual screening technology.
Ballester, Pedro J and Finn, Paul W and Richards, W Graham
Journal of molecular graphics & modelling, 2009, 27(7), 836-845
PMID: 19188082     doi: 10.1016/j.jmgm.2009.01.001

Large scale database searching to identify molecules that share a common biological activity for a target of interest is widely used in drug discovery. Such an endeavour requires the availability of a method encoding molecular properties that are indicative of biological activity and at least one active molecule to be used as a template. Molecular shape has been shown to be an important indicator of biological activity; however, currently used methods are relatively slow, so faster and more reliable methods are highly desirable. Recently, a new non-superposition based method for molecular shape comparison, called Ultrafast Shape Recognition (USR), has been devised with computational performance at least three orders of magnitude faster than previously existing methods. In this study, we investigate the performance of USR in retrieving biologically active compounds through retrospective Virtual Screening experiments. Results show that USR performs better on average than a commercially available shape similarity method, while screening conformers at a rate that is more than 2500 times faster. This outstanding computational performance is particularly useful for searching much larger portions of chemical space than previously possible, which makes USR a very valuable new tool in the search for new lead molecules for drug discovery programs.
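
As a rough illustration of why a non-superposition method can be so fast, the following Python sketch follows the USR scheme as commonly described: each conformer is reduced to 12 numbers (three moments of the atomic distance distributions measured from four reference points), and two conformers are compared with an inverse mean absolute difference. The abstract does not spell out these details, so the reference points, the choice of raw moments (mean, variance, third central moment), and the function names are assumptions here; published implementations differ in how they rescale the second and third moments.

```python
import math


def _moments(dists):
    """Mean, variance and third central moment of a list of distances."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    return [mean, var, third]


def usr_descriptor(coords):
    """12-number shape descriptor from a list of (x, y, z) atom positions."""
    n = len(coords)
    # Reference point 1: molecular centroid (ctd).
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))
    d_ctd = [math.dist(c, ctd) for c in coords]
    # Reference points 2 and 3: atoms closest to and farthest from ctd.
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: atom farthest from fct.
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor


def usr_similarity(desc_a, desc_b):
    """Similarity in (0, 1]; 1 means identical descriptors."""
    mean_abs_diff = sum(abs(a - b) for a, b in zip(desc_a, desc_b)) / len(desc_a)
    return 1.0 / (1.0 + mean_abs_diff)


# Hypothetical four-atom "molecule" and a rigidly translated copy.
mol = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
shifted = [(x + 5, y - 2, z + 1) for x, y, z in mol]
stretched = [(2 * x, y, z) for x, y, z in mol]
same = usr_similarity(usr_descriptor(mol), usr_descriptor(shifted))
different = usr_similarity(usr_descriptor(mol), usr_descriptor(stretched))
```

Because the descriptor depends only on interatomic distances, no alignment step is needed, and comparing two conformers costs 12 subtractions.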

• Novel Method for Generating Structure-Based Pharmacophores Using Energetic Analysis
Salam, Noeris K. and Nuti, Roberto and Sherman, Woody
Journal of chemical information and modeling, 2009, 49(10), 2356-2368
PMID: 19761201     doi: 10.1021/ci900212v

We describe a novel method to develop energetically optimized, structure-based pharmacophores for use in rapid in silico screening. The method combines pharmacophore perception and database screening with protein-ligand energetic terms computed by the Glide XP scoring function to rank the importance of pharmacophore features. We derive energy-optimized pharmacophore hypotheses for 30 pharmaceutically relevant crystal structures and screen a database to assess the enrichment of active compounds. The method is compared to three other approaches: (1) pharmacophore hypotheses derived from a systematic assessment of receptor-ligand contacts, (2) Glide SP docking, and (3) 2D ligand fingerprint similarity. The method developed here shows better enrichments than the other three methods and yields a greater diversity of actives than the contact-based pharmacophores or the 2D ligand similarity. Docking produces the most cases (28/30) with enrichments greater than 10.0 in the top 1% of the database and on average produces the greatest diversity of active molecules. The combination of energy terms from a structure-based analysis with the speed of a ligand-based pharmacophore search results in a method that leverages the strengths of both approaches to produce high enrichments with a good diversity of active molecules.

• Critical comparison of virtual screening methods against the MUV data set.
Tiikkainen, Pekka and Markt, Patrick and Wolber, Gerhard and Kirchmair, Johannes and Distinto, Simona and Poso, Antti and Kallioniemi, Olli
Journal of chemical information and modeling, 2009, 49(10), 2168-2178
PMID: 19799417     doi: 10.1021/ci900249b

In the current work, we measure the performance of seven ligand-based virtual screening tools (five similarity search methods and two pharmacophore elucidators) against the MUV data set. For the similarity search tools, single active molecules as well as active compound sets clustered in terms of their chemical diversity were used as templates. Their score was calculated against all inactive and active compounds in their target class. Subsequently, the scores were used to calculate different performance metrics including enrichment factors and AUC values. We also studied the effect of data fusion on the results. To measure the performance of the pharmacophore tools, a set of active molecules was picked either random- or chemical diversity-based from each target class to build a pharmacophore model which was then used to screen the remaining compounds in the set. Our results indicate that template sets selected by their chemical diversity are the best choice for similarity search tools, whereas the optimal training sets for pharmacophore elucidators are based on random selection underscoring that pharmacophore modeling cannot be easily automated. We also suggest a number of improvements for future benchmark sets and discuss activity cliffs as a potential problem in ligand-based virtual screening.

## 2008

• Consensus scoring with feature selection for structure-based virtual screening
Teramoto, Reiji and Fukunishi, Hiroaki
Journal of chemical information and modeling, 2008, 48(2), 288-295
doi: 10.1021/ci700239t

The evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, and scoring functions play significant roles in it. While consensus scoring (CS) generally improves enrichment by compensating for the deficiencies of each scoring function, the strategy of how individual scoring functions are selected remains a challenging task when few known active compounds are available. To address this problem, we propose feature selection-based consensus scoring (FSCS), which performs supervised feature selection with docked native ligand conformations to select complementary scoring functions. We evaluated the enrichments of five scoring functions (F-Score, D-Score, PMF, G-Score, and ChemScore), FSCS, and RCS (rank-by-rank consensus scoring) for four different target proteins: acetylcholine esterase (AChE), thrombin (thrombin), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPAR gamma). The results indicated that FSCS was able to select the complementary scoring functions and enhance ligand enrichments and that it outperformed RCS and the individual scoring functions for all target proteins. They also indicated that the performances of the single scoring functions were strongly dependent on the target protein. An especially favorable result with implications for practical drug screening is that FSCS performs well even if only one 3D structure of the protein-ligand complex is known. Moreover, we found that one can infer which scoring functions significantly enrich active compounds by using feature selection before actual docking and that the selected scoring functions are complementary.
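
The rank-by-rank consensus scoring (RCS) baseline mentioned above can be sketched in a few lines of Python. This covers only the conventional baseline, not the feature-selection step of FSCS, and it is not the authors' implementation: the function name `rank_by_rank`, the input layout, and the "lower score is better" default are assumptions for illustration.

```python
def rank_by_rank(score_table, lower_is_better=True):
    """Average rank of each compound across several scoring functions.

    score_table maps a scoring-function name to a list of scores, with
    compounds in the same order in every list. Rank 1 is the best
    compound under that function; ties are broken by input order.
    """
    columns = list(score_table.values())
    n = len(columns[0])
    rank_sums = [0.0] * n
    for scores in columns:
        # Compound indices sorted from best to worst for this function.
        order = sorted(range(n), key=lambda i: scores[i],
                       reverse=not lower_is_better)
        for rank, idx in enumerate(order, start=1):
            rank_sums[idx] += rank
    return [total / len(columns) for total in rank_sums]


# Hypothetical scores for three compounds under two scoring functions.
table = {
    "function_1": [-9.0, -7.0, -8.0],
    "function_2": [-5.0, -6.0, -4.0],
}
consensus = rank_by_rank(table)
```

Compounds are then re-sorted by their mean rank, with the smallest value treated as the consensus top hit. Working on ranks rather than raw scores sidesteps the different units and scales of the individual scoring functions.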

• Structure-based virtual screening with supervised consensus scoring: Evaluation of pose prediction and enrichment factors
Teramoto, Reiji and Fukunishi, Hiroaki
Journal of chemical information and modeling, 2008, 48(4), 747-754
doi: 10.1021/ci700464x

Since the evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, scoring functions play significant roles in it. However, it is known that a scoring function does not always work well for all target proteins. When one cannot know which scoring function works best against a target protein a priori, there is no standard scoring method to know it even if 3D structure of a target protein-ligand complex is available. Therefore, development of the method to achieve high enrichments from given scoring functions and 3D structure of protein-ligand complex is a crucial and challenging task. To address this problem, we applied SCS (supervised consensus scoring), which employs a rough linear correlation between the binding free energy and the root-mean-square deviation (rmsd) of a native ligand conformations and incorporates protein-ligand binding process with docked ligand conformations using supervised learning, to virtual screening. We evaluated both the docking poses and enrichments of SCS and five scoring functions (F-Score, G-Score, D-Score, ChemScore, and PMF) for three different target proteins: thymidine kinase (TK), thrombin (thrombin), and peroxisome proliferator-activated receptor gamma (PPAR gamma). Our enrichment studies show that SCS is competitive or superior to a best single scoring function at the top ranks of screened database. We found that the enrichments of SCS could be limited by a best scoring function, because SCS is obtained on the basis of the five individual scoring functions. Therefore, it is concluded that SCS works very successfully from our results. Moreover, from docking pose analysis, we revealed the connection between enrichment and average centroid distance of top-scored docking poses. Since SCS requires only one 3D structure of protein-ligand complex, SCS will be useful for identifying new ligands.

• Multiple protein structures and multiple ligands: effects on the apparent goodness of virtual screening results.
Sheridan, Robert P and McGaughey, Georgia B and Cornell, Wendy D
Journal of computer-aided molecular design, 2008, 22(3-4), 257-265
PMID: 18273559     doi: 10.1007/s10822-008-9168-9

As an extension to a previous published study (McGaughey et al., J Chem Inf Model 47:1504-1519, 2007) comparing 2D and 3D similarity methods to docking, we apply a subset of those virtual screening methods (TOPOSIM, SQW, ROCS-color, and Glide) to a set of protein/ligand pairs where the protein is the target for docking and the cocrystallized ligand is the target for the similarity methods. Each protein is represented by a maximum of five crystal structures. We search a diverse subset of the MDDR as well as a diverse small subset of the MCIDB, Merck's proprietary database. It is seen that the relative effectiveness of virtual screening methods, as measured by the enrichment factor, is highly dependent on the particular crystal structure or ligand, and on the database being searched. 2D similarity methods appear very good for the MDDR, but poor for the MCIDB. However, ROCS-color (a 3D similarity method) does well for both databases.

• Comparison of ligand-based and receptor-based virtual screening of HIV entry inhibitors for the CXCR4 and CCR5 receptors using 3D ligand shape matching and ligand-receptor docking.
Pérez-Nueno, Violeta I and Ritchie, David W and Rabal, Obdulia and Pascual, Rosalia and Borrell, José I and Teixidó, Jordi
Journal of chemical information and modeling, 2008, 48(3), 509-533
PMID: 18298095     doi: 10.1021/ci700415g

HIV infection is initiated by fusion of the virus with the target cell through binding of the viral gp120 protein with the CD4 cell surface receptor protein and the CXCR4 or CCR5 co-receptors. There is currently considerable interest in developing novel ligands that can modulate the conformations of these co-receptors and, hence, ultimately block virus-cell fusion. This article describes a detailed comparison of the performance of receptor-based and ligand-based virtual screening approaches to find CXCR4 and CCR5 antagonists that could potentially serve as HIV entry inhibitors. Because no crystal structures for these proteins are available, homology models of CXCR4 and CCR5 have been built, using bovine rhodopsin as the template. For ligand-based virtual screening, several shape-based and property-based molecular comparison approaches have been compared, using high-affinity ligands as query molecules. These methods were compared by virtually screening a library assembled by us, consisting of 602 known CXCR4 and CCR5 inhibitors and some 4700 similar presumed inactive molecules. For each receptor, the library was queried using known binders, and the enrichment factors and diversity of the resulting virtual hit lists were analyzed. Overall, ligand-based shape-matching searches yielded higher enrichments than receptor-based docking, especially for CXCR4. The results obtained for CCR5 suggest the possibility that different active scaffolds bind in different ways within the CCR5 pocket.

• Evaluating docking programs: keeping the playing field level.
Liebeschuetz, John W
Journal of computer-aided molecular design, 2008, 22(3-4), 229-238
PMID: 18196461     doi: 10.1007/s10822-008-9169-8

Over recent years many enrichment studies have been published which purport to rigorously compare the performance of two or more docking protocols. It has become clear however that such studies often have flaws within their methodologies, which cast doubt on the rigour of the conclusions. Setting up such comparisons is fraught with difficulties and no best mode of practice is available to guide the experimenter. Careful choice of structural models and ligands appropriate to those models is important. The protein structure should be representative for the target. In addition the set of active ligands selected should be appropriate to the structure in cases where different forms of the protein bind different classes of ligand. Binding site definition is also an area in which errors arise. Particular care is needed in deciding which crystallographic waters to retain and again this may be predicated by knowledge of the likely binding modes of the ligands making up the active ligand list. Geometric integrity of the ligand structures used is clearly important yet it is apparent that published sets of actives + decoys may contain sometimes high proportions of incorrect structures. Choice of protocol for docking and analysis needs careful consideration as many programs can be tweaked for optimum performance. Should studies be run using 'black box' protocols supplied by the software provider? Lastly, the correct method of analysis of enrichment studies is a much discussed topic at the moment. However currently promoted approaches do not consider a crucial aspect of a successful virtual screen, namely that a good structural diversity of hits be returned. Overall there is much to consider in the experimental design of enrichment studies. Hopefully this study will be of benefit in helping others plan such experiments.

## 2007

• Supervised consensus scoring for docking and virtual screening.
Teramoto, Reiji and Fukunishi, Hiroaki
Journal of chemical information and modeling, 2007, 47(2), 526-534
doi: 10.1021/ci6004993

Docking programs are widely used to discover novel ligands efficiently and can predict protein-ligand complex structures with reasonable accuracy and speed. However, there is an emerging demand for better performance from the scoring methods. Consensus scoring (CS) methods improve the performance by compensating for the deficiencies of each scoring function. However, conventional CS and existing scoring functions have the same problems, such as a lack of protein flexibility, inadequate treatment of solvation, and the simplistic nature of the energy function used. Although there are many problems in current scoring functions, we focus our attention on the incorporation of unbound ligand conformations. To address this problem, we propose supervised consensus scoring (SCS), which takes into account the protein-ligand binding process using unbound ligand conformations with supervised learning. An evaluation of docking accuracy for 100 diverse protein-ligand complexes shows that SCS outperforms both CS and 11 scoring functions (PLP, F-Score, LigScore, DrugScore, LUDI, X-Score, AutoDock, PMF, G-Score, ChemScore, and D-score). The success rates of SCS range from 89% to 91% in the range of rmsd < 2 Å, while those of CS range from 80% to 85%, and those of the scoring functions range from 26% to 76%. Moreover, we also introduce a method for judging whether a compound is active or inactive with the appropriate criterion for virtual screening. SCS performs quite well in docking accuracy and is presumably useful for screening large-scale compound databases before predicting binding affinity.

• Evaluation of docking programs for predicting binding of Golgi alpha-mannosidase II inhibitors: a comparison with crystallography.
Englebienne, Pablo and Fiaux, Hélène and Kuntz, Douglas A and Corbeil, Christopher R and Gerber-Lemaire, Sandrine and Rose, David R and Moitessier, Nicolas
Proteins, 2007, 69(1), 160-176
PMID: 17557336     doi: 10.1002/prot.21479

Golgi alpha-mannosidase II (GMII), a zinc-dependent glycosyl hydrolase, is a promising target for drug development in anti-tumor therapies. Using X-ray crystallography, we have determined the structure of Drosophila melanogaster GMII (dGMII) complexed with three different inhibitors exhibiting IC50 values ranging from 80 to 1000 µM. These structures, along with those of seven other available dGMII/inhibitor complexes, were then used as a basis for the evaluation of seven docking programs (GOLD, Glide, FlexX, AutoDock, eHiTS, LigandFit, and FITTED). We found that small inhibitors could be accurately docked by most of the software, while docking of larger compounds (i.e., those with extended aromatic cycles or long aliphatic chains) was more problematic. Overall, Glide provided the best docking results, with the most accurately predicted binding around the active site zinc atom. Further evaluation of Glide's performance revealed its ability to extract active compounds from a benchmark library of decoys.

• Comparative performance of several flexible docking programs and scoring functions: enrichment studies for a diverse set of pharmaceutically relevant targets.
Zhou, Zhiyong and Felts, Anthony K and Friesner, Richard A and Levy, Ronald M
Journal of chemical information and modeling, 2007, 47(4), 1599-1608
PMID: 17585856     doi: 10.1021/ci7000346

Virtual screening by molecular docking has become a widely used approach to lead discovery in the pharmaceutical industry when a high-resolution structure of the biological target of interest is available. The performance of three widely used docking programs (Glide, GOLD, and DOCK) for virtual database screening is studied when they are applied to the same protein target and ligand set. Comparisons of the docking programs and scoring functions using a large and diverse data set of pharmaceutically interesting targets and active compounds are carried out. We focus on the problem of docking and scoring flexible compounds which are sterically capable of docking into a rigid conformation of the receptor. The Glide XP methodology is shown to consistently yield enrichments superior to the two alternative methods, while GOLD outperforms DOCK on average. The study also shows that docking into multiple receptor structures can decrease the docking error in screening a diverse set of active compounds.

• Comments on the article "On evaluating molecular-docking methods for pose prediction and enrichment factors".
Perola, Emanuele and Walters, W Patrick and Charifson, Paul
Journal of chemical information and modeling, 2007, 47(2), 251-253
PMID: 17260981     doi: 10.1021/ci600460h

The recent article "On Evaluating Molecular-Docking Methods for Pose Prediction and Enrichment Factors" (Chen H. et al. J. Chem. Inf. Model. 2006, 46, 401-415) contains a series of comments on a similar study we published in Proteins in 2004 (Perola et al. Proteins 2004, 56, 235-249). We believe that some of these comments are misleading, and we feel that an adequate response is in order.

• Comparison of topological, shape, and docking methods in virtual screening.
McGaughey, Georgia B and Sheridan, Robert P and Bayly, Christopher I and Culberson, J Chris and Kreatsoulas, Constantine and Lindsley, Stacey and Maiorov, Vladimir and Truchon, Jean-Francois and Cornell, Wendy D
Journal of chemical information and modeling, 2007, 47(4), 1504-1519
PMID: 17591764     doi: 10.1021/ci700052x

Virtual screening benchmarking studies were carried out on 11 targets to evaluate the performance of three commonly used approaches: 2D ligand similarity (Daylight, TOPOSIM), 3D ligand similarity (SQW, ROCS), and protein structure-based docking (FLOG, FRED, Glide). Active and decoy compound sets were assembled from both the MDDR and the Merck compound databases. Averaged over multiple targets, ligand-based methods outperformed docking algorithms. This was true for 3D ligand-based methods only when chemical typing was included. Using mean enrichment factor as a performance metric, Glide appears to be the best docking method among the three with FRED a close second. Results for all virtual screening methods are database dependent and can vary greatly for particular targets.

• Comparison of Shape-Matching and Docking as Virtual Screening Tools.
Hawkins, Paul C D and Skillman, A Geoffrey and Nicholls, Anthony
Journal of medicinal chemistry, 2007, 50(1), 74-82
doi: 10.1021/jm0603365

## 2006

• Screening drug-like compounds by docking to homology models: a systematic study.
Kairys, Visvaldas and Fernandes, Miguel X and Gilson, Michael K
Journal of chemical information and modeling, 2006, 46(1), 365-379
PMID: 16426071     doi: 10.1021/ci050238c

In the absence of an experimentally solved structure, a homology model of a protein target can be used instead for virtual screening of drug candidates by docking and scoring. This approach poses a number of questions regarding the choice of the template to use in constructing the model, the accuracy of the screening results, and the importance of allowing for protein flexibility. The present study addresses such questions with compound screening calculations for multiple homology models of five drug targets. A central result is that docking to homology models frequently yields enrichments of known ligands as good as that obtained by docking to a crystal structure of the actual target protein. Interestingly, however, standard measures of the similarity of the template used to build the homology model to the targeted protein show little correlation with the effectiveness of the screening calculations, and docking to the template itself often is as successful as docking to the corresponding homology model. Treating key side chains as mobile produces a modest improvement in the results. The reasons for these sometimes unexpected results, and their implications for future methodologic development, are discussed.

## 2005

• Comparison of automated docking programs as virtual screening tools.
Cummings, Maxwell D and Desjarlais, Renee L and Gibbs, Alan C and Mohan, Venkatraman and Jaeger, Edward P
Journal of medicinal chemistry, 2005, 48(4), 962-976
PMID: 15715466     doi: 10.1021/jm049798d

The performance of several commercially available docking programs is compared in the context of virtual screening. Five different protein targets are used, each with several known ligands. The simulated screening deck comprised 1000 molecules from a cleansed version of the MDL drug data report and 49 known ligands. For many of the known ligands, crystal structures of the relevant protein-ligand complexes were available. We attempted to run experiments with each docking method that were as similar as possible. For a given docking method, hit rates were improved versus what would be expected for random selection for most protein targets. However, the ability to prioritize known ligands on the basis of docking poses that resemble known crystal structures is both method- and target-dependent.

• Evaluation of library ranking efficacy in virtual screening.
Kontoyianni, M and Sokol, GS and McClellan, LM
Journal of computational chemistry, 2005, 26(1), 11-22
PMID: 15526325     doi: 10.1002/jcc.20141

We present the results of a comprehensive study in which we explored how the docking procedure affects the performance of a virtual screening approach. We used four docking engines and applied 10 scoring functions to the top-ranked docking solutions of seeded databases against six target proteins. The scores of the experimental poses were placed within the total set to assess whether the scoring function required an accurate pose to provide the appropriate rank for the seeded compounds. This method allows a direct comparison of library ranking efficacy. Our results indicate that the LigandFit/Ligscore1 and LigandFit/GOLD docking/scoring combinations, and to a lesser degree FlexX/FlexX, Glide/Ligscore1, DOCK/PMF (Tripos implementation), LigandFit1/Ligscore2 and LigandFit/PMF (Tripos implementation) were able to retrieve the highest number of actives at a 10% fraction of the database when all targets were looked upon collectively. We also show that the scoring functions rank the observed binding modes higher than the inaccurate poses provided that the experimental poses are available. This finding stresses the discriminatory ability of the scoring algorithms, when better poses are available, and suggests that the number of false positives can be lowered with conformers closer to bioactive ones.
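
The "actives retrieved at a 10% fraction of the database" measure in the abstract above can be illustrated with a minimal sketch. The function name, the toy scores, and the convention that lower scores rank better are all assumptions made for illustration; they are not taken from the cited study.

```python
def actives_in_top_percent(scored, top_percent=10):
    """Count actives among the best-scoring top_percent of a seeded database.

    scored: list of (score, is_active) pairs.
    Assumption: lower (more negative) scores are better.
    """
    if not scored or not 0 < top_percent <= 100:
        raise ValueError("need a non-empty list and 0 < top_percent <= 100")
    ranked = sorted(scored, key=lambda pair: pair[0])
    # Integer arithmetic avoids floating-point surprises in the cutoff size.
    n_top = max(1, (len(ranked) * top_percent) // 100)
    return sum(1 for _, is_active in ranked[:n_top] if is_active)


# Hypothetical seeded database: 20 compounds, 3 of them active.
toy_actives = {20, 19, 5}
scored = [(-float(i), i in toy_actives) for i in range(1, 21)]

found = actives_in_top_percent(scored, top_percent=10)
total = sum(1 for _, is_active in scored if is_active)
print(f"{found} of {total} actives in the top 10%")
```

Comparing this count across docking/scoring combinations on the same seeded database is one simple way to rank them, under the stated assumptions.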

## 2004

• Evaluation and application of multiple scoring functions for a virtual screening experiment.
Xing, Li and Hodgkin, Edward and Liu, Qian and Sedlock, David
Journal of computer-aided molecular design, 2004, 18(5), 333-344
PMID: 15595460

In order to identify novel chemical classes of factor Xa inhibitors, five scoring functions (FlexX, DOCK, GOLD, ChemScore and PMF) were engaged to evaluate the multiple docking poses generated by FlexX. The compound collection was composed of confirmed potent factor Xa inhibitors and a subset of the LeadQuest screening compound library. Except for PMF the other four scoring functions succeeded in reproducing the crystal complex (PDB code: 1FAX). During virtual screening the highest hit rate (80%) was demonstrated by FlexX at an energy cutoff of -40 kJ/mol, which is about 40-fold over random screening (2.06%). Limited results suggest that presenting more poses of a single molecule to the scoring functions could deteriorate their enrichment factors. A series of promising scaffolds with favorable binding scores was retrieved from LeadQuest. Consensus scoring by pair-wise intersection failed to enrich the hit rate yielded by single scorings (i.e. FlexX). We note that reported successes of consensus scoring in hit rate enrichment could be artificial because their comparisons were based on a selected subset of single scoring and a markedly reduced subset of double or triple scoring. The findings presented in this report are based upon a single biological system and support further studies.
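
The "about 40-fold over random screening" figure in the abstract above follows from dividing the two quoted hit rates. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def enrichment_factor(selected_hit_rate, random_hit_rate):
    """Ratio of the hit rate in the selected subset to the random hit rate."""
    if random_hit_rate <= 0:
        raise ValueError("random hit rate must be positive")
    return selected_hit_rate / random_hit_rate


# Hit rates quoted in the abstract: 80% for FlexX at the -40 kJ/mol cutoff,
# 2.06% for random screening.
ef = enrichment_factor(0.80, 0.0206)
print(f"enrichment factor: {ef:.1f}")  # 38.8, i.e. about 40-fold
```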

• A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance.
Perola, Emanuele and Walters, W Patrick and Charifson, Paul S
Proteins, 2004, 56(2), 235-249
PMID: 15211508     doi: 10.1002/prot.20088

A thorough evaluation of some of the most advanced docking and scoring methods currently available is described, and guidelines for the choice of an appropriate protocol for docking and virtual screening are defined. The generation of a large and highly curated test set of pharmaceutically relevant protein-ligand complexes with known binding affinities is described, and three highly regarded docking programs (Glide, GOLD, and ICM) are evaluated on the same set with respect to their ability to reproduce crystallographic binding orientations. Glide correctly identified the crystallographic pose within 2.0 Å in 61% of the cases, versus 48% for GOLD and 45% for ICM. In general Glide appears to perform most consistently with respect to diversity of binding sites and ligand flexibility, while the performance of ICM and GOLD is more binding site-dependent and it is significantly poorer when binding is predominantly driven by hydrophobic interactions. The results also show that energy minimization and reranking of the top N poses can be an effective means to overcome some of the limitations of a given docking function. The same docking programs are evaluated in conjunction with three different scoring functions for their ability to discriminate actives from inactives in virtual screening. The evaluation, performed on three different systems (HIV-1 protease, IMPDH, and p38 MAP kinase), confirms that the relative performance of different docking and scoring methods is to some extent binding site-dependent. GlideScore appears to be an effective scoring function for database screening, with consistent performance across several types of binding sites, while ChemScore appears to be most useful in sterically demanding sites since it is more forgiving of repulsive interactions. Energy minimization of docked poses can significantly improve the enrichments in systems with sterically demanding binding sites. Overall Glide appears to be a safe general choice for docking, while the choice of the best scoring tool remains to a larger extent system-dependent and should be evaluated on a case-by-case basis.
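
The pose-reproduction criterion quoted above (a docked pose within 2.0 Å of the crystallographic pose) is usually scored as a success rate over a test set. The sketch below assumes an in-place RMSD with no superposition and identical atom ordering in both poses; the function names and toy values are hypothetical and are not the protocol of the cited study.

```python
import math


def rmsd(coords_a, coords_b):
    """In-place RMSD between two poses given as lists of (x, y, z) tuples.

    Assumption: both lists contain the same atoms in the same order.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsd_values, cutoff=2.0):
    """Fraction of complexes whose docked pose lies within the cutoff (in Å)."""
    if not rmsd_values:
        raise ValueError("need at least one RMSD value")
    return sum(1 for value in rmsd_values if value <= cutoff) / len(rmsd_values)


# Toy two-atom poses, shifted by 1 Å along z.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
pose_rmsd = rmsd(crystal_pose, docked_pose)

# Hypothetical per-complex RMSD values for a four-complex test set.
rate = success_rate([0.5, 1.9, 2.5, 3.0])
print(f"pose RMSD = {pose_rmsd:.2f} Å, success rate = {rate:.0%}")
```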

• Glide: A new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening.
Halgren, TA and Murphy, RB and Friesner, RA and Beard, HS and Frye, LL and Pollard, WT and Banks, JL
Journal of medicinal chemistry, 2004, 47(7), 1750-1759
PMID: 15027866     doi: 10.1021/jm030644s

Glide's ability to identify active compounds in a database screen is characterized by applying Glide to a diverse set of nine protein receptors. In many cases, two, or even three, protein sites are employed to probe the sensitivity of the results to the site geometry. To make the database screens as realistic as possible, the screens use sets of "druglike" decoy ligands that have been selected to be representative of what we believe is likely to be found in the compound collection of a pharmaceutical or biotechnology company. Results are presented for releases 1.8, 2.0, and 2.5 of Glide. The comparisons show that average measures for both "early" and "global" enrichment for Glide 2.5 are 3 times higher than for Glide 1.8 and more than 2 times higher than for Glide 2.0 because of better results for the least well-handled screens. This improvement in enrichment stems largely from the better balance of the more widely parametrized GlideScore 2.5 function and the inclusion of terms that penalize ligand-protein interactions that violate established principles of physical chemistry, particularly as it concerns the exposure to solvent of charged protein and ligand groups. Comparisons to results for the thymidine kinase and estrogen receptors published by Rognan and co-workers (J. Med. Chem. 2000, 43, 4759-4767) show that Glide 2.5 performs better than GOLD 1.1, FlexX 1.8, or DOCK 4.01.

## 2003

• LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites.
Venkatachalam, CM and Jiang, X and Oldfield, T and Waldman, M
Journal of molecular graphics & modelling, 2003, 21(4), 289-307
PMID: 12479928

We present a new shape-based method, LigandFit, for accurately docking ligands into protein active sites. The method employs a cavity detection algorithm for detecting invaginations in the protein as candidate active site regions. A shape comparison filter is combined with a Monte Carlo conformational search for generating ligand poses consistent with the active site shape. Candidate poses are minimized in the context of the active site using a grid-based method for evaluating protein-ligand interaction energies. Errors arising from grid interpolation are dramatically reduced using a new non-linear interpolation scheme. Results are presented for 19 diverse protein-ligand complexes. The method appears quite promising, reproducing the X-ray structure ligand pose within an RMS of 2 Å in 14 out of the 19 complexes. A high-throughput screening study applied to the thymidine kinase receptor is also presented in which LigandFit, when combined with LigScore, an internally developed scoring function [1], yields very good hit rates for a ligand pool seeded with known actives.