# Bibliography of Computer-Aided Drug Design

Updated on 7/18/2014. Currently 2130 references

## Screening / Methodology / Structure-based


## 2014

• Combining in silico and in cerebro approaches for virtual screening and pose prediction in SAMPL4.
Voet, Arnout R D and Kumar, Ashutosh and Berenger, Francois and Zhang, Kam Y J
Journal of computer-aided molecular design, 2014
PMID: 24446075     doi: 10.1007/s10822-013-9702-2

The SAMPL challenges provide an ideal opportunity for unbiased evaluation and comparison of different approaches used in computational drug design. During the fourth round of this SAMPL challenge, we participated in the virtual screening and binding pose prediction on inhibitors targeting the HIV-1 integrase enzyme. For virtual screening, we used well known and widely used in silico methods combined with personal in cerebro insights and experience. Regular docking only performed slightly better than random selection, but the performance was significantly improved upon incorporation of additional filters based on pharmacophore queries and electrostatic similarities. The best performance was achieved when logical selection was added. For the pose prediction, we utilized a similar consensus approach that amalgamated the results of the Glide-XP docking with structural knowledge and rescoring. The pose prediction results revealed that docking displayed reasonable performance in predicting the binding poses. However, prediction performance can be improved utilizing scientific experience and rescoring approaches. In both the virtual screening and pose prediction challenges, the top performance was achieved by our approaches. Here we describe the methods and strategies used in our approaches and discuss the rationale of their performances.

• DiSCuS: an open platform for (not only) virtual screening results management.
Wójcikowski, Maciej and Zielenkiewicz, Piotr and Siedlecki, Pawel
Journal of chemical information and modeling, 2014, 54(1), 347-354
PMID: 24364790     doi: 10.1021/ci400587f

DiSCuS, a "Database System for Compound Selection", has been developed. The primary goal of DiSCuS is to aid researchers in the steps subsequent to generating high-throughput virtual screening (HTVS) results, such as selection of compounds for further study, purchase, or synthesis. To do so, DiSCuS provides (1) a storage facility for ligand-receptor complexes (generated with external programs), (2) a number of tools for validating these complexes, such as scoring functions, potential energy contributions, and med-chem features with ligand similarity estimates, and (3) powerful searching and filtering options with logical operators. DiSCuS supports multiple receptor targets for a single ligand, so it can be used either to evaluate different variants of an active site or for selectivity studies. DiSCuS documentation, installation instructions, and source code can be found at http://discus.ibb.waw.pl .

• istar: a web platform for large-scale protein-ligand docking.
Li, Hongjian and Leung, Kwong-Sak and Ballester, Pedro J and Wong, Man-Hon
PloS one, 2014, 9(1), e85678
PMID: 24475049     doi: 10.1371/journal.pone.0085678

Protein-ligand docking is a key computational method in the design of starting points for the drug discovery process. We are motivated by the desire to automate large-scale docking using our popular docking engine idock and thus have developed a publicly-accessible web platform called istar. Without tedious software installation, users can submit jobs using our website. Our istar website supports 1) filtering ligands by desired molecular properties and previewing the number of ligands to dock, 2) monitoring job progress in real time, and 3) visualizing ligand conformations and outputting free energy and ligand efficiency predicted by idock, binding affinity predicted by RF-Score, putative hydrogen bonds, and supplier information for easy purchase, three useful features commonly lacking on other online docking platforms such as DOCK Blaster or iScreen. We have collected 17,224,424 ligands from the All Clean subset of the ZINC database, and revamped our docking engine idock to version 2.0, further improving docking speed and accuracy, and integrating RF-Score as an alternative rescoring function. To compare idock 2.0 with the state-of-the-art AutoDock Vina 1.1.2, we have carried out a rescoring benchmark and a redocking benchmark on the 2,897 and 343 protein-ligand complexes of PDBbind v2012 refined set and CSAR NRC HiQ Set 24Sept2010 respectively, and an execution time benchmark on 12 diverse proteins and 3,000 ligands of different molecular weight. Results show that, under various scenarios, idock achieves comparable success rates while outperforming AutoDock Vina in terms of docking speed by at least 8.69 times and at most 37.51 times. When evaluated on the PDBbind v2012 core set, our istar platform combined with RF-Score achieves Pearson's and Spearman's correlation coefficients as high as 0.855 and 0.859, respectively, between the experimental binding affinity and the predicted binding affinity of the docked conformation.
istar is freely available at http://istar.cse.cuhk.edu.hk/idock.

## 2013

• Pathway-based Screening Strategy for Multitarget Inhibitors of Diverse Proteins in Metabolic Pathways.
Hsu, Kai-Cheng and Cheng, Wen-Chi and Chen, Yen-Fu and Wang, Wen-Ching and Yang, Jinn-Moon
PLoS computational biology, 2013, 9(7), e1003127
PMID: 23861662     doi: 10.1371/journal.pcbi.1003127

Many virtual screening methods have been developed for identifying single-target inhibitors based on the strategy of "one-disease, one-target, one-drug". The hit rates of these methods are often low because they cannot capture the features that play key roles in the biological functions of the target protein. Furthermore, single-target inhibitors are often susceptible to drug resistance and are ineffective for complex diseases such as cancers. Therefore, a new strategy is required for enriching the hit rate and identifying multitarget inhibitors. To address these issues, we propose the pathway-based screening strategy (called PathSiMMap) to derive binding mechanisms for increasing the hit rate and discovering multitarget inhibitors using site-moiety maps. This strategy simultaneously screens multiple target proteins in the same pathway; these proteins bind intermediates with common substructures. These proteins possess similar conserved binding environments (pathway anchors) when the product of one protein is the substrate of the next protein in the pathway despite their low sequence identity and structure similarity. We successfully discovered two multitarget inhibitors with IC50 of <10 µM for shikimate dehydrogenase and shikimate kinase in the shikimate pathway of Helicobacter pylori. Furthermore, we found two selective inhibitors (IC50 of <10 µM) for shikimate dehydrogenase using the specific anchors derived by our method. Our experimental results reveal that this strategy can enhance the hit rates and the pathway anchors are highly conserved and important for biological functions. We believe that our strategy provides a great value for elucidating protein binding mechanisms and discovering multitarget inhibitors.

• Evaluation and Optimization of Virtual Screening Workflows with DEKOIS 2.0 - A Public Library of Challenging Docking Benchmark Sets.
Bauer, Matthias R and Ibrahim, Tamer M and Vogel, Simon M and Boeckler, Frank M
Journal of chemical information and modeling, 2013, 53(6), 1447-1462
PMID: 23705874     doi: 10.1021/ci400115b

The application of molecular benchmarking sets helps to assess the actual performance of virtual screening (VS) workflows. To improve the efficiency of structure-based VS approaches, the selection and optimization of various parameters can be guided by benchmarking. With the DEKOIS 2.0 library, we aim to further extend and complement the collection of publicly available decoy sets. Based on BindingDB bioactivity data, we provide 81 new and structurally diverse benchmark sets for a wide variety of different target classes. To ensure a meaningful selection of ligands, we address several issues that can be found in bioactivity data. We have improved our previously introduced DEKOIS methodology with enhanced physicochemical matching, now including the consideration of molecular charges, as well as a more sophisticated elimination of latent actives in the decoy set (LADS). We evaluate the docking performance of Glide, GOLD, and AutoDock Vina with our data sets and highlight existing challenges for VS tools. All DEKOIS 2.0 benchmark sets will be made accessible at http://www.dekois.com .

• Ligand-Optimized Homology Models of D1 and D2 Dopamine Receptors: Application for Virtual Screening
Kolaczkowski, Marcin and Bucki, Adam and Feder, Marcin and Pawlowski, Maciej
Journal of chemical information and modeling, 2013, 53(3), 638-648
PMID: 23398329

Recent breakthroughs in crystallographic studies of G protein-coupled receptors (GPCRs), together with continuous progress in molecular modeling methods, have opened new perspectives for structure-based drug discovery. A crucial enhancement in this area was development of induced fit docking procedures that allow optimization of binding pocket conformation guided by the features of its active ligands. In the course of our research program aimed at discovery of novel antipsychotic agents, our attention focused on dopaminergic D2 and D1 receptors (D2R and D1R). Thus we decided to investigate whether the availability of a novel structure of the closely-related D3 receptor and application of induced fit docking procedures for binding pocket refinement would permit the building of models of D2R and D1R that facilitate a successful virtual screening (VS). Here, we provide an in-depth description of the modeling procedure and the discussion of the results of a VS benchmark we performed to compare efficiency of the ligand-optimized receptors in comparison with the regular homology models. We observed that application of the ligand-optimized models significantly improved the VS performance both in terms of BEDROC (0.325 vs. 0.182 for D1R and 0.383 vs. 0.301 for D2R) as well as EF1% (17 vs. 11 for D1R and 18 vs. 7.1 for D2R). In contrast, no improvement was observed for the performance of a D2R model built on the D3R template, when compared with that derived from the structure of the previously published and more evolutionary distant $\beta$2 adrenergic receptor. The comparison of results for receptors built according to various protocols and templates revealed that the most significant factor for the receptor performance was a proper selection of "tool ligand" used in induced fit docking procedure. 
Taken together, our results suggest that the described homology modeling procedure could be a viable tool for structure-based GPCR ligand design, even for the targets for which only a relatively distant structural template is available.

• Are predicted protein structures of any value for binding site prediction and virtual ligand screening?
Skolnick, Jeffrey and Zhou, Hongyi and Gao, Mu
Current Opinion in Structural Biology, 2013
PMID: 23415854     doi: 10.1016/j.sbi.2013.01.009

The recently developed field of ligand homology modeling (LHM) that extends the ideas of protein homology modeling to the prediction of ligand binding sites and for use in virtual ligand screening has emerged as a powerful new approach. Unlike traditional docking methodologies, LHM can be applied to low-to-moderate resolution predicted as well as experimental structures with little if any diminution in performance; thereby enabling ∼75% of an average proteome to have potentially significant virtual screening predictions. In large scale benchmarking, LHM is able to predict off-target ligand binding. Thus, despite the widespread belief to the contrary, low-to-moderate resolution predicted structures have considerable utility for biochemical function prediction.

• Docking-Based Virtual Screening of Covalently Binding Ligands: An Orthogonal Lead Discovery Approach
Schröder, Jörg and Klinger, Anette and Oellien, Frank and Marhofer, Richard J and Duszenko, Michael and Selzer, Paul M
Journal of medicinal chemistry, 2013, 56(4), 1478-1490
PMID: 23350811

In the pharmaceutical industry, lead discovery strategies and screening collections have been predominantly tailored to discover compounds that modulate target proteins through noncovalent interactions. Conversely, covalent linkage formation is an important mechanism for a number of successful drugs on the market, which in most cases were discovered by hindsight rather than by systematic design. In this article, the implementation of a docking-based virtual screening workflow for the retrieval of covalent binders is presented considering human cathepsin K as a test case. By use of the docking conditions that led to the best enrichment of known actives, 44 candidate compounds with unknown activity on cathepsin K were finally selected for experimental evaluation. The most potent inhibitor, 4-(N-phenylanilino)-6-pyrrolidin-1-yl-1,3,5-triazine-2-carbonitrile (CP243522), showed a K(i) of 21 nM and was confirmed to have a covalent reversible mechanism of inhibition. The presented approach will have great potential in cases where covalent inhibition is the desired drug discovery strategy.

• Multiple structures for virtual ligand screening: defining binding site properties-based criteria to optimize the selection of the query.
Ben Nasr, Nesrine and Guillemain, Hélène and Lagarde, Nathalie and Zagury, Jean-François and Montes, Matthieu
Journal of chemical information and modeling, 2013, 53(2), 293-311
PMID: 23312043

Virtual ligand screening is an integral part of the modern drug discovery process. Traditional ligand-based, virtual screening approaches are fast but require a set of structurally diverse ligands known to bind to the target. Traditional structure-based approaches require high-resolution target protein structures and are computationally demanding. In contrast, the recently developed threading/structure-based FINDSITE-based approaches have the advantage that they are as fast as traditional ligand-based approaches and yet overcome the limitations of traditional ligand- or structure-based approaches. These new methods can use predicted low-resolution structures and infer the likelihood of a ligand binding to a target by utilizing ligand information excised from the target's remote or close homologous proteins and/or libraries of ligand binding databases. Here, we develop an improved version of FINDSITE, FINDSITEfilt, that filters out false positive ligands in threading identified templates by a better binding site detection procedure that includes information about the binding site amino acid similarity. We then combine FINDSITEfilt with FINDSITEX that uses publicly available binding databases ChEMBL and DrugBank for virtual ligand screening. The combined approach, FINDSITEcomb, is compared to two traditional docking methods, AUTODOCK Vina and DOCK 6, on the DUD benchmark set. It is shown to be significantly better in terms of enrichment factor, dependence on target structure quality, and speed. FINDSITEcomb is then tested for virtual ligand screening on a large set of 3576 generic targets from the DrugBank database as well as a set of 168 Human GPCRs. Excluding close homologues, FINDSITEcomb gives an average enrichment factor of 52.1 for generic targets and 22.3 for GPCRs within the top 1% of the screened compound library. Around 65% of the targets have better than random enrichment factors. 
The performance is insensitive to target structure quality, as long as it has a TM-score ≥ 0.4 to native. Thus, FINDSITEcomb makes the screening of millions of compounds across entire proteomes feasible. The FINDSITEcomb web service is freely available for academic users at http://cssb.biology.gatech.edu/skolnick/webservice/FINDSITE-COMB/index.html

• Consensus Docking: Improving the Reliability of Docking in a Virtual Screening Context
Houston, Douglas R and Walkinshaw, Malcolm D
Journal of chemical information and modeling, 2013, 53(2), 384-390
PMID: 23351099

Structure-based virtual screening relies on scoring the predicted binding modes of compounds docked into the target. Because the accuracy of this scoring relies on the accuracy of the docking, methods that increase docking accuracy are valuable. Here, we present a relatively straightforward method for improving the probability of identifying accurately docked poses. The method is similar in concept to consensus scoring schemes, which have been shown to increase ranking power and thus hit rates, but combines information about predicted binding modes rather than predicted binding affinities. The pose prediction success rate of each docking program alone was found in this trial to be 55% for Autodock, 58% for DOCK, and 64% for Vina. By using more than one docking program to predict the binding pose, correct poses were identified in 82% or more of cases, a significant improvement. In a virtual screen, these more reliably posed compounds can be preferentially advanced to subsequent scoring stages to improve hit rates. Consensus docking can be easily introduced into established structure-based virtual screening methodologies.
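
The pose-agreement idea in the abstract above can be illustrated with a minimal sketch. It assumes each docking program returns one top pose as a list of (x, y, z) coordinates with identical atom ordering, and that two poses "agree" when their RMSD is at or below a cutoff. The 2.0 Å default, the function names, and the omission of any symmetry correction are illustrative assumptions, not details taken from the paper.

```python
import math
from itertools import combinations

def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses with identical atom ordering."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))

def consensus_pairs(poses_by_program, cutoff=2.0):
    """Return the program pairs whose top poses agree within `cutoff` angstroms.

    `cutoff` is a hypothetical default; no symmetry correction is applied.
    """
    return [
        (name_1, name_2)
        for (name_1, pose_1), (name_2, pose_2)
        in combinations(sorted(poses_by_program.items()), 2)
        if rmsd(pose_1, pose_2) <= cutoff
    ]

# Toy two-atom ligand: two programs agree, the third places it elsewhere.
poses = {
    "autodock": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "vina": [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)],
    "dock": [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],
}
print(consensus_pairs(poses))
```

In a screening workflow, a compound with at least one agreeing pair would be the kind of "more reliably posed" candidate that is advanced to the scoring stage.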

• Characterization of small molecule binding. I. Accurate identification of strong inhibitors in virtual screening.
Ding, Bo and Wang, Jian and Li, Nan and Wang, Wei
Journal of chemical information and modeling, 2013, 53(1), 114-122
PMID: 23259763     doi: 10.1021/ci300508m

Accurately ranking docking poses remains a great challenge in computer-aided drug design. In this study, we present an integrated approach called MIEC-SVM that combines structure modeling and statistical learning to characterize protein-ligand binding based on the complex structure generated from docking. Using the HIV-1 protease as a model system, we showed that MIEC-SVM can successfully rank the docking poses and consistently outperformed the state-of-the-art scoring functions when the true positives only account for 1% or 0.5% of all the compounds under consideration. More excitingly, we found that MIEC-SVM can achieve a significant enrichment in virtual screening even when trained on a set of known inhibitors as small as 50, especially when enhanced by a model average approach. Given these features of MIEC-SVM, we believe it provides a powerful tool for searching for and designing new drugs.

• Structure-Based Fragment Screening Is Demonstrated To Be a Practical Lead Discovery Method for a Representative G-Protein-Coupled Receptor
Stevens, Benjamin D
Journal of medicinal chemistry, 2013
PMID: 23614494

• Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments.
Madhavi Sastry, G and Adzhigirey, Matvey and Day, Tyler and Annabhimoju, Ramakrishna and Sherman, Woody
Journal of computer-aided molecular design, 2013, 27(3), 221-234
PMID: 23579614     doi: 10.1007/s10822-013-9644-8

Structure-based virtual screening plays an important role in drug discovery and complements other screening approaches. In general, protein crystal structures are prepared prior to docking in order to add hydrogen atoms, optimize hydrogen bonds, remove atomic clashes, and perform other operations that are not part of the x-ray crystal structure refinement process. In addition, ligands must be prepared to create 3-dimensional geometries, assign proper bond orders, and generate accessible tautomer and ionization states prior to virtual screening. While the prerequisite for proper system preparation is generally accepted in the field, an extensive study of the preparation steps and their effect on virtual screening enrichments has not been performed. In this work, we systematically explore each of the steps involved in preparing a system for virtual screening. We first explore a large number of parameters using the Glide validation set of 36 crystal structures and 1,000 decoys. We then apply a subset of protocols to the DUD database. We show that database enrichment is improved with proper preparation and that neglecting certain steps of the preparation process produces a systematic degradation in enrichments, which can be large for some targets. We provide examples illustrating the structural changes introduced by the preparation that impact database enrichment. While the work presented here was performed with the Protein Preparation Wizard and Glide, the insights and guidance are expected to be generalizable to structure-based virtual screening with other docking methods.

• Fragment-Based Drug Discovery Using a Multidomain, Parallel MD-MM/PBSA Screening Protocol
Zhu, Tian and Lee, Hyun and Lei, Hao and Jones, Christopher and Patel, Kavankumar and Johnson, Michael E and Hevener, Kirk E
Journal of chemical information and modeling, 2013, 53(3), 560-572
PMID: 23432621

We have developed a rigorous computational screening protocol to identify novel fragment-like inhibitors of N(5)-CAIR mutase (PurE), a key enzyme involved in de novo purine synthesis that represents a novel target for the design of antibacterial agents. This computational screening protocol utilizes molecular docking, graphics processing unit (GPU)-accelerated molecular dynamics, and Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) free energy estimations to investigate the binding modes and energies of fragments in the active sites of PurE. PurE is a functional octamer comprised of identical subunits. The octameric structure, with its eight active sites, provided a distinct advantage in these studies because, for a given simulation length, we were able to place eight separate fragment compounds in the active sites to increase the throughput of the MM/PBSA analysis. To validate this protocol, we have screened an in-house fragment library consisting of 352 compounds. The theoretical results were then compared with the results of two experimental fragment screens, Nuclear Magnetic Resonance (NMR) and Surface Plasmon Resonance (SPR) binding analyses. In these validation studies, the protocol was able to effectively identify the competitive binders that had been independently identified by experimental testing, suggesting the potential utility of this method for the identification of novel fragments for future development as PurE inhibitors.

## 2012

• Directory of Useful Decoys, Enhanced (DUD-E) - Better Ligands and Decoys for Better Benchmarking.
Mysinger, Michael and Carchia, Michael and Irwin, John J and Shoichet, Brian K
Journal of medicinal chemistry, 2012, 55(14), 6582-6594
PMID: 22716043     doi: 10.1021/jm300687e

A key metric to assess molecular docking remains ligand enrichment against challenging decoys. Whereas the directory of useful decoys (DUD) has been widely used, clear areas for optimization have emerged. Here we describe an improved benchmarking set that includes more diverse targets such as GPCRs and ion channels, totaling 102 proteins with 22,886 clustered ligands drawn from ChEMBL, each with 50 property-matched decoys drawn from ZINC. To ensure chemotype diversity we cluster each target's ligands by their Bemis-Murcko atomic frameworks. We add net charge to the matched physico-chemical properties, and include only the most dissimilar decoys, by topology, from the ligands. An online automated tool (http://decoys.docking.org) generates these improved matched decoys for user-supplied ligands. We test this dataset by docking all 102 targets, using the results to improve the balance between ligand desolvation and electrostatics in DOCK 3.6. The complete DUD-E benchmarking set is freely available at http://dude.docking.org.

• Application of Drug-perturbed Essential Dynamics/Molecular Dynamics (ED/MD) to Virtual Screening and Rational Drug Design.
Chaudhuri, Rima and Carrillo, Oliver and Laughton, Charles Anthony and Orozco, Modesto
Journal of chemical theory and computation, 2012, 8(7), 2204-2214
doi: 10.1021/ct300223c

We present here the first application of a new algorithm, essential dynamics/molecular dynamics (ED/MD), to the field of small molecule docking. The method uses a previously existing molecular dynamics (MD) ensemble of a protein or protein-drug complex to generate, with a very small computational cost, perturbed ensembles which represent ligand-induced binding site flexibility in a more accurate way than the original trajectory. The use of these perturbed ensembles in a standard docking program leads to superior performance than the same docking procedure using the crystal structure or ensembles obtained from conventional MD simulations as templates. The simplicity and accuracy of the method opens up the possibility of introducing protein flexibility in high-throughput docking experiments.

• AMMOS software: method and application.
Pencheva, T and Lagorce, D and Pajeva, I and Villoutreix, B O and Miteva, M A
Methods in molecular biology (Clifton, N.J.), 2012, 819, 127-141
PMID: 22183534     doi: 10.1007/978-1-61779-465-0_9

Recent advances in computational sciences enabled extensive use of in silico methods in projects at the interface between chemistry and biology. Among them virtual ligand screening, a modern set of approaches, facilitates hit identification and lead optimization in drug discovery programs. Most of these approaches require the preparation of the libraries containing small organic molecules to be screened or a refinement of the virtual screening results. Here we present an overview of the open source AMMOS software, which is a platform performing an automatic procedure that allows for a structural generation and optimization of drug-like molecules in compound collections, as well as a structural refinement of protein-ligand complexes to assist in silico screening exercises.

• Inverse Virtual Screening allows the discovery of the biological activity of natural compounds.
Lauro, Gianluigi and Masullo, Milena and Piacente, Sonia and Riccio, Raffaele and Bifulco, Giuseppe
Bioorganic & Medicinal Chemistry, 2012, 20(11), 3596-3602
PMID: 22537682     doi: 10.1016/j.bmc.2012.03.072

A small library of phenolic natural compounds belonging to different chemical classes was screened on a panel of targets involved in the genesis and progression of cancer. The re-investigation of their potential activity was achieved through the Inverse Virtual Screening approach. The normalization of the predicted binding energies permitted the selection of promising compounds on definite targets, avoiding the selection of false positive results. In vitro biological tests revealed the inhibitory activity of xanthohumol and isoxanthohumol on PDK1 and PKC protein kinases. This study validates the robustness of the Inverse Virtual Screening in silico approach as a useful tool for the identification of the specific biological activity of a given set of compounds.

• Virtual fragment screening: Discovery of histamine H(3) receptor ligands using ligand-based and protein-based molecular fingerprints.
Sirci, Francesco and Istyastono, Enade P and Vischer, Henry F and Kooistra, Albert J and Nijmeijer, Saskia and Kuijer, Martien and Wijtmans, Maikel and Mannhold, Raimund and Leurs, Rob and de Esch, Iwan J P and de Graaf, Chris
Journal of chemical information and modeling, 2012, 52(12), 3308-3324
PMID: 23140085     doi: 10.1021/ci3004094

Virtual Fragment Screening (VFS) is a promising new method that uses computer models to identify small, fragment-like biologically active molecules as useful starting points for Fragment-Based Drug Discovery (FBDD). Training sets of true active and inactive fragment-like molecules to construct and validate target customized VFS methods are however lacking. We have for the first time explored the possibilities and challenges of VFS using molecular fingerprints derived from a unique set of fragment affinity data for the histamine H(3) receptor (H(3)R), a pharmaceutically relevant G Protein-coupled Receptor (GPCR). Optimized FLAP (Fingerprint of Ligands And Proteins) models containing essential molecular interaction fields that discriminate known H(3)R binders from inactive molecules were successfully used for the identification of new H(3)R ligands. Prospective virtual screening of 156,090 molecules yielded a high hit rate of 62% (18 of the 29 tested) experimentally confirmed novel fragment-like H(3)R ligands that offer new potential starting points for the design of H(3)R targeting drugs. The first construction and application of customized FLAP models for the discovery of fragment-like biologically active molecules demonstrates that VFS is an efficient way to explore protein-fragment interaction space in silico.

• Integrated Virtual Screening for the Identification of Novel and Selective Peroxisome Proliferator-Activated Receptor (PPAR) Scaffolds.
Nevin, Daniel K and Peters, Martin B and Carta, Giorgio and Fayne, Darren and Lloyd, David G
Journal of medicinal chemistry, 2012, 55(11), 4978-4989
PMID: 22582973     doi: 10.1021/jm300068n

We describe a fully customizable and integrated target-specific "tiered" virtual screening approach tailored to identifying and characterizing novel peroxisome proliferator activated receptor $\gamma$ (PPAR$\gamma$) scaffolds. Built on structure- and ligand-based computational techniques, a consensus protocol was developed for use in the virtual screening of chemical databases, focused toward retrieval of novel bioactive chemical scaffolds for PPAR$\gamma$. Consequent from application, three novel PPAR scaffolds displaying distinct chemotypes have been identified, namely, 5-(4-(benzyloxy)-3-chlorobenzylidene)dihydro-2-thioxopyrimidine-4,6(1H,5H)-dione (MDG 548), 3-((4-bromophenoxy)methyl)-N-(4-nitro-1H-pyrazol-1-yl)benzamide (MDG 559), and ethyl 2-[3-hydroxy-5-(5-methyl-2-furyl)-2-oxo-4-(2-thienylcarbonyl)-2,5-dihydro-1H-pyrrol-1-yl]-4-methyl-1,3-thiazole-5-carboxylate (MDG 582). Fluorescence polarization (FP) and time resolved fluorescence resonance energy transfer (TR-FRET) show that these compounds display high affinity competitive binding to the PPAR$\gamma$-LBD (EC(50) of 215 nM to 5.45 $\mu$M). Consequent characterization by a TR-FRET activation reporter assay demonstrated agonism of PPAR$\gamma$ by all three compounds (EC(50) of 467-594 nM). Additionally, differential PPAR isotype specificity was demonstrated through assay against PPAR$\alpha$ and PPAR$\delta$ subtypes. This work showcases the ability of target specific "tiered screen" protocols to successfully identify novel scaffolds of individual receptor subtypes with greater efficacy than isolated screening methods.

• FINDSITE X: A Structure-Based, Small Molecule Virtual Screening Approach with Application to All Identified Human GPCRs
Zhou, Hongyi and Skolnick, Jeffrey
Molecular Pharmaceutics, 2012, 9(6), 1775-1784
PMID: 22574683     doi: 10.1021/mp3000716

We have developed FINDSITEX, an extension of FINDSITE, a protein threading based algorithm for the inference of protein binding sites, biochemical function and virtual ligand screening, that removes the limitation that holo protein structures (those containing bound ligands) of a sufficiently large set of distant evolutionarily related proteins to the target be solved; rather, predicted protein structures and experimental ligand binding information are employed. To provide the predicted protein structures, a fast and accurate version of our recently developed TASSERVMT, TASSERVMT-lite, for template-based protein structural modeling applicable up to 1000 residues is developed and tested, with comparable performance to the top CASP9 servers. Then, a hybrid approach that combines structure alignments with an evolutionary similarity score for identifying functional relationships between target and proteins with binding data has been developed. By way of illustration, FINDSITEX is applied to 998 identified human G-protein coupled receptors (GPCRs). First, TASSERVMT-lite provides updates of all human GPCR structures previously modeled in our lab. We then use these structures and the new function similarity detection algorithm to screen all human GPCRs against the ZINC8 nonredundant (TC < 0.7) ligand set combined with ligands from the GLIDA database (a total of 88,949 compounds). Testing (excluding GPCRs whose sequence identity > 30% to the target from the binding data library) on a 168 human GPCR set with known binding data, the average enrichment factor in the top 1% of the compound library (EF0.01) is 22.7, whereas EF0.01 by FINDSITE is 7.1. For virtual screening when just the target and its native ligands are excluded, the average EF0.01 reaches 41.4. We also analyze off-target interactions for the 168 protein test set. 
All predicted structures, virtual screening data and off-target interactions for the 998 human GPCRs are available at http://cssb.biology.gatech.edu/skolnick/webservice/gpcr/index.html.
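
The enrichment factor quoted in this abstract (EF0.01) is conventionally the fraction of actives found in the top 1% of the ranked library divided by the fraction of actives in the whole library. The following is a minimal Python sketch of that standard definition, not code from FINDSITEX; the function name, the "higher score ranks first" convention, and the rounding of the cutoff are assumptions made for illustration.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at a given top fraction of a ranked library.

    scores    : one score per compound (assumption: higher = better)
    is_active : parallel list of booleans marking known actives
    fraction  : top fraction of the library to inspect (0.01 = top 1%)
    """
    n_total = len(scores)
    if n_total == 0 or n_total != len(is_active):
        raise ValueError("scores and is_active must be non-empty and equal in length")
    n_actives = sum(bool(a) for a in is_active)
    if n_actives == 0:
        raise ValueError("at least one active is required")

    # Number of compounds in the inspected top slice (at least one).
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(bool(active) for _, active in ranked[:n_top])

    # Ratio of the hit rate in the top slice to the hit rate of the whole library.
    return (actives_in_top / n_top) / (n_actives / n_total)
```

With this definition, a random ranking gives an EF close to 1, and the maximum attainable value at 1% is 100 (when every compound in the top slice is active).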

• Consensus Induced Fit Docking (cIFD): methodology, validation, and application to the discovery of novel Crm1 inhibitors.
Kalid, Ori and Toledo Warshaviak, Dora and Shechter, Sharon and Sherman, Woody and Shacham, Sharon
Journal of computer-aided molecular design, 2012, 26(11), 1217-1228
PMID: 23053738     doi: 10.1007/s10822-012-9611-9

We present the Consensus Induced Fit Docking (cIFD) approach for adapting a protein binding site to accommodate multiple diverse ligands for virtual screening. This novel approach results in a single binding site structure that can bind diverse chemotypes and is thus highly useful for efficient structure-based virtual screening. We first describe the cIFD method and its validation on three targets that were previously shown to be challenging for docking programs (COX-2, estrogen receptor, and HIV reverse transcriptase). We then demonstrate the application of cIFD to the challenging discovery of irreversible Crm1 inhibitors. We report the identification of 33 novel Crm1 inhibitors, which resulted from the testing of 402 purchased compounds selected from a screening set containing 261,680 compounds. This corresponds to a hit rate of 8.2 %. The novel Crm1 inhibitors reveal diverse chemical structures, validating the utility of the cIFD method in a real-world drug discovery project. This approach offers a pragmatic way to implicitly account for protein flexibility without the additional computational costs of ensemble docking or including full protein flexibility during virtual screening.

• Novel Inhibitor Discovery through Virtual Screening against Multiple Protein Conformations Generated via Ligand-Directed Modeling: A Maternal Embryonic Leucine Zipper Kinase Example.
Mahasenan, Kiran V and Li, Chenglong
Journal of chemical information and modeling, 2012, 52(5), 1345-1355
PMID: 22540736     doi: 10.1021/ci300040c

Kinase targets have been demonstrated to undergo major conformational reorganization upon ligand binding. Such protein conformational plasticity remains a significant challenge in structure-based virtual screening methodology and may be approximated by screening against an ensemble of diverse protein conformations. Maternal embryonic leucine zipper kinase (MELK), a member of the serine-threonine kinase family, has recently been found to be involved in the tumorigenic state of glioblastoma, breast, ovarian, and colon cancers. We therefore modeled several conformers of MELK utilizing the available chemogenomic and crystallographic data of homologous kinases. We carried out docking pose prediction and virtual screening enrichment studies with these conformers. The performances of the ensembles were evaluated by their ability to reproduce known inhibitor bioactive conformations and to efficiently recover known active compounds early in the virtual screen when seeded with decoy sets. A few of the individual MELK conformers performed satisfactorily, reproducing the native protein-ligand pharmacophoric interactions in up to 50% of the cases. By selecting an ensemble of a few representative conformational states, most of the known inhibitor binding poses could be rationalized. For example, a four conformer ensemble is able to recover 95% of the studied actives, especially with imperfect scoring function(s). The virtual screening enrichment varied considerably among different MELK conformers. Enrichment appears to improve by selection of a proper protein conformation. For example, several holo and unliganded active conformations accommodate diverse chemotypes better than the ATP-bound conformer. These results prove that using an ensemble of diverse conformations could give a better performance. 
Applying this approach, we were able to screen a commercially available library of half a million compounds against three conformers to discover three novel inhibitors of MELK, one from each template. Among the three compounds validated via experimental enzyme inhibition assays, one is relatively potent (15; K(d)

• Can the Energy Gap in the Protein-Ligand Binding Energy Landscape Be Used as a Descriptor in Virtual Ligand Screening?
Grigoryan, Arsen V and Wang, Hong and Cardozo, Timothy J
PloS one, 2012, 7(10), e46532
doi: 10.1371/journal.pone.0046532

The ranking of scores of individual chemicals within a large screening library is a crucial step in virtual screening (VS) for drug discovery. Previous studies showed that the quality of protein-ligand recognition can be improved using spectrum properties and the shape of ...

• Computational Approach for Fast Screening of Small Molecular Candidates To Inhibit Crystallization in Amorphous Drugs.
Pajula, Katja and Lehto, Vesa-Pekka and Ketolainen, Jarkko and Korhonen, Ossi
Molecular Pharmaceutics, 2012, 9(10), 2844-2855
PMID: 22867030     doi: 10.1021/mp300135h

The applicability of the computational docking approach was investigated to create a novel method for quick additive screening to inhibit the crystallization taking place in amorphous drugs. Surface energy and attachment energy were utilized to recognize the morphologically most important crystal faces. The surfaces (100), (001), and (010) were identified as target faces, and the estimated free energies of binding of additives on these surfaces were computationally determined. The molecule of the crystallizing compound was included in the group of the modeled additives as the reference and for the validation of the approach. Additives having a lower estimated free energy of binding than the reference molecule itself were considered as potential crystallization inhibitors. Salicylamide, salicylic acid, and sulfanilamide with computationally prescreened additives were melt-quenched, and the nucleation and crystal growth rates were subsequently monitored by polarized light microscopy. As a result, computationally screened additives decelerated the nucleation and crystal growth rates of the studied drugs while the pure drugs crystallized too fast to be measured. The use of a computational approach enabled fast and cost-effective additive selection to retard nucleation and crystal growth, thus facilitating the production of amorphous binary small molecular compounds with stabilized disordered structures.

• DecoyFinder: an easy-to-use python GUI application for building target-specific decoy sets.
Cereto-Massagué, Adrià and Guasch, Laura and Valls, Cristina and Mulero, Miquel and Pujadas, Gerard and Garcia-Vallvé, Santiago
Bioinformatics (Oxford, England), 2012, 28(12), 1661-1662
PMID: 22539671     doi: 10.1093/bioinformatics/bts249

Decoys are molecules that are presumed to be inactive against a target (i.e. will not likely bind to the target) and are used to validate the performance of molecular docking or a virtual screening workflow. The Directory of Useful Decoys database (http://dud.docking.org/) provides a free directory of decoys for use in virtual screening, though it only contains a limited set of decoys for 40 targets. To overcome this limitation, we have developed an application called DecoyFinder that selects, for a given collection of active ligands of a target, a set of decoys from a database of compounds. Decoys are selected if they are similar to active ligands according to five physical descriptors (molecular weight, number of rotational bonds, total hydrogen bond donors, total hydrogen bond acceptors and the octanol-water partition coefficient) without being chemically similar to any of the active ligands used as an input (according to the Tanimoto coefficient between MACCS fingerprints). To the best of our knowledge, DecoyFinder is the first application designed to build target-specific decoy sets. AVAILABILITY: A complete description of the software is included on the application home page. A validation of DecoyFinder on 10 DUD targets is provided as Supplementary Table S1. DecoyFinder is freely available at http://URVnutrigenomica-CTNS.github.com/DecoyFinder.
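
The selection rule described in this abstract (physically similar to an active on five descriptors, yet chemically dissimilar to every active by fingerprint Tanimoto) can be sketched in plain Python. This is an illustration under stated assumptions, not DecoyFinder's implementation: the `Molecule` record, the tolerance values in `TOLERANCES`, and the `MAX_TANIMOTO` cutoff are hypothetical, and descriptors and fingerprint bits are assumed to be precomputed by a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record with precomputed descriptors and fingerprint bits."""
    name: str
    mw: float          # molecular weight
    rot_bonds: int     # number of rotatable bonds
    hbd: int           # hydrogen bond donors
    hba: int           # hydrogen bond acceptors
    logp: float        # octanol-water partition coefficient
    fp: frozenset      # indices of the bits set in a structural fingerprint


# Assumed tolerances and cutoff, chosen only for illustration.
TOLERANCES = {"mw": 25.0, "rot_bonds": 1, "hbd": 1, "hba": 2, "logp": 1.0}
MAX_TANIMOTO = 0.75


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def physically_similar(candidate, active):
    """True if all five descriptors lie within the assumed tolerances."""
    return all(
        abs(getattr(candidate, key) - getattr(active, key)) <= tol
        for key, tol in TOLERANCES.items()
    )


def select_decoys(actives, candidates):
    """Keep candidates that resemble some active physically but no active chemically."""
    decoys = []
    for cand in candidates:
        matches_an_active = any(physically_similar(cand, a) for a in actives)
        unlike_all_actives = all(tanimoto(cand.fp, a.fp) < MAX_TANIMOTO for a in actives)
        if matches_an_active and unlike_all_actives:
            decoys.append(cand)
    return decoys
```

The two conditions pull in opposite directions on purpose: matching physical properties keeps the benchmark from being trivially separable, while the fingerprint cutoff lowers the chance that a decoy is actually an active.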

• Integrating Ligand-Based and Protein-Centric Virtual Screening of Kinase Inhibitors Using Ensembles of Multiple Protein Kinase Genes and Conformations.
Dixit, Anshuman and Verkhivker, Gennady M
Journal of chemical information and modeling, 2012, 52(10), 2501-2515
PMID: 22992037     doi: 10.1021/ci3002638

The rapidly growing wealth of structural and functional information about kinase genes and kinase inhibitors that is fueled by a significant therapeutic role of this protein family provides a significant impetus for development of targeted computational screening approaches. In this work, we explore an ensemble-based, protein-centric approach that allows for simultaneous virtual ligand screening against multiple kinase genes and multiple kinase receptor conformations. We systematically analyze and compare the results of ligand-based and protein-centric screening approaches using both single-receptor and ensemble-based docking protocols. A panel of protein kinase targets that includes ABL, EGFR, P38, CDK2, TK, and VEGFR2 kinases is used in this comparative analysis. By applying various performance metrics we have shown that ligand-centric shape matching can provide an effective enrichment of active compounds outperforming single-receptor docking screening. However, ligand-based approaches can be highly sensitive to the choice of inhibitor queries. Employment of multiple inhibitor queries combined with parallel selection ranking criteria can improve the performance and efficiency of ligand-based virtual screening. We also demonstrated that replica-exchange Monte Carlo docking with kinome-based ensembles of multiple crystal structures can provide a superior early enrichment on the kinase targets. The central finding of this study is that incorporation of the template-based structural information about kinase inhibitors and protein kinase structures in diverse functional states can significantly enhance the overall performance and robustness of both ligand and protein-centric screening strategies. The results of this study may be useful in virtual screening of kinase inhibitors potentially offering a beneficial spectrum of therapeutic activities across multiple disease states.

• Potential and Limitations of Ensemble Docking.
Korb, Oliver and Olsson, Tjelvar S G and Bowden, Simon J and Hall, Richard J and Verdonk, Marcel L and Liebeschuetz, John W and Cole, Jason C
Journal of chemical information and modeling, 2012, 52(5), 1262-1274
PMID: 22482774     doi: 10.1021/ci2005934

A major problem in structure-based virtual screening applications is the appropriate selection of a single or even multiple protein structures to be used in the virtual screening process. A priori it is unknown which protein structure(s) will perform best in a virtual screening experiment. We investigated the performance of ensemble docking, as a function of ensemble size, for eight targets of pharmaceutical interest. Starting from single protein structure docking results, for each ensemble size up to 500 000 combinations of protein structures were generated, and, for each ensemble, pose prediction and virtual screening results were derived. Comparison of single to multiple protein structure results suggests improvements when looking at the performance of the worst and the average over all single protein structures to the performance of the worst and average over all protein ensembles of size two or greater, respectively. We identified several key factors affecting ensemble docking performance, including the sampling accuracy of the docking algorithm, the choice of the scoring function, and the similarity of database ligands to the cocrystallized ligands of ligand-bound protein structures in an ensemble. Due to these factors, the prospective selection of optimum ensembles is a challenging task, shown by a reassessment of published ensemble selection protocols.
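
Enumerating ensembles from single-structure docking results, as this abstract describes, can be sketched as follows. The merging rule used here (each ligand keeps its best score over the ensemble members, with lower scores assumed to be better) is a common convention and an assumption of this sketch; the abstract does not state which rule the authors used, and the function names are illustrative.

```python
from itertools import combinations


def ensemble_scores(scores_by_structure, ensemble):
    """Merge per-structure docking scores for one ensemble.

    scores_by_structure : {structure_id: {ligand_id: score}}
    ensemble            : tuple of structure ids
    Assumption: lower score = better, and every structure scored every ligand.
    """
    ligands = scores_by_structure[ensemble[0]].keys()
    return {
        lig: min(scores_by_structure[s][lig] for s in ensemble)
        for lig in ligands
    }


def enumerate_ensembles(scores_by_structure, size):
    """Yield (ensemble, merged_scores) for every combination of the given size."""
    for ensemble in combinations(sorted(scores_by_structure), size):
        yield ensemble, ensemble_scores(scores_by_structure, ensemble)
```

Because the number of combinations grows quickly with the number of available structures, a real study would typically cap or sample the enumeration, which is consistent with the limit of 500 000 combinations per ensemble size mentioned in the abstract.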

• A reverse combination of structure-based and ligand-based strategies for virtual screening.
Cortés-Cabrera, Alvaro and Gago, Federico and Morreale, Antonio
Journal of computer-aided molecular design, 2012, 26(3), 319-327
PMID: 22395903     doi: 10.1007/s10822-012-9558-x

A new approach is presented that combines structure- and ligand-based virtual screening in a reverse way. Opposite to the majority of the methods, a docking protocol is first employed to prioritize small ligands ("fragments") that are subsequently used as queries to search for similar larger ligands in a database. For a given chemical library, a three-step strategy is followed consisting of (1) contraction into a representative, non-redundant, set of fragments, (2) selection of the three best-scoring fragments docking into a given macromolecular target site, and (3) expansion of the fragments' structures back into ligands by using them as queries to search the library by means of fingerprint descriptions and similarity criteria. We tested the performance of this approach on a collection of fragments and ligands found in the ZINC database and the directory of useful decoys, and compared the results with those obtained using a standard docking protocol. The new method provided better overall results and was several times faster. We also studied the chemical diversity that both methods cover using an in-house compound library and concluded that the novel approach performs similarly but at a much smaller computational cost.

• Core Site-Moiety Maps Reveal Inhibitors and Binding Mechanisms of Orthologous Proteins by Screening Compound Libraries
Hsu, Kai-Cheng and Cheng, Wen-Chi and Chen, Yen-Fu and Wang, Hung-Jung and Li, Ling-Ting and Wang, Wen-Ching and Yang, Jinn-Moon
PloS one, 2012, 7(2), e32142
doi: 10.1371/journal.pone.0032142

Members of protein families often share conserved structural subsites for interaction with chemically similar moieties despite low sequence identity. We propose a core site-moiety map of multiple proteins (called CoreSiMMap) to discover inhibitors and mechanisms by profiling subsite-moiety interactions of immense screening compounds. The consensus anchor, the subsite-moiety interactions with statistical significance, of a CoreSiMMap can be regarded as a "hot spot" that represents the conserved binding environments involved in biological functions. Here, we derive the CoreSiMMap with six consensus anchors and identify six inhibitors (IC50 < 8.0 μM) of shikimate kinases (SKs) of Mycobacterium tuberculosis and Helicobacter pylori from the NCI database (236,962 compounds). Studies of site-directed mutagenesis and analogues reveal that these conserved interacting residues and moieties contribute to pocket-moiety interaction spots and biological functions. These results reveal that our multi-target screening strategy and the CoreSiMMap can increase the accuracy of screening in the identification of novel inhibitors and subsite-moiety environments for elucidating the binding mechanisms of targets.

• Virtual fragment screening: exploration of MM-PBSA re-scoring.
Kawatkar, Sameer and Moustakas, Demetri and Miller, Matthew and Joseph-McCarthy, Diane
Journal of computer-aided molecular design, 2012, 26(8), 921-934
PMID: 22869295     doi: 10.1007/s10822-012-9590-x

An NMR fragment screening dataset with known binders and decoys was used to evaluate the ability of docking and re-scoring methods to identify fragment binders. Re-scoring docked poses using the Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) implicit solvent model identifies additional active fragments relative to either docking or random fragment screening alone. Early enrichment, which is clearly most important in practice for selecting relatively small sets of compounds for experimental testing, is improved by MM-PBSA re-scoring. In addition, the value in MM-PBSA re-scoring of docked poses for virtual screening may be in lessening the effect of the variation in the protein complex structure used.

• Virtual Target Screening: Validation Using Kinase Inhibitors.
Santiago, Daniel N and Pevzner, Yuri and Durand, Ashley A and Tran, Minhphuong and Scheerer, Rachel R and Daniel, Kenyon and Sung, Shen-Shu and Lee Woodcock, H and Guida, Wayne C and Brooks, Wesley H
Journal of chemical information and modeling, 2012, 52(8), 2192-2203
PMID: 22747098     doi: 10.1021/ci300073m

Computational methods involving virtual screening could potentially be employed to discover new biomolecular targets for an individual molecule of interest (MOI). However, existing scoring functions may not accurately differentiate proteins to which the MOI binds from a larger set of macromolecules in a protein structural database. An MOI will most likely have varying degrees of predicted binding affinities to many protein targets. However, correctly interpreting a docking score as a hit for the MOI docked to any individual protein can be problematic. In our method, which we term "Virtual Target Screening (VTS)", a set of small drug-like molecules are docked against each structure in the protein library to produce benchmark statistics. This calibration provides a reference for each protein so that hits can be identified for an MOI. VTS can then be used as a tool for: drug repositioning (repurposing), specificity and toxicity testing, identifying potential metabolites, probing protein structures for allosteric sites, and testing focused libraries (collection of MOIs with similar chemotypes) for selectivity. To validate our VTS method, twenty kinase inhibitors were docked to a collection of calibrated protein structures. Here, we report our results where VTS predicted protein kinases as hits in preference to other proteins in our database. Concurrently, a graphical interface for VTS was developed.
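
The calibration idea in this abstract, where each protein gets its own reference distribution of docking scores so that a raw score can be judged per protein, can be illustrated with a z-score. This is only a sketch of the general idea: the abstract does not give the actual statistic or threshold used by VTS, so the z-score, the `z_cutoff` value, and the "lower score = better" convention are all assumptions.

```python
from statistics import mean, stdev


def calibrate(benchmark_scores):
    """Per-protein mean and standard deviation of benchmark docking scores.

    benchmark_scores : {protein_id: [scores of the drug-like calibration set]}
    """
    return {
        protein: (mean(scores), stdev(scores))
        for protein, scores in benchmark_scores.items()
    }


def call_hits(moi_scores, calibration, z_cutoff=-1.5):
    """Return {protein_id: z} for proteins where the MOI scores unusually well.

    Assumption: lower docking score = stronger predicted binding, so a hit is
    a z-score at or below the (hypothetical) cutoff.
    """
    hits = {}
    for protein, score in moi_scores.items():
        mu, sigma = calibration[protein]
        z = (score - mu) / sigma
        if z <= z_cutoff:
            hits[protein] = z
    return hits
```

The point of the per-protein reference is that the same raw score can be remarkable for one protein and ordinary for another, depending on how the calibration set scores against each.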

• Cheminformatics Meets Molecular Mechanics: A Combined Application of Knowledge-Based Pose Scoring and Physical Force Field-Based Hit Scoring Functions Improves the Accuracy of Structure-Based Virtual Screening
Hsieh, Jui-Hua and Yin, Shuangye and Wang, Xiang S and Liu, Shubin and Dokholyan, Nikolay V and Tropsha, Alexander
Journal of chemical information and modeling, 2012, 52(1), 16-28
PMID: 22017385     doi: 10.1021/ci2002507

• Virtual screening data fusion using both structure- and ligand-based methods.
Svensson, Fredrik and Karlén, Anders and Sköld, Christian
Journal of chemical information and modeling, 2012, 52(1), 225-232
PMID: 22148635     doi: 10.1021/ci2004835

Virtual screening is widely applied in drug discovery, and significant effort has been put into improving current methods. In this study, we have evaluated the performance of compound ranking in virtual screening using five different data fusion algorithms on a total of 16 data sets. The data were generated by docking, pharmacophore search, shape similarity, and electrostatic similarity, spanning both structure- and ligand-based methods. The algorithms used for data fusion were sum rank, rank vote, sum score, Pareto ranking, and parallel selection. None of the fusion methods require any prior knowledge or input other than the results from the single methods and, thus, are readily applicable. The results show that compound ranking using data fusion improves the performance and consistency of virtual screening compared to the single methods alone. The best performing data fusion algorithm was parallel selection, but both rank voting and Pareto ranking also have good performance.
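
Two of the fusion schemes named in this abstract, sum rank and parallel selection, are simple enough to sketch. The code below follows the usual textbook descriptions (sum of per-method ranks; round-robin picking of each method's next best unselected compound) and is not taken from the paper. It assumes that higher scores are better and that every method scored the same compounds; function names are illustrative.

```python
def ranks(scores):
    """Map compound id -> rank (1 = best), assuming higher score = better."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {cid: position + 1 for position, cid in enumerate(ordered)}


def sum_rank(methods):
    """Order compounds by the sum of their ranks across all methods.

    methods : list of {compound_id: score}, one dict per screening method
    Ties in the summed rank are broken by compound id.
    """
    per_method = [ranks(m) for m in methods]
    total = {cid: sum(r[cid] for r in per_method) for cid in per_method[0]}
    return sorted(total, key=lambda cid: (total[cid], cid))


def parallel_selection(methods, n_select):
    """Pick compounds round-robin from each method's ranked list.

    Each method in turn contributes its best compound not yet selected,
    until n_select compounds are chosen or all lists are exhausted.
    """
    ranked_lists = [sorted(m, key=m.get, reverse=True) for m in methods]
    positions = [0] * len(ranked_lists)
    selected = []
    while len(selected) < n_select and any(
        pos < len(lst) for pos, lst in zip(positions, ranked_lists)
    ):
        for i, ordered in enumerate(ranked_lists):
            # Skip compounds another method has already contributed.
            while positions[i] < len(ordered) and ordered[positions[i]] in selected:
                positions[i] += 1
            if positions[i] < len(ordered) and len(selected) < n_select:
                selected.append(ordered[positions[i]])
                positions[i] += 1
    return selected
```

Sum rank rewards compounds that do reasonably well everywhere, whereas parallel selection preserves the top picks of each individual method, which is one plausible reason the two can behave differently across data sets.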

• Enrichment of virtual hits by progressive shape-matching and docking.
Choi, Jiwon and He, Ningning and Kim, Nayoung and Yoon, Sukjoon
Journal of molecular graphics & modelling, 2012, 32, 82-88
PMID: 22088763     doi: 10.1016/j.jmgm.2011.10.002

The main applications of virtual chemical screening include the selection of a minimal receptor-relevant subset of a chemical library with a maximal chemical diversity. We have previously reported that the combination of ligand-centric and receptor-centric virtual screening methods may provide a compromise between computational time and accuracy during the hit enrichment process. In the present work, we propose a "progressive distributed docking" method that improves the virtual screening process using an iterative combination of shape-matching and docking steps. Known ligands with low docking scores were used as initial 3D templates for the shape comparisons with the chemical library. Next, new compounds with good template shape matches and low receptor docking scores were selected for the next round of shape searching and docking. The present iterative virtual screening process was tested for enriching peroxisome proliferator-activated receptor and phosphoinositide 3-kinase relevant compounds from a selected subset of the chemical libraries. It was demonstrated that the iterative combination improved the lead-hopping practice by improving the chemical diversity in the selected list of virtual hits.

## 2011

• Virtual decoy sets for molecular docking benchmarks.
Wallach, Izhar and Lilien, Ryan
Journal of chemical information and modeling, 2011, 51(2), 196-202
PMID: 21207928     doi: 10.1021/ci100374f

Virtual docking algorithms are often evaluated on their ability to separate active ligands from decoy molecules. The current state-of-the-art benchmark, the Directory of Useful Decoys (DUD), minimizes bias by including decoys from a library of synthetically feasible molecules that are physically similar yet chemically dissimilar to the active ligands. We show that by ignoring synthetic feasibility, we can compile a benchmark that is comparable to the DUD and less biased with respect to physical similarity.

• FRED pose prediction and virtual screening accuracy.
McGann, Mark
Journal of chemical information and modeling, 2011, 51(3), 578-596
PMID: 21323318     doi: 10.1021/ci100436p

Results of a previous docking study are reanalyzed and extended to include results from the docking program FRED and a detailed statistical analysis of both structure reproduction and virtual screening results. FRED is run both in a traditional docking mode and in a hybrid mode that makes use of the structure of a bound ligand in addition to the protein structure to screen molecules. This analysis shows that most docking programs are effective overall but highly inconsistent, tending to do well on one system and poorly on the next. Comparing methods, the difference in mean performance on DUD is found to be statistically significant (95% confidence) 61% of the time when using a global enrichment metric (AUC). Early enrichment metrics are found to have relatively poor statistical power, with 0.5% early enrichment only able to distinguish methods to 95% confidence 14% of the time.

• Task-parallel message passing interface implementation of Autodock4 for docking of very large databases of compounds using high-performance super-computers.
Collignon, Barbara and Schulz, Roland and Smith, Jeremy C and Baudry, Jerome
Journal of computational chemistry, 2011, 32(6), 1202-1209
PMID: 21387347     doi: 10.1002/jcc.21696

A message passing interface (MPI)-based implementation (Autodock4.lga.MPI) of the grid-based docking program Autodock4 has been developed to allow simultaneous and independent docking of multiple compounds on up to thousands of central processing units (CPUs) using the Lamarckian genetic algorithm. The MPI version reads a single binary file containing precalculated grids that represent the protein-ligand interactions, i.e., van der Waals, electrostatic, and desolvation potentials, and needs only two input parameter files for the entire docking run. In comparison, the serial version of Autodock4 reads ASCII grid files and requires one parameter file per compound. The modifications performed result in significantly reduced input/output activity compared with the serial version. Autodock4.lga.MPI scales up to 8192 CPUs with a maximal overhead of 16.3%, of which two thirds is due to input/output operations and one third originates from MPI operations. The optimal docking strategy, which minimizes docking CPU time without lowering the quality of the database enrichments, comprises the docking of ligands preordered from the most to the least flexible and the assignment of the number of energy evaluations as a function of the number of rotatable bonds. In 24 h, on 8192 high-performance computing CPUs, the present MPI version would allow docking to a rigid protein of about 300K small flexible compounds or 11 million rigid compounds.

• Substantial improvements in large-scale redocking and screening using the novel HYDE scoring function.
Schneider, Nadine and Hindle, Sally and Lange, Gudrun and Klein, Robert and Albrecht, Jürgen and Briem, Hans and Beyer, Kristin and Claußen, Holger and Gastreich, Marcus and Lemmen, Christian and Rarey, Matthias
Journal of computer-aided molecular design, 2011, 26(6), 701-723
PMID: 22203423     doi: 10.1007/s10822-011-9531-0

The HYDE scoring function consistently describes hydrogen bonding, the hydrophobic effect and desolvation. It relies on HYdration and DEsolvation terms which are calibrated using octanol/water partition coefficients of small molecules. We do not use affinity data for calibration, therefore HYDE is generally applicable to all protein targets. HYDE reflects the Gibbs free energy of binding while only considering the essential interactions of protein-ligand complexes. The greatest benefit of HYDE is that it yields a very intuitive atom-based score, which can be mapped onto the ligand and protein atoms. This allows the direct visualization of the score and consequently facilitates analysis of protein-ligand complexes during the lead optimization process. In this study, we validated our new scoring function by applying it in large-scale docking experiments. We could successfully predict the correct binding mode in 93% of complexes in redocking calculations on the Astex diverse set, while our performance in virtual screening experiments using the DUD dataset showed significant enrichment values with a mean AUC of 0.77 across all protein targets with little or no structural defects. As part of these studies, we also carried out a very detailed analysis of the data that revealed interesting pitfalls, which we highlight here and which should be addressed in future benchmark datasets.

• Ligand and Decoy Sets for Docking to G Protein-Coupled Receptors.
Gatica, Edgar A and Cavasotto, Claudio N
Journal of chemical information and modeling, 2011, 52(1), 1-6
PMID: 22168315     doi: 10.1021/ci200412p

We compiled a G protein-coupled receptor (GPCR) ligand library (GLL) for 147 targets, selecting for each ligand 39 decoy molecules, collected in the GPCR Decoy Database (GDD). Decoys were chosen ensuring a ligand-decoy similarity of six physical properties, while enforcing ligand-decoy chemical dissimilarity. The performance in docking of the GDD was evaluated on 19 GPCRs, showing a marked decrease in enrichment compared to bias-uncorrected decoy sets. Both the GLL and GDD are freely available for the scientific community.

• DEKOIS: Demanding Evaluation Kits for Objective in Silico Screening - A Versatile Tool for Benchmarking Docking Programs and Scoring Functions.
Vogel, Simon M and Bauer, Matthias R and Boeckler, Frank M
Journal of chemical information and modeling, 2011, 51(10), 2650-2665
PMID: 21774552     doi: 10.1021/ci2001549

For widely applied in silico screening techniques success depends on the rational selection of an appropriate method. We herein present a fast, versatile, and robust method to construct demanding evaluation kits for objective in silico screening (DEKOIS). This automated process enables creating tailor-made decoy sets for any given sets of bioactives. It facilitates a target-dependent validation of docking algorithms and scoring functions helping to save time and resources. We have developed metrics for assessing and improving decoy set quality and employ them to investigate how decoy embedding affects docking. We demonstrate that screening performance is target-dependent and can be impaired by latent actives in the decoy set (LADS) or enhanced by poor decoy embedding. The presented method allows extending and complementing the collection of publicly available high quality decoy sets toward new target space. All present and future DEKOIS data sets will be made accessible at www.dekois.com .

• PLS-DA - Docking Optimized Combined Energetic Terms (PLSDA-DOCET) protocol: a brief evaluation.
Avram, Sorin and Pacureanu, Liliana Mioara and Seclaman, Edward and Bora, Alina and Kurunczi, Ludovic G
Journal of chemical information and modeling, 2011, 51(12), 3169-3179
PMID: 22066983     doi: 10.1021/ci2002268

Docking studies have become popular approaches in drug design, where the binding energy of the ligand in the active site of the protein is estimated by a scoring function. Many promising techniques were developed to enhance the performance of scoring functions including the fusion of multiple scoring functions outcomes into a so-called consensus scoring function. Hereby, we evaluated the target oriented consensus technique using the energetic terms of several scoring functions. The approach was denoted PLSDA-DOCET. Optimization strategies for consensus energetic terms and scoring functions based on ROC metric were compared to classical rigid docking and to ligand-based similarity search methods comprising 2D fingerprints and ROCS. The ROCS results indicate large performance variations depending on the biological target. The AUC-based strategy of PLSDA-DOCET outperformed the other docking approaches regarding simple retrieval and scaffold-hopping. The superior performance of PLSDA-DOCET protocol relative to single and combined scoring functions was validated on an external test set. We found a relative low mean correlation of the ranks of the chemotypes retrieved by the PLSDA-DOCET protocol and all the other methods employed here.

• ReverseScreen3D: a structure-based ligand matching method to identify protein targets.
Kinnings, Sarah L and Jackson, Richard M
Journal of chemical information and modeling, 2011, 51(3), 624-634
PMID: 21361385     doi: 10.1021/ci1003174

Ligand promiscuity, which is now recognized as an extremely common phenomenon, is a major underlying cause of drug toxicity. We have developed a new reverse virtual screening (VS) method called ReverseScreen3D, which can be used to predict the potential protein targets of a query compound of interest. The method uses a 2D fingerprint-based method to select a ligand template from each unique binding site of each protein within a target database. The target database contains only the structurally determined bioactive conformations of known ligands. The 2D comparison is followed by a 3D structural comparison to the selected query ligand using a geometric matching method, in order to prioritize each target binding site in the database. We have evaluated the performance of the ReverseScreen2D and 3D methods using a diverse set of small molecule protein inhibitors known to have multiple targets, and have shown that they are able to provide a highly significant enrichment of true targets in the database. Furthermore, we have shown that the 3D structural comparison improves early enrichment when compared with the 2D method alone, and that the 3D method performs well even in the absence of 2D similarity to the template ligands. By carrying out further experimental screening on the prioritized list of targets, it may be possible to determine the potential targets of a new compound or determine the off-targets of an existing drug. The ReverseScreen3D method has been incorporated into a Web server, which is freely available at http://www.modelling.leeds.ac.uk/ReverseScreen3D .

• BEAR, a novel virtual screening methodology for drug discovery.
Degliesposti, Gianluca and Portioli, Corinne and Parenti, Marco Daniele and Rastelli, Giulio
Journal of biomolecular screening, 2011, 16(1), 129-133
PMID: 21084717     doi: 10.1177/1087057110388276

BEAR (binding estimation after refinement) is a new virtual screening technology based on the conformational refinement of docking poses through molecular dynamics and prediction of binding free energies using accurate scoring functions. Here, the authors report the results of an extensive benchmark of the BEAR performance in identifying a smaller subset of known inhibitors seeded in a large (1.5 million) database of compounds. BEAR performance proved strikingly better if compared with standard docking screening methods. The validations performed so far showed that BEAR is a reliable tool for drug discovery. It is fast, modular, and automated, and it can be applied to virtual screenings against any biological target with known structure and any database of compounds.

• Using consensus-shape clustering to identify promiscuous ligands and protein targets and to choose the right query for shape-based virtual screening.
Pérez-Nueno, Violeta I and Ritchie, David W
Journal of chemical information and modeling, 2011, 51(6), 1233-1248
PMID: 21604699     doi: 10.1021/ci100492r

Ligand-based shape matching approaches have become established as important and popular virtual screening (VS) techniques. However, despite their relative success, many authors have discussed how best to choose the initial query compounds and which of their conformations should be used. Furthermore, it is increasingly the case that pharmaceutical companies have multiple ligands for a given target and these may bind in different ways to the same pocket. Conversely, a given ligand can sometimes bind to multiple targets, and this is clearly of great importance when considering drug side-effects. We recently introduced the notion of spherical harmonic-based "consensus shapes" to help deal with these questions. Here, we apply a consensus shape clustering approach to the 40 protein-ligand targets in the DUD data set using PARASURF/PARAFIT. Results from clustering show that in some cases the ligands for a given target are split into two subgroups which could suggest they bind to different subsites of the same target. In other cases, our clustering approach sometimes groups together ligands from different targets, and this suggests that those ligands could bind to the same targets. Hence spherical harmonic-based clustering can rapidly give cross-docking information while avoiding the expense of performing all-against-all docking calculations. We also report on the effect of the query conformation on the performance of shape-based screening of the DUD data set and the potential gain in screening performance by using consensus shapes calculated in different ways. We provide details of our analysis of shape-based screening using both PARASURF/PARAFIT and ROCS, and we compare the results obtained with shape-based and conventional docking approaches using MSSH/SHEF and GOLD. The utility of each type of query is analyzed using commonly reported statistics such as enrichment factors (EF) and receiver-operator-characteristic (ROC) plots as well as other early performance metrics.
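
The enrichment factor (EF) and ROC statistics named in this abstract recur throughout the entries below. As a reading aid, here is a minimal sketch of how both are commonly computed from a ranked screening list. It is not taken from the paper: the function names and the toy list are assumptions, and ties between scores are ignored.

```python
def enrichment_factor(labels_ranked, fraction):
    """EF at the top `fraction` of a ranked list (best-scored compound first).

    labels_ranked holds 1 for a known active and 0 for a decoy.
    """
    n = len(labels_ranked)
    n_top = max(1, int(round(n * fraction)))
    hits_all = sum(labels_ranked)
    if hits_all == 0:
        raise ValueError("the list contains no actives")
    hits_top = sum(labels_ranked[:n_top])
    # Hit rate in the selected top slice divided by the hit rate of the whole list.
    return (hits_top / n_top) / (hits_all / n)


def roc_auc(labels_ranked):
    """ROC AUC as the fraction of (active, decoy) pairs ranked in the right order."""
    n_act = sum(labels_ranked)
    n_dec = len(labels_ranked) - n_act
    if n_act == 0 or n_dec == 0:
        raise ValueError("need at least one active and one decoy")
    decoys_seen = 0
    ordered_pairs = 0
    for label in labels_ranked:
        if label:
            # Every decoy not yet seen is ranked below this active.
            ordered_pairs += n_dec - decoys_seen
        else:
            decoys_seen += 1
    return ordered_pairs / (n_act * n_dec)


# Toy ranked list (hypothetical): 3 actives among 10 compounds.
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(ranked, 0.2)
auc = roc_auc(ranked)
```

An EF of 1 and an AUC of 0.5 both correspond to random selection, which is the baseline the abstracts in this list compare against.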

• iSMART: an integrated cloud computing web server for traditional Chinese medicine for online virtual screening, de novo evolution and drug design.
Chang, Kai-Wei and Tsai, Tsung-Ying and Chen, Kuan-Chung and Yang, Shun-Chieh and Huang, Hung-Jin and Chang, Tung-Ti and Sun, Mao-Feng and Chen, Hsin-Yi and Tsai, Fuu-Jen and Chen, Calvin Yu-Chian
Journal of biomolecular structure & dynamics, 2011, 29(1), 243-250
PMID: 21696236

• Consensus virtual screening approaches to predict protein ligands.
Kukol, Andreas
European journal of medicinal chemistry, 2011, 46(9), 4661-4664
PMID: 21640444     doi: 10.1016/j.ejmech.2011.05.026

In order to exploit the advantages of receptor-based virtual screening, namely time/cost saving and specificity, it is important to rely on algorithms that predict a high number of active ligands at the top ranks of a small molecule database. Towards that goal consensus methods combining the results of several docking algorithms were developed and compared against the individual algorithms. Furthermore, a recently proposed rescoring method based on drug efficiency indices was evaluated. Among AutoDock Vina 1.0, AutoDock 4.2 and GemDock, AutoDock Vina was the best performing single method in predicting high affinity ligands from a database of known ligands and decoys. The rescoring of predicted binding energies with the water/octanol partition coefficient did not lead to an improvement averaged over ten receptor targets. Various consensus algorithms were investigated and a simple combination of AutoDock and AutoDock Vina results gave the most consistent performance that showed early enrichment of known ligands for all receptor targets investigated. In case a number of ligands is known for a specific target, every method proposed in this study should be evaluated.

• Exhaustive search and solvated interaction energy (SIE) for virtual screening and affinity prediction.
Sulea, Traian and Hogues, Hervé and Purisima, Enrico O
Journal of computer-aided molecular design, 2011, 26(5), 617-633
PMID: 22198519     doi: 10.1007/s10822-011-9529-7

We carried out a prospective evaluation of the utility of the SIE (solvated interaction energy) scoring function for virtual screening and binding affinity prediction. Since experimental structures of the complexes were not provided, this was an exercise in virtual docking as well. We used our exhaustive docking program, Wilma, to provide high-quality poses that were rescored using SIE to provide binding affinity predictions. We also tested the combination of SIE with our latest solvation model, first shell of hydration (FiSH), which captures some of the discrete properties of water within a continuum model. We achieved good enrichment in virtual screening of fragments against trypsin, with an area under the curve of about 0.7 for the receiver operating characteristic curve. Moreover, the early enrichment performance was quite good with 50% of true actives recovered with a 15% false positive rate in a prospective calculation and with a 3% false positive rate in a retrospective application of SIE with FiSH. Binding affinity predictions for both trypsin and host-guest complexes were generally within 2 kcal/mol of the experimental values. However, the rank ordering of affinities differing by 2 kcal/mol or less was not well predicted. On the other hand, it was encouraging that the incorporation of a more sophisticated solvation model into SIE resulted in better discrimination of true binders from non-binders. This suggests that the inclusion of proper physics in our models is a fruitful strategy for improving the reliability of our binding affinity predictions.

## 2010

• Homology modeling and metabolism prediction of human carboxylesterase-2 using docking analyses by GriDock: a parallelized tool based on AutoDock 4.0.
Vistoli, Giulio and Pedretti, Alessandro and Mazzolari, Angelica and Testa, Bernard
Journal of computer-aided molecular design, 2010, 24(9), 771-787
PMID: 20623318     doi: 10.1007/s10822-010-9373-1

Metabolic problems lead to numerous failures during clinical trials, and much effort is now devoted to developing in silico models predicting metabolic stability and metabolites. Such models are well known for cytochromes P450 and some transferases, whereas less has been done to predict the activity of human hydrolases. The present study was undertaken to develop a computational approach able to predict the hydrolysis of novel esters by human carboxylesterase hCES2. The study involved first a homology modeling of the hCES2 protein based on the model of hCES1 since the two proteins share a high degree of homology (≅ 73%). A set of 40 known substrates of hCES2 was taken from the literature; the ligands were docked in both their neutral and ionized forms using GriDock, a parallel tool based on the AutoDock4.0 engine which can perform efficient and easy virtual screening analyses of large molecular databases exploiting multi-core architectures. Useful statistical models (e.g., r (2)

• Rapid flexible docking using a stochastic rotamer library of ligands.
Ding, Feng and Yin, Shuangye and Dokholyan, Nikolay V
Journal of chemical information and modeling, 2010, 50(9), 1623-1632
PMID: 20712341     doi: 10.1021/ci100218t

Existing flexible docking approaches model the ligand and receptor flexibility either separately or in a loosely coupled manner, which captures the conformational changes inefficiently. Here, we propose a flexible docking approach, MedusaDock, which models both ligand and receptor flexibility simultaneously with sets of discrete rotamers. We developed an algorithm to build the ligand rotamer library "on-the-fly" during docking simulations. MedusaDock benchmarks demonstrate a rapid sampling efficiency and high prediction accuracy in both self- (to the cocrystallized state) and cross-docking (to a state cocrystallized with a different ligand), the latter of which mimics the virtual screening procedure in computational drug discovery. We also perform a virtual screening test of four flexible kinase targets, including cyclin-dependent kinase 2, vascular endothelial growth factor receptor 2, HIV reverse transcriptase, and HIV protease. We find significant improvements of virtual screening enrichments when compared to rigid-receptor methods. The predictive power of MedusaDock in cross-docking and preliminary virtual-screening benchmarks highlights the importance to model both ligand and receptor flexibility simultaneously in computational docking.

• Chemical space sampling by different scoring functions and crystal structures.
Brooijmans, Natasja and Humblet, Christine
Journal of computer-aided molecular design, 2010, 24(5), 433-447
PMID: 20401681     doi: 10.1007/s10822-010-9356-2

Virtual screening has become a popular tool to identify novel leads in the early phases of drug discovery. A variety of docking and scoring methods used in virtual screening have been the subject of active research in an effort to gauge limitations and articulate best practices. However, how to best utilize different scoring functions and various crystal structures, when available, is not yet well understood. In this work we use multiple crystal structures of PI3K-gamma in both prospective and retrospective virtual screening experiments. Both Glide SP scoring and Prime MM-GBSA rescoring are utilized in the prospective and retrospective virtual screens, and consensus scoring is investigated in the retrospective virtual screening experiments. The results show that each of the different crystal structures that was used samples a different chemical space, i.e. different chemotypes are prioritized by each structure. In addition, the different (re)scoring functions prioritize different chemotypes as well. Somewhat surprisingly, the Prime MM-GBSA scoring function generally gives lower enrichments than Glide SP. Finally we investigate the impact of different ligand preparation protocols on virtual screening enrichment factors. In summary, different crystal structures and different scoring functions are complementary to each other and allow for a wider variety of chemotypes to be considered for experimental follow-up.

• MOLA: a bootable, self-configuring system for virtual screening using AutoDock4/Vina on computer clusters.
Abreu, Rui Mv and Froufe, Hugo Jc and Queiroz, Maria João Rp and Ferreira, Isabel Cfr
Journal of cheminformatics, 2010, 2(1), 10
PMID: 21029419     doi: 10.1186/1758-2946-2-10

BACKGROUND: Virtual screening of small molecules using molecular docking has become an important tool in drug discovery. However, large scale virtual screening is time demanding and usually requires dedicated computer clusters. There are a number of software tools that perform virtual screening using AutoDock4 but they require access to dedicated Linux computer clusters. Also no software is available for performing virtual screening with Vina using computer clusters. In this paper we present MOLA, an easy-to-use graphical user interface tool that automates parallel virtual screening using AutoDock4 and/or Vina in bootable non-dedicated computer clusters.

• Improving performance of docking-based virtual screening by structural filtration.
Novikov, Fedor N and Stroylov, Viktor S and Stroganov, Oleg V and Chilov, Ghermes G
Journal of Molecular Modeling, 2010, 16(7), 1223-1230
PMID: 20041273     doi: 10.1007/s00894-009-0633-8

In the current study an innovative method of structural filtration of docked ligand poses is introduced and applied to improve the virtual screening results. The structural filter is defined by a protein-specific set of interactions that a) are structurally conserved in available structures of a particular protein with its bound ligands, and b) can be viewed as playing the crucial role in protein-ligand binding. The concept was evaluated on a set of 10 diverse proteins, for which the corresponding structural filters were developed and applied to the results of virtual screening obtained with the Lead Finder software. The application of structural filtration resulted in a considerable improvement of the enrichment factor, ranging from several-fold to hundreds-fold depending on the protein target. It appeared that the structural filtration had effectively repaired the deficiencies of the scoring functions that used to overestimate decoy binding, resulting in a considerably lower false positive rate. In addition, the structural filters were also effective in dealing with some deficiencies of the protein structure models that would otherwise lead to false negative predictions. The ability of structural filtration to recover relatively small but specifically bound molecules is promising for the application of this technology in fragment-based drug discovery.

• Biased retrieval of chemical series in receptor-based virtual screening.
Brooijmans, Natasja and Cross, Jason B and Humblet, Christine
Journal of computer-aided molecular design, 2010, 24(12), 1053-1062
PMID: 21053053     doi: 10.1007/s10822-010-9394-9

Using the kinases in the DUD dataset and an in-house HTS dataset from PI3K-γ, receptor-based virtual screening experiments were performed using Glide SP docking. While significant enrichments were observed for eight of the nine targets in the set, more detailed analyses highlighted that much of the early enrichment (10-80%) is the result of retrieval of a single cluster of active compounds. This biased retrieval was not necessarily due to early enrichment of the cluster containing the co-crystallized ligand. Virtual screening validation studies could thus benefit from including cluster-based analyses to assess enrichment of diverse chemotypes.

• Comparison of three preprocessing filters efficiency in virtual screening: identification of new putative LXRbeta regulators as a test case.
Ghemtio, Léo and Devignes, Marie-Dominique and Smaïl-Tabbone, Malika and Souchet, Michel and Leroux, Vincent and Maigret, Bernard
Journal of chemical information and modeling, 2010, 50(5), 701-715
PMID: 20420434     doi: 10.1021/ci900356m

In silico screening methodologies are widely recognized as efficient approaches in early steps of drug discovery. However, in the virtual high-throughput screening (VHTS) context, where hit compounds are searched among millions of candidates, three-dimensional comparison techniques and knowledge discovery from databases should offer a better efficiency to finding novel drug leads than those of computationally expensive molecular dockings. Therefore, the present study aims at developing a filtering methodology to efficiently eliminate unsuitable compounds in VHTS process. Several filters are evaluated in this paper. The first two are structure-based and rely on either geometrical docking or pharmacophore depiction. The third filter is ligand-based and uses knowledge-based and fingerprint similarity techniques. These filtering methods were tested with the Liver X Receptor (LXR) as a target of therapeutic interest, as LXR is a key regulator in maintaining cholesterol homeostasis. The results show that the three considered filters are complementary so that their combination should generate consistent compound lists of potential hits.

• VSDocker: a tool for parallel high-throughput virtual screening using AutoDock on Windows-based computer clusters
Prakhov, Nikita D. and Chernorudskiy, Alexander L. and Gainullin, Murat R.
Bioinformatics (Oxford, England), 2010, 26(10), 1374-1375
PMID: 20378556     doi: 10.1093/bioinformatics/btq149

VSDocker is an original program that allows using AutoDock4 for optimized virtual ligand screening on computer clusters or multiprocessor workstations. This tool is the first implementation of parallel high-performance virtual screening of ligands for MS Windows-based computer systems.

• Efficient virtual screening using multiple protein conformations described as negative images of the ligand-binding site.
Virtanen, Salla I and Pentikäinen, Olli T
Journal of chemical information and modeling, 2010, 50(6), 1005-1011
PMID: 20504004     doi: 10.1021/ci100121c

The protein structure-based virtual screening is typically accomplished using a molecular docking procedure. However, docking is a fairly slow process that is limited by the available scoring functions that cannot reliably distinguish between active and inactive ligands. In contrast, the ligand-based screening methods that are based on shape similarity identify the active ligands with high accuracy. Here, we show that the usage of negative images of the ligand-binding site, together with shape comparison tools, which are typically used in ligand-based virtual screening, improve the discrimination of active molecules from inactives. In contrast to ligand-based shape comparison, the negative image of the binding site allows identification of compounds whose shape complements the shape of the ligand-binding cavity as closely as possible. Furthermore, the use of several target protein conformations allows the identification of active ligands whose shape is not optimal for crystallized protein conformation. Accordingly, the presented virtual screening method improves the identification of novel lead molecules by concentrating on the optimally shaped molecules for the flexible ligand binding site.

• Improved docking, screening and selectivity prediction for small molecule nuclear receptor modulators using conformational ensembles.
Park, So-Jung and Kufareva, Irina and Abagyan, Ruben
Journal of computer-aided molecular design, 2010, 24(5), 459-471
PMID: 20455005     doi: 10.1007/s10822-010-9362-4

Nuclear receptors (NRs) are ligand dependent transcriptional factors and play a key role in reproduction, development, and homeostasis of organism. NRs are potential targets for treatment of cancer and other diseases such as inflammatory diseases, and diabetes. In this study, we present a comprehensive library of pocket conformational ensembles of thirteen human nuclear receptors (NRs), and test the ability of these ensembles to recognize their ligands in virtual screening, as well as predict their binding geometry, functional type, and relative binding affinity. 157 known NR modulators and 66 structures were used as a benchmark. Our pocket ensemble library correctly predicted the ligand binding poses in 94% of the cases. The models were also highly selective for the active ligands in virtual screening, with the areas under the ROC curves ranging from 82 to a remarkable 99%. Using the computationally determined receptor-specific binding energy offsets, we showed that the ensembles can be used for predicting selectivity profiles of NR ligands. Our results evaluate and demonstrate the advantages of using receptor ensembles for compound docking, screening, and profiling.

## 2009

• Predicting multiple ligand binding modes using self-consistent pharmacophore hypotheses.
Wallach, Izhar and Lilien, Ryan
Journal of chemical information and modeling, 2009, 49(9), 2116-2128
PMID: 19711952     doi: 10.1021/ci900199e

The ability to predict ligand binding modes without the aid of wet-lab experiments may accelerate and reduce the cost of drug discovery research. Despite significant recent progress, virtual screening has not yet eliminated the need for wet-lab experiments. For example, after a lead compound has been identified, the precise binding mode is still typically determined by experimental structural biology. This structural knowledge is then employed to guide lead optimization. We present a step toward improving protein-ligand binding mode prediction for a set of ligands known to interact with a common protein. There is thus an important distinction between this work and traditional virtual screening algorithms. Whereas traditional approaches attempt to identify binding ligands from a large database of available compounds, our approach aims to more accurately predict the binding mode for a set of ligands which are already known to bind the target protein. The approach is based on the hypothesis that each active site contains a set of interaction points which binding ligands tend to exploit. In a more traditional context, these interaction points make up a pharmacophoric map. Our algorithm first performs traditional protein-ligand docking for each known binder. The ranked lists of candidate binding modes are then evaluated to identify a set of poses maximally self-consistent with respect to a pharmacophoric map generated from the same poses. We have extensively demonstrated the application of the algorithm to four protein systems (thrombin, cyclin-dependent kinase 2, dihydrofolate reductase, and HIV-1 protease) and attained predictions with an average RMSD < 2.5 Å for all tested systems. This represents a typical improvement of 0.5-1.0 Å (up to 25%) RMSD over the naive virtual docking predictions. Our algorithm is independent of the docking method and may significantly improve binding mode prediction of virtual docking experiments.

• Improving virtual screening performance against conformational variations of receptors by shape matching with ligand binding pocket.
Lee, Hui Sun and Lee, Cheol Soon and Kim, Jeong Sook and Kim, Dong Hou and Choe, Han
Journal of chemical information and modeling, 2009, 49(11), 2419-2428
PMID: 19852439     doi: 10.1021/ci9002365

In this report, we present a novel virtual high-throughput screening methodology to assist in computer-aided drug discovery. Our method, designated as SLIM, involves ligand-free shape and chemical feature matching. The procedure takes advantage of a negative image of a binding pocket in a target receptor. The negative image is a set of virtual atoms representing the inner shape and chemical features of the binding pocket. Using this image, SLIM implements a shape-based similarity search based on molecular volume superposition for the ensemble of conformers of each molecule. The superposed structures, prioritized by shape similarity, are subjected to comparison of chemical feature similarities. To validate the merits of the SLIM method, we compared its performance with those of three distinct widely used tools ROCS, GLIDE, and GOLD. ROCS was selected as a representative of the ligand-centric methods, and docking programs GLIDE and GOLD as representatives of the receptor-centric methods. Our data suggest that SLIM has overall hit ranking ability that is comparable to that of the docking method, retaining the high computational speed of the ligand-centric method. It is notable that the SLIM method offers consistently reliable screening quality against conformational variations of receptors, whereas the docking methods have limited screening performance.

• Scoring ligand similarity in structure-based virtual screening.
Zavodszky, Maria I and Rohatgi, Anjali and Van Voorst, Jeffrey R and Yan, Honggao and Kuhn, Leslie A
Journal of molecular recognition : JMR, 2009, 22(4), 280-292
PMID: 19235177     doi: 10.1002/jmr.942

Scoring to identify high-affinity compounds remains a challenge in virtual screening. On one hand, protein-ligand scoring focuses on weighting favorable and unfavorable interactions between the two molecules. Ligand-based scoring, on the other hand, focuses on how well the shape and chemistry of each ligand candidate overlay on a three-dimensional reference ligand. Our hypothesis is that a hybrid approach, using ligand-based scoring to rank dockings selected by protein-ligand scoring, can ensure that high-ranking molecules mimic the shape and chemistry of a known ligand while also complementing the binding site. Results from applying this approach to screen nearly 70 000 National Cancer Institute (NCI) compounds for thrombin inhibitors tend to support the hypothesis. EON ligand-based ranking of docked molecules yielded the majority (4/5) of newly discovered, low to mid-micromolar inhibitors from a panel of 27 assayed compounds, whereas ranking docked compounds by protein-ligand scoring alone resulted in one new inhibitor. Since the results depend on the choice of scoring function, an analysis of properties was performed on the top-scoring docked compounds according to five different protein-ligand scoring functions, plus EON scoring using three different reference compounds. The results indicate that the choice of scoring function, even among scoring functions measuring the same types of interactions, can have an unexpectedly large effect on which compounds are chosen from screening. Furthermore, there was almost no overlap between the top-scoring compounds from protein-ligand versus ligand-based scoring, indicating the two approaches provide complementary information. Matchprint analysis, a new addition to the SLIDE (Screening Ligands by Induced-fit Docking, Efficiently) screening toolset, facilitated comparison of docked molecules' interactions with those of known inhibitors. The majority of interactions conserved among top-scoring compounds for a given scoring function, and from the different scoring functions, proved to be conserved interactions in known inhibitors. This was particularly true in the S1 pocket, which was occupied by all the docked compounds.

• APIF: a new interaction fingerprint based on atom pairs and its application to virtual screening.
Pérez-Nueno, Violeta I and Rabal, Obdulia and Borrell, José I and Teixidó, Jordi
Journal of chemical information and modeling, 2009, 49(5), 1245-1260
PMID: 19364101     doi: 10.1021/ci900043r

A new interaction fingerprint (IF) called APIF (atom-pairs-based interaction fingerprint) has been developed for postprocessing protein-ligand docking results. Unlike other existing fingerprints which employ absolute locations of individual interactions, APIF considers the relative positions of pairs of interacting atoms. Docking-based virtual screening was performed with GOLD using the crystal structures of trypsin, rhinovirus, HIV protease, carboxypeptidase, and estrogen receptor-alpha as targets. A score derived from the similarity of the bit strings for each docking solution to that of a known reference binding mode was obtained. Comparisons between APIF, GoldScore function, and standard interaction fingerprint (CHIF) scores were performed using enrichment plots. Superior recovery rates were observed in the IF score cases. Comparable results were achieved by using either of the two interaction fingerprints, substantially improving GoldScore function enrichment factors. Binding mode analyses were also carried out in order to study the best method for selecting conformations with a binding mode similar to that of the reference crystallized complex. These showed that the first conformations retrieved by interaction fingerprint scores had a more similar binding mode to the reference complex than those retrieved by the GoldScore function.

• Beyond the virtual screening paradigm: structure-based searching for new lead compounds.
Schlosser, Jochen and Rarey, Matthias
Journal of chemical information and modeling, 2009, 49(4), 800-809
PMID: 19354328     doi: 10.1021/ci9000212

The standard approach to structure-based high-throughput virtual screening is a sequential procedure: Each molecule of a given library is screened against the target protein, eventually generating a ranked list of molecules. In this paper a new paradigm avoiding the sequential screening pipeline is presented. Based on a novel descriptor, compounds can be directly accessed by their chemical and shape complementarity to a given protein active site. The docking calculation is performed inherently during the search process since each search result automatically implies a ligand pose in the active site. The new method named TrixX BMI is ideally suited for application scenarios in which medicinal chemists request a certain pharmacophore interaction pattern to a protein. By using an innovative indexing technology, sublinear runtimes in the number of ligands can be achieved. Redocking experiments show that TrixX BMI correctly predicts the pose of the bioactive conformation within an rmsd of less than 2.0 Å of the cocrystallized ligand in 80% of 85 protein-ligand complexes of the Astex Diverse Set. In addition to that several comparative enrichment experiments show that TrixX BMI is on a competitive basis to established virtual screening technology, while the observed runtimes are clearly below one second per compound.

• 3-D clustering: a tool for high throughput docking
Priestle, John P.
Journal of Molecular Modeling, 2009, 15(5), 551-560
PMID: 19085027     doi: 10.1007/s00894-008-0360-6

This report describes a computer program for clustering docking poses based on their 3-dimensional (3D) coordinates as well as on their chemical structures. This is chiefly intended for reducing a set of hits coming from high throughput docking, since the capacity to prepare and biologically test such molecules is generally far more limited than the capacity to generate such hits. The advantage of clustering molecules based on 3D, rather than 2D, criteria is that small variations on a scaffold may bring about different binding modes for molecules that would not be predicted by 2D similarity alone. The program does a pose-by-pose/atom-by-atom comparison of a set of docking hits (poses), scoring both spatial and chemical similarity. Using these pair-wise similarities, the whole set is clustered based on a user-supplied similarity threshold. An output coordinate file is created that mirrors the input coordinate file, but contains two new properties: a cluster number and similarity to the cluster center. Poses in this output file can easily be sorted by cluster and displayed together for visual inspection with any standard molecular viewing program, and decisions made about which molecule should be selected for biological testing as the best representative of this group of similar molecules with similar binding modes.
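
The abstract describes threshold-based clustering of poses from pairwise similarities, with a cluster number and a similarity to the cluster center written out for each pose. The sketch below illustrates that general idea with a simple leader-style clustering. It is not the program described in the paper: the RMSD-derived `toy_similarity`, the function names, and the assumption that all poses list the same atoms in the same order are hypothetical simplifications, and chemical similarity is left out entirely.

```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)."""
    total = sum((x - y) ** 2
                for atom_a, atom_b in zip(pose_a, pose_b)
                for x, y in zip(atom_a, atom_b))
    return math.sqrt(total / len(pose_a))


def toy_similarity(pose_a, pose_b):
    """Hypothetical spatial similarity in (0, 1]; 1.0 means identical coordinates."""
    return 1.0 / (1.0 + rmsd(pose_a, pose_b))


def leader_cluster(poses, similarity, threshold):
    """Assign each pose to the most similar existing center, or start a new cluster.

    Returns one (cluster_number, similarity_to_center) pair per input pose.
    """
    centers = []      # indices of the poses that act as cluster centers
    assignments = []
    for i, pose in enumerate(poses):
        best_cluster, best_sim = None, -1.0
        for cluster, center_idx in enumerate(centers):
            sim = similarity(pose, poses[center_idx])
            if sim > best_sim:
                best_cluster, best_sim = cluster, sim
        if best_cluster is None or best_sim < threshold:
            centers.append(i)   # this pose becomes the center of a new cluster
            assignments.append((len(centers) - 1, similarity(pose, pose)))
        else:
            assignments.append((best_cluster, best_sim))
    return assignments


# Three two-atom toy poses: the first two nearly overlap, the third is far away.
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],
]
clusters = leader_cluster(poses, toy_similarity, threshold=0.8)
```

Leader clustering depends on input order, so sorting poses by docking score first is a common way to let the best-scored pose of each group become its center.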

## 2008

• Is it possible to increase hit rates in structure-based virtual screening by pharmacophore filtering? An investigation of the advantages and pitfalls of post-filtering.
Muthas, Daniel and Sabnis, Yogesh A and Lundborg, Magnus and Karlén, Anders
Journal of molecular graphics & modelling, 2008, 26(8), 1237-1251
PMID: 18203638     doi: 10.1016/j.jmgm.2007.11.005

We have investigated the influence of post-filtering virtual screening results, with pharmacophoric features generated from an X-ray structure, on enrichment rates. This was performed using three docking softwares, zdock+, Surflex and FRED, as virtual screening tools and pharmacophores generated in UNITY from co-crystallized complexes. Sets of known actives along with 9997 pharmaceutically relevant decoy compounds were docked against six chemically diverse protein targets namely CDK2, COX2, ERalpha, fXa, MMP3, and NA. To try to overcome the inherent limitations of the well-known docking problem, we generated multiple poses for each compound. The compounds were first ranked according to their scores alone and enrichment rates were calculated using only the top scoring pose of each compound. Subsequently, all poses for each compound were passed through the different pharmacophores generated from co-crystallized complexes and the enrichment factors were re-calculated based on the top-scoring passing pose of each compound. Post-filtering with a pharmacophore generated from only one X-ray complex was shown to increase enrichment rates in all investigated targets compared to docking alone. This indicates that this is a general method, which works for diverse targets and different docking softwares.

• Consensus scoring with feature selection for structure-based virtual screening
Teramoto, Reiji and Fukunishi, Hiroaki
Journal of chemical information and modeling, 2008, 48(2), 288-295
doi: 10.1021/ci700239t

The evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, and scoring functions play significant roles in it. While consensus scoring (CS) generally improves enrichment by compensating for the deficiencies of each scoring function, the strategy of how individual scoring functions are selected remains a challenging task when few known active compounds are available. To address this problem, we propose feature selection-based consensus scoring (FSCS), which performs supervised feature selection with docked native ligand conformations to select complementary scoring functions. We evaluated the enrichments of five scoring functions (F-Score, D-Score, PMF, G-Score, and ChemScore), FSCS, and RCS (rank-by-rank consensus scoring) for four different target proteins: acetylcholine esterase (AChE), thrombin (thrombin), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPAR gamma). The results indicated that FSCS was able to select the complementary scoring functions and enhance ligand enrichments and that it outperformed RCS and the individual scoring functions for all target proteins. They also indicated that the performances of the single scoring functions were strongly dependent on the target protein. An especially favorable result with implications for practical drug screening is that FSCS performs well even if only one 3D structure of the protein-ligand complex is known. Moreover, we found that one can infer which scoring functions significantly enrich active compounds by using feature selection before actual docking and that the selected scoring functions are complementary.
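
For orientation, the rank-by-rank baseline (RCS) mentioned above can be sketched as a mean of per-function ranks. This is a minimal sketch, assuming every scoring function scores every compound, lower scores are better, and ties are ignored. The function name `rank_by_rank` is hypothetical, and the feature-selection step of FSCS itself is not reproduced here; one would simply pass in only the selected scoring functions.

```python
from typing import Dict, List, Tuple


def rank_by_rank(scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank-by-rank consensus: order compounds by their mean rank.

    scores[function][compound] is a docking score (lower = better, by assumption).
    Returns (compound, mean rank) pairs, best consensus first.
    """
    compounds = sorted(next(iter(scores.values())))
    mean_rank = {c: 0.0 for c in compounds}
    for table in scores.values():
        # Rank 1 is the best-scoring compound under this scoring function.
        ordered = sorted(compounds, key=lambda c: table[c])
        for rank, c in enumerate(ordered, start=1):
            mean_rank[c] += rank / len(scores)
    return sorted(mean_rank.items(), key=lambda kv: kv[1])
```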

• DOVIS: an implementation for high-throughput virtual screening using AutoDock.
Zhang, Shuxing and Kumar, Kamal and Jiang, Xiaohui and Wallqvist, Anders and Reifman, Jaques
BMC Bioinformatics, 2008, 9, 126
PMID: 18304355     doi: 10.1186/1471-2105-9-126

BACKGROUND: Molecular-docking-based virtual screening is an important tool in drug discovery that is used to significantly reduce the number of possible chemical compounds to be investigated. In addition to the selection of a sound docking strategy with appropriate scoring functions, another technical challenge is to in silico screen millions of compounds in a reasonable time. To meet this challenge, it is necessary to use high performance computing (HPC) platforms and techniques. However, the development of an integrated HPC system that makes efficient use of its elements is not trivial.

• DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0.
Jiang, Xiaohui and Kumar, Kamal and Hu, Xin and Wallqvist, Anders and Reifman, Jaques
Chemistry Central journal, 2008, 2, 18
PMID: 18778471     doi: 10.1186/1752-153X-2-18

BACKGROUND: Small-molecule docking is an important tool in studying receptor-ligand interactions and in identifying potential drug candidates. Previously, we developed a software tool (DOVIS) to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using AutoDock 3.05 as the docking engine. DOVIS enables the seamless screening of millions of compounds on high-performance computing platforms. In this paper, we report significant advances in the software implementation of DOVIS 2.0, including enhanced screening capability, improved file system efficiency, and extended usability.

• Ligand-target interaction-based weighting of substructures for virtual screening.
Crisman, Thomas J and Sisay, Mihiret T. and Bajorath, Jürgen
Journal of chemical information and modeling, 2008, 48(10), 1955-1964
PMID: 18821751     doi: 10.1021/ci800229q

A methodology is introduced to assign energy-based scores to two-dimensional (2D) structural features based on three-dimensional (3D) ligand-target interaction information and utilize interaction-annotated features in virtual screening. Database molecules containing such fragments are assigned cumulative scores that serve as a measure of similarity to active reference compounds. The Interaction Annotated Structural Features (IASF) method is applied to mine five high-throughput screening (HTS) data sets and often identifies more hits than conventional fragment-based similarity searching or ligand-protein docking.

• PDTD: a web-accessible protein database for drug target identification
Gao, Zhenting and Li, Honglin and Zhang, Hailei and Liu, Xiaofeng and Kang, Ling and Luo, Xiaomin and Zhu, Weiliang and Chen, Kaixian and Wang, Xicheng and Jiang, Hualiang
BMC Bioinformatics, 2008, 9, -
PMID: 18282303     doi: 10.1186/1471-2105-9-104

Background: Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking) http://www.dddc.ac.cn/tarfisdock, which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation. Description: PDTD is a web-accessible protein database for in silico target identification. It currently contains > 1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literatures and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information of > 830 known or potential drug targets, including protein and active sites structures in both PDB and mol2 formats, related diseases, biological functions as well as associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword search function, such as PDB ID, target name, and disease name. Data set generated by PDTD can be viewed with the plug-in of molecular visualization tools and also can be downloaded freely. Remarkably, PDTD is specially designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded in the form of mol2 file with the binding pose of the probe compound and a list of potential binding targets according to their ranking scores. Conclusion: PDTD serves as a comprehensive and unique repository of drug targets. Integrated with TarFisDock, PDTD is a useful resource to identify binding proteins for active compounds or existing drugs. Its potential applications include in silico drug target identification, virtual screening, and the discovery of the secondary effects of an old drug (i.e. new pharmacological usage) or an existing target (i.e. new pharmacological or toxic relevance), thus it may be a valuable platform for the pharmaceutical researchers. PDTD is available online at http://www.dddc.ac.cn/pdtd/.

• Lead finder: an approach to improve accuracy of protein-ligand docking, binding energy estimation, and virtual screening.
Stroganov, Oleg V and Novikov, Fedor N and Stroylov, Viktor S and Kulkov, Val and Chilov, Ghermes G
Journal of chemical information and modeling, 2008, 48(12), 2371-2385
PMID: 19007114     doi: 10.1021/ci800166p

An innovative molecular docking algorithm and three specialized high accuracy scoring functions are introduced in the Lead Finder docking software. Lead Finder's algorithm for ligand docking combines the classical genetic algorithm with various local optimization procedures and resourceful exploitation of the knowledge generated during docking process. Lead Finder's scoring functions are based on a molecular mechanics functional which explicitly accounts for different types of energy contributions scaled with empiric coefficients to produce three scoring functions tailored for (a) accurate binding energy predictions; (b) correct energy-ranking of docked ligand poses; and (c) correct rank-ordering of active and inactive compounds in virtual screening experiments. The predicted values of the free energy of protein-ligand binding were benchmarked against a set of experimentally measured binding energies for 330 diverse protein-ligand complexes yielding rmsd of 1.50 kcal/mol. The accuracy of ligand docking was assessed on a set of 407 structures, which included almost all published test sets of the following programs: FlexX, Glide SP, Glide XP, Gold, LigandFit, MolDock, and Surflex. An rmsd of 2 Å or less was observed for 80-96% of the structures in the test sets (80.0% on the Glide XP and FlexX test sets, 96.0% on the Surflex and MolDock test sets). The ability of Lead Finder to distinguish between active and inactive compounds during virtual screening experiments was benchmarked against 34 therapeutically relevant protein targets. Impressive enrichment factors were obtained for almost all of the targets with the average area under receiver operator curve being equal to 0.92.

## 2007

• WinDock: structure-based drug discovery on Windows-based PCs.
Hu, Zengjian and Southerland, William
Journal of computational chemistry, 2007, 28(14), 2347-2351
PMID: 17476686     doi: 10.1002/jcc.20756

In recent years, virtual database screening using high-throughput docking (HTD) has emerged as a very important tool and a well-established method for finding new lead compounds in the drug discovery process. With the advent of powerful personal computers (PCs), it is now plausible to perform HTD investigations on these inexpensive PCs. To make HTD more accessible to a broad community, we present here WinDock, an integrated application designed to help researchers perform structure-based drug discovery tasks under a uniform, user friendly graphical interface for Windows-based PCs. WinDock combines existing small molecule searchable three-dimensional (3D) libraries, homology modeling tools, and ligand-protein docking programs in a semi-automatic, interactive manner, which guides the user through the use of each integrated software component. WinDock is coded in C++.

• Structure-based virtual ligand screening with LigandFit: Pose prediction and enrichment of compound collections
Montes, Matthieu and Miteva, Maria A. and Villoutreix, Bruno O.
Proteins, 2007, 68(3), 712-725
PMID: 17510958     doi: 10.1002/prot.21405

Virtual ligand screening methods based on the structure of the receptor are extensively used to facilitate the discovery of lead compounds. In the present study, we investigated the LigandFit package on four different proteins (coagulation factor VIIa, estrogen receptor, thymidine kinase, and neuraminidase), a relatively large compound collection of 65,560 unique "drug-like" molecules and four focused libraries (1950 molecules each). We performed virtual screening experiments with the large database and evaluated six scoring functions available in the package (DockScore, LigScore1, LigScore2, PLP1, PLP2, and PMF). The results showed that LigandFit is an efficient program, especially when used with LigScore1. Similar computations were carried out using focused libraries. In this situation the LigScore1 scoring function outperformed the other ones on three out of the four proteins tested. Even for the difficult neuraminidase case, the LigandFit/LigScore1 combination was still reasonably successful. Assessment of docking accuracy was also performed and again, we found that LigandFit (with DockScore and the CFF parameters) was performing well. On the basis of these results and observed increased enrichments after LigandFit/LigScore1 screening on focused libraries, we suggest that using this program as a final step of a hierarchical protocol can be very beneficial to assist lead finding.

• Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem.
Truchon, Jean-Francois and Bayly, Christopher I
Journal of chemical information and modeling, 2007, 47(2), 488-508
PMID: 17288412     doi: 10.1021/ci600426e

Many metrics are currently used to evaluate the performance of ranking methods in virtual screening (VS), for instance, the area under the receiver operating characteristic curve (ROC), the area under the accumulation curve (AUAC), the average rank of actives, the enrichment factor (EF), and the robust initial enhancement (RIE) proposed by Sheridan et al. In this work, we show that the ROC, the AUAC, and the average rank metrics have the same inappropriate behaviors that make them poor metrics for comparing VS methods whose purpose is to rank actives early in an ordered list (the "early recognition problem"). In doing so, we derive mathematical formulas that relate those metrics together. Moreover, we show that the EF metric is not sensitive to ranking performance before and after the cutoff. Instead, we formally generalize the ROC metric to the early recognition problem which leads us to propose a novel metric called the Boltzmann-enhanced discrimination of receiver operating characteristic that turns out to contain the discrimination power of the RIE metric but incorporates the statistical significance from ROC and its well-behaved boundaries. Finally, two major sources of errors, namely, the statistical error and the "saturation effects", are examined. This leads to practical recommendations for the number of actives, the number of inactives, and the "early recognition" importance parameter that one should use when comparing ranking methods. Although this work is applied specifically to VS, it is general and can be used to analyze any method that needs to segregate actives toward the front of a rank-ordered list.
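
The RIE and BEDROC metrics discussed in this abstract can be sketched as below. This is an illustrative implementation, assuming the usual statement of BEDROC as RIE rescaled between its minimum (all actives ranked last) and maximum (all actives ranked first). The function names and the default `alpha = 20.0` are choices made for this example, not values taken from the abstract.

```python
import math
from typing import Sequence


def rie(active_ranks: Sequence[int], n_total: int, alpha: float) -> float:
    """Robust initial enhancement for actives at the given 1-based ranks."""
    n = len(active_ranks)
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / n
    # Expected value of the same exponential weight for a uniformly random rank.
    expected = (1.0 / n_total) * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    return observed / expected


def bedroc(active_ranks: Sequence[int], n_total: int, alpha: float = 20.0) -> float:
    """BEDROC as min-max rescaled RIE, so the result lies in [0, 1]."""
    n = len(active_ranks)
    if not 0 < n < n_total:
        raise ValueError("need at least one active and at least one inactive")
    ra = n / n_total  # ratio of actives in the ranked list
    rie_max = (1.0 - math.exp(-alpha * ra)) / (ra * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ra)) / (ra * (1.0 - math.exp(alpha)))
    return (rie(active_ranks, n_total, alpha) - rie_min) / (rie_max - rie_min)
```

Larger `alpha` concentrates the weight on the very top of the list, which is the "early recognition" importance parameter the abstract refers to.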

## 2006

• Screening drug-like compounds by docking to homology models: a systematic study.
Kairys, Visvaldas and Fernandes, Miguel X and Gilson, Michael K
Journal of chemical information and modeling, 2006, 46(1), 365-379
PMID: 16426071     doi: 10.1021/ci050238c

In the absence of an experimentally solved structure, a homology model of a protein target can be used instead for virtual screening of drug candidates by docking and scoring. This approach poses a number of questions regarding the choice of the template to use in constructing the model, the accuracy of the screening results, and the importance of allowing for protein flexibility. The present study addresses such questions with compound screening calculations for multiple homology models of five drug targets. A central result is that docking to homology models frequently yields enrichments of known ligands as good as that obtained by docking to a crystal structure of the actual target protein. Interestingly, however, standard measures of the similarity of the template used to build the homology model to the targeted protein show little correlation with the effectiveness of the screening calculations, and docking to the template itself often is as successful as docking to the corresponding homology model. Treating key side chains as mobile produces a modest improvement in the results. The reasons for these sometimes unexpected results, and their implications for future methodologic development, are discussed.

• TarFisDock: a web server for identifying drug targets with docking approach
Li, Honglin and Gao, Zhenting and Kang, Ling and Zhang, Hailei and Yang, Kun and Yu, Kunqian and Luo, Xiaomin and Zhu, Weiliang and Chen, Kaixian and Shen, Jianhua and Wang, Xicheng and Jiang, Hualiang
Nucleic acids research, 2006, 34(Web Server issue), W219-W224
PMID: 16844997     doi: 10.1093/nar/gkl114

TarFisDock is a web-based tool for automating the procedure of searching for small molecule-protein interactions over a large repertoire of protein structures. It offers PDTD (potential drug target database), a target database containing 698 protein structures covering 15 therapeutic areas and a reverse ligand protein docking program. In contrast to conventional ligand-protein docking, reverse ligand-protein docking aims to seek potential protein targets by screening an appropriate protein database. The input file of this web server is the small molecule to be tested, in standard mol2 format; TarFisDock then searches for possible binding proteins for the given small molecule by use of a docking approach. The ligand-protein interaction energy terms of the program DOCK are adopted for ranking the proteins. To test the reliability of the TarFisDock server, we searched the PDTD for putative binding proteins for vitamin E and 4H-tamoxifen. The top 2 and 10% candidates of vitamin E binding proteins identified by TarFisDock respectively cover 30 and 50% of reported targets verified or implicated by experiments; and 30 and 50% of experimentally confirmed targets for 4H-tamoxifen appear amongst the top 2 and 5% of the TarFisDock predicted candidates, respectively. Therefore, TarFisDock may be a useful tool for target identification, mechanism study of old drugs and probes discovered from natural products. TarFisDock and PDTD are available at http://www.dddc.ac.cn/tarfisdock/.

• Multiple target screening method for robust and accurate in silico ligand screening.
Fukunishi, Yoshifumi and Mikami, Yoshiaki and Kubota, Satoru and Nakamura, Haruki
Journal of molecular graphics & modelling, 2006, 25(1), 61-70
PMID: 16376595     doi: 10.1016/j.jmgm.2005.11.006

We developed a new in silico multiple target screening (MTS) method, based on a multi-receptor versus multi-ligand docking affinity matrix, and examined its robustness against changes in the scoring system. According to this method, compounds in a database are docked to multiple proteins. The compounds among these proteins that are likely to bind to the target protein are selected as the members of the candidate-hit compound group. Then, the compounds in the group are sorted into descending order using the docking score: the first (n-th) compound is expected to be the most (n-th) probable hit compound. This method was applied to the analysis of a set of 142 receptors and 142 compounds using a receptor-ligand docking program, Sievgene [Y. Fukunishi, Y. Mikami, H. Nakamura, Similarities among receptor pockets and among compounds: analysis and application to in silico ligand screening, J. Mol. Graphics Modelling, 24 (2005) 34-45], and the results demonstrated that this method achieves a high hit ratio compared to uniform sampling. We prepared two new scores: the DeltaG score, designed to reproduce the protein-ligand binding free energy, and the hit-optimized score, designed to maximize the hit ratio of in silico screening. Using the Sievgene docking score, DeltaG score and hit-optimized score, the MTS method is more robust than the multiple active-site correction scoring method [G.P.A. Vigers, J.P. Rizzi, Multiple active site corrections for docking and virtual screening, J. Med. Chem., 47 (2004) 80-89].

• Critical assessment of the automated AutoDock as a new docking tool for virtual screening.
Park, Hwangseo and Lee, Jinuk and Lee, Sangyoub
Proteins, 2006, 65(3), 549-554
PMID: 16988956     doi: 10.1002/prot.21183

A major problem in virtual screening concerns the accuracy of the binding free energy between a target protein and a putative ligand. Here we report an example supporting the outperformance of the AutoDock scoring function in virtual screening in comparison to the other popular docking programs. The original AutoDock program is in itself inefficient to be used in virtual screening because the grids of interaction energy have to be calculated for each putative ligand in chemical database. However, the automation of the AutoDock program with the potential grids defined in common for all putative ligands leads to more than twofold increase in the speed of virtual database screening. The utility of the automated AutoDock in virtual screening is further demonstrated by identifying the actual inhibitors of various target enzymes in chemical databases with accuracy higher than the other docking tools including DOCK and FlexX. These results exemplify the usefulness of the automated AutoDock as a new promising tool in structure-based virtual screening.

• sc-PDB: an annotated database of druggable binding sites from the Protein Data Bank.
Kellenberger, Esther and Muller, Pascal and Schalon, Claire and Bret, Guillaume and Foata, Nicolas and Rognan, Didier
Journal of chemical information and modeling, 2006, 46(2), 717-727
PMID: 16563002     doi: 10.1021/ci050372x

The sc-PDB is a collection of 6 415 three-dimensional structures of binding sites found in the Protein Data Bank (PDB). Binding sites were extracted from all high-resolution crystal structures in which a complex between a protein cavity and a small-molecular-weight ligand could be identified. Importantly, ligands are considered from a pharmacological and not a structural point of view. Therefore, solvents, detergents, and most metal ions are not stored in the sc-PDB. Ligands are classified into four main categories: nucleotides (< 4-mer), peptides (< 9-mer), cofactors, and organic compounds. The corresponding binding site is formed by all protein residues (including amino acids, cofactors, and important metal ions) with at least one atom within 6.5 angstroms of any ligand atom. The database was carefully annotated by browsing several protein databases (PDB, UniProt, and GO) and storing, for every sc-PDB entry, the following features: protein name, function, source, domain and mutations, ligand name, and structure. The repository of ligands has also been archived by diversity analysis of molecular scaffolds, and several chemoinformatics descriptors were computed to better understand the chemical space covered by stored ligands. The sc-PDB may be used for several purposes: (i) screening a collection of binding sites for predicting the most likely target(s) of any ligand, (ii) analyzing the molecular similarity between different cavities, and (iii) deriving rules that describe the relationship between ligand pharmacophoric points and active-site properties. The database is periodically updated and accessible on the web at http://bioinfo-pharma.u-strasbg.fr/scPDB/.
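
The binding-site definition given above (every residue with at least one atom within 6.5 angstroms of any ligand atom) can be sketched as follows. This is a minimal sketch, assuming atoms are supplied as plain coordinate tuples rather than parsed from PDB files, and treating "within" as less than or equal to the cutoff. The name `binding_site_residues` is hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
ProteinAtom = Tuple[str, Coord]  # (residue identifier, atom coordinates in angstroms)


def binding_site_residues(
    protein_atoms: Iterable[ProteinAtom],
    ligand_coords: Sequence[Coord],
    cutoff: float = 6.5,
) -> Set[str]:
    """Residues having at least one atom within `cutoff` of any ligand atom."""
    cutoff_sq = cutoff * cutoff  # compare squared distances to avoid sqrt
    site: Set[str] = set()
    for residue, (x, y, z) in protein_atoms:
        if residue in site:
            continue  # residue already selected through another of its atoms
        for lx, ly, lz in ligand_coords:
            if (x - lx) ** 2 + (y - ly) ** 2 + (z - lz) ** 2 <= cutoff_sq:
                site.add(residue)
                break
    return site
```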

## 2005

• Validation and use of the MM-PBSA approach for drug discovery.
Kuhn, Bernd and Gerber, Paul and Schulz-Gasch, Tanja and Stahl, Martin
Journal of medicinal chemistry, 2005, 48(12), 4040-4048
PMID: 15943477     doi: 10.1021/jm049081q

The MM-PBSA approach has become a popular method for calculating binding affinities of biomolecular complexes. Published application examples focus on small test sets and few proteins and, hence, are of limited relevance in assessing the general validity of this method. To further characterize MM-PBSA, we report on a more extensive study involving a large number of ligands and eight different proteins. Our results show that applying the MM-PBSA energy function to a single, relaxed complex structure is an adequate and sometimes more accurate approach than the standard free energy averaging over molecular dynamics snapshots. The use of MM-PBSA on a single structure is shown to be valuable (a) as a postdocking filter in further enriching virtual screening results, (b) as a helpful tool to prioritize de novo design solutions, and (c) for distinguishing between good and weak binders (DeltapIC(50) > or

## 2004

• Recovering the true targets of specific ligands by virtual screening of the protein data bank.
Paul, Nicodéme and Kellenberger, Esther and Bret, Guillaume and Muller, Pascal and Rognan, Didier
Proteins, 2004, 54(4), 671-680
PMID: 14997563     doi: 10.1002/prot.10625

The Protein Data Bank (PDB) has been processed to extract a screening protein library (sc-PDB) of 2148 entries. A knowledge-based detection algorithm has been applied to 18,000 PDB files to find regular expressions corresponding to either protein, ions, co-factors, solvent, or ligand atoms. The sc-PDB database comprises high-resolution X-ray structures of proteins for which (i) a well-defined active site exists, (ii) the bound-ligand is a small molecular weight molecule. The database has been screened by an inverse docking tool derived from the GOLD program to recover the known target of four unrelated ligands. Both the database and the inverse screening procedures are accurate enough to rank the true target of the four investigated ligands among the top 1% scorers, with 70-100 fold enrichment with respect to random screening. Applying the proposed screening procedure to a small-sized generic ligand was much less accurate suggesting that inverse screening shall be reserved to rather selective compounds.

• OptiDock: virtual HTS of combinatorial libraries by efficient sampling of binding modes in product space.
Sprous, Dennis G and Lowis, David R and Leonard, Joseph M and Heritage, Trevor and Burkett, Steven N and Baker, David S and Clark, Robert D
Journal of combinatorial chemistry, 2004, 6(4), 530-539
PMID: 15244414     doi: 10.1021/cc034068x

Products from combinatorial libraries generally share a common core structure that can be exploited to improve the efficiency of virtual high-throughput screening (vHTS). In general, it is more efficient to find a method that scales with the total number of reagents (Sigma growth) rather than with the number of products (Pi growth). The OptiDock methodology described herein entails selecting a diverse but representative subset of compounds that span the structural space encompassed by the full library. These compounds are docked individually using the FlexX program (Rarey, M.; Kramer, B.; Lengauer, T.; Klebe, G. J. Mol. Biol. 1995, 251, 470-489) to define distinct docking modes in terms of reference placements for combinatorial core atoms. Thereafter, substituents in R-cores (consisting of the core structure substituted at a single variation site) are docked, keeping the core atoms fixed at the coordinates dictated by each reference placement. Interaction energies are calculated for each docked R-core with respect to the target protein, and energies for whole compounds are calculated by finding the reference core placement for which the sum of corresponding R-core energies is most negative. The use of diverse whole compounds to define binding modes is a key advantage of the protocol over other combinatorial docking programs. As a result, OptiDock returns better-scoring conformers than does serially applied FlexX. OptiDock is also better able to find a viable docked pose for each library member than are other combinatorial approaches.
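
The final scoring step described above (pick, for each product, the reference core placement whose summed R-core energies are most negative) can be sketched as below. This is an illustration only, assuming the R-core energies are already tabulated per variation site, reagent and placement. The data layout and the name `score_products` are assumptions for this example, not OptiDock's actual interface.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def score_products(
    rcore_energy: Sequence[Dict[str, Sequence[float]]],
) -> Dict[Tuple[str, ...], Tuple[float, int]]:
    """Combine R-core energies into whole-product scores.

    rcore_energy[site][reagent][placement] is the interaction energy of the
    R-core built from `reagent` at variation `site`, docked with the core held
    at reference `placement`. Only sum(reagents) x placements energies are
    needed (Sigma growth), yet every product (Pi growth) gets a score.
    Returns {product reagents: (best summed energy, index of best placement)}.
    """
    n_placements = len(next(iter(rcore_energy[0].values())))
    results: Dict[Tuple[str, ...], Tuple[float, int]] = {}
    for combo in product(*(site.keys() for site in rcore_energy)):
        # Sum the R-core energies of this product for each core placement.
        totals = [
            sum(site[reagent][p] for site, reagent in zip(rcore_energy, combo))
            for p in range(n_placements)
        ]
        best_p = min(range(n_placements), key=totals.__getitem__)
        results[combo] = (totals[best_p], best_p)
    return results
```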

• HierVLS hierarchical docking protocol for virtual ligand screening of large-molecule databases
Floriano, WB and Vaidehi, N and Zamanakos, G and Goddard, WA
Journal of medicinal chemistry, 2004, 47(1), 56-71
PMID: 14695820     doi: 10.1021/jm030271v

To provide practical means for rapidly scanning the extensive experimental combinatorial chemistry libraries now available for high-throughput screening (HTS), it is essential to establish computational virtual ligand screening (VLS) techniques to rapidly identify out of a large library all active compounds against a particular protein target. Toward this goal we developed HierVLS, a fast hierarchical docking approach that starts with a coarse grain conformational search over a large number of configurations filtered with a fast but crude energy function, followed by a succession of finer grain levels, using successively more accurate but more expensive descriptions of the ligand-protein-solvent interactions to filter successively fewer cases. The final step of this procedure optimizes one configuration of the ligand in the protein site using our most accurate energy expression and description of the solvent, which would be impractical for all conformations and sites sampled in the coarse level. HierVLS is based on the HierDock approach, but rather than allowing an hour or more to determine the best binding site and energy for each ligand (as in HierDock), we have adapted our procedure so that it can lead to reliable results while using only 4 min (866 MHz Pentium III processor) per ligand. To validate the accuracy for HierVLS to predict the experimentally observed binding conformation, we considered 37 cocrystal structures comprising 11 target proteins. We find that HierVLS identifies the correct binding mode for all 37 cocrystals. In addition, the calculated binding energies correlate well with available experimental binding constants. To validate how well HierVLS can identify the correct ligand in an extensive library of decoys, we considered a library of over 10 000 molecules. HierVLS identifies 26 out of the 37 cases in the top 2% ranked by binding affinity among the 10 037 molecules. The failures result from either metal-containing sites on the protein or water-mediated ligand-protein interactions, which we anticipate can be solved within the constraints of practical VLS. We then applied HierVLS to screen a 55000-compound virtual library against the target protein-tyrosine phosphatase 1B (ptp1b). The top 250 compounds by binding affinity included all six ptp1b cocrystal ligands added to the library plus three other experimentally confirmed binders. The best (top 1) binder is an experimentally confirmed positive. We conclude that HierVLS is useful for selecting leads for a particular target out of large combinatorial databases.
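
The coarse-to-fine filtering idea in this abstract can be sketched generically as a funnel. This is a minimal sketch, assuming each level is described by a scoring function (lower = better) and the number of candidates to keep. The name `hierarchical_screen` and the level layout are assumptions for this example and do not reproduce the actual HierVLS energy functions or cutoffs.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def hierarchical_screen(
    candidates: Iterable[T],
    levels: Sequence[Tuple[Callable[[T], float], int]],
) -> List[T]:
    """Apply successively more expensive scoring functions to fewer candidates.

    levels is ordered from cheapest to most expensive; each entry is
    (score_fn, keep), and only the `keep` best-scoring candidates survive.
    """
    survivors = list(candidates)
    for score_fn, keep in levels:
        survivors = sorted(survivors, key=score_fn)[:keep]
    return survivors
```

Because the expensive function is only evaluated on the survivors of the cheaper levels, the total cost is dominated by the crude first pass.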

• Multiple active site corrections for docking and virtual screening.
Vigers, Guy P A and Rizzi, James P
Journal of medicinal chemistry, 2004, 47(1), 80-89
PMID: 14695822     doi: 10.1021/jm030161o

Several docking programs are now available that can reproduce the bound conformation of a ligand in an active site, for a wide variety of experimentally determined complexes. However, these programs generally perform less well at ranking multiple possible ligands in one site. Since accurate identification of potential ligands is a prerequisite for many aspects of structure-based drug design, this is a serious limitation. We have tested the ability of two docking programs, FlexX and Gold, to match ligands and active sites for multiple complexes. We show that none of the docking scores from either program are able to match consistently ligands and active sites in our tests. We propose a simple statistical correction, the multiple active site correction (MASC), which greatly ameliorates this problem. We have also tested the correction method against an extended set of 63 cocrystals and in a virtual screening experiment. In all cases, MASC significantly improves the results of the docking experiments.
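
The abstract does not spell out the correction, so the sketch below shows one plausible reading of a "simple statistical correction": each ligand's raw score in a site is replaced by its z-score relative to that ligand's scores across all sites. The use of the population standard deviation, the lower-is-better convention and the name `masc_correct` are assumptions made for this example.

```python
import statistics
from typing import Dict


def masc_correct(scores: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Standardize each ligand's scores across active sites.

    scores[ligand][site] is a raw docking score (lower = better, by assumption).
    A ligand that scores well everywhere gets no credit for any single site;
    a ligand that scores unusually well in one site stands out there.
    """
    corrected: Dict[str, Dict[str, float]] = {}
    for ligand, by_site in scores.items():
        values = list(by_site.values())
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)
        corrected[ligand] = {
            # A ligand with identical scores in all sites carries no signal.
            site: (s - mu) / sigma if sigma > 0 else 0.0
            for site, s in by_site.items()
        }
    return corrected
```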

• FlexX-Scan: Fast, structure-based virtual screening
Schellhammer, I and Rarey, M
Proteins, 2004, 57(3), 504-517
PMID: 15382244     doi: 10.1002/prot.20217

We present a new software module, FlexX-Scan, for high-throughput, structure-based virtual screening. FlexX-Scan was developed with the aim to further speed up the virtual screening process. Based on the incremental construction docking tool FlexX (Rarey et al., J Mol Biol 1996;261: 470-489), a compact descriptor for representing favorable protein interaction spots within the protein binding site has been developed. The descriptor is calculated using special-purpose clustering techniques applied to the usual interaction points created by FlexX. The algorithm automatically detects a small set of interaction spots in the binding site for positioning ligand functional groups. The parametrizations of the base placement and incremental construction algorithms have been adapted to the new interaction model. We tested the software tool on a diverse set of 200 protein-ligand complexes from the protein database (PDB) (Kramer et al., Proteins 1999;37:228-241). On average, the algorithm proposes about 90 interaction spots per binding site compared to about 1000 interaction dots in FlexX. We observe that the docking solutions of FlexX-Scan have a root-mean-square deviation from the crystal structure similar to the deviation of docking solutions of standard FlexX. For further validation we also performed virtual screening experiments for cyclin-dependent kinase 2, thrombin, angiotensin-converting enzyme, and dihydrofolate reductase. In these experiments, we screened a set of 34,000 random compounds and a number of known actives for each target. With FlexX-Scan, we achieved comparable enrichments to standard FlexX, with an averaged computing time of 5-10 s per compound, depending on parametrization.

• Glide: A new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening
Halgren, TA and Murphy, RB and Friesner, RA and Beard, HS and Frye, LL and Pollard, WT and Banks, JL
Journal of medicinal chemistry, 2004, 47(7), 1750-1759
PMID: 15027866     doi: 10.1021/jm030644s

Glide's ability to identify active compounds in a database screen is characterized by applying Glide to a diverse set of nine protein receptors. In many cases, two, or even three, protein sites are employed to probe the sensitivity of the results to the site geometry. To make the database screens as realistic as possible, the screens use sets of "druglike" decoy ligands that have been selected to be representative of what we believe is likely to be found in the compound collection of a pharmaceutical or biotechnology company. Results are presented for releases 1.8, 2.0, and 2.5 of Glide. The comparisons show that average measures for both "early" and "global" enrichment for Glide 2.5 are 3 times higher than for Glide 1.8 and more than 2 times higher than for Glide 2.0 because of better results for the least well-handled screens. This improvement in enrichment stems largely from the better balance of the more widely parametrized GlideScore 2.5 function and the inclusion of terms that penalize ligand-protein interactions that violate established principles of physical chemistry, particularly as it concerns the exposure to solvent of charged protein and ligand groups. Comparisons to results for the thymidine kinase and estrogen receptors published by Rognan and co-workers (J. Med. Chem. 2000, 43, 4759-4767) show that Glide 2.5 performs better than GOLD 1.1, FlexX 1.8, or DOCK 4.01.
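
For readers unfamiliar with enrichment measures, the sketch below computes a generic enrichment factor: the hit rate among the top-ranked fraction of a screened database divided by the hit rate of the whole database. This is a common textbook definition and is not necessarily the exact "early" or "global" measure used in the paper; the function name `enrichment_factor` and the toy ranking are assumptions for illustration.

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor for the top `fraction` of a ranked list.

    ranked_is_active: booleans ordered from best to worst docking score,
    True where the compound is a known active.
    """
    n_total = len(ranked_is_active)
    actives_total = sum(ranked_is_active)
    if n_total == 0 or actives_total == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, int(round(n_total * fraction)))
    actives_top = sum(ranked_is_active[:n_top])
    # (actives_top / n_top) / (actives_total / n_total), written to avoid
    # intermediate rounding error.
    return (actives_top * n_total) / (n_top * actives_total)

# Toy screen: 100 compounds, 5 actives, 3 of them ranked in the top 10.
ranked = [True, False, True, False, False, True, False, False, False, False]
ranked += [False] * 88 + [True, True]

ef_top10 = enrichment_factor(ranked, 0.10)
ef_all = enrichment_factor(ranked, 1.0)
print(ef_top10, ef_all)
```

An enrichment factor of 1 corresponds to random selection, which is why the value over the whole list is always 1.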

• Virtual screening using protein-ligand docking: Avoiding artificial enrichment
Verdonk, ML and Berdini, V and Hartshorn, MJ and Mooij, WTM and Murray, CW and Taylor, RD and Watson, P
Journal of Chemical Information and Computer Sciences, 2004, 44(3), 793-806
PMID: 15154744     doi: 10.1021/ci034289q

This study addresses a number of topical issues around the use of protein-ligand docking in virtual screening. We show that, for the validation of such methods, it is key to use focused libraries (containing compounds with one-dimensional properties, similar to the actives), rather than "random" or "drug-like" libraries to test the actives against. We also show that, to obtain good enrichments, the docking program needs to produce reliable binding modes. We demonstrate how pharmacophores can be used to guide the dockings and improve enrichments, and we compare the performance of three consensus-ranking protocols against ranking based on individual scoring functions. Finally, we show that protein-ligand docking can be an effective aid in the screening for weak, fragment-like binders, which has rapidly become a popular strategy for hit identification. All results presented are based on carefully constructed virtual screening experiments against four targets, using the protein-ligand docking program GOLD.

## 2003

• Shape signatures: a new approach to computer-aided ligand- and receptor-based drug design.
Zauhar, Randy J and Moyna, Guillermo and Tian, LiFeng and Li, ZhiJian and Welsh, William J
Journal of medicinal chemistry, 2003, 46(26), 5674-5690
PMID: 14667221     doi: 10.1021/jm030242k

A unifying principle of rational drug design is the use of either shape similarity or complementarity to identify compounds expected to be active against a given target. Shape similarity is the underlying foundation of ligand-based methods, which seek compounds with structure similar to known actives, while shape complementarity is the basis of most receptor-based design, where the goal is to identify compounds complementary in shape to a given receptor. These approaches can be extended to include molecular descriptors in addition to shape, such as lipophilicity or electrostatic potential. Here we introduce a new technique, which we call shape signatures, for describing the shape of ligand molecules and of receptor sites. The method uses a technique akin to ray-tracing to explore the volume enclosed by a ligand molecule, or the volume exterior to the active site of a protein. Probability distributions are derived from the ray-trace, and can be based solely on the geometry of the reflecting ray, or may include joint dependence on properties, such as the molecular electrostatic potential, computed over the surface. Our shape signatures are just these probability distributions, stored as histograms. They converge rapidly with the length of the ray-trace, are independent of molecular orientation, and can be compared quickly using simple metrics. Shape signatures can be used to test for both shape similarity between compounds and for shape complementarity between compounds and receptors and thus can be applied to problems in both ligand- and receptor-based molecular design. We present results for comparisons between small molecules of biological interest and the NCI Database using shape signatures under two different metrics. Our results show that the method can reliably extract compounds of shape (and polarity) similar to the query molecules. We also present initial results for a receptor-based strategy using shape signatures, with application to the design of new inhibitors predicted to be active against HIV protease.
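
The comparison step described above (signatures stored as histograms and compared with simple metrics) might look like the following sketch. It assumes the ray-trace has already produced a list of segment lengths for each molecule; the ray-tracing itself is not shown. The bin width, bin count, the L1 metric, and the names `signature` and `l1_distance` are illustrative assumptions, not details taken from the paper.

```python
def signature(segment_lengths, bin_width=0.5, n_bins=8):
    """Normalized histogram of ray-segment lengths (a 1D shape signature)."""
    if not segment_lengths:
        raise ValueError("need at least one segment length")
    counts = [0] * n_bins
    for length in segment_lengths:
        # Lengths beyond the last bin edge are collected in the last bin.
        idx = min(int(length / bin_width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def l1_distance(p, q):
    """Sum of absolute bin differences; 0 = identical, 2 = disjoint."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical segment lengths for a query and two database molecules.
query = signature([1.0, 1.2, 2.6, 3.1])
similar = signature([1.1, 1.3, 2.7, 3.2])
different = signature([0.1, 0.2, 0.3, 0.4])

d_similar = l1_distance(query, similar)
d_different = l1_distance(query, different)
print(d_similar, d_different)
```

Because the histograms depend only on segment lengths, the comparison does not require aligning the molecules first, which is consistent with the orientation independence noted in the abstract.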

• LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites
Venkatachalam, CM and Jiang, X and Oldfield, T and Waldman, M
Journal of molecular graphics & modelling, 2003, 21(4), 289-307
PMID: 12479928

We present a new shape-based method, LigandFit, for accurately docking ligands into protein active sites. The method employs a cavity detection algorithm for detecting invaginations in the protein as candidate active site regions. A shape comparison filter is combined with a Monte Carlo conformational search for generating ligand poses consistent with the active site shape. Candidate poses are minimized in the context of the active site using a grid-based method for evaluating protein-ligand interaction energies. Errors arising from grid interpolation are dramatically reduced using a new non-linear interpolation scheme. Results are presented for 19 diverse protein-ligand complexes. The method appears quite promising, reproducing the X-ray structure ligand pose within an RMS of 2 Å in 14 out of the 19 complexes. A high-throughput screening study applied to the thymidine kinase receptor is also presented in which LigandFit, when combined with LigScore, an internally developed scoring function [1], yields very good hit rates for a ligand pool seeded with known actives. (C) 2002 Published by Elsevier Science Inc.
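
The pose-reproduction criterion quoted above (docked pose within an RMS deviation of 2 Å of the X-ray pose) can be sketched as follows. The sketch assumes both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is performed; the function names and coordinates are hypothetical.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation between two matched lists of (x, y, z)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have equal atom counts")
    squared = sum(
        (a - b) ** 2
        for atom_p, atom_r in zip(pose, reference)
        for a, b in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pose))

def success_rate(pairs, cutoff=2.0):
    """Fraction of (docked, crystal) pairs with RMSD at or below cutoff (in Å)."""
    hits = sum(1 for docked, crystal in pairs if rmsd(docked, crystal) <= cutoff)
    return hits / len(pairs)

# Toy three-atom ligand: one pose shifted by 1 Å, another by 3 Å.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
good_pose = [(x + 1.0, y, z) for x, y, z in crystal]
bad_pose = [(x + 3.0, y, z) for x, y, z in crystal]

rate = success_rate([(good_pose, crystal), (bad_pose, crystal)])
print(rmsd(good_pose, crystal), rmsd(bad_pose, crystal), rate)
```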

• Automated generation of MCSS-derived pharmacophoric DOCK site points for searching multiconformation databases.
Joseph-McCarthy, Diane and Alvarez, Juan C
Proteins, 2003, 51(2), 189-202
PMID: 12660988     doi: 10.1002/prot.10296

All docking methods employ some sort of heuristic to orient the ligand molecules into the binding site of the target structure. An automated method, MCSS2SPTS, for generating chemically labeled site points for docking is presented. MCSS2SPTS employs the program Multiple Copy Simultaneous Search (MCSS) to determine target-based theoretical pharmacophores. More specifically, chemically labeled site points are automatically extracted from selected low-energy functional-group minima and clustered together. These pharmacophoric site points can then be directly matched to the pharmacophoric features of database molecules with the use of either DOCK or PhDOCK to place the small molecules into the binding site. Several examples of the ability of MCSS2SPTS to reproduce the three-dimensional pharmacophoric features of ligands from known ligand-protein complex structures are discussed. In addition, a site-point set calculated for one human immunodeficiency virus 1 (HIV1) protease structure is used with PhDOCK to dock a set of HIV1 protease ligands; the docked poses are compared to the corresponding complex structures of the ligands. Finally, the use of an MCSS2SPTS-derived site-point set for acyl carrier protein synthase is compared to the use of atomic positions from a bound ligand as site points for a large-scale DOCK search. In general, MCSS2SPTS-generated site points focus the search on the more relevant areas and thereby allow for more effective sampling of the target site.

## 2002

• A structure-based design approach for the identification of novel inhibitors: application to an alanine racemase.
Mustata, Gabriela Iurcu and Briggs, James M
Journal of computer-aided molecular design, 2002, 16(12), 935-953
PMID: 12825624

We report a new structure-based strategy for the identification of novel inhibitors. This approach has been applied to Bacillus stearothermophilus alanine racemase (AlaR), an enzyme implicated in the biosynthesis of the bacterial cell wall. The enzyme catalyzes the racemization of L- and D-alanine using pyridoxal 5'-phosphate (PLP) as a cofactor. The restriction of AlaR to bacteria and some fungi and the absolute requirement for D-alanine in peptidoglycan biosynthesis make alanine racemase a suitable target for drug design. Unfortunately, known inhibitors of alanine racemase are not specific and inhibit the activity of other PLP-dependent enzymes, leading to neurological and other side effects. This article describes the development of a receptor-based pharmacophore model for AlaR, taking into account receptor flexibility (i.e. a 'dynamic' pharmacophore model). In order to accomplish this, molecular dynamics (MD) simulations were performed on the full AlaR dimer from Bacillus stearothermophilus (PDB entry 1SFT) with a D-alanine molecule in one active site and the non-covalent inhibitor, propionate, in the second active site of this homodimer. The basic strategy followed in this study was to utilize conformations of the protein obtained during MD simulations to generate a dynamic pharmacophore model using the property mapping capability of the LigBuilder program. Compounds from the Available Chemicals Directory that fit the pharmacophore model were identified and have been submitted for experimental testing. The approach described here can be used as a valuable tool for the design of novel inhibitors of other biomolecular targets.

## 2001

• Detailed analysis of scoring functions for virtual screening.
Stahl, M and Rarey, M
Journal of medicinal chemistry, 2001, 44(7), 1035-1042
PMID: 11297450

We present a comprehensive study of the performance of fast scoring functions for library docking using the program FlexX as the docking engine. Four scoring functions, among them two recently developed knowledge-based potentials, are evaluated on seven target proteins whose binding sites represent a wide range of size, form, and polarity. The results of these calculations give valuable insight into strengths and weaknesses of current scoring functions. Furthermore, it is shown that a well-chosen combination of two of the tested scoring functions leads to a new, robust scoring scheme with superior performance in virtual screening.