Bibliography of Computer-Aided Drug Design

Updated on 5/1/2014. Currently 2130 references

Docking / All



  • BP-Dock: A Flexible Docking Scheme for Exploring Protein-Ligand Interactions Based on Unbound Structures.
    Bolia, Ashini and Gerek, Z Nevin and Ozkan, S Banu
    Journal of chemical information and modeling, 2014, 54(3), 913-925
    PMID: 24380381     doi: 10.1021/ci4004927
    Molecular docking serves as an important tool in modeling protein-ligand interactions. However, it is still challenging to incorporate overall receptor flexibility, especially backbone flexibility, in docking due to the large conformational space that needs to be sampled. To overcome this problem, we developed a novel flexible docking approach, BP-Dock (Backbone Perturbation-Dock) that can integrate both backbone and side chain conformational changes induced by ligand binding through a multi-scale approach. In the BP-Dock method, we mimic the nature of binding-induced events as a first-order approximation by perturbing the residues along the protein chain with a small Brownian kick one at a time. The response fluctuation profile of the chain upon these perturbations is computed using the perturbation response scanning method. These response fluctuation profiles are then used to generate binding-induced multiple receptor conformations for ensemble docking. To evaluate the performance of BP-Dock, we applied our approach on a large and diverse data set using unbound structures as receptors. We also compared the BP-Dock results with bound and unbound docking, where overall receptor flexibility was not taken into account. Our results highlight the importance of modeling backbone flexibility in docking for recapitulating the experimental binding affinities, especially when an unbound structure is used. With BP-Dock, we can generate a wide range of binding site conformations realized in nature even in the absence of a ligand that can help us to improve the accuracy of unbound docking. We expect that our fast and efficient flexible docking approach may further aid in our understanding of protein-ligand interactions as well as virtual screening of novel targets for rational drug design.

  • SAMPL4 & DOCK3.7: lessons for automated docking procedures.
    Coleman, Ryan G and Sterling, Teague and Weiss, Dahlia R
    Journal of computer-aided molecular design, 2014, 28(3), 201-209
    PMID: 24515818     doi: 10.1007/s10822-014-9722-6
    The SAMPL4 challenges were used to test current automated methods for solvation energy, virtual screening, pose and affinity prediction of the molecular docking pipeline DOCK 3.7. Additionally, first-order models of binding affinity were proposed as milestones for any method predicting binding affinity. Several important discoveries about the molecular docking software were made during the challenge: (1) Solvation energies of ligands were five-fold worse than any other method used in SAMPL4, including methods that were similarly fast, (2) HIV Integrase is a challenging target, but automated docking on the correct allosteric site performed well in terms of virtual screening and pose prediction (compared to other methods) but affinity prediction, as expected, was very poor, (3) Molecular docking grid sizes can be very important, serious errors were discovered with default settings that have been adjusted for all future work. Overall, lessons from SAMPL4 suggest many changes to molecular docking tools, not just DOCK 3.7, that could improve the state of the art. Future difficulties and projects will be discussed.

  • Incorporating replacement free energy of binding-site waters in molecular docking
    Sun, Hanzi and Zhao, Lifeng and Peng, Shiming and Huang, Niu
    Proteins, 2014
    PMID: 24549784     doi: 10.1002/prot.24530
    Binding-site water molecules play a crucial role in protein-ligand recognition, either being displaced upon ligand binding or forming water bridges to stabilize the complex. However, rigorously treating explicit binding-site waters is challenging in molecular docking, which requires fully sampling ensembles of waters and considering the free energy cost of replacing waters. Here, we describe a method to incorporate structural and energetic properties of binding-site waters into molecular docking. We first developed a solvent property analysis (SPA) program to compute the replacement free energies of binding-site water molecules by post-processing molecular dynamics trajectories obtained from ligand-free protein structure simulation in explicit water. Next, we implemented a distance-dependent scoring term into the DOCK scoring function to take account of the water replacement free energy cost upon ligand binding. We assessed this approach in protein targets containing important binding-site waters, and we demonstrated that our approach is reliable in reproducing the crystal binding geometries of protein-ligand-water complexes, as well as moderately improving the ligand docking enrichment performance. In addition, the SPA program (freely available to academic users upon request) may be applied in identifying hot-spot binding-site residues and structure-based lead optimization.

  • Exhaustive docking and solvated interaction energy scoring: lessons learned from the SAMPL4 challenge.
    Hogues, Hervé and Sulea, Traian and Purisima, Enrico O
    Journal of computer-aided molecular design, 2014
    PMID: 24474162     doi: 10.1007/s10822-014-9715-5
    We continued prospective assessments of the Wilma-solvated interaction energy (SIE) platform for pose prediction, binding affinity prediction, and virtual screening on the challenging SAMPL4 data sets including the HIV-integrase inhibitor and two host-guest systems. New features of the docking algorithm and scoring function are tested here prospectively for the first time. Wilma-SIE provides good correlations with actual binding affinities over a wide range of binding affinities that includes strong binders as in the case of SAMPL4 host-guest systems. Absolute binding affinities are also reproduced with appropriate training of the scoring function on available data sets or from comparative estimation of the change in target's vibrational entropy. Even when binding modes are known, SIE predictions lack correlation with experimental affinities within dynamic ranges below 2 kcal/mol as in the case of HIV-integrase ligands, but they correctly signaled the narrowness of the dynamic range. Using a common protein structure for all ligands can reduce the noise, while incorporating a more sophisticated solvation treatment improves absolute predictions. The HIV-integrase virtual screening data set consists of promiscuous weak binders with relatively high flexibility and thus it falls outside of the applicability domain of the Wilma-SIE docking platform. Despite these difficulties, unbiased docking around three known binding sites of the enzyme resulted in over a third of ligands being docked within 2 Å from their actual poses and over half of the ligands docked in the correct site, leading to better-than-random virtual screening results.

  • Assessing protein-ligand docking for the binding of organometallic compounds to proteins.
    Ortega-Carrasco, Elisabeth and Lledós, Agusti and Maréchal, Jean-Didier
    Journal of computational chemistry, 2014, 35(3), 192-198
    PMID: 24375319     doi: 10.1002/jcc.23472
    Organometallic compounds are increasingly used as molecular scaffolds in drug development projects; their structural and electronic properties offer novel opportunities in protein-ligand complementarities. Interestingly, while protein-ligand docking has long been a spearhead in computer-assisted drug design, neither benchmarking nor optimization has been done for its use with organometallic compounds. Pursuing our efforts to model metal-mediated recognition processes, we herein present a systematic study of the capabilities of the program GOLD to predict the interactions of proteins with organometallic compounds. The study focuses on inert systems for which no alteration of the first coordination sphere of the metal occurs upon binding. Several scaffolds are used as test systems with different docking schemes and scoring functions. We conclude that ChemScore is the most robust scoring function, with ASP and ChemPLP also providing good results and GoldScore slightly underperforming. This study shows that current state-of-the-art protein-ligand docking techniques are reliable for the docking of inert organometallic compounds binding to proteins.

  • Importance of ligand conformational energies in carbohydrate docking: Sorting the wheat from the chaff.
    Nivedha, Anita K and Makeneni, Spandana and Foley, Bethany Lachele and Tessier, Matthew B and Woods, Robert J
    Journal of computational chemistry, 2014, 35(7), 526-539
    PMID: 24375430     doi: 10.1002/jcc.23517
    Docking algorithms that aim to be applicable to a broad range of ligands suffer reduced accuracy because they are unable to incorporate ligand-specific conformational energies. Here, we develop a set of Carbohydrate Intrinsic (CHI) energy functions that quantify the conformational properties of oligosaccharides, based on the values of their glycosidic torsion angles. The relative energies predicted by the CHI energy functions mirror the conformational distributions of glycosidic linkages determined from a survey of oligosaccharide-protein complexes in the protein data bank. Addition of CHI energies to the standard docking scores in Autodock 3, 4.2, and Vina consistently improves pose ranking of oligosaccharides docked to a set of anticarbohydrate antibodies. The CHI energy functions are also independent of docking algorithm, and with minor modifications, may be incorporated into both theoretical modeling methods, and experimental NMR or X-ray structure refinement programs.
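
    A minimal sketch of the rescoring idea described in the abstract above: a ligand-specific conformational energy is added to each pose's docking score, and the poses are re-ranked by the total. The `Pose` type, the `rerank` function, and every numeric value are hypothetical illustrations; they are not the authors' CHI energy functions or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    name: str
    docking_score: float  # kcal/mol, lower is better (hypothetical values)
    conf_energy: float    # kcal/mol, conformational penalty (hypothetical values)


def total_score(pose: Pose) -> float:
    """Docking score plus the ligand conformational energy term."""
    return pose.docking_score + pose.conf_energy


def rerank(poses: list[Pose]) -> list[Pose]:
    """Return poses sorted from best (lowest total) to worst."""
    return sorted(poses, key=total_score)


# Made-up example: pose_a has the best raw docking score but a strained
# ligand conformation, so it drops to last place after rescoring.
poses = [
    Pose("pose_a", -8.1, 4.0),
    Pose("pose_b", -7.6, 0.5),
    Pose("pose_c", -7.9, 2.2),
]
ranked = rerank(poses)
print([p.name for p in ranked])
```

    Keeping the conformational term separate from the docking score mirrors the abstract's remark that such energies are independent of the docking algorithm.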

  • istar: a web platform for large-scale protein-ligand docking.
    Li, Hongjian and Leung, Kwong-Sak and Ballester, Pedro J and Wong, Man-Hon
    PloS one, 2014, 9(1), e85678
    PMID: 24475049     doi: 10.1371/journal.pone.0085678
    Protein-ligand docking is a key computational method in the design of starting points for the drug discovery process. We are motivated by the desire to automate large-scale docking using our popular docking engine idock and thus have developed a publicly-accessible web platform called istar. Without tedious software installation, users can submit jobs using our website. Our istar website supports 1) filtering ligands by desired molecular properties and previewing the number of ligands to dock, 2) monitoring job progress in real time, and 3) visualizing ligand conformations and outputting free energy and ligand efficiency predicted by idock, binding affinity predicted by RF-Score, putative hydrogen bonds, and supplier information for easy purchase, three useful features commonly lacked on other online docking platforms like DOCK Blaster or iScreen. We have collected 17,224,424 ligands from the All Clean subset of the ZINC database, and revamped our docking engine idock to version 2.0, further improving docking speed and accuracy, and integrating RF-Score as an alternative rescoring function. To compare idock 2.0 with the state-of-the-art AutoDock Vina 1.1.2, we have carried out a rescoring benchmark and a redocking benchmark on the 2,897 and 343 protein-ligand complexes of PDBbind v2012 refined set and CSAR NRC HiQ Set 24Sept2010 respectively, and an execution time benchmark on 12 diverse proteins and 3,000 ligands of different molecular weight. Results show that, under various scenarios, idock achieves comparable success rates while outperforming AutoDock Vina in terms of docking speed by at least 8.69 times and at most 37.51 times. When evaluated on the PDBbind v2012 core set, our istar platform combining with RF-Score manages to reproduce Pearson's correlation coefficient and Spearman's correlation coefficient of as high as 0.855 and 0.859 respectively between the experimental binding affinity and the predicted binding affinity of the docked conformation. 
    istar is freely available.

  • iview: an interactive WebGL visualizer for protein-ligand complex.
    Li, Hongjian and Leung, Kwong-Sak and Nakane, Takanori and Wong, Man-Hon
    BMC bioinformatics, 2014, 15, 56
    PMID: 24564583     doi: 10.1186/1471-2105-15-56
    BACKGROUND: Visualization of protein-ligand complex plays an important role in elaborating protein-ligand interactions and aiding novel drug design. Most existing web visualizers either rely on slow software rendering, or lack virtual reality support. The vital feature of macromolecular surface construction is also unavailable.


  • DockoMatic 2.0: High Throughput Inverse Virtual Screening and Homology Modeling
    Bullock, Casey and Cornia, Nic and Jacob, Reed and Remm, Andrew and Peavey, Thomas and Weekes, Ken and Mallory, Chris and Oxford, Julia T and McDougal, Owen M. and Andersen, Timothy L
    Journal of chemical information and modeling, 2013, 53(8), 2161-2170
    PMID: 23808933     doi: 10.1021/ci400047w
    DockoMatic is a free and open source application that unifies a suite of software programs within a user-friendly graphical user interface (GUI) to facilitate molecular docking experiments. Here we describe the release of DockoMatic 2.0; significant software advances include the ability to (1) conduct high throughput inverse virtual screening (IVS); (2) construct 3D homology models; and (3) customize the user interface. Users can now efficiently set up, start, and manage IVS experiments through the DockoMatic GUI by specifying receptor(s), ligand(s), grid parameter file(s), and docking engine (either AutoDock or AutoDock Vina). DockoMatic automatically generates the needed experiment input files and output directories and allows the user to manage and monitor job progress. Upon job completion, a summary of results is generated by DockoMatic to facilitate interpretation by the user. DockoMatic functionality has also been expanded to facilitate the construction of 3D protein homology models using the Timely Integrated Modeler (TIM) wizard. The wizard TIM provides an interface that accesses the basic local alignment search tool (BLAST) and MODELER programs and guides the user through the necessary steps to easily and efficiently create 3D homology models for biomacromolecular structures. The DockoMatic GUI can be customized by the user, and the software design makes it relatively easy to integrate additional docking engines, scoring functions, or third party programs. DockoMatic is a free comprehensive molecular docking software program for all levels of scientists in both research and education.

  • Roles for ordered and bulk solvent in ligand recognition and docking in two related cavities.
    Barelier, Sarah and Boyce, Sarah E and Fish, Inbar and Fischer, Marcus and Goodin, David B and Shoichet, Brian K
    PloS one, 2013, 8(7), e69153
    PMID: 23874896     doi: 10.1371/journal.pone.0069153
    A key challenge in structure-based discovery is accounting for modulation of protein-ligand interactions by ordered and bulk solvent. To investigate this, we compared ligand binding to a buried cavity in Cytochrome c Peroxidase (CcP), where affinity is dominated by a single ionic interaction, versus a cavity variant partly opened to solvent by loop deletion. This opening had unexpected effects on ligand orientation, affinity, and ordered water structure. Some ligands lost over ten-fold in affinity and reoriented in the cavity, while others retained their geometries, formed new interactions with water networks, and improved affinity. To test our ability to discover new ligands against this opened site prospectively, a 534,000 fragment library was docked against the open cavity using two models of ligand solvation. Using an older solvation model that prioritized many neutral molecules, three such uncharged docking hits were tested, none of which was observed to bind; these molecules were not highly ranked by the new, context-dependent solvation score. Using this new method, another 15 highly-ranked molecules were tested for binding. In contrast to the previous result, 14 of these bound detectably, with affinities ranging from 8 µM to 2 mM. In crystal structures, four of these new ligands superposed well with the docking predictions but two did not, reflecting unanticipated interactions with newly ordered water molecules. Comparing recognition between this open cavity and its buried analog begins to isolate the roles of ordered solvent in a system that lends itself readily to prospective testing and that may be broadly useful to the community.

  • VinaMPI: Facilitating multiple receptor high-throughput virtual docking on high-performance computers
    Ellingson, Sally R and Smith, Jeremy C and Baudry, Jerome
    Journal of computational chemistry, 2013, 34(25), 2212-2221
    PMID: 23813626     doi: 10.1002/jcc.23367
    The program VinaMPI has been developed to enable massively large virtual drug screens on leadership-class computing resources, using a large number of cores to decrease the time-to-completion of the screen. VinaMPI is a massively parallel Message Passing Interface (MPI) program based on the multithreaded virtual docking program AutoDock Vina, and is used to distribute tasks while multithreading is used to speed up individual docking tasks. VinaMPI uses a distribution scheme in which tasks are evenly distributed to the workers based on the complexity of each task, as defined by the number of rotatable bonds in each chemical compound investigated. VinaMPI efficiently handles multiple proteins in a ligand screen, allowing for high-throughput inverse docking that presents new opportunities for improving the efficiency of the drug discovery pipeline. VinaMPI successfully ran on 84,672 cores with a continual decrease in job completion time with increasing core count. The ratio of the number of tasks in a screening to the number of workers should be at least around 100 in order to have a good load balance and an optimal job completion time. The code is freely available and downloadable. Instructions for downloading and using the code are provided in the Supporting Information.
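
    The complexity-based task distribution mentioned in the abstract above can be illustrated with a small sketch. It uses a greedy "largest task first, least-loaded worker next" heuristic, which is a stand-in chosen for illustration and not necessarily the scheme VinaMPI implements; the `distribute` function, the use of the rotatable-bond count as a linear cost, and the sample values are all assumptions.

```python
import heapq


def distribute(rotatable_bonds, n_workers):
    """Assign task indices to workers so that summed complexity is balanced.

    Greedy heuristic (an assumption, not VinaMPI's actual code): take tasks in
    order of decreasing complexity and give each to the least-loaded worker.
    """
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]
    order = sorted(range(len(rotatable_bonds)),
                   key=lambda i: rotatable_bonds[i], reverse=True)
    for i in order:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(i)
        heapq.heappush(heap, (load + rotatable_bonds[i], worker))
    loads = [sum(rotatable_bonds[i] for i in tasks) for tasks in assignment]
    return assignment, loads


# Hypothetical rotatable-bond counts for eight ligands, split over 3 workers.
bonds = [12, 3, 7, 7, 1, 9, 4, 5]
assignment, loads = distribute(bonds, 3)
print(assignment, loads)
```

    With only a few tasks per worker the loads cannot be made equal, which is consistent with the abstract's advice to keep the task-to-worker ratio high.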

  • Assessment and challenges of ligand docking into comparative models of G-protein coupled receptors.
    Nguyen, Elizabeth Dong and Norn, Christoffer and Frimurer, Thomas M and Meiler, Jens
    PloS one, 2013, 8(7), e67302
    PMID: 23844000     doi: 10.1371/journal.pone.0067302
    The rapidly increasing number of high-resolution X-ray structures of G-protein coupled receptors (GPCRs) creates a unique opportunity to employ comparative modeling and docking to provide valuable insight into the function and ligand binding determinants of novel receptors, to assist in virtual screening and to design and optimize drug candidates. However, low sequence identity between receptors, conformational flexibility, and chemical diversity of ligands present an enormous challenge to molecular modeling approaches. It is our hypothesis that rapid Monte-Carlo sampling of protein backbone and side-chain conformational space with Rosetta can be leveraged to meet this challenge. This study performs unbiased comparative modeling and docking methodologies using 14 distinct high-resolution GPCRs and proposes knowledge-based filtering methods for improvement of sampling performance and identification of correct ligand-receptor interactions. On average, top ranked receptor models built on template structures over 50% sequence identity are within 2.9 Å of the experimental structure, with an average root mean square deviation (RMSD) of 2.2 Å for the transmembrane region and 5 Å for the second extracellular loop. Furthermore, these models are consistently correlated with low Rosetta energy score. To predict their binding modes, ligand conformers of the 14 ligands co-crystalized with the GPCRs were docked against the top ranked comparative models. In contrast to the comparative models themselves, however, it remains difficult to unambiguously identify correct binding modes by score alone. On average, sampling performance was improved by 10^3 fold over random using knowledge-based and energy-based filters. In assessing the applicability of experimental constraints, we found that sampling performance is increased by one order of magnitude for every 10 residues known to contact the ligand. Additionally, in the case of DOR, knowledge of a single specific ligand-protein contact improved sampling efficiency 7 fold. These findings offer specific guidelines which may lead to increased success in determining receptor-ligand complexes.

  • Towards ligand docking including explicit interface water molecules.
    Lemmon, Gordon and Meiler, Jens
    PloS one, 2013, 8(6), e67536
    PMID: 23840735     doi: 10.1371/journal.pone.0067536
    Small molecule docking predicts the interaction of a small molecule ligand with a protein at atomic-detail accuracy including position and conformation of the ligand but also conformational changes of the protein upon ligand binding. While successful in the majority of cases, docking algorithms including RosettaLigand fail in some cases to predict the correct protein/ligand complex structure. In this study we show that simultaneous docking of explicit interface water molecules greatly improves Rosetta's ability to distinguish correct from incorrect ligand poses. This result holds true for both protein-centric water docking wherein waters are located relative to the protein binding site and ligand-centric water docking wherein waters move with the ligand during docking. Protein-centric docking is used to model 99 HIV-1 protease/protease inhibitor structures. We find protease inhibitor placement improving at a ratio of 9:1 when one critical interface water molecule is included in the docking simulation. Ligand-centric docking is applied to 341 structures from the CSAR benchmark of diverse protein/ligand complexes [1]. Across this diverse dataset we see up to 56% recovery of failed docking studies, when waters are included in the docking simulation.

  • Small-molecule ligand docking into comparative models with Rosetta.
    Combs, Steven A and Deluca, Samuel L and Deluca, Stephanie H and Lemmon, Gordon H and Nannemann, David P and Nguyen, Elizabeth D and Willis, Jordan R and Sheehan, Jonathan H and Meiler, Jens
    Nature protocols, 2013, 8(7), 1277-1298
    PMID: 23744289     doi: 10.1038/nprot.2013.074
    Structure-based drug design is frequently used to accelerate the development of small-molecule therapeutics. Although substantial progress has been made in X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, the availability of high-resolution structures is limited owing to the frequent inability to crystallize or obtain sufficient NMR restraints for large or flexible proteins. Computational methods can be used to both predict unknown protein structures and model ligand interactions when experimental data are unavailable. This paper describes a comprehensive and detailed protocol using the Rosetta modeling suite to dock small-molecule ligands into comparative models. In the protocol presented here, we review the comparative modeling process, including sequence alignment, threading and loop building. Next, we cover docking a small-molecule ligand into the protein comparative model. In addition, we discuss criteria that can improve ligand docking into comparative models. Finally, and importantly, we present a strategy for assessing model quality. The entire protocol is presented on a single example selected solely for didactic purposes. The results are therefore not representative and do not replace benchmarks published elsewhere. We also provide an additional tutorial so that the user can gain hands-on experience in using Rosetta. The protocol should take 5-7 h, with additional time allocated for computer generation of models.

  • The MM2QM tool for combining docking, molecular dynamics, molecular mechanics, and quantum mechanics
    Nowosielski, Marcin and Hoffmann, Marcin and Kuron, Aneta and Korycka-Machala, Malgorzata and Dziadek, Jaroslaw
    Journal of computational chemistry, 2013, 34(9), 750-756
    PMID: 23233437     doi: 10.1002/jcc.23192
    The use of the MM2QM tool in a combined docking + molecular dynamics (MD) + molecular mechanics (MM) + quantum mechanical (QM) binding affinity prediction study is presented, and the tool itself is discussed. The system of interest is Mycobacterium tuberculosis (MTB) pantothenate synthetase in complexes with three highly similar sulfonamide inhibitors, for which crystal structures are available. Starting from the structure of MTB pantothenate synthetase in the "open" conformation and following the combined docking + MD + MM + QM procedure, we were able to capture the closing of the enzyme binding pocket and to reproduce the position of the ligands with an average root mean square deviation of 1.6 Å. Protein-ligand interaction energies were reproduced with an average error lower than 10%. The MD part and the importance of protein flexibility are discussed. The presented approach may be useful especially for finding analog inhibitors or improving drug candidates.

  • Docking Challenge: Protein Sampling and Molecular Docking Performance
    Elokely, Khaled M and Doerksen, Robert J
    Journal of chemical information and modeling, 2013
    PMID: 23530568    
    Computational tools are essential in the drug design process, especially in order to take advantage of the increasing numbers of solved X-ray and NMR protein-ligand structures. Nowadays, molecular docking methods are routinely used for prediction of protein-ligand interactions and to aid in selecting potent molecules as a part of virtual screening of large databases. The improvements and advances in computational capacity in the last decade have allowed for further developments in molecular docking algorithms to address more complicated aspects such as protein flexibility. The effects of incorporation of active site water molecules and implicit or explicit solvation of the binding site are other relevant issues to be addressed in the docking procedures. Using the right docking algorithm at the right stage of virtual screening is most important. We report a staged study to address the effects of various aspects of protein flexibility and inclusion of active site water molecules on docking effectiveness to retrieve (and to be able to predict) correct ligand poses and to rank docked ligands in relation to their biological activity, for CHK1, ERK2, LpxC and UPA. We generated multiple conformers for the ligand, and compared different docking algorithms that use a variety of approaches to protein flexibility, including rigid receptor, soft receptor, flexible side chains, induced-fit, and multiple structure algorithms. Docking accuracy varied from 1 to 84%, demonstrating that the choice of method is important.

  • Water PMF for predicting the properties of water molecules in protein binding site
    Zheng, Mingyue and Li, Yanlian and Xiong, Bing and Jiang, Hualiang and Shen, Jingkang
    Journal of computational chemistry, 2013, 34(7), 583-592
    PMID: 23114863     doi: 10.1002/jcc.23170
    Water is an important component in living systems and deserves better understanding in chemistry and biology. However, due to the difficulty of investigating the functions of water in protein structures, it is usually ignored in computational modeling, especially in the field of computer-aided drug design. Here, using the potential of mean force (PMF) approach, we constructed a water PMF (wPMF) based on 3946 non-redundant high resolution crystal structures. The extracted wPMF potential was first used to investigate the structure pattern of water and analyze the residue hydrophilicity. Then, the relationship between wPMF score and the B factor value of crystal waters was studied. It was found that wPMF agrees well with some previously reported experimental observations. In addition, the wPMF score was also tested in parallel with 3D-RISM to measure the ability to retrieve experimentally observed waters, and showed comparable performance but with much less computational cost. Finally, we proposed a grid-based clustering scheme together with a distance weighted wPMF score to further extend wPMF to predict the potential hydration sites of a protein structure. In our tests, this approach predicts hydration sites with an accuracy of about 80% when the calculated score is lower than -4.0. It also allows the assessment of whether or not a given water molecule should be targeted for displacement in ligand design. Overall, the wPMF presented here provides an optional solution to many water related computational modeling problems, some of which can be highly valuable as part of a rational drug design strategy.

  • CovalentDock Cloud: a web server for automated covalent docking.
    Ouyang, Xuchang and Zhou, Shuo and Ge, Zemei and Li, Runtao and Kwoh, Chee Keong
    Nucleic acids research, 2013, 41(W1), W329-W332
    PMID: 23677616     doi: 10.1093/nar/gkt406
    Covalent binding is an important mechanism for many drugs to gain their function. We developed a computational algorithm to model this chemical event and extended it to a web server, the CovalentDock Cloud, to make it accessible directly online without any local installation and configuration. It provides a simple yet user-friendly web interface to perform covalent docking experiments and analysis online. The web server accepts the structures of both the ligand and the receptor uploaded by the user or retrieved from online databases with valid access id. It identifies the potential covalent binding patterns, carries out the covalent docking experiments and provides visualization of the result for user analysis. This web server is free and open to all users.

  • DOLINA - Docking Based on a Local Induced-Fit Algorithm: Application toward Small-Molecule Binding to Nuclear Receptors.
    Smiesko, Martin
    Journal of chemical information and modeling, 2013, 53(6), 1415-1423
    PMID: 23725336     doi: 10.1021/ci400098y
    Docking algorithms allowing for ligand and - to various extent - also protein flexibility are nowadays replacing techniques based on rigid protocols. The algorithm implemented in the Dolina software relies on pharmacophore matching for generating potential ligand poses and treats associated local induced-fit changes by combinatorial rearrangement of side-chains lining the binding site. In Dolina, ligand flexibility is not treated internally, instead a pool of low-energy conformers identified in a conformational search is screened for extended binding-pose candidates. Grouping rearranged residues in sterically independent families and side-chain conformer clustering are employed to achieve efficient use of the computational resources along with a good accuracy of the generated poses. Dolina was applied toward docking of small-molecule ligands to three different nuclear receptor ligand binding domains for which in total 18 high-resolution crystal structures were used as reference. The selected nuclear receptors feature a deeply buried ligand-binding site where local induced-fit is to be expected, particularly for receptor antagonists. For each receptor, a crystal structure with a cocrystallized small steroid ligand (template) was chosen as a target system, to which several synthetic ligands of different sizes were docked. Poses within an RMSD of 2.0 Å from the crystal reference pose were generated in 91% of the cases. In 28%, the pose with the lowest RMSD to the reference pose was ranked as the top one, and in 76% it was ranked among the top five poses. Detailed descriptions of the docking algorithm and observed results are included. Dolina is available free of charge for academic institutions.
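
    Several abstracts in this list report docking accuracy as the fraction of poses within an RMSD cutoff (commonly 2.0 Å) of the crystal pose. The sketch below shows that bookkeeping under simplifying assumptions: both poses list the same atoms in the same order, both are already in the receptor's coordinate frame (no superposition), and ligand symmetry is ignored. The function names and the sample values are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) atoms.

    Assumes identical atom ordering and a shared coordinate frame; no fitting
    and no symmetry correction is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of docked poses whose RMSD to the reference is <= cutoff (in Å)."""
    return sum(r <= cutoff for r in rmsds) / len(rmsds)


# Two-atom toy pose displaced by 2 Å along z relative to the reference.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(reference, docked))
print(success_rate([0.5, 1.9, 2.0, 3.1]))
```

    Whether a pose exactly at the cutoff counts as a success is a convention; the inclusive comparison used here is one possible choice.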

  • Evaluation and Optimization of Virtual Screening Workflows with DEKOIS 2.0 - A Public Library of Challenging Docking Benchmark Sets.
    Bauer, Matthias R and Ibrahim, Tamer M and Vogel, Simon M and Boeckler, Frank M
    Journal of chemical information and modeling, 2013, 53(6), 1447-1462
    PMID: 23705874     doi: 10.1021/ci400115b
    The application of molecular benchmarking sets helps to assess the actual performance of virtual screening (VS) workflows. To improve the efficiency of structure-based VS approaches, the selection and optimization of various parameters can be guided by benchmarking. With the DEKOIS 2.0 library, we aim to further extend and complement the collection of publicly available decoy sets. Based on BindingDB bioactivity data, we provide 81 new and structurally diverse benchmark sets for a wide variety of different target classes. To ensure a meaningful selection of ligands, we address several issues that can be found in bioactivity data. We have improved our previously introduced DEKOIS methodology with enhanced physicochemical matching, now including the consideration of molecular charges, as well as a more sophisticated elimination of latent actives in the decoy set (LADS). We evaluate the docking performance of Glide, GOLD, and AutoDock Vina with our data sets and highlight existing challenges for VS tools. All DEKOIS 2.0 benchmark sets will be made accessible at .

  • Latest developments in molecular docking: 2010-2011 in review.
    Yuriev, Elizabeth and Ramsland, Paul A
    Journal of molecular recognition : JMR, 2013, 26(5), 215-239
    PMID: 23526775     doi: 10.1002/jmr.2266
    The aim of docking is to accurately predict the structure of a ligand within the constraints of a receptor binding site and to correctly estimate the strength of binding. We discuss, in detail, methodological developments that occurred in the docking field in 2010 and 2011, with a particular focus on the more difficult, and sometimes controversial, aspects of this promising computational discipline. The main developments in docking in this period, covered in this review, are receptor flexibility, solvation, fragment docking, postprocessing, docking into homology models, and docking comparisons. Several new, or at least newly invigorated, advances occurred in areas such as nonlinear scoring functions, using machine-learning approaches. This review is strongly focused on docking advances in the context of drug design, specifically in virtual screening and fragment-based drug design. Where appropriate, we refer readers to exemplar case studies.

  • Are predicted protein structures of any value for binding site prediction and virtual ligand screening?
    Skolnick, Jeffrey and Zhou, Hongyi and Gao, Mu
    Current Opinion in Structural Biology, 2013
    PMID: 23415854     doi: 10.1016/
    The recently developed field of ligand homology modeling (LHM) that extends the ideas of protein homology modeling to the prediction of ligand binding sites and for use in virtual ligand screening has emerged as a powerful new approach. Unlike traditional docking methodologies, LHM can be applied to low-to-moderate resolution predicted as well as experimental structures with little if any diminution in performance; thereby enabling ∼75% of an average proteome to have potentially significant virtual screening predictions. In large scale benchmarking, LHM is able to predict off-target ligand binding. Thus, despite the widespread belief to the contrary, low-to-moderate resolution predicted structures have considerable utility for biochemical function prediction.

  • Grid-based molecular footprint comparison method for docking and de novo design: Application to HIVgp41
    Balius, Trent E and Allen, William J and Mukherjee, Sudipto and Rizzo, Robert C
    Journal of computational chemistry, 2013, 34(14), 1226-1240
    PMID: 23436713     doi: 10.1002/jcc.23245
    Scoring functions are a critically important component of computer-aided screening methods for the identification of lead compounds during early stages of drug discovery. Here, we present a new multigrid implementation of the footprint similarity (FPS) scoring function that was recently developed in our laboratory which has proven useful for identification of compounds which bind to a protein on a per-residue basis in a way that resembles a known reference. The grid-based FPS method is much faster than its Cartesian-space counterpart, which makes it computationally tractable for on-the-fly docking, virtual screening, or de novo design. In this work, we establish that: (i) relatively few grids can be used to accurately approximate Cartesian space footprint similarity, (ii) the method yields improved success over the standard DOCK energy function for pose identification across a large test set of experimental co-crystal structures, for crossdocking, and for database enrichment, and (iii) grid-based FPS scoring can be used to tailor construction of new molecules to have specific properties, as demonstrated in a series of test cases targeting the viral protein HIVgp41. The method is available in the program DOCK6.

  • Systematic and efficient side chain optimization for molecular docking using a cheapest-path procedure
    Schumann, Marcel and Armen, Roger S
    Journal of computational chemistry, 2013, 34(14), 1258-1269
    PMID: 23420703     doi: 10.1002/jcc.23251
    Molecular docking of small-molecules is an important procedure for computer-aided drug design. Modeling receptor side chain flexibility is often important or even crucial, as it allows the receptor to adopt new conformations as induced by ligand binding. However, the accurate and efficient incorporation of receptor side chain flexibility has proven to be a challenge due to the huge computational complexity required to adequately address this problem. Here we describe a new docking approach with a very fast, graph-based optimization algorithm for assignment of the near-optimal set of residue rotamers. We extensively validate our approach using the 40 DUD target benchmarks commonly used to assess virtual screening performance and demonstrate a large improvement using the developed side chain optimization over rigid receptor docking (average ROC AUC of 0.693 vs. 0.623). Compared to numerous benchmarks, the overall performance is better than nearly all other commonly used procedures. Furthermore, we provide a detailed analysis of the level of receptor flexibility observed in docking results for different classes of residues and elucidate potential avenues for further improvement.

  • Automated Large-Scale File Preparation, Docking, and Scoring: Evaluation of ITScore and STScore Using the 2012 Community Structure-Activity Resource Benchmark
    Grinter, Sam Z and Yan, Chengfei and Huang, Sheng-You and Jiang, Lin and Zou, Xiaoqin
    Journal of chemical information and modeling, 2013, 53(8), 1905-1914
    PMID: 23656179     doi: 10.1021/ci400045v
    In this study, we use the recently released 2012 Community Structure-Activity Resource (CSAR) data set to evaluate two knowledge-based scoring functions, ITScore and STScore, and a simple force-field-based potential (VDWScore). The CSAR data set contains 757 compounds, most with known affinities, and 57 crystal structures. With the help of the script files for docking preparation, we use the full CSAR data set to evaluate the performances of the scoring functions on binding affinity prediction and active/inactive compound discrimination. The CSAR subset that includes crystal structures is used as well, to evaluate the performances of the scoring functions on binding mode and affinity predictions. Within this structure subset, we investigate the importance of accurate ligand and protein conformational sampling and find that the binding affinity predictions are less sensitive to non-native ligand and protein conformations than the binding mode predictions. We also find the full CSAR data set to be more challenging in making binding mode predictions than the subset with structures. The script files used for preparing the CSAR data set for docking, including scripts for canonicalization of the ligand atoms, are offered freely to the academic community.

  • CSAR Data Set Release 2012: Ligands, Affinities, Complexes, and Docking Decoys
    Dunbar, James B and Smith, Richard D and Damm-Ganamet, Kelly L and Ahmed, Aqeel and Esposito, Emilio Xavier and Delproposto, James and Chinnaswamy, Krishnapriya and Kang, You-Na and Kubish, Ginger and Gestwicki, Jason E and Stuckey, Jeanne A and Carlson, Heather A
    Journal of chemical information and modeling, 2013, 53(8), 1842-1852
    PMID: 23617227     doi: 10.1021/ci4000486
    A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) has collected several data sets from industry and added in-house data sets that may be used for this purpose ( ). CSAR has currently obtained data from Abbott, GlaxoSmithKline, and Vertex and is working on obtaining data from several others. Combined with our in-house projects, we are providing a data set consisting of 6 protein targets, 647 compounds with biological affinities, and 82 crystal structures. Multiple congeneric series are available for several targets with a few representative crystal structures of each of the series. These series generally contain a few inactive compounds, usually not available in the literature, to provide an upper bound to the affinity range. The affinity ranges are typically 3-4 orders of magnitude per series. For our in-house projects, we have had compounds synthesized for biological testing. Affinities were measured by Thermofluor, Octet RED, and isothermal titration calorimetry for the most soluble. This allows the direct comparison of the biological affinities for those compounds, providing a measure of the variance in the experimental affinity. It appears that there can be considerable variance in the absolute value of the affinity, making the prediction of the absolute value ill-defined. However, the relative rankings within the methods are much better, and this fits with the observation that predicting relative ranking is a more tractable problem computationally. For those in-house compounds, we also have measured the following physical properties: logD, logP, thermodynamic solubility, and pKa. This data set also provides a substantial decoy set for each target consisting of diverse conformations covering the entire active site for all of the 58 CSAR-quality crystal structures. 
    The CSAR data sets (CSAR-NRC HiQ and the 2012 release) provide substantial, publically available, curated data sets for use in parametrizing and validating docking and scoring methods.

  • Investigation on the Effect of Key Water Molecules on Docking Performance in CSARdock Exercise
    Kumar, Ashutosh and Zhang, Kam Y J
    Journal of chemical information and modeling, 2013, 53(8), 1880-1892
    PMID: 23617355     doi: 10.1021/ci400052w
    Water molecules are routinely included in molecular docking methods and protocols because of their important role in mediating ligand protein interactions. However, it is still unclear that the inclusion of explicit water molecules improves docking accuracy. To explore the effect of including key water molecules on docking accuracy and performance, we participated in the CSARdock 2011 benchmark exercise. This exercise provides a valuable opportunity for researchers to test their docking programs, methods, and protocols in a blind testing environment. The benchmark exercise and its analysis presented in this paper showed that the performance of current docking programs can be improved by incorporating carefully selected water molecules. Our study showed that water mapping calculations can be used to select key water molecules from experimentally identified water positions for molecular dockings. We have observed that inclusion of all binding site water molecules led to reduced performance and erroneous results. Moreover, an overall improvement in binding pose prediction was achieved when computationally selected water molecules are included during docking simulations. The improvement in the docking performance by including water molecules also depends on protein system, chemical class of ligand, docking method, and scoring function.

  • Docking-Based Virtual Screening of Covalently Binding Ligands: An Orthogonal Lead Discovery Approach
    Schröder, Jörg and Klinger, Anette and Oellien, Frank and Marhofer, Richard J and Duszenko, Michael and Selzer, Paul M
    Journal of medicinal chemistry, 2013, 56(4), 1478-1490
    PMID: 23350811    
    In pharmaceutical industry, lead discovery strategies and screening collections have been predominantly tailored to discover compounds that modulate target proteins through noncovalent interactions. Conversely, covalent linkage formation is an important mechanism for a quantity of successful drugs in the market, which are discovered in most cases by hindsight instead of systematical design. In this article, the implementation of a docking-based virtual screening workflow for the retrieval of covalent binders is presented considering human cathepsin K as a test case. By use of the docking conditions that led to the best enrichment of known actives, 44 candidate compounds with unknown activity on cathepsin K were finally selected for experimental evaluation. The most potent inhibitor, 4-(N-phenylanilino)-6-pyrrolidin-1-yl-1,3,5-triazine-2-carbonitrile (CP243522), showed a K(i) of 21 nM and was confirmed to have a covalent reversible mechanism of inhibition. The presented approach will have great potential in cases where covalent inhibition is the desired drug discovery strategy.

  • AutoMap: A tool for analyzing protein-ligand recognition using multiple ligand binding modes.
    Agostino, Mark and Mancera, Ricardo L and Ramsland, Paul A and Yuriev, Elizabeth
    Journal of molecular graphics & modelling, 2013, 40C, 80-90
    PMID: 23376613     doi: 10.1016/j.jmgm.2013.01.001
    Prediction of the protein residues most likely to be involved in ligand recognition is of substantial value in structure-based drug design. Considering multiple ligand binding modes is of potential relevance to studying ligand recognition, but is generally ignored by currently available techniques. We have previously presented the site mapping technique, which considers multiple ligand binding modes in its analysis of protein-ligand recognition. AutoMap is a partially automated implementation of our previously developed site mapping procedure. It consists of a series of Perl scripts that utilize the output of molecular docking to generate "site maps" of a protein binding site. AutoMap determines the hydrogen bonding and van der Waals interactions taking place between a target protein and each pose of a ligand ensemble. It tallies these interactions according to the protein residues with which they occur, then normalizes the tallies and maps these to the surface of the protein. The residues involved in interactions are selected according to specific cutoffs. The procedure has been demonstrated to perform well in studying carbohydrate-protein and peptide-antibody recognition. An automated procedure to optimize cutoff selection is demonstrated to rapidly identify the appropriate cutoffs for these previously studied systems. The prediction of key ligand binding residues is compared between AutoMap using automatically optimized cutoffs, AutoMap using a previously selected cutoff, the top ranked pose from docking and the predictions supplied by FTMap. AutoMap using automatically optimized cutoffs is demonstrated to provide improved predictions, compared to other methods, in a set of immunologically relevant test cases. The automated implementation of the site mapping technique provides the opportunity for rapid optimization and deployment of the technique for investigating a broad range of protein-ligand systems.

  • Multiple structures for virtual ligand screening: defining binding site properties-based criteria to optimize the selection of the query.
    Ben Nasr, Nesrine and Guillemain, Hélène and Lagarde, Nathalie and Zagury, Jean-François and Montes, Matthieu
    Journal of chemical information and modeling, 2013, 53(2), 293-311
    PMID: 23312043    
    Virtual ligand screening is an integral part of the modern drug discovery process. Traditional ligand-based, virtual screening approaches are fast but require a set of structurally diverse ligands known to bind to the target. Traditional structure-based approaches require high-resolution target protein structures and are computationally demanding. In contrast, the recently developed threading/structure-based FINDSITE-based approaches have the advantage that they are as fast as traditional ligand-based approaches and yet overcome the limitations of traditional ligand- or structure-based approaches. These new methods can use predicted low-resolution structures and infer the likelihood of a ligand binding to a target by utilizing ligand information excised from the target's remote or close homologous proteins and/or libraries of ligand binding databases. Here, we develop an improved version of FINDSITE, FINDSITEfilt, that filters out false positive ligands in threading identified templates by a better binding site detection procedure that includes information about the binding site amino acid similarity. We then combine FINDSITEfilt with FINDSITEX that uses publicly available binding databases ChEMBL and DrugBank for virtual ligand screening. The combined approach, FINDSITEcomb, is compared to two traditional docking methods, AUTODOCK Vina and DOCK 6, on the DUD benchmark set. It is shown to be significantly better in terms of enrichment factor, dependence on target structure quality, and speed. FINDSITEcomb is then tested for virtual ligand screening on a large set of 3576 generic targets from the DrugBank database as well as a set of 168 Human GPCRs. Excluding close homologues, FINDSITEcomb gives an average enrichment factor of 52.1 for generic targets and 22.3 for GPCRs within the top 1% of the screened compound library. Around 65% of the targets have better than random enrichment factors. 
    The performance is insensitive to target structure quality, as long as it has a TM-score ≥ 0.4 to native. Thus, FINDSITEcomb makes the screening of millions of compounds across entire proteomes feasible. The FINDSITEcomb web service is freely available for academic users at

  • CovalentDock: Automated covalent docking with parameterized covalent linkage energy estimation and molecular geometry constraints
    Ouyang, Xuchang and Zhou, Shuo and Su, Chinh Tran To and Ge, Zemei and Li, Runtao and Kwoh, Chee Keong
    Journal of computational chemistry, 2013, 34(4), 326-336
    PMID: 23034731     doi: 10.1002/jcc.23136
    Covalent linkage formation is a very important mechanism for many covalent drugs to work. However, partly due to the limitations of proper computational tools for covalent docking, most covalent drugs are not discovered systematically. In this article, we present a new covalent docking package, the CovalentDock, built on the top of the source code of Autodock. We developed an empirical model of free energy change estimation for covalent linkage formation, which is compatible with existing scoring functions used in docking, while handling the molecular geometry constrains of the covalent linkage with special atom types and directional grid maps. Integrated preparation scripts are also written for the automation of the whole covalent docking workflow. The result tested on existing crystal structures with covalent linkage shows that CovalentDock can reproduce the native covalent complexes with significant improved accuracy when compared with the default covalent docking method in Autodock. Experiments also suggest that CovalentDock is capable of covalent virtual screening with satisfactory enrichment performance. In addition, the investigation on the results also shows that the chirality and target selectivity along with the molecular geometry constrains are well preserved by CovalentDock, showing great capability of this method in the application for covalent drug discovery.

  • Consensus Docking: Improving the Reliability of Docking in a Virtual Screening Context
    Houston, Douglas R and Walkinshaw, Malcolm D
    Journal of chemical information and modeling, 2013, 53(2), 384-390
    PMID: 23351099    
    Structure-based virtual screening relies on scoring the predicted binding modes of compounds docked into the target. Because the accuracy of this scoring relies on the accuracy of the docking, methods that increase docking accuracy are valuable. Here, we present a relatively straightforward method for improving the probability of identifying accurately docked poses. The method is similar in concept to consensus scoring schemes, which have been shown to increase ranking power and thus hit rates, but combines information about predicted binding modes rather than predicted binding affinities. The pose prediction success rate of each docking program alone was found in this trial to be 55% for Autodock, 58% for DOCK, and 64% for Vina. By using more than one docking program to predict the binding pose, correct poses were identified in 82% or more of cases, a significant improvement. In a virtual screen, these more reliably posed compounds can be preferentially advanced to subsequent scoring stages to improve hit rates. Consensus docking can be easily introduced into established structure-based virtual screening methodologies.
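
    The pose-agreement idea in this abstract can be sketched as follows. This is only an illustration, not the authors' implementation: the function names, the 2.0 Å cutoff, the toy coordinates, and the assumption that every program reports atoms in the same order and in the same receptor frame are all hypothetical choices made here.

```python
import math
from itertools import combinations


def rmsd(pose_a, pose_b):
    """RMSD between two poses given as lists of (x, y, z) tuples.

    No superposition is performed: docked poses are assumed to share
    the receptor coordinate frame and the same atom ordering.
    """
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def consensus_pairs(top_poses, cutoff=2.0):
    """Return program-name pairs whose top poses agree within cutoff (in Å).

    The cutoff value is an assumption for this sketch.
    """
    return [
        (first, second)
        for first, second in combinations(sorted(top_poses), 2)
        if rmsd(top_poses[first], top_poses[second]) <= cutoff
    ]


# Toy two-atom ligand; the coordinates are made up for illustration.
top_poses = {
    "autodock": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "dock": [(0.3, 0.0, 0.0), (1.8, 0.0, 0.0)],
    "vina": [(6.0, 0.0, 0.0), (7.5, 0.0, 0.0)],
}
agreeing = consensus_pairs(top_poses)
```

    In a screening workflow along these lines, compounds with at least one agreeing pair would be the ones passed on to the scoring stage.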

  • Use of Experimental Design To Optimize Docking Performance: The Case of LiGenDock, the Docking Module of Ligen, a New De Novo Design Program
    Beato, Claudia and Beccari, Andrea R and Cavazzoni, Carlo and Lorenzi, Simone and Costantino, Gabriele
    Journal of chemical information and modeling, 2013, 53(6), 1503-1517
    PMID: 23590204     doi: 10.1021/ci400079k
    On route toward a novel de novo design program, called LiGen, we developed a docking program, LiGenDock, based on pharmacophore models of binding sites, including a non-enumerative docking algorithm. In this paper, we present the functionalities of LiGenDock and its accompanying module LiGenPocket, aimed at the binding site analysis and structure-based pharmacophore definition. We also report the optimization procedure we have carried out to improve the cognate docking and virtual screening performance of LiGenDock. In particular, we applied the design of experiments (DoE) methodology to screen the set of user-adjustable parameters to identify those having the largest influence on the accuracy of the results (which ensure the best performance in pose prediction and in virtual screening approaches) and then to choose their optimal values. The results are also compared with those obtained by two popular docking programs, namely, Glide and AutoDock for pose prediction, and Glide and DOCK6 for Virtual Screening.

  • Identifying ligand binding sites and poses using GPU-accelerated Hamiltonian replica exchange molecular dynamics.
    Wang, Kai and Chodera, John D and Yang, Yanzhi and Shirts, Michael R
    Journal of computer-aided molecular design, 2013, 27(12), 989-1007
    PMID: 24297454     doi: 10.1007/s10822-013-9689-8
    We present a method to identify small molecule ligand binding sites and poses within a given protein crystal structure using GPU-accelerated Hamiltonian replica exchange molecular dynamics simulations. The Hamiltonians used vary from the physical end state of protein interacting with the ligand to an unphysical end state where the ligand does not interact with the protein. As replicas explore the space of Hamiltonians interpolating between these states, the ligand can rapidly escape local minima and explore potential binding sites. Geometric restraints keep the ligands from leaving the vicinity of the protein and an alchemical pathway designed to increase phase space overlap between intermediates ensures good mixing. Because of the rigorous statistical mechanical nature of the Hamiltonian exchange framework, we can also extract binding free energy estimates for all putative binding sites. We present results of this methodology applied to the T4 lysozyme L99A model system for three known ligands and one non-binder as a control, using an implicit solvent. We find that our methodology identifies known crystallographic binding sites consistently and accurately for the small number of ligands considered here and gives free energies consistent with experiment. We are also able to analyze the contribution of individual binding sites to the overall binding affinity. Our methodology points to near term potential applications in early-stage structure-guided drug discovery.

  • Peptide docking and structure-based characterization of peptide binding: from knowledge to know-how.
    London, Nir and Raveh, Barak and Schueler-Furman, Ora
    Current opinion in structural biology, 2013, 23(6), 894-902
    PMID: 24138780     doi: 10.1016/
    Peptide-mediated interactions are gaining increased attention due to their predominant roles in the many regulatory processes that involve dynamic interactions between proteins. The structures of such interactions provide an excellent starting point for their characterization and manipulation, and can provide leads for targeted inhibitor design. The relatively few experimentally determined structures of peptide-protein complexes can be complemented with an outburst of modeling approaches that have been introduced in recent years, with increasing accuracy and applicability to ever more systems. We review different methods to address the considerable challenges in modeling the binding of a short yet highly flexible peptide to its partner. These methods apply an array of sampling strategies and draw from a recent amassing of knowledge about the biophysical nature of peptide-protein interactions. We elaborate on applications of these structure-based approaches and in particular on the characterization of peptide binding specificity to different peptide-binding domains and enzymes. Such applications can identify new biological targets and thus complement our current view of protein-protein interactions in living organisms. Accurate peptide-protein docking is of particular importance in the light of increased appreciation of the crucial functional roles of disordered regions and the many linear binding motifs embedded within.

  • Message passing interface and multithreading hybrid for parallel molecular docking of large databases on petascale high performance computing machines
    Zhang, Xiaohua and Wong, Sergio E and Lightstone, Felice C
    Journal of computational chemistry, 2013, 34(11), 915-927
    PMID: 23345155     doi: 10.1002/jcc.23214
    A mixed parallel scheme that combines message passing interface (MPI) and multithreading was implemented in the AutoDock Vina molecular docking program. The resulting program, named VinaLC, was tested on the petascale high performance computing (HPC) machines at Lawrence Livermore National Laboratory. To exploit the typical cluster-type supercomputers, thousands of docking calculations were dispatched by the master process to run simultaneously on thousands of slave processes, where each docking calculation takes one slave process on one node, and within the node each docking calculation runs via multithreading on multiple CPU cores and shared memory. Input and output of the program and the data handling within the program were carefully designed to deal with large databases and ultimately achieve HPC on a large number of CPU cores. Parallel performance analysis of the VinaLC program shows that the code scales up to more than 15K CPUs with a very low overhead cost of 3.94%. One million flexible compound docking calculations took only 1.4 h to finish on about 15K CPUs. The docking accuracy of VinaLC has been validated against the DUD data set by the re-docking of X-ray ligands and an enrichment study, 64.4% of the top scoring poses have RMSD values under 2.0 Å. The program has been demonstrated to have good enrichment performance on 70% of the targets in the DUD data set. An analysis of the enrichment factors calculated at various percentages of the screening database indicates VinaLC has very good early recovery of actives.

  • Characterizing Binding of Small Molecules. II. Evaluating the Potency of Small Molecules to Combat Resistance Based on Docking Structures
    Ding, Bo and Li, Nan and Wang, Wei
    Journal of chemical information and modeling, 2013
    PMID: 23570305    
    Drug resistance severely erodes the efficacy of therapeutic treatments for many diseases. Assessing the potency of a drug lead to combat resistance is no doubt critical for designing new drugs or new therapeutic combinations. Virtual screening is often the first step in drug discovery and a challenging problem is to accurately predict the resistant profile of an inhibitor based on the docking structures. Using a well studied system HIV-1 protease, we have illustrated the success of a computational method called MIEC-SVM on tackling this problem. We computed molecular interaction energy components (MIECs) between the ligand and the protease residues to characterize the docking poses, which were input to support vector machine (SVM) to distinguish resistant from nonresistant mutants. More importantly, the method is able to predict resistant profiles for new drugs based on the docking structures as indicated by its satisfactory performance in leave-one-drug-out and leave-drug/mutants-out tests. Therefore, the MIEC-SVM method can also facilitate designing effective therapeutic combinations by combining drugs with complementary resistant profiles.

  • Accounting for Conformational Variability in Protein-Ligand Docking with NMR-Guided Rescoring.
    Skjærven, Lars and Codutti, Luca and Angelini, Andrea and Grimaldi, Manuela and Latek, Dorota and Monecke, Peter and Dreyer, Matthias K and Carlomagno, Teresa
    Journal of the American Chemical Society, 2013, 135(15), 5819-5827
    PMID: 23565800     doi: 10.1021/ja4007468
    A key component to success in structure-based drug design is reliable information on protein-ligand interactions. Recent development in NMR techniques has accelerated this process by overcoming some of the limitations of X-ray crystallography and computational protein-ligand docking. In this work we present a new scoring protocol based on NMR-derived interligand INPHARMA NOEs to guide the selection of computationally generated docking modes. We demonstrate the performance in a range of scenarios, encompassing traditionally difficult cases such as docking to homology models and ligand dependent domain rearrangements. Ambiguities associated with sparse experimental information are lifted by searching a consensus solution based on simultaneously fitting multiple ligand pairs. This study provides a previously unexplored integration between molecular modeling and experimental data, in which interligand NOEs represent the key element in the rescoring algorithm. The presented protocol should be widely applicable for protein-ligand docking also in a different context from drug design and highlights the important role of NMR-based approaches to describe intermolecular ligand-receptor interactions.

  • Automated docking with protein flexibility in the design of femtomolar "click chemistry" inhibitors of acetylcholinesterase.
    Morris, Garrett M and Green, Luke G and Radić, Zoran and Taylor, Palmer and Sharpless, K Barry and Olson, Arthur J and Grynszpan, Flavio
    Journal of chemical information and modeling, 2013, 53(4), 898-906
    PMID: 23451944     doi: 10.1021/ci300545a
    The use of computer-aided structure-based drug design prior to synthesis has proven to be generally valuable in suggesting improved binding analogues of existing ligands. (1) Here we describe the application of the program AutoDock (2) to the design of a focused library that was used in the "click chemistry in-situ" generation of the most potent noncovalent inhibitor of the native enzyme acetylcholinesterase (AChE) yet developed (Kd

  • Optimization of molecular docking scores with support vector rank regression
    Wang, Wei and He, Wanlin and Zhou, Xi and Chen, Xin
    Proteins, 2013, n/a-n/a
    PMID: 23504920     doi: 10.1002/prot.24282
    This work introduces the support vector rank regression (SVRR) algorithm for the optimization of molecular docking scores. Seven original docking scores reported by two docking software were integrated by the SVRR algorithm. The resulting SVRR scores showed an average of 12.1% improvement (59.5% to 66.7%) in binding conformation prediction tests to rank the correctly computed conformation in the first place, along with 16.7% RMSD improvement (2.5414 Å vs. 2.1162 Å) for the top ranked conformations. In compound library screening tests, an average of 46.3% improvement (18.2% to 26.6%) was also observed to rank the correct ligand in the first place. Furthermore, it was shown that SVRR scores trained with different example datasets, using different training strategies, all exhibited exceedingly consistent accuracies, suggesting that the SVRR algorithm is highly robust and generalizable. In contrast, using the same training datasets, traditional support vector classification and regression algorithms failed to comparably improve the accuracy of library screening and conformation prediction. These results suggested that, with additional features to indicate the comparative fitness between computed binding conformations, the SVRR algorithm holds the potential to create a new category of more accurate integrative docking scores. Proteins 2013.

  • An Automated Docking Protocol for hERG Channel Blockers
    Di Martino, Giovanni Paolo and Masetti, Matteo and Ceccarini, Luisa and Cavalli, Andrea and Recanatini, Maurizio
    Journal of chemical information and modeling, 2013, 53(1), 159-175
    PMID: 23259741    
    A docking protocol aimed at obtaining a consistent qualitative and quantitative picture of binding for a series of hERG channel blockers is presented. To overcome the limitations experienced by standard procedures when docking blockers at hERG binding site, we designed a strategy that explicitly takes into account the conformations of the channel, their possible intrinsic symmetry, and the role played by the configurational entropy of ligands. The protocol was developed on a series of congeneric sertindole derivatives, allowing us to satisfactorily explain the structure-activity relationships for this set of blockers. In addition, we show that the performance of structure-based models relying on multiple-receptor conformations statistically increases when the protein conformations are chosen in such a way as to capture relevant structural features at the binding site. The protocol was then successfully applied to a series of structurally unrelated blockers.

  • S4MPLE - Sampler For Multiple Protein-Ligand Entities: Simultaneous Docking of Several Entities
    Hoffer, Laurent and Horvath, Dragos
    Journal of chemical information and modeling, 2013, 53(1), 88-102
    PMID: 23215156    
    S4MPLE is a conformational sampling tool, based on a hybrid genetic algorithm, simulating one (conformer enumeration) or more molecules (docking). Energy calculations are based on the AMBER force field [Cornell et al. J. Am. Chem. Soc. 1995, 117, 5179.] for biological macromolecules and its generalized version GAFF [Wang et al. J. Comput. Chem. 2004, 25, 1157.] for ligands. This paper describes more advanced, specific applications of S4MPLE to problems more complex than classical redocking of drug-like compounds [Hoffer et al. J. Mol. Graphics Modell. 2012, submitted for publication.]. Here, simultaneous docking of multiple entities is addressed in two different important contexts. First, simultaneous docking of two fragment-like ligands was attempted, as such ternary complexes are the basis of fragment-based drug design by linkage of the independent binders. As a preliminary, the capacity of S4MPLE to dock fragment-like compounds has been assessed, since this class of small probes used in fragment-based drug design covers a different chemical space than drug-like molecules. Herein reported success rates from fragments redocking are as good as classical benchmarking results on drug-like compounds (Astex Diverse Set [Hartshorn et al. J. Med. Chem. 2007, 50, 726.]). Then, S4MPLE is successfully challenged to predict locations of fragments involved in ternary complexes by means of multientity docking. Second, the key problem of predicting water-mediated interaction is addressed by considering explicit water molecules as additional entities to be docked in the presence of the "main" ligand. Blind prediction of solvent molecule positions, reproducing relevant ligand-water-site mediated interactions, is achieved in 76% cases over saved poses. S4MPLE was also successful to predict crystallographic water displacement by a therefore tailored functional group in the optimized ligand. However, water localization is an extremely delicate issue in terms of weighing of electrostatic and desolvation terms and also introduces a significant increase of required sampling efforts. Yet, the herein reported results - not making use of massively parallel deployment of the software - are very encouraging.

  • Incorporating Backbone Flexibility in MedusaDock Improves Ligand-Binding Pose Prediction in the CSAR2011 Docking Benchmark
    Ding, Feng and Dokholyan, Nikolay V
    Journal of chemical information and modeling, 2013, 0(0), null
    PMID: 23237273    
    Solution of the structures of ligand-receptor complexes via computational docking is an integral step in many structural modeling efforts as well as in rational drug discovery. A major challenge in ligand-receptor docking is the modeling of both receptor and ligand flexibilities in order to capture receptor conformational changes induced by ligand binding. In the molecular docking suite MedusaDock, both ligand and receptor side chain flexibilities are modeled simultaneously with sets of discrete rotamers, where the ligand rotamer library is generated "on the fly" in a stochastic manner. Here, we introduce backbone flexibility into MedusaDock by implementing ensemble docking in a sequential manner for a set of distinct receptor backbone conformations. We generate corresponding backbone ensembles to capture backbone changes upon binding to different ligands, as observed experimentally. We develop a simple clustering and ranking approach to select the top poses as blind predictions. We applied our method in the CSAR2011 benchmark exercise. In 28 out of 35 cases (80%) where the ligand-receptor complex structures were released, we were able to predict near-native poses (<2.5 Å RMSD), the highest success rate reported for CSAR2011. This result highlights the importance of modeling receptor backbone flexibility to the accurate docking of ligands to flexible targets. We expect a broad application of our fully flexible docking approach in biological studies as well as in rational drug design.


  • Application of Drug-perturbed Essential Dynamics/Molecular Dynamics (ED/MD) to Virtual Screening and Rational Drug Design.
    Chaudhuri, Rima and Carrillo, Oliver and Laughton, Charles Anthony and Orozco, Modesto
    Journal of chemical theory and computation, 2012, 8(7), 2204-2214
    doi: 10.1021/ct300223c
    We present here the first application of a new algorithm, essential dynamics/molecular dynamics (ED/MD), to the field of small molecule docking. The method uses a previously existing molecular dynamics (MD) ensemble of a protein or protein-drug complex to generate, with a very small computational cost, perturbed ensembles which represent ligand-induced binding site flexibility in a more accurate way than the original trajectory. The use of these perturbed ensembles in a standard docking program leads to superior performance than the same docking procedure using the crystal structure or ensembles obtained from conventional MD simulations as templates. The simplicity and accuracy of the method opens up the possibility of introducing protein flexibility in high-throughput docking experiments.

  • Accessible high-throughput virtual screening molecular docking software for students and educators.
    Jacob, Reed B. and Andersen, Tim and McDougal, Owen M.
    PLoS computational biology, 2012, 8(5), e1002499
    PMID: 22693435     doi: 10.1371/journal.pcbi.1002499
    We survey low cost high-throughput virtual screening (HTVS) computer programs for instructors who wish to demonstrate molecular docking in their courses. Since HTVS programs are a useful adjunct to the time consuming and expensive wet bench experiments necessary to discover new drug therapies, the topic of molecular docking is core to the instruction of biochemistry and molecular biology. The availability of HTVS programs coupled with decreasing costs and advances in computer hardware have made computational approaches to drug discovery possible at institutional and non-profit budgets. This paper focuses on HTVS programs with graphical user interfaces (GUIs) that use either DOCK or AutoDock for the prediction of DockoMatic, PyRx, DockingServer, and MOLA since their utility has been proven by the research community, they are free or affordable, and the programs operate on a range of computer platforms.

  • FRED and HYBRID docking performance on standardized datasets.
    McGann, Mark
    Journal of computer-aided molecular design, 2012, 26(8), 897-906
    PMID: 22669221     doi: 10.1007/s10822-012-9584-8
    The docking performance of the FRED and HYBRID programs are evaluated on two standardized datasets from the Docking and Scoring Symposium of the ACS Spring 2011 national meeting. The evaluation includes cognate docking and virtual screening performance. FRED docks 70 % of the structures to within 2 Å in the cognate docking test. In the virtual screening test, FRED is found to have a mean AUC of 0.75. The HYBRID program uses a modified version of FRED's algorithm that uses both ligand- and structure-based information to dock molecules, which increases its mean AUC to 0.78. HYBRID can also implicitly account for protein flexibility by making use of multiple crystal structures. Using multiple crystal structures improves HYBRID's performance (mean AUC 0.80) with a negligible increase in docking time (~15 %).

  • A Force Field with Discrete Displaceable Waters and Desolvation Entropy for Hydrated Ligand Docking.
    Forli, Stefano and Olson, Arthur J
    Journal of medicinal chemistry, 2012, 55(2), 623-638
    PMID: 22148468     doi: 10.1021/jm2005145
    In modeling ligand-protein interactions, the representation and role of water are of great importance. We introduce a force field and hydration docking method that enables the automated prediction of waters mediating the binding of ligands with target proteins. The method presumes no prior knowledge of the apo or holo protein hydration state and is potentially useful in the process of structure-based drug discovery. The hydration force field accounts for the entropic and enthalpic contributions of discrete waters to ligand binding, improving energy estimation accuracy and docking performance. The force field has been calibrated and validated on a total of 417 complexes (197 training set; 220 test set), then tested in cross-docking experiments, for a total of 1649 ligand-protein complexes evaluated. The method is computationally efficient and was used to model up to 35 waters during docking. The method was implemented and tested using unaltered AutoDock4 with new force field tables.

  • Utilizing Experimental Data for Reducing Ensemble Size in Flexible-Protein Docking.
    Xu, Mengang and Lill, Markus A
    Journal of chemical information and modeling, 2012, 52, 187-198
    PMID: 22146074     doi: 10.1021/ci200428t
    Efficient and sufficient incorporation of protein flexibility into docking is still a challenging task. Docking to an ensemble of protein structures has proven its utility for docking, but using a large ensemble of structures can reduce the efficiency of docking and can increase the number of false positives in virtual screening. In this paper, we describe the application of our new methodology, Limoc, to generate an ensemble of holo-like protein structures in combination with the relaxed complex scheme (RCS), to virtual screening. We describe different schemes to reduce the ensemble of protein structures to increase efficiency and enrichment quality. Utilizing experimental knowledge about actives for a target protein allows the reduction of ensemble members to a minimum of three protein structures, increasing enrichment quality and efficiency simultaneously.

  • Using RosettaLigand for Small Molecule Docking into Comparative Models.
    Kaufmann, Kristian W and Meiler, Jens
    PloS one, 2012, 7(12), e50769
    PMID: 23239984     doi: 10.1371/journal.pone.0050769
    Computational small molecule docking into comparative models of proteins is widely used to query protein function and in the development of small molecule therapeutics. We benchmark RosettaLigand docking into comparative models for nine proteins built during CASP8 that contain ligands. We supplement the study with 21 additional protein/ligand complexes to cover a wider space of chemotypes. During a full docking run in 21 of the 30 cases, RosettaLigand successfully found a native-like binding mode among the top ten scoring binding modes. From the benchmark cases we find that careful template selection based on ligand occupancy provides the best chance of success while overall sequence identity between template and target do not appear to improve results. We also find that binding energy normalized by atom number is often less than -0.4 in native-like binding modes.

  • Rosetta Ligand docking with flexible XML protocols.
    Lemmon, Gordon and Meiler, Jens
    Methods in molecular biology (Clifton, N.J.), 2012, 819, 143-155
    PMID: 22183535     doi: 10.1007/978-1-61779-465-0_10
    RosettaLigand is premiere software for predicting how a protein and a small molecule interact. Benchmark studies demonstrate that 70% of the top scoring RosettaLigand predicted interfaces are within 2 Å RMSD from the crystal structure [1]. The latest release of Rosetta ligand software includes many new features, such as (1) docking of multiple ligands simultaneously, (2) representing ligands as fragments for greater flexibility, (3) redesign of the interface during docking, and (4) an XML script based interface that gives the user full control of the ligand docking protocol.

  • GalaxyDock: Protein-ligand docking with flexible protein side-chains.
    Shin, Woong-Hee and Seok, Chaok
    Journal of chemical information and modeling, 2012, 52(12), 3225-3232
    PMID: 23198780     doi: 10.1021/ci300342z
    An important issue in developing protein-ligand docking methods is how to incorporate receptor flexibility. Consideration of receptor flexibility using an ensemble of pre-compiled receptor conformations or by employing an effectively enlarged binding pocket has been reported to be useful. However, direct consideration of receptor flexibility during energy optimization of the docked conformation has been less popular because of the large increase in computational complexity. In this paper, we present a new docking program called GalaxyDock that accounts for the flexibility of pre-selected receptor side-chains by global optimization of an AutoDock-based energy function trained for flexible side-chain docking. This method was tested on 3 sets of protein-ligand complexes (HIV-PR, LXRβ, cAPK) and a diverse set of 16 proteins that involve side-chain conformational changes upon ligand binding. The cross-docking tests show that the performance of GalaxyDock is higher or comparable to previous flexible docking methods tested on the same sets, increasing the binding conformation prediction accuracy by 10%-60% compared to rigid-receptor docking. This encouraging result suggests that this powerful global energy optimization method may be further extended to incorporate larger magnitudes of receptor flexibility in the future. The program is available at

  • Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data.
    Barros, Rodrigo C and Winck, Ana T and Machado, Karina S and Basgalupp, Márcio P and de Carvalho, André Cplf and Ruiz, Duncan D and Norberto de Souza, Osmar
    BMC Bioinformatics, 2012, 13(1), 310
    PMID: 23171000     doi: 10.1186/1471-2105-13-310
    Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.

  • Current Assessment of Docking into GPCR Crystal Structures and Homology Models: Successes, Challenges, and Guidelines.
    Beuming, Thijs and Sherman, Woody
    Journal of chemical information and modeling, 2012, 52(12), 3263-3277
    PMID: 23121495     doi: 10.1021/ci300411b
    The growing availability of novel structures for several G protein-coupled receptors (GPCRs) has provided new opportunities for structure-based drug design of ligands against this important class of targets. Here, we report a systematic analysis of the accuracy of docking small molecules into GPCR structures and homology models using both rigid receptor (Glide SP and Glide XP) and flexible receptor (Induced Fit Docking; IFD) methods. The ability to dock ligands into different structures of the same target (cross-docking) is evaluated for both agonist and inverse agonist structures of the A2A receptor and the β1- and β2-adrenergic receptors. In addition, we have produced homology models for the β1-adrenergic, β2-adrenergic, D3 dopamine, H1 histamine, M2 muscarine, M3 muscarine, A2A adenosine, S1P1, κ-opioid, and C-X-C chemokine 4 receptors using multiple templates and investigated the ability of docking to predict the binding mode of ligands in these models. Clear correlations are observed between the docking accuracy and the similarity of the sequence of interest to the template, suggesting regimes in which docking can correctly identify ligand binding modes.

  • A Network Approach for Computational Drug Repositioning
    Li, Jiao and Lu, Zhiyong
    Journal of molecular biology, 2012, 161(2), 83-83
    PMID: 7154081     doi: 10.1109/HISB.2012.26
    Computational drug repositioning offers promise for discovering new uses of existing drugs, as drug related molecular, chemical, and clinical information has increased over the past decade and become broadly accessible. In this study, we present a new computational approach for identifying potential new indications of an existing drug through its relation to similar drugs in disease-drug-target network. When measuring drug pairwise similarity, we used a bipartite-graph based method which combined similarity of drug compound structures, similarity of target protein profiles, and interaction between target proteins. In evaluation, our method compared favorably to the state of the art, achieving AUC of 0.888. The results indicated that our method is able to identify drug repositioning opportunities by exploring complex relationships in disease-drug-target network.

  • How Good Are State-of-the-Art Docking Tools in Predicting Ligand Binding Modes in Protein-Protein Interfaces?
    Krüger, Dennis M and Jessen, Gisela and Gohlke, Holger
    Journal of chemical information and modeling, 2012, 52(11), 2807-2811
    PMID: 23072688     doi: 10.1021/ci3003599
    Protein-protein interfaces (PPIs) are an important class of drug targets. We report on the first large-scale validation study on docking into PPIs. DrugScore-adapted AutoDock3 and Glide showed good success rates with a moderate drop-off compared to docking to "classical targets". An analysis of the binding energetics in a PPI allows identifying those interfaces that are amenable for docking. The results are important for deciding if structure-based design approaches can be applied to a particular PPI.

  • Structural insights into the molecular basis of the ligand promiscuity.
    Sturm, Noé and Desaphy, Jérémy and Quinn, Ronald J and Rognan, Didier and Kellenberger, Esther
    Journal of chemical information and modeling, 2012, 52(9), 2410-2421
    PMID: 22920885     doi: 10.1021/ci300196g
    Selectivity is a key factor in drug development. In this paper, we questioned the Protein Data Bank to better understand the reasons for the promiscuity of bioactive compounds. We assembled a data set of >1000 pairs of three-dimensional structures of complexes between a "drug-like" ligand (as its physicochemical properties overlap that of approved drugs) and two distinct "druggable" protein targets (as their binding sites are likely to accommodate "drug-like" ligands). Studying the similarity between the ligand-binding sites in the different targets revealed that the lack of selectivity of a ligand can be due (i) to the fact that Nature has created the same binding pocket in different proteins, which do not necessarily have otherwise sequence or fold similarity, or (ii) to specific characteristics of the ligand itself. In particular, we demonstrated that many ligands can adapt to different protein environments by changing their conformation, by using different chemical moieties to anchor to different targets, or by adopting unusual extreme binding modes (e.g., only apolar contact between the ligand and the protein, even though polar groups are present on the ligand or at the protein surface). Lastly, we provided new elements in support to the recent studies which suggest that the promiscuity of a ligand might be inferred from its molecular complexity.

  • Multiple ligand docking by Glide: implications for virtual second-site screening.
    Vass, Márton and Tarcsay, Akos and Keseru, György M
    Journal of computer-aided molecular design, 2012, 26(7), 821-834
    PMID: 22639078     doi: 10.1007/s10822-012-9578-6
    Performance of Glide was evaluated in a sequential multiple ligand docking paradigm predicting the binding modes of 129 protein-ligand complexes crystallized with clusters of 2-6 cooperative ligands. Three sampling protocols (single precision-SP, extra precision-XP, and SP without scaling ligand atom radii-SP hard) combined with three different scoring functions (GlideScore, Emodel and Glide Energy) were tested. The effects of ligand number, docking order and druglikeness of ligands and closeness of the binding site were investigated. On average 36 % of all structures were reproduced with RMSDs lower than 2 Å. Correctly docked structures reached 50 % when docking druglike ligands into closed binding sites by the SP hard protocol. Cooperative binding to metabolic and transport proteins can dramatically alter pharmacokinetic parameters of drugs. Analyzing the cytochrome P450 subset the SP hard protocol with Emodel ranking reproduced two-thirds of the structures well. Multiple ligand binding is also exploited by the fragment linking approach in lead discovery settings. The HSP90 subset from real life fragment optimization programs revealed that Glide is able to reproduce the positions of multiple bound fragments if conserved water molecules are considered. These case studies assess the utility of Glide in sequential multiple docking applications.

  • Consensus Induced Fit Docking (cIFD): methodology, validation, and application to the discovery of novel Crm1 inhibitors.
    Kalid, Ori and Toledo Warshaviak, Dora and Shechter, Sharon and Sherman, Woody and Shacham, Sharon
    Journal of computer-aided molecular design, 2012, 26(11), 1217-1228
    PMID: 23053738     doi: 10.1007/s10822-012-9611-9
    We present the Consensus Induced Fit Docking (cIFD) approach for adapting a protein binding site to accommodate multiple diverse ligands for virtual screening. This novel approach results in a single binding site structure that can bind diverse chemotypes and is thus highly useful for efficient structure-based virtual screening. We first describe the cIFD method and its validation on three targets that were previously shown to be challenging for docking programs (COX-2, estrogen receptor, and HIV reverse transcriptase). We then demonstrate the application of cIFD to the challenging discovery of irreversible Crm1 inhibitors. We report the identification of 33 novel Crm1 inhibitors, which resulted from the testing of 402 purchased compounds selected from a screening set containing 261,680 compounds. This corresponds to a hit rate of 8.2 %. The novel Crm1 inhibitors reveal diverse chemical structures, validating the utility of the cIFD method in a real-world drug discovery project. This approach offers a pragmatic way to implicitly account for protein flexibility without the additional computational costs of ensemble docking or including full protein flexibility during virtual screening.

  • Can the Energy Gap in the Protein-Ligand Binding Energy Landscape Be Used as a Descriptor in Virtual Ligand Screening?
    Grigoryan, Arsen V and Wang, Hong and Cardozo, Timothy J
    PloS one, 2012, 7(10), e46532
    doi: 10.1371/journal.pone.0046532
    The ranking of scores of individual chemicals within a large screening library is a crucial step in virtual screening (VS) for drug discovery. Previous studies showed that the quality of protein-ligand recognition can be improved using spectrum properties and the shape of ...

  • On the Value of Homology Models for Virtual Screening: Discovering hCXCR3 Antagonists by Pharmacophore-Based and Structure-Based Approaches.
    Huang, Dane and Gu, Qiong and Ge, Hu and Ye, Jiming and Salam, Noeris K. and Hagler, Arnie and Chen, Hongzhuan and Xu, Jun
    Journal of chemical information and modeling, 2012, 52(5), 1356-1366
    PMID: 22545675     doi: 10.1021/ci300067q
    Human chemokine receptor CXCR3 (hCXCR3) antagonists have potential therapeutic applications as antivirus, antitumor, and anti-inflammatory agents. A novel virtual screening protocol, which combines pharmacophore-based and structure-based approaches, was proposed. A three-dimensional QSAR pharmacophore model and a structure-based docking model were built to virtually screen for hCXCR3 antagonists. The hCXCR3 antagonist binding site was constructed by homology modeling and molecular dynamics (MD) simulation. By combining the structure-based and ligand-based screenings results, 95% of the compounds satisfied either pharmacophore or docking score criteria and would be chosen as hits if the union of the two searches was taken. The false negative rates were 15% for the pharmacophore model, 14% for the homology model, and 5% for the combined model. Therefore, the consistency of the pharmacophore model and the structural binding model is 219/273

  • Evaluation of DOCK 6 as a pose generation and database enrichment tool.
    Brozell, Scott R and Mukherjee, Sudipto and Balius, Trent E and Roe, Daniel R and Case, David A and Rizzo, Robert C
    Journal of computer-aided molecular design, 2012, 26(6), 749-773
    PMID: 22569593     doi: 10.1007/s10822-012-9565-y
    In conjunction with the recent American Chemical Society symposium titled "Docking and Scoring: A Review of Docking Programs" the performance of the DOCK6 program was evaluated through (1) pose reproduction and (2) database enrichment calculations on a common set of organizer-specified systems and datasets (ASTEX, DUD, WOMBAT). Representative baseline grid score results averaged over five docking runs yield a relatively high pose identification success rate of 72.5 % (symmetry corrected rmsd) and sampling rate of 91.9 % for the multi site ASTEX set (N

  • Variability in docking success rates due to dataset preparation.
    Corbeil, Christopher R and Williams, Christopher I and Labute, Paul
    Journal of computer-aided molecular design, 2012, 26(6), 775-786
    PMID: 22566074     doi: 10.1007/s10822-012-9570-1
    The results of cognate docking with the prepared Astex dataset provided by the organizers of the "Docking and Scoring: A Review of Docking Programs" session at the 241st ACS national meeting are presented. The MOE software with the newly developed GBVI/WSA dG scoring function is used throughout the study. For 80 % of the Astex targets, the MOE docker produces a top-scoring pose within 2 Å of the X-ray structure. For 91 % of the targets a pose within 2 Å of the X-ray structure is produced in the top 30 poses. Docking failures, defined as cases where the top scoring pose is greater than 2 Å from the experimental structure, are shown to be largely due to the absence of bound waters in the source dataset, highlighting the need to include these and other crucial information in future standardized sets. Docking success is shown to depend heavily on data preparation. A "dataset preparation" error of 0.5 kcal/mol is shown to cause fluctuations of over 20 % in docking success rates.

  • Lead Finder docking and virtual screening evaluation with Astex and DUD test sets.
    Novikov, Fedor N and Stroylov, Viktor S and Zeifman, Alexey A and Stroganov, Oleg V and Kulkov, Val and Chilov, Ghermes G
    Journal of computer-aided molecular design, 2012, 26(6), 725-735
    PMID: 22569592     doi: 10.1007/s10822-012-9549-y
    Lead Finder is a molecular docking software. Sampling uses an original implementation of the genetic algorithm that involves a number of additional optimization procedures. Lead Finder's scoring functions employ a set of semi-empirical molecular mechanics functionals that have been parameterized independently for docking, binding energy predictions and rank-ordering for virtual screening. Sampling and scoring both utilize a staged approach, moving from fast but less accurate algorithm versions to computationally more intensive but more accurate versions. Lead Finder includes tools for the preparation of full atom protein and ligand models. In this exercise, Lead Finder achieved a 72.9% docking success rate on the Astex test set when the original author-prepared full atom models were used, and a 74.1% success rate when the structures were prepared by Lead Finder. The major cause of docking failures was scoring errors resulting from the use of imperfect solvation models. In many cases, docking errors could be corrected by the proper protonation and the use of correct cyclic conformations of ligands. In virtual screening experiments on the DUD test set the early enrichment factor of several tens was achieved on average. However, the area under the ROC curve ("AUC ROC") ranged from 0.70 to 0.74 depending on the screening protocol used, and the separation from the null model was not perfect (0.12-0.15 units of AUC ROC). We assume that effective virtual screening in the whole range of enrichment curve and not just at the early enrichment stages requires more accurate solvation modeling and accounting for the protein backbone flexibility.

  • Docking and scoring with ICM: the benchmarking results and strategies for improvement.
    Neves, Marco A C and Totrov, Maxim and Abagyan, Ruben
    Journal of computer-aided molecular design, 2012, 26(6), 675-686
    PMID: 22569591     doi: 10.1007/s10822-012-9547-0
    Flexible docking and scoring using the internal coordinate mechanics software (ICM) was benchmarked for ligand binding mode prediction against the 85 co-crystal structures in the modified Astex data set. The ICM virtual ligand screening was tested against the 40 DUD target benchmarks and 11-target WOMBAT sets. The self-docking accuracy was evaluated for the top 1 and top 3 scoring poses at each ligand binding site with near native conformations below 2 Å RMSD found in 91 and 95% of the predictions, respectively. The virtual ligand screening using single rigid pocket conformations provided the median area under the ROC curves equal to 69.4 with 22.0% true positives recovered at 2% false positive rate. Significant improvements up to ROC AUC

  • Surflex-Dock: Docking benchmarks and real-world application.
    Spitzer, Russell and Jain, Ajay N
    Journal of computer-aided molecular design, 2012, 26(6), 687-699
    PMID: 22569590     doi: 10.1007/s10822-011-9533-y
    Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by symposium organizers. The re-prepared data sets yielded results not significantly different from previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate ligand re-docking for pose prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, produced substantially better performance for pose prediction. Virtual screening performance was shown to benefit from employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.

  • In Silico Mutagenesis and Docking Study of Ralstonia solanacearum RSL Lectin: Performance of Docking Software To Predict Saccharide Binding.
    Mishra, Sushil Kumar and Adam, Jan and Wimmerová, Michaela and Koca, Jaroslav
    Journal of chemical information and modeling, 2012, 52(5), 1250-1261
    PMID: 22506916     doi: 10.1021/ci200529n
    In this study, in silico mutagenesis and docking in Ralstonia solanacearum lectin (RSL) were carried out, and the ability of several docking software programs to calculate binding affinity was evaluated. In silico mutation of six amino acid residues (Arg17, Glu28, Gly39, Ala40, Trp76, and Trp81) was done, and a total of 114 in silico mutants of RSL were docked with Me-α-L-fucoside. Our results show that polar residues Arg17 and Glu28, as well as nonpolar amino acids Trp76 and Trp81, are crucial for binding. Gly39 may also influence ligand binding because any mutations at this position lead to a change in the binding pocket shape. The Ala40 residue was found to be the most interesting residue for mutagenesis and can affect the selectivity and/or affinity. In general, the docking software used performs better for high affinity binders and fails to place the binding affinities in the correct order.

  • Docking performance of the glide program as evaluated on the Astex and DUD datasets: a complete set of glide SP results and selected results for a new scoring function integrating WaterMap and glide.
    Repasky, Matthew P and Murphy, Robert B and Banks, Jay L and Greenwood, Jeremy R and Tubert-Brohman, Ivan and Bhat, Sathesh and Friesner, Richard A
    Journal of computer-aided molecular design, 2012, 26(6), 787-799
    PMID: 22576241     doi: 10.1007/s10822-012-9575-9
    Glide SP mode enrichment results for two preparations of the DUD dataset and native ligand docking RMSDs for two preparations of the Astex dataset are presented. Following a best-practices preparation scheme, an average RMSD of 1.140 Å for native ligand docking with Glide SP is computed. Following the same best-practices preparation scheme for the DUD dataset an average area under the ROC curve (AUC) of 0.80 and average early enrichment via the ROC (0.1 %) metric of 0.12 were observed. 74 and 56 % of the 39 best-practices prepared targets showed AUC over 0.7 and 0.8, respectively. Average AUC was greater than 0.7 for all best-practices protein families demonstrating consistent enrichment performance across a broad range of proteins and ligand chemotypes. In both Astex and DUD datasets, docking performance is significantly improved employing a best-practices preparation scheme over using minimally-prepared structures from the PDB. Enrichment results for WScore, a new scoring function and sampling methodology integrating WaterMap and Glide, are presented for four DUD targets, hivrt, hsp90, cdk2, and fxa. WScore performance in early enrichment is consistently strong and all systems examined show AUC > 0.9 and superior early enrichment to DUD best-practices Glide SP results.
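
    The enrichment metrics quoted in this and neighbouring abstracts (area under the ROC curve, and the fraction of actives recovered at a small false positive rate) can be sketched as follows. This is a generic illustration, not the Glide evaluation code; the function names are assumptions, and scores are assumed to be oriented so that higher means better (docking energies would be negated first). Reading the "ROC (0.1 %)" figure as a true positive rate at a fixed false positive rate is likewise an assumption.

```python
def roc_auc(scores, labels):
    """Probability that a random active outscores a random decoy (ties count 0.5)."""
    actives = [s for s, is_active in zip(scores, labels) if is_active]
    decoys = [s for s, is_active in zip(scores, labels) if not is_active]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def tpr_at_fpr(scores, labels, fpr):
    """Fraction of actives scoring above the decoy that marks the allowed FPR."""
    actives = [s for s, is_active in zip(scores, labels) if is_active]
    decoys = sorted((s for s, is_active in zip(scores, labels) if not is_active),
                    reverse=True)
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    allowed = int(fpr * len(decoys))  # number of decoys tolerated above the threshold
    threshold = decoys[allowed] if allowed < len(decoys) else float("-inf")
    return sum(1 for a in actives if a > threshold) / len(actives)
```

    The pairwise form of the AUC is quadratic in the number of compounds, which is fine for a sketch; a rank-based computation would be preferable for full DUD-sized decoy sets.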

  • e-Drug3D: 3D structure collections dedicated to drug repurposing and fragment-based drug design.
    Pihan, Emilie and Colliandre, Lionel and Guichou, Jean-François and Douguet, Dominique
    Bioinformatics (Oxford, England), 2012, 28(11), 1540-1541
    PMID: 22539672     doi: 10.1093/bioinformatics/bts186
    MOTIVATION: In the drug discovery field, new uses for old drugs, selective optimization of side activities and Fragment-Based Drug Design (FBDD) have proved to be successful alternatives to high throughput screening (HTS). e-Drug3D is a database of 3D chemical structures of drugs that provides several collections of ready-to-screen SD Files of drugs and commercial drug fragments. They are natural inputs in studies dedicated to drug repurposing and FBDD. AVAILABILITY: e-Drug3D collections are freely available, either for download or for direct in silico web-based screenings. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • PRL-dock: Protein-ligand docking based on hydrogen bond matching and probabilistic relaxation labeling.
    Wu, Meng-Yun and Dai, Dao-Qing and Yan, Hong
    Proteins, 2012, 80(9), 2137-2153
    PMID: 22544808     doi: 10.1002/prot.24104
    Protein-ligand docking is widely applied to structure-based virtual screening for drug discovery. This paper presents a novel docking technique, PRL-Dock, based on hydrogen bond matching and probabilistic relaxation labeling. It deals with multiple hydrogen bonds and can match many acceptors and donors simultaneously. In the matching process, the initial probability of matching an acceptor with a donor is estimated by an efficient scoring function and the compatibility coefficients are assigned according to the coexisting condition of two hydrogen bonds. After hydrogen bond matching, the geometric complementarity of the interacting donor and acceptor sites is taken into account for displacement of the ligand. It is reduced to an optimization problem to calculate the optimal translation and rotation matrices that minimize the root mean square deviation between two sets of points, which can be solved using the Kabsch algorithm. In addition to the van der Waals interaction, the contribution of intermolecular hydrogen bonds in a complex is included in the scoring function to evaluate the docking quality. A modified Lennard-Jones 12-6 dispersion-repulsion term is used to estimate the van der Waals interaction to make the scoring function fairly 'soft' so that ligands are not heavily penalized for small errors in the binding geometry. The calculation of this scoring function is very convenient. The evaluation is carried out on rigid complexes and 93 flexible ones where there is at least one intermolecular hydrogen bond. The experiment results of docking accuracy and prediction of binding affinity demonstrate that the proposed method is highly effective.
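
    The superposition step mentioned in this abstract, finding the rotation and translation that minimize the RMSD between two matched point sets, is the standard Kabsch algorithm. The sketch below is a generic NumPy version of that textbook algorithm and is not taken from PRL-Dock; the function names `kabsch` and `superposed_rmsd` are illustrative assumptions.

```python
import numpy as np


def kabsch(mobile, target):
    """Return (R, t) such that R @ mobile_i + t best matches target_i in RMSD."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    p_center, q_center = P.mean(axis=0), Q.mean(axis=0)
    # 3x3 covariance matrix of the centered point sets
    H = (P - p_center).T @ (Q - q_center)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_center - R @ p_center
    return R, t


def superposed_rmsd(mobile, target):
    """RMSD between target and mobile after optimal rigid superposition."""
    R, t = kabsch(mobile, target)
    fitted = np.asarray(mobile, dtype=float) @ R.T + t
    diff = fitted - np.asarray(target, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

    In a hydrogen-bond matching setting the two point sets would be the matched ligand and receptor interaction sites; here they are simply two arrays of shape (N, 3) with rows in corresponding order.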

  • Ligand Aligning Method for Molecular Docking: Alignment of Property-Weighted Vectors.
    Joung, Jong Young and Nam, Ky-Youb and Cho, Kwang-Hwi and No, Kyoung Tai
    Journal of chemical information and modeling, 2012, 52(4), 984-995
    PMID: 22471323     doi: 10.1021/ci200501p
    To reduce searching effort in conformational space of ligand docking positions, we propose an algorithm that generates initial binding positions of the ligand in a target protein, based on the property-weighted vector (P-weiV), the three-dimensional orthogonal vector determined by the molecular property of hydration-free energy density. The alignment of individual P-weiVs calculated separately for the ligand and the protein gives the initial orientation of a given ligand conformation relative to an active site; these initial orientations are then ranked by simple energy functions, including solvation. Because we are using three-dimensional orthogonal vectors to be aligned, only four orientations of ligand positions are possible for each ligand conformation, which reduces the search space dramatically. We found that the performance of P-weiV compared favorably to the use of principal moment of inertia (PMI) as implemented in LigandFit when we tested the abilities of the two approaches to correctly predict 205 protein-ligand complex data sets from the PDBBind database. P-weiV correctly predicted the alignment of ligands (within rmsd of 2.5 Å) with 57.6% reliability (118/205) for the top 10 ranked conformations and with 74.1% reliability (152/205) for the top 50 ranked conformations of Catalyst-generated conformers, as compared to 22.9% (47/205) and 31.2% (64/205), respectively, in the case of PMI with the same conformer set.

  • 3D-RISM-Dock: a new fragment-based drug design protocol
    Nikolić, Dragan and Blinov, Nikolay and Wishart, David S. and Kovalenko, Andriy
    Journal of chemical theory and computation, 2012, 8(9), 3356-3372
    doi: 10.1021/ct300257v
    We explore a new approach in the rational design of specificity in molecular recognition of small molecules based on statistical-mechanical integral equation theory of molecular liquids in the form of the three-dimensional reference interaction site model with the Kovalenko-Hirata closure (3D-RISM-KH). The numerically stable iterative solution of conventional 3D-RISM equations includes the fragmental decomposition of flexible ligands, which are treated as distinct species in solvent mixtures of arbitrary complexity. The computed density functions for solution (including ligand) molecules are obtained as a set of discrete spatial grids that uniquely describe the continuous solvent-site distribution around the protein solute. Potentials of mean force derived from these distributions define the scoring function interfaced with the AutoDock program for an automated ranking of docked conformations. As a case study in terms of solvent composition, we analyze cooperative interactions encountered in the binding o...

  • Fast force field-based optimization of protein-ligand complexes with graphics processor.
    Heinzerling, Lennart and Klein, Robert and Rarey, Matthias
    Journal of computational chemistry, 2012, 33(32), 2554-2565
    PMID: 22911510     doi: 10.1002/jcc.23094
    Usually based on molecular mechanics force fields, the post-optimization of ligand poses is typically the most time-consuming step in protein-ligand docking procedures. In return, it bears the potential to overcome the limitations of discretized conformation models. Because of the parallel nature of the problem, recent graphics processing units (GPUs) can be applied to address this dilemma. We present a novel algorithmic approach for parallelizing and thus massively speeding up protein-ligand complex optimizations with GPUs. The method, customized to pose-optimization, performs at least 100 times faster than widely used CPU-based optimization tools. An improvement in Root-Mean-Square Distance (RMSD) compared to the original docking pose of up to 42% can be achieved.

  • Computational Approach for Fast Screening of Small Molecular Candidates To Inhibit Crystallization in Amorphous Drugs.
    Pajula, Katja and Lehto, Vesa-Pekka and Ketolainen, Jarkko and Korhonen, Ossi
    Molecular Pharmaceutics, 2012, 9(10), 2844-2855
    PMID: 22867030     doi: 10.1021/mp300135h
    The applicability of the computational docking approach was investigated to create a novel method for quick additive screening to inhibit the crystallization taking place in amorphous drugs. Surface energy and attachment energy were utilized to recognize the morphologically most important crystal faces. The surfaces (100), (001), and (010) were identified as target faces, and the estimated free energies of binding of additives on these surfaces were computationally determined. The molecule of the crystallizing compound was included in the group of the modeled additives as the reference and for the validation of the approach. Additives having a lower estimated free energy of binding than the reference molecule itself were considered as potential crystallization inhibitors. Salicylamide, salicylic acid, and sulfanilamide with computationally prescreened additives were melt-quenched, and the nucleation and crystal growth rates were subsequently monitored by polarized light microscopy. As a result, computationally screened additives decelerated the nucleation and crystal growth rates of the studied drugs while the pure drugs crystallized too fast to be measured. The use of a computational approach enabled fast and cost-effective additive selection to retard nucleation and crystal growth, thus facilitating the production of amorphous binary small molecular compounds with stabilized disordered structures.

  • Molecular Docking Using the Molecular Lipophilicity Potential as Hydrophobic Descriptor: Impact on GOLD Docking Performance.
    Nurisso, Alessandra and Bravo, Juan and Carrupt, Pierre-Alain and Daina, Antoine
    Journal of chemical information and modeling, 2012, 52(5), 1319-1327
    PMID: 22462609     doi: 10.1021/ci200515g
    GOLD is a molecular docking software widely used in drug design. In the initial steps of docking, it creates a list of hydrophobic fitting points inside protein cavities that steer the positioning of ligand hydrophobic moieties. These points are generated based on the Lennard-Jones potential between a carbon probe and each atom of the residues delimiting the binding site. To thoroughly describe hydrophobic regions in protein pockets and properly guide ligand hydrophobic moieties toward favorable areas, an in-house tool, the MLP filter, was developed and herein applied. This strategy only retains GOLD hydrophobic fitting points that match the rigorous definition of hydrophobicity given by the molecular lipophilicity potential (MLP), a molecular interaction field that relies on an atomic fragmental system based on 1-octanol/water experimental partition coefficients (log P(oct)). MLP computations in the binding sites of crystallographic protein structures revealed that a significant number of points considered hydrophobic by GOLD were actually polar according to the MLP definition of hydrophobicity. To examine the impact of this new tool, ligand-protein complexes from the Astex Diverse Set and the PDBbind core database were redocked with and without the use of the MLP filter. Reliable docking results were obtained by using the MLP filter that increased the quality of docking in nonpolar cavities and outperformed the standard GOLD docking approach.

  • Potential and Limitations of Ensemble Docking.
    Korb, Oliver and Olsson, Tjelvar S G and Bowden, Simon J and Hall, Richard J and Verdonk, Marcel L and Liebeschuetz, John W and Cole, Jason C
    Journal of chemical information and modeling, 2012, 52(5), 1262-1274
    PMID: 22482774     doi: 10.1021/ci2005934
    A major problem in structure-based virtual screening applications is the appropriate selection of a single or even multiple protein structures to be used in the virtual screening process. A priori it is unknown which protein structure(s) will perform best in a virtual screening experiment. We investigated the performance of ensemble docking, as a function of ensemble size, for eight targets of pharmaceutical interest. Starting from single protein structure docking results, for each ensemble size up to 500 000 combinations of protein structures were generated, and, for each ensemble, pose prediction and virtual screening results were derived. Comparison of single to multiple protein structure results suggests improvements when looking at the performance of the worst and the average over all single protein structures to the performance of the worst and average over all protein ensembles of size two or greater, respectively. We identified several key factors affecting ensemble docking performance, including the sampling accuracy of the docking algorithm, the choice of the scoring function, and the similarity of database ligands to the cocrystallized ligands of ligand-bound protein structures in an ensemble. Due to these factors, the prospective selection of optimum ensembles is a challenging task, shown by a reassessment of published ensemble selection protocols.

  • Rigid Body Energy Minimization on Manifolds for Molecular Docking
    Mirzaei, Hanieh and Beglov, Dmitri and Paschalidis, Ioannis Ch and Vajda, Sandor and Vakili, Pirooz and Kozakov, Dima
    Journal of chemical theory and computation, 2012, 8(11), 4374-4380
    doi: 10.1021/ct300272j
    Virtually all docking methods include some local continuous minimization of an energy/scoring function in order to remove steric clashes and obtain more reliable energy values. In this paper, we describe an efficient rigid-body optimization algorithm that, compared to the most widely used algorithms, converges approximately an order of magnitude faster to conformations with equal or slightly lower energy. The space of rigid body transformations is a nonlinear manifold, namely, a space which locally resembles a Euclidean space. We use a canonical parametrization of the manifold, called the exponential parametrization, to map the Euclidean tangent space of the manifold onto the manifold itself. Thus, we locally transform the rigid body optimization to an optimization over a Euclidean space where basic optimization algorithms are applicable. Compared to commonly used methods, this formulation substantially reduces the dimension of the search space. As a result, it requires far fewer costly function and gradi...

  • Protein-Ligand Binding Free Energies from Exhaustive Docking.
    Purisima, Enrico O and Hogues, Hervé
    The journal of physical chemistry. B, 2012, 116(23), 6872-6879
    PMID: 22432509     doi: 10.1021/jp212646s
    We explore the use of exhaustive docking as an alternative to Monte Carlo and molecular dynamics sampling for the direct integration of the partition function for protein-ligand binding. We enumerate feasible poses for the ligand and calculate the Boltzmann factor contribution of each pose to the partition function. From the partition function, the free energy, enthalpy, and entropy can be derived. All our calculations are done with a continuum solvation model that includes solving the Poisson equation. In contrast to Monte Carlo and molecular dynamics simulations, exhaustive docking avoids (within the limitations of a discrete sampling) the question of "Have we run long enough?" due to its deterministic complete enumeration of states. We tested the method on the T4 lysozyme L99A mutant, which has a nonpolar cavity that can accommodate a number of small molecules. We tested two electrostatic models. Model 1 used a solute dielectric of 2.25 for the complex, apoprotein, and free ligand and 78.5 for the solvent. Model 2 used a solute dielectric of 2.25 for the complex and apoprotein but 1.0 for the free ligand. For our test set of eight molecules, we obtain a reasonable correlation with a Pearson r(2)
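
    The idea of summing Boltzmann factors over enumerated poses to obtain a free energy, mean energy, and entropy term can be illustrated with a minimal sketch. It treats a single discrete set of pose energies with equal weight per pose and omits the volume elements, solvation treatment, and the bound-versus-free comparison that an actual binding free energy requires, so it is not the authors' protocol. The function name and the choice of kcal/mol at 298.15 K are assumptions.

```python
import math

# Boltzmann constant in kcal/(mol K) times an assumed temperature of 298.15 K
KT_298 = 0.0019872041 * 298.15


def ensemble_thermodynamics(energies, kT=KT_298):
    """Return (F, <E>, -TS) for a discrete set of pose energies in kcal/mol.

    F = -kT ln(sum_i exp(-E_i / kT)), <E> is the Boltzmann-weighted mean
    energy, and -TS = F - <E>.
    """
    if not energies:
        raise ValueError("need at least one pose energy")
    e_min = min(energies)
    # Shift by the lowest energy so the exponentials cannot overflow
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z_shifted = sum(weights)
    free_energy = e_min - kT * math.log(z_shifted)
    mean_energy = sum(w * e for w, e in zip(weights, energies)) / z_shifted
    return free_energy, mean_energy, free_energy - mean_energy
```

    With a single pose the entropy term vanishes and the free energy equals the pose energy; with several near-degenerate poses the free energy drops below the lowest pose energy by up to kT ln N.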

  • Numerical Errors and Chaotic Behavior in Docking Simulations.
    Feher, Miklos and Williams, Christopher I
    Journal of chemical information and modeling, 2012, 52(3), 724-738
    PMID: 22379951     doi: 10.1021/ci200598m
    This work examines the sensitivity of docking programs to tiny changes in ligand input files. The results show that nearly identical ligand input structures can produce dramatically different top-scoring docked poses. Even changing the atom order in a ligand input file can produce significantly different poses and scores. In well-behaved cases the docking variations are small and follow a normal distribution around a central pose and score, but in many cases the variations are large and reflect wildly different top scores and binding modes. The docking variations are characterized by statistical methods, and the sensitivities of high-throughput and more precise docking methods are compared. The results demonstrate that part of docking variation is due to numerical sensitivity and potentially chaotic effects in current docking algorithms and not solely due to incomplete ligand conformation and pose searching. These results have major implications for the way docking is currently used for pose prediction, ranking, and virtual screening.

  • Modeling peptide-protein interactions.
    London, Nir and Raveh, Barak and Schueler-Furman, Ora
    Methods in molecular biology (Clifton, N.J.), 2012, 857, 375-398
    PMID: 22323231     doi: 10.1007/978-1-61779-588-6_17
    Peptide-protein interactions are prevalent in the living cell and form a key component of the overall protein-protein interaction network. These interactions are drawing increasing interest due to their part in signaling and regulation, and are thus attractive targets for computational structural modeling. Here we report an overview of current techniques for the high resolution modeling of peptide-protein complexes. We dissect this complicated challenge into several smaller subproblems, namely: modeling the receptor protein, predicting the peptide binding site, sampling an initial peptide backbone conformation and the final refinement of the peptide within the receptor binding site. For each of these conceptual stages, we present available tools, approaches, and their reported performance. We summarize with an illustrative example of this process, highlighting the success and current challenges still facing the automated blind modeling of peptide-protein interactions. We believe that the upcoming years will see considerable progress in our ability to create accurate models of peptide-protein interactions, with applications in binding-specificity prediction, rational design of peptide-mediated interactions and the usage of peptides as therapeutic agents.

  • Pose prediction and virtual screening performance of GOLD scoring functions in a standardized test.
    Liebeschuetz, John W and Cole, Jason C and Korb, Oliver
    Journal of computer-aided molecular design, 2012, 26(6), 737-748
    PMID: 22371207     doi: 10.1007/s10822-012-9551-4
    The performance of all four GOLD scoring functions has been evaluated for pose prediction and virtual screening under the standardized conditions of the comparative docking and scoring experiment reported in this Edition. Excellent pose prediction and good virtual screening performance was demonstrated using unmodified protein models and default parameter settings. The best performing scoring function for both pose prediction and virtual screening was demonstrated to be the recently introduced scoring function ChemPLP. We conclude that existing docking programs already perform close to optimally in the cognate pose prediction experiments currently carried out and that more stringent pose prediction tests should be used in the future. These should employ cross-docking sets. Evaluation of virtual screening performance remains problematic and much remains to be done to improve the usefulness of publicly available active and decoy sets for virtual screening. Finally we suggest that, for certain target/scoring function combinations, good enrichment may sometimes be a consequence of 2D property recognition rather than a modelling of the correct 3D interactions.

  • Rapid and Accurate Prediction and Scoring of Water Molecules in Protein Binding Sites
    Ross, Gregory A and Morris, Garrett M and Biggin, Philip C
    PloS one, 2012, 7(3), e32036
    doi: 10.1371/journal.pone.0032036
    Water plays a critical role in ligand-protein interactions. However, it is still challenging to predict accurately not only where water molecules prefer to bind, but also which of those water molecules might be displaceable. The latter is often seen as a route to optimizing affinity of potential drug candidates. Using a protocol we call WaterDock, we show that the freely available AutoDock Vina tool can be used to predict accurately the binding sites of water molecules. WaterDock was validated using data from X-ray crystallography, neutron diffraction and molecular dynamics simulations and correctly predicted 97% of the water molecules in the test set. In addition, we combined data-mining, heuristic and machine learning techniques to develop probabilistic water molecule classifiers. When applied to WaterDock predictions in the Astex Diverse Set of protein ligand complexes, we could identify whether a water molecule was conserved or displaced to an accuracy of 75%. A second model predicted whether water molecules were displaced by polar groups or by non-polar groups to an accuracy of 80%. These results should prove useful for anyone wishing to undertake rational design of new compounds where the displacement of water molecules is being considered as a route to improved affinity.

  • Flexible protein-ligand docking using the Fleksy protocol.
    Wagener, Markus and Vlieg, Jacob de and Nabuurs, Sander B
    Journal of computational chemistry, 2012, 33(12), 1215-1217
    PMID: 22371008     doi: 10.1002/jcc.22948
    Considering protein plasticity is important in accurately predicting the three-dimensional geometry of protein-ligand complexes. Here, we present the first public release of our flexible docking tool Fleksy, which is able to consider both ligand and protein flexibility in the docking process. We describe the workflow and different features of the software and present its performance on two cross-docking benchmark datasets.

  • On the Applicability of Elastic Network Normal Modes in Small-Molecule Docking.
    Dietzen, Matthias Michael and Hildebrandt, Andreas and Zotenko, Elena and Lengauer, Thomas
    Journal of chemical information and modeling, 2012, 52(3), 844-856
    PMID: 22320151     doi: 10.1021/ci2004847
    Incorporating backbone flexibility into protein-ligand docking is still a challenging problem. In protein-protein docking, normal mode analysis (NMA) has become increasingly popular as it can be used to describe the collective motions of a biological system, but the question whether NMA can also be useful in predicting the conformational changes observed upon small-molecule binding has only been addressed in a few case studies. Here, we describe a large-scale study on the applicability of NMA for protein-ligand docking using 433 apo/holo pairs of the Astex data sets. Based on sets of the first normal modes from the apo structure, we first generated for each paired holo structure a set of conformations that optimally reproduce its Cα trace w.r.t. the underlying normal mode subspace. Using AutoDock, GOLD, and FlexX we then docked the original ligands into these conformations to assess how the docking performance depends on the number of modes used to reproduce the holo structure. The results of our study indicate that, even for such a best-case scenario, the use of normal mode analysis in small-molecule docking is restricted, and that a general rule on how many modes to use does not seem to exist or at least is not easy to find.

  • Protein-Ligand-Based Pharmacophores: Generation and Utility Assessment in Computational Ligand Profiling.
    Meslamani, Jamel and Li, Jiabo and Sutter, Jon and Stevens, Adrian and Bertrand, Hugues-Olivier and Rognan, Didier
    Journal of chemical information and modeling, 2012, 52, 943-955
    PMID: 22480372     doi: 10.1021/ci300083r
    Ligand profiling is an emerging computational method for predicting the most likely targets of a bioactive compound and therefore anticipating adverse reactions, side effects and drug repurposing. A few encouraging successes have already been reported using ligand 2-D similarity searches and protein-ligand docking. The current study describes the use of receptor-ligand-derived pharmacophore searches as a tool to link ligands to putative targets. A database of 68,056 pharmacophores was first derived from 8,166 high-resolution protein-ligand complexes. In order to limit the number of queries, a maximum of 10 pharmacophores was generated for each complex according to their predicted selectivity. Pharmacophore search was compared to ligand-centric (2-D and 3-D similarity searches) and docking methods in profiling a set of 157 diverse ligands against a panel of 2,556 unique targets of known X-ray structure. As expected, ligand-based methods outperformed, in most of the cases, structure-based approaches in ranking the true targets among the top 1% scoring entries. However, we could identify ligands for which only a single method was successful. Receptor-ligand-based pharmacophore search is notably a fast and reliable alternative to docking when little ligand information is available for some targets. Overall, the present study suggests that a workflow using the best profiling method according to the protein-ligand context is the best strategy to follow. We notably present concrete guidelines for selecting the optimal computational method according to simple ligand and binding site properties.

  • Virtual fragment screening: exploration of MM-PBSA re-scoring.
    Kawatkar, Sameer and Moustakas, Demetri and Miller, Matthew and Joseph-McCarthy, Diane
    Journal of computer-aided molecular design, 2012, 26(8), 921-934
    PMID: 22869295     doi: 10.1007/s10822-012-9590-x
    An NMR fragment screening dataset with known binders and decoys was used to evaluate the ability of docking and re-scoring methods to identify fragment binders. Re-scoring docked poses using the Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) implicit solvent model identifies additional active fragments relative to either docking or random fragment screening alone. Early enrichment, which is clearly most important in practice for selecting relatively small sets of compounds for experimental testing, is improved by MM-PBSA re-scoring. In addition, the value in MM-PBSA re-scoring of docked poses for virtual screening may be in lessening the effect of the variation in the protein complex structure used.

  • Modeling loop backbone flexibility in receptor-ligand docking simulations.
    Flick, Johannes and Tristram, Frank and Wenzel, Wolfgang
    Journal of computational chemistry, 2012, 33(31), 2504-2515
    PMID: 22886372     doi: 10.1002/jcc.23087
    The relevance of receptor conformational change during ligand binding is well documented for many pharmaceutically relevant receptors, but is still not fully accounted for in in silico docking methods. While there has been significant progress in treatment of receptor side chain flexibility, sampling of backbone flexibility remains challenging because the conformational space expands dramatically and the scoring function must balance protein-protein and protein-ligand contributions. Here, we investigate an efficient multistage backbone reconstruction algorithm for large loop regions in the receptor and demonstrate that treatment of backbone receptor flexibility significantly improves binding mode prediction starting from apo structures and in cross docking simulations. For three different kinase receptors in which large flexible loops reconstruct upon ligand binding, we demonstrate that treatment of backbone flexibility results in accurate models of the complexes in simulations starting from the apo structure. Using the example of the DFG-motif in the p38 kinase, we also show how loop reconstruction can be used to model allosteric binding. Our approach thus paves the way to treat the complex process of receptor reconstruction upon ligand binding in docking simulations and may help to design new ligands with high specificity by exploitation of allosteric mechanisms.

  • FIPSDock: A new molecular docking technique driven by fully informed swarm optimization algorithm.
    Liu, Yu and Zhao, Lei and Li, Wentao and Zhao, Dongyu and Song, Miao and Yang, Yongliang
    Journal of computational chemistry, 2012, 34(1), 67-75
    PMID: 22961860     doi: 10.1002/jcc.23108
    The accurate prediction of protein-ligand binding is of great importance for rational drug design. We present herein a novel docking algorithm called FIPSDock, which implements a variant of the Fully Informed Particle Swarm (FIPS) optimization method and adopts the newly developed energy function of AutoDock 4.20 suite for solving flexible protein-ligand docking problems. The search ability and docking accuracy of FIPSDock were first evaluated by multiple cognate docking experiments. In a benchmarking test for 77 protein/ligand complex structures derived from GOLD benchmark set, FIPSDock has obtained a successful prediction rate of 93.5% and outperformed a few docking programs including particle swarm optimization (PSO)@AutoDock, SODOCK, AutoDock, DOCK, Glide, GOLD, FlexX, Surflex, and MolDock. More importantly, FIPSDock was evaluated against PSO@AutoDock, SODOCK, and AutoDock 4.20 suite by cross-docking experiments of 74 protein-ligand complexes among eight protein targets (CDK2, ESR1, F2, MAPK14, MMP8, MMP13, PDE4B, and PDE5A) derived from Sutherland-crossdock-set. Remarkably, FIPSDock is superior to PSO@AutoDock, SODOCK, and AutoDock in seven out of eight cross-docking experiments. The results reveal that FIPS algorithm might be more suitable than the conventional genetic algorithm-based algorithms in dealing with highly flexible docking problems.

  • Virtual Target Screening: Validation Using Kinase Inhibitors.
    Santiago, Daniel N and Pevzner, Yuri and Durand, Ashley A and Tran, Minhphuong and Scheerer, Rachel R and Daniel, Kenyon and Sung, Shen-Shu and Lee Woodcock, H and Guida, Wayne C and Brooks, Wesley H
    Journal of chemical information and modeling, 2012, 52(8), 2192-2203
    PMID: 22747098     doi: 10.1021/ci300073m
    Computational methods involving virtual screening could potentially be employed to discover new biomolecular targets for an individual molecule of interest (MOI). However, existing scoring functions may not accurately differentiate proteins to which the MOI binds from a larger set of macromolecules in a protein structural database. An MOI will most likely have varying degrees of predicted binding affinities to many protein targets. However, correctly interpreting a docking score as a hit for the MOI docked to any individual protein can be problematic. In our method, which we term "Virtual Target Screening (VTS)", a set of small drug-like molecules is docked against each structure in the protein library to produce benchmark statistics. This calibration provides a reference for each protein so that hits can be identified for an MOI. VTS can then be used as a tool for: drug repositioning (repurposing), specificity and toxicity testing, identifying potential metabolites, probing protein structures for allosteric sites, and testing focused libraries (collection of MOIs with similar chemotypes) for selectivity. To validate our VTS method, twenty kinase inhibitors were docked to a collection of calibrated protein structures. Here, we report our results where VTS predicted protein kinases as hits in preference to other proteins in our database. Concurrently, a graphical interface for VTS was developed.

  • CRDOCK: An Ultrafast Multipurpose Protein-Ligand Docking Tool.
    Cabrera, Alvaro Cortés and Klett, Javier and G Dos Santos, Helena and Perona, Almudena and Gil-Redondo, Rubén and Francis, Sandrea M and Priego, Eva M and Gago, Federico and Morreale, Antonio
    Journal of chemical information and modeling, 2012, 52(8), 2300-2309
    PMID: 22764680     doi: 10.1021/ci300194a
    An ultrafast docking and virtual screening program, CRDOCK, is presented that contains (1) a search engine that can use a variety of sampling methods and an initial energy evaluation function, (2) several energy minimization algorithms for fine tuning the binding poses, and (3) different scoring functions. This modularity ensures the easy configuration of custom-made protocols that can be optimized depending on the problem in hand. CRDOCK employs a precomputed library of ligand conformations that are initially generated from one-dimensional SMILES strings. Testing CRDOCK on two widely used benchmarks, the ASTEX diverse set and the Directory of Useful Decoys, yielded a success rate of ∼75% in pose prediction and an average AUC of 0.66. A typical ligand can be docked, on average, in just ∼13 s. Extension to a representative group of pharmacologically relevant G protein-coupled receptors that have been recently cocrystallized with some selective ligands allowed us to demonstrate the utility of this tool and also highlight some current limitations. CRDOCK is now included within VSDMIP, our integrated platform for drug discovery.

  • Automatic modeling of mammalian olfactory receptors and docking of odorants.
    Launay, Guillaume and Téletchéa, Stéphane and Wade, Fallou and Pajot-Augy, Edith and Gibrat, Jean-François and Sanz, Guenhaël
    Protein engineering, design & selection : PEDS, 2012, 25(8), 377-386
    PMID: 22691703     doi: 10.1093/protein/gzs037
    We present a procedure that (i) automates the homology modeling of mammalian olfactory receptors (ORs) based on the six three-dimensional (3D) structures of G protein-coupled receptors (GPCRs) available so far and (ii) performs the docking of odorants on these models, using the concept of colony energy to score the complexes. ORs exhibit low-sequence similarities with other GPCR and current alignment methods often fail to provide a reliable alignment. Here, we use a fold recognition technique to obtain a robust initial alignment. We then apply our procedure to a human OR that we have previously functionally characterized. The analysis of the resulting in silico complexes, supported by receptor mutagenesis and functional assays in a heterologous expression system, suggests that antagonists dock in the upper part of the binding pocket whereas agonists dock in the narrow lower part. We propose that the potency of agonists in activating receptors depends on their ability to establish tight interactions with the floor of the binding pocket. We developed a web site that allows the user to upload a GPCR sequence, choose a ligand in a library and obtain the 3D structure of the free receptor and ligand-receptor complex (

  • BSP-SLIM: a blind low-resolution ligand-protein docking approach using predicted protein structures.
    Lee, Hui Sun and Zhang, Yang
    Proteins, 2012, 80(1), 93-110
    PMID: 21971880     doi: 10.1002/prot.23165
    We developed BSP-SLIM, a new method for ligand-protein blind docking using low-resolution protein structures. For a given sequence, protein structures are first predicted by I-TASSER; putative ligand binding sites are transferred from holo-template structures which are analogous to the I-TASSER models; ligand-protein docking conformations are then constructed by shape and chemical match of ligand with the negative image of binding pockets. BSP-SLIM was tested on 71 ligand-protein complexes from the Astex diverse set where the protein structures were predicted by I-TASSER with an average RMSD of 2.92 Å on the binding residues. Using I-TASSER models, the median ligand RMSD of BSP-SLIM docking is 3.99 Å, which is 5.94 Å lower than that by AutoDock; the median binding-site error by BSP-SLIM is 1.77 Å, which is 6.23 Å lower than that by AutoDock and 3.43 Å lower than that by LIGSITE(CSC). Compared to the models using crystal protein structures, the median ligand RMSD by BSP-SLIM using I-TASSER models increases by 0.87 Å while that by AutoDock increases by 8.41 Å; the median binding-site error by BSP-SLIM increases by 0.69 Å while that by AutoDock and LIGSITE(CSC) increases by 7.31 Å and 1.41 Å, respectively. As case studies, BSP-SLIM was used in virtual screening for six target proteins, which prioritized actives of 25% and 50% in the top 9.2% and 17% of the library on average, respectively. These results demonstrate the usefulness of the template-based coarse-grained algorithms in the low-resolution ligand-protein docking and drug-screening. An on-line BSP-SLIM server is freely available at

  • idTarget: a web server for identifying protein targets of small chemical molecules with robust scoring functions and a divide-and-conquer docking approach.
    Wang, Jui-Chih and Chu, Pei-Ying and Chen, Chung-Ming and Lin, Jung-Hsin
    Nucleic acids research, 2012, 40(Web Server issue), W393-9
    PMID: 22649057     doi: 10.1093/nar/gks496
    Identification of possible protein targets of small chemical molecules is an important step for unravelling their underlying causes of actions at the molecular level. To this end, we construct a web server, idTarget, which can predict possible binding targets of a small chemical molecule via a divide-and-conquer docking approach, in combination with our recently developed scoring functions based on robust regression analysis and quantum chemical charge models. Affinity profiles of the protein targets are used to provide the confidence levels of prediction. The divide-and-conquer docking approach uses adaptively constructed small overlapping grids to constrain the searching space, thereby achieving better docking efficiency. Unlike previous approaches that screen against a specific class of targets or a limited number of targets, idTarget screens against nearly all protein structures deposited in the Protein Data Bank (PDB). We show that idTarget is able to reproduce known off-targets of drugs or drug-like compounds, and the suggested new targets could be prioritized for further investigation. idTarget is freely available as a web-based server at

  • Flexibility and binding affinity in protein-ligand, protein-protein and multi-component protein interactions: limitations of current computational approaches.
    Tuffery, Pierre and Derreumaux, Philippe
    Journal of the Royal Society, Interface / the Royal Society, 2012, 9(66), 20-33
    PMID: 21993006     doi: 10.1098/rsif.2011.0584
    The recognition process between a protein and a partner represents a significant theoretical challenge. In silico structure-based drug design carried out with nothing more than the three-dimensional structure of the protein has led to the introduction of many compounds into clinical trials and numerous drug approvals. Central to guiding the discovery process is to recognize active among non-active compounds. While large-scale computer simulations of compounds taken from a library (virtual screening) or designed de novo are highly desirable in the post-genomic area, many technical problems remain to be adequately addressed. This article presents an overview and discusses the limits of current computational methods for predicting the correct binding pose and accurate binding affinity. It also presents the performances of the most popular algorithms for exploring binary and multi-body protein interactions.


  • Transferable scoring function based on semiempirical quantum mechanical PM6-DH2 method: CDK2 with 15 structurally diverse inhibitors.
    Dobes, Petr and Fanfrlík, Jindrich and Rezác, Jan and Otyepka, Michal and Hobza, Pavel
    Journal of computer-aided molecular design, 2011, 25(3), 223-235
    PMID: 21286784     doi: 10.1007/s10822-011-9413-5
    A semiempirical quantum mechanical PM6-DH2 method accurately covering the dispersion interaction and H-bonding was used to score fifteen structurally diverse CDK2 inhibitors. The geometries of all the complexes were taken from the X-ray structures and were reoptimised by the PM6-DH2 method in continuum water. The total scoring function was constructed as an estimate of the binding free energy, i.e., as a sum of the interaction enthalpy, interaction entropy and the corrections for the inhibitor desolvation and deformation energies. The applied scoring function contains clear thermodynamic terms and does not involve any adjustable empirical parameter. The best correlations with the experimental inhibition constants (ln Ki) were found for bare interaction enthalpy (r²

  • DockoMatic: automated peptide analog creation for high throughput virtual screening.
    Jacob, Reed B. and Bullock, Casey W and Andersen, Tim and McDougal, Owen M.
    Journal of computational chemistry, 2011, 32(13), 2936-2941
    PMID: 21717479     doi: 10.1002/jcc.21864
    The purpose of this manuscript is threefold: (1) to describe an update to DockoMatic that allows the user to generate cyclic peptide analog structure files based on protein database (pdb) files, (2) to test the accuracy of the peptide analog structure generation utility, and (3) to evaluate the high throughput capacity of DockoMatic. The DockoMatic graphical user interface interfaces with the software program Treepack to create user defined peptide analogs. To validate this approach, DockoMatic-produced cyclic peptide analogs were tested for three-dimensional structure consistency and binding affinity against four experimentally determined peptide structure files available in the Research Collaboratory for Structural Bioinformatics database. The peptides used to evaluate this new functionality were alpha-conotoxins ImI, PnIA, and their published analogs. Peptide analogs were generated by DockoMatic and tested for their ability to bind to X-ray crystal structure models of the acetylcholine binding protein originating from Aplysia californica. The results, consisting of more than 300 simulations, demonstrate that DockoMatic predicts the binding energy of peptide structures to within 3.5 kcal mol⁻¹, and the orientation of bound ligand compares to within 1.8 Å root mean square deviation for ligand structures as compared to experimental data. Evaluation of high throughput virtual screening capacity demonstrated that DockoMatic can collect, evaluate, and summarize the output of 10,000 AutoDock jobs in less than 2 hours of computational time, while 100,000 jobs requires approximately 15 hours and 1,000,000 jobs is estimated to take up to a week.

  • Predicting the accuracy of protein-ligand docking on homology models.
    Bordogna, Annalisa and Pandini, Alessandro and Bonati, Laura
    Journal of computational chemistry, 2011, 32(1), 81-98
    PMID: 20607693     doi: 10.1002/jcc.21601
    Ligand-protein docking is increasingly used in Drug Discovery. The initial limitations imposed by a reduced availability of target protein structures have been overcome by the use of theoretical models, especially those derived by homology modeling techniques. While this greatly extended the use of docking simulations, it also introduced the need for general and robust criteria to estimate the reliability of docking results given the model quality. To this end, a large-scale experiment was performed on a diverse set including experimental structures and homology models for a group of representative ligand-protein complexes. A wide spectrum of model quality was sampled using templates at different evolutionary distances and different strategies for target-template alignment and modeling. The obtained models were scored by a selection of the most used model quality indices. The binding geometries were generated using AutoDock, one of the most common docking programs. An important result of this study is that indeed quantitative and robust correlations exist between the accuracy of docking results and the model quality, especially in the binding site. Moreover, state-of-the-art indices for model quality assessment are already an effective tool for an a priori prediction of the accuracy of docking experiments in the context of groups of proteins with conserved structural characteristics.

  • Quantum mechanics/molecular mechanics strategies for docking pose refinement: distinguishing between binders and decoys in cytochrome C peroxidase.
    Burger, Steven K and Thompson, David C and Ayers, Paul W
    Journal of chemical information and modeling, 2011, 51(1), 93-101
    PMID: 21133348     doi: 10.1021/ci100329z
    We investigate the effect of systematically applying molecular dynamics (MD) and quantum mechanics/molecular mechanics (QM/MM) to docked poses in an attempt to improve the correspondence between theoretical prediction and experimental observation. The proposed scheme involves running a short time scale MD simulation on a docked ligand pose (and any known structurally important crystal structure waters in the active site), followed by QM/MM minimization. Both of these steps are relatively fast for moderately sized ligands; longer time scale MD involving the protein is not found to improve the results. The final binding energy is given in terms of the QM/MM total energy, a van der Waals correction, and a term to account for desolvation effects. This methodology is first tested with a trypsin inhibitor, for which we establish the importance of running MD before reoptimizing with QM/MM. The method is then applied to cytochrome c peroxidase using a set of binders and decoys. In this example, the proposed methodology affords much better discrimination between binders and decoys than the traditional docking approach used. For both systems presented, application of this protocol results in a significantly better energetic ranking and a smaller root mean squared deviation from known crystallographic ligand poses. This work highlights the importance of including polarization effects through QM/MM and of sampling with MD to refine a set of initial docked poses.

  • Virtual decoy sets for molecular docking benchmarks.
    Wallach, Izhar and Lilien, Ryan
    Journal of chemical information and modeling, 2011, 51(2), 196-202
    PMID: 21207928     doi: 10.1021/ci100374f
    Virtual docking algorithms are often evaluated on their ability to separate active ligands from decoy molecules. The current state-of-the-art benchmark, the Directory of Useful Decoys (DUD), minimizes bias by including decoys from a library of synthetically feasible molecules that are physically similar yet chemically dissimilar to the active ligands. We show that by ignoring synthetic feasibility, we can compile a benchmark that is comparable to the DUD and less biased with respect to physical similarity.

  • FRED pose prediction and virtual screening accuracy.
    McGann, Mark
    Journal of chemical information and modeling, 2011, 51(3), 578-596
    PMID: 21323318     doi: 10.1021/ci100436p
    Results of a previous docking study are reanalyzed and extended to include results from the docking program FRED and a detailed statistical analysis of both structure reproduction and virtual screening results. FRED is run both in a traditional docking mode and in a hybrid mode that makes use of the structure of a bound ligand in addition to the protein structure to screen molecules. This analysis shows that most docking programs are effective overall but highly inconsistent, tending to do well on one system and poorly on the next. Comparing methods, the difference in mean performance on DUD is found to be statistically significant (95% confidence) 61% of the time when using a global enrichment metric (AUC). Early enrichment metrics are found to have relatively poor statistical power, with 0.5% early enrichment only able to distinguish methods to 95% confidence 14% of the time.

  • Assessing the performance of the molecular mechanics/Poisson Boltzmann surface area and molecular mechanics/generalized Born surface area methods. II. The accuracy of ranking poses generated from docking.
    Hou, Tingjun and Wang, Junmei and Li, Youyong and Wang, Wei
    Journal of computational chemistry, 2011, 32(5), 866-877
    PMID: 20949517     doi: 10.1002/jcc.21666
    In molecular docking, it is challenging to develop a scoring function that is accurate to conduct high-throughput screenings. Most scoring functions implemented in popular docking software packages were developed with many approximations for computational efficiency, which sacrifices the accuracy of prediction. With advanced technology and powerful computational hardware nowadays, it is feasible to use rigorous scoring functions, such as molecular mechanics/Poisson Boltzmann surface area (MM/PBSA) and molecular mechanics/generalized Born surface area (MM/GBSA) in molecular docking studies. Here, we systematically investigated the performance of MM/PBSA and MM/GBSA to identify the correct binding conformations and predict the binding free energies for 98 protein-ligand complexes. Comparison studies showed that MM/GBSA (69.4%) outperformed MM/PBSA (45.5%) and many popular scoring functions to identify the correct binding conformations. Moreover, we found that molecular dynamics simulations are necessary for some systems to identify the correct binding conformations. Based on our results, we proposed the guideline for MM/GBSA to predict the binding conformations. We then tested the performance of MM/GBSA and MM/PBSA to reproduce the binding free energies of the 98 protein-ligand complexes. The best prediction of MM/GBSA model with internal dielectric constant 2.0, produced a Spearman's correlation coefficient of 0.66, which is better than MM/PBSA (0.49) and almost all scoring functions used in molecular docking. In summary, MM/GBSA performs well for both binding pose predictions and binding free-energy estimations and is efficient to re-score the top-hit poses produced by other less-accurate scoring functions.

  • Significant enhancement of docking sensitivity using implicit ligand sampling.
    Xu, Mengang and Lill, Markus A
    Journal of chemical information and modeling, 2011, 51(3), 693-706
    PMID: 21375306     doi: 10.1021/ci100457t
    The efficient and accurate quantification of protein-ligand interactions using computational methods is still a challenging task. Two factors strongly contribute to the failure of docking methods to predict free energies of binding accurately: the insufficient incorporation of protein flexibility coupled to ligand binding and the neglected dynamics of the protein-ligand complex in current scoring schemes. We have developed a new methodology, named the 'ligand-model' concept, to sample protein conformations that are relevant for binding structurally diverse sets of ligands. In the ligand-model concept, molecular-dynamics (MD) simulations are performed with a virtual ligand, represented by a collection of functional groups that binds to the protein and dynamically changes its shape and properties during the simulation. The ligand model essentially represents a large ensemble of different chemical species binding to the same target protein. Representative protein structures were obtained from the MD simulation, and docking was performed into this ensemble of protein conformation. Similar binding poses were clustered, and the averaged score was utilized to rerank the poses. We demonstrate that the ligand-model approach yields significant improvements in predicting native-like binding poses and quantifying binding affinities compared to static docking and ensemble docking simulations into protein structures generated from an apo MD simulation.

  • Task-parallel message passing interface implementation of Autodock4 for docking of very large databases of compounds using high-performance super-computers.
    Collignon, Barbara and Schulz, Roland and Smith, Jeremy C and Baudry, Jerome
    Journal of computational chemistry, 2011, 32(6), 1202-1209
    PMID: 21387347     doi: 10.1002/jcc.21696
    A message passing interface (MPI)-based implementation (Autodock4.lga.MPI) of the grid-based docking program Autodock4 has been developed to allow simultaneous and independent docking of multiple compounds on up to thousands of central processing units (CPUs) using the Lamarckian genetic algorithm. The MPI version reads a single binary file containing precalculated grids that represent the protein-ligand interactions, i.e., van der Waals, electrostatic, and desolvation potentials, and needs only two input parameter files for the entire docking run. In comparison, the serial version of Autodock4 reads ASCII grid files and requires one parameter file per compound. The modifications performed result in significantly reduced input/output activity compared with the serial version. Autodock4.lga.MPI scales up to 8192 CPUs with a maximal overhead of 16.3%, of which two thirds is due to input/output operations and one third originates from MPI operations. The optimal docking strategy, which minimizes docking CPU time without lowering the quality of the database enrichments, comprises the docking of ligands preordered from the most to the least flexible and the assignment of the number of energy evaluations as a function of the number of rotatable bonds. In 24 h, on 8192 high-performance computing CPUs, the present MPI version would allow docking to a rigid protein of about 300K small flexible compounds or 11 million rigid compounds.

  • Substantial improvements in large-scale redocking and screening using the novel HYDE scoring function.
    Schneider, Nadine and Hindle, Sally and Lange, Gudrun and Klein, Robert and Albrecht, Jürgen and Briem, Hans and Beyer, Kristin and Claußen, Holger and Gastreich, Marcus and Lemmen, Christian and Rarey, Matthias
    Journal of computer-aided molecular design, 2011, 26(6), 701-723
    PMID: 22203423     doi: 10.1007/s10822-011-9531-0
    The HYDE scoring function consistently describes hydrogen bonding, the hydrophobic effect and desolvation. It relies on HYdration and DEsolvation terms which are calibrated using octanol/water partition coefficients of small molecules. We do not use affinity data for calibration, therefore HYDE is generally applicable to all protein targets. HYDE reflects the Gibbs free energy of binding while only considering the essential interactions of protein-ligand complexes. The greatest benefit of HYDE is that it yields a very intuitive atom-based score, which can be mapped onto the ligand and protein atoms. This allows the direct visualization of the score and consequently facilitates analysis of protein-ligand complexes during the lead optimization process. In this study, we validated our new scoring function by applying it in large-scale docking experiments. We could successfully predict the correct binding mode in 93% of complexes in redocking calculations on the Astex diverse set, while our performance in virtual screening experiments using the DUD dataset showed significant enrichment values with a mean AUC of 0.77 across all protein targets with little or no structural defects. As part of these studies, we also carried out a very detailed analysis of the data that revealed interesting pitfalls, which we highlight here and which should be addressed in future benchmark datasets.

  • Molecular docking with ligand attached water molecules.
    Lie, Mette A and Thomsen, René and Pedersen, Christian N S and Schiøtt, Birgit and Christensen, Mikael H
    Journal of chemical information and modeling, 2011, 51(4), 909-917
    PMID: 21452852     doi: 10.1021/ci100510m
    A novel approach to incorporate water molecules in protein-ligand docking is proposed. In this method, the water molecules display the same flexibility during the docking simulation as the ligand. The method solvates the ligand with the maximum number of water molecules, and these are then retained or displaced depending on energy contributions during the docking simulation. Instead of being a static part of the receptor, each water molecule is a flexible on/off part of the ligand and is treated with the same flexibility as the ligand itself. To favor exclusion of the water molecules, a constant entropy penalty is added for each included water molecule. The method was evaluated using 12 structurally diverse protein-ligand complexes from the PDB, where several water molecules bridge the ligand and the protein. A considerable improvement in successful docking simulations was found when including flexible water molecules solvating hydrogen bonding groups of the ligand. The method has been implemented in the docking program Molegro Virtual Docker (MVD).

  • Correction to "A Machine Learning-Based Method To Improve Docking Scoring Functions and Its Application to Drug Repurposing"
    Kinnings, Sarah L and Liu, Nina and Tonge, Peter J and Jackson, Richard M and Xie, Lei and Bourne, Philip E
    Journal of chemical information and modeling, 2011, 51(5), 1195-1197
    PMID: 21526828     doi: 10.1021/ci2001346

  • MiniMuDS: a new optimizer using knowledge-based potentials improves scoring of docking solutions.
    Spitzmüller, Andreas and Velec, Hans F G and Klebe, Gerhard
    Journal of chemical information and modeling, 2011, 51(6), 1423-1430
    PMID: 21528908     doi: 10.1021/ci200098v
    In small molecule docking, the scoring and ranking of generated conformations is an important, though still not a completely resolved problem. Rescoring schemes often improve the quality of the obtained rankings. It is known that a local optimization is essential before a valid rescore value can be calculated. Here, we present a method that improves rescoring results obtained with the DrugScore function due to a new optimization technique. The method implements a more sophisticated search algorithm compared to the classic local optimization procedures used in this context. We validated the proposed method on a set of 192 protein-ligand complexes. Results show substantial improvements compared to original docking results with success rates increased by up to 10% for top scored solutions below 2 Å root-mean-square deviation to the native state and up to 18% increase below 1 Å, respectively.

  • Efficient incorporation of protein flexibility and dynamics into molecular docking simulations.
    Lill, Markus A
    Biochemistry, 2011, 50(28), 6157-6169
    PMID: 21678954     doi: 10.1021/bi2004558
    Flexibility and dynamics are protein characteristics that are essential for the process of molecular recognition. Conformational changes in the protein that are coupled to ligand binding are described by the biophysical models of induced fit and conformational selection. Different concepts that incorporate protein flexibility into protein-ligand docking within the context of these two models are reviewed. Several computational studies that discuss the validity and possible limitations of such approaches will be presented. Finally, different approaches that incorporate protein dynamics, e.g., configurational entropy, and solvation effects into docking will be highlighted.

  • Construction and test of ligand decoy sets using MDock: community structure-activity resource benchmarks for binding mode prediction.
    Huang, Sheng-You and Zou, Xiaoqin
    Journal of chemical information and modeling, 2011, 51(9), 2107-2114
    PMID: 21755952     doi: 10.1021/ci200080g
    Two sets of ligand binding decoys have been constructed for the community structure-activity resource (CSAR) benchmark by using the MDock and DOCK programs for rigid- and flexible-ligand docking, respectively. The decoys generated for each complex in the benchmark thoroughly cover the binding site and also contain a certain number of near-native binding modes. A few scoring functions have been evaluated using the ligand binding decoy sets for their ability to predict near-native binding modes. Among them, ITScore achieved a success rate of 86.7% for the rigid-ligand decoys and 79.7% for the flexible-ligand decoys, under the common definition of a successful prediction as root-mean-square deviation <2.0 Å from the native structure if the top-scored binding mode was considered. The decoy sets may serve as benchmarks for binding mode prediction of a scoring function, which are available at the CSAR Web site (
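
The success criterion quoted in this abstract (top-scored pose within 2.0 Å RMSD of the native structure) can be illustrated with a minimal sketch. The function names are hypothetical, and the sketch assumes that pose and native atoms are listed in the same order and the same coordinate frame, with no symmetry correction; it is not taken from MDock, DOCK, or ITScore.

```python
import math

def rmsd(pose, native):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(pose) != len(native) or not pose:
        raise ValueError("pose and native must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(pose, native)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(pose))

def success_rate(top_poses, natives, cutoff=2.0):
    """Fraction of complexes whose top-scored pose lies below the RMSD cutoff (in Å)."""
    if len(top_poses) != len(natives) or not natives:
        raise ValueError("need one top-scored pose per native structure")
    hits = sum(1 for pose, native in zip(top_poses, natives)
               if rmsd(pose, native) < cutoff)
    return hits / len(natives)

# Toy data: one pose identical to its native, one shifted by 3 Å along x.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(success_rate([native, shifted], [native, native]))
```

No superposition is applied before the RMSD is computed, since a docked pose is normally judged in the receptor frame.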

  • Evaluation of docking performance in a blinded virtual screening of fragment-like trypsin inhibitors.
    Surpateanu, Georgiana and Iorga, Bogdan I
    Journal of computer-aided molecular design, 2011, 26(5), 595-601
    PMID: 22180049     doi: 10.1007/s10822-011-9526-x
    In this study, we have "blindly" assessed the ability of several combinations of docking software and scoring functions to predict the binding of a fragment-like library of bovine trypsin inhibitors. The most suitable protocols (involving Gold software and GoldScore scoring function, with or without rescoring) were selected for this purpose using a training set of compounds with known biological activities. The selected virtual screening protocols provided good results with the SAMPL3-VS dataset, showing enrichment factors of about 10 for Top 20 compounds. This methodology should be useful in difficult cases of docking, with a special emphasis on the fragment-based virtual screening campaigns.
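
The enrichment factor mentioned in this abstract is commonly computed as the hit rate among the top-ranked compounds divided by the hit rate in the whole library. The sketch below uses that common definition, which is an assumption here because the abstract does not spell out the formula; the function name and the toy ranking are hypothetical.

```python
def enrichment_factor(ranked_is_active, top_n):
    """Hit rate in the top_n ranked compounds divided by the overall hit rate.

    ranked_is_active: booleans in rank order (best-scored compound first),
    True where the compound is a known active.
    """
    total = len(ranked_is_active)
    actives = sum(ranked_is_active)
    if not 0 < top_n <= total:
        raise ValueError("top_n must be between 1 and the library size")
    if actives == 0:
        raise ValueError("library contains no actives")
    top_hits = sum(ranked_is_active[:top_n])
    return (top_hits / top_n) / (actives / total)

# Toy ranking: 10 compounds, 2 actives, both ranked in the top 2.
ranking = [True, True] + [False] * 8
print(enrichment_factor(ranking, top_n=2))
```

An enrichment factor of 1 corresponds to random selection, so a value of about 10 for the top 20 compounds means roughly ten times more actives than chance would give.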

  • Docking performance of fragments and druglike compounds.
    Verdonk, Marcel L and Giangreco, Ilenia and Hall, Richard J and Korb, Oliver and Mortenson, Paul N and Murray, Christopher W
    Journal of medicinal chemistry, 2011, 54(15), 5422-5431
    PMID: 21692478     doi: 10.1021/jm200558u
    This paper addresses two questions of key interest to researchers working with protein-ligand docking methods: (i) Why is there such a large variation in docking performance between different test sets reported in the literature? (ii) Are fragments more difficult to dock than druglike compounds? To answer these, we construct a test set of in-house X-ray structures of protein-ligand complexes from drug discovery projects, half of which contain fragment ligands, the other half druglike ligands. We find that a key factor affecting docking performance is ligand efficiency (LE). High LE compounds are significantly easier to dock than low LE compounds, which we believe could explain the differences observed between test sets reported in the literature. There is no significant difference in docking performance between fragments and druglike compounds, but the reasons why dockings fail appear to be different.
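
Ligand efficiency, the key factor named in this abstract, is usually defined as the binding free energy per heavy atom. The sketch below assumes that common definition and estimates the free energy from a dissociation constant via ΔG = RT ln(Kd); the abstract does not state which variant the authors used, and the function name and example values are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # kcal/(mol*K)

def ligand_efficiency(kd_molar, heavy_atoms, temperature=298.15):
    """Ligand efficiency in kcal/mol per heavy atom, from a dissociation constant in mol/L."""
    if kd_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("kd_molar and heavy_atoms must be positive")
    delta_g = GAS_CONSTANT_KCAL * temperature * math.log(kd_molar)  # negative for Kd < 1 M
    return -delta_g / heavy_atoms

# Hypothetical compounds: a weak but small fragment and a potent, larger druglike ligand.
print(round(ligand_efficiency(1e-4, 12), 2))  # fragment, 100 uM, 12 heavy atoms
print(round(ligand_efficiency(1e-8, 35), 2))  # druglike, 10 nM, 35 heavy atoms
```

In this toy comparison the weaker fragment has the higher ligand efficiency, which shows why binning test sets by LE rather than by size can give a different picture of docking difficulty.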

  • Ligand and Decoy Sets for Docking to G Protein-Coupled Receptors.
    Gatica, Edgar A and Cavasotto, Claudio N
    Journal of chemical information and modeling, 2011, 52(1), 1-6
    PMID: 22168315     doi: 10.1021/ci200412p
    We compiled a G protein-coupled receptor (GPCR) ligand library (GLL) for 147 targets, selecting for each ligand 39 decoy molecules, collected in the GPCR Decoy Database (GDD). Decoys were chosen ensuring a ligand-decoy similarity of six physical properties, while enforcing ligand-decoy chemical dissimilarity. The performance in docking of the GDD was evaluated on 19 GPCRs, showing a marked decrease in enrichment compared to bias-uncorrected decoy sets. Both the GLL and GDD are freely available for the scientific community.

  • Fragment-Based Drug Design and Drug Repositioning Using Multiple Ligand Simultaneous Docking (MLSD): Identifying Celecoxib and Template Compounds as Novel Inhibitors of Signal Transducer and Activator of Transcription 3 (STAT3).
    Li, Huameng and Liu, Aiguo and Zhao, Zhenjiang and Xu, Yufang and Lin, Jiayuh and Jou, David and Li, Chenglong
    Journal of medicinal chemistry, 2011, 54(15), 5592-5596
    PMID: 21678971     doi: 10.1021/jm101330h
    We describe a novel method of drug discovery using MLSD and drug repositioning, with cancer target STAT3 being used as a test case. Multiple drug scaffolds were simultaneously docked into hot spots of STAT3 by MLSD, followed by tethering to generate virtual template compounds. Similarity search of virtual hits on drug database identified celecoxib as a novel inhibitor of STAT3. Furthermore, we designed two novel lead inhibitors based on one of the lead templates and celecoxib.

  • Normalizing Molecular Docking Rankings using Virtually Generated Decoys.
    Wallach, Izhar and Jaitly, Navdeep and Nguyen, Kong and Schapira, Matthieu and Lilien, Ryan
    Journal of chemical information and modeling, 2011, 51(8), 1817-1830
    PMID: 21699246     doi: 10.1021/ci200175h
    Drug discovery research often relies on the use of virtual screening via molecular docking to identify active hits in compound libraries. An area for improvement among many state-of-the-art docking methods is the accuracy of the scoring functions used to differentiate active from nonactive ligands. Many contemporary scoring functions are influenced by the physical properties of the docked molecule. This bias can cause molecules with certain physical properties to incorrectly score better than others. Since variation in physical properties is inevitable in large screening libraries, it is desirable to account for this bias. In this paper, we present a method of normalizing docking scores using virtually generated decoy sets with matched physical properties. First, our method generates a set of property-matched decoys for every molecule in the screening library. Each library molecule and its decoy set are docked using a state-of-the-art method, producing a set of raw docking scores. Next, the raw docking score of each library molecule is normalized against the scores of its decoys. The normalized score represents the probability that the raw docking score was drawn from the background distribution of nonactive property-matched decoys. Assuming that the distribution of scores of active molecules differs from the nonactive score distribution, we expect that the score of an active compound will have a low probability of having been drawn from the nonactive score distribution. In addition to the use of decoys in normalizing docking scores, we suggest that decoy sets may be a useful tool to evaluate, improve, or develop scoring functions. We show that by analyzing docking scores of library molecules with respect to the docking scores of their virtually generated property-matched decoys, one can gain insight into the advantages, limitations, and reliability of scoring functions.
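
The normalization step described in this abstract can be sketched as an empirical p-value: the fraction of a molecule's property-matched decoys that score at least as well as the molecule itself. This is an illustrative assumption, not the authors' implementation; the sketch assumes lower docking scores are better, uses add-one smoothing so the probability is never exactly zero, and all names and scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def empirical_p_value(raw_score: float, decoy_scores: Sequence[float]) -> float:
    """Estimated probability that raw_score was drawn from the decoy (nonactive) background."""
    if not decoy_scores:
        raise ValueError("need at least one decoy score")
    as_good = sum(1 for s in decoy_scores if s <= raw_score)  # lower score = better
    return (as_good + 1) / (len(decoy_scores) + 1)

def rank_by_normalized_score(
    library: Dict[str, Tuple[float, Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Rank library molecules by p-value, most active-like (lowest p) first."""
    scored = [(name, empirical_p_value(raw, decoys))
              for name, (raw, decoys) in library.items()]
    return sorted(scored, key=lambda item: item[1])

# Toy library: mol_b has the better raw score, but its decoys score just as well.
library = {
    "mol_a": (-8.0, [-5.0, -6.0, -4.5, -7.0]),
    "mol_b": (-10.0, [-10.5, -11.0, -9.0, -12.0]),
}
print(rank_by_normalized_score(library))
```

Here mol_a outranks mol_b after normalization even though its raw score is worse, which is the kind of property-driven bias the decoy comparison is meant to remove.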

  • DEKOIS: Demanding Evaluation Kits for Objective in Silico Screening - A Versatile Tool for Benchmarking Docking Programs and Scoring Functions.
    Vogel, Simon M and Bauer, Matthias R and Boeckler, Frank M
    Journal of chemical information and modeling, 2011, 51(10), 2650-2665
    PMID: 21774552     doi: 10.1021/ci2001549
    For widely applied in silico screening techniques success depends on the rational selection of an appropriate method. We herein present a fast, versatile, and robust method to construct demanding evaluation kits for objective in silico screening (DEKOIS). This automated process enables creating tailor-made decoy sets for any given sets of bioactives. It facilitates a target-dependent validation of docking algorithms and scoring functions helping to save time and resources. We have developed metrics for assessing and improving decoy set quality and employ them to investigate how decoy embedding affects docking. We demonstrate that screening performance is target-dependent and can be impaired by latent actives in the decoy set (LADS) or enhanced by poor decoy embedding. The presented method allows extending and complementing the collection of publicly available high quality decoy sets toward new target space. All present and future DEKOIS data sets will be made accessible at .

  • Estimating binding affinities by docking/scoring methods using variable protonation states.
    Park, Min-Sun and Gao, Cen and Stern, Harry A
    Proteins, 2011, 79(1), 304-314
    PMID: 21058298     doi: 10.1002/prot.22883
    To investigate the effects of multiple protonation states on protein-ligand recognition, we generated alternative protonation states for selected titratable groups of ligands and receptors. The selection of states was based on the predicted pK(a) of the unbound receptor and ligand and the proximity of titratable groups of the receptor to the binding site. Various ligand tautomer states were also considered. An independent docking calculation was run for each state. Several protocols were examined: using an ensemble of all generated states of ligand and receptor, using only the most probable state of the unbound ligand/receptor, and using only the state giving the most favorable docking score. The accuracies of these approaches were compared, using a set of 176 protein-ligand complexes (15 receptors) for which crystal structures and measured binding affinities are available. The best agreement with experiment was obtained when ligand poses from experimental crystal structures were used. For 9 of 15 receptors, using an ensemble of all generated protonation states of the ligand and receptor gave the best correlation between calculated and measured affinities.

  • Virtual Screening for Lead Discovery
    Tang, Yat T. and Marshall, Garland R.
    , 2011, 1-22
    doi: 10.1007/978-1-61779-012-6_1
    The identification of small drug-like compounds that selectively inhibit the function of biological targets has historically been a major focus in the pharmaceutical industry, and in recent years, has generated much interest in academia as well. Drug-like compounds ...

  • Evaluation of Several Two-Step Scoring Functions Based on Linear Interaction Energy, Effective Ligand Size, and Empirical Pair Potentials for Prediction of Protein-Ligand Binding Geometry and Free Energy.
    Rahaman, Obaidur and Estrada, Trilce P and Doren, Douglas J and Taufer, Michela and Brooks, Charles L and Armen, Roger S
    Journal of chemical information and modeling, 2011, 51(9), 2047-2065
    PMID: 21644546     doi: 10.1021/ci1003009
    The performances of several two-step scoring approaches for molecular docking were assessed for their ability to predict binding geometries and free energies. Two new scoring functions designed for "step 2 discrimination" were proposed and compared to our CHARMM implementation of the linear interaction energy (LIE) approach using the Generalized-Born with Molecular Volume (GBMV) implicit solvation model. A scoring function S1 was proposed by considering only "interacting" ligand atoms as the "effective size" of the ligand and extended to an empirical regression-based pair potential S2. The S1 and S2 scoring schemes were trained and 5-fold cross-validated on a diverse set of 259 protein-ligand complexes from the Ligand Protein Database (LPDB). The regression-based parameters for S1 and S2 also demonstrated reasonable transferability in the CSARdock 2010 benchmark using a new data set (NRC HiQ) of diverse protein-ligand complexes. The ability of the scoring functions to accurately predict ligand geometry was evaluated by calculating the discriminative power (DP) of the scoring functions to identify native poses. The parameters for the LIE scoring function with the optimal discriminative power (DP) for geometry (step 1 discrimination) were found to be very similar to the best-fit parameters for binding free energy over a large number of protein-ligand complexes (step 2 discrimination). Reasonable performance of the scoring functions in enrichment of active compounds in four different protein target classes established that the parameters for S1 and S2 provided reasonable accuracy and transferability. Additional analysis was performed to definitively separate scoring function performance from molecular weight effects. This analysis included the prediction of ligand binding efficiencies for a subset of the CSARdock NRC HiQ data set where the number of ligand heavy atoms ranged from 17 to 35. 
    This range of ligand heavy atoms is where improved accuracy of predicted ligand efficiencies is most relevant to real-world drug design efforts.

  • PLS-DA - Docking Optimized Combined Energetic Terms (PLSDA-DOCET) protocol: a brief evaluation.
    Avram, Sorin and Pacureanu, Liliana Mioara and Seclaman, Edward and Bora, Alina and Kurunczi, Ludovic G
    Journal of chemical information and modeling, 2011, 51(12), 3169-3179
    PMID: 22066983     doi: 10.1021/ci2002268
    Docking studies have become popular approaches in drug design, where the binding energy of the ligand in the active site of the protein is estimated by a scoring function. Many promising techniques were developed to enhance the performance of scoring functions including the fusion of multiple scoring functions outcomes into a so-called consensus scoring function. Hereby, we evaluated the target oriented consensus technique using the energetic terms of several scoring functions. The approach was denoted PLSDA-DOCET. Optimization strategies for consensus energetic terms and scoring functions based on ROC metric were compared to classical rigid docking and to ligand-based similarity search methods comprising 2D fingerprints and ROCS. The ROCS results indicate large performance variations depending on the biological target. The AUC-based strategy of PLSDA-DOCET outperformed the other docking approaches regarding simple retrieval and scaffold-hopping. The superior performance of PLSDA-DOCET protocol relative to single and combined scoring functions was validated on an external test set. We found a relatively low mean correlation of the ranks of the chemotypes retrieved by the PLSDA-DOCET protocol and all the other methods employed here.

  • Darwinian Docking.
    Kuntz, Irwin D
    Journal of computer-aided molecular design, 2011, 26(1), 73-75
    PMID: 22143893     doi: 10.1007/s10822-011-9503-4
    The Darwinian model of evolution is an optimization strategy that can be adapted to docking. It differs from the common use of genetic algorithms, primarily in its acceptance of diverse solutions over finding "global" optima. A related problem is selecting compounds using multiple criteria. I discuss these ideas and present the outlines of a protocol for selecting "hits" and "leads" in drug discovery.

  • Statistical potential for modeling and ranking of protein-ligand interactions.
    Fan, Hao and Schneidman-Duhovny, Dina and Irwin, John J and Dong, Guangqiang and Shoichet, Brian K and Sali, Andrej
    Journal of chemical information and modeling, 2011, 51(12), 3078-3092
    PMID: 22014038     doi: 10.1021/ci200377u
    Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. The statistical potentials are available through the Integrative Modeling Platform (IMP) software package ( and the LigScore Web server (

  • Improving molecular docking through eHiTS' tunable scoring function.
    Ravitz, Orr and Zsoldos, Zsolt and Simon, Aniko
    Journal of computer-aided molecular design, 2011, 25(11), 1033-1051
    PMID: 22076470     doi: 10.1007/s10822-011-9482-5
    We present three complementary approaches for score-tuning that improve docking performance in pose prediction, virtual screening and binding affinity assessment. The methodology utilizes experimental data to customize the scoring function for the system of interest considering the specific docking scenario. The tuning approach, which has been implemented as an automated utility in eHiTS, is introduced as a solution to one of the conundrums of the molecular docking paradigm, namely, the lack of a universally well performing scoring function. The accuracy of scoring functions has been shown to be generally system-dependent, and particularly lacking for binding energy and bio-activity predictions. In the proposed approach, pose and energy predictions are enhanced by adjusting the relative weights of the eHiTS energy terms to improve score-RMSD or score-affinity correlations. In a virtual screening context ligand-based similarity is used to rescale the docking score such that better enrichment factors are achieved. We discuss the algorithmic details of the methods, and demonstrate the effects of score tuning on a variety of targets, including CDK2, BACE1 and neuraminidase, as well as on the popular benchmarks-the Directory of Useful Decoys and the PDBBind database.

  • BEAR, a novel virtual screening methodology for drug discovery.
    Degliesposti, Gianluca and Portioli, Corinne and Parenti, Marco Daniele and Rastelli, Giulio
    Journal of biomolecular screening, 2011, 16(1), 129-133
    PMID: 21084717     doi: 10.1177/1087057110388276
    BEAR (binding estimation after refinement) is a new virtual screening technology based on the conformational refinement of docking poses through molecular dynamics and prediction of binding free energies using accurate scoring functions. Here, the authors report the results of an extensive benchmark of the BEAR performance in identifying a smaller subset of known inhibitors seeded in a large (1.5 million) database of compounds. BEAR performance proved strikingly better if compared with standard docking screening methods. The validations performed so far showed that BEAR is a reliable tool for drug discovery. It is fast, modular, and automated, and it can be applied to virtual screenings against any biological target with known structure and any database of compounds.

  • Knowledge-Based Scoring Functions in Drug Design: 3. A Two-Dimensional Knowledge-Based Hydrogen-Bonding Potential for the Prediction of Protein-Ligand Interactions.
    Zheng, Mingyue and Xiong, Bing and Luo, Cheng and Li, Shanshan and Liu, Xian and Shen, Qianchen and Li, Jing and Zhu, Weiliang and Luo, Xiaomin and Jiang, Hualiang
    Journal of chemical information and modeling, 2011, 51(11), 2994-3004
    PMID: 21999432     doi: 10.1021/ci2003939
    Hydrogen bonding is a key contributor to the molecular recognition between ligands and their host molecules in biological systems. Here we develop a novel orientation-dependent hydrogen bonding potential based on the geometric characteristics of hydrogen bonds observed in 44,585 protein-ligand complexes. We find a close correspondence between the empirical knowledge and the energy landscape inferred from the distribution of HBs. A scoring function based on the resultant hydrogen-bonding potentials discriminates native protein-ligand structures from incorrectly docked decoys with remarkable predictive power.

  • A machine learning-based method to improve docking scoring functions and its application to drug repurposing.
    Kinnings, Sarah L and Liu, Nina and Tonge, Peter J and Jackson, Richard M and Xie, Lei and Bourne, Philip E
    Journal of chemical information and modeling, 2011, 51(2), 408-419
    PMID: 21291174     doi: 10.1021/ci100369f
    Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family dependent. In addition, they incorrectly assume that individual interactions contribute toward the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper, we show how the use of support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC(50) values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of Mycobacterium tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.

  • Accelerating molecular docking calculations using graphics processing units.
    Korb, Oliver and Stützle, Thomas and Exner, Thomas E.
    Journal of chemical information and modeling, 2011, 51(4), 865-876
    PMID: 21434638     doi: 10.1021/ci100459b
    The generation of molecular conformations and the evaluation of interaction potentials are common tasks in molecular modeling applications, particularly in protein-ligand or protein-protein docking programs. In this work, we present a GPU-accelerated approach capable of speeding up these tasks considerably. For the evaluation of interaction potentials in the context of rigid protein-protein docking, the GPU-accelerated approach reached speedup factors of up to over 50 compared to an optimized CPU-based implementation. Treating the ligand and donor groups in the protein binding site as flexible, speedup factors of up to 16 can be observed in the evaluation of protein-ligand interaction potentials. Additionally, we introduce a parallel version of our protein-ligand docking algorithm PLANTS that can take advantage of this GPU-accelerated scoring function evaluation. We compared the GPU-accelerated parallel version to the same algorithm running on the CPU and also to the highly optimized sequential CPU-based version. In terms of dependence of the ligand size and the number of rotatable bonds, speedup factors of up to 10 and 7, respectively, can be observed. Finally, a fitness landscape analysis in the context of rigid protein-protein docking was performed. Using a systematic grid-based search methodology, the GPU-accelerated version outperformed the CPU-based version with speedup factors of up to 60.

  • SERAPhiC: A Benchmark for in Silico Fragment-Based Drug Design.
    Favia, Angelo D and Bottegoni, Giovanni and Nobeli, Irene and Bisignano, Paola and Cavalli, Andrea
    Journal of chemical information and modeling, 2011, 51(11), 2882-2896
    PMID: 21936510     doi: 10.1021/ci2003363
    Our main objective was to compile a data set of high-quality protein-fragment complexes and make it publicly available. Once assembled, the data set was challenged using docking procedures to address the following questions: (i) Can molecular docking correctly reproduce the experimentally solved structures? (ii) How thorough must the sampling be to replicate the experimental data? (iii) Can commonly used scoring functions discriminate between the native pose and other energy minima? The data set, named SERAPhiC (Selected Fragment Protein Complexes), is publicly available in a ready-to-dock format ( ). It offers computational medicinal chemists a reliable test set for both in silico protocol assessment and software development.

  • AADS - An Automated Active Site Identification, Docking, and Scoring Protocol for Protein Targets Based on Physicochemical Descriptors.
    Singh, Tanya and Biswas, D and Jayaram, B.
    Journal of chemical information and modeling, 2011, 51(10), 2515-2527
    PMID: 21877713     doi: 10.1021/ci200193z
    We report here a robust automated active site detection, docking, and scoring (AADS) protocol for proteins with known structures. The active site finder identifies all cavities in a protein and scores them based on the physicochemical properties of functional groups lining the cavities in the protein. The accuracy realized on 620 proteins with sizes ranging from 100 to 600 amino acids with known drug active sites is 100% when the top ten cavity points are considered. These top ten cavity points identified are then submitted for an automated docking of an input ligand/candidate molecule. The docking protocol uses an all atom energy based Monte Carlo method. Eight low energy docked structures corresponding to different locations and orientations of the candidate molecule are stored at each cavity point giving 80 docked structures overall which are then ranked using an effective free energy function and top five structures are selected. The predicted structure and energetics of the complexes agree quite well with experiment when tested on a data set of 170 protein-ligand complexes with known structures and binding affinities. The AADS methodology is implemented on an 80 processor cluster and presented as a freely accessible, easy to use tool at .

  • Efficient inclusion of receptor flexibility in grid-based protein-ligand docking.
    Leis, Simon and Zacharias, Martin
    Journal of computational chemistry, 2011, 32(16), 3433-3439
    PMID: 21919015     doi: 10.1002/jcc.21923
    Accounting for receptor flexibility is an essential component of successful protein-ligand docking but still marks a major computational challenge. For many target molecules of pharmaceutical relevance, global backbone conformational changes are relevant during the ligand binding process. However, popular methods that represent the protein receptor molecule as a potential grid typically assume a rigid receptor structure during ligand-receptor docking. A new approach has been developed that combines inclusion of global receptor flexibility with the efficient potential grid representation of the receptor molecule. This is achieved using interpolation between grid representations of the receptor protein deformed in selected collective degrees of freedom. The method was tested on the docking of three ligands to apo protein kinase A (PKA), an enzyme that undergoes global structural changes upon inhibitor binding. Structural variants of PKA were generated along the softest normal mode of an elastic network representation of apo PKA. Inclusion of receptor deformability during docking resulted in a significantly improved docking performance compared with rigid PKA docking, thus allowing for systematic virtual screening applications at small additional computational cost.

  • BetaDock: shape-priority docking method based on beta-complex.
    Kim, Deok-Soo and Kim, Chong-Min and Won, Chung-In and Kim, Jae-Kwan and Ryu, Joonghyun and Cho, Youngsong and Lee, Changhee and Bhak, Jong
    Journal of biomolecular structure & dynamics, 2011, 29(1), 219-242
    PMID: 21696235    
    This paper presents an approach to the docking problem, and software implementing it (BetaDock), that puts the priority on shape complementarity between a receptor and a ligand. The approach is based on the theory of the β-complex. Given the Voronoi diagram of the receptor whose topology is stored in the quasi-triangulation, the β-complex corresponding to a water molecule is computed. Then, the boundary of the β-complex defines the β-shape, which has the complete proximity information among all atoms on the receptor boundary. From the β-shape, we first compute pockets where the ligand may bind. Then, we quickly place the ligand within each pocket by solving the singular value decomposition problem and the assignment problem. Using the conformations of the ligands within the pockets as the initial solutions, we run the genetic algorithm to find the optimal solution for the docking problem. The performance of the proposed algorithm was verified through a benchmark test, which showed that BetaDock is superior to the popular docking software AutoDock 4.

  • Virtual screening using molecular simulations.
    Yang, Tianyi and Wu, Johnny C and Yan, Chunli and Wang, Yuanfeng and Luo, Ray and Gonzales, Michael B and Dalby, Kevin N and Ren, Pengyu
    Proteins, 2011, 79(6), 1940-1951
    PMID: 21491494     doi: 10.1002/prot.23018
    Effective virtual screening relies on our ability to make accurate prediction of protein-ligand binding, which remains a great challenge. In this work, utilizing the molecular-mechanics Poisson-Boltzmann (or Generalized Born) surface area approach, we have evaluated the binding affinity of a set of 156 ligands to seven families of proteins, trypsin $\beta$, thrombin $\alpha$, cyclin-dependent kinase (CDK), cAMP-dependent kinase (PKA), urokinase-type plasminogen activator, $\beta$-glucosidase A, and coagulation factor Xa. The effect of protein dielectric constant in the implicit-solvent model on the binding free energy calculation is shown to be important. The statistical correlations between the binding energy calculated from the implicit-solvent approach and experimental free energy are in the range of 0.56-0.79 across all the families. This performance is better than that of typical docking programs especially given that the latter is directly trained using known binding data whereas the molecular mechanics is based on general physical parameters. Estimation of entropic contribution remains the barrier to accurate free energy calculation. We show that the traditional rigid rotor harmonic oscillator approximation is unable to improve the binding free energy prediction. Inclusion of conformational restriction seems to be promising but requires further investigation. On the other hand, our preliminary study suggests that implicit-solvent based alchemical perturbation, which offers explicit sampling of configuration entropy, can be a viable approach to significantly improve the prediction of binding free energy. Overall, the molecular mechanics approach has the potential for medium to high-throughput computational drug discovery.

  • Accounting for induced-fit effects in docking: what is possible and what is not?
    Sotriffer, Christoph A.
    Current topics in medicinal chemistry, 2011, 11(2), 179-191
    PMID: 20939789    
    Proteins can undergo a variety of conformational changes upon ligand binding. Although different mechanisms may play a role, the phenomenon is commonly referred to as induced fit to indicate that the tight structural complementarity of the interaction partners is a consequence of the binding event. Docking methods need to take into account this ability of the ligand and the protein to mutually adapt to each other when forming a complex. Handling the ligand as flexible is already common practice in docking applications. This is not yet the case for the protein. In fact, the accurate prediction of protein conformational changes upon ligand binding is still a major challenge, even more if computational speed is an issue, as for example in virtual screening applications. However, significant progress has been made over the past years and many valuable approaches have become available to address the protein flexibility problem and to provide more reliable docking predictions for complexes governed by significant induced-fit effects. This review provides a brief overview of the current situation, the most recent advances, and the remaining limitations of flexible protein docking, with particular focus on approaches handling protein flexibility simultaneously with ligand placement in the docking process.

  • Challenges and advances in computational docking: 2009 in review.
    Yuriev, Elizabeth and Agostino, Mark and Ramsland, Paul A
    Journal of molecular recognition : JMR, 2011, 24(2), 149-164
    PMID: 21360606     doi: 10.1002/jmr.1077
    Docking is a computational technique that places a small molecule (ligand) in the binding site of its macromolecular target (receptor) and estimates its binding affinity. This review addresses methodological developments that have occurred in the docking field in 2009, with a particular focus on the more difficult, and sometimes controversial, aspects of this promising computational discipline. These developments aim to address the main challenges of docking: receptor representation (such aspects as structural waters, side chain protonation, and, most of all, flexibility (from side chain rotation to domain movement)), ligand representation (protonation, tautomerism and stereoisomerism, and the effect of input conformation), as well as accounting for solvation and entropy of binding. This review is strongly focused on docking advances in the context of drug design, specifically in virtual screening and fragment-based drug design.

  • Rosetta FlexPepDock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors.
    Raveh, Barak and London, Nir and Zimmerman, Lior and Schueler-Furman, Ora
    PloS one, 2011, 6(4), e18934
    PMID: 21572516     doi: 10.1371/journal.pone.0018934
    Flexible peptides that fold upon binding to another protein molecule mediate a large number of regulatory interactions in the living cell and may provide highly specific recognition modules. We present Rosetta FlexPepDock ab-initio, a protocol for simultaneous docking and de-novo folding of peptides, starting from an approximate specification of the peptide binding site. Using the Rosetta fragments library and a coarse-grained structural representation of the peptide and the receptor, FlexPepDock ab-initio samples efficiently and simultaneously the space of possible peptide backbone conformations and rigid-body orientations over the receptor surface of a given binding site. The subsequent all-atom refinement of the coarse-grained models includes full side-chain modeling of both the receptor and the peptide, resulting in high-resolution models in which key side-chain interactions are recapitulated. The protocol was applied to a benchmark in which peptides were modeled over receptors in either their bound backbone conformations or in their free, unbound form. Near-native peptide conformations were identified in 18/26 of the bound cases and 7/14 of the unbound cases. The protocol performs well on peptides from various classes of secondary structures, including coiled peptides with unusual turns and kinks. The results presented here significantly extend the scope of state-of-the-art methods for high-resolution peptide modeling, which can now be applied to a wide variety of peptide-protein interactions where no prior information about the peptide backbone conformation is available, enabling detailed structure-based studies and manipulation of those interactions.

  • Rosetta FlexPepDock web server-high resolution modeling of peptide-protein interactions
    London, Nir and Raveh, Barak and Cohen, Eyal and Fathi, Guy and Schueler-Furman, Ora
    Nucleic acids research, 2011, 39(Web Server issue), W249-W253
    PMID: 21622962     doi: 10.1093/nar/gkr431
    Peptide-protein interactions are among the most prevalent and important interactions in the cell, but a large fraction of those interactions lack detailed structural characterization. The Rosetta FlexPepDock web server provides an interface to a high-resolution peptide docking (refinement) protocol for the modeling of peptide-protein complexes, implemented within the Rosetta framework. Given a protein receptor structure and an approximate, possibly inaccurate model of the peptide within the receptor binding site, the FlexPepDock server refines the peptide to high resolution, allowing full flexibility to the peptide backbone and to all side chains. This protocol was extensively tested and benchmarked on a wide array of non-redundant peptide-protein complexes, and was proven effective when applied to peptide starting conformations within 5.5 angstrom backbone root mean square deviation from the native conformation. FlexPepDock has been applied to several systems that are mediated and regulated by peptide-protein interactions. This easy-to-use and general web server interface allows non-expert users to accurately model their specific peptide-protein interaction of interest.

  • Docking-based virtual screening for ligands of G protein-coupled receptors: not only crystal structures but also in silico models.
    Vilar, Santiago and Ferino, Giulio and Phatak, Sharangdhar S and Berk, Barkin and Cavasotto, Claudio N and Costanzi, Stefano
    Journal of molecular graphics & modelling, 2011, 29(5), 614-623
    PMID: 21146435     doi: 10.1016/j.jmgm.2010.11.005
    G protein-coupled receptors (GPCRs) regulate a wide range of physiological functions and hold great pharmaceutical interest. Using the $\beta$(2)-adrenergic receptor as a case study, this article explores the applicability of docking-based virtual screening to the discovery of GPCR ligands and defines methods intended to improve the screening performance. Our controlled computational experiments were performed on a compound dataset containing known agonists and blockers of the receptor as well as a large number of decoys. The screening based on the structure of the receptor crystallized in complex with its inverse agonist carazolol yielded excellent results, with a clearly delineated prioritization of ligands over decoys. Blockers generally were preferred over agonists; however, agonists were also well distinguished from decoys. A method was devised to increase the screening yields by generating an ensemble of alternative conformations of the receptor that accounts for its flexibility. Moreover, a method was devised to improve the retrieval of agonists, based on the optimization of the receptor around a known agonist. Finally, the applicability of docking-based virtual screening also to homology models endowed with different levels of accuracy was proved. This last point is of uttermost importance, since crystal structures are available only for a limited number of GPCRs, and extends our conclusions to the entire superfamily. The outcome of this analysis definitely supports the application of computer-aided techniques to the discovery of novel GPCR ligands, especially in light of the fact that, in the near future, experimental structures are expected to be solved and become available for an ever increasing number of GPCRs.

  • Toward prediction of functional protein pockets using blind docking and pocket search algorithms
    Hetenyi, Csaba and van der Spoel, David
    Protein science : a publication of the Protein Society, 2011, 20(5), 880-893
    PMID: 21413095     doi: 10.1002/pro.618
    Location of functional binding pockets of bioactive ligands on protein molecules is essential in structural genomics and drug design projects. If the experimental determination of ligand-protein complex structures is complicated, blind docking (BD) and pocket search (PS) calculations can help in the prediction of atomic resolution binding mode and the location of the pocket of a ligand on the entire protein surface. Whereas the number of successful predictions by these methods is increasing even for the complicated cases of exosites or allosteric binding sites, their reliability has not been fully established. For a critical assessment of reliability, we use a set of ligand-protein complexes, which were found to be problematic in previous studies. The robustness of BD and PS methods is addressed in terms of success of the selection of truly functional pockets from among the many putative ones identified on the surfaces of ligand-bound and ligand-free (holo and apo) protein forms. Issues related to BD such as effect of hydration, existence of multiple pockets, and competition of subsidiary ligands are considered. Practical cases of PS are discussed, categorized and strategies are recommended for handling the different situations. PS can be used in conjunction with BD, as we find that a consensus approach combining the techniques improves predictive power.

  • Fast docking using the CHARMM force field with EADock DSS.
    Grosdidier, Aurélien and Zoete, Vincent and Michielin, Olivier
    Journal of computational chemistry, 2011, 32(10), 2149-2159
    PMID: 21541955     doi: 10.1002/jcc.21797
    The prediction of binding modes (BMs) occurring between a small molecule and a target protein of biological interest has become of great importance for drug development. The overwhelming diversity of needs leaves room for docking approaches addressing specific problems. Nowadays, the universe of docking software ranges from fast and user friendly programs to algorithmically flexible and accurate approaches. EADock2 is an example of the latter. Its multiobjective scoring function was designed around the CHARMM22 force field and the FACTS solvation model. However, the major drawback of such a software design lies in its computational cost. EADock dihedral space sampling (DSS) is built on the most efficient features of EADock2, namely its hybrid sampling engine and multiobjective scoring function. Its performance is equivalent to that of EADock2 for drug-like ligands, while the CPU time required has been reduced by several orders of magnitude. This huge improvement was achieved through a combination of several innovative features including an automatic bias of the sampling toward putative binding sites, and a very efficient tree-based DSS algorithm. When the top-scoring prediction is considered, 57% of BMs of a test set of 251 complexes were reproduced within 2 Å RMSD to the crystal structure. Up to 70% were reproduced when considering the five top scoring predictions. The success rate is lower in cross-docking assays but remains comparable with that of the latest version of AutoDock that accounts for the protein flexibility.

  • SwissDock, a protein-small molecule docking web service based on EADock DSS.
    Grosdidier, Aurélien and Zoete, Vincent and Michielin, Olivier
    Nucleic acids research, 2011, 39(Web Server issue), W270-7
    PMID: 21624888     doi: 10.1093/nar/gkr366
    Most life science processes involve, at the atomic scale, recognition between two molecules. The prediction of such interactions at the molecular level, by so-called docking software, is a non-trivial task. Docking programs have a wide range of applications ranging from protein engineering to drug design. This article presents SwissDock, a web server dedicated to the docking of small molecules on target proteins. It is based on the EADock DSS engine, combined with setup scripts for curating common problems and for preparing both the target protein and the ligand input files. An efficient Ajax/HTML interface was designed and implemented so that scientists can easily submit dockings and retrieve the predicted complexes. For automated docking tasks, a programmatic SOAP interface has been set up and template programs can be downloaded in Perl, Python and PHP. The web site also provides access to a database of manually curated complexes, based on the Ligand Protein Database. A wiki and a forum are available to the community to promote interactions between users. The SwissDock web site is available online. We believe it constitutes a step toward generalizing the use of docking tools beyond the traditional molecular modeling community.

  • SwissParam: a fast force field generation tool for small organic molecules.
    Zoete, Vincent and Cuendet, Michel A and Grosdidier, Aurélien and Michielin, Olivier
    Journal of computational chemistry, 2011, 32(11), 2359-2368
    PMID: 21541964     doi: 10.1002/jcc.21816
    The drug discovery process has been deeply transformed recently by the use of computational ligand-based or structure-based methods, helping the identification and optimization of lead compounds, and finally the delivery of new drug candidates more quickly and at lower cost. Structure-based computational methods for drug discovery mainly involve ligand-protein docking and rapid binding free energy estimation, both of which require force field parameterization for many drug candidates. Here, we present a fast force field generation tool, called SwissParam, able to generate, for an arbitrary small organic molecule, topologies and parameters based on the Merck molecular force field, but in a functional form that is compatible with the CHARMM force field. Output files can be used with CHARMM or GROMACS. The topologies and parameters generated by SwissParam are used by the docking software EADock2 and EADock DSS to describe the small molecules to be docked, whereas the protein is described by the CHARMM force field, and allow them to reach success rates ranging from 56 to 78%. We have also developed a rapid binding free energy estimation approach, using SwissParam for ligands and CHARMM22/27 for proteins, which requires only a short minimization to reproduce the experimental binding free energy of 214 ligand-protein complexes involving 62 different proteins, with a standard error of 2.0 kcal/mol, and a correlation coefficient of 0.74. Together, these results demonstrate the relevance of using SwissParam topologies and parameters to describe small organic molecules in computer-aided drug design applications, together with a CHARMM22/27 description of the target protein. SwissParam is available free of charge for academic users.

  • Computer-aided drug design platform using PyMOL.
    Lill, Markus A and Danielson, Matthew L
    Journal of computer-aided molecular design, 2011, 25(1), 13-19
    PMID: 21053052     doi: 10.1007/s10822-010-9395-8
    The understanding and optimization of protein-ligand interactions are instrumental to medicinal chemists investigating potential drug candidates. Over the past couple of decades, many powerful standalone tools for computer-aided drug discovery have been developed in academia providing insight into protein-ligand interactions. As programs are developed by various research groups, a consistent user-friendly graphical working environment combining computational techniques such as docking, scoring, molecular dynamics simulations, and free energy calculations is needed. Utilizing PyMOL we have developed such a graphical user interface incorporating individual academic packages designed for protein preparation (AMBER package and Reduce), molecular mechanics applications (AMBER package), and docking and scoring (AutoDock Vina and SLIDE). In addition to amassing several computational tools under one interface, the computational platform also provides a user-friendly combination of different programs. For example, utilizing a molecular dynamics (MD) simulation performed with AMBER as input for ensemble docking with AutoDock Vina. The overarching goal of this work was to provide a computational platform that facilitates medicinal chemists, many who are not experts in computational methodologies, to utilize several common computational techniques germane to drug discovery. Furthermore, our software is open source and is aimed to initiate collaborative efforts among computational researchers to combine other open source computational methods under a single, easily understandable graphical user interface.

  • LigDockCSA: Protein-ligand docking using conformational space annealing.
    Shin, Woong-Hee and Heo, Lim and Lee, Juyong and Ko, Junsu and Seok, Chaok and Lee, Jooyoung
    Journal of computational chemistry, 2011, 32(15), 3226-3232
    PMID: 21837636     doi: 10.1002/jcc.21905
    Protein-ligand docking techniques are one of the essential tools for structure-based drug design. Two major components of a successful docking program are an efficient search method and an accurate scoring function. In this work, a new docking method called LigDockCSA is developed by using a powerful global optimization technique, conformational space annealing (CSA), and a scoring function that combines the AutoDock energy and the piecewise linear potential (PLP) torsion energy. It is shown that the CSA search method can find lower energy binding poses than the Lamarckian genetic algorithm of AutoDock. However, lower-energy solutions CSA produced with the AutoDock energy were often less native-like. The loophole in the AutoDock energy was fixed by adding a torsional energy term, and the CSA search on the refined energy function is shown to improve the docking performance. The performance of LigDockCSA was tested on the Astex diverse set which consists of 85 protein-ligand complexes. LigDockCSA finds the best scoring poses within 2 Å root-mean-square deviation (RMSD) from the native structures for 84.7% of the test cases, compared to 81.7% for AutoDock and 80.5% for GOLD. The results improve further to 89.4% by incorporating the conformational entropy.
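    Several abstracts in this list report docking success as the share of complexes whose top-scoring pose lies within 2 Å RMSD of the native structure. A minimal sketch of that bookkeeping follows, assuming matched atom ordering between the predicted and native ligand and no symmetry correction; the function names `rmsd` and `success_rate` are hypothetical and are not taken from any of the cited programs.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists.

    Each argument is a list of (x, y, z) tuples in angstrom. No
    superposition is applied, as is usual when comparing docked poses
    in the fixed receptor frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsd_values, cutoff=2.0):
    """Fraction of poses whose RMSD is at or below the cutoff (angstrom)."""
    if not rmsd_values:
        raise ValueError("no RMSD values given")
    return sum(1 for r in rmsd_values if r <= cutoff) / len(rmsd_values)


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(native, pose))
    print(success_rate([0.5, 1.9, 2.5, 4.0]))
```

    Published benchmarks often use symmetry-aware RMSD for ligands with equivalent atoms, so figures computed this way can be slightly pessimistic.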

  • Can We Trust Docking Results? Evaluation of Seven Commonly Used Programs on PDBbind Database
    Plewczynski, Dariusz and Lazniewski, Michal and Augustyniak, Rafal and Ginalski, Krzysztof
    Journal of computational chemistry, 2011, 32(4), 742-755
    PMID: 20812323     doi: 10.1002/jcc.21643
    Docking is one of the most commonly used techniques in drug design. It is used both for identifying correct poses of a ligand in the binding site of a protein and for estimating the strength of the protein-ligand interaction. Because millions of compounds must be screened before a suitable target for biological testing can be identified, all calculations should be done in a reasonable time frame. Thus, all programs currently in use exploit empirically based algorithms, avoiding systematic search of the conformational space. Similarly, the scoring is done using simple equations, which makes it possible to speed up the entire process. Therefore, docking results have to be verified by subsequent in vitro studies. The purpose of our work was to evaluate seven popular docking programs (Surflex, LigandFit, Glide, GOLD, FlexX, eHiTS, and AutoDock) on the extensive dataset composed of 1300 protein-ligand complexes from the PDBbind 2007 database, where experimentally measured binding affinity values were also available. We compared independently the ability of proper posing [according to root mean square deviation (or root mean square distance) of predicted conformations versus the corresponding native one] and scoring (by calculating the correlation between docking score and ligand binding strength). To our knowledge, it is the first large-scale docking evaluation that covers both aspects of docking programs, that is, predicting ligand conformation and calculating the strength of its binding. More than 1000 protein-ligand pairs cover a wide range of different protein families and inhibitor classes. Our results clearly showed that the ligand binding conformation could be identified in most cases by using the existing software, yet we still observed the lack of a universal scoring function for all types of molecules and protein families.

  • VoteDock: Consensus Docking Method for Prediction of Protein-Ligand Interactions
    Plewczynski, Dariusz and Lazniewski, Michal and Von Grotthuss, Marcin and Rychlewski, Leszek and Ginalski, Krzysztof
    Journal of computational chemistry, 2011, 32(4), 568-581
    PMID: 20812324     doi: 10.1002/jcc.21642
    Molecular recognition plays a fundamental role in all biological processes, and that is why great efforts have been made to understand and predict protein-ligand interactions. Finding a molecule that can potentially bind to a target protein is particularly essential in drug discovery and still remains an expensive and time-consuming task. In silico tools are frequently used to screen molecular libraries to identify new lead compounds, and if the protein structure is known, various protein-ligand docking programs can be used. The aim of the docking procedure is to predict correct poses of the ligand in the binding site of the protein as well as to score them according to the strength of interaction in a reasonable time frame. The purpose of our studies was to present a novel consensus approach to predict both the protein-ligand complex structure and its corresponding binding affinity. Our method used as input the results from seven docking programs (Surflex, LigandFit, Glide, GOLD, FlexX, eHiTS, and AutoDock) that are widely used for docking of ligands. We evaluated it on the extensive benchmark dataset of 1300 protein-ligand pairs from the refined PDBbind database for which the structural and affinity data were available. We compared independently its ability of proper scoring and posing to the previously proposed methods. In most cases, our method is able to dock properly approximately 20% of pairs more than docking methods on average, and over 10% of pairs more than the best single program. The RMSD value of the predicted complex conformation versus its native one is reduced by a factor of 0.5 angstrom. Finally, we were able to increase the Pearson correlation of the predicted binding affinity in comparison with the experimental value up to 0.5.

  • NNScore 2.0: a neural-network receptor-ligand scoring function.
    Durrant, Jacob D and McCammon, J Andrew
    Journal of chemical information and modeling, 2011, 51(11), 2897-2903
    PMID: 22017367     doi: 10.1021/ci2003889
    NNScore is a neural-network-based scoring function designed to aid the computational identification of small-molecule ligands. While the test cases included in the original NNScore article demonstrated the utility of the program, the application examples were limited. The purpose of the current work is to further confirm that neural-network scoring functions are effective, even when compared to the scoring functions of state-of-the-art docking programs, such as AutoDock, the most commonly cited program, and AutoDock Vina, thought to be two orders of magnitude faster. Aside from providing additional validation of the original NNScore function, we here present a second neural-network scoring function, NNScore 2.0. NNScore 2.0 considers many more binding characteristics when predicting affinity than does the original NNScore. The network output of NNScore 2.0 also differs from that of NNScore 1.0; rather than a binary classification of ligand potency, NNScore 2.0 provides a single estimate of the pK(d). To facilitate use, NNScore 2.0 has been implemented as an open-source python script. A copy can be obtained from .

  • Implementation and evaluation of a docking-rescoring method using molecular footprint comparisons.
    Balius, Trent E and Mukherjee, Sudipto and Rizzo, Robert C
    Journal of computational chemistry, 2011, 32(10), 2273-2289
    PMID: 21541962     doi: 10.1002/jcc.21814
    A docking-rescoring method, based on per-residue van der Waals (VDW), electrostatic (ES), or hydrogen bond (HB) energies has been developed to aid discovery of ligands that have interaction signatures with a target (footprints) similar to that of a reference. Biologically useful references could include known drugs, inhibitors, substrates, transition states, or side-chains that mediate protein-protein interactions. Termed footprint similarity (FPS) score, the method, as implemented in the program DOCK, was validated and characterized using: (1) pose identification, (2) crossdocking, (3) enrichment, and (4) virtual screening. Improvements in pose identification (6-12%) were obtained using footprint-based (FPS$_{VDW+ES}$) vs. standard DOCK (DCE$_{VDW+ES}$) scoring as evaluated on three large datasets (680-775 systems) from the SB2010 database. Enhanced pose identification was also observed using FPS (45.4% or 70.9%) compared with DCE (17.8%) methods to rank challenging crossdocking ensembles from carbonic anhydrase. Enrichment tests, for three representative systems, revealed FPS$_{VDW+ES}$ scoring yields significant early fold enrichment in the top 10% of ranked databases. For EGFR, top FPS poses are nicely accommodated in the molecular envelope defined by the reference in comparison with DCE, which yields distinct molecular weight bias toward larger molecules. Results from a representative virtual screen of ca. 1 million compounds additionally illustrate how ligands with footprints similar to a known inhibitor can readily be identified from within large commercially available databases. By providing an alternative way to rank ligand poses in a simple yet directed manner we anticipate that FPS scoring will be a useful tool for docking and structure-based design.

  • DSX: a knowledge-based scoring function for the assessment of protein-ligand complexes.
    Neudert, Gerd and Klebe, Gerhard
    Journal of chemical information and modeling, 2011, 51(10), 2731-2745
    PMID: 21863864     doi: 10.1021/ci200274q
    We introduce the new knowledge-based scoring function DSX that consists of distance-dependent pair potentials, novel torsion angle potentials, and newly defined solvent accessible surface-dependent potentials. DSX pair potentials are based on the statistical formalism of DrugScore, extended by a much more specialized set of atom types. The original DrugScore-like reference state is rather unstable with respect to modifications in the used atom types. Therefore, an important method to overcome this problem and to allow for robust results when deriving pair potentials for arbitrary sets of atom types is presented. A validation based on a carefully prepared test set is shown, enabling direct comparison to the majority of other popular scoring functions. Here, DSX features superior performance with respect to docking- and ranking power and runtime requirements. Furthermore, the beneficial combination with torsion angle-dependent and desolvation-dependent potentials is demonstrated. DSX is robust, flexible, and capable of working together with special features of popular docking engines, e.g., flexible protein residues in AutoDock or GOLD. The program is freely available to the scientific community and can be downloaded from our Web site .

  • Exhaustive search and solvated interaction energy (SIE) for virtual screening and affinity prediction.
    Sulea, Traian and Hogues, Hervé and Purisima, Enrico O
    Journal of computer-aided molecular design, 2011, 26(5), 617-633
    PMID: 22198519     doi: 10.1007/s10822-011-9529-7
    We carried out a prospective evaluation of the utility of the SIE (solvation interaction energy) scoring function for virtual screening and binding affinity prediction. Since experimental structures of the complexes were not provided, this was an exercise in virtual docking as well. We used our exhaustive docking program, Wilma, to provide high-quality poses that were rescored using SIE to provide binding affinity predictions. We also tested the combination of SIE with our latest solvation model, first shell of hydration (FiSH), which captures some of the discrete properties of water within a continuum model. We achieved good enrichment in virtual screening of fragments against trypsin, with an area under the curve of about 0.7 for the receiver operating characteristic curve. Moreover, the early enrichment performance was quite good with 50% of true actives recovered with a 15% false positive rate in a prospective calculation and with a 3% false positive rate in a retrospective application of SIE with FiSH. Binding affinity predictions for both trypsin and host-guest complexes were generally within 2 kcal/mol of the experimental values. However, the rank ordering of affinities differing by 2 kcal/mol or less was not well predicted. On the other hand, it was encouraging that the incorporation of a more sophisticated solvation model into SIE resulted in better discrimination of true binders from nonbinders. This suggests that the inclusion of proper physics in our models is a fruitful strategy for improving the reliability of our binding affinity predictions.

  • sc-PDB: a database for identifying variations and multiplicity of 'druggable' binding sites in proteins
    Meslamani, Jamel and Rognan, Didier and Kellenberger, Esther
    Bioinformatics (Oxford, England), 2011, 27(9), 1324-1326
    doi: 10.1093/bioinformatics/btr120
    Background: The sc-PDB database is an annotated archive of druggable binding sites extracted from the Protein Data Bank. It contains all-atom coordinates for 8166 protein-ligand complexes, chosen for their geometrical and physico-chemical properties. The sc-PDB provides a functional annotation for proteins, a chemical description for ligands and the detailed intermolecular interactions for complexes. The sc-PDB now includes a hierarchical classification of all the binding sites within a functional class. Method: The sc-PDB entries were first clustered according to the protein name, regardless of species. For each cluster, we identified dissimilar sites (e.g. catalytic and allosteric sites of an enzyme). Scope and applications: The classification of sc-PDB targets by binding site diversity was intended to facilitate chemogenomics approaches to drug design. In ligand-based approaches, it avoids comparing ligands that do not share the same binding site. In structure-based approaches, it permits quantitative evaluation of the diversity of the binding site definition (variations in size, sequence and/or structure).

  • Comments on "leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets": significance for the validation of scoring functions.
    Ballester, Pedro J and Mitchell, John B O
    Journal of chemical information and modeling, 2011, 51(8), 1739-1741
    PMID: 21591735     doi: 10.1021/ci200057e

  • Predicting Fragment Binding Poses Using a Combined MCSS MM-GBSA Approach.
    Haider, Muhammad K and Bertrand, Hugues-Olivier and Hubbard, Roderick E
    Journal of chemical information and modeling, 2011, 51(5), 1092-1105
    PMID: 21528911     doi: 10.1021/ci100469n
    Improved methods are required to predict the position and orientation (pose) of binding to the target protein of low molecular weight compounds identified in fragment screening campaigns. This is particularly important to guide initial chemistry to generate structure-activity relationships for the cases where a high resolution structure cannot be obtained. We have assessed the benefit of an implicit solvent method for assessment of fragment binding poses generated by the Multiple Copy Simultaneous Search (MCSS) method in CHARMm. Additionally, the effect of using multiple receptor structures for a flexible receptor is investigated. The original MCSS performance (50% of fragment positions accurately predicted and scored) was increased up to 67% by scoring MCSS energy minima with a Molecular Mechanics Generalized Born approach with molecular volume integration and Surface Area model (MM-GBSA). The same increase in performance (but occasionally for different targets) was observed when using the docking program GOLD followed by MM-GBSA rescoring. The combined results from both methods resulted in a higher success rate emphasizing that a comparison of different docking methods can increase the correct identification of binding poses. For a receptor where multiple structures are available, Hsp90, the average performance on randomly adding receptor structures was also investigated. The results suggest that predictions using these docking methods can be used with some confidence to guide chemical optimization, if the structure of the target either remains relatively fixed on ligand binding, or if a number of crystal structures are available with diverse ligands bound and there is information on the positions of key water molecules in the binding site.


  • Homology modeling and metabolism prediction of human carboxylesterase-2 using docking analyses by GriDock: a parallelized tool based on AutoDock 4.0.
    Vistoli, Giulio and Pedretti, Alessandro and Mazzolari, Angelica and Testa, Bernard
    Journal of computer-aided molecular design, 2010, 24(9), 771-787
    PMID: 20623318     doi: 10.1007/s10822-010-9373-1
    Metabolic problems lead to numerous failures during clinical trials, and much effort is now devoted to developing in silico models predicting metabolic stability and metabolites. Such models are well known for cytochromes P450 and some transferases, whereas less has been done to predict the activity of human hydrolases. The present study was undertaken to develop a computational approach able to predict the hydrolysis of novel esters by human carboxylesterase hCES2. The study involved first a homology modeling of the hCES2 protein based on the model of hCES1 since the two proteins share a high degree of homology (approximately 73%). A set of 40 known substrates of hCES2 was taken from the literature; the ligands were docked in both their neutral and ionized forms using GriDock, a parallel tool based on the AutoDock4.0 engine which can perform efficient and easy virtual screening analyses of large molecular databases exploiting multi-core architectures. Useful statistical models (e.g., r (2)

  • Rapid flexible docking using a stochastic rotamer library of ligands.
    Ding, Feng and Yin, Shuangye and Dokholyan, Nikolay V
    Journal of chemical information and modeling, 2010, 50(9), 1623-1632
    PMID: 20712341     doi: 10.1021/ci100218t
    Existing flexible docking approaches model the ligand and receptor flexibility either separately or in a loosely coupled manner, which captures the conformational changes inefficiently. Here, we propose a flexible docking approach, MedusaDock, which models both ligand and receptor flexibility simultaneously with sets of discrete rotamers. We developed an algorithm to build the ligand rotamer library "on-the-fly" during docking simulations. MedusaDock benchmarks demonstrate a rapid sampling efficiency and high prediction accuracy in both self- (to the cocrystallized state) and cross-docking (to a state cocrystallized with a different ligand), the latter of which mimics the virtual screening procedure in computational drug discovery. We also perform a virtual screening test of four flexible kinase targets, including cyclin-dependent kinase 2, vascular endothelial growth factor receptor 2, HIV reverse transcriptase, and HIV protease. We find significant improvements of virtual screening enrichments when compared to rigid-receptor methods. The predictive power of MedusaDock in cross-docking and preliminary virtual-screening benchmarks highlights the importance of modeling both ligand and receptor flexibility simultaneously in computational docking.

  • Rapid context-dependent ligand desolvation in molecular docking.
    Mysinger, Michael M. and Shoichet, Brian K
    Journal of chemical information and modeling, 2010, 50(9), 1561-1573
    PMID: 20735049     doi: 10.1021/ci100214a
    In structure-based screens for new ligands, a molecular docking algorithm must rapidly score many molecules in multiple configurations, accounting for both the ligand's interactions with receptor and its competing interactions with solvent. Here we explore a context-dependent ligand desolvation scoring term for molecular docking. We relate the Generalized-Born effective Born radii for every ligand atom to a fractional desolvation and then use this fraction to scale an atom-by-atom decomposition of the full transfer free energy. The fractional desolvation is precomputed on a scoring grid by numerically integrating over the volume of receptor proximal to a ligand atom, weighted by distance. To test this method's performance, we dock ligands versus property-matched decoys over 40 DUD targets. Context-dependent desolvation better enriches ligands compared to both the raw full transfer free energy penalty and compared to ignoring desolvation altogether, though the improvement is modest. More compellingly, the new method improves docking performance across receptor types. Thus, whereas entirely ignoring desolvation works best for charged sites and overpenalizing with full desolvation works well for neutral sites, the physically more correct context-dependent ligand desolvation is competitive across both types of targets. The method also reliably discriminates ligands from highly charged molecules, where ignoring desolvation performs poorly. Since this context-dependent ligand desolvation may be precalculated, it improves docking reliability with minimal cost to calculation time and may be readily incorporated into any physics-based docking program.

  • A reliable docking/scoring scheme based on the semiempirical quantum mechanical PM6-DH2 method accurately covering dispersion and H-bonding: HIV-1 protease with 22 ligands.
    Fanfrlík, Jindrich and Bronowska, Agnieszka K and Rezác, Jan and Prenosil, Ondrej and Konvalinka, Jan and Hobza, Pavel
    The journal of physical chemistry. B, 2010, 114(39), 12666-12678
    PMID: 20839830     doi: 10.1021/jp1032965
    In this study, we introduce a fast and reliable rescoring scheme for docked complexes based on a semiempirical quantum mechanical PM6-DH2 method. The method utilizes a PM6-based Hamiltonian with corrections for dispersion energy and hydrogen bonds. The total score is constructed as the sum of the PM6-DH2 interaction enthalpy, the empirical force field (AMBER) interaction entropy, and the sum of the deformation (PM6-DH2, SMD) and the desolvation (SMD) energies of the ligand. The main advantage of the procedure is the fact that we do not add any empirical parameter for either an individual component of the total score or an individual protein-ligand complex. This rescoring method is applied to a very challenging system, namely, the HIV-1 protease with a set of ligands. As opposed to the conventional DOCK procedure, the PM6-DH2 rescoring based on all of the terms distinguishes between binders and nonbinders and provides a reliable correlation of the theoretical and experimental binding free energies. Such a dramatic improvement, resulting from the PM6-DH2 rescoring of all the complexes, provides a valuable yet inexpensive tool for rational drug discovery and de novo ligand design.

  • Dockomatic - automated ligand creation and docking.
    Bullock, Casey W and Jacob, Reed B. and McDougal, Owen M. and Hampikian, Greg and Andersen, Tim
    BMC research notes, 2010, 3, 289
    PMID: 21059259     doi: 10.1186/1756-0500-3-289
    BACKGROUND: The application of computational modeling to rationally design drugs and characterize macro biomolecular receptors has proven increasingly useful due to the accessibility of computing clusters and clouds. AutoDock is a well-known and powerful software program used to model ligand to receptor binding interactions. In its current version, AutoDock requires significant amounts of user time to set up and run jobs, and collect results. This paper presents DockoMatic, a user-friendly Graphical User Interface (GUI) application that eases and automates the creation and management of AutoDock jobs for high-throughput screening of ligand to receptor interactions.

  • pK(a) based protonation states and microspecies for protein-ligand docking.
    ten Brink, Tim and Exner, Thomas E.
    Journal of computer-aided molecular design, 2010, 24(11), 935-942
    PMID: 20882397     doi: 10.1007/s10822-010-9385-x
    In this paper we present our reworked approach to generate ligand protonation states with our structure preparation tool SPORES (Structure PrOtonation and REcognition System). SPORES can be used for the preprocessing of proteins and protein-ligand complexes, e.g. as taken from the Protein Data Bank, as well as for the setup of 3D ligand databases. It automatically assigns atom and bond types and generates different protonation and tautomeric states as well as different stereoisomers. In the revised version, pKa calculations with the ChemAxon software MARVIN are used either to determine the likelihood of a combinatorially generated protonation state or to determine the titratable atoms used in the combinatorial approach. Additionally, the MARVIN software is used to predict microspecies distributions of ligand molecules. Docking studies were performed with our recently introduced program PLANTS (Protein-Ligand ANT System) on all protomers resulting from the three different selection methods for the well-established CCDC/ASTEX clean data set, demonstrating the usefulness of especially the latter approach.

  • HarmonyDOCK: The Structural Analysis of Poses in Protein-Ligand Docking
    Plewczynski, Dariusz and Philips, Anna and Von Grotthuss, Marcin and Rychlewski, Leszek and Ginalski, Krzysztof
    Journal of computational biology : a journal of computational molecular cell biology, 2010, 18(00), 1-10
    PMID: 21091053     doi: 10.1089/cmb.2009.0111
    Molecular docking is a widely used method for lead optimization. However, docking tools often fail to predict how a ligand (the smaller molecule, such as a substrate or drug candidate) binds to a receptor (the accepting part of a protein). We present here HarmonyDOCK, a novel method for assessing the accuracy of docking software and creating a scoring function that determines the consensus protein-ligand pose among those generated by available docking programs. Conformations for a few hundred protein-ligand complexes with known three-dimensional structures were predicted on a benchmark set by a set of different docking programs. On the basis of the derived ranking, the point of reference and the lower score limit were determined for subsequent investigations. The focus of the methodology is on the top-ranked poses, with the assumption being that the conformation of the docked molecules is the most accurate. We found that some docking programs perform considerably better than others, yet in all cases the proper selection of decoys, namely HarmonyDOCK, is needed for a successful docking procedure.

  • Ensemble docking into multiple crystallographically derived protein structures: an evaluation based on the statistical analysis of enrichments.
    Craig, Ian R and Essex, Jonathan W and Spiegel, Katrin
    Journal of chemical information and modeling, 2010, 50(4), 511-524
    PMID: 20222690     doi: 10.1021/ci900407c
    Docking into multiple receptor conformations ("ensemble docking") has been proposed, and employed, in the hope that it may account for receptor flexibility in virtual screening and thus provide higher enrichments than docking into single rigid receptor structures. The statistical analyses presented in this paper provide quantitative evidence that in some cases docking into a crystallographically derived conformational ensemble does indeed yield better enrichment than docking into any of the individual members of the ensemble. However, these "successful" ensembles account for only a minority of those examined and it would not have been possible to prospectively predict their identity using only protein structural information. A more frequently observed outcome is that the ensemble enrichment is higher than the mean of the enrichments provided by its individual members. An additional and promising finding is that, if a set of known active compounds is available, an approach based on induced-fit docking appears to be a reliable way to construct ensembles which provide relatively high enrichments.

  • Q-Dock(LHM): Low-resolution refinement for ligand comparative modeling.
    Brylinski, Michal and Skolnick, Jeffrey
    Journal of computational chemistry, 2010, 31(5), 1093-1105
    PMID: 19827144     doi: 10.1002/jcc.21395
    The success of ligand docking calculations typically depends on the quality of the receptor structure. Given improvements in protein structure prediction approaches, approximate protein models now can be routinely obtained for the majority of gene products in a given proteome. Structure-based virtual screening of large combinatorial libraries of lead candidates against theoretically modeled receptor structures requires fast and reliable docking techniques capable of dealing with structural inaccuracies in protein models. Here, we present Q-Dock(LHM), a method for low-resolution refinement of binding poses provided by FINDSITE(LHM), a ligand homology modeling approach. We compare its performance to that of classical ligand docking approaches in ligand docking against a representative set of experimental (both holo and apo) as well as theoretically modeled receptor structures. Docking benchmarks reveal that unlike all-atom docking, Q-Dock(LHM) exhibits the desired tolerance to the receptor's structure deformation. Our results suggest that the use of an evolution-based approach to ligand homology modeling followed by fast low-resolution refinement is capable of achieving satisfactory performance in ligand-binding pose prediction with promising applicability to proteome-scale applications.

  • Chemical space sampling by different scoring functions and crystal structures.
    Brooijmans, Natasja and Humblet, Christine
    Journal of computer-aided molecular design, 2010, 24(5), 433-447
    PMID: 20401681     doi: 10.1007/s10822-010-9356-2
    Virtual screening has become a popular tool to identify novel leads in the early phases of drug discovery. A variety of docking and scoring methods used in virtual screening have been the subject of active research in an effort to gauge limitations and articulate best practices. However, how to best utilize different scoring functions and various crystal structures, when available, is not yet well understood. In this work we use multiple crystal structures of PI3K-gamma in both prospective and retrospective virtual screening experiments. Both Glide SP scoring and Prime MM-GBSA rescoring are utilized in the prospective and retrospective virtual screens, and consensus scoring is investigated in the retrospective virtual screening experiments. The results show that each of the different crystal structures that was used samples a different chemical space, i.e. different chemotypes are prioritized by each structure. In addition, the different (re)scoring functions prioritize different chemotypes as well. Somewhat surprisingly, the Prime MM-GBSA scoring function generally gives lower enrichments than Glide SP. Finally we investigate the impact of different ligand preparation protocols on virtual screening enrichment factors. In summary, different crystal structures and different scoring functions are complementary to each other and allow for a wider variety of chemotypes to be considered for experimental follow-up.

  • Comprehensive comparison of ligand-based virtual screening tools against the DUD data set reveals limitations of current 3D methods.
    Venkatraman, Vishwesh and Pérez-Nueno, Violeta I and Mavridis, Lazaros and Ritchie, David W
    Journal of chemical information and modeling, 2010, 50(12), 2079-2093
    PMID: 21090728     doi: 10.1021/ci100263p
    In recent years, many virtual screening (VS) tools have been developed that employ different molecular representations and have different speed and accuracy characteristics. In this paper, we compare ten popular ligand-based VS tools using the publicly available Directory of Useful Decoys (DUD) data set comprising over 100 000 compounds distributed across 40 protein targets. The DUD was developed initially to evaluate docking algorithms, but our results from an operational correlation analysis show that it is also well suited for comparing ligand-based VS tools. Although it is conventional wisdom that 3D molecular shape is an important determinant of biological activity, our results based on permutational significance tests of several commonly used VS metrics show that the 2D fingerprint-based methods generally give better VS performance than the 3D shape-based approaches for surprisingly many of the DUD targets. To help understand this finding, we have analyzed the nature of the scoring functions used and the composition of the DUD data set itself. We propose that to improve the VS performance of current 3D methods, it will be necessary to devise screening queries that can represent multiple possible conformations and which can exploit knowledge of known actives that span multiple scaffold families.

  • Improving performance of docking-based virtual screening by structural filtration.
    Novikov, Fedor N and Stroylov, Viktor S and Stroganov, Oleg V and Chilov, Ghermes G
    Journal of Molecular Modeling, 2010, 16(7), 1223-1230
    PMID: 20041273     doi: 10.1007/s00894-009-0633-8
    In the current study an innovative method of structural filtration of docked ligand poses is introduced and applied to improve the virtual screening results. The structural filter is defined by a protein-specific set of interactions that a) are structurally conserved in available structures of a particular protein with its bound ligands, and b) can be viewed as playing a crucial role in protein-ligand binding. The concept was evaluated on a set of 10 diverse proteins, for which the corresponding structural filters were developed and applied to the results of virtual screening obtained with the Lead Finder software. The application of structural filtration resulted in a considerable improvement of the enrichment factor, ranging from several-fold to hundreds-fold depending on the protein target. It appeared that the structural filtration had effectively repaired the deficiencies of the scoring functions that used to overestimate decoy binding, resulting in a considerably lower false positive rate. In addition, the structural filters were also effective in dealing with some deficiencies of the protein structure models that would otherwise lead to false negative predictions. The ability of structural filtration to recover relatively small but specifically bound molecules holds promise for the application of this technology in fragment-based drug discovery.
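
    The enrichment factor discussed in the abstract above can be illustrated with a minimal Python sketch, using the common definition EF = (fraction of actives in the top-ranked subset) / (fraction of actives in the whole library). The helper that demotes compounds failing a structural filter is a hypothetical stand-in for illustration only; it is not the Lead Finder filter, and the names, toy data, and "higher score is better" convention are assumptions.

```python
def enrichment_factor(scores, labels, top_fraction):
    """EF at a given top fraction; higher score = better rank (assumed)."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n = len(scores)
    n_actives = sum(labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, round(n * top_fraction))          # size of the selected subset
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = sum(y for _, y in ranked[:n_top])         # actives in the subset
    return (hits / n_top) / (n_actives / n)


def demote_filter_failures(scores, passes_filter):
    """Hypothetical filter step: poses lacking the required interactions rank last."""
    return [s if ok else float("-inf") for s, ok in zip(scores, passes_filter)]


# Toy library (hypothetical): ten compounds, two actives.
toy_scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
toy_labels = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
toy_passes = [False, True, False, True, True, True, True, True, True, True]

ef_raw = enrichment_factor(toy_scores, toy_labels, 0.2)
ef_filtered = enrichment_factor(
    demote_filter_failures(toy_scores, toy_passes), toy_labels, 0.2
)
```

    In this toy case the top 20% contains one active before filtering and two after, so the enrichment factor doubles; real gains depend entirely on the target and the filter definition.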

  • Prediction of protein-ligand binding affinities using multiple instance learning.
    Teramoto, Reiji and Kashima, Hisashi
    Journal of molecular graphics & modelling, 2010, 29(3), 492-497
    PMID: 20965757     doi: 10.1016/j.jmgm.2010.09.006
    Accurate prediction of protein-ligand binding affinities for lead optimization in drug discovery remains an important and challenging problem for scoring functions in docking simulations. In this paper, we propose a data-driven approach that integrates multiple scoring functions to predict protein-ligand binding affinity directly. We then propose a new method called multiple instance regression based scoring (MIRS) that incorporates unbound ligand conformations using multiple scoring functions. We evaluated the predictive performance of MIRS using 100 protein-ligand complexes and their binding affinities. The experimental results showed that MIRS outperformed 11 conventional scoring functions: LigScore, PLP, AutoDock, G-Score, D-Score, LUDI, F-Score, ChemScore, X-Score, PMF, and DrugScore. In addition, we confirmed that MIRS performed well on binding pose prediction. Our results reveal that it is indispensable to incorporate unbound ligand conformations in both binding affinity prediction and binding pose prediction. The proposed method will accelerate efficient lead optimization in structure-based drug design and provide a new direction for the design of new scoring functions.

  • Comparative evaluation of 3D virtual ligand screening methods: impact of the molecular alignment on enrichment.
    Giganti, David and Guillemain, Hélène and Spadoni, Jean-Louis and Nilges, Michael and Zagury, Jean-François and Montes, Matthieu
    Journal of chemical information and modeling, 2010, 50(6), 992-1004
    PMID: 20527883     doi: 10.1021/ci900507g
    In the early stage of drug discovery programs, when the structure of a complex involving a target and a small molecule is available, structure-based virtual ligand screening methods are generally preferred. However, ligand-based strategies like shape-similarity search methods can also be applied. Shape-similarity search methods consist in exploring a pseudo-binding-site derived from the known small molecule used as a reference. Several of these methods use conformational sampling algorithms which are also shared by corresponding docking methods: for example Surflex-dock/Surflex-sim, FlexX/FlexS, ICM, and OMEGA-FRED/OMEGA-ROCS. Using 11 systems issued from the challenging "own" subsets of the Directory of Useful Decoys (DUD-own), we evaluated and compared the performance of the above-cited programs in terms of molecular alignment accuracy, enrichment in active compounds, and enrichment in different chemotypes (scaffold-hopping). Since molecular alignment is a crucial aspect of performance for the different methods, we have assessed its impact on enrichment. We have also illustrated the paradox of retrieving active compounds with good scores even if they are inaccurately positioned. Finally, we have highlighted possible positive aspects of using shape-based approaches in drug-discovery protocols when the structure of the target in complex with a small molecule is known.

  • A fast protein-ligand docking algorithm based on hydrogen bond matching and surface shape complementarity.
    Luo, Wenjia and Pei, Jianfeng and Zhu, Yushan
    Journal of Molecular Modeling, 2010, 16(5), 903-913
    PMID: 19823881     doi: 10.1007/s00894-009-0598-7
    With the rapid development of structural determination of target proteins for human diseases, high-throughput virtual screening based drug discovery is gradually gaining popularity. In this paper, a fast docking algorithm (H-DOCK) based on hydrogen bond matching and surface shape complementarity was developed. In H-DOCK, a divide-and-conquer enumeration approach is first applied to rank the intermolecular modes between protein and ligand by maximizing their hydrogen bond matching; each docked conformation of the ligand is then calculated according to the matched hydrogen bonding geometry; finally, a simple but effective scoring function reflecting mainly the van der Waals interaction is used to evaluate the docked conformations of the ligand. H-DOCK was tested for both rigid and flexible ligand docking; the latter is implemented by repeating rigid docking for multiple conformations of a small molecule and ranking them all together. For rigid ligands, H-DOCK was tested on a set of 271 complexes with at least one intermolecular hydrogen bond, and achieved a success rate (RMSD < 2.0 Å) of 91.1%. For flexible ligands, H-DOCK was tested on another set of 93 complexes, where each case was a conformation ensemble containing the native ligand conformation as well as 100 decoys generated by AutoDock, and the success rate reached 81.7%. The high success rate of H-DOCK indicates that hydrogen bonding and steric hindrance capture the key interactions between protein and ligand. H-DOCK is quite efficient compared with conventional docking algorithms, taking on average only about 0.14 seconds for a rigid ligand docking and about 8.25 seconds for a flexible one. These preliminary docking results imply that H-DOCK can potentially be used for large-scale virtual screening as a pre-filter for a more accurate but less efficient docking algorithm.
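
    The success-rate criterion quoted in the abstract above (a docked pose counts as correct when its RMSD to the native pose is below 2.0 Å) can be illustrated with a minimal Python sketch. It assumes that both poses list the same heavy atoms in the same order and are already in the receptor frame, so no superposition or symmetry correction is applied; the function names and toy coordinates are hypothetical and not taken from H-DOCK.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists (in Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(predicted_poses, native_poses, cutoff=2.0):
    """Fraction of complexes whose predicted pose lies within `cutoff` Å of the native pose."""
    if len(predicted_poses) != len(native_poses) or not predicted_poses:
        raise ValueError("need one predicted pose per native pose")
    hits = sum(
        1 for pred, nat in zip(predicted_poses, native_poses) if rmsd(pred, nat) < cutoff
    )
    return hits / len(predicted_poses)


# Toy example (hypothetical): two complexes with two-atom "ligands".
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
good_pose = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]   # shifted by 0.5 Å
bad_pose = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]    # shifted by 3.0 Å
rate = success_rate([good_pose, bad_pose], [native, native])
```

    For symmetric ligands, a plain atom-order RMSD like this one overestimates the deviation, which is why benchmark studies often use symmetry-corrected variants.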

  • Structural ensemble in computational drug screening.
    Fukunishi, Yoshifumi
    Expert opinion on drug metabolism & toxicology, 2010, 6(7), 835-849
    PMID: 20465522     doi: 10.1517/17425255.2010.486399
    Importance of the field: Structure-based in silico drug screening is now widely used in drug development projects. Structure-based in silico drug screening is generally performed using a protein-compound docking program and docking scoring function. Many docking programs have been developed over the last 2 decades, but their prediction accuracy remains insufficient. Areas covered in this review: This review highlights the recent progress of the post-processing of protein-compound complexes after docking. What the reader will gain: These methods utilize ensembles of docking poses of compounds to improve the prediction accuracy for the ligand-docking pose and screening results. While the individual docking poses are not reliable, the free energy surface or the most probable docking pose can be estimated from the ensemble of docking poses. Take home message: The protein-compound docking program provides an arbitrary rather than a canonical ensemble of docking poses. When the ensemble of docking poses satisfies the canonical ensemble, we can discuss how these post-docking analysis methods work and fail. Thus, improvements to the docking software will be needed in order to generate well-defined ensembles of docking poses.

  • Advances and challenges in protein-ligand docking.
    Huang, Sheng-You and Zou, Xiaoqin
    International journal of molecular sciences, 2010, 11(8), 3016-3034
    PMID: 21152288     doi: 10.3390/ijms11083016
    Molecular docking is a widely used computational tool for the study of molecular recognition, which aims to predict the binding mode and binding affinity of a complex formed by two or more constituent molecules with known structures. An important type of molecular docking is protein-ligand docking because of its therapeutic applications in modern structure-based drug design. Here, we review the recent advances of protein flexibility, ligand sampling, and scoring functions (the three important aspects in protein-ligand docking). Challenges and possible future directions are discussed in the Conclusion.

  • Use of the FACTS solvation model for protein-ligand docking calculations. Application to EADock.
    Zoete, Vincent and Grosdidier, Aurélien and Cuendet, Michel and Michielin, Olivier
    Journal of molecular recognition : JMR, 2010, 23(5), 457-461
    PMID: 20101644     doi: 10.1002/jmr.1012
    Protein-ligand docking has made important progress during the last decade and has become a powerful tool for drug development, opening the way to virtual high throughput screening and in silico structure-based ligand design. Despite the flattering picture that has been drawn, recent publications have shown that the docking problem is far from being solved, and that more developments are still needed to achieve high successful prediction rates and accuracy. Introducing an accurate description of the solvation effect upon binding is thought to be essential to achieve this goal. In particular, EADock uses the Generalized Born Molecular Volume 2 (GBMV2) solvent model, which has been shown to reproduce accurately the desolvation energies calculated by solving the Poisson equation. Here, the implementation of the Fast Analytical Continuum Treatment of Solvation (FACTS) as an implicit solvation model in small-molecule docking calculations has been assessed using the EADock docking program. Our results strongly support the use of FACTS for docking. The success rates of EADock/FACTS and EADock/GBMV2 are similar, i.e. around 75% for local docking and 65% for blind docking. However, these results come at a much lower computational cost: FACTS is 10 times faster than GBMV2 in calculating the total electrostatic energy, and allows a speed-up of EADock by a factor of 4. This study also supports the EADock development strategy relying on the CHARMM package for energy calculations, which enables straightforward implementation and testing of the latest developments in the field of Molecular Modeling.

  • Reducing docking score variations arising from input differences.
    Feher, Miklos and Williams, Christopher I
    Journal of chemical information and modeling, 2010, 50(9), 1549-1560
    PMID: 20698562     doi: 10.1021/ci100204x
    The variability of docking results as a function of variations in ligand input conformations was studied for the GOLD, Glide, FlexX, and Surflex programs. It is concluded that there are two major effects leading to such variability: the adequacy of conformational search during docking and random "chaotic" effects arising from sensitivity to small input perturbations. It is shown that although the former is generally the stronger effect, the latter is also highly significant for almost all docking engines. The strong target-to-target variation of the magnitude of these effects is emphasized. The performance of different packages is compared using these measures. Guidelines are provided for different programs to reduce variability and improve reproducibility, which involve using a small number of input conformations as starting points for docking, followed by the selection of the top scoring docked pose from the results as the best docked solution.

  • SKATE: a docking program that decouples systematic sampling from scoring.
    Feng, Jianwen A and Marshall, Garland R.
    Journal of computational chemistry, 2010, 31(14), 2540-2554
    PMID: 20740553     doi: 10.1002/jcc.21545
    SKATE is a docking prototype that decouples systematic sampling from scoring. This novel approach removes any interdependence between sampling and scoring functions to achieve better sampling and, thus, improves docking accuracy. SKATE systematically samples a ligand's conformational, rotational and translational degrees of freedom, as constrained by a receptor pocket, to find sterically allowed poses. Efficient systematic sampling is achieved by pruning the combinatorial tree using aggregate assembly, discriminant analysis, adaptive sampling, radial sampling, and clustering. Because systematic sampling is decoupled from scoring, the poses generated by SKATE can be ranked by any published, or in-house, scoring function. To test the performance of SKATE, ligands from the Astex/CCDC set, the Surflex set, and the Vertex set, a total of 266 complexes, were redocked to their respective receptors. The results show that SKATE was able to sample poses within 2 A RMSD of the native structure for 98, 95, and 98% of the cases in the Astex/CCDC, Surflex, and Vertex sets, respectively. Cross-docking accuracy of SKATE was also assessed by docking 10 ligands to thymidine kinase and 73 ligands to cyclin-dependent kinase.

  • Ligand docking and binding site analysis with PyMOL and Autodock/Vina.
    Seeliger, Daniel and de Groot, Bert L
    Journal of computer-aided molecular design, 2010, 24(5), 417-422
    PMID: 20401516     doi: 10.1007/s10822-010-9352-6
    Docking of small molecule compounds into the binding site of a receptor and estimating the binding affinity of the complex is an important part of the structure-based drug design process. For a thorough understanding of the structural principles that determine the strength of a protein/ligand complex, both an accurate and fast docking protocol and the ability to visualize binding geometries and interactions are mandatory. Here we present an interface between the popular molecular graphics system PyMOL and the molecular docking suites Autodock and Vina and demonstrate how the combination of docking and visualization can aid structure-based drug design efforts.

  • AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.
    Trott, Oleg and Olson, Arthur J
    Journal of computational chemistry, 2010, 31(2), 455-461
    PMID: 19499576     doi: 10.1002/jcc.21334
    AutoDock Vina, a new program for molecular docking and virtual screening, is presented. AutoDock Vina achieves an approximately two orders of magnitude speed-up compared with the molecular docking software previously developed in our lab (AutoDock 4), while also significantly improving the accuracy of the binding mode predictions, judging by our tests on the training set used in AutoDock 4 development. Further speed-up is achieved from parallelism, by using multithreading on multicore machines. AutoDock Vina automatically calculates the grid maps and clusters the results in a way transparent to the user.

  • ParaDockS: a framework for molecular docking with population-based metaheuristics.
    Meier, René and Pippel, Martin and Brandt, Frank and Sippl, Wolfgang and Baldauf, Carsten
    Journal of chemical information and modeling, 2010, 50(5), 879-889
    PMID: 20415499     doi: 10.1021/ci900467x
    Molecular docking is a simulation technique that aims to predict the binding pose between a ligand and a receptor. The resulting multidimensional continuous optimization problem is practically unsolvable in an exact way. One possible approach is the combination of an optimization algorithm and an objective function that describes the interaction. The software ParaDockS is designed to hold different optimization algorithms and objective functions. At the current stage, an adapted particle-swarm optimizer (PSO) is implemented. Available objective functions are (i) the empirical objective function p-Score and (ii) an adapted version of the knowledge-based potential PMF04. We tested the docking accuracy in terms of reproducing known crystal structures from the PDBbind core set. For 73% of the test instances the native binding mode was found with an rmsd below 2 A. The virtual screening efficiency was tested with a subset of 13 targets and the respective ligands and decoys from the directory of useful decoys (DUD). ParaDockS with PMF04 shows a superior early enrichment. The approach presented here can be employed for molecular docking experiments and virtual screenings of large compound libraries in academia as well as in industrial research and development. The performance in terms of accuracy and enrichment is close to the results of commercial software solutions.

  • Evaluation of the Performance of Four Molecular Docking Programs on a Diverse Set of Protein-Ligand Complexes
    Li, Xun and Li, Yan and Cheng, Tiejun and Liu, Zhihai and Wang, Renxiao
    Journal of computational chemistry, 2010, 31(11), 2109-2125
    PMID: 20127741     doi: 10.1002/jcc.21498
    Many molecular docking programs are available nowadays, and thus it is of great practical value to evaluate and compare their performance. We have conducted an extensive evaluation of four popular commercial molecular docking programs, including Glide, GOLD, LigandFit, and Surflex. Our test set consists of 195 protein-ligand complexes with high-resolution crystal structures (resolution <

  • Virtual fragment docking by Glide: a validation study on 190 protein-fragment complexes.
    Sándor, Márk and Kiss, Róbert and Keseru, György M
    Journal of chemical information and modeling, 2010, 50(6), 1165-1172
    PMID: 20459088     doi: 10.1021/ci1000407
    The docking accuracy of Glide was evaluated using 16 different docking protocols on 190 protein-fragment complexes representing 78 targets. Standard precision docking (Glide SP) based protocols showed the best performance. The average root-mean-square deviation (rmsd) between the docked and cocrystallized poses achieved by Glide SP with pre- and postprocessing was 1.17 A, and an acceptable binding mode with rmsd < 2 A could be found in 80% of the cases. Comparison of the docking results produced by different protocols suggests that the sampling efficacy of Glide is adequate for fragment docking. The docking accuracy seems to be limited by the performance of scoring schemes, which is supported by the weak correlation between experimental binding affinities and GlideScores. Cross-docking experiments performed on 8 targets represented by 63 complexes revealed that Glide SP gave similar results to that of the computationally more intensive Glide XP. The average rmsd achieved by Glide SP with pre- and postprocessing was 2.06 A, and an acceptable binding mode with rmsd < 2 A could be found in 63% of the cases. These cross-docking results were improved significantly selecting the optimal X-ray structure for each target (average rmsd

  • Comparison of three preprocessing filters efficiency in virtual screening: identification of new putative LXRbeta regulators as a test case.
    Ghemtio, Léo and Devignes, Marie-Dominique and Smaïl-Tabbone, Malika and Souchet, Michel and Leroux, Vincent and Maigret, Bernard
    Journal of chemical information and modeling, 2010, 50(5), 701-715
    PMID: 20420434     doi: 10.1021/ci900356m
    In silico screening methodologies are widely recognized as efficient approaches in early steps of drug discovery. However, in the virtual high-throughput screening (VHTS) context, where hit compounds are searched among millions of candidates, three-dimensional comparison techniques and knowledge discovery from databases should offer a better efficiency to finding novel drug leads than those of computationally expensive molecular dockings. Therefore, the present study aims at developing a filtering methodology to efficiently eliminate unsuitable compounds in VHTS process. Several filters are evaluated in this paper. The first two are structure-based and rely on either geometrical docking or pharmacophore depiction. The third filter is ligand-based and uses knowledge-based and fingerprint similarity techniques. These filtering methods were tested with the Liver X Receptor (LXR) as a target of therapeutic interest, as LXR is a key regulator in maintaining cholesterol homeostasis. The results show that the three considered filters are complementary so that their combination should generate consistent compound lists of potential hits.

  • VSDocker: a tool for parallel high-throughput virtual screening using AutoDock on Windows-based computer clusters
    Prakhov, Nikita D. and Chernorudskiy, Alexander L. and Gainullin, Murat R.
    Bioinformatics (Oxford, England), 2010, 26(10), 1374-1375
    PMID: 20378556     doi: 10.1093/bioinformatics/btq149
    VSDocker is an original program that allows using AutoDock4 for optimized virtual ligand screening on computer clusters or multiprocessor workstations. This tool is the first implementation of parallel high-performance virtual screening of ligands for MS Windows-based computer systems.

  • Comparison of Structure- and Ligand-Based Virtual Screening Protocols Considering Hit List Complementarity and Enrichment Factors
    Krueger, Dennis M. and Evers, Andreas
    Chemmedchem, 2010, 5(1), 148-158
    PMID: 19908272     doi: 10.1002/cmdc.200900314
    Structure- and ligand-based virtual-screening methods (docking, 2D- and 3D-similarity searching) were analysed for their effectiveness in virtual screening against four different targets: angiotensin-converting enzyme (ACE), cyclooxygenase 2 (COX-2), thrombin and human immunodeficiency virus 1 (HIV-1) protease. The relative performance of the tools was compared by examining their ability to recognise known active compounds from a set of actives and nonactives. Furthermore, we investigated whether the application of different virtual-screening methods in parallel provides complementary or redundant hit lists. Docking was performed with GOLD, Glide, FlexX and Surflex. The obtained docking poses were rescored by using nine different scoring functions in addition to the scoring functions implemented as objective functions in the docking algorithms. Ligand-based virtual screening was done with ROCS (3D-similarity searching), Feature Trees and Scitegic Functional Fingerprints (2D-similarity searching). The results show that structure- and ligand-based virtual-screening methods provide comparable enrichments in detecting active compounds. Interestingly, the hit lists that are obtained from different virtual-screening methods are generally highly complementary. These results suggest that a parallel application of different structure- and ligand-based virtual-screening methods increases the chance of identifying more (and more diverse) active compounds from a virtual-screening campaign.

  • Virtual screening: an endless staircase?
    Schneider, Gisbert
    Nature reviews. Drug discovery, 2010, 9(4), 273-276
    PMID: 20357802     doi: 10.1038/nrd3139
    Computational chemistry - in particular, virtual screening - can provide valuable contributions in hit- and lead-compound discovery. Numerous software tools have been developed for this purpose. However, despite the applicability of virtual screening technology being well established, it seems that there are relatively few examples of drug discovery projects in which virtual screening has been the key contributor. Has virtual screening reached its peak? If not, what aspects are limiting its potential at present, and how can significant progress be made in the future?

  • NNScore: a neural-network-based scoring function for the characterization of protein-ligand complexes.
    Durrant, Jacob D and McCammon, J Andrew
    Journal of chemical information and modeling, 2010, 50(10), 1865-1871
    PMID: 20845954     doi: 10.1021/ci100244v
    As high-throughput biochemical screens are both expensive and labor intensive, researchers in academia and industry are turning increasingly to virtual-screening methodologies. Virtual screening relies on scoring functions to quickly assess ligand potency. Although useful for in silico ligand identification, these scoring functions generally give many false positives and negatives; indeed, a properly trained human being can often assess ligand potency by visual inspection with greater accuracy. Given the success of the human mind at protein-ligand complex characterization, we present here a scoring function based on a neural network, a computational model that attempts to simulate, albeit inadequately, the microscopic organization of the brain. Computer-aided drug design depends on fast and accurate scoring functions to aid in the identification of small-molecule ligands. The scoring function presented here, used either on its own or in conjunction with other more traditional functions, could prove useful in future drug-discovery efforts.

  • Docking Validation Resources: Protein Family and Ligand Flexibility Experiments
    Mukherjee, Sudipto and Balius, Trent E and Rizzo, Robert C
    Journal of chemical information and modeling, 2010, 50(11), 1986-2000
    PMID: 21033739     doi: 10.1021/ci1001982
    A database consisting of 780 ligand-receptor complexes, termed SB2010, has been derived from the Protein Databank to evaluate the accuracy of docking protocols for regenerating bound ligand conformations. The goal is to provide easily accessible community resources for development of improved procedures to aid virtual screening for ligands with a wide range of flexibilities. Three core experiments using the program DOCK, which employ rigid (RGD), fixed anchor (FAD), and flexible (FLX) protocols, were used to gauge performance by several different metrics: (1) global results, (2) ligand flexibility, (3) protein family, and (4) cross-docking. Global spectrum plots of successes and failures vs rmsd reveal well-defined inflection regions, which suggest the commonly used 2 angstrom criterion is a reasonable choice for defining success. Across all 780 systems, success tracks with the relative difficulty of the calculations: RGD (82.3%) > FAD (78.1%) > FLX (63.8%). In general, failures due to scoring strongly outweigh those due to sampling. Subsets of SB2010 grouped by ligand flexibility (7-or-less, 8-to-15, and 15-plus rotatable bonds) reveal that success degrades linearly for FAD and FLX protocols, in contrast to RGD, which remains constant. Despite the challenges associated with FLX anchor orientation and on-the-fly flexible growth, success rates for the 7-or-less (74.5%) and, in particular, the 8-to-15 (55.2%) subset are encouraging. Poorer results for the very flexible 15-plus set (39.3%) indicate substantial room for improvement. Family-based success appears largely independent of ligand flexibility, suggesting a strong dependence on the binding site environment. For example, zinc-containing proteins are generally problematic, despite moderately flexible ligands. Finally, representative cross-docking examples, for carbonic anhydrase, thermolysin, and neuraminidase families, show the utility of family-based analysis for rapid identification of particularly good or bad docking trends, and the type of failures involved (scoring/sampling), which will likely be of interest to researchers making specific receptor choices for virtual screening. SB2010 is available for download at
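    Several abstracts in this list report docking success as the fraction of poses within 2 angstroms rmsd of the crystal pose. The sketch below illustrates that bookkeeping only. It assumes both poses list the same atoms in the same order, are already in the receptor frame (no superposition), and ignores ligand symmetry; the function names and the inclusive cutoff are illustrative choices, not taken from any cited paper.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two poses given as equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (xa - xb) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for xa, xb in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsd_values, cutoff=2.0):
    """Fraction of poses with rmsd at or below the cutoff.

    Assumption: the cutoff is inclusive; some papers use a strict '<'.
    """
    if not rmsd_values:
        raise ValueError("need at least one rmsd value")
    hits = sum(1 for value in rmsd_values if value <= cutoff)
    return hits / len(rmsd_values)


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(round(rmsd(crystal, docked), 3))
    print(success_rate([0.5, 1.9, 2.5, 4.0]))
```

    Production tools usually add symmetry-aware atom matching, which this sketch deliberately leaves out.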

  • Blind docking method combining search of low-resolution binding sites with ligand pose refinement by molecular dynamics-based global optimization.
    Vorobjev, Yury N
    Journal of computational chemistry, 2010, 31(5), 1080-1092
    PMID: 19821514     doi: 10.1002/jcc.21394
    This study describes the development of a new blind hierarchical docking method, bhDock, its implementation, and accuracy assessment. The bhDock method uses a two-step algorithm. First, a comprehensive set of low-resolution binding sites is determined by analyzing the entire protein surface and ranked by a simple score function. Second, the ligand position is determined via a molecular dynamics-based method of global optimization starting from a small set of highly ranked low-resolution binding sites. The refinement of the ligand binding pose starts from uniformly distributed multiple initial ligand orientations and uses simulated annealing molecular dynamics coupled with guided force-field deformation of protein-ligand interactions to find the global minimum. Assessment of the bhDock method on a set of 37 protein-ligand complexes showed a prediction success rate of 78%, which is better than the rate reported for the most cited docking methods, such as AutoDock, DOCK, GOLD, and FlexX, on the same set of complexes.

  • Exploring hierarchical refinement techniques for induced fit docking with protein and ligand flexibility.
    Borrelli, Kenneth W and Cossins, Benjamin and Guallar, Victor
    Journal of computational chemistry, 2010, 31(6), 1224-1235
    PMID: 19885871     doi: 10.1002/jcc.21409
    We present a series of molecular-mechanics-based protein refinement methods, including two novel ones, applied as part of an induced fit docking procedure. The methods used include minimization; protein and ligand sidechain prediction; a hierarchical ligand placement procedure similar to a-priori protein loop predictions; and a minimized Monte Carlo approach using normal mode analysis as a move step. The results clearly indicate the importance of a proper opening of the active site backbone, which might not be accomplished when the ligand degrees of freedom are prioritized. The most accurate method consisted of the minimized Monte Carlo procedure designed to open the active site followed by a hierarchical optimization of the sidechain packing around a mobile flexible ligand. The methods have been used on a series of 88 protein-ligand complexes including both cross-docking and apo-docking members resulting in complex conformations determined to within 2.0 A heavy-atom RMSD in 75% of cases where the protein backbone rearrangement upon binding is less than 1.0 A alpha-carbon RMSD. We also demonstrate that physics-based all-atom potentials can be more accurate than docking-style potentials when complexes are sufficiently refined.

  • A new Lamarckian genetic algorithm for flexible ligand-receptor docking.
    Fuhrmann, Jan and Rurainski, Alexander and Lenhof, Hans-Peter and Neumann, Dirk
    Journal of computational chemistry, 2010, 31(9), 1911-1918
    PMID: 20082382     doi: 10.1002/jcc.21478
    We present a Lamarckian genetic algorithm (LGA) variant for flexible ligand-receptor docking that can handle a large number of degrees of freedom. Our hybrid method combines a multi-deme LGA with a recently published gradient-based method for local optimization of molecular complexes. We compared the performance of our new hybrid method to two non-gradient-based search heuristics on the Astex diverse set for flexible ligand-receptor docking. Our results show that the novel approach is clearly superior to other LGAs employing a stochastic optimization method. The new algorithm features a shorter run time and gives substantially better results, especially with increasing complexity of the ligands. Thus, it may be used to dock ligands with many rotatable bonds with high efficiency.

  • Improved docking, screening and selectivity prediction for small molecule nuclear receptor modulators using conformational ensembles.
    Park, So-Jung and Kufareva, Irina and Abagyan, Ruben
    Journal of computer-aided molecular design, 2010, 24(5), 459-471
    PMID: 20455005     doi: 10.1007/s10822-010-9362-4
    Nuclear receptors (NRs) are ligand-dependent transcription factors and play a key role in reproduction, development, and homeostasis of the organism. NRs are potential targets for treatment of cancer and other diseases such as inflammatory diseases and diabetes. In this study, we present a comprehensive library of pocket conformational ensembles of thirteen human nuclear receptors, and test the ability of these ensembles to recognize their ligands in virtual screening, as well as predict their binding geometry, functional type, and relative binding affinity. 157 known NR modulators and 66 structures were used as a benchmark. Our pocket ensemble library correctly predicted the ligand binding poses in 94% of the cases. The models were also highly selective for the active ligands in virtual screening, with the areas under the ROC curves ranging from 82 to a remarkable 99%. Using the computationally determined receptor-specific binding energy offsets, we showed that the ensembles can be used for predicting selectivity profiles of NR ligands. Our results evaluate and demonstrate the advantages of using receptor ensembles for compound docking, screening, and profiling.
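    The area under the ROC curve quoted above can be read as the probability that a randomly chosen active is ranked ahead of a randomly chosen decoy. A minimal sketch of that pairwise definition follows; the function name, the `lower_is_better` flag, and the example scores are hypothetical and are not drawn from the paper.

```python
def roc_auc(active_scores, decoy_scores, lower_is_better=True):
    """Area under the ROC curve via pairwise comparison of actives and decoys.

    Each (active, decoy) pair counts 1 if the active is ranked better,
    0.5 on a tie, and 0 otherwise. Assumption: docking scores are
    energy-like, so lower is better unless lower_is_better is False.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for active in active_scores:
        for decoy in decoy_scores:
            if active == decoy:
                wins += 0.5
            elif (active < decoy) == lower_is_better:
                wins += 1.0
    return wins / (len(active_scores) * len(decoy_scores))


if __name__ == "__main__":
    actives = [-9.0, -8.0]
    decoys = [-7.0, -8.5, -5.0]
    print(roc_auc(actives, decoys))
```

    The double loop is quadratic in library size; it is written this way for clarity, and a rank-based formulation would be preferable for large screens.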

  • Multiple ligand simultaneous docking: orchestrated dancing of ligands in binding sites of protein.
    Li, Huameng and Li, Chenglong
    Journal of computational chemistry, 2010, 31(10), 2014-2022
    PMID: 20166125     doi: 10.1002/jcc.21486
    Present docking methodologies simulate only one single ligand at a time during the docking process. In reality, the molecular recognition process always involves multiple molecular species. Typical protein-ligand interactions are, for example, substrate and cofactor in a catalytic cycle; metal ion coordination together with ligand(s); and ligand binding with water molecules. To simulate the real molecular binding processes, we propose a novel multiple ligand simultaneous docking (MLSD) strategy, which can deal with all the above processes, vastly improving docking sampling and binding free energy scoring. The work also compares two search strategies: Lamarckian genetic algorithm and particle swarm optimization, which have respective advantages depending on the specific systems. The methodology proves robust through systematic testing against several diverse model systems: E. coli purine nucleoside phosphorylase (PNP) complex with two substrates, SHP2NSH2 complex with two peptides and Bcl-xL complex with ABT-737 fragments. In all cases, the final correct docking poses and relative binding free energies were obtained. In the PNP case, the simulations also capture the binding intermediates and reveal the binding dynamics during the recognition processes, which are consistent with the proposed enzymatic mechanism. In the other two cases, conventional single-ligand docking fails due to energetic and dynamic coupling among ligands, whereas MLSD results in the correct binding modes. These three cases also represent potential applications in the areas of exploring enzymatic mechanism, interpreting noisy X-ray crystallographic maps, and aiding fragment-based drug design, respectively.

  • An interaction-motif-based scoring function for protein-ligand docking.
    Xie, Zhong-Ru and Hwang, Ming-Jing
    Bmc Bioinformatics, 2010, 11, 298
    PMID: 20525216     doi: 10.1186/1471-2105-11-298
    BACKGROUND: A good scoring function is essential for molecular docking computations. In conventional scoring functions, energy terms modeling pairwise interactions are cumulatively summed, and the best docking solution is selected. Here, we propose to transform protein-ligand interactions into three-dimensional geometric networks, from which recurring network substructures, or network motifs, are selected and used to provide probability-ranked interaction templates with which to score docking solutions.


  • Elastic potential grids: accurate and efficient representation of intermolecular interactions for fully flexible docking.
    Kazemi, Sina and Krüger, Dennis M and Sirockin, Finton and Gohlke, Holger
    Chemmedchem, 2009, 4(8), 1264-1268
    PMID: 19514026     doi: 10.1002/cmdc.200900146

  • Scoring confidence index: statistical evaluation of ligand binding mode predictions.
    Zavodszky, Maria I and Stumpff-Kane, Andrew W and Lee, David J and Feig, Michael
    Journal of computer-aided molecular design, 2009, 23(5), 289-299
    PMID: 19153808     doi: 10.1007/s10822-008-9258-8
    Protein-ligand docking programs can generate a large number of possible binding orientations for each ligand candidate. The challenge is to identify the orientations closest to the native binding mode using a scoring method. Many different scoring functions have been developed for protein-ligand scoring, but their performance on binding mode prediction is often target-dependent. In this study, a statistical approach was employed to provide a confidence measure of scoring performance in finding close to the correct docked ligand orientations. It exploits the fact that the scores provided by an adequately performing scoring function generally improve as the ligand binding modes get closer to the correct native orientation. For such cases, the correlation coefficient of scores versus distances is expected to be highest when the most native-like orientation is used as a reference. This correlation coefficient, called the correlation-based score (CBScore), was used as an indicator of how far the docked pose was from the native orientation. The correlation between the original scores and CBScores as well as the range of CBScores were found to be good measures of scoring performance. They were combined into a single quantity, called the scoring confidence index. High values of the scoring confidence index were indicative of pronounced and relatively smooth binding energy landscapes with easily discernable global minima, resulting in reliable binding mode predictions. Low values of this index reflected rugged energy landscapes making the prediction of the correct binding mode very difficult and often unreliable. The diagnostic ability of the scoring confidence index was tested on a non-redundant set of 50 protein-ligand complexes scored with three commonly employed scoring functions: AffiScore, DrugScore and X-Score. Binding mode predictions were found to be three times more reliable for complexes with scoring confidence indices in the upper half than for cases with values in the lower half of the resulting range of 0-1.6. This new confidence measure of scoring performance is expected to be a valuable tool for virtual screening applications.
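    The correlation-based score described above takes each docked pose in turn as the reference and correlates the scores of all poses with their distances to that reference. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes Pearson correlation, energy-like scores (lower is better), inclusion of the reference pose itself at distance zero, and a precomputed pairwise distance matrix; `pearson` and `cb_scores` are hypothetical names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cb_scores(scores, distances):
    """Correlation of pose scores with distance to each candidate reference pose.

    scores[j] is the docking score of pose j and distances[i][j] is the
    distance (for example rmsd) between poses i and j. Entry i of the result
    is high when scores worsen steadily with distance from pose i.
    """
    n = len(scores)
    if n < 2 or any(len(row) != n for row in distances):
        raise ValueError("need at least two poses and a square distance matrix")
    return [pearson(list(distances[i]), scores) for i in range(n)]


if __name__ == "__main__":
    # Toy funnel: four poses on a line, energy rising with distance from pose 0.
    energies = [0.0, 1.0, 2.0, 3.0]
    matrix = [[abs(i - j) for j in range(4)] for i in range(4)]
    values = cb_scores(energies, matrix)
    print(values.index(max(values)))
```

    In this toy funnel the pose at the bottom of the energy well receives the highest value, which is the behaviour the abstract attributes to the most native-like orientation.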

  • Docking Screens: Right for the Right Reasons?
    Kolb, Peter and Irwin, John J
    Current topics in medicinal chemistry, 2009, 9(9), 755-770
    Whereas docking screens have emerged as the most practical way to use protein structure for ligand discovery, an inconsistent track record raises questions about how well docking actually works. In its favor, a growing number of publications report the successful discovery of new ligands, often supported by experimental affinity data and controls for artifacts. Few reports, however, actually test the underlying structural hypotheses that docking makes. To be successful and not just lucky, prospective docking must not only rank a true ligand among the top scoring compounds, it must also correctly orient the ligand so the score it receives is biophysically sound. If the correct binding pose is not predicted, a skeptic might well infer that the discovery was serendipitous. Surveying over 15 years of the docking literature, we were surprised to discover how rarely sufficient evidence is presented to establish whether docking actually worked for the right reasons. The paucity of experimental tests of theoretically predicted poses undermines confidence in a technique that has otherwise become widely accepted. Of course, solving a crystal structure is not always possible, and even when it is, it can be a lot of work, and is not readily accessible to all groups. Even when a structure can be determined, investigators may prefer to gloss over an erroneous structural prediction to better focus on their discovery. Still, the absence of a direct test of theory by experiment is a loss for method developers seeking to understand and improve docking methods. We hope this review will motivate investigators to solve structures and compare them with their predictions whenever possible, to advance the field.

  • FINDSITE: a threading-based approach to ligand homology modeling.
    Brylinski, Michal and Skolnick, Jeffrey
    PLoS computational biology, 2009, 5(6), e1000405
    PMID: 19503616     doi: 10.1371/journal.pcbi.1000405
    Ligand virtual screening is a widely used tool to assist in new pharmaceutical discovery. In practice, virtual screening approaches have a number of limitations, and the development of new methodologies is required. Previously, we showed that remotely related proteins identified by threading often share a common binding site occupied by chemically similar ligands. Here, we demonstrate that across an evolutionarily related, but distant family of proteins, the ligands that bind to the common binding site contain a set of strongly conserved anchor functional groups as well as a variable region that accounts for their binding specificity. Furthermore, the sequence and structure conservation of residues contacting the anchor functional groups is significantly higher than those contacting ligand variable regions. Exploiting these insights, we developed FINDSITE(LHM) that employs structural information extracted from weakly related proteins to perform rapid ligand docking by homology modeling. In large scale benchmarking, using the predicted anchor-binding mode and the crystal structure of the receptor, FINDSITE(LHM) outperforms classical docking approaches with an average ligand RMSD from native of approximately 2.5 A. For weakly homologous receptor protein models, using FINDSITE(LHM), the fraction of recovered binding residues and specific contacts is 0.66 (0.55) and 0.49 (0.38) for highly confident (all) targets, respectively. Finally, in virtual screening for HIV-1 protease inhibitors, using similarity to the ligand anchor region yields significantly improved enrichment factors. Thus, the rather accurate, computationally inexpensive FINDSITE(LHM) algorithm should be a useful approach to assist in the discovery of novel biopharmaceuticals.

  • Four-dimensional docking: a fast and accurate account of discrete receptor flexibility in ligand docking.
    Bottegoni, Giovanni and Kufareva, Irina and Totrov, Maxim and Abagyan, Ruben
    Journal of medicinal chemistry, 2009, 52(2), 397-406
    PMID: 19090659     doi: 10.1021/jm8009958
    Many available methods aimed at incorporating the receptor flexibility in ligand docking are computationally expensive, require a high level of user intervention, and were tested only on benchmarks of limited size and diversity. Here we describe the four-dimensional (4D) docking approach that allows seamless incorporation of receptor conformational ensembles in a single docking simulation and reduces the sampling time while preserving the accuracy of traditional ensemble docking. The approach was tested on a benchmark of 99 therapeutically relevant proteins and 300 diverse ligands (half of them experimental or marketed drugs). The conformational variability of the binding pockets was represented by the available crystallographic data, with the total of 1113 receptor structures. The 4D docking method reproduced the correct ligand binding geometry in 77.3% of the benchmark cases, matching the success rate of the traditional approach but employed on average only one-fourth of the time during the ligand sampling phase.

  • Docking and chemoinformatic screens for new ligands and targets
    Kolb, Peter and Ferreira, Rafaela S and Irwin, John J and Shoichet, Brian K
    Current Opinion in Biotechnology, 2009, 20(4), 429-436
    doi: 10.1016/j.copbio.2009.08.003
    ... rate of 24% [19 * ] (Figure 3). Intriguingly, five of these were inverse agonists, as was the ligand bound in the X-ray structure, carazolol, against which the screen occurred. ... This is borne out in a community-wide, blind assessment (GPCR Dock 2008 [41]) of the prediction of the ...

  • Predicting multiple ligand binding modes using self-consistent pharmacophore hypotheses.
    Wallach, Izhar and Lilien, Ryan
    Journal of chemical information and modeling, 2009, 49(9), 2116-2128
    PMID: 19711952     doi: 10.1021/ci900199e
    The ability to predict ligand binding modes without the aid of wet-lab experiments may accelerate and reduce the cost of drug discovery research. Despite significant recent progress, virtual screening has not yet eliminated the need for wet-lab experiments. For example, after a lead compound has been identified, the precise binding mode is still typically determined by experimental structural biology. This structural knowledge is then employed to guide lead optimization. We present a step toward improving protein-ligand binding mode prediction for a set of ligands known to interact with a common protein. There is thus an important distinction between this work and traditional virtual screening algorithms. Whereas traditional approaches attempt to identify binding ligands from a large database of available compounds, our approach aims to more accurately predict the binding mode for a set of ligands which are already known to bind the target protein. The approach is based on the hypothesis that each active site contains a set of interaction points which binding ligands tend to exploit. In a more traditional context, these interaction points make up a pharmacophoric map. Our algorithm first performs traditional protein-ligand docking for each known binder. The ranked lists of candidate binding modes are then evaluated to identify a set of poses maximally self-consistent with respect to a pharmacophoric map generated from the same poses. We have extensively demonstrated the application of the algorithm to four protein systems (thrombin, cyclin-dependent kinase 2, dihydrofolate reductase, and HIV-1 protease) and attained predictions with an average RMSD < 2.5 Å for all tested systems. This represents a typical improvement of 0.5-1.0 Å (up to 25%) RMSD over the naive virtual docking predictions. Our algorithm is independent of the docking method and may significantly improve binding mode prediction of virtual docking experiments.

  • Improving virtual screening performance against conformational variations of receptors by shape matching with ligand binding pocket.
    Lee, Hui Sun and Lee, Cheol Soon and Kim, Jeong Sook and Kim, Dong Hou and Choe, Han
    Journal of chemical information and modeling, 2009, 49(11), 2419-2428
    PMID: 19852439     doi: 10.1021/ci9002365
    In this report, we present a novel virtual high-throughput screening methodology to assist in computer-aided drug discovery. Our method, designated as SLIM, involves ligand-free shape and chemical feature matching. The procedure takes advantage of a negative image of a binding pocket in a target receptor. The negative image is a set of virtual atoms representing the inner shape and chemical features of the binding pocket. Using this image, SLIM implements a shape-based similarity search based on molecular volume superposition for the ensemble of conformers of each molecule. The superposed structures, prioritized by shape similarity, are subjected to comparison of chemical feature similarities. To validate the merits of the SLIM method, we compared its performance with those of three distinct widely used tools ROCS, GLIDE, and GOLD. ROCS was selected as a representative of the ligand-centric methods, and docking programs GLIDE and GOLD as representatives of the receptor-centric methods. Our data suggest that SLIM has overall hit ranking ability that is comparable to that of the docking method, retaining the high computational speed of the ligand-centric method. It is notable that the SLIM method offers consistently reliable screening quality against conformational variations of receptors, whereas the docking methods have limited screening performance.

  • Scoring ligand similarity in structure-based virtual screening.
    Zavodszky, Maria I and Rohatgi, Anjali and Van Voorst, Jeffrey R and Yan, Honggao and Kuhn, Leslie A
    Journal of molecular recognition : JMR, 2009, 22(4), 280-292
    PMID: 19235177     doi: 10.1002/jmr.942
    Scoring to identify high-affinity compounds remains a challenge in virtual screening. On one hand, protein-ligand scoring focuses on weighting favorable and unfavorable interactions between the two molecules. Ligand-based scoring, on the other hand, focuses on how well the shape and chemistry of each ligand candidate overlay on a three-dimensional reference ligand. Our hypothesis is that a hybrid approach, using ligand-based scoring to rank dockings selected by protein-ligand scoring, can ensure that high-ranking molecules mimic the shape and chemistry of a known ligand while also complementing the binding site. Results from applying this approach to screen nearly 70 000 National Cancer Institute (NCI) compounds for thrombin inhibitors tend to support the hypothesis. EON ligand-based ranking of docked molecules yielded the majority (4/5) of newly discovered, low to mid-micromolar inhibitors from a panel of 27 assayed compounds, whereas ranking docked compounds by protein-ligand scoring alone resulted in one new inhibitor. Since the results depend on the choice of scoring function, an analysis of properties was performed on the top-scoring docked compounds according to five different protein-ligand scoring functions, plus EON scoring using three different reference compounds. The results indicate that the choice of scoring function, even among scoring functions measuring the same types of interactions, can have an unexpectedly large effect on which compounds are chosen from screening. Furthermore, there was almost no overlap between the top-scoring compounds from protein-ligand versus ligand-based scoring, indicating the two approaches provide complementary information. Matchprint analysis, a new addition to the SLIDE (Screening Ligands by Induced-fit Docking, Efficiently) screening toolset, facilitated comparison of docked molecules' interactions with those of known inhibitors. The majority of interactions conserved among top-scoring compounds for a given scoring function, and from the different scoring functions, proved to be conserved interactions in known inhibitors. This was particularly true in the S1 pocket, which was occupied by all the docked compounds.

  • PLATINUM: a web tool for analysis of hydrophobic/hydrophilic organization of biomolecular complexes.
    Pyrkov, Timothy V and Chugunov, Anton O and Krylov, Nikolay A and Nolde, Dmitry E and Efremov, Roman G
    Bioinformatics (Oxford, England), 2009, 25(9), 1201-1202
    PMID: 19244385     doi: 10.1093/bioinformatics/btp111
    The PLATINUM (Protein-Ligand ATtractions Investigation NUMerically) web service is designed for analysis and visualization of hydrophobic/hydrophilic properties of biomolecules supplied as 3D-structures. Furthermore, PLATINUM provides a number of tools for quantitative characterization of the hydrophobic/hydrophilic match in biomolecular complexes, e.g., in docking poses. These complement standard scoring functions. The calculations are based on the concept of empirical Molecular Hydrophobicity Potential (MHP). AVAILABILITY: The PLATINUM web tool as well as detailed documentation and tutorial are available free of charge for academic users. PLATINUM requires Java 5 or higher and Adobe Flash Player 9. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • Assessment of QM/MM scoring functions for molecular docking to HIV-1 protease.
    Fong, Pedro and McNamara, Jonathan P and Hillier, Ian H and Bryce, Richard A
    Journal of chemical information and modeling, 2009, 49(4), 913-924
    PMID: 19309119     doi: 10.1021/ci800432s
    We explore the ability of four quantum mechanical (QM)/molecular mechanical (MM) models to accurately identify the native pose of six HIV-1 protease inhibitors and compare them with the AMBER force field and ChemScore and GoldScore scoring functions. Three QM/MM scoring functions treated the ligand at the HF/6-31G*, AM1d, and PM3 levels; the fourth QM/MM function modeled the ligand and active site at the PM3-D level. For the discrimination of native from non-native poses, solvent-corrected HF/6-31G*:AMBER and AMBER functions exhibited the best overall performance. While the electrostatic component of the MM and QM/MM functions appears important for discriminating the native pose of the ligand, the polarization contribution in the QM/MM functions was relatively insensitive to a ligand's binding mode and, for one ligand, actually hindered discrimination. The inclusion of a desolvation penalty, here using a generalized Born solvent model, improved discrimination for the MM and QM/MM methods. There appeared to be no advantage to binding mode prediction by incorporating active site polarization at the PM3-D level. Finally, we found that choice of the protonation state of the aspartyl dyad in the HIV-1 protease active site influenced the ability of scoring methods to determine the native binding pose.

  • Docking ligands into flexible and solvated macromolecules. 4. Are popular scoring functions accurate for this class of proteins?
    Englebienne, Pablo and Moitessier, Nicolas
    Journal of chemical information and modeling, 2009, 49(6), 1568-1580
    PMID: 19445499     doi: 10.1021/ci8004308
    In our previous report, we investigated the impact of protein flexibility and the presence of water molecules on the pose-prediction accuracy of major docking programs. To complete these investigations, we report herein a study of the impact of these two aspects on the accuracy of scoring functions. To this effect, we developed two sets of protein/ligand complexes made up of ligands cross-docked or cocrystallized with a large variety of proteins, featuring bridging water molecules and demonstrating protein flexibility. Efforts were made to reduce the correlation between the molecular weights of the selected ligands and their binding affinities, a major bias in some previously reported benchmark sets. Using these sets, 18 available scoring functions have been assessed for their accuracy to predict binding affinities and to rank-order compounds by their affinity to cocrystallized proteins. This study confirmed the good and similar accuracy of Xscore, GlideScore, DrugScore(CSD), GoldScore, PLP1, ChemScore, RankScore, and the eHiTS scoring function. Our next investigations demonstrated that most of the assessed scoring functions were much less accurate when the correct protein conformation was not provided. This study also revealed that considering the water molecules for scoring does not greatly affect the accuracy. Finally, this work sheds light on the high correlation between scoring functions and the poor increase in accuracy one can expect from consensus scoring.

  • Docking ligands into flexible and solvated macromolecules. 5. Force-field-based prediction of binding affinities of ligands to proteins.
    Englebienne, Pablo and Moitessier, Nicolas
    Journal of chemical information and modeling, 2009, 49(11), 2564-2571
    PMID: 19928836     doi: 10.1021/ci900251k
    We report herein our efforts in the development of three empirical scoring functions with application in protein-ligand docking. A first scoring function was developed from 209 crystal structures of protein-ligand complexes and a second one from 946 cross-docked complexes. Tuning of the coefficients for the different terms making up these functions was performed by an iterative approach to optimize the correlations between observed activities and calculated scores. A third scoring function was developed from libraries of known actives and decoys docked to six different protein conformational ensembles. In the latter case, the tuning of the coefficients was performed so as to optimize the area under the curve of a receiver operating characteristic (ROC) for the discrimination of actives and inactives. The newly developed scoring functions were next assessed on independent sets of protein-ligand complexes for their ability to predict binding affinities and to discriminate actives from inactives. In the first validation the first function, which was trained on active compounds only, performed as well as other commonly used ones. On a high-throughput virtual screening validation on five protein conformational ensembles, the third scoring function that included data from inactive compounds performed significantly better. This validation showed that the inclusion of data from inactive compounds is critical for performance in virtual high-throughput screening applications.

  • APIF: a new interaction fingerprint based on atom pairs and its application to virtual screening.
    Pérez-Nueno, Violeta I and Rabal, Obdulia and Borrell, José I and Teixidó, Jordi
    Journal of chemical information and modeling, 2009, 49(5), 1245-1260
    PMID: 19364101     doi: 10.1021/ci900043r
    A new interaction fingerprint (IF) called APIF (atom-pairs-based interaction fingerprint) has been developed for postprocessing protein-ligand docking results. Unlike other existing fingerprints which employ absolute locations of individual interactions, APIF considers the relative positions of pairs of interacting atoms. Docking-based virtual screening was performed with GOLD using the crystal structures of trypsin, rhinovirus, HIV protease, carboxypeptidase, and estrogen receptor-alpha as targets. A score derived from the similarity of the bit strings for each docking solution to that of a known reference binding mode was obtained. Comparisons between APIF, GoldScore function, and standard interaction fingerprint (CHIF) scores were performed using enrichment plots. Superior recovery rates were observed in the IF score cases. Comparable results were achieved by using either of the two interaction fingerprints, substantially improving GoldScore function enrichment factors. Binding mode analyses were also carried out in order to study the best method for selecting conformations with a binding mode similar to that of the reference crystallized complex. These showed that the first conformations retrieved by interaction fingerprint scores had a more similar binding mode to the reference complex than those retrieved by the GoldScore function.

  • Carborane clusters in computational drug design: a comparative docking evaluation using AutoDock, FlexX, Glide, and Surflex.
    Tiwari, Rohit and Mahasenan, Kiran and Pavlovicz, Ryan and Li, Chenglong and Tjarks, Werner
    Journal of chemical information and modeling, 2009, 49(6), 1581-1589
    PMID: 19449853     doi: 10.1021/ci900031y
    Compounds containing boron atoms play increasingly important roles in the therapy and diagnosis of various diseases, particularly cancer. However, computational drug design of boron-containing therapeutics and diagnostics is hampered by the fact that many software packages used for this purpose lack parameters for all or part of the various types of boron atoms. In the present paper, we describe simple and efficient strategies to overcome this problem, which are based on the replacement of boron atom types with carbon atom types. The developed methods were validated by docking closo- and nido-carboranyl antifolates into the active site of a human dihydrofolate reductase (hDHFR) using AutoDock, Glide, FlexX, and Surflex and comparing the obtained docking poses with the poses of their counterparts in the original hDHFR-carboranyl antifolate crystal structures. Under optimized conditions, AutoDock and Glide were equally good in docking of the closo-carboranyl antifolates followed by Surflex and FlexX, whereas Autodock, Glide, and Surflex proved to be comparably efficient in the docking of nido-carboranyl antifolates followed by FlexX. Differences in geometries and partial atom charges in the structures of the carboranyl antifolates resulting from different data sources and/or optimization methods did not impact the docking performances of AutoDock or Glide significantly. Binding energies predicted by all four programs were in accordance with experimental data.

  • LigMatch: a multiple structure-based ligand matching method for 3D virtual screening.
    Kinnings, Sarah L and Jackson, Richard M
    Journal of chemical information and modeling, 2009, 49(9), 2056-2066
    PMID: 19685924     doi: 10.1021/ci900204y
    We have developed a new virtual screening (VS) method called LigMatch and evaluated its performance on 13 protein targets using a filtered and clustered version of the directory of useful decoys (DUD). The method uses 3D structural comparison to a crystallographically determined ligand in a bioactive 'template' conformation, using a geometric hashing method, in order to prioritize each database compound. We show that LigMatch outperforms several other widely used VS methods on the 13 DUD targets. We go on to demonstrate that improved VS performance can be gained from using multiple, structurally diverse templates rather than a single template ligand for a particular protein target. In this case, a 2D fingerprint-based method is used to select a ligand template from a set of known bioactive conformations. Furthermore, we show that LigMatch performs well even in the absence of 2D similarity to the template ligands, thereby demonstrating its robustness with respect to purely 2D methods and its potential for scaffold hopping.

  • Validation of molecular docking programs for virtual screening against dihydropteroate synthase.
    Hevener, Kirk E and Zhao, Wei and Ball, David M and Babaoglu, Kerim and Qi, Jianjun and White, Stephen W and Lee, Richard E
    Journal of chemical information and modeling, 2009, 49(2), 444-460
    PMID: 19434845     doi: 10.1021/ci800293n
    Dihydropteroate synthase (DHPS) is the target of the sulfonamide class of antibiotics and has been a validated antibacterial drug target for nearly 70 years. The sulfonamides target the p-aminobenzoic acid (pABA) binding site of DHPS and interfere with folate biosynthesis and ultimately prevent bacterial replication. However, widespread bacterial resistance to these drugs has severely limited their effectiveness. This study explores the second and more highly conserved pterin binding site of DHPS as an alternative approach to developing novel antibiotics that avoid resistance. In this study, five commonly used docking programs, FlexX, Surflex, Glide, GOLD, and DOCK, and nine scoring functions, were evaluated for their ability to rank-order potential lead compounds for an extensive virtual screening study of the pterin binding site of B. anthracis DHPS. Their performance in ligand docking and scoring was judged by their ability to reproduce a known inhibitor conformation and to efficiently detect known active compounds seeded into three separate decoy sets. Two other metrics were used to assess performance; enrichment at 1% and 2% and Receiver Operating Characteristic (ROC) curves. The effectiveness of postdocking relaxation prior to rescoring and consensus scoring were also evaluated. Finally, we have developed a straightforward statistical method of including the inhibition constants of the known active compounds when analyzing enrichment results to more accurately assess scoring performance, which we call the 'sum of the sum of log rank' or SSLR. Of the docking and scoring functions evaluated, Surflex with Surflex-Score and Glide with GlideScore were the best overall performers for use in virtual screening against the DHPS target, with neither combination showing statistically significant superiority over the other in enrichment studies or pose selection. Postdocking ligand relaxation and consensus scoring did not improve overall enrichment.
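
    The two enrichment metrics named in this abstract (enrichment at a fixed percentage of the ranked list, and the area under a ROC curve) can be illustrated with a minimal Python sketch. It is not the authors' code and does not implement their SSLR statistic; the function names, the boolean-list input, and the truncating choice of cutoff size are assumptions made only for this example.

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor for the top `fraction` of a ranked screening list.

    ranked_is_active: list of booleans, best-scoring compound first
                      (hypothetical input format for this sketch).
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Assumption: the cutoff is truncated to a whole number of compounds, minimum 1.
    n_top = max(1, int(n_total * fraction))
    hits = sum(ranked_is_active[:n_top])
    # Ratio of the active rate in the top slice to the active rate overall.
    return (hits / n_top) / (n_actives / n_total)


def roc_auc(ranked_is_active):
    """ROC AUC as the fraction of (active, decoy) pairs ranked in the right order."""
    n_actives = sum(ranked_is_active)
    n_decoys = len(ranked_is_active) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")
    decoys_seen = 0
    wrong_pairs = 0
    for is_active in ranked_is_active:
        if is_active:
            # Every decoy already seen outranks this active.
            wrong_pairs += decoys_seen
        else:
            decoys_seen += 1
    return 1.0 - wrong_pairs / (n_actives * n_decoys)


if __name__ == "__main__":
    ranking = [True, True] + [False] * 8  # toy data, not from the paper
    print(enrichment_factor(ranking, 0.2), roc_auc(ranking))
```

    An enrichment factor of 1 corresponds to random ranking, and the largest attainable value is bounded by the ratio of library size to the number of actives, which is one reason enrichment values from different decoy sets are not directly comparable.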

  • GARD: a Generally Applicable Replacement for RMSD.
    Baber, J Christian and Thompson, David C and Cross, Jason B and Humblet, Christine
    Journal of chemical information and modeling, 2009, 49(8), 1889-1900
    PMID: 19618919     doi: 10.1021/ci9001074
    The root-mean-squared deviation (rmsd) is a widely used measure of distance between two aligned objects - often chemical structures. However, rmsd has a number of known limitations including difficulty of interpretation, no limit on weighting for any portion of the alignment, and a lack of normalization. In this work, a Generally Applicable Replacement for rmsD (GARD) is proposed. In this implementation atomic contributions are weighted by their relative importance to binding, as determined statistically by Andrews et al. (1), and as such this method is 'chemically aware'. This novel measure is normalized and does not have many of the failings of traditional rmsd. It is, thus, perfectly suited for a wide variety of uses, including the assessment of the quality of poses produced from molecular docking programs and the comparison of conformers. Rmsd and GARD are compared in their ability to assess docking software and multiple examples of the use of GARD to rescue essentially correct poses with a high rmsd are presented.
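
    For reference, the plain rmsd that this abstract criticizes can be written as a short Python sketch. The weighted variant below only illustrates the general idea of per-atom weighting; it is not the GARD formula, which the abstract does not give. Function names and the tuple-based coordinate format are assumptions for this example, and both structures are assumed to be already aligned with atoms in the same order.

```python
import math


def rmsd(coords_a, coords_b):
    """Unweighted rmsd between two aligned, equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def weighted_rmsd(coords_a, coords_b, weights):
    """Weighted rmsd; `weights` are hypothetical per-atom importances (not GARD's)."""
    if not (len(coords_a) == len(coords_b) == len(weights)) or not coords_a:
        raise ValueError("inputs must be non-empty and of equal length")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb), w in zip(coords_a, coords_b, weights):
        total += w * ((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2)
    return math.sqrt(total / weight_sum)


if __name__ == "__main__":
    pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]       # toy coordinates
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]  # second atom displaced by 2
    print(rmsd(pose, reference))
    print(weighted_rmsd(pose, reference, [1.0, 0.0]))
```

    With the toy data, down-weighting the displaced atom drives the weighted value to zero, which shows how a weighting scheme can change the verdict on a pose that plain rmsd would penalize.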

  • Docking ligands into flexible and solvated macromolecules. 3. Impact of input ligand conformation, protein flexibility, and water molecules on the accuracy of docking programs.
    Corbeil, Christopher R and Moitessier, Nicolas
    Journal of chemical information and modeling, 2009, 49(4), 997-1009
    PMID: 19391631     doi: 10.1021/ci8004176
    Several modifications and additions to Fitted1.5 led to the development of Fitted2.6. Among the novel implementations are a matching algorithm-enhanced genetic algorithm and a ring conformational search algorithm. With these various optimizations, we also hoped to remove the biases and to develop a docking program that would provide results (i.e., poses) as independent as possible to the input ligand and protein conformations and used parameters, although keeping the options to provide additional experimental information. These biases were investigated within Fitted2.6 along with FlexX, GOLD, Glide, and Surflex. The input ligand conformation was found to have a major impact on the program accuracy as drops as large as 10-50% were observed with all the programs but Fitted. This comparative study also demonstrates that the accuracy of Fitted is similar to that of other widely used programs. We have also demonstrated that protein flexibility, displaceable water molecules, and ring conformational search algorithms, three of the main Fitted features, significantly increased its accuracy. Finally, we also proposed potential modifications to the available programs to further improve their accuracy in binding mode prediction.

  • Automated Docking Screens: A Feasibility Study
    Irwin, John J and Shoichet, Brian K and Mysinger, Michael M and Huang, Niu and Colizzi, Francesco and Wassam, Pascal and Cao, Yiqun
    Journal of medicinal chemistry, 2009, 52(18), 5712-5720
    PMID: 19719084     doi: 10.1021/jm9006966
    Molecular docking is the most practical approach to leverage protein structure for ligand discovery, but the technique retains important liabilities that make it challenging to deploy on a large scale. We have therefore created an expert system, DOCK Blaster, to investigate the feasibility of full automation. The method requires a PDB code, sometimes with a ligand structure, and from that alone can launch a full screen of large libraries. A critical feature is self-assessment, which estimates the anticipated reliability of the automated screening results using pose fidelity and enrichment. Against common benchmarks, DOCK Blaster recapitulates the crystal ligand pose within 2 angstrom rmsd 50-60% of the time; inferior to an expert, but respectable. Half the time the ligand also ranked among the top 5% of 100 physically matched decoys chosen on the fly. Further tests were undertaken culminating in a study of 7755 eligible PDB structures. In 1398 cases, the redocked ligand ranked in the top 5% of 100 property-matched decoys while also posing within 2 angstrom rmsd, suggesting that unsupervised prospective docking is viable. DOCK Blaster is available at

  • [DockingServer: molecular docking calculations online].
    Hazai, Eszter and Kovács, Sándor and Demkó, László and Bikádi, Zsolt
    Acta pharmaceutica Hungarica, 2009, 79(1), 17-21
    PMID: 19526678    
    Over the last years, the use of bioinformatics tools such as molecular docking has become an essential part of research focused on predicting the binding of small molecules to their target proteins. DockingServer offers a web-based, easy-to-use interface that handles all aspects of molecular docking, from ligand and protein set-up through results representation, integrating a number of software packages frequently used in computational chemistry. While its user-friendly interface enables docking calculations and results evaluation by researchers from all fields of biochemistry, DockingServer also provides full control over the setting of specific parameters of ligand and protein set-up and docking calculations for more advanced users. The application can be used for docking and analysis of single ligands as well as for high-throughput docking of ligand libraries to target proteins. The use of DockingServer is illustrated by the formation of the acetaminophen (paracetamol)-CYP2E1 complex.

  • Blind docking of 260 protein-ligand complexes with EADock 2.0.
    Grosdidier, Aurélien and Zoete, Vincent and Michielin, Olivier
    Journal of computational chemistry, 2009, 30(13), 2021-2030
    PMID: 19130502     doi: 10.1002/jcc.21202
    Molecular docking software is one of the important tools of modern drug development pipelines. The promising achievements of the last 10 years emphasize the need for further improvement, as reflected by several recent publications (Leach et al., J Med Chem 2006, 49, 5851; Warren et al., J Med Chem 2006, 49, 5912). Our initial approach, EADock, showed a good performance in reproducing the experimental binding modes for a set of 37 different ligand-protein complexes (Grosdidier et al., Proteins 2007, 67, 1010). This article presents recent improvements regarding the scoring and sampling aspects over the initial implementation, as well as a new seeding procedure based on the detection of cavities, opening the door to blind docking with EADock. These enhancements were validated on 260 complexes taken from the high quality Ligand Protein Database [LPDB, (Roche et al., J Med Chem 2001, 44, 3592)]. Two issues were identified: first, the quality of the initial structures cannot be assumed and a manual inspection and/or a search in the literature are likely to be required to achieve the best performance. Second, the description of interactions involving metal ions still has to be improved. Nonetheless, a remarkable success rate of 65% was achieved for a large scale blind docking assay, when considering only the top ranked binding mode and a success threshold of 2 Å RMSD to the crystal structure. When looking at the five top-ranked binding modes, the success rate increases to 76%. In a standard local docking assay, success rates of 75 and 83% were obtained, considering only the top ranked binding mode, or the five top binding modes, respectively.

  • Docking to heme proteins.
    Röhrig, Ute F and Grosdidier, Aurélien and Zoete, Vincent and Michielin, Olivier
    Journal of computational chemistry, 2009, 30(14), 2305-2315
    PMID: 19288474     doi: 10.1002/jcc.21244
    In silico screening has become a valuable tool in drug design, but some drug targets represent real challenges for docking algorithms. This is especially true for metalloproteins, whose interactions with ligands are difficult to parametrize. Our docking algorithm, EADock, is based on the CHARMM force field, which assures a physically sound scoring function and a good transferability to a wide range of systems, but also exhibits difficulties in case of some metalloproteins. Here, we consider the therapeutically important case of heme proteins featuring an iron core at the active site. Using a standard docking protocol, where the iron-ligand interaction is underestimated, we obtained a success rate of 28% for a test set of 50 heme-containing complexes with iron-ligand contact. By introducing Morse-like metal binding potentials (MMBP), which are fitted to reproduce density functional theory calculations, we are able to increase the success rate to 62%. The remaining failures are mainly due to specific ligand-water interactions in the X-ray structures. Testing of the MMBP on a second data set of non iron binders (14 cases) demonstrates that they do not introduce a spurious bias towards metal binding, which suggests that they may reliably be used also for cross-docking studies.

  • An Evaluation of Explicit Receptor Flexibility in Molecular Docking Using Molecular Dynamics and Torsion Angle Molecular Dynamics.
    Armen, Roger S and Chen, Jianhan and Brooks, Charles L
    Journal of chemical theory and computation, 2009, 5(10), 2909-2923
    PMID: 20160879     doi: 10.1021/ct900262t
    Incorporating receptor flexibility into molecular docking should improve results for flexible proteins. However, the incorporation of explicit all-atom flexibility with molecular dynamics for the entire protein chain may also introduce significant error and "noise" that could decrease docking accuracy and deteriorate the ability of a scoring function to rank native-like poses. We address this apparent paradox by comparing the success of several flexible receptor models in cross-docking and multiple receptor ensemble docking for p38α mitogen-activated protein (MAP) kinase. Explicit all-atom receptor flexibility has been incorporated into a CHARMM-based molecular docking method (CDOCKER) using both molecular dynamics (MD) and torsion angle molecular dynamics (TAMD) for the refinement of predicted protein-ligand binding geometries. These flexible receptor models have been evaluated, and the accuracy and efficiency of TAMD sampling is directly compared to MD sampling. Several flexible receptor models are compared, encompassing flexible side chains, flexible loops, multiple flexible backbone segments, and treatment of the entire chain as flexible. We find that although including side chain and some backbone flexibility is required for improved docking accuracy as expected, docking accuracy also diminishes as additional and unnecessary receptor flexibility is included into the conformational search space. Ensemble docking results demonstrate that including protein flexibility leads to improved agreement with binding data for 227 active compounds. This comparison also demonstrates that a flexible receptor model enriches high affinity compound identification without significantly increasing the number of false positives from low affinity compounds.

  • RosettaLigand docking with full ligand and receptor flexibility.
    Davis, Ian W and Baker, David
    Journal of molecular biology, 2009, 385(2), 381-392
    PMID: 19041878     doi: 10.1016/j.jmb.2008.11.010
    Computational docking of small-molecule ligands into protein receptors is an important tool for modern drug discovery. Although conformational adjustments are frequently observed between the free and ligand-bound states, the conformational flexibility of the protein is typically ignored in protein-small molecule docking programs. We previously described the program RosettaLigand, which leverages the Rosetta energy function and side-chain repacking algorithm to account for flexibility of all side chains in the binding site. Here we present extensions to RosettaLigand that incorporate full ligand flexibility as well as receptor backbone flexibility. Including receptor backbone flexibility is found to produce more correct docked complexes and to lower the average RMSD of the best-scoring docked poses relative to the rigid-backbone results. On a challenging set of retrospective and prospective cross-docking tests, we find that the top-scoring ligand pose is correctly positioned within 2 Å RMSD for 64% (54/85) of cases overall.

  • Blind docking of pharmaceutically relevant compounds using RosettaLigand.
    Davis, Ian W and Raha, Kaushik and Head, Martha S and Baker, David
    Protein science : a publication of the Protein Society, 2009, 18(9), 1998-2002
    PMID: 19554568     doi: 10.1002/pro.192
    It is difficult to properly validate algorithms that dock a small molecule ligand into its protein receptor using data from the public domain: the predictions are not blind because the correct binding mode is already known, and public test cases may not be representative of compounds of interest such as drug leads. Here, we use private data from a real drug discovery program to carry out a blind evaluation of the RosettaLigand docking methodology and find that its performance is on average comparable with that of the best commercially available current small molecule docking programs. The strength of RosettaLigand is the use of the Rosetta sampling methodology to simultaneously optimize protein sidechain, protein backbone and ligand degrees of freedom; the extensive benchmark test described here identifies shortcomings in other aspects of the protocol and suggests clear routes to improving the method.

  • DockFlow - a prototypic PharmaGrid for virtual screening integrating four different docking tools.
    Wolf, Antje and Hofmann-Apitius, Martin and Ghanem, Moustafa and Azam, Nabeel and Kalaitzopoulos, Dimitrios and Yu, Kunqian and Kasam, Vinod
    Studies in health technology and informatics, 2009, 147, 3-12
    PMID: 19593039    
    In this paper we present DockFlow, a prototypic version of a PharmaGrid. DockFlow is supporting pharmaceutical research through enabling virtual screening on the Grid. The system was developed in the course of the BRIDGE project funded by the European Commission. Grids have been used before to run compute- and data-intensive virtual screening experiments, like in the WISDOM project. With DockFlow, however, we addressed a variety of problems yet unsolved, like the diversity of results produced by different docking tools. We also addressed the problem of analysing the data produced in a distributed virtual screening system applying a combinatorial docking approach. In DockFlow we worked on a grid-based problem solving environment for virtual screening with the following major features: execution of four different docking services (FlexX, AutoDock, DOCK and GAsDock) at locations in Europe and China remotely from a common workflow, storage of the results in a common Docking Database providing a shared analysis platform for the collaboration partners and combination of the results. The DockFlow prototype is evaluated on two scientific case studies: malaria and avian flu.

  • Ligand mapping on protein surfaces by the 3D-RISM theory: toward computational fragment-based drug design.
    Imai, Takashi and Oda, Koji and Kovalenko, Andriy and Hirata, Fumio and Kidera, Akinori
    Journal of the American Chemical Society, 2009, 131(34), 12430-12440
    PMID: 19655800     doi: 10.1021/ja905029t
    In line with the recent development of fragment-based drug design, a new computational method for mapping of small ligand molecules on protein surfaces is proposed. The method uses three-dimensional (3D) spatial distribution functions of the atomic sites of the ligand calculated using the molecular theory of solvation, known as the 3D reference interaction site model (3D-RISM) theory, to identify the most probable binding modes of ligand molecules. The 3D-RISM-based method is applied to the binding of several small organic molecules to thermolysin, in order to show its efficiency and accuracy in detecting binding sites. The results demonstrate that our method can reproduce the major binding modes found by X-ray crystallographic studies with sufficient precision. Moreover, the method can successfully identify some binding modes associated with a known inhibitor, which could not be detected by X-ray analysis. The dependence of ligand-binding modes on the ligand concentration, which essentially cannot be treated with other existing computational methods, is also investigated. The results indicate that some binding modes are readily affected by the ligand concentration, whereas others are not significantly altered. In the former case, it is the subtle balance in the binding affinity between the ligand and water that determines the dominant ligand-binding mode.

  • Managing protein flexibility in docking and its applications.
    B-Rao, Chandrika and Subramanian, Jyothi and Sharma, Somesh D
    Drug discovery today, 2009, 14(7-8), 394-400
    PMID: 19185058     doi: 10.1016/j.drudis.2009.01.003
    Docking, virtual screening and structure-based drug design are routinely used in modern drug discovery programs. Although current docking methods deal with flexible ligands, managing receptor flexibility has proved to be challenging. In this brief review, we present the current state-of-the-art for computationally handling receptor flexibility, including a novel statistical computational approach published recently. We conclude, from a comparison of the different approaches, that a combination of methods is likely to provide the most reliable solution to the problem of finding the right protein conformation for a given ligand.

  • 3-D clustering: a tool for high throughput docking
    Priestle, John P.
    Journal of Molecular Modeling, 2009, 15(5), 551-560
    PMID: 19085027     doi: 10.1007/s00894-008-0360-6
    This report describes a computer program for clustering docking poses based on their 3-dimensional (3D) coordinates as well as on their chemical structures. This is chiefly intended for reducing a set of hits coming from high throughput docking, since the capacity to prepare and biologically test such molecules is generally far more limited than the capacity to generate such hits. The advantage of clustering molecules based on 3D, rather than 2D, criteria is that small variations on a scaffold may bring about different binding modes for molecules that would not be predicted by 2D similarity alone. The program does a pose-by-pose/atom-by-atom comparison of a set of docking hits (poses), scoring both spatial and chemical similarity. Using these pair-wise similarities, the whole set is clustered based on a user-supplied similarity threshold. An output coordinate file is created that mirrors the input coordinate file, but contains two new properties: a cluster number and similarity to the cluster center. Poses in this output file can easily be sorted by cluster and displayed together for visual inspection with any standard molecular viewing program, and decisions made about which molecule should be selected for biological testing as the best representative of this group of similar molecules with similar binding modes.

  • An improved adaptive genetic algorithm for protein-ligand docking.
    Kang, Ling and Li, Honglin and Jiang, Hualiang and Wang, Xicheng
    Journal of computer-aided molecular design, 2009, 23(1), 1-12
    PMID: 18777161     doi: 10.1007/s10822-008-9232-5
    A new optimization model of molecular docking is proposed, and a fast flexible docking method based on an improved adaptive genetic algorithm is developed in this paper. The algorithm employs several advanced techniques, such as a multi-population genetic strategy, an entropy-based searching technique with self-adaptation and the quasi-exact penalty. A new iteration scheme in conjunction with the above techniques is employed to speed up the optimization process and to ensure very rapid and steady convergence. The docking accuracy and efficiency of the method are evaluated by docking results from the GOLD test data set, which contains 134 protein-ligand complexes. In over 66.2% of the complexes, the docked pose was within 2.0 Å root-mean-square deviation (RMSD) of the X-ray structure. Docking time is approximately in proportion to the number of the rotatable bonds of ligands.

  • SeleX-CS: a new consensus scoring algorithm for hit discovery and lead optimization.
    Bar-Haim, Shay and Aharon, Ayelet and Ben-Moshe, Tal and Marantz, Yael and Senderowitz, Hanoch
    Journal of chemical information and modeling, 2009, 49(3), 623-633
    PMID: 19231809     doi: 10.1021/ci800335j
    Identifying active compounds (hits) that bind to biological targets of pharmaceutical relevance is the cornerstone of drug design efforts. Structure based virtual screening, namely, the in silico evaluation of binding energies and geometries between a protein and its putative ligands, has emerged over the past few years as a promising approach in this field. The success of the method relies on the availability of reliable 3-dimensional (3D) structures of the target protein and its candidate ligands (the screening library), a reliable docking method that can fit the different ligands into the protein's binding site, and an accurate scoring function that can rank the resulting binding modes in accord with their binding affinities. This last requirement is arguably the most difficult to meet due to the complexity of the binding process. A potential solution to this so-called scoring problem is the usage of multiple scoring functions in an approach known as consensus scoring. Several consensus scoring methods were suggested in the literature and have generally demonstrated an improved ranking of screening libraries relative to individual scoring functions. Nevertheless, current consensus scoring strategies suffer from several shortcomings, in particular, strong dependence on the initial parameters and an incomplete treatment of inactive compounds. In this work we present a new consensus scoring algorithm (SeleX-Consensus Scoring abbreviated to SeleX-CS) specifically designed to address these limitations: (i) A subset of the initial set of the scoring functions is allowed to form the consensus score, and this subset is optimized via a Monte Carlo/Simulated Annealing procedure. (ii) Rank redundancy between the members of the screening library is removed. (iii) The method explicitly considers the presence of inactive compounds. The new algorithm was applied to the ranking of screening libraries targeting two G-protein coupled receptors (GPCR). 
    Excellent enrichment factors were obtained in both cases: For the cannabinoid receptor 1 (CB1), SeleX-CS outperformed the best single score and afforded an enrichment factor of 41 at 1% of the screening library compared with the best single score value of 15 (GOLD_Fitness). For the chemokine receptor type 2 (CCR2) SeleX-CS afforded an enrichment factor of 72 (again at 1% of the screening library) once more outperforming any single score (enrichment factor of 20 by GSCORE). Moreover, SeleX-CS demonstrated success rates of 67% (CCR2) and 73% (CB1) when applied to ranking an external test set. In both cases, the new algorithm also afforded good derichment of inactive compounds (i.e., the ability to push inactive compounds to the bottom of the ranked library). The method was then extended to rank a lead optimization series targeting the Kv4.3 potassium ion channel, resulting in a Spearman's correlation coefficient, p

  • Comparative assessment of scoring functions on a diverse test set.
    Cheng, Tiejun and Li, Xun and Li, Yan and Liu, Zhihai and Wang, Renxiao
    Journal of chemical information and modeling, 2009, 49(4), 1079-1093
    PMID: 19358517     doi: 10.1021/ci9000053
    Scoring functions are widely applied to the evaluation of protein-ligand binding in structure-based drug design. We have conducted a comparative assessment of 16 popular scoring functions implemented in main-stream commercial software or released by academic research groups. A set of 195 diverse protein-ligand complexes with high-resolution crystal structures and reliable binding constants were selected through a systematic nonredundant sampling of the PDBbind database and used as the primary test set in our study. All scoring functions were evaluated in three aspects, that is, "docking power", "ranking power", and "scoring power", and all evaluations were independent from the context of molecular docking or virtual screening. As for "docking power", six scoring functions, including GOLD::ASP, DS::PLP1, DrugScore(PDB), GlideScore-SP, DS::LigScore, and GOLD::ChemScore, achieved success rates over 70% when the acceptance cutoff was root-mean-square deviation < 2.0 Å. Combining these scoring functions into consensus scoring schemes improved the success rates to 80% or even higher. As for "ranking power" and "scoring power", the top four scoring functions on the primary test set were X-Score, DrugScore(CSD), DS::PLP, and SYBYL::ChemScore. They were able to correctly rank the protein-ligand complexes containing the same type of protein with success rates around 50%. Correlation coefficients between the experimental binding constants and the binding scores computed by these scoring functions ranged from 0.545 to 0.644. Besides the primary test set, each scoring function was also tested on four additional test sets, each consisting of a certain number of protein-ligand complexes containing one particular type of protein. Our study serves as an updated benchmark for evaluating the general performance of today's scoring functions. Our results indicate that no single scoring function consistently outperforms others in all three aspects.
    Thus, it is important in practice to choose the appropriate scoring functions for different purposes.

  • Testing assumptions and hypotheses for rescoring success in protein-ligand docking.
    O'Boyle, Noel M. and Liebeschuetz, John W and Cole, Jason C
    Journal of chemical information and modeling, 2009, 49(8), 1871-1878
    PMID: 19645429     doi: 10.1021/ci900164f
    In protein-ligand docking, the scoring function is responsible for identifying the correct pose of a particular ligand as well as separating ligands from nonligands. Recently there has been considerable interest in schemes that combine results from several scoring functions in an effort to achieve improved performance in virtual screens. One such scheme is consensus scoring, which involves combining the results from several rescoring experiments. Although there have been a number of studies that have investigated factors affecting success in consensus scoring, these studies have not addressed the question of why a rescoring strategy works in the first place. Here we propose and test two alternative hypotheses for why rescoring has the potential to improve results, using GOLD 4.0. The "consensus" hypothesis is that rescoring is a way of combining results from two scoring functions such that only true positives are likely to score highly. The "complementary" hypothesis is that the two scoring functions used in rescoring have complementary strengths; one is better at ranking actives with respect to inactives while the other is better at ranking poses of actives. We find that in general it is this hypothesis that explains success in a rescoring experiment. We also test an assumption of any rescoring method, which is that the scores obtained are representative of the fitness of the docked pose. We find that although rescored poses tended to have slightly higher clash values than their docked equivalents, in general the scores were representative.

  • Empirical scoring functions for advanced protein-ligand docking with PLANTS.
    Korb, Oliver and Stutzle, Thomas and Exner, Thomas E.
    Journal of chemical information and modeling, 2009, 49(1), 84-96
    PMID: 19125657     doi: 10.1021/ci800298z
    In this paper we present two empirical scoring functions, PLANTS(CHEMPLP) and PLANTS(PLP), designed for our docking algorithm PLANTS (Protein-Ligand ANT System), which is based on ant colony optimization (ACO). They are related, regarding their functional form, to parts of already published scoring functions and force fields. The parametrization procedure described here was able to identify several parameter settings showing an excellent performance for the task of pose prediction on two test sets comprising 298 complexes in total. Up to 87% of the complexes of the Astex diverse set and 77% of the CCDC/Astex clean listnc (noncovalently bound complexes of the clean list) could be reproduced with root-mean-square deviations of less than 2 Å with respect to the experimentally determined structures. A comparison with the state-of-the-art docking tool GOLD clearly shows that this is, especially for the druglike Astex diverse set, an improvement in pose prediction performance. Additionally, optimized parameter settings for the search algorithm were identified, which can be used to balance pose prediction reliability and search speed.

  • DOCK 6: combining techniques to model RNA-small molecule complexes.
    Lang, P Therese and Brozell, Scott R and Mukherjee, Sudipto and Pettersen, Eric F and Meng, Elaine C and Thomas, Veena and Rizzo, Robert C and Case, David A and James, Thomas L and Kuntz, Irwin D
    RNA (New York, N.Y.), 2009, 15(6), 1219-1230
    PMID: 19369428     doi: 10.1261/rna.1563609
    With an increasing interest in RNA therapeutics and for targeting RNA to treat disease, there is a need for the tools used in protein-based drug design, particularly DOCKing algorithms, to be extended or adapted for nucleic acids. Here, we have compiled a test set of RNA-ligand complexes to validate the ability of the DOCK suite of programs to successfully recreate experimentally determined binding poses. With the optimized parameters and a minimal scoring function, 70% of the test set with less than seven rotatable ligand bonds and 26% of the test set with less than 13 rotatable bonds can be successfully recreated within 2 Å heavy-atom RMSD. When DOCKed conformations are rescored with the implicit solvent models AMBER generalized Born with solvent-accessible surface area (GB/SA) and Poisson-Boltzmann with solvent-accessible surface area (PB/SA) in combination with explicit water molecules and sodium counterions, the success rate increases to 80% with PB/SA for less than seven rotatable bonds and 58% with AMBER GB/SA and 47% with PB/SA for less than 13 rotatable bonds. These results indicate that DOCK can indeed be useful for structure-based drug design aimed at RNA. Our studies also suggest that RNA-directed ligands often differ from typical protein-ligand complexes in their electrostatic properties, but these differences can be accommodated through the choice of potential function. In addition, in the course of the study, we explore a variety of newly added DOCK functions, demonstrating the ease with which new functions can be added to address new scientific questions.

  • AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility.
    Morris, Garrett M and Huey, Ruth and Lindstrom, William and Sanner, Michel F and Belew, Richard K and Goodsell, David S and Olson, Arthur J
    Journal of computational chemistry, 2009, 30(16), 2785-2791
    PMID: 19399780     doi: 10.1002/jcc.21256
    We describe the testing and release of AutoDock4 and the accompanying graphical user interface AutoDockTools. AutoDock4 incorporates limited flexibility in the receptor. Several tests are reported here, including a redocking experiment with 188 diverse ligand-protein complexes and a cross-docking experiment using flexible sidechains in 87 HIV protease complexes. We also report its utility in analysis of covalently bound ligands, using both a grid-based docking method and a modification of the flexible sidechain technique.

  • Docking, virtual high throughput screening and in silico fragment-based drug design.
    Zoete, Vincent and Grosdidier, Aurélien and Michielin, Olivier
    Journal of cellular and molecular medicine, 2009, 13(2), 238-248
    PMID: 19183238     doi: 10.1111/j.1582-4934.2008.00665.x
    The drug discovery process has been profoundly changed recently by the adoption of computational methods helping the design of new drug candidates more rapidly and at lower costs. In silico drug design consists of a collection of tools helping to make rational decisions at the different steps of the drug discovery process, such as the identification of a biomolecular target of therapeutical interest, the selection or the design of new lead compounds and their modification to obtain better affinities, as well as pharmacokinetic and pharmacodynamic properties. Among the different tools available, a particular emphasis is placed in this review on molecular docking, virtual high-throughput screening and fragment-based ligand design.

  • Energetic analysis of fragment docking and application to structure-based pharmacophore hypothesis generation.
    Loving, Kathryn and Salam, Noeris K. and Sherman, Woody
    Journal of computer-aided molecular design, 2009, 23(8), 541-554
    PMID: 19421721     doi: 10.1007/s10822-009-9268-1
    We have developed a method that uses energetic analysis of structure-based fragment docking to elucidate key features for molecular recognition. This hybrid ligand- and structure-based methodology uses an atomic breakdown of the energy terms from the Glide XP scoring function to locate key pharmacophoric features from the docked fragments. First, we show that Glide accurately docks fragments, producing a root mean squared deviation (RMSD) of <1.0 Å for the top scoring pose to the native crystal structure. We then describe fragment-specific docking settings developed to generate poses that explore every pocket of a binding site while maintaining the docking accuracy of the top scoring pose. Next, we describe how the energy terms from the Glide XP scoring function are mapped onto pharmacophore sites from the docked fragments in order to rank their importance for binding. Using this energetic analysis we show that the most energetically favorable pharmacophore sites are consistent with features from known tight binding compounds. Finally, we describe a method to use the energetically selected sites from fragment docking to develop a pharmacophore hypothesis that can be used in virtual database screening to retrieve diverse compounds. We find that this method produces viable hypotheses that are consistent with known active compounds. In addition to retrieving diverse compounds that are not biased by the co-crystallized ligand, the method is able to recover known active compounds from a database screen, with an average enrichment of 8.1 in the top 1% of the database.

  • Virtual fragment screening: an exploration of various docking and scoring protocols for fragments using Glide.
    Kawatkar, Sameer and Wang, Hongming and Czerminski, Ryszard and Joseph-McCarthy, Diane
    Journal of computer-aided molecular design, 2009, 23(8), 527-539
    PMID: 19495993     doi: 10.1007/s10822-009-9281-4
    Fragment-based drug discovery approaches allow for a greater coverage of chemical space and generally produce high efficiency ligands. As such, virtual and experimental fragment screening are increasingly being coupled in an effort to identify new leads for specific therapeutic targets. Fragment docking is employed to create a target-focussed subset of compounds for testing alongside generic fragment libraries. The utility of the program Glide with various scoring schemes for fragment docking is discussed. Fragment docking results for two test cases, prostaglandin D2 synthase and DNA ligase, are presented and compared to experimental screening data. Self-docking, cross-docking, and enrichment studies are performed. For the enrichment runs, experimental data exist indicating that the docking decoys in fact do not inhibit the corresponding enzyme being examined. Results indicate that even for difficult test cases fragment docking can yield enrichments significantly better than random.


  • Evaluation of the performance of 3D virtual screening protocols: RMSD comparisons, enrichment assessments, and decoy selection-what can we learn from earlier mistakes?
    Kirchmair, Johannes and Markt, Patrick and Distinto, Simona and Wolber, Gerhard and Langer, Thierry
    Journal of computer-aided molecular design, 2008, 22(3-4), 213-228
    PMID: 18196462     doi: 10.1007/s10822-007-9163-6
    Within the last few years a considerable amount of evaluative studies has been published that investigate the performance of 3D virtual screening approaches. Thereby, in particular assessments of protein-ligand docking are facing remarkable interest in the scientific community. However, comparing virtual screening approaches is a non-trivial task. Several publications, especially in the field of molecular docking, suffer from shortcomings that are likely to affect the significance of the results considerably. These quality issues often arise from poor study design, biasing, by using improper or inexpressive enrichment descriptors, and from errors in interpretation of the data output. In this review we analyze recent literature evaluating 3D virtual screening methods, with focus on molecular docking. We highlight problematic issues and provide guidelines on how to improve the quality of computational studies. Since 3D virtual screening protocols are in general assessed by their ability to discriminate between active and inactive compounds, we summarize the impact of the composition and preparation of test sets on the outcome of evaluations. Moreover, we investigate the significance of both classic enrichment parameters and advanced descriptors for the performance of 3D virtual screening methods. Furthermore, we review the significance and suitability of RMSD as a measure for the accuracy of protein-ligand docking algorithms and of conformational space sub sampling algorithms.

  • Bootstrap-based consensus scoring method for protein-ligand docking.
    Fukunishi, Hiroaki and Teramoto, Reiji and Takada, Toshikazu and Shimada, Jiro
    Journal of chemical information and modeling, 2008, 48(5), 988-996
    PMID: 18426197     doi: 10.1021/ci700204v
    To improve the performance of a single scoring function used in a protein-ligand docking program, we developed a bootstrap-based consensus scoring (BBCS) method, which is based on ensemble learning. BBCS combines multiple scorings, each of which has the same function form but different energy-parameter sets. These multiple energy-parameter sets are generated in two steps: (1) generation of training sets by a bootstrap method and (2) optimization of energy-parameter set by a Z-score approach, which is based on energy landscape theory as used in protein folding, against each training set. In this study, we applied BBCS to the FlexX scoring function. Using given 50 complexes, we generated 100 training sets and obtained 100 optimized energy-parameter sets. These parameter sets were tested against 48 complexes different from the training sets. BBCS was shown to be an improvement over single scoring when using a parameter set optimized by the same Z-score approach. Comparing BBCS with the original FlexX scoring function, we found that (1) the success rate of recognizing the crystal structure at the top relative to decoys increased from 33.3% to 52.1% and that (2) the rank of the crystal structure improved for 54.2% of the complexes and worsened for none. We also found that BBCS performed better than conventional consensus scoring (CS).

  • Q-Dock: Low-resolution flexible ligand docking with pocket-specific threading restraints.
    Brylinski, Michal and Skolnick, Jeffrey
    Journal of computational chemistry, 2008, 29(10), 1574-1588
    PMID: 18293308     doi: 10.1002/jcc.20917
    The rapidly growing number of theoretically predicted protein structures requires robust methods that can utilize low-quality receptor structures as targets for ligand docking. Typically, docking accuracy falls off dramatically when apo or modeled receptors are used in docking experiments. Low-resolution ligand docking techniques have been developed to deal with structural inaccuracies in predicted receptor models. In this spirit, we describe the development and optimization of a knowledge-based potential implemented in Q-Dock, a low-resolution flexible ligand docking approach. Self-docking experiments using crystal structures reveal satisfactory accuracy, comparable with all-atom docking. All-atom models reconstructed from Q-Dock's low-resolution models can be further refined by even a simple all-atom energy minimization. In decoy-docking against distorted receptor models with a root-mean-square deviation, RMSD, from native of approximately 3 Å, Q-Dock recovers on average 15-20% more specific contacts and 25-35% more binding residues than all-atom methods. To further improve docking accuracy against low-quality protein models, we propose a pocket-specific protein-ligand interaction potential derived from weakly homologous threading holo-templates. The success rate of Q-Dock employing a pocket-specific potential is 6.3 times higher than that previously reported for the Dolores method, another low-resolution docking approach.

  • Consensus scoring with feature selection for structure-based virtual screening
    Teramoto, Reiji and Fukunishi, Hiroaki
    Journal of chemical information and modeling, 2008, 48(2), 288-295
    doi: 10.1021/ci700239t
    The evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, and scoring functions play significant roles in it. While consensus scoring (CS) generally improves enrichment by compensating for the deficiencies of each scoring function, the strategy of how individual scoring functions are selected remains a challenging task when few known active compounds are available. To address this problem, we propose feature selection-based consensus scoring (FSCS), which performs supervised feature selection with docked native ligand conformations to select complementary scoring functions. We evaluated the enrichments of five scoring functions (F-Score, D-Score, PMF, G-Score, and ChemScore), FSCS, and RCS (rank-by-rank consensus scoring) for four different target proteins: acetylcholine esterase (AChE), thrombin (thrombin), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPAR gamma). The results indicated that FSCS was able to select the complementary scoring functions and enhance ligand enrichments and that it outperformed RCS and the individual scoring functions for all target proteins. They also indicated that the performances of the single scoring functions were strongly dependent on the target protein. An especially favorable result with implications for practical drug screening is that FSCS performs well even if only one 3D structure of the protein-ligand complex is known. Moreover, we found that one can infer which scoring functions significantly enrich active compounds by using feature selection before actual docking and that the selected scoring functions are complementary.

  • DOVIS: an implementation for high-throughput virtual screening using AutoDock.
    Zhang, Shuxing and Kumar, Kamal and Jiang, Xiaohui and Wallqvist, Anders and Reifman, Jaques
    Bmc Bioinformatics, 2008, 9, 126
    PMID: 18304355     doi: 10.1186/1471-2105-9-126
    BACKGROUND: Molecular-docking-based virtual screening is an important tool in drug discovery that is used to significantly reduce the number of possible chemical compounds to be investigated. In addition to the selection of a sound docking strategy with appropriate scoring functions, another technical challenge is to in silico screen millions of compounds in a reasonable time. To meet this challenge, it is necessary to use high performance computing (HPC) platforms and techniques. However, the development of an integrated HPC system that makes efficient use of its elements is not trivial.

  • Flexible ligand docking to multiple receptor conformations: a practical alternative.
    Totrov, Maxim and Abagyan, Ruben
    Current opinion in structural biology, 2008, 18(2), 178-184
    PMID: 18302984     doi: 10.1016/
    State of the art docking algorithms predict an incorrect binding pose for about 50-70% of all ligands when only a single fixed receptor conformation is considered. In many more cases, lack of receptor flexibility results in meaningless ligand binding scores, even when the correct pose is obtained. Incorporating conformational rearrangements of the receptor binding pocket into predictions of both ligand binding pose and binding score is crucial for improving structure-based drug design and virtual ligand screening methodologies. However, direct modeling of protein binding site flexibility remains challenging because of the large conformational space that must be sampled, and difficulties remain in constructing a suitably accurate energy function. Here we show that using multiple fixed receptor conformations, either experimentally determined by crystallography or NMR, or computationally generated, is a practical shortcut that may improve docking calculations. In several cases, such an approach has led to experimentally validated predictions.

  • An anchor-dependent molecular docking process for docking small flexible molecules into rigid protein receptors.
    Lin, Thy-Hou and Lin, Guan-Liang
    Journal of chemical information and modeling, 2008, 48(8), 1638-1655
    PMID: 18642894     doi: 10.1021/ci800124g
    A molecular docking method designated as ADDock, anchor-dependent molecular docking process for docking small flexible molecules into rigid protein receptors, is presented in this article. ADDock makes the bond connection lists for atoms based on anchors chosen for building molecular structures for docking small flexible molecules or ligands into rigid active sites of protein receptors. ADDock employs an extended version of piecewise linear potential for scoring the docked structures. Since no translational motion for small molecules is implemented during the docking process, ADDock searches the best docking result by systematically changing the anchors chosen, which are usually the single-edge connected nodes or terminal hydrogen atoms of ligands. ADDock takes intact ligand structures generated during the docking process for computing the docked scores; therefore, no energy minimization is required in the evaluation phase of docking. The docking accuracy by ADDock for 92 receptor-ligand complexes docked is 91.3%. All these complexes have been docked by other groups using other docking methods. The receptor-ligand steric interaction energies computed by ADDock for some sets of active and inactive compounds selected and docked into the same receptor active sites are apparently separated. These results show that based on the steric interaction energies computed between the docked structures and receptor active sites, ADDock is able to separate active from inactive compounds for both being docked into the same receptor.

  • Multiple protein structures and multiple ligands: effects on the apparent goodness of virtual screening results.
    Sheridan, Robert P and McGaughey, Georgia B and Cornell, Wendy D
    Journal of computer-aided molecular design, 2008, 22(3-4), 257-265
    PMID: 18273559     doi: 10.1007/s10822-008-9168-9
    As an extension to a previous published study (McGaughey et al., J Chem Inf Model 47:1504-1519, 2007) comparing 2D and 3D similarity methods to docking, we apply a subset of those virtual screening methods (TOPOSIM, SQW, ROCS-color, and Glide) to a set of protein/ligand pairs where the protein is the target for docking and the cocrystallized ligand is the target for the similarity methods. Each protein is represented by a maximum of five crystal structures. We search a diverse subset of the MDDR as well as a diverse small subset of the MCIDB, Merck's proprietary database. It is seen that the relative effectiveness of virtual screening methods, as measured by the enrichment factor, is highly dependent on the particular crystal structure or ligand, and on the database being searched. 2D similarity methods appear very good for the MDDR, but poor for the MCIDB. However, ROCS-color (a 3D similarity method) does well for both databases.
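
    The enrichment factor used in this abstract is usually defined as the active rate in the top-ranked fraction of a screened database divided by the active rate in the whole database. A minimal sketch under that standard definition follows. The function name, the rounding of the cutoff, and the example ranking are assumptions and are not taken from the paper.

```python
def enrichment_factor(ranked_is_active, fraction):
    """EF at a given top fraction of a ranked list.

    ranked_is_active: list of 0/1 flags, best-ranked compound first.
    fraction: top fraction of the list to inspect, e.g. 0.01 for the top 1%.
    Assumes the list contains at least one active.
    """
    n_total = len(ranked_is_active)
    n_top = max(1, round(n_total * fraction))
    actives_top = sum(ranked_is_active[:n_top])
    actives_total = sum(ranked_is_active)
    return (actives_top / n_top) / (actives_total / n_total)

# Hypothetical ranking of 10 compounds, 3 of them active.
ranking = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
ef_top20 = enrichment_factor(ranking, 0.2)
```

    Because the denominator depends on the proportion of actives in the database, the same method can yield different enrichment factors on different databases, which is consistent with the database dependence reported above.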

  • High quality binding modes in docking ligands to proteins.
    Gorelik, Boris and Goldblum, Amiram
    Proteins, 2008, 71(3), 1373-1386
    PMID: 18058908     doi: 10.1002/prot.21847
    Multiple near-optimal conformations of protein-ligand complexes provide a better chance for accurate representation of biomolecular interactions, compared with a single structure. We present ISE-dock, a docking program which is based on the iterative stochastic elimination (ISE) algorithm. ISE eliminates values that consistently lead to the worst results, thus optimizing the search for docking poses. It constructs large sets of such poses with no additional computational cost compared with single poses. ISE-dock is validated using 81 protein-ligand complexes from the PDB and its performance was compared with those of Glide, GOLD, and AutoDock. ISE-dock has a better chance than the other three to find more than 60% top single poses under RMSD

  • Protein-ligand Docking: A Review of Recent Advances and Future Perspectives
    Pujadas, Gerard and Vaque, Montserrat and Ardevol, Anna and Blade, Cinta and Salvado, M. J. and Blay, Mayte and Fernandez-Larrea, Juan and Arola, Lluis
    Current Pharmaceutical Analysis, 2008, 4(1), 1-19
    doi: 10.2174/157341208783497597
    Abstract: Understanding the interactions between proteins and ligands is crucial for the pharmaceutical and functional food industries. The experimental structures of these protein/ligand complexes are usually obtained, under highly expert control, by time- ...

  • Bias, reporting, and sharing: computational evaluations of docking methods.
    Jain, Ajay N
    Journal of computer-aided molecular design, 2008, 22(3-4), 201-212
    PMID: 18075713     doi: 10.1007/s10822-007-9151-x
    Computational methods for docking ligands to protein binding sites have become ubiquitous in drug discovery. Despite the age of the field, no standards have been established with respect to methodological evaluation of docking accuracy, virtual screening utility, or scoring accuracy. There are critical issues relating to data sharing, data set design and preparation, and statistical reporting that have an impact on the degree to which a report will translate into real-world performance. These issues also have an impact on whether there is a transparent relationship between methodological changes and reported performance improvements. This paper presents detailed examples of pitfalls in each area and makes recommendations as to best practices.

  • Protein-ligand docking accounting for receptor side chain and global flexibility in normal modes: evaluation on kinase inhibitor cross docking.
    May, Andreas and Zacharias, Martin
    Journal of medicinal chemistry, 2008, 51(12), 3499-3506
    PMID: 18517186     doi: 10.1021/jm800071v
    Efficient treatment of conformational changes during docking of drug-like ligands to receptor molecules is a major computational challenge. A new docking methodology has been developed that includes ligand flexibility and both global backbone flexibility and side chain flexibility of the protein receptor. Whereas side chain flexibility is based on a discrete rotamer approach, global backbone conformational changes are modeled by relaxation in a few precalculated soft collective degrees of freedom of the receptor. The method was applied to docking of several known cyclin dependent kinase 2 inhibitors to the unbound kinase structure and to cross-docking of inhibitors to several bound kinase structures. Significant improvement of ranking and deviation of predicted binding geometries from experiment was obtained compared to docking to a rigid receptor. The inclusion of only the soft collective degrees of freedom during docking resulted in improved docking performance at a very modest increase (doubling) of the computational demand.

  • Towards the development of universal, fast and highly accurate docking/scoring methods: a long way to go.
    Moitessier, N and Englebienne, P and Lee, D and Lawandi, J and Corbeil, C R
    British journal of pharmacology, 2008, 153 Suppl 1, S7-26
    PMID: 18037925     doi: 10.1038/sj.bjp.0707515
    Accelerating the drug discovery process requires predictive computational protocols capable of reducing or simplifying the synthetic and/or combinatorial challenge. Docking-based virtual screening methods have been developed and successfully applied to a number of pharmaceutical targets. In this review, we first present the current status of docking and scoring methods, with exhaustive lists of these. We next discuss reported comparative studies, outlining criteria for their interpretation. In the final section, we describe some of the remaining developments that would potentially lead to a universally applicable docking/scoring method.

  • Docking ligands into flexible and solvated macromolecules. 2. Development and application of Fitted 1.5 to the virtual screening of potential HCV polymerase inhibitors.
    Corbeil, Christopher R and Englebienne, Pablo and Yannopoulos, Constantin G and Chan, Laval and Das, Sanjoy K and Bilimoria, Darius and L'heureux, Lucille and Moitessier, Nicolas
    Journal of chemical information and modeling, 2008, 48(4), 902-909
    PMID: 18341269     doi: 10.1021/ci700398h
    HCV NS5B polymerase is a validated target for the treatment of hepatitis C, known to be one of the most challenging enzymes for docking programs. In order to improve the low accuracy of existing docking methods observed with this challenging enzyme, we have significantly modified and updated Fitted 1.0, a recently reported docking program, into Fitted 1.5. This enhanced version is now applicable to the virtual screening of compound libraries and includes new features such as filters and pharmacophore- or interaction-site-oriented docking. As a first validation, Fitted 1.5 was applied to the testing set previously developed for Fitted 1.0 and extended to include hepatitis C virus (HCV) polymerase inhibitors. This first validation showed an increased accuracy as well as an increase in speed. It also shows that the accuracy toward HCV polymerase is better than previously observed with other programs. Next, application of Fitted 1.5 to the virtual screening of the Maybridge library seeded with known HCV polymerase inhibitors revealed its ability to recover most of these actives in the top 5% of the hit list. As a third validation, further biological assays uncovered HCV polymerase inhibition for selected Maybridge compounds ranked in the top of the hit list.

  • An improved relaxed complex scheme for receptor flexibility in computer-aided drug design.
    Amaro, Rommie E and Baron, Riccardo and McCammon, J Andrew
    Journal of computer-aided molecular design, 2008, 22(9), 693-705
    PMID: 18196463     doi: 10.1007/s10822-007-9159-2
    The interactions among associating (macro)molecules are dynamic, which adds to the complexity of molecular recognition. While ligand flexibility is well accounted for in computational drug design, the effective inclusion of receptor flexibility remains an important challenge. The relaxed complex scheme (RCS) is a promising computational methodology that combines the advantages of docking algorithms with dynamic structural information provided by molecular dynamics (MD) simulations, therefore explicitly accounting for the flexibility of both the receptor and the docked ligands. Here, we briefly review the RCS and discuss new extensions and improvements of this methodology in the context of ligand binding to two example targets: kinetoplastid RNA editing ligase 1 and the W191G cavity mutant of cytochrome c peroxidase. The RCS improvements include its extension to virtual screening, more rigorous characterization of local and global binding effects, and methods to improve its computational efficiency by reducing the receptor ensemble to a representative set of configurations. The choice of receptor ensemble, its influence on the predictive power of RCS, and the current limitations for an accurate treatment of the solvent contributions are also briefly discussed. Finally, we outline potential methodological improvements that we anticipate will assist future development.

  • PDTD: a web-accessible protein database for drug target identification
    Gao, Zhenting and Li, Honglin and Zhang, Hailei and Liu, Xiaofeng and Kang, Ling and Luo, Xiaomin and Zhu, Weiliang and Chen, Kaixian and Wang, Xicheng and Jiang, Hualiang
    BMC Bioinformatics, 2008, 9, -
    PMID: 18282303     doi: 10.1186/1471-2105-9-104
    Background: Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking), which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation. Description: PDTD is a web-accessible protein database for in silico target identification. It currently contains > 1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literatures and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information of > 830 known or potential drug targets, including protein and active sites structures in both PDB and mol2 formats, related diseases, biological functions as well as associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword search function, such as PDB ID, target name, and disease name. Data set generated by PDTD can be viewed with the plug-in of molecular visualization tools and also can be downloaded freely. Remarkably, PDTD is specially designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded in the form of mol2 file with the binding pose of the probe compound and a list of potential binding targets according to their ranking scores. Conclusion: PDTD serves as a comprehensive and unique repository of drug targets. Integrated with TarFisDock, PDTD is a useful resource to identify binding proteins for active compounds or existing drugs. Its potential applications include in silico drug target identification, virtual screening, and the discovery of the secondary effects of an old drug (i.e. new pharmacological usage) or an existing target (i.e. new pharmacological or toxic relevance), thus it may be a valuable platform for the pharmaceutical researchers. PDTD is available online at

  • MS-DOCK: Accurate multiple conformation generator and rigid docking protocol for multi-step virtual ligand screening
    Sauton, Nicolas and Lagorce, David and Villoutreix, Bruno O. and Miteva, Maria A.
    BMC Bioinformatics, 2008, 9, -
    PMID: 18402678     doi: 10.1186/1471-2105-9-184
    Background: The number of protein targets with a known or predicted tri-dimensional structure and of drug-like chemical compounds is growing rapidly and so is the need for new therapeutic compounds or chemical probes. Performing flexible structure-based virtual screening computations on thousands of targets with millions of molecules is intractable to most laboratories nor indeed desirable. Since shape complementarity is of primary importance for most protein-ligand interactions, we have developed a tool/protocol based on rigid-body docking to select compounds that fit well into binding sites. Results: Here we present an efficient multiple conformation rigid-body docking approach, MS-DOCK, which is based on the program DOCK. This approach can be used as the first step of a multi-stage docking/scoring protocol. First, we developed and validated the Multiconf-DOCK tool that generates several conformers per input ligand. Then, each generated conformer (bioactives and 37970 decoys) was docked rigidly using DOCK6 with our optimized protocol into seven different receptor-binding sites. MS-DOCK was able to significantly reduce the size of the initial input library for all seven targets, thereby facilitating subsequent more CPU demanding flexible docking procedures. Conclusion: MS-DOCK can be easily used for the generation of multi-conformer libraries and for shape-based filtering within a multi-step structure-based screening protocol in order to shorten computation times.

  • Lead finder: an approach to improve accuracy of protein-ligand docking, binding energy estimation, and virtual screening.
    Stroganov, Oleg V and Novikov, Fedor N and Stroylov, Viktor S and Kulkov, Val and Chilov, Ghermes G
    Journal of chemical information and modeling, 2008, 48(12), 2371-2385
    PMID: 19007114     doi: 10.1021/ci800166p
    An innovative molecular docking algorithm and three specialized high accuracy scoring functions are introduced in the Lead Finder docking software. Lead Finder's algorithm for ligand docking combines the classical genetic algorithm with various local optimization procedures and resourceful exploitation of the knowledge generated during docking process. Lead Finder's scoring functions are based on a molecular mechanics functional which explicitly accounts for different types of energy contributions scaled with empiric coefficients to produce three scoring functions tailored for (a) accurate binding energy predictions; (b) correct energy-ranking of docked ligand poses; and (c) correct rank-ordering of active and inactive compounds in virtual screening experiments. The predicted values of the free energy of protein-ligand binding were benchmarked against a set of experimentally measured binding energies for 330 diverse protein-ligand complexes yielding rmsd of 1.50 kcal/mol. The accuracy of ligand docking was assessed on a set of 407 structures, which included almost all published test sets of the following programs: FlexX, Glide SP, Glide XP, Gold, LigandFit, MolDock, and Surflex. rmsd of 2 Å or less was observed for 80-96% of the structures in the test sets (80.0% on the Glide XP and FlexX test sets, 96.0% on the Surflex and MolDock test sets). The ability of Lead Finder to distinguish between active and inactive compounds during virtual screening experiments was benchmarked against 34 therapeutically relevant protein targets. Impressive enrichment factors were obtained for almost all of the targets with the average area under receiver operator curve being equal to 0.92.

  • Similarity based docking.
    Marialke, J and Tietze, S and Apostolakis, Joannis
    Journal of chemical information and modeling, 2008, 48(1), 186-196
    PMID: 18044949     doi: 10.1021/ci700124r
    We have recently introduced GMA, a highly efficient method for flexible molecular alignment. Here we show how this approach can be used to improve docking accuracy and efficiency, in cases where a complex structure of a ligand with the target protein is known. In cases where a known ligand exists, yet the complex structure is unknown it is possible to make use of the advantages offered by this approach, by combining it with standard ligand docking.

  • Exploiting ordered waters in molecular docking.
    Huang, Niu and Shoichet, Brian K
    Journal of medicinal chemistry, 2008, 51(16), 4862-4865
    PMID: 18680357     doi: 10.1021/jm8006239
    A current weakness in docking is the treatment of water-mediated protein-ligand interactions. We explore switching ordered water molecules "on" and "off" during docking screens of a large library. The method assumes additivity and scales linearly with the number of waters sampled despite the exponential growth in configurations. It is tested for ligand enrichment against 24 targets, exploring up to 256 water configurations. Water inclusion increased enrichment substantially for 12 targets, while most others were largely unaffected.
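
    The additivity assumption stated in this abstract explains why the cost can grow linearly with the number of waters even though the number of on/off configurations grows exponentially (256 configurations corresponds to 8 waters). If each water contributes an independent term to the score of a fixed ligand pose, the best configuration is found by deciding each water separately. The sketch below illustrates that reasoning only. It is not the authors' code, and the per-water energy values and function names are hypothetical.

```python
from itertools import product

def best_by_enumeration(base, water_terms):
    """Brute force: score all 2**N on/off configurations (lower = better)."""
    best = None
    for mask in product((0, 1), repeat=len(water_terms)):
        score = base + sum(t for t, on in zip(water_terms, mask) if on)
        if best is None or score < best[0]:
            best = (score, mask)
    return best

def best_by_additivity(base, water_terms):
    """Additive shortcut: switch each water on only if its term is favorable."""
    mask = tuple(1 if t < 0 else 0 for t in water_terms)
    return base + sum(t for t in water_terms if t < 0), mask

# Hypothetical per-water contributions for one fixed ligand pose.
terms = [-1.2, 0.8, -0.3, 2.0]
brute = best_by_enumeration(-10.0, terms)
fast = best_by_additivity(-10.0, terms)
```

    Both functions return the same configuration here, but the first evaluates 2**N configurations while the second makes N independent decisions. The shortcut is valid only as far as the additivity assumption holds.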

  • Ligand-protein docking with water molecules.
    Roberts, Benjamin C and Mancera, Ricardo L
    Journal of chemical information and modeling, 2008, 48(2), 397-408
    PMID: 18211049     doi: 10.1021/ci700285e
    The presence of water molecules plays an important role in the accuracy of ligand-protein docking predictions. Comprehensive docking simulations have been performed on a large set of ligand-protein complexes whose crystal structures contain water molecules in their binding sites. Only those water molecules found in the immediate vicinity of both the ligand and the protein were considered. We have investigated whether prior optimization of the orientation of water molecules in either the presence or absence of the bound ligand has any effect on the accuracy of docking predictions. We have observed a statistically significant overall increase in accuracy when water molecules are included during docking simulations and have found this to be independent of the method of optimization of the orientation of water molecules. These results confirm the importance of including water molecules whenever possible in a ligand-protein docking simulation. Our findings also reveal that prior optimization of the orientation of water molecules, in the absence of any bound ligand, does not have a detrimental effect on the improved accuracy of ligand-protein docking. This is important, given the use of docking simulations to predict the binding modes of new ligands or drug molecules.

  • ASEDock-docking based on alpha spheres and excluded volumes
    Goto, Junichi and Kataoka, Ryoichi and Muta, Hajime and Hirayama, Noriaki
    Journal of chemical information and modeling, 2008, 48(3), 583-590
    PMID: 18278891     doi: 10.1021/ci700352q
    ASEDock is a novel docking program based on a shape similarity assessment between a concave portion (i.e., concavity) on a protein and the ligand. We have introduced two novel concepts into ASEDock. One is an ASE model, which is defined by the combination of alpha spheres generated at a concavity in a protein and the excluded volumes around the concavity. The other is an ASE score, which evaluates the shape similarity between the ligand and the ASE model. The ASE score selects and refines the initial pose by maximizing the overlap between the alpha spheres and the ligand, and minimizing the overlap between the excluded volume and the ligand. Because the ASE score makes good use of the Gaussian-type function for evaluating and optimizing the overlap between the ligand and the site model, it can pose a ligand onto the docking site relatively faster and more effectively than using potential energy functions. The posing stage through the use of the ASE score is followed by full atomistic energy minimization. Because the posing algorithm of ASEDock is free from any bias except for shape, it is a very robust docking method. A validation study using 59 high-quality X-ray structures of the complexes between drug-like molecules and the target proteins has demonstrated that ASEDock can faithfully reproduce experimentally determined docking modes of various druglike molecules in their target proteins. Almost 80% of the structures were reconstructed within the estimated experimental error. The success rate of ~98% was attained based on the docking criterion of the root-mean-square deviation (RMSD) of non-hydrogen atoms (<

  • Molecular docking with multi-objective Particle Swarm Optimization
    Janson, Stefan and Merkle, Daniel and Middendorf, Martin
    Applied Soft Computing, 2008, 8(1), 666-675
    doi: 10.1016/j.asoc.2007.05.005
    The molecular docking problem is to find a good position and orientation for docking a small molecule (ligand) to a larger receptor molecule. In the first part of this paper we propose a new algorithm for solving the docking problem. This algorithm - called ClustMPSO - is based on Particle Swarm Optimization (PSO) and follows a multi-objective approach for comparing the quality of solutions. For the energy evaluation the algorithm uses the binding free energy function that is provided by the Autodock 3.05 tool. The experimental results show that ClustMPSO computes a more diverse set of possible docking conformations than the standard Simulated Annealing and Lamarckian Genetic Algorithm that are incorporated into Autodock. Moreover, ClustMPSO is significantly faster and more reliable in finding good solutions. In the second part of this paper a new approach for the prediction of a docking trajectory is proposed. In this approach the ligand is "un-docked" via a controlled random walk that can be biased into a given direction and where only positions are accepted that have an energy level that is below a given threshold.

  • Integrating Structure- and Ligand-Based Virtual Screening: Comparison of Individual, Parallel, and Fused Molecular Docking and Similarity Search Calculations on Multiple Targets
    Tan, Lu and Geppert, Hanna and Sisay, Mihiret T. and Guetschow, Michael and Bajorath, Juergen
    ChemMedChem, 2008, 3(10), 1566-1571
    doi: 10.1002/cmdc.200800129
    Similarity searching is often used to preselect compounds for docking, thereby decreasing the size of screening databases. However, integrated structure- and ligand-based screening schemes are rare at present. Docking and similarity search calculations using 2D fingerprints were carried out in a comparative manner on nine target enzymes, for which significant numbers of diverse inhibitors could be obtained. In the absence of knowledge-based docking constraints and target-directed parameter optimisation, fingerprint searching displayed a clear preference over docking calculations. Alternative combinations of docking and similarity search results were investigated and found to further increase compound recall of individual methods in a number of instances. When the results of similarity searching and docking were combined, parallel selection of candidate compounds from individual rankings was generally superior to rank fusion. We suggest that complementary results from docking and similarity searching can be captured by integrated compound selection schemes.

  • Evaluating docking programs: keeping the playing field level.
    Liebeschuetz, John W
    Journal of computer-aided molecular design, 2008, 22(3-4), 229-238
    PMID: 18196461     doi: 10.1007/s10822-008-9169-8
    Over recent years many enrichment studies have been published which purport to rigorously compare the performance of two or more docking protocols. It has become clear however that such studies often have flaws within their methodologies, which cast doubt on the rigour of the conclusions. Setting up such comparisons is fraught with difficulties and no best mode of practice is available to guide the experimenter. Careful choice of structural models and ligands appropriate to those models is important. The protein structure should be representative for the target. In addition the set of active ligands selected should be appropriate to the structure in cases where different forms of the protein bind different classes of ligand. Binding site definition is also an area in which errors arise. Particular care is needed in deciding which crystallographic waters to retain and again this may be predicated by knowledge of the likely binding modes of the ligands making up the active ligand list. Geometric integrity of the ligand structures used is clearly important yet it is apparent that published sets of actives + decoys may contain sometimes high proportions of incorrect structures. Choice of protocol for docking and analysis needs careful consideration as many programs can be tweaked for optimum performance. Should studies be run using 'black box' protocols supplied by the software provider? Lastly, the correct method of analysis of enrichment studies is a much discussed topic at the moment. However currently promoted approaches do not consider a crucial aspect of a successful virtual screen, namely that a good structural diversity of hits be returned. Overall there is much to consider in the experimental design of enrichment studies. Hopefully this study will be of benefit in helping others plan such experiments.

  • Using buriedness to improve discrimination between actives and inactives in docking
    O'Boyle, Noel M. and Brewerton, Suzanne C. and Taylor, Robin
    Journal of chemical information and modeling, 2008, 48(6), 1269-1278
    PMID: 18533645     doi: 10.1021/ci8000452
    A continuing problem in protein-ligand docking is the correct relative ranking of active molecules versus inactives. Using the ChemScore scoring function as implemented in the GOLD docking software, we have investigated the effect of scaling hydrogen bond, metal-ligand, and lipophilic interactions based on the buriedness of the interaction. Buriedness was measured using the receptor density, the number of protein heavy atoms within 8.0 angstrom. Terms in the scaling functions were optimized using negative data, represented by docked poses of inactive molecules. The objective function was the mean rank of the scores of the active poses in the Astex Diverse Set (Hartshorn et al. J. Med. Chem., 2007, 50, 726) with respect to the docked poses of 99 inactives. The final four-parameter model gave a substantial improvement in the average rank from 18.6 to 12.5. Similar results were obtained for an independent test set. Receptor density scaling is available as an option in the recent GOLD release.
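
    The buriedness measure described in this abstract, the receptor density, is the number of protein heavy atoms within 8.0 angstrom of an interaction. A minimal sketch of that count is given below. The function name, the coordinate representation, the inclusive cutoff, and the example coordinates are assumptions, and the scaling functions that GOLD applies to this value are not reproduced.

```python
import math

def receptor_density(point, protein_heavy_atoms, cutoff=8.0):
    """Count protein heavy atoms within `cutoff` angstrom of `point`.

    point: (x, y, z) of the interaction site.
    protein_heavy_atoms: iterable of (x, y, z) heavy-atom coordinates.
    """
    return sum(1 for atom in protein_heavy_atoms
               if math.dist(point, atom) <= cutoff)

# Hypothetical coordinates: two atoms inside the 8.0 angstrom sphere, one outside.
atoms = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
density = receptor_density((0.0, 0.0, 0.0), atoms)
```

    A higher count indicates a more buried interaction. For large proteins a spatial index would avoid the linear scan over all atoms, but the definition stays the same.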

  • MedusaScore: an accurate force field-based scoring function for virtual drug screening.
    Yin, Shuangye and Biedermannova, Lada and Vondrasek, Jiri and Dokholyan, Nikolay V
    Journal of chemical information and modeling, 2008, 48(8), 1656-1662
    PMID: 18672869     doi: 10.1021/ci8001167
    Virtual screening is becoming an important tool for drug discovery. However, the application of virtual screening has been limited by the lack of accurate scoring functions. Here, we present a novel scoring function, MedusaScore, for evaluating protein-ligand binding. MedusaScore is based on models of physical interactions that include van der Waals, solvation, and hydrogen bonding energies. To ensure the best transferability of the scoring function, we do not use any protein-ligand experimental data for parameter training. We then test the MedusaScore for docking decoy recognition and binding affinity prediction and find superior performance compared to other widely used scoring functions. Statistical analysis indicates that one source of inaccuracy of MedusaScore may arise from the unaccounted entropic loss upon ligand binding, which suggests avenues of approach for further MedusaScore improvement.

  • Using AutoDock for ligand-receptor docking.
    Morris, Garrett M and Huey, Ruth and Olson, Arthur J
    Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.], 2008, Chapter 8, Unit 8.14
    PMID: 19085980     doi: 10.1002/0471250953.bi0814s24
    This unit describes how to set up and analyze ligand-protein docking calculations using AutoDock and the graphical user interface, AutoDockTools (ADT). The AutoDock scoring function is a subset of the AMBER force field that treats molecules using the United Atom model. The unit uses an X-ray crystal structure of Indinavir bound to HIV-1 protease taken from the Protein Data Bank (UNIT 1.9) and shows how to prepare the ligand and receptor for AutoGrid, which computes grid maps needed by AutoDock. Indinavir is prepared for AutoDock by adding the polar hydrogens and partial charges and defining the rotatable bonds that will be explored during the docking. The input files for AutoGrid and AutoDock are created, and then the grid map calculation is run, followed by the docking calculation in AutoDock. Finally, this unit describes some of the ways the results can be analyzed using AutoDockTools.


  • Ensemble docking of multiple protein structures: considering protein structural variations in molecular docking.
    Huang, Sheng-You and Zou, Xiaoqin
    Proteins, 2007, 66(2), 399-421
    PMID: 17096427     doi: 10.1002/prot.21214
    One approach to incorporate protein flexibility in molecular docking is the use of an ensemble consisting of multiple protein structures. Sequentially docking each ligand into a large number of protein structures is computationally too expensive to allow large-scale database screening. It is challenging to achieve a good balance between docking accuracy and computational efficiency. In this work, we have developed a fast, novel docking algorithm utilizing multiple protein structures, referred to as ensemble docking, to account for protein structural variations. The algorithm can simultaneously dock a ligand into an ensemble of protein structures and automatically select an optimal protein structure that best fits the ligand by optimizing both ligand coordinates and the conformational variable m, where m represents the m-th structure in the protein ensemble. The docking algorithm was validated on 10 protein ensembles containing 105 crystal structures and 87 ligands in terms of binding mode and energy score predictions. A success rate of 93% was obtained with the criterion of root-mean-square deviation <2.5 A if the top five orientations for each ligand were considered, comparable to that of sequential docking in which scores for individual docking are merged into one list by re-ranking, and significantly better than that of single rigid-receptor docking (75% on average). Similar trends were also observed in binding score predictions and enrichment tests of virtual database screening. The ensemble docking algorithm is computationally efficient, with a computational time comparable to that for docking a ligand into a single protein structure. In contrast, the computational time for the sequential docking method increases linearly with the number of protein structures in the ensemble. The algorithm was further evaluated using a more realistic ensemble in which the corresponding bound protein structures of inhibitors were excluded. 
    The results show that ensemble docking successfully predicts the binding modes of the inhibitors, and discriminates the inhibitors from a set of noninhibitors with similar chemical properties. Although multiple experimental structures were used in the present work, our algorithm can be easily applied to multiple protein conformations generated by computational methods, and helps improve the efficiency of other existing multiple protein structure (MPS)-based methods to accommodate protein flexibility.

  • Supervised consensus scoring for docking and virtual screening
    Teramoto, Reiji and Fukunishi, Hiroaki
    Journal of chemical information and modeling, 2007, 47(2), 526-534
    doi: 10.1021/ci6004993
    Docking programs are widely used to discover novel ligands efficiently and can predict protein-ligand complex structures with reasonable accuracy and speed. However, there is an emerging demand for better performance from the scoring methods. Consensus scoring (CS) methods improve the performance by compensating for the deficiencies of each scoring function. However, conventional CS and existing scoring functions have the same problems, such as a lack of protein flexibility, inadequate treatment of solvation, and the simplistic nature of the energy function used. Although there are many problems in current scoring functions, we focus our attention on the incorporation of unbound ligand conformations. To address this problem, we propose supervised consensus scoring (SCS), which takes into account the protein-ligand binding process using unbound ligand conformations with supervised learning. An evaluation of docking accuracy for 100 diverse protein-ligand complexes shows that SCS outperforms both CS and 11 scoring functions (PLP, F-Score, LigScore, DrugScore, LUDI, X-Score, AutoDock, PMF, G-Score, ChemScore, and D-score). The success rates of SCS range from 89% to 91% in the range of rmsd < 2 A, while those of CS range from 80% to 85%, and those of the scoring functions range from 26% to 76%. Moreover, we also introduce a method for judging whether a compound is active or inactive with the appropriate criterion for virtual screening. SCS performs quite well in docking accuracy and is presumably useful for screening large-scale compound databases before predicting binding affinity.

  • WinDock: structure-based drug discovery on Windows-based PCs.
    Hu, Zengjian and Southerland, William
    Journal of computational chemistry, 2007, 28(14), 2347-2351
    PMID: 17476686     doi: 10.1002/jcc.20756
    In recent years, virtual database screening using high-throughput docking (HTD) has emerged as a very important tool and a well-established method for finding new lead compounds in the drug discovery process. With the advent of powerful personal computers (PCs), it is now plausible to perform HTD investigations on these inexpensive PCs. To make HTD more accessible to a broad community, we present here WinDock, an integrated application designed to help researchers perform structure-based drug discovery tasks under a uniform, user friendly graphical interface for Windows-based PCs. WinDock combines existing small molecule searchable three-dimensional (3D) libraries, homology modeling tools, and ligand-protein docking programs in a semi-automatic, interactive manner, which guides the user through the use of each integrated software component. WinDock is coded in C++.

  • Solvated interaction energy (SIE) for scoring protein-ligand binding affinities. 1. Exploring the parameter space.
    Naïm, Marwen and Bhat, Sathesh and Rankin, Kathryn N and Dennis, Sheldon and Chowdhury, Shafinaz F and Siddiqi, Imran and Drabik, Piotr and Sulea, Traian and Bayly, Christopher I and Jakalian, Araz and Purisima, Enrico O
    Journal of chemical information and modeling, 2007, 47(1), 122-133
    PMID: 17238257     doi: 10.1021/ci600406v
    We present a binding free energy function that consists of force field terms supplemented by solvation terms. We used this function to calibrate the solvation model along with the binding interaction terms in a self-consistent manner. The motivation for this approach was that the solute dielectric-constant dependence of calculated hydration gas-to-water transfer free energies is markedly different from that of binding free energies (J. Comput. Chem. 2003, 24, 954). Hence, we sought to calibrate directly the solvation terms in the context of a binding calculation. The five parameters of the model were systematically scanned to best reproduce the absolute binding free energies for a set of 99 protein-ligand complexes. We obtained a mean unsigned error of 1.29 kcal/mol for the predicted absolute binding affinity in a parameter space that was fairly shallow near the optimum. The lowest errors were obtained with solute dielectric values of Din

  • A flexible approach to induced fit docking.
    Nabuurs, Sander B and Wagener, Markus and de Vlieg, Jacob
    Journal of medicinal chemistry, 2007, 50(26), 6507-6518
    PMID: 18031000     doi: 10.1021/jm070593p
    We present Fleksy, a new approach to consider both ligand and receptor flexibility in small molecule docking. Pivotal to our method is the use of a receptor ensemble to describe protein flexibility. To construct these ensembles, we use a backbone-dependent rotamer library and implement the concept of interaction sampling. The latter allows the evaluation of different orientations of ambivalent interaction partners. The docking stage consists of an ensemble-based soft-docking experiment using FlexX-Ensemble, followed by an effective flexible receptor-ligand complex optimization using Yasara. Fleksy produces a set of receptor-ligand complexes ranked using a consensus scoring function combining docking scores and force field energies. Averaged over three cross-docking datasets, containing 35 different receptor-ligand complexes in total, Fleksy reproduces the observed binding mode within 2.0 A for 78% of the complexes. This compares favorably to the rigid receptor FlexX program, which on average reaches a success rate of 44% for these datasets.
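
    Success rates such as "within 2.0 A for 78% of the complexes" are typically computed from the heavy-atom RMSD between each docked pose and the crystallographic pose. The sketch below shows one way to do that, assuming atoms are already paired in the same order, that no superposition is applied (both poses share the receptor frame), and that the threshold is inclusive. The function names and example values are hypothetical, and symmetry-equivalent atoms are not handled.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsd_values, threshold=2.0):
    """Fraction of complexes whose pose RMSD is at or below `threshold`."""
    if not rmsd_values:
        raise ValueError("need at least one RMSD value")
    return sum(1 for value in rmsd_values if value <= threshold) / len(rmsd_values)


# Hypothetical docked and crystal poses, each shifted by 1 angstrom along z.
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(rmsd(docked, crystal))
print(success_rate([0.5, 1.9, 2.5, 4.0]))
```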

  • Surflex-Dock 2.1: robust performance from ligand energetic modeling, ring flexibility, and knowledge-based search.
    Jain, Ajay N
    Journal of computer-aided molecular design, 2007, 21(5), 281-306
    PMID: 17387436     doi: 10.1007/s10822-007-9114-2
    The Surflex flexible molecular docking method has been generalized and extended in two primary areas related to the search component of docking. First, incorporation of a small-molecule force-field extends the search into Cartesian coordinates constrained by internal ligand energetics. Whereas previous versions searched only the alignment and acyclic torsional space of the ligand, the new approach supports dynamic ring flexibility and all-atom optimization of docked ligand poses. Second, knowledge of well established molecular interactions between ligand fragments and a target protein can be directly exploited to guide the search process. This offers advantages in some cases over the search strategy where ligand alignment is guided solely by a "protomol" (a pre-computed molecular representation of an idealized ligand). Results are presented on both docking accuracy and screening utility using multiple publicly available benchmark data sets that place Surflex's performance in the context of other molecular docking methods. In terms of docking accuracy, Surflex-Dock 2.1 performs as well as the best available methods. In the area of screening utility, Surflex's performance is extremely robust, and it is clearly superior to other methods within the set of cases for which comparative data are available, with roughly double the screening enrichment performance.

  • SODOCK: swarm optimization for highly flexible protein-ligand docking.
    Chen, Hung-Ming and Liu, Bo-Fu and Huang, Hui-Ling and Hwang, Shiow-Fen and Ho, Shinn-Ying
    Journal of computational chemistry, 2007, 28(2), 612-623
    PMID: 17186483     doi: 10.1002/jcc.20542
    Protein-ligand docking can be formulated as a parameter optimization problem associated with an accurate scoring function, which aims to identify the translation, orientation, and conformation of a docked ligand with the lowest energy. The parameter optimization problem for highly flexible ligands with many rotatable bonds is more difficult than that for less flexible ligands using genetic algorithm (GA)-based approaches, due to the large numbers of parameters and high correlations among these parameters. This investigation presents a novel optimization algorithm SODOCK based on particle swarm optimization (PSO) for solving flexible protein-ligand docking problems. To improve efficiency and robustness of PSO, an efficient local search strategy is incorporated into SODOCK. The implementation of SODOCK adopts the environment and energy function of AutoDock 3.05. Computer simulation results reveal that SODOCK is superior to the Lamarckian genetic algorithm (LGA) of AutoDock, in terms of convergence performance, robustness, and obtained energy, especially for highly flexible ligands. The results also reveal that PSO is more suitable than the conventional GA in dealing with flexible docking problems with high correlations among parameters. This investigation also compared SODOCK with four state-of-the-art docking methods, namely GOLD 1.2, DOCK 4.0, FlexX 1.8, and LGA of AutoDock 3.05. SODOCK obtained the smallest RMSD in 19 of 37 cases. The average 2.29 A of the 37 RMSD values of SODOCK was better than those of other docking programs, which were all above 3.0 A.
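
    To make the idea of docking as parameter optimization by particle swarm more concrete, here is a minimal sketch of a plain global-best PSO applied to a toy objective. It is not the SODOCK algorithm: there is no local search step, and the objective is a simple sum of squares rather than the AutoDock energy function. The function name, parameter values, and bounds are assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=200,
                 inertia=0.729, c_personal=1.49445, c_global=1.49445, seed=0):
    """Minimize `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends inertia, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_energy(x):
    """Stand-in objective with its minimum (0.0) at the origin."""
    return sum(v * v for v in x)


solution, energy = pso_minimize(toy_energy, dim=3, bounds=(-5.0, 5.0))
print(energy)
```

    In a docking setting the vector being optimized would hold the ligand translation, orientation, and torsion angles, and the objective would be the scoring function.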

  • Ligand docking and structure-based virtual screening in drug discovery.
    Cavasotto, Claudio N and Orry, Andrew J W
    Current topics in medicinal chemistry, 2007, 7(10), 1006-1014
    PMID: 17508934    
    Ligand-docking-based methods are starting to play a critical role in lead discovery and optimization, thus resulting in new 'drug-candidates'. They offer the possibility to go beyond the pool of existing active compounds, and thus find novel chemotypes. A brief tutorial on ligand docking and structure-based virtual screening is presented highlighting current problems and limitations, together with the most recent methodological and algorithmic developments in the field. Recent successful applications of docking-based tools for hit discovery, lead optimization and target-biased library design are also presented. Special consideration is devoted to ongoing efforts to account for protein flexibility in structure-based virtual screening.

  • Docking ligands into flexible and solvated macromolecules. 1. Development and validation of FITTED 1.0.
    Corbeil, Christopher R and Englebienne, Pablo and Moitessier, Nicolas
    Journal of chemical information and modeling, 2007, 47(2), 435-449
    PMID: 17305329     doi: 10.1021/ci6002637
    We report the development and validation of a novel suite of programs, FITTED 1.0, for the docking of flexible ligands into flexible proteins. This docking tool is unique in that it can deal with both the flexibility of macromolecules (side chains and main chains) and the presence of bridging water molecules while treating protein/ligand complexes as realistically dynamic systems. This software relies on a genetic algorithm to account for the flexibility of the two molecules as well as the location of bridging water molecules. In addition, FITTED 1.0 features a novel application of a switching function to retain or displace key water molecules from the protein-ligand complexes. Two independent modules, ProCESS and SMART, were developed to set up the proteins and the ligands prior to the docking stage. Validation of the accuracy of the software was achieved via the application of FITTED 1.0 to the docking of inhibitors of HIV-1 protease, thymidine kinase, trypsin, factor Xa, and MMP to their respective proteins.

  • Evaluation of docking programs for predicting binding of Golgi alpha-mannosidase II inhibitors: a comparison with crystallography.
    Englebienne, Pablo and Fiaux, Hélène and Kuntz, Douglas A and Corbeil, Christopher R and Gerber-Lemaire, Sandrine and Rose, David R and Moitessier, Nicolas
    Proteins, 2007, 69(1), 160-176
    PMID: 17557336     doi: 10.1002/prot.21479
    Golgi alpha-mannosidase II (GMII), a zinc-dependent glycosyl hydrolase, is a promising target for drug development in anti-tumor therapies. Using X-ray crystallography, we have determined the structure of Drosophila melanogaster GMII (dGMII) complexed with three different inhibitors exhibiting IC50's ranging from 80 to 1000 microM. These structures, along with those of seven other available dGMII/inhibitor complexes, were then used as a basis for the evaluation of seven docking programs (GOLD, Glide, FlexX, AutoDock, eHiTS, LigandFit, and FITTED). We found that small inhibitors could be accurately docked by most of the software, while docking of larger compounds (i.e., those with extended aromatic cycles or long aliphatic chains) was more problematic. Overall, Glide provided the best docking results, with the most accurately predicted binding around the active site zinc atom. Further evaluation of Glide's performance revealed its ability to extract active compounds from a benchmark library of decoys.

  • Evaluations of molecular docking programs for virtual screening.
    Onodera, Kenji and Satou, Kazuhito and Hirota, Hiroshi
    Journal of chemical information and modeling, 2007, 47(4), 1609-1618
    PMID: 17602548     doi: 10.1021/ci7000378
    Structure-based virtual screening is carried out using molecular docking programs. A number of such docking programs are currently available, and the selection of docking program is difficult without knowing the characteristics or performance of each program. In this study, the screening performances of three molecular docking programs, DOCK, AutoDock, and GOLD, were evaluated with 116 target proteins. The screening performances were validated using two novel standards, along with a traditional enrichment rate measurement. For the evaluations, each docking run was repeated 1000 times with three initial conformations of a ligand. While each docking program has some merit over the other docking programs in some aspects, DOCK showed an unexpectedly better screening performance in the enrichment rates. Finally, we made several recommendations based on the evaluation results to enhance the screening performances of the docking programs.

  • Comments on the article "On evaluating molecular-docking methods for pose prediction and enrichment factors".
    Perola, Emanuele and Walters, W Patrick and Charifson, Paul
    Journal of chemical information and modeling, 2007, 47(2), 251-253
    PMID: 17260981     doi: 10.1021/ci600460h
    The recent article "On Evaluating Molecular-Docking Methods for Pose Prediction and Enrichment Factors" (Chen H. et al. J. Chem. Inf. Model. 2006, 46, 401-415) contains a series of comments on a similar study we published in Proteins in 2004 (Perola et al. Proteins 2004, 56, 235-249). We believe that some of these comments are misleading, and we feel that an adequate response is in order.

  • ParDOCK: An all atom energy based Monte Carlo docking protocol for protein-ligand complexes
    Gupta, A. and Gandhimathi, A. and Sharma, P. and Jayaram, B.
    Protein and peptide letters, 2007, 14(7), 632-646
    PMID: 17897088    
    We report here an all-atom energy based Monte Carlo docking procedure tested on a dataset of 226 protein-ligand complexes. The average root mean square deviation (RMSD) from the crystal conformation was observed to be approximately 0.53 angstrom. The correlation coefficient (r(2)) for the predicted binding free energies calculated using the docked structures against experimental binding affinities was 0.72. The docking protocol is web-enabled as a free software at

  • EADock: docking of small molecules into protein active sites with a multiobjective evolutionary optimization.
    Grosdidier, Aurélien and Zoete, Vincent and Michielin, Olivier
    Proteins, 2007, 67(4), 1010-1025
    PMID: 17380512     doi: 10.1002/prot.21367
    In recent years, protein-ligand docking has become a powerful tool for drug development. Although several approaches suitable for high throughput screening are available, there is a need for methods able to identify binding modes with high accuracy. This accuracy is essential to reliably compute the binding free energy of the ligand. Such methods are needed when the binding mode of lead compounds is not determined experimentally but is needed for structure-based lead optimization. We present here a new docking software, called EADock, that aims at this goal. It uses a hybrid evolutionary algorithm with two fitness functions, in combination with a sophisticated management of the diversity. EADock is interfaced with the CHARMM package for energy calculations and coordinate handling. A validation was carried out on 37 crystallized protein-ligand complexes featuring 11 different proteins. The search space was defined as a sphere of 15 A around the center of mass of the ligand position in the crystal structure, and, contrary to other benchmarks, our algorithm was fed with optimized ligand positions up to 10 A root mean square deviation (RMSD) from the crystal structure, excluding the latter. This validation illustrates the efficiency of our sampling strategy, as correct binding modes, defined by an RMSD to the crystal structure lower than 2 A, were identified and ranked first for 68% of the complexes. The success rate increases to 78% when considering the five best ranked clusters, and 92% when all clusters present in the last generation are taken into account. Most failures could be explained by the presence of crystal contacts in the experimental structure. Finally, the ability of EADock to accurately predict binding modes on a real application was illustrated by the successful docking of the RGD cyclic pentapeptide on the alphaVbeta3 integrin, starting far away from the binding pocket.

  • Validation studies of the site-directed docking program LibDock.
    Rao, Shashidhar N and Head, Martha S and Kulkarni, Amit and LaLonde, Judith M
    Journal of chemical information and modeling, 2007, 47(6), 2159-2171
    PMID: 17985863     doi: 10.1021/ci6004299
    The performance of the site-features docking algorithm LibDock has been evaluated across eight GlaxoSmithKline targets as a follow-up to a broad validation study of docking and scoring software (Warren, G. L.; Andrews, W. C.; Capelli, A.; Clarke, B.; Lalonde, J.; Lambert, M. H.; Lindvall, M.; Nevins, N.; Semus, S. F.; Senger, S.; Tedesco, G.; Walls, I. D.; Woolven, J. M.; Peishoff, C. E.; Head, M. S. J. Med. Chem. 2006, 49, 5912-5931). Docking experiments were performed to assess both the accuracy in reproducing the binding mode of the ligand and the retrieval of active compounds in a virtual screening protocol using both the DJD (Diller, D. J.; Merz, K. M., Jr. Proteins 2001, 43, 113-124) and LigScore2 (Krammer, A. K.; Kirchoff, P. D.; Jiang, X.; Venkatachalam, C. M.; Waldman, M. J. Mol. Graphics Modell. 2005, 23, 395-407) scoring functions. This study was conducted using DJD scoring, and poses were rescored using all available scoring functions in the Accelrys LigandFit module, including LigScore2. For six out of eight targets at least 30% of the ligands were docked within a root-mean-square difference (RMSD) of 2.0 A for the crystallographic poses when the LigScore2 scoring function was used. LibDock retrieved at least 20% of active compounds in the top 10% of screened ligands for four of the eight targets in the virtual screening protocol. In both studies the LigScore2 scoring function enhanced the retrieval of crystallographic poses or active compounds in comparison with the results obtained using the DJD scoring function. The results for LibDock accuracy and ligand retrieval in virtual screening are compared to 10 other docking and scoring programs. These studies demonstrate the utility of the LigScore2 scoring function and that LibDock as a feature directed docking method performs as well as docking programs that use genetic/growing and Monte Carlo driven algorithms.

  • Lessons in molecular recognition. 2. Assessing and improving cross-docking accuracy.
    Sutherland, Jeffrey J and Nandigam, Ravi K and Erickson, Jon A and Vieth, Michal
    Journal of chemical information and modeling, 2007, 47(6), 2293-2302
    PMID: 17956084     doi: 10.1021/ci700253h
    Docking methods are used to predict the manner in which a ligand binds to a protein receptor. Many studies have assessed the success rate of programs in self-docking tests, whereby a ligand is docked into the protein structure from which it was extracted. Cross-docking, or using a protein structure from a complex containing a different ligand, provides a more realistic assessment of a docking program's ability to reproduce X-ray results. In this work, cross-docking was performed with CDocker, Fred, and Rocs using multiple X-ray structures for eight proteins (two kinases, one nuclear hormone receptor, one serine protease, two metalloproteases, and two phosphodiesterases). While average cross-docking accuracy is not encouraging, it is shown that using the protein structure from the complex that contains the bound ligand most similar to the docked ligand increases docking accuracy for all methods ("similarity selection"). Identifying the most successful protein conformer ("best selection") and similarity selection substantially reduce the difference between self-docking and average cross-docking accuracy. We identify universal predictors of docking accuracy (i.e., showing consistent behavior across most protein-method combinations), and show that models for predicting docking accuracy built using these parameters can be used to select the most appropriate docking method.

  • FLIPDock: docking flexible ligands into flexible receptors.
    Zhao, Yong and Sanner, Michel F
    Proteins, 2007, 68(3), 726-737
    PMID: 17523154     doi: 10.1002/prot.21423
    Conformational changes of biological macromolecules when binding with ligands have long been observed and remain a challenge for automated docking methods. Here we present a novel protein-ligand docking software called FLIPDock (Flexible LIgand-Protein Docking) allowing the automated docking of flexible ligand molecules into active sites of flexible receptor molecules. In FLIPDock, conformational spaces of molecules are encoded using a data structure that we have developed recently called the Flexibility Tree (FT). While the FT can represent fully flexible ligands, it was initially designed as a hierarchical and multiresolution data structure for the selective encoding of conformational subspaces of large biological macromolecules. These conformational subspaces can be built to span a range of conformations important for the biological activity of a protein. A variety of motions can be combined, ranging from domains moving as rigid bodies or backbone atoms undergoing normal mode-based deformations, to side chains assuming rotameric conformations. In addition, these conformational subspaces are parameterized by a small number of variables which can be searched during the docking process, thus effectively modeling the conformational changes in a flexible receptor. FLIPDock searches the variables using genetic algorithm-based search techniques and evaluates putative docking complexes with a scoring function based on the AutoDock3.05 force-field. In this paper, we describe the concepts behind FLIPDock and the overall architecture of the program. We demonstrate FLIPDock's ability to solve docking problems in which the assumption of a rigid receptor previously prevented the successful docking of known ligands. In particular, we repeat an earlier cross docking experiment and demonstrate an increased success rate of 93.5%, compared to original 72% success rate achieved by AutoDock over the 400 cross-docking calculations. 
    We also demonstrate FLIPDock's ability to handle conformational changes involving backbone motion by docking balanol to an adenosine-binding pocket of protein kinase A.

  • Alternative to consensus scoring-a new approach toward the qualitative combination of docking algorithms.
    Wolf, Antje and Zimmermann, Marc and Hofmann-Apitius, Martin
    Journal of chemical information and modeling, 2007, 47(3), 1036-1044
    PMID: 17492829     doi: 10.1021/ci6004965
    Since the development of the first docking algorithm in the early 1980s a variety of different docking approaches and tools has been created in order to solve the docking problem. Subsequent studies have shown that the docking performance of most tools strongly depends on the considered target. Thus it is hard to choose the best algorithm in the situation at hand. The docking tools FlexX and AutoDock are among the most popular programs for docking flexible ligands into target proteins. Their analysis, comparison, and combination are the topics of this study. In contrast to standard consensus scoring techniques which integrate different scoring algorithms usually only by their rank, we focus on a more general approach. Our new combined docking workflow, AutoxX, unifies the interaction models of AutoDock and FlexX rather than combining the scores afterward, which allows interpretability of the results. The performance of FlexX, AutoDock, and the combined algorithm AutoxX was evaluated on the basis of a test set of 204 structures from the Protein Data Bank (PDB). AutoDock and FlexX show a highly diverse redocking accuracy at the different complexes which assures again the usefulness of taking several docking algorithms into account. With the combined docking the number of complexes reproduced below an rmsd of 2.5 A could be raised by 10. AutoxX had a strong positive effect on several targets. The highest performance increase could be found when redocking 20 protein-ligand complexes of alpha-thrombin, plasmepsin, neuraminidase, and d-xylose isomerase. A decrease was found for gamma-chymotrypsin. The results show that, applied to the right target, AutoxX can improve the docking performance compared to AutoDock and FlexX alone.

  • pso@autodock: a fast flexible molecular docking program based on Swarm intelligence.
    Namasivayam, Vigneshwaran and Günther, Robert
    Chemical biology & drug design, 2007, 70(6), 475-484
    PMID: 17986206     doi: 10.1111/j.1747-0285.2007.00588.x
    On the quest of novel therapeutics, molecular docking methods have proven to be valuable tools for screening large libraries of compounds determining the interactions of potential drugs with the target proteins. A widely used docking approach is the simulation of the docking process guided by a binding energy function. On the basis of the molecular docking program autodock, we present pso@autodock as a tool for fast flexible molecular docking. Our novel Particle Swarm Optimization (PSO) algorithms varCPSO and varCPSO-ls are suited for rapid docking of highly flexible ligands. Thus, a ligand with 23 rotatable bonds was successfully docked within as few as 100 000 computing steps (rmsd

  • A multivariate approach to investigate docking parameters' effects on docking performance
    Andersson, C. David and Thysell, Elin and Lindstrom, Anton and Bylesjo, Max and Raubacher, Florian and Linusson, Anna
    Journal of chemical information and modeling, 2007, 47(4), 1673-1687
    PMID: 17559207     doi: 10.1021/ci6005596
    Increasingly powerful docking programs for analyzing and estimating the strength of protein-ligand interactions have been developed in recent decades, and they are now valuable tools in drug discovery. Software used to perform dockings relies on a number of parameters that affect various steps in the docking procedure. However, identifying the best choices of the settings for these parameters is often challenging. Therefore, the settings of the parameters are quite often left at their default values, even though scientists with long experience with a specific docking tool know that modifying certain parameters can improve the results. In the study presented here, we have used statistical experimental design and subsequent regression based on root-mean-square deviation values using partial least-square projections to latent structures (PLS) to scrutinize the effects of different parameters on the docking performance of two software packages: FRED and GOLD. Protein-ligand complexes with a high level of ligand diversity were selected from the PDBbind database for the study, using principal component analysis based on 1D and 2D descriptors, and space-filling design. The PLS models showed quantitative relationships between the docking parameters and the ability of the programs to reproduce the ligand crystallographic conformation. The PLS models also revealed which of the parameters and what parameter settings were important for the docking performance of the two programs. Furthermore, the variation in docking results obtained with specific parameter settings for different protein-ligand complexes in the diverse set examined indicates that there is great potential for optimizing the parameter settings for selected sets of proteins.

  • eHiTS: A new fast, exhaustive flexible ligand docking system
    Zsoldos, Zsolt and Reid, Darryl and Simon, Aniko and Sadjad, Sayyed Bashir and Johnson, A. Peter
    Journal of molecular graphics & modelling, 2007, 26(1), 198-212
    PMID: 16860582     doi: 10.1016/j.jmgm.2006.06.002
    The flexible ligand docking problem is divided into two subproblems: pose/conformation search and scoring function. For successful virtual screening the search algorithm must be fast and able to find the optimal binding pose and conformation of the ligand. Statistical analysis of experimental data of bound ligand conformations is presented with conclusions about the sampling requirements for docking algorithms. eHiTS is an exhaustive flexible-docking method that systematically covers the part of the conformational and positional search space that avoids severe steric clashes, producing highly accurate docking poses at a speed practical for virtual high-throughput screening. The customizable scoring function of eHiTS combines novel terms (based on local surface point contact evaluation) with traditional empirical and statistical approaches. Validation results of eHiTS are presented and compared to three other docking software on a set of 91 PDB structures that are common to the validation sets published for the other programs.

  • Comparison of topological, shape, and docking methods in virtual screening.
    McGaughey, Georgia B and Sheridan, Robert P and Bayly, Christopher I and Culberson, J Chris and Kreatsoulas, Constantine and Lindsley, Stacey and Maiorov, Vladimir and Truchon, Jean-Francois and Cornell, Wendy D
    Journal of chemical information and modeling, 2007, 47(4), 1504-1519
    PMID: 17591764     doi: 10.1021/ci700052x
    Virtual screening benchmarking studies were carried out on 11 targets to evaluate the performance of three commonly used approaches: 2D ligand similarity (Daylight, TOPOSIM), 3D ligand similarity (SQW, ROCS), and protein structure-based docking (FLOG, FRED, Glide). Active and decoy compound sets were assembled from both the MDDR and the Merck compound databases. Averaged over multiple targets, ligand-based methods outperformed docking algorithms. This was true for 3D ligand-based methods only when chemical typing was included. Using mean enrichment factor as a performance metric, Glide appears to be the best docking method among the three with FRED a close second. Results for all virtual screening methods are database dependent and can vary greatly for particular targets.

  • Optimizing fragment and scaffold docking by use of molecular interaction fingerprints.
    Marcou, Gilles and Rognan, Didier
    Journal of chemical information and modeling, 2007, 47(1), 195-207
    PMID: 17238265     doi: 10.1021/ci600342e
    Protein-ligand interaction fingerprints have been used to postprocess docking poses of three ligand data sets: a set of 40 low-molecular-weight compounds from the Protein Data Bank, a collection of 40 scaffolds from pharmaceutically relevant protein ligands, and a database of 19 scaffolds extracted from true cdk2 inhibitors seeded in 2230 scaffold decoys. Four popular docking tools (FlexX, Glide, Gold, and Surflex) were used to generate poses for ligands of the three data sets. In all cases, scoring by the similarity of interaction fingerprints to a given reference was statistically superior to conventional scoring functions in posing low-molecular-weight fragments, predicting protein-bound scaffold coordinates according to the known binding mode of related ligands, and screening a scaffold library to enrich a hit list in true cdk2-targeted scaffolds.

  • Analysis of ligand-bound water molecules in high-resolution crystal structures of protein-ligand complexes.
    Lu, Yipin and Wang, Renxiao and Yang, Chao-Yie and Wang, Shaomeng
    Journal of chemical information and modeling, 2007, 47(2), 668-675
    PMID: 17266298     doi: 10.1021/ci6003527
    We have performed a comprehensive analysis of water molecules at the protein-ligand interfaces observed in 392 high-resolution crystal structures. There are a total of 1829 ligand-bound water molecules in these 392 complexes; 18% are surface water molecules, and 72% are interfacial water molecules. The number of ligand-bound water molecules in each complex structure ranges from 0 to 21 and has an average of 4.6. Of these interfacial water molecules, 76% are considered to be bridging water molecules, characterized by having polar interactions with both ligand and protein atoms. Among a number of factors that may influence the number of ligand-bound water molecules, the polar van der Waals (vdw) surface area of ligands has the highest Pearson linear correlation coefficient of 0.63. Our regression analysis predicted that one more ligand-bound water molecule is expected for every additional 24 Å² in the polar vdw surface area of the ligand. In contrast to the observation that the resolution is the primary factor influencing the number of water molecules in crystallographic models of proteins, we found that there is only a weak relationship between the number of ligand-bound water molecules and the resolution of the crystal structures. An analysis of the isotropic B factors of buried ligand-bound water molecules suggested that, when water molecules have fewer than two polar interactions with the protein-ligand complex, they are more mobile than protein atoms in the crystal structures; when they have more than three polar interactions, they are significantly less mobile than protein atoms.

  • Diverse, high-quality test set for the validation of protein-ligand docking performance.
    Hartshorn, Michael J and Verdonk, Marcel L and Chessari, Gianni and Brewerton, Suzanne C. and Mooij, Wijnand T M and Mortenson, Paul N and Murray, Christopher W
    Journal of medicinal chemistry, 2007, 50(4), 726-741
    PMID: 17300160     doi: 10.1021/jm061277y
    A procedure for analyzing and classifying publicly available crystal structures has been developed. It has been used to identify high-resolution protein-ligand complexes that can be assessed by reconstructing the electron density for the ligand using the deposited structure factors. The complexes have been clustered according to the protein sequences, and clusters have been discarded if they do not represent proteins thought to be of direct interest to the pharmaceutical or agrochemical industry. Rules have been used to exclude complexes containing non-drug-like ligands. One complex from each cluster has been selected where a structure of sufficient quality was available. The final Astex diverse set contains 85 diverse, relevant protein-ligand complexes, which have been prepared in a format suitable for docking and are to be made freely available to the entire research community. The performance of the docking program GOLD against the new set is assessed using a variety of protocols. Relatively unbiased protocols give success rates of approximately 80% for redocking into native structures, but it is possible to get success rates of over 90% with some protocols.

  • Supervised scoring models with docked ligand conformations for structure-based virtual screening.
    Teramoto, Reiji and Fukunishi, Hiroaki
    Journal of chemical information and modeling, 2007, 47(5), 1858-1867
    PMID: 17685604     doi: 10.1021/ci700116z
    Protein-ligand docking programs have been used to efficiently discover novel ligands for target proteins from large-scale compound databases. However, better scoring methods are needed. Generally, scoring functions are optimized by means of various techniques that affect their fitness for reproducing X-ray structures and protein-ligand binding affinities. However, these scoring functions do not always work well for all target proteins. A scoring function should be optimized for a target protein to enhance enrichment for structure-based virtual screening. To address this problem, we propose the supervised scoring model (SSM), which takes into account the protein-ligand binding process using docked ligand conformations with supervised learning for optimizing scoring functions against a target protein. SSM employs a rough linear correlation between binding free energy and the root mean square deviation of a native ligand for predicting binding energy. We applied SSM to the FlexX scoring function, that is, F-Score, with five different target proteins: thymidine kinase (TK), estrogen receptor (ER), acetylcholine esterase (AChE), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPARgamma). For these five proteins, SSM always enhanced enrichment better than F-Score, exhibiting superior performance that was particularly remarkable for TK, AChE, and PPARgamma. We also demonstrated that SSM is especially good at enhancing enrichments of the top ranks of screened compounds, which is useful in practical drug screening.

  • GlamDock: development and validation of a new docking tool on several thousand protein-ligand complexes.
    Tietze, Simon and Apostolakis, Joannis
    Journal of chemical information and modeling, 2007, 47(4), 1657-1672
    PMID: 17585857     doi: 10.1021/ci7001236
    In this study, we present GlamDock, a new docking tool for flexible ligand docking. GlamDock (version 1.0) is based on a simple Monte Carlo with minimization procedure. The main features of the method are the energy function, which is a continuously differentiable empirical potential, and the definition of the search space, which combines internal coordinates for the conformation of the ligand, with a mapping-based description of the rigid body translation and rotation. First, we validate GlamDock on a standard benchmark, a set of 100 protein-ligand complexes, which allows comparative evaluation to existing docking tools. The results on this benchmark show that GlamDock is at least comparable in efficiency and accuracy to the best existing docking tools. The main focus of this work is the validation on the scPDB database of protein-ligand complexes. The size of this data set allows a thorough analysis of the dependencies of docking accuracy on features of the protein-ligand system. In particular, it allows a two-dimensional analysis of the results, which identifies a number of interesting dependencies that are generally lost or even misinterpreted in the one-dimensional approach. The overall result that GlamDock correctly predicts the complex structure in practically half of the cases in the scPDB is important not only for screening ligands against a particular protein but even more so for inverse screening, that is, the identification of the correct targets for a particular ligand.

  • A semiempirical free energy force field with charge-based desolvation.
    Huey, Ruth and Morris, Garrett M and Olson, Arthur J and Goodsell, David S
    Journal of computational chemistry, 2007, 28(6), 1145-1152
    PMID: 17274016     doi: 10.1002/jcc.20634
    The authors describe the development and testing of a semiempirical free energy force field for use in AutoDock4 and similar grid-based docking methods. The force field is based on a comprehensive thermodynamic model that allows incorporation of intramolecular energies into the predicted free energy of binding. It also incorporates a charge-based method for evaluation of desolvation designed to use a typical set of atom types. The method has been calibrated on a set of 188 diverse protein-ligand complexes of known structure and binding energy, and tested on a set of 100 complexes of ligands with retroviral proteases. The force field shows improvement in redocking simulations over the previous AutoDock3 force field.


  • Identification and Evaluation of Molecular Properties Related to Preclinical Optimization and Clinical Fate
    Wang, Zhanli and Huo, Jianxin and Sun, Lidan and Wang, Yongfu and Jin, Hongwei and Yu, Hui and Zhang, Liangren and Zhou, Lishe
    Current medicinal chemistry, 2006, 13(1), 214-227
    PMID: 21595631     doi: 10.2174/092986706775197999

  • Benchmarking Sets for Molecular Docking
    Huang, Niu and Shoichet, Brian K and Irwin, John J
    Journal of medicinal chemistry, 2006, 49(23), 6789-6801
    doi: 10.1021/jm0608356
    Ligand enrichment among top-ranking hits is a key metric of molecular docking. To avoid bias, decoys should resemble ligands physically, so that enrichment is not simply a separation of gross features, yet be chemically distinct from them, so that they are unlikely ...

  • M-Score: A Knowledge-Based Potential Scoring Function Accounting for Protein Atom Mobility
    Yang, Chao-Yie and Wang, Renxiao and Wang, Shaomeng
    Journal of medicinal chemistry, 2006, 49(20), 5903-5911
    doi: 10.1021/jm050043w

  • Information-driven protein-DNA docking using HADDOCK: it is a matter of flexibility.
    van Dijk, Marc and van Dijk, Aalt D J and Hsu, Victor and Boelens, Rolf and Bonvin, Alexandre M J J
    Nucleic acids research, 2006, 34(11), 3317-3325
    PMID: 16820531     doi: 10.1093/nar/gkl412
    Intrinsic flexibility of DNA has hampered the development of efficient protein-DNA docking methods. In this study we extend HADDOCK (High Ambiguity Driven DOCKing) [C. Dominguez, R. Boelens and A. M. J. J. Bonvin (2003) J. Am. Chem. Soc. 125, 1731-1737] to explicitly deal with DNA flexibility. HADDOCK uses non-structural experimental data to drive the docking during a rigid-body energy minimization, and semi-flexible and water refinement stages. The latter allow for flexibility of all DNA nucleotides and the residues of the protein at the predicted interface. We evaluated our approach on the monomeric repressor-DNA complexes formed by bacteriophage 434 Cro, the Escherichia coli Lac headpiece and bacteriophage P22 Arc. Starting from unbound proteins and canonical B-DNA we correctly predict the correct spatial disposition of the complexes and the specific conformation of the DNA in the published complexes. This information is subsequently used to generate a library of pre-bent and twisted DNA structures that served as input for a second docking round. The resulting top ranking solutions exhibit high similarity to the published complexes in terms of root mean square deviations, intermolecular contacts and DNA conformation. Our two-stage docking method is thus able to successfully predict protein-DNA complexes from unbound constituents using non-structural experimental data to drive the docking.

  • A method for induced-fit docking, scoring, and ranking of flexible ligands. Application to peptidic and pseudopeptidic beta-secretase (BACE 1) inhibitors.
    Moitessier, Nicolas and Therrien, Eric and Hanessian, Stephen
    Journal of medicinal chemistry, 2006, 49(20), 5885-5894
    PMID: 17004704     doi: 10.1021/jm050138y
    Inhibition of beta-secretase (BACE 1) has recently been investigated as a promising therapeutic approach in the treatment of Alzheimer's disease, and a growing number of BACE 1 inhibitors and crystal structures of BACE 1/inhibitors complexes have been reported. We report herein a predictive computational method and its application to potential BACE 1 inhibitors. Using a training set of 50 known highly flexible inhibitors, we developed a docking method that accounts for the flexibility of both the protein and the inhibitors. Protein flexibility is accounted for using a specifically designed genetic algorithm. We next developed a scoring function consisting of force field evaluation of the inhibitor/protein interactions and two additional terms for hydrogen bonding and entropy change upon binding. Discarding three outliers from the training set, our protocol was found to perform well with an rmsd of 1.19 kcal/mol. Evaluation of the predictive power was next carried out by virtual screening of 80 synthetic compounds. The significant enrichment at the top of the ranking list in active compounds demonstrated the ability of the docking and scoring protocol to rank the compounds relative to their activities.

  • Scoring functions for protein-ligand docking.
    Jain, Ajay N
    Current Protein & Peptide Science, 2006, 7(5), 407-420
    PMID: 17073693    
    Virtual screening by molecular docking has become established as a method for drug lead discovery and optimization. All docking algorithms make use of a scoring function in combination with a method of search. Two theoretical aspects of scoring function performance dominate operational performance. The first is the degree to which a scoring function has a global extremum within the ligand pose landscape at the proper location. The second is the degree to which the magnitude of the function at the extremum is accurate. Presuming adequate search strategies, a scoring function's location performance will dominate behavior with respect to docking accuracy: the degree to which a predicted pose of a ligand matches experimental observation. A scoring function's magnitude performance will dominate behavior with respect to screening utility: enrichment of true ligands over non-ligands. Magnitude estimation also controls pure scoring accuracy: the degree to which bona fide ligands of a particular protein may be correctly ranked. Approaches to the development of scoring functions have varied widely, with a number of functions yielding similarly high levels of performance relating to the location issue. However, even among functions performing equally well on location, widely varying performance is observed on the question of magnitude. In many cases, performance is good enough to yield high enrichments of true ligands versus non-ligands in screening across a wide variety of protein types. Generally, performance is not good enough to correctly rank among true ligands. Strategies for improvement are discussed.

  • Parameter estimation for scoring protein-ligand interactions using negative training data.
    Pham, Tuan A and Jain, Ajay N
    Journal of medicinal chemistry, 2006, 49(20), 5856-5868
    PMID: 17004701     doi: 10.1021/jm050040j
    Surflex-Dock employs an empirically derived scoring function to rank putative protein-ligand interactions by flexible docking of small molecules to proteins of known structure. The scoring function employed by Surflex was developed purely on the basis of positive data, comprising noncovalent protein-ligand complexes with known binding affinities. Consequently, scoring function terms for improper interactions received little weight in parameter estimation, and an ad hoc scheme for avoiding protein-ligand interpenetration was adopted. We present a generalized method for incorporating synthetically generated negative training data, which allows for rigorous estimation of all scoring function parameters. Geometric docking accuracy remained excellent under the new parametrization. In addition, a test of screening utility covering a diverse set of 29 proteins and corresponding ligand sets showed improved performance. Maximal enrichment of true ligands over nonligands exceeded 20-fold in over 80% of cases, with enrichment of greater than 100-fold in over 50% of cases.

  • PSI-DOCK: towards highly efficient and accurate flexible ligand docking.
    Pei, Jianfeng and Wang, Qi and Liu, Zhenming and Li, Qingliang and Yang, Kun and Lai, Luhua
    Proteins, 2006, 62(4), 934-946
    PMID: 16395666     doi: 10.1002/prot.20790
    We have developed a new docking method, Pose-Sensitive Inclined (PSI)-DOCK, for flexible ligand docking. An improved SCORE function has been developed and used in PSI-DOCK for binding free energy evaluation. The improved SCORE function was able to reproduce the absolute binding free energies of a training set of 200 protein-ligand complexes with a correlation coefficient of 0.788 and a standard error of 8.13 kJ/mol. For ligand binding pose exploration, a unique searching strategy was designed in PSI-DOCK. In the first step, a tabu-enhanced genetic algorithm with a rapid shape-complementary scoring function is used to roughly explore and store potential binding poses of the ligand. Then, these predicted binding poses are optimized and compete against each other by using a genetic algorithm with the accurate SCORE function to determine the binding pose with the lowest docking energy. The PSI-DOCK 1.0 program is highly efficient in identifying the experimental binding pose. For a test dataset of 194 complexes, PSI-DOCK 1.0 achieved a 67% success rate (RMSD < 2.0 A) for only one run and a 74% success rate for 10 runs. PSI-DOCK can also predict the docking binding free energy with high accuracy. For a test set of 64 complexes, the correlation between the experimentally observed binding free energies and the docking binding free energies for 64 complexes is r

  • Protein Alpha Shape (PAS) Dock: a new gaussian-based score function suitable for docking in homology modelled protein structures.
    Tøndel, Kristin and Anderssen, Endre and Drabløs, Finn
    Journal of computer-aided molecular design, 2006, 20(3), 131-144
    PMID: 16652207     doi: 10.1007/s10822-006-9041-7
    Protein Alpha Shape (PAS) Dock is a new empirical score function suitable for virtual library screening using homology modelled protein structures. Here, the score function is used in combination with the geometry search method Tabu search. A description of the protein binding site is generated using gaussian property fields like in Protein Alpha Shape Similarity Analysis (PASSA). Gaussian property fields are also used to describe the ligand properties. The overlap between the receptor and ligand hydrophilicity and lipophilicity fields is maximised, while minimising steric clashes. Gaussian functions introduce a smoothing of the property fields. This makes the score function robust against small structural variations, and therefore suitable for use with homology models. This also makes it less critical to include protein flexibility in the docking calculations. We use a fast and simplified version of the score function in the geometry search, while a more detailed version is used for the final prediction of the binding free energies. This use of a two-level scoring makes PAS-Dock computationally efficient, and well suited for virtual screening. The PAS-Dock score function is trained on 218 X-ray structures of protein-ligand complexes with experimental binding affinities. The performance of PAS-Dock is compared to two other docking methods, AutoDock and MOE-Dock, with respect to both accuracy and computational efficiency. According to this study, PAS-Dock is more computationally efficient than both AutoDock and MOE-Dock, and gives a better prediction of the free energies of binding. PAS-Dock is also more robust against structural variations than AutoDock.

  • kinDOCK: a tool for comparative docking of protein kinase ligands.
    Martin, Laetitia and Catherinot, Vincent and Labesse, Gilles
    Nucleic acids research, 2006, 34(Web Server issue), W325-9
    PMID: 16845019     doi: 10.1093/nar/gkl211
    KinDOCK is a new web server for the analysis of ATP-binding sites of protein kinases. This characterization is based on the docking of ligands already co-crystallized with other protein kinases. A structural library of protein kinase-ligand complexes has been extracted from the Protein Data Bank (PDB). This library can provide both potential ligands and their putative binding orientation for a given protein kinase. After protein-protein structural superposition, the ligands are transferred from the template complexes to the target protein kinase. The resulting complexes are evaluated using the program SCORE to compute a theoretical affinity. They can be dynamically visualized to allow a rapid mapping of important steric clashes and potential substitutions relevant for specificity and affinity. These characteristics allow a quick characterization of protein kinase active sites including conformation changes potentially required to accommodate particular ligands. Additionally, promising pharmacophores can be identified in the focussed library. These features will help to rationalize or optimize virtual screening (VS) on larger chemical compound libraries. The server and its documentation are freely available at

  • Effective handling of induced-fit motion in flexible docking.
    Mizutani, Miho Yamada and Takamatsu, Yoshihiro and Ichinose, Tazuko and Nakamura, Kensuke and Itai, Akiko
    Proteins, 2006, 63(4), 878-891
    PMID: 16532451     doi: 10.1002/prot.20931
    For structure-based drug design, where various ligand structures need to be docked to a target protein structure, a docking method that can handle conformational flexibility of not only the ligand, but also the protein, is indispensable. We have developed a simple and effective approach for dealing with the local induced-fit motion of the target protein, and implemented it in our docking tool, ADAM. Our approach efficiently combines the following two strategies: a vdW-offset grid in which the protein cavity is enlarged uniformly, and structure optimization allowing the motion of ligand and protein atoms. To examine the effectiveness of our approach, we performed docking validation studies, including redocking in 18 test cases and foreign-docking, in which various ligands from foreign crystal structures of complexes are docked into a target protein structure, in 22 cases (on five target proteins). With the original ADAM, the correct docking modes (RMSD < 2.0 A) were not present among the top 20 models in one case of redocking and four cases of foreign-docking. When the handling of induced-fit motion was implemented, the correct solutions were acquired in all 40 test cases. In foreign-docking on thymidine kinase, the correct docking modes were obtained as the top-ranked solutions for all 10 test ligands by our combinatorial approach, and this appears to be the best result ever reported with any docking tool. The results of docking validation have thus confirmed the effectiveness of our approach, which can provide reliable docking models even in the case of foreign-docking, where conformational change of the target protein cannot be ignored. We expect that this approach will contribute substantially to actual drug design, including virtual screening.

  • A critical assessment of docking programs and scoring functions.
    Warren, Gregory L and Andrews, C Webster and Capelli, Anna-Maria and Clarke, Brian and LaLonde, Judith and Lambert, Millard H and Lindvall, Mika and Nevins, Neysa and Semus, Simon F and Senger, Stefan and Tedesco, Giovanna and Wall, Ian D and Woolven, James M and Peishoff, Catherine E and Head, Martha S
    Journal of medicinal chemistry, 2006, 49(20), 5912-5931
    PMID: 17004707     doi: 10.1021/jm050362n
    Docking is a computational technique that samples conformations of small molecules in protein binding sites; scoring functions are used to assess which of these conformations best complements the protein binding site. An evaluation of 10 docking programs and 37 scoring functions was conducted against eight proteins of seven protein types for three tasks: binding mode prediction, virtual screening for lead identification, and rank-ordering by affinity for lead optimization. All of the docking programs were able to generate ligand conformations similar to crystallographically determined protein/ligand complex structures for at least one of the targets. However, scoring functions were less successful at distinguishing the crystallographic conformation from the set of docked poses. Docking programs identified active compounds from a pharmaceutically relevant pool of decoy compounds; however, no single program performed well for all of the targets. For prediction of compound affinity, none of the docking programs or scoring functions made a useful prediction of ligand binding affinity.

  • On evaluating molecular-docking methods for pose prediction and enrichment factors.
    Chen, Hongming and Lyne, Paul D and Giordanetto, Fabrizio and Lovell, Timothy and Li, Jin
    Journal of chemical information and modeling, 2006, 46(1), 401-415
    PMID: 16426074     doi: 10.1021/ci0503255
    Four of the most well-known, commercially available docking programs, FlexX, GOLD, GLIDE, and ICM, have been examined for their ligand-docking and virtual-screening capabilities. The relative performance of the programs in reproducing the native ligand conformation from starting SMILES strings for 164 high-resolution protein-ligand complexes is presented and compared. Applying only the native scoring functions, the latest versions of these four docking programs were also used to conduct virtual screening for 12 protein targets of therapeutic interest, involving both publicly available structures and AstraZeneca in-house structures. The capability of the four programs to correctly rank-order target-specific active compounds over alternative binders and nonbinders (decoys plus randomly selected compounds) and thereby enrich a small subset of a screening library is compared. Enrichments from the virtual-screening experiments are contrasted with those obtained with alternative 3D shape-matching and 2D similarity database-search methods.

  • Protein-ligand docking: current status and future challenges.
    Sousa, Sérgio Filipe and Fernandes, Pedro Alexandrino and Ramos, Maria João
    Proteins, 2006, 65(1), 15-26
    PMID: 16862531     doi: 10.1002/prot.21082
    Understanding the ruling principles whereby protein receptors recognize, interact, and associate with molecular substrates and inhibitors is of paramount importance in drug discovery efforts. Protein-ligand docking aims to predict and rank the structure(s) arising from the association between a given ligand and a target protein of known 3D structure. Despite the breathtaking advances in the field over the last decades and the widespread application of docking methods, several downsides still exist. In particular, protein flexibility, a critical aspect for a thorough understanding of the principles that guide ligand binding in proteins, is a major hurdle in current protein-ligand docking efforts that needs to be more efficiently accounted for. In this review the key concepts of protein-ligand docking methods are outlined, with major emphasis being given to the general strengths and weaknesses that presently characterize this methodology. Despite the size of the field, the principal types of search algorithms and scoring functions are reviewed and the most popular docking tools are briefly depicted. Recent advances that aim to address some of the traditional limitations associated with molecular docking are also described. A selection of hand-picked examples is used to illustrate these features.

  • TarFisDock: a web server for identifying drug targets with docking approach
    Li, Honglin and Gao, Zhenting and Kang, Ling and Zhang, Hailei and Yang, Kun and Yu, Kunqian and Luo, Xiaomin and Zhu, Weiliang and Chen, Kaixian and Shen, Jianhua and Wang, Xicheng and Jiang, Hualiang
    Nucleic acids research, 2006, 34(Web Server issue), W219-W224
    PMID: 16844997     doi: 10.1093/nar/gkl114
    TarFisDock is a web-based tool for automating the procedure of searching for small molecule-protein interactions over a large repertoire of protein structures. It offers PDTD (potential drug target database), a target database containing 698 protein structures covering 15 therapeutic areas and a reverse ligand protein docking program. In contrast to conventional ligand-protein docking, reverse ligand-protein docking aims to seek potential protein targets by screening an appropriate protein database. The input file of this web server is the small molecule to be tested, in standard mol2 format; TarFisDock then searches for possible binding proteins for the given small molecule by use of a docking approach. The ligand-protein interaction energy terms of the program DOCK are adopted for ranking the proteins. To test the reliability of the TarFisDock server, we searched the PDTD for putative binding proteins for vitamin E and 4H-tamoxifen. The top 2 and 10% candidates of vitamin E binding proteins identified by TarFisDock respectively cover 30 and 50% of reported targets verified or implicated by experiments; and 30 and 50% of experimentally confirmed targets for 4H-tamoxifen appear amongst the top 2 and 5% of the TarFisDock predicted candidates, respectively. Therefore, TarFisDock may be a useful tool for target identification, mechanism study of old drugs and probes discovered from natural products. TarFisDock and PDTD are available at

  • ROSETTALIGAND: protein-small molecule docking with full side-chain flexibility.
    Meiler, Jens and Baker, David
    Proteins, 2006, 65(3), 538-548
    PMID: 16972285     doi: 10.1002/prot.21086
    Protein-small molecule docking algorithms provide a means to model the structure of protein-small molecule complexes in structural detail and play an important role in drug development. In recent years the necessity of simulating protein side-chain flexibility for an accurate prediction of the protein-small molecule interfaces has become apparent, and an increasing number of docking algorithms probe different approaches to include protein flexibility. Here we describe a new method for docking small molecules into protein binding sites employing a Monte Carlo minimization procedure in which the rigid body position and orientation of the small molecule and the protein side-chain conformations are optimized simultaneously. The energy function comprises van der Waals (VDW) interactions, an implicit solvation model, an explicit orientation hydrogen bonding potential, and an electrostatics model. In an evaluation of the scoring function the computed energy correlated with experimental small molecule binding energy with a correlation coefficient of 0.63 across a diverse set of 229 protein-small molecule complexes. The docking method produced lowest energy models with a root mean square deviation (RMSD) smaller than 2 Å in 71 out of 100 protein-small molecule crystal structure complexes (self-docking). In cross-docking calculations in which both protein side-chain and small molecule internal degrees of freedom were varied the lowest energy predictions had RMSDs less than 2 Å in 14 of 20 test cases.
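
    The RMSD success criterion quoted above can be illustrated with a short Python sketch. This is not ROSETTALIGAND code: the function names `rmsd` and `success_rate` are hypothetical, and the sketch assumes that atoms are already matched one-to-one in the same order and that both poses share the receptor's coordinate frame (no superposition and no symmetry correction are applied).

```python
import math

def rmsd(pose, reference):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples.

    Assumption: atom i in `pose` corresponds to atom i in `reference`,
    and both poses are expressed in the same coordinate frame.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for p, r in zip(pose, reference)
        for a, b in zip(p, r)
    )
    return math.sqrt(squared / len(pose))

def success_rate(rmsds, cutoff=2.0):
    """Fraction of complexes whose model has an RMSD strictly below `cutoff` (in Å)."""
    return sum(1 for value in rmsds if value < cutoff) / len(rmsds)
```

    Under this definition, 71 successes out of 100 complexes corresponds to a success rate of 0.71. Ligands with symmetric groups would need a symmetry-aware atom matching, which this sketch omits.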

  • Multiple target screening method for robust and accurate in silico ligand screening.
    Fukunishi, Yoshifumi and Mikami, Yoshiaki and Kubota, Satoru and Nakamura, Haruki
    Journal of molecular graphics & modelling, 2006, 25(1), 61-70
    PMID: 16376595     doi: 10.1016/j.jmgm.2005.11.006
    We developed a new in silico multiple target screening (MTS) method, based on a multi-receptor versus multi-ligand docking affinity matrix, and examined its robustness against changes in the scoring system. According to this method, compounds in a database are docked to multiple proteins. The compounds among these proteins that are likely to bind to the target protein are selected as the members of the candidate-hit compound group. Then, the compounds in the group are sorted into descending order using the docking score: the first (n-th) compound is expected to be the most (n-th) probable hit compound. This method was applied to the analysis of a set of 142 receptors and 142 compounds using a receptor-ligand docking program, Sievgene [Y. Fukunishi, Y. Mikami, H. Nakamura, Similarities among receptor pockets and among compounds: analysis and application to in silico ligand screening, J. Mol. Graphics Modelling, 24 (2005) 34-45], and the results demonstrated that this method achieves a high hit ratio compared to uniform sampling. We prepared two new scores: the DeltaG score, designed to reproduce the protein-ligand binding free energy, and the hit-optimized score, designed to maximize the hit ratio of in silico screening. Using the Sievgene docking score, DeltaG score and hit-optimized score, the MTS method is more robust than the multiple active-site correction scoring method [G.P.A. Vigers, J.P. Rizzi, Multiple active site corrections for docking and virtual screening, J. Med. Chem., 47 (2004) 80-89].
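
    The two-step selection described in this abstract (build a candidate group from the affinity matrix, then sort it by docking score) might look roughly like the Python sketch below. The abstract does not state the exact membership rule, so the rule used here is an assumption: a compound joins the candidate group when the target protein gives it the best score among all docked proteins. The function name `multiple_target_screen` and the convention that a higher score means stronger predicted binding are also assumptions made for illustration.

```python
def multiple_target_screen(scores, target):
    """Rank candidate hits for `target` from a receptor-by-compound score matrix.

    scores: dict mapping receptor name -> dict mapping compound name -> score.
            Assumption: a HIGHER score means stronger predicted binding.
    target: name of the receptor of interest (must be a key of `scores`).
    """
    target_row = scores[target]
    candidates = []
    for compound, target_score in target_row.items():
        # Best score this compound achieves against any receptor in the panel.
        best = max(row[compound] for row in scores.values())
        # Assumed membership rule: the target is (one of) the best receptors.
        if target_score >= best:
            candidates.append(compound)
    # Sort the candidate group in descending order of the target docking score.
    return sorted(candidates, key=lambda c: target_row[c], reverse=True)
```

    The function expects every receptor row to contain a score for every compound, which matches the full 142 by 142 matrix described in the abstract.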

  • Automatic and efficient decomposition of two-dimensional structures of small molecules for fragment-based high-throughput docking.
    Kolb, Peter and Caflisch, Amedeo
    Journal of medicinal chemistry, 2006, 49(25), 7384-7392
    PMID: 17149868     doi: 10.1021/jm060838i
    The computer program DAIM (Decomposition and Identification of Molecules) has been developed to automatically break up compounds in small-molecule libraries for fragment-based docking as well as database analysis. Here, DAIM is evaluated on 130 ligands derived from known crystal structures of ligand-protein complexes. The decomposition and a new fingerprint-based identification technique are used to select anchor fragments for docking. The docking results show that the DAIM selection is superior to size-based or random selection of fragments. To evaluate the usefulness for analyzing the fragment composition of a large library, DAIM is applied to a collection of about 1.85 million commercially available compounds. Interestingly, it is found that the set of most frequent cyclic and acyclic fragments originating from the decomposition of the 1.85 million molecules shows a large overlap with the most frequent fragments in a library of 5120 known drugs. DAIM has been successfully used in the in silico screening for inhibitors of beta-secretase and EphB4 kinase by fragment-based high-throughput docking. Possible future applications for de novo ligand design are briefly discussed.

  • Critical assessment of the automated AutoDock as a new docking tool for virtual screening.
    Park, Hwangseo and Lee, Jinuk and Lee, Sangyoub
    Proteins, 2006, 65(3), 549-554
    PMID: 16988956     doi: 10.1002/prot.21183
    A major problem in virtual screening concerns the accuracy of the binding free energy between a target protein and a putative ligand. Here we report an example in which the AutoDock scoring function outperforms those of other popular docking programs in virtual screening. The original AutoDock program is in itself too inefficient to be used in virtual screening because the grids of interaction energy have to be calculated for each putative ligand in a chemical database. However, the automation of the AutoDock program with the potential grids defined in common for all putative ligands leads to a more than twofold increase in the speed of virtual database screening. The utility of the automated AutoDock in virtual screening is further demonstrated by identifying the actual inhibitors of various target enzymes in chemical databases with accuracy higher than that of other docking tools, including DOCK and FlexX. These results exemplify the usefulness of the automated AutoDock as a promising new tool in structure-based virtual screening.

  • eHITS: An innovative approach to the docking and scoring function problems
    Zsoldos, Zsolt and Reid, Darryl and Simon, Aniko and Sadjad, Bashir S and Johnson, A. Peter
    Current Protein & Peptide Science, 2006, 7(5), 421-435
    PMID: 17073694    
    Virtual Ligand Screening (VLS) has become an integral part of the drug design process for many pharmaceutical companies. In protein structure based VLS the aim is to find a ligand that has a high binding affinity to the target receptor whose 3D structure is known. This review will describe the docking tool eHiTS. eHiTS is an exhaustive and systematic docking tool which contains many automated features that simplify the drug design workflow. A description of the unique docking algorithm and novel approach to scoring used within eHiTS is presented. In addition a validation study is presented that demonstrates the accuracy and wide applicability of eHiTS in re-docking bound ligands into their receptors.

  • MolDock: A new technique for high-accuracy molecular docking
    Thomsen, R and Christensen, MH
    Journal of medicinal chemistry, 2006, 49(11), 3315-3321
    PMID: 16722650     doi: 10.1021/jm051197e
    In this article we introduce a molecular docking algorithm called MolDock. MolDock is based on a new heuristic search algorithm that combines differential evolution with a cavity prediction algorithm. The docking scoring function of MolDock is an extension of the piecewise linear potential (PLP) including new hydrogen bonding and electrostatic terms. To further improve docking accuracy, a re-ranking scoring function is introduced, which identifies the most promising docking solution from the solutions obtained by the docking algorithm. The docking accuracy of MolDock has been evaluated by docking flexible ligands to 77 protein targets. MolDock was able to identify the correct binding mode of 87% of the complexes. In comparison, the accuracy of Glide and Surflex is 82% and 75%, respectively. FlexX obtained 58% and GOLD 78% on subsets containing 76 and 55 cases, respectively.
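
    For readers unfamiliar with differential evolution, the search heuristic named in this abstract, the following Python sketch shows the classic DE/rand/1/bin scheme on a generic objective. It is not MolDock's implementation: the abstract does not give the variant or control parameters, so the function name `differential_evolution`, the values of `pop_size`, `f`, `cr` and `generations`, and the clipping to a search box are all illustrative assumptions, and the cavity prediction step is left out entirely.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` over the box `bounds` with DE/rand/1/bin.

    bounds: list of (low, high) pairs, one per dimension.
    Returns the best vector found and its objective value.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_fitness = objective(trial)
            if trial_fitness <= fitness[i]:  # greedy one-to-one replacement
                pop[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]
```

    In a docking setting the vector would typically encode ligand position, orientation and torsion angles, and the objective would be the docking score; here any function of a list of floats can be passed in.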

  • Fully automated flexible docking of ligands into flexible synthetic receptors using forward and inverse docking strategies.
    Kämper, Andreas and Apostolakis, Joannis and Rarey, Matthias and Marian, Christel M and Lengauer, Thomas
    Journal of chemical information and modeling, 2006, 46(2), 903-911
    PMID: 16563022     doi: 10.1021/ci050467z
    The prediction of the structure of host-guest complexes is one of the most challenging problems in supramolecular chemistry. Usual procedures for docking of ligands into receptors do not take full conformational freedom of the host molecule into account. We describe and apply a new docking approach which performs a conformational sampling of the host and then sequentially docks the ligand into all receptor conformers using the incremental construction technique of the FlexX software platform. The applicability of this approach is validated on a set of host-guest complexes with known crystal structure. Moreover, we demonstrate that due to the interchangeability of the roles of host and guest, the docking process can be inverted. In this inverse docking mode, the receptor molecule is docked around its ligand. For all investigated test cases, the predicted structures are in good agreement with the experiment for both normal (forward) and inverse docking. Since the ligand is often smaller than the receptor and, thus, its conformational space is more restricted, the inverse docking approach leads in most cases to considerable speed-up. By having the choice between two alternative docking directions, the application range of the method is significantly extended. Finally, an important result of this study is the suitability of the simple energy function used here for structure prediction of complexes in organic media.

  • Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes.
    Friesner, Richard A and Murphy, Robert B and Repasky, Matthew P and Frye, Leah L and Greenwood, Jeremy R and Halgren, Thomas A and Sanschagrin, Paul C and Mainz, Daniel T
    Journal of medicinal chemistry, 2006, 49(21), 6177-6196
    PMID: 17034125     doi: 10.1021/jm051256o
    A novel scoring function to estimate protein-ligand binding affinities has been developed and implemented as the Glide 4.0 XP scoring function and docking protocol. In addition to unique water desolvation energy terms, protein-ligand structural motifs leading to enhanced binding affinity are included: (1) hydrophobic enclosure where groups of lipophilic ligand atoms are enclosed on opposite faces by lipophilic protein atoms, (2) neutral-neutral single or correlated hydrogen bonds in a hydrophobically enclosed environment, and (3) five categories of charged-charged hydrogen bonds. The XP scoring function and docking protocol have been developed to reproduce experimental binding affinities for a set of 198 complexes (RMSDs of 2.26 and 1.73 kcal/mol over all and well-docked ligands, respectively) and to yield quality enrichments for a set of fifteen screens of pharmaceutical importance. Enrichment results demonstrate the importance of the novel XP molecular recognition and water scoring in separating active and inactive ligands and avoiding false positives.

  • Development and validation of a modular, extensible docking program: DOCK 5.
    Moustakas, Demetri T and Lang, P Therese and Pegg, Scott and Pettersen, Eric and Kuntz, Irwin D and Brooijmans, Natasja and Rizzo, Robert C
    Journal of computer-aided molecular design, 2006, 20(10-11), 601-619
    PMID: 17149653     doi: 10.1007/s10822-006-9060-4
    We report on the development and validation of a new version of DOCK. The algorithm has been rewritten in a modular format, which allows for easy implementation of new scoring functions, sampling methods and analysis tools. We validated the sampling algorithm with a test set of 114 protein-ligand complexes. Using an optimized parameter set, we are able to reproduce the crystal ligand pose to within 2 Å of the crystal structure for 79% of the test cases using our rigid ligand docking algorithm with an average run time of 1 min per complex and for 72% of the test cases using our flexible ligand docking algorithm with an average run time of 5 min per complex. Finally, we perform an analysis of the docking failures in the test set and determine that the sampling algorithm is generally sufficient for the binding pose prediction problem for up to 7 rotatable bonds; i.e. 99% of the rigid ligand docking cases and 95% of the flexible ligand docking cases are sampled successfully. We point out that success rates could be improved through more advanced modeling of the receptor prior to docking and through improvement of the force field parameters, particularly for structures containing metal-based cofactors.

  • Prediction of Protein−Ligand Interactions. Docking and Scoring: Successes and Gaps
    Leach, Andrew R and Shoichet, Brian K and Peishoff, Catherine E
    Journal of medicinal chemistry, 2006, 49(20), 5851-5855
    doi: 10.1021/jm060999m
    Computational methods have become standard in today's medicinal chemistry tool kit. Like any tool, it is important to periodically evaluate utility and ask how function can be improved. In this section of the Journal, we call attention to the area of calculating molecular interactions, specifically docking, the positioning of a ligand in a protein binding site, and scoring, the quality assessment of docked ligands. As several recent reviews have made clear (refs 1-3), the technology has been productive for both finding and elaborating bioactive molecules. But have docking and scoring delivered on the promises first made over 20 years ago? To consider that question, we follow up on an extensive symposium held in Philadelphia during the 2004 Fall National Meeting of the American Chemical Society and on subsequent meetings sponsored by the National Institutes of Health (NIH) and the National Institute of Standards and Technology (NIST) in 2005 and 2006 to address the outcomes of the American Chemical Society symposium. Speakers at the symposium were invited to contribute original manuscripts to be published with this overview to highlight the area of docking and scoring and to identify some of the major gaps yet to be addressed.

  • sc-PDB: an annotated database of druggable binding sites from the Protein Data Bank.
    Kellenberger, Esther and Muller, Pascal and Schalon, Claire and Bret, Guillaume and Foata, Nicolas and Rognan, Didier
    Journal of chemical information and modeling, 2006, 46(2), 717-727
    PMID: 16563002     doi: 10.1021/ci050372x
    The sc-PDB is a collection of 6 415 three-dimensional structures of binding sites found in the Protein Data Bank (PDB). Binding sites were extracted from all high-resolution crystal structures in which a complex between a protein cavity and a small-molecular-weight ligand could be identified. Importantly, ligands are considered from a pharmacological and not a structural point of view. Therefore, solvents, detergents, and most metal ions are not stored in the sc-PDB. Ligands are classified into four main categories: nucleotides (< 4-mer), peptides (< 9-mer), cofactors, and organic compounds. The corresponding binding site is formed by all protein residues (including amino acids, cofactors, and important metal ions) with at least one atom within 6.5 angstroms of any ligand atom. The database was carefully annotated by browsing several protein databases (PDB, UniProt, and GO) and storing, for every sc-PDB entry, the following features: protein name, function, source, domain and mutations, ligand name, and structure. The repository of ligands has also been archived by diversity analysis of molecular scaffolds, and several chemoinformatics descriptors were computed to better understand the chemical space covered by stored ligands. The sc-PDB may be used for several purposes: (i) screening a collection of binding sites for predicting the most likely target(s) of any ligand, (ii) analyzing the molecular similarity between different cavities, and (iii) deriving rules that describe the relationship between ligand pharmacophoric points and active-site properties. The database is periodically updated and accessible on the web at
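
    The binding-site definition given above (all residues with at least one atom within 6.5 angstroms of any ligand atom) can be sketched in a few lines of Python. This is an illustration, not sc-PDB code: the function name `binding_site_residues`, the input layout, and the choice to treat "within" as an inclusive cutoff are assumptions.

```python
import math

def binding_site_residues(protein_atoms, ligand_atoms, cutoff=6.5):
    """Return sorted residue identifiers that have an atom within `cutoff` Å of the ligand.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples, one per atom.
    ligand_atoms: iterable of (x, y, z) tuples.
    """
    ligand = list(ligand_atoms)
    site = set()
    for residue_id, coord in protein_atoms:
        if residue_id in site:
            continue  # residue already selected by one of its other atoms
        if any(math.dist(coord, lig) <= cutoff for lig in ligand):
            site.add(residue_id)
    return sorted(site)
```

    The loop is a brute-force comparison of every protein atom with every ligand atom, which is adequate for a single complex; a spatial grid or k-d tree would be a natural replacement when processing thousands of structures.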


  • In silico prediction of harmful effects triggered by drugs and chemicals.
    Vedani, Angelo and Dobler, Max and Lill, Markus A
    Toxicology and applied pharmacology, 2005, 207(2 Suppl), 398-407
    PMID: 16045954     doi: 10.1016/j.taap.2005.01.055
    While the computer-assisted discovery and optimization of drug candidates based on the known three-dimensional structure of the macromolecular target (structure-based design) or a binding-site surrogate (receptor modeling) is doubtless one of the more potent approaches in rational drug design, the simulation and quantification of side effects triggered by drugs and chemicals are still in their infancy. Major obstacles include the often not available 3D structure of the molecular target, the low specificity of the involved bioregulators and the identification of the controlling metabolic pathways. In the recent past, our laboratory has explored concepts allowing to simulate receptor-mediated toxic phenomena by developing algorithms, allowing to construct realistic 3D binding-site surrogates of receptors known or assumed triggering adverse effects and validating them against large batches of molecular data. The underlying technology (software Quasar and Raptor, respectively) specifically allows for induced fit, solvation phenomena and entropic effects. It has been applied to various systems both of pharmacological and toxicological interest including the neurokinin-1, chemokine-3, bradykinin B(2), steroid, 5 HT(2A), aryl hydrocarbon, estrogen and androgen receptor, respectively. In this account, we describe the design of a virtual laboratory allowing for a reliable estimation of harmful effects triggered by drugs, chemicals and their metabolites in silico. In the recent past, the Biographics Laboratory 3R has compiled a 3D database including the surrogates of three major receptor systems known to mediate adverse effects (the aryl hydrocarbon, the estrogen and the androgen receptor, respectively) and validated them against a total of 345 compounds (drugs, chemicals, toxins) using multidimensional QSAR technologies. 
    Within this pilot project, we could demonstrate that our virtual laboratory is able both to recognize toxic compounds substantially different from those used in the training set and to classify harmless compounds as being nontoxic. This suggests that our approach may be used for the prediction of adverse effects of drug molecules and chemicals. The aim is to provide cost-covering access to this technology, particularly to universities, hospitals and regulatory bodies, as it bears a significant potential to recognize hazardous compounds early in the development process and hence improve resource and waste management as well as reduce animal testing. The Biographics Laboratory 3R is a non-profit-oriented organization aimed at reducing animal experimentation in the biomedical sciences by computational approaches (cf.

  • Validation and use of the MM-PBSA approach for drug discovery.
    Kuhn, Bernd and Gerber, Paul and Schulz-Gasch, Tanja and Stahl, Martin
    Journal of medicinal chemistry, 2005, 48(12), 4040-4048
    PMID: 15943477     doi: 10.1021/jm049081q
    The MM-PBSA approach has become a popular method for calculating binding affinities of biomolecular complexes. Published application examples focus on small test sets and few proteins and, hence, are of limited relevance in assessing the general validity of this method. To further characterize MM-PBSA, we report on a more extensive study involving a large number of ligands and eight different proteins. Our results show that applying the MM-PBSA energy function to a single, relaxed complex structure is an adequate and sometimes more accurate approach than the standard free energy averaging over molecular dynamics snapshots. The use of MM-PBSA on a single structure is shown to be valuable (a) as a postdocking filter in further enriching virtual screening results, (b) as a helpful tool to prioritize de novo design solutions, and (c) for distinguishing between good and weak binders (DeltapIC(50) > or

  • DrugScore(CSD)-knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction.
    Velec, Hans F G and Gohlke, Holger and Klebe, Gerhard
    Journal of medicinal chemistry, 2005, 48(20), 6296-6303
    PMID: 16190756     doi: 10.1021/jm050436v
    Following the formalism used for the development of the knowledge-based scoring function DrugScore, new distance-dependent pair potentials are obtained from nonbonded interactions in small organic molecule crystal packings. Compared to potentials derived from protein-ligand complexes, the better resolved small molecule structures provide relevant contact data in a more balanced distribution of atom types and produce potentials of superior statistical significance and more detailed shape. Applied to recognizing binding geometries of ligands docked into proteins, this new scoring function (DrugScore(CSD)) ranks the crystal structures of 100 protein-ligand complexes best among up to 100 generated decoy geometries in 77% of all cases. Accepting root-mean-square deviations (rmsd) of up to 2 angstroms from the native pose as well-docked solutions, a correct binding mode is found in 87% of the cases. This translates into an improvement of the new scoring function of 57% with respect to the retrieval of the crystal structure and 20% with respect to the identification of a well-docked ligand pose compared to the original Protein Data Bank-based DrugScore. In the analysis of decoy geometries of cross-docking studies, DrugScore(CSD) shows equivalent or increased performance compared to the original PDB-based DrugScore. Furthermore, DrugScore(CSD) predicts binding affinities convincingly. Reducing the set of docking solutions to examples that deviate increasingly from the native pose results in a loss of performance of DrugScore(CSD). This indicates that a necessary prerequisite to successfully resolving the scoring problem with a more discriminative scoring function is the generation of highly accurate ligand poses, which approximate the native pose to below 1 angstrom rmsd, in a docking run.

  • LigScore: a novel scoring function for predicting binding affinities.
    Krammer, André and Kirchhoff, Paul D and Jiang, X and Venkatachalam, C M and Waldman, Marvin
    Journal of molecular graphics & modelling, 2005, 23(5), 395-407
    PMID: 15781182     doi: 10.1016/j.jmgm.2004.11.007
    We present two new empirical scoring functions, LigScore1 and LigScore2, that attempt to accurately predict the binding affinity between ligand molecules and their protein receptors. The LigScore functions consist of three distinct terms that describe the van der Waals interaction, the polar attraction between the ligand and protein, and the desolvation penalty attributed to the binding of the polar ligand atoms to the protein and vice versa. Utilizing a regression approach on a data set of 118 protein-ligand complexes we have obtained a linear equation, LigScore2, using these three descriptors. LigScore2 has good predictability with regard to experimental pKi values, yielding a correlation coefficient (r2) of 0.75 and a standard deviation of 1.04 over the training data set, which consists of a diverse set of proteins that span more than seven protein families.

  • Yucca: an efficient algorithm for small-molecule docking.
    Choi, Vicky
    Chemistry & biodiversity, 2005, 2(11), 1517-1524
    PMID: 17191951     doi: 10.1002/cbdv.200590123
    In this paper, we present a new algorithm, which is based on an efficient heuristic for local search, for rigid protein-small-molecule docking. We tested our algorithm, called Yucca, on the recent 100-complex benchmark, using the conformer generator OMEGA to generate a set of low-energy conformers. The results showed that Yucca is competitive both in terms of algorithm efficiency and docking accuracy.

  • Receptor flexibility in de novo ligand design and docking.
    Alberts, Ian L and Todorov, Nikolay P and Dean, Philip M
    Journal of medicinal chemistry, 2005, 48(21), 6585-6596
    PMID: 16220975     doi: 10.1021/jm050196j
    One of the major problems in computational drug design is incorporation of the intrinsic flexibility of protein binding sites. This is particularly crucial in ligand binding events, when induced fit can lead to protein structure rearrangements. As a consequence of the huge conformational space available to protein structures, receptor flexibility is rarely considered in ligand design procedures. In this work, we present an algorithm for integrating protein binding-site flexibility into de novo ligand design and docking processes. The approach allows dynamic rearrangement of amino acid side chains during the docking and design simulations. The impact of protein conformational flexibility is investigated in the docking of highly active inhibitors in the binding sites of acetylcholinesterase and human collagenase (matrix metalloproteinase-1) and in the design of ligands in the S1' pocket of MMP-1. The results of corresponding simulations for both rigid and flexible binding sites are compared in order to gauge the influence of receptor flexibility in drug discovery protocols.

  • Side-chain flexibility in protein-ligand binding: the minimal rotation hypothesis.
    Zavodszky, Maria I and Kuhn, Leslie A
    Protein science : a publication of the Protein Society, 2005, 14(4), 1104-1114
    PMID: 15772311     doi: 10.1110/ps.041153605
    The goal of this work is to learn from nature about the magnitudes of side-chain motions that occur when proteins bind small organic molecules, and model these motions to improve the prediction of protein-ligand complexes. Following analysis of protein side-chain motions upon ligand binding in 63 complexes, we tested the ability of the docking tool SLIDE to model these motions without being restricted to rotameric transitions or deciding which side chains should be considered as flexible. The model tested is that side-chain conformational changes involving more atoms or larger rotations are likely to be more costly and less prevalent than small motions due to energy barriers between rotamers and the potential of large motions to cause new steric clashes. Accordingly, SLIDE adjusts the protein and ligand side groups as little as necessary to achieve steric complementarity. We tested the hypothesis that small motions are sufficient to achieve good dockings using 63 ligands and the apo structures of 20 different proteins and compared SLIDE side-chain rotations to those experimentally observed. None of these proteins undergoes major main-chain conformational change upon ligand binding, ensuring that side-chain flexibility modeling is not required to compensate for main-chain motions. Although more frugal in the number of side-chain rotations performed, this model substantially mimics the experimentally observed motions. Most side chains do not shift to a new rotamer, and small motions are both necessary and sufficient to predict the correct binding orientation and most protein-ligand interactions for the 20 proteins analyzed.

  • ProPose: steered virtual screening by simultaneous protein-ligand docking and ligand-ligand alignment.
    Seifert, Markus H J
    Journal of chemical information and modeling, 2005, 45(2), 449-460
    PMID: 15807511     doi: 10.1021/ci0496393
    The 'model-free' screening engine ProPose implements a general method for performing simultaneous protein-ligand docking, ligand-ligand alignment, pharmacophore queries, and combinations thereof, in order to incorporate a priori information into screening protocols. In this manuscript we describe a case study on herpes simplex virus thymidine kinase, an important antiviral drug target, where we evaluate different approaches for handling a specific type of a priori information, i.e., multiple target structures. We demonstrate that a simultaneous alignment on two target structures, in conjunction with logic operations on interactions and docking constraints derived from protein structure, is an effective means of (i) improving the enrichment of chemical substructures that are compatible with the a priori known ligands, (ii) ensuring the steric fit into the target protein, and (iii) handling target flexibility. The combination of ligand- and receptor-based methods steers the virtual screening by ranking molecules according to the similarity of their interaction pattern with known ligands, thereby, to some extent, outweighing the deficiencies of simple scoring functions often used in initial virtual screening.

  • Comparison of automated docking programs as virtual screening tools.
    Cummings, Maxwell D and Desjarlais, Renee L and Gibbs, Alan C and Mohan, Venkatraman and Jaeger, Edward P
    Journal of medicinal chemistry, 2005, 48(4), 962-976
    PMID: 15715466     doi: 10.1021/jm049798d
    The performance of several commercially available docking programs is compared in the context of virtual screening. Five different protein targets are used, each with several known ligands. The simulated screening deck comprised 1000 molecules from a cleansed version of the MDL drug data report and 49 known ligands. For many of the known ligands, crystal structures of the relevant protein-ligand complexes were available. We attempted to run experiments with each docking method that were as similar as possible. For a given docking method, hit rates were improved versus what would be expected for random selection for most protein targets. However, the ability to prioritize known ligands on the basis of docking poses that resemble known crystal structures is both method- and target-dependent.

  • MEDock: a web server for efficient prediction of ligand binding sites based on a novel optimization algorithm
    Chang, DTH and Oyang, YJ and Lin, JH
    Nucleic acids research, 2005, 33(Web Server issue), W233-W238
    PMID: 15991337     doi: 10.1093/nar/gki586
    The prediction of ligand binding sites is an essential part of the drug discovery process. Knowing the location of binding sites greatly facilitates the search for hits, the lead optimization process, the design of site-directed mutagenesis experiments and the hunt for structural features that influence the selectivity of binding in order to minimize the drug's adverse effects. However, docking is still the rate-limiting step for such predictions; consequently, much more efficient algorithms are required. In this article, the design of the MEDock web server is described. The goal of this server is to provide an efficient utility for predicting ligand binding sites. The MEDock web server incorporates a global search strategy that exploits the maximum entropy property of the Gaussian probability distribution in the context of information theory. As a result of the global search strategy, the optimization algorithm incorporated in MEDock is significantly superior when dealing with very rugged energy landscapes, which usually have insurmountable barriers. This article describes four different benchmark cases that span a diverse set of different types of ligand binding interactions. These benchmarks were compared with the use of the Lamarckian genetic algorithm (LGA), which is the major workhorse of the well-known AutoDock program. These results demonstrate that MEDock consistently converged to the correct binding-modes with significantly smaller numbers of energy evaluations than the LGA required. When judged by a threshold of the number of energy evaluations consumed in the docking simulation, MEDock also greatly elevates the rate of accurate predictions for all benchmark cases. MEDock is available online.

  • PatchDock and SymmDock: servers for rigid and symmetric docking.
    Schneidman-Duhovny, Dina and Inbar, Yuval and Nussinov, Ruth and Wolfson, Haim J
    Nucleic acids research, 2005, 33(Web Server issue), W363-7
    PMID: 15980490     doi: 10.1093/nar/gki481
    Here, we describe two freely available web servers for molecular docking. The PatchDock method performs structure prediction of protein-protein and protein-small molecule complexes. The SymmDock method predicts the structure of a homomultimer with cyclic symmetry given the structure of the monomeric unit. The inputs to the servers are either protein PDB codes or uploaded protein structures. The services are available online. The methods behind the servers are very efficient, allowing large-scale docking experiments.

  • Evaluation of library ranking efficacy in virtual screening
    Kontoyianni, M and Sokol, GS and McClellan, LM
    Journal of computational chemistry, 2005, 26(1), 11-22
    PMID: 15526325     doi: 10.1002/jcc.20141
    We present the results of a comprehensive study in which we explored how the docking procedure affects the performance of a virtual screening approach. We used four docking engines and applied 10 scoring functions to the top-ranked docking solutions of seeded databases against six target proteins. The scores of the experimental poses were placed within the total set to assess whether the scoring function required an accurate pose to provide the appropriate rank for the seeded compounds. This method allows a direct comparison of library ranking efficacy. Our results indicate that the LigandFit/Ligscore1 and LigandFit/GOLD docking/scoring combinations, and to a lesser degree FlexX/FlexX, Glide/Ligscore1, DOCK/PMF (Tripos implementation), LigandFit1/Ligscore2 and LigandFit/PMF (Tripos implementation) were able to retrieve the highest number of actives at a 10% fraction of the database when all targets were looked upon collectively. We also show that the scoring functions rank the observed binding modes higher than the inaccurate poses provided that the experimental poses are available. This finding stresses the discriminatory ability of the scoring algorithms, when better poses are available, and suggests that the number of false positives can be lowered with conformers closer to bioactive ones.

  • Improved FlexX docking using FlexS-determined base fragment placement.
    Cross, Simon S J
    Journal of chemical information and modeling, 2005, 45(4), 993-1001
    PMID: 16045293     doi: 10.1021/ci050026f
    We report on a novel hybrid FlexX/FlexS docking approach, whereby the base fragment of the test ligand is chosen by FlexS superposition onto a cocrystallized template ligand and then fed into FlexX for the incremental construction of the final solution. The new approach is tested on the diverse 200 protein-ligand complex dataset that has been previously described for FlexX validation. In total, 62.9% of the complexes can be reproduced at rank 1 by our approach, which compares favorably with 46.9% when using FlexX alone. In addition, we report "cross-docking" experiments in which several receptor structures of complexes with identical proteins have been used for docking all cocrystallized ligands of these complexes. The results show that, in almost all cases, the hybrid approach can acceptably dock a ligand into a foreign receptor structure using a different ligand template, can give solutions where FlexX alone fails, and tends to give solutions that are more accurately positioned.

  • Binding mode prediction of cytochrome p450 and thymidine kinase protein-ligand complexes by consideration of water and rescoring in automated docking.
    de Graaf, Chris and Pospisil, Pavel and Pos, Wouter and Folkers, Gerd and Vermeulen, Nico P E
    Journal of medicinal chemistry, 2005, 48(7), 2308-2318
    PMID: 15801824     doi: 10.1021/jm049650u
    The popular docking programs AutoDock, FlexX, and GOLD were used to predict binding modes of ligands in crystallographic complexes including X-ray water molecules or computationally predicted water molecules. Isoenzymes of two different enzyme systems were used, namely cytochromes P450 (n

  • Comparing protein-ligand docking programs is difficult.
    Cole, Jason C and Murray, Christopher W and Nissink, J Willem M and Taylor, Richard D and Taylor, Robin
    Proteins, 2005, 60(3), 325-332
    PMID: 15937897     doi: 10.1002/prot.20497
    There is currently great interest in comparing protein-ligand docking programs. A review of recent comparisons shows that it is difficult to draw conclusions of general applicability. Statistical hypothesis testing is required to ensure that differences in pose-prediction success rates and enrichment rates are significant. Numerical measures such as root-mean-square deviation need careful interpretation and may profitably be supplemented by interaction-based measures and visual inspection of dockings. Test sets must be of appropriate diversity and of good experimental reliability. The effects of crystal-packing interactions may be important. The method used for generating starting ligand geometries and positions may have an appreciable effect on docking results. For fair comparison, programs must be given search problems of equal complexity (e.g. binding-site regions of the same size) and approximately equal time in which to solve them. Comparisons based on rescoring require local optimization of the ligand in the space of the new objective function. Re-implementations of published scoring functions may give significantly different results from the originals. Ostensibly minor details in methodology may have a profound influence on headline success rates.

  • Modeling water molecules in protein-ligand docking using GOLD.
    Verdonk, Marcel L and Chessari, Gianni and Cole, Jason C and Hartshorn, Michael J and Murray, Christopher W and Nissink, J Willem M and Taylor, Richard D and Taylor, Robin
    Journal of medicinal chemistry, 2005, 48(20), 6504-6515
    PMID: 16190776     doi: 10.1021/jm050543p
    We implemented a novel approach to score water mediation and displacement in the protein-ligand docking program GOLD. The method allows water molecules to switch on and off and to rotate around their three principal axes. A constant penalty, sigma(p), representing the loss of rigid-body entropy, is added for water molecules that are switched on, hence rewarding water displacement. We tested the methodology in an extensive validation study. First, sigma(p) is optimized against a training set of 58 protein-ligand complexes. For this training set, our algorithm correctly predicts water mediation/displacement in approximately 92% of the cases. We observed small improvements in the quality of the predicted binding modes for water-mediated complexes. In the second part of this work, an entirely independent set of 225 complexes is used. For this test set, our algorithm correctly predicts water mediation/displacement in approximately 93% of the cases. Improvements in binding mode quality were observed for individual water-mediated complexes.


  • Docking and scoring in virtual screening for drug discovery: methods and applications.
    Kitchen, Douglas B and Decornez, Hélène and Furr, John R and Bajorath, Jürgen
    Nature reviews. Drug discovery, 2004, 3(11), 935-949
    PMID: 15520816     doi: 10.1038/nrd1549
    Computational approaches that 'dock' small molecules into the structures of macromolecular targets and 'score' their potential complementarity to binding sites are widely used in hit identification and lead optimization. Indeed, there are now a number of drugs whose development was heavily influenced by or based on structure-based design and screening strategies, such as HIV protease inhibitors. Nevertheless, there remain significant challenges in the application of these approaches, in particular in relation to current scoring schemes. Here, we review key concepts and specific features of small-molecule-protein docking methods, highlight selected applications and discuss recent advances that aim to address the acknowledged limitations of established approaches.

  • Scoring functions for protein-ligand interactions: a critical perspective
    Schulz-Gasch, T
    Drug Discovery Today: Technologies, 2004, 1(3), 231-239
    Scoring functions play an essential role in structure-based virtual screening. They are required to guide the docking of candidate compounds to structures of receptor binding sites, to select probable binding modes, and to discriminate binders from non-binders. Although many scoring functions have successfully been used to identify novel ligands for a wide variety of targets, much work remains to be done to avoid incorrect prediction of binding modes and high numbers of false positives. This review gives an overview of the current state of the field and outlines key issues for the further development of scoring functions.

  • Native atom types for knowledge-based potentials: application to binding energy prediction.
    Dominy, Brian N and Shakhnovich, Eugene I
    Journal of medicinal chemistry, 2004, 47(18), 4538-4558
    PMID: 15317465     doi: 10.1021/jm0498046
    Knowledge-based potentials have been found useful in a variety of biophysical studies of macromolecules. Recently, it has also been shown in self-consistent studies that it is possible to extract quantities consistent with pair potentials from model structural databases. In this study, we attempt to extend the results obtained from these self-consistent studies toward the extraction of realistic pair potentials from the Protein Data Bank (PDB). The new method utilizes a clustering approach to define atom types within the PDB consistent with the optimal effective pairwise potential. The method has been integrated into the SMoG drug design package, resulting in an improved approach for the rapid and accurate estimation of binding affinities from structural information. Using this approach, it is possible to generate simple knowledge-based potentials that correlate (R

  • OptiDock: virtual HTS of combinatorial libraries by efficient sampling of binding modes in product space.
    Sprous, Dennis G and Lowis, David R and Leonard, Joseph M and Heritage, Trevor and Burkett, Steven N and Baker, David S and Clark, Robert D
    Journal of combinatorial chemistry, 2004, 6(4), 530-539
    PMID: 15244414     doi: 10.1021/cc034068x
    Products from combinatorial libraries generally share a common core structure that can be exploited to improve the efficiency of virtual high-throughput screening (vHTS). In general, it is more efficient to find a method that scales with the total number of reagents (Sigma growth) rather than with the number of products (Pi growth). The OptiDock methodology described herein entails selecting a diverse but representative subset of compounds that span the structural space encompassed by the full library. These compounds are docked individually using the FlexX program (Rarey, M.; Kramer, B.; Lengauer, T.; Klebe, G. J. Mol. Biol. 1995, 251, 470-489) to define distinct docking modes in terms of reference placements for combinatorial core atoms. Thereafter, substituents in R-cores (consisting of the core structure substituted at a single variation site) are docked, keeping the core atoms fixed at the coordinates dictated by each reference placement. Interaction energies are calculated for each docked R-core with respect to the target protein, and energies for whole compounds are calculated by finding the reference core placement for which the sum of corresponding R-core energies is most negative. The use of diverse whole compounds to define binding modes is a key advantage of the protocol over other combinatorial docking programs. As a result, OptiDock returns better-scoring conformers than does serially applied FlexX. OptiDock is also better able to find a viable docked pose for each library member than are other combinatorial approaches.
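
    The final scoring step described in this abstract (sum the R-core energies for each reference core placement, then keep the placement with the most negative total) can be illustrated with a minimal sketch. The function name `best_placement`, the dictionary layout, and the example energies are hypothetical and are not taken from OptiDock itself.

```python
from typing import Dict, List, Tuple


def best_placement(rcore_energies: Dict[str, List[float]]) -> Tuple[str, float]:
    """Pick the reference core placement with the most negative summed energy.

    rcore_energies maps a placement label (hypothetical naming) to the
    interaction energies of the R-cores docked with the core held fixed
    at that placement, one value per variation site.
    """
    if not rcore_energies:
        raise ValueError("at least one reference placement is required")
    # Whole-compound energy per placement = sum of its R-core energies.
    totals = {label: sum(energies) for label, energies in rcore_energies.items()}
    # "Most negative" total corresponds to the minimum.
    best = min(totals, key=totals.get)
    return best, totals[best]


# Made-up energies for one compound with two variation sites.
example = {"placement_A": [-10.0, -5.0], "placement_B": [-12.0, -1.0]}
label, energy = best_placement(example)
print(label, energy)
```

    Because each R-core is docked once per placement, the cost of this step grows with the number of reagents rather than the number of products, which is the scaling argument the abstract makes.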

  • SDOCKER: a method utilizing existing X-ray structures to improve docking accuracy.
    Wu, Guosheng and Vieth, Michal
    Journal of medicinal chemistry, 2004, 47(12), 3142-3148
    PMID: 15163194     doi: 10.1021/jm040015y
    This paper introduces a new strategy for structure-based drug design that combines high-quality docking with data from existing ligand-protein cocrystal X-ray structures. The main goal of SDOCKER, a new algorithm that implements this strategy, is docking accuracy improvement. In this new paradigm, simulated annealing molecular dynamics is used for conformational sampling and optimization and an additional similarity force is applied on the basis of the positions of ligands from X-ray data that focus the sampling on relevant regions of the active site. Because the structural information from both the ligand and protein active site is included, this approach is more effective in finding the optimal conformation for a ligand-protein complex than the classical docking or similarity overlays. Interestingly, it was found that a 3D similarity-only approach gives comparable docking accuracy to the regular force field approach used in classical docking, given the final structures are minimized in the presence of the protein. The combination of both, as implemented in SDOCKER, is shown here to be more accurate. A significant improvement in docking accuracy has been observed for three different test systems. Specifically an improvement of 10%, 17.5%, and 10% is seen for 37 HIV-1 protease, 32 thrombin, and 23 CDK2 ligands, respectively, compared to docking using the force field alone. In addition, SDOCKER's accuracy performance dependence on the similarity template is discussed. The strategy of utilizing existing ligand X-ray information should prove effective in light of the multitude of structures available from structural genomics approaches.

  • Validation of an empirical RNA-ligand scoring function for fast flexible docking using Ribodock.
    Morley, S David and Afshar, Mohammad
    Journal of computer-aided molecular design, 2004, 18(3), 189-208
    PMID: 15368919    
    We report the design and validation of a fast empirical function for scoring RNA-ligand interactions, and describe its implementation within RiboDock, a virtual screening system for automated flexible docking. Building on well-known protein-ligand scoring function foundations, features were added to describe the interactions of common RNA-binding functional groups that were not handled adequately by conventional terms, to disfavour non-complementary polar contacts, and to control non-specific charged interactions. The results of validation experiments against known structures of RNA-ligand complexes compare favourably with previously reported methods. Binding modes were well predicted in most cases and good discrimination was achieved between native and non-native ligands for each binding site, and between native and non-native binding sites for each ligand. Further evidence of the ability of the method to identify true RNA binders is provided by compound selection ('enrichment factor') experiments based around a series of HIV-1 TAR RNA-binding ligands. Significant enrichment in true binders was achieved amongst high scoring docking hits, even when selection was from a library of structurally related, positively charged molecules. Coupled with a semi-automated cavity detection algorithm for identification of putative ligand binding sites, also described here, the method is suitable for the screening of very large databases of molecules against RNA and RNA-protein interfaces, such as those presented by the bacterial ribosome.

  • Ph4Dock: Pharmacophore-based protein-ligand docking
    Goto, J and Kataoka, R and Hirayama, N
    Journal of medicinal chemistry, 2004, 47(27), 6804-6811
    PMID: 15615529     doi: 10.1021/jm0493818
    The development and validation of the program Ph4Dock is presented. Ph4Dock is a novel automated ligand docking program that makes best use of pharmacophoric features both in a ligand and at concave portions of a protein. By mapping of pharmacophores of the ligand to the pharmacophoric features that represent the concaves of the target protein, Ph4Dock realizes an efficient and accurate prediction of the binding modes between the ligand and the protein. To validate the potential of this unique docking algorithm, we have selected 43 reliable crystal structures of protein-ligand complexes. All of the ligands are druglike, and they are varied in nature. The diffraction-component precision index (DPI) originally used in crystallography was applied in this study in order to evaluate the docking results quantitatively. The root-mean-square deviation (rmsd) between non-hydrogen atoms of the ligand in the prediction and experimental results were analyzed using DPI. The rmsd values for 25 structures, consisting of almost 60% of the dataset, are less than three times of the corresponding DPI values. It means that the precision of docking results obtained by Ph4Dock is mostly equivalent to the experimental error in these cases. The present study has demonstrated that Ph4Dock can accurately reproduce the experimentally determined docking modes if the reliable crystal structures are used. Normally the success rate of the docking is judged using rmsd less than or equal to 2.0 Angstrom as the criterion. The Ph4Dock marked an appreciably good success rate of 86% based on this criterion.
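
    The two success criteria mentioned in this abstract (rmsd of at most 2.0 Å, and rmsd below three times the DPI) can be sketched as follows. This is an illustrative sketch only: it assumes the predicted and experimental non-hydrogen atoms are already matched in the same order, applies no symmetry correction, and uses hypothetical function names and made-up coordinates.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, r in zip(predicted, reference)
        for a, b in zip(p, r)
    )
    return math.sqrt(squared / len(predicted))


def passes_cutoff(value: float, cutoff: float = 2.0) -> bool:
    """Conventional criterion: rmsd less than or equal to the cutoff (in Å)."""
    return value <= cutoff


def passes_dpi(value: float, dpi: float, factor: float = 3.0) -> bool:
    """DPI-based criterion: rmsd below `factor` times the structure's DPI."""
    return value < factor * dpi


# Made-up two-atom pose displaced by 1 Å along z.
pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
xray = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(pose, xray)
print(value, passes_cutoff(value), passes_dpi(value, dpi=0.25))
```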

  • HierVLS hierarchical docking protocol for virtual ligand screening of large-molecule databases
    Floriano, WB and Vaidehi, N and Zamanakos, G and Goddard, WA
    Journal of medicinal chemistry, 2004, 47(1), 56-71
    PMID: 14695820     doi: 10.1021/jm030271v
    To provide practical means for rapidly scanning the extensive experimental combinatorial chemistry libraries now available for high-throughput screening (HTS), it is essential to establish computational virtual ligand screening (VLS) techniques to rapidly identify out of a large library all active compounds against a particular protein target. Toward this goal we developed HierVLS, a fast hierarchical docking approach that starts with a coarse grain conformational search over a large number of configurations filtered with a fast but crude energy function, followed by a succession of finer grain levels, using successively more accurate but more expensive descriptions of the ligand-protein-solvent interactions to filter successively fewer cases. The final step of this procedure optimizes one configuration of the ligand in the protein site using our most accurate energy expression and description of the solvent, which would be impractical for all conformations and sites sampled in the coarse level. HierVLS is based on the HierDock approach, but rather than allowing an hour or more to determine the best binding site and energy for each ligand (as in HierDock), we have adapted our procedure so that it can lead to reliable results while using only 4 min (866 MHz Pentium III processor) per ligand. To validate the accuracy for HierVLS to predict the experimentally observed binding conformation, we considered 37 cocrystal structures comprising 11 target proteins. We find that HierVLS identifies the correct binding mode for all 37 cocrystals. In addition, the calculated binding energies correlate well with available experimental binding constants. To validate how well HierVLS can identify the correct ligand in an extensive library of decoys, we considered a library of over 10 000 molecules. HierVLS identifies 26 out of the 37 cases in the top 2% ranked by binding affinity among the 10 037 molecules. The failures result from either metal-containing sites on the protein or water-mediated ligand-protein interactions, which we anticipate can be solved within the constraints of practical VLS. We then applied HierVLS to screen a 55000-compound virtual library against the target protein-tyrosine phosphatase 1B (ptp1b). The top 250 compounds by binding affinity included all six ptp1b cocrystal ligands added to the library plus three other experimentally confirmed binders. The best (top 1) binder is an experimentally confirmed positive. We conclude that HierVLS is useful for selecting leads for a particular target out of large combinatorial databases.

  • GAsDock: a new approach for rapid flexible docking based on an improved multi-population genetic algorithm
    Li, HL and Li, CL and Gui, CS and Luo, XM and Chen, KX and Shen, JH and Wang, XC and Jiang, HL
    Bioorganic & Medicinal Chemistry Letters, 2004, 14(18), 4671-4676
    PMID: 15324886     doi: 10.1016/j.bmcl.2004.06.091
    Based on an improved multi-population genetic algorithm, a new fast flexible docking program, GAsDock, was developed. The docking accuracy, screening efficiency, and docking speed of GAsDock were evaluated by the docking results of thymidine kinase (TK) and HIV-1 reverse transcriptase (RT) enzyme with 10 available inhibitors of each protein and 990 randomly selected ligands. Nine of the ten known inhibitors of TK were accurately docked into the protein active site, the root-mean-square deviation (RMSD) values between the docking and X-ray crystal structures are less than 1.7 Å; binding poses (conformation and orientation) of 9 of the 10 known inhibitors of RT were reproduced by GAsDock with RMSD values less than 2.0 Å. The docking time is approximately in proportion to the number of rotatable bonds of ligands; GAsDock can finish a docking simulation within 60 s for a ligand with no more than 20 rotatable bonds. Results indicate that GAsDock is an accurate and remarkably faster docking program in comparison with other docking programs, which is applausive in the application of virtual screening.

  • Calculation of ligand-nucleic acid binding free energies with the generalized-born model in DOCK.
    Kang, Xinshan and Shafer, Richard H and Kuntz, Irwin D
    Biopolymers, 2004, 73(2), 192-204
    PMID: 14755577     doi: 10.1002/bip.10541
    The calculation of ligand-nucleic acid binding free energies is investigated by including solvation effects computed with the generalized-Born model. Modifications of the solvation module in DOCK, including introduction of all-atom parameters and revision of coefficients in front of different terms, are shown to improve calculations involving nucleic acids. This computing scheme is capable of calculating binding energies, with reasonable accuracy, for a wide variety of DNA-ligand complexes, RNA-ligand complexes, and even for the formation of double-stranded DNA. This implementation of GB/SA is also shown to be capable of discriminating strong ligands from poor ligands for a series of RNA aptamers without sacrificing the high efficiency of the previous implementation. These results validate this approach to screening large databases against nucleic acid targets.

  • Rapid protein-ligand docking using soft modes from molecular dynamics simulations to account for protein deformability: binding of FK506 to FKBP.
    Zacharias, Martin
    Proteins, 2004, 54(4), 759-767
    PMID: 14997571     doi: 10.1002/prot.10637
    Most current docking methods to identify possible ligands and putative binding sites on a receptor molecule assume a rigid receptor structure to allow virtual screening of large ligand databases. However, binding of a ligand can lead to changes in the receptor protein conformation that are sterically necessary to accommodate a bound ligand. An approach is presented that allows relaxation of the protein conformation in precalculated soft flexible degrees of freedom during ligand-receptor docking. For the immunosuppressant FK506-binding protein FKBP, the soft flexible modes are extracted as principal components of motion from a molecular dynamics simulation. A simple penalty function for deformations in the soft flexible mode is used to limit receptor protein deformations during docking that avoids a costly recalculation of the receptor energy by summing over all receptor atom pairs at each step. Rigid docking of the FK506 ligand binding to an unbound FKBP conformation failed to identify a geometry close to experiment as favorable binding site. In contrast, inclusion of the flexible soft modes during systematic docking runs selected a binding geometry close to experiment as lowest energy conformation. This has been achieved at a modest increase of computational cost compared to rigid docking. The approach could provide a computationally efficient way to approximately account for receptor flexibility during docking of large numbers of putative ligands and putative docking geometries.

  • A practical approach to docking of zinc metalloproteinase inhibitors.
    Hu, Xin and Balaz, Stefan and Shelver, William H
    Journal of molecular graphics & modelling, 2004, 22(4), 293-307
    PMID: 15177081     doi: 10.1016/j.jmgm.2003.11.002
    Forty zinc-dependent metalloproteinase/ligand complexes with known crystal structures were re-docked using five docking/scoring approaches (DOCK, FlexX, DrugScore, GOLD, and AutoDock). Correct geometry of the coordination bonds between the ligand's zinc binding group (ZBG) and the catalytic zinc is important for docking accuracy and scoring reliability. More than 75% of docked poses with RMSD less than 2 Å were found to have appropriate ZBG binding, but for poor ZBG binding, about 95% of poses failed to dock correctly. Elimination of poses with inappropriate zinc binding resulted in better binding energy predictions that were further improved by dividing the ligands into subsets according to the ZBG (carboxylates, hydroxamates, and phosphorus containing groups). After a subset re-scoring using the regression functions obtained for individual subsets, DrugScore was able to explain 77% and the consensus scoring scheme X-CSCORE even 88% of variance in binding energies. The approach combining ZBG-based pose selection and subset re-scoring improved the hit rate in virtual screening for metalloproteinase inhibitors for all tested methods by 4-16%.

  • Evaluation and application of multiple scoring functions for a virtual screening experiment.
    Xing, Li and Hodgkin, Edward and Liu, Qian and Sedlock, David
    Journal of computer-aided molecular design, 2004, 18(5), 333-344
    PMID: 15595460    
    In order to identify novel chemical classes of factor Xa inhibitors, five scoring functions (FlexX, DOCK, GOLD, ChemScore and PMF) were engaged to evaluate the multiple docking poses generated by FlexX. The compound collection was composed of confirmed potent factor Xa inhibitors and a subset of the LeadQuest screening compound library. Except for PMF the other four scoring functions succeeded in reproducing the crystal complex (PDB code: 1FAX). During virtual screening the highest hit rate (80%) was demonstrated by FlexX at an energy cutoff of -40 kJ/mol, which is about 40-fold over random screening (2.06%). Limited results suggest that presenting more poses of a single molecule to the scoring functions could deteriorate their enrichment factors. A series of promising scaffolds with favorable binding scores was retrieved from LeadQuest. Consensus scoring by pair-wise intersection failed to enrich the hit rate yielded by single scorings (i.e. FlexX). We note that reported successes of consensus scoring in hit rate enrichment could be artificial because their comparisons were based on a selected subset of single scoring and a markedly reduced subset of double or triple scoring. The findings presented in this report are based upon a single biological system and support further studies.
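
    The "about 40-fold" figure in this abstract follows from dividing the reported hit rate by the random hit rate. A minimal worked check, using only the two percentages quoted above (the function name `enrichment_factor` is an illustrative choice, not the authors' terminology for any specific tool):

```python
def enrichment_factor(hit_rate: float, random_hit_rate: float) -> float:
    """Ratio of the observed hit rate to the hit rate expected at random."""
    if random_hit_rate <= 0:
        raise ValueError("random_hit_rate must be positive")
    return hit_rate / random_hit_rate


# Values quoted in the abstract: 80% for FlexX at the -40 kJ/mol cutoff,
# 2.06% for random screening.
ef = enrichment_factor(0.80, 0.0206)
print(f"enrichment over random: {ef:.1f}-fold")
```

    The ratio comes out slightly below 40, consistent with the abstract's rounded statement.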

  • A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance.
    Perola, Emanuele and Walters, W Patrick and Charifson, Paul S
    Proteins, 2004, 56(2), 235-249
    PMID: 15211508     doi: 10.1002/prot.20088
    A thorough evaluation of some of the most advanced docking and scoring methods currently available is described, and guidelines for the choice of an appropriate protocol for docking and virtual screening are defined. The generation of a large and highly curated test set of pharmaceutically relevant protein-ligand complexes with known binding affinities is described, and three highly regarded docking programs (Glide, GOLD, and ICM) are evaluated on the same set with respect to their ability to reproduce crystallographic binding orientations. Glide correctly identified the crystallographic pose within 2.0 Å in 61% of the cases, versus 48% for GOLD and 45% for ICM. In general Glide appears to perform most consistently with respect to diversity of binding sites and ligand flexibility, while the performance of ICM and GOLD is more binding site-dependent and it is significantly poorer when binding is predominantly driven by hydrophobic interactions. The results also show that energy minimization and reranking of the top N poses can be an effective means to overcome some of the limitations of a given docking function. The same docking programs are evaluated in conjunction with three different scoring functions for their ability to discriminate actives from inactives in virtual screening. The evaluation, performed on three different systems (HIV-1 protease, IMPDH, and p38 MAP kinase), confirms that the relative performance of different docking and scoring methods is to some extent binding site-dependent. GlideScore appears to be an effective scoring function for database screening, with consistent performance across several types of binding sites, while ChemScore appears to be most useful in sterically demanding sites since it is more forgiving of repulsive interactions. Energy minimization of docked poses can significantly improve the enrichments in systems with sterically demanding binding sites. Overall Glide appears to be a safe general choice for docking, while the choice of the best scoring tool remains to a larger extent system-dependent and should be evaluated on a case-by-case basis.

  • Comparative evaluation of eight docking tools for docking and virtual screening accuracy.
    Kellenberger, Esther and Rodrigo, Jordi and Muller, Pascal and Rognan, Didier
    Proteins, 2004, 57(2), 225-242
    PMID: 15340911     doi: 10.1002/prot.20149
    Eight docking programs (DOCK, FLEXX, FRED, GLIDE, GOLD, SLIDE, SURFLEX, and QXP) that can be used for either single-ligand docking or database screening have been compared for their propensity to recover the X-ray pose of 100 small-molecular-weight ligands, and for their capacity to discriminate known inhibitors of an enzyme (thymidine kinase) from randomly chosen "drug-like" molecules. Interestingly, both properties are found to be correlated, since the tools showing the best docking accuracy (GLIDE, GOLD, and SURFLEX) are also the most successful in ranking known inhibitors in a virtual screening experiment. Moreover, the current study pinpoints some physicochemical descriptors of either the ligand or its cognate protein-binding site that generally lead to docking/scoring inaccuracies.

  • Evaluation of docking performance: comparative data on docking algorithms.
    Kontoyianni, Maria and McClellan, Laura M and Sokol, Glenn S
    Journal of medicinal chemistry, 2004, 47(3), 558-565
    PMID: 14736237     doi: 10.1021/jm0302997
    Docking molecules into their respective 3D macromolecular targets is a widely used method for lead optimization. However, the best known docking algorithms often fail to position the ligand in an orientation close to the experimental binding mode. It was reported recently that consensus scoring enhances the hit rates in a virtual screening experiment. This methodology focused on the top-ranked pose, with the underlying assumption that the orientation/conformation of the docked compound is the most accurate. In an effort to eliminate the scoring function bias, and assess the ability of the docking algorithms to provide solutions similar to the crystallographic modes, we investigated the most known docking programs and evaluated all of the resultant poses. We present the results of an extensive computational study in which five docking programs (FlexX, DOCK, GOLD, LigandFit, Glide) were investigated against 14 protein families (69 targets). Our findings show that some algorithms perform consistently better than others, and a correspondence between the nature of the active site and the best docking algorithm can be found.

  • Assessment of docking poses: interactions-based accuracy classification (IBAC) versus crystal structure deviations.
    Kroemer, Romano T and Vulpetti, Anna and McDonald, Joseph J and Rohrer, Douglas C and Trosset, Jean-Yves and Giordanetto, Fabrizio and Cotesta, Simona and McMartin, Colin and Kihlén, Mats and Stouten, Pieter F W
    Journal of Chemical Information and Computer Sciences, 2004, 44(3), 871-881
    PMID: 15154752     doi: 10.1021/ci049970m
    Six docking programs (FlexX, GOLD, ICM, LigandFit, the Northwestern University version of DOCK, and QXP) were evaluated in terms of their ability to reproduce experimentally observed binding modes (poses) of small-molecule ligands to macromolecular targets. The accuracy of a pose was assessed in two ways: First, the RMS deviation of the predicted pose from the crystal structure was calculated. Second, the predicted pose was compared to the experimentally observed one regarding the presence of key interactions with the protein. The latter assessment is referred to as interactions-based accuracy classification (IBAC). In a number of cases significant discrepancies were found between IBAC and RMSD-based classifications. Despite being more subjective, the IBAC proved to be a more meaningful measure of docking accuracy in all these cases.

  • Lessons in molecular recognition: the effects of ligand and protein flexibility on molecular docking accuracy.
    Erickson, Jon A and Jalaie, Mehran and Robertson, Daniel H and Lewis, Richard A and Vieth, Michal
    Journal of medicinal chemistry, 2004, 47(1), 45-55
    PMID: 14695819     doi: 10.1021/jm030209y
    The key to success for computational tools used in structure-based drug design is the ability to accurately place or "dock" a ligand in the binding pocket of the target of interest. In this report we examine the effect of several factors on docking accuracy, including ligand and protein flexibility. To examine ligand flexibility in an unbiased fashion, a test set of 41 ligand-protein cocomplex X-ray structures was assembled that represents a diversity of size, flexibility, and polarity with respect to the ligands. Four docking algorithms, DOCK, FlexX, GOLD, and CDOCKER, were applied to the test set, and the results were examined in terms of the ability to reproduce X-ray ligand positions within 2.0 Å heavy atom root-mean-square deviation. Overall, each method performed well (>50% accuracy) but for all methods it was found that docking accuracy decreased substantially for ligands with eight or more rotatable bonds. Only CDOCKER was able to accurately dock most of those ligands with eight or more rotatable bonds (71% accuracy rate). A second test set of structures was gathered to examine how protein flexibility influences docking accuracy. CDOCKER was applied to X-ray structures of trypsin, thrombin, and HIV-1-protease, using protein structures bound to several ligands and also the unbound (apo) form. Docking experiments of each ligand to one "average" structure and to the apo form were carried out, and the results were compared to docking each ligand back to its originating structure. The results show that docking accuracy falls off dramatically if one uses an average or apo structure. In fact, it is shown that the drop in docking accuracy mirrors the degree to which the protein moves upon ligand binding.

  • Multiple active site corrections for docking and virtual screening.
    Vigers, Guy P A and Rizzi, James P
    Journal of medicinal chemistry, 2004, 47(1), 80-89
    PMID: 14695822     doi: 10.1021/jm030161o
    Several docking programs are now available that can reproduce the bound conformation of a ligand in an active site, for a wide variety of experimentally determined complexes. However, these programs generally perform less well at ranking multiple possible ligands in one site. Since accurate identification of potential ligands is a prerequisite for many aspects of structure-based drug design, this is a serious limitation. We have tested the ability of two docking programs, FlexX and Gold, to match ligands and active sites for multiple complexes. We show that none of the docking scores from either program are able to match consistently ligands and active sites in our tests. We propose a simple statistical correction, the multiple active site correction (MASC), which greatly ameliorates this problem. We have also tested the correction method against an extended set of 63 cocrystals and in a virtual screening experiment. In all cases, MASC significantly improves the results of the docking experiments.

  • Automated docking of highly flexible ligands by genetic algorithms: a critical assessment.
    Cecchini, Marco and Kolb, Peter and Majeux, Nicolas and Caflisch, Amedeo
    Journal of computational chemistry, 2004, 25(3), 412-422
    PMID: 14696075     doi: 10.1002/jcc.10384
    An improved version of the fragment-based flexible ligand docking approach SEED-FFLD is tested on inhibitors of human immunodeficiency virus type 1 protease, human alpha-thrombin and the estrogen receptor beta. The docking results indicate that it is possible to correctly reproduce the binding mode of inhibitors with more than ten rotatable bonds if the strain in their covalent geometry upon binding is not large. A high degree of convergence towards a unique binding mode in multiple runs of the genetic algorithm is proposed as a necessary condition for successful docking.

  • FlexX-Scan: Fast, structure-based virtual screening
    Schellhammer, I and Rarey, M
    Proteins, 2004, 57(3), 504-517
    PMID: 15382244     doi: 10.1002/prot.20217
    We present a new software module, FlexX-Scan, for high-throughput, structure-based virtual screening. FlexX-Scan was developed with the aim to further speed up the virtual screening process. Based on the incremental construction docking tool FlexX (Rarey et al., J Mol Biol 1996;261: 470-489), a compact descriptor for representing favorable protein interaction spots within the protein binding site has been developed. The descriptor is calculated using special-purpose clustering techniques applied to the usual interaction points created by FlexX. The algorithm automatically detects a small set of interaction spots in the binding site for positioning ligand functional groups. The parametrizations of the base placement and incremental construction algorithms have been adapted to the new interaction model. We tested the software tool on a diverse set of 200 protein-ligand complexes from the Protein Data Bank (PDB) (Kramer et al., Proteins 1999;37:228-241). On average, the algorithm proposes about 90 interaction spots per binding site compared to about 1000 interaction dots in FlexX. We observe that the docking solutions of FlexX-Scan have a root-mean-square deviation from the crystal structure similar to the deviation of docking solutions of standard FlexX. For further validation we also performed virtual screening experiments for cyclin-dependent kinase 2, thrombin, angiotensin-converting enzyme, and dihydrofolate reductase. In these experiments, we screened a set of 34,000 random compounds and a number of known actives for each target. With FlexX-Scan, we achieved comparable enrichments to standard FlexX, with an averaged computing time of 5-10 s per compound, depending on parametrization.

  • GEMDOCK: A generic evolutionary method for molecular docking
    Yang, JM and Chen, CC
    Proteins, 2004, 55(2), 288-304
    PMID: 15048822     doi: 10.1002/prot.20035
    We have developed an evolutionary approach for flexible ligand docking. This approach, GEMDOCK, uses a Generic Evolutionary Method for molecular DOCKing and an empirical scoring function. The former combines both discrete and continuous global search strategies with local search strategies to speed up convergence, whereas the latter results in rapid recognition of potential ligands. GEMDOCK was tested on a diverse data set of 100 protein-ligand complexes from the Protein Data Bank. In 79% of these complexes, the docked lowest energy ligand structures had root-mean-square deviations (RMSDs) below 2.0 Å with respect to the corresponding crystal structures. The success rate increased to 85% if the structural water molecules were retained. We evaluated GEMDOCK on two cross-docking experiments in which each ligand of a protein ensemble was docked into each protein of the ensemble. Seventy-six percent of the docked structures had RMSDs below 2.0 Å when the ligands were docked into foreign structures. We analyzed and validated GEMDOCK with respect to various search spaces and scoring functions, and found that if the scoring function was perfect, then the predicted accuracy was also essentially perfect. This study suggests that GEMDOCK is a useful tool for molecular recognition and may be used to systematically evaluate and thus improve scoring functions.

  • Glide: A new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening
    Halgren, TA and Murphy, RB and Friesner, RA and Beard, HS and Frye, LL and Pollard, WT and Banks, JL
    Journal of medicinal chemistry, 2004, 47(7), 1750-1759
    PMID: 15027866     doi: 10.1021/jm030644s
    Glide's ability to identify active compounds in a database screen is characterized by applying Glide to a diverse set of nine protein receptors. In many cases, two, or even three, protein sites are employed to probe the sensitivity of the results to the site geometry. To make the database screens as realistic as possible, the screens use sets of "druglike" decoy ligands that have been selected to be representative of what we believe is likely to be found in the compound collection of a pharmaceutical or biotechnology company. Results are presented for releases 1.8, 2.0, and 2.5 of Glide. The comparisons show that average measures for both "early" and "global" enrichment for Glide 2.5 are 3 times higher than for Glide 1.8 and more than 2 times higher than for Glide 2.0 because of better results for the least well-handled screens. This improvement in enrichment stems largely from the better balance of the more widely parametrized GlideScore 2.5 function and the inclusion of terms that penalize ligand-protein interactions that violate established principles of physical chemistry, particularly as it concerns the exposure to solvent of charged protein and ligand groups. Comparisons to results for the thymidine kinase and estrogen receptors published by Rognan and co-workers (J. Med. Chem. 2000, 43, 4759-4767) show that Glide 2.5 performs better than GOLD 1.1, FlexX 1.8, or DOCK 4.01.

  • Glide: A new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy
    Friesner, RA and Banks, JL and Murphy, RB and Halgren, TA and Klicic, JJ and Mainz, DT and Repasky, MP and Knoll, EH and Shelley, M and Perry, JK and Shaw, DE and Francis, P and Shenkin, PS
    Journal of medicinal chemistry, 2004, 47(7), 1739-1749
    PMID: 15027865     doi: 10.1021/jm0306430
    Unlike other methods for docking ligands to the rigid 3D structure of a known protein receptor, Glide approximates a complete systematic search of the conformational, orientational, and positional space of the docked ligand. In this search, an initial rough positioning and scoring phase that dramatically narrows the search space is followed by torsionally flexible energy optimization on an OPLS-AA nonbonded potential grid for a few hundred surviving candidate poses. The very best candidates are further refined via a Monte Carlo sampling of pose conformation; in some cases, this is crucial to obtaining an accurate docked pose. Selection of the best docked pose uses a model energy function that combines empirical and force-field-based terms. Docking accuracy is assessed by redocking ligands from 282 cocrystallized PDB complexes starting from conformationally optimized ligand geometries that bear no memory of the correctly docked pose. Errors in geometry for the top-ranked pose are less than 1 Å in nearly half of the cases and are greater than 2 Å in only about one-third of them. Comparisons to published data on rms deviations show that Glide is nearly twice as accurate as GOLD and more than twice as accurate as FlexX for ligands having up to 20 rotatable bonds. Glide is also found to be more accurate than the recently described Surflex method.

  • Virtual screening using protein-ligand docking: Avoiding artificial enrichment
    Verdonk, ML and Berdini, V and Hartshorn, MJ and Mooij, WTM and Murray, CW and Taylor, RD and Watson, P
    Journal of Chemical Information and Computer Sciences, 2004, 44(3), 793-806
    PMID: 15154744     doi: 10.1021/ci034289q
    This study addresses a number of topical issues around the use of protein-ligand docking in virtual screening. We show that, for the validation of such methods, it is key to use focused libraries (containing compounds with one-dimensional properties, similar to the actives), rather than "random" or "drug-like" libraries to test the actives against. We also show that, to obtain good enrichments, the docking program needs to produce reliable binding modes. We demonstrate how pharmacophores can be used to guide the dockings and improve enrichments, and we compare the performance of three consensus-ranking protocols against ranking based on individual scoring functions. Finally, we show that protein-ligand docking can be an effective aid in the screening for weak, fragment-like binders, which has rapidly become a popular strategy for hit identification. All results presented are based on carefully constructed virtual screening experiments against four targets, using the protein-ligand docking program GOLD.

  • Assessing scoring functions for protein-ligand interactions.
    Ferrara, Philippe and Gohlke, Holger and Price, Daniel J and Klebe, Gerhard and Brooks, Charles L
    Journal of medicinal chemistry, 2004, 47(12), 3032-3047
    PMID: 15163185     doi: 10.1021/jm030489h
    An assessment of nine scoring functions commonly applied in docking using a set of 189 protein-ligand complexes is presented. The scoring functions include the CHARMm potential, the scoring function DrugScore, the scoring function used in AutoDock, the three scoring functions implemented in DOCK, as well as three scoring functions implemented in the CScore module in SYBYL (PMF, Gold, ChemScore). We evaluated the abilities of these scoring functions to recognize near-native configurations among a set of decoys and to rank binding affinities. Binding site decoys were generated by molecular dynamics with restraints. To investigate whether the scoring functions can also be applied for binding site detection, decoys on the protein surface were generated. The influence of the assignment of protonation states was probed by either assigning "standard" protonation states to binding site residues or adjusting protonation states according to experimental evidence. The role of solvation models in conjunction with CHARMm was explored in detail. These include a distance-dependent dielectric function, a generalized Born model, and the Poisson equation. We evaluated the effect of using a rigid receptor on the outcome of docking by generating all-pairs decoys ("cross-decoys") for six trypsin and seven HIV-1 protease complexes. The scoring functions perform well to discriminate near-native from misdocked conformations, with CHARMm, DOCK-energy, DrugScore, ChemScore, and AutoDock yielding recognition rates of around 80%. Significant degradation in performance is observed in going from decoy to cross-decoy recognition for CHARMm in the case of HIV-1 protease, whereas DrugScore and ChemScore, as well as CHARMm in the case of trypsin, show only small deterioration. In contrast, the prediction of binding affinities remains problematic for all of the scoring functions. ChemScore gives the highest correlation value with R(2)


  • Molecular recognition and docking algorithms.
    Brooijmans, Natasja and Kuntz, Irwin D
    Annual review of biophysics and biomolecular structure, 2003, 32, 335-373
    PMID: 12574069     doi: 10.1146/annurev.biophys.32.110601.142532
    Molecular docking is an invaluable tool in modern drug discovery. This review focuses on methodological developments relevant to the field of molecular docking. The forces important in molecular recognition are reviewed and followed by a discussion of how different scoring functions account for these forces. More recent applications of computational chemistry tools involve library design and database screening. Last, we summarize several critical methodological issues that must be addressed in future developments.

  • Comparative evaluation of 11 scoring functions for molecular docking.
    Wang, Renxiao and Lu, Yipin and Wang, Shaomeng
    Journal of medicinal chemistry, 2003, 46(12), 2287-2303
    PMID: 12773034     doi: 10.1021/jm0203783
    Eleven popular scoring functions have been tested on 100 protein-ligand complexes to evaluate their abilities to reproduce experimentally determined structures and binding affinities. They include four scoring functions implemented in the LigFit module in Cerius2 (LigScore, PLP, PMF, and LUDI), four scoring functions implemented in the CScore module in SYBYL (F-Score, G-Score, D-Score, and ChemScore), the scoring function implemented in the AutoDock program, and two stand-alone scoring functions (DrugScore and X-Score). These scoring functions are not tested in the context of a particular docking program. Instead, conformational sampling and scoring are separated into two consecutive steps. First, an exhaustive conformational sampling is performed by using the AutoDock program to generate an ensemble of docked conformations for each ligand molecule. This conformational ensemble is required to cover the entire conformational space as much as possible rather than to focus on a few energy minima. Then, each scoring function is applied to score this conformational ensemble to see if it can identify the experimentally observed conformation from all of the other decoys. Among all of the scoring functions under test, six of them, i.e., PLP, F-Score, LigScore, DrugScore, LUDI, and X-Score, yield success rates higher than the AutoDock scoring function. The success rates of these six scoring functions range from 66% to 76% if using root-mean-square deviation < or

  • Virtual screening to enrich hit lists from high-throughput screening: a case study on small-molecule inhibitors of angiogenin.
    Jenkins, Jeremy L and Kao, Richard Y T and Shapiro, Robert
    Proteins, 2003, 50(1), 81-93
    PMID: 12471601     doi: 10.1002/prot.10270
    "Hit lists" generated by high-throughput screening (HTS) typically contain a large percentage of false positives, making follow-up assays necessary to distinguish active from inactive substances. Here we present a method for improving the accuracy of HTS hit lists by computationally based virtual screening (VS) of the corresponding chemical libraries and selecting hits by HTS/VS consensus. This approach was applied in a case study on the target-enzyme angiogenin, a potent inducer of angiogenesis. In conjunction with HTS of the National Cancer Institute Diversity Set and ChemBridge DIVERSet E (approximately 18,000 compounds total), VS was performed with two flexible library docking/scoring methods, DockVision/Ludi and GOLD. Analysis of the results reveals that dramatic enrichment of the HTS hit rate can be achieved by selecting compounds in consensus with one or both of the VS functions. For example, HTS hits ranked in the top 2% by GOLD included 42% of the true hits, but only 8% of the false positives; this represents a sixfold enrichment over the HTS hit rate. Notably, the HTS/VS method was effective in selecting out inhibitors with midmicromolar dissociation constants typical of leads commonly obtained in primary screens.

  • Pharmacophore-based molecular docking to account for ligand flexibility.
    Joseph-McCarthy, Diane and Thomas, Bert E and Belmarsh, Michael and Moustakas, Demetri and Alvarez, Juan C
    Proteins, 2003, 51(2), 172-188
    PMID: 12660987     doi: 10.1002/prot.10266
    Rapid computational mining of large 3D molecular databases is central to generating new drug leads. Accurate virtual screening of large 3D molecular databases requires consideration of the conformational flexibility of the ligand molecules. Ligand flexibility can be included without prohibitively increasing the search time by docking ensembles of precomputed conformers from a conformationally expanded database. A pharmacophore-based docking method whereby conformers of the same or different molecules are overlaid by their largest 3D pharmacophore and simultaneously docked by partial matches to that pharmacophore is presented. The method is implemented in DOCK 4.0.

  • Discovery of a novel family of CDK inhibitors with the program LIDAEUS: structural basis for ligand-induced disordering of the activation loop.
    Wu, Su Ying and McNae, Iain and Kontopidis, George and McClue, Steven J and McInnes, Campbell and Stewart, Kevin J and Wang, Shudong and Zheleva, Daniella I and Marriage, Howard and Lane, David P and Taylor, Paul and Fischer, Peter M and Walkinshaw, Malcolm D
    Structure (London, England : 1993), 2003, 11(4), 399-410
    PMID: 12679018    
    A family of 4-heteroaryl-2-amino-pyrimidine CDK2 inhibitor lead compounds was discovered with the new database-mining program LIDAEUS through in silico screening. Four compounds with IC(50) values ranging from 17 to 0.9 microM were selected for X-ray crystal analysis. Two distinct binding modes are observed, one of which resembles the hydrogen bonding pattern of bound ATP. In the second binding mode, the ligands trigger a conformational change in the activation T loop by inducing movement of Lys(33) and Asp(145) side chains. The family of molecules discovered provides an excellent starting point for the design and synthesis of tight binding inhibitors, which may lead to a new class of antiproliferative drugs.

  • FDS: flexible ligand and receptor docking with a continuum solvent model and soft-core energy function.
    Taylor, Richard D and Jewsbury, Philip J and Essex, Jonathan W
    Journal of computational chemistry, 2003, 24(13), 1637-1656
    PMID: 12926007     doi: 10.1002/jcc.10295
    The docking of flexible small molecule ligands to large flexible protein targets is addressed in this article using a two-stage simulation-based method. The methodology presented is a hybrid approach where the first component is a dock of the ligand to the protein binding site, based on deriving sets of simultaneously satisfied intermolecular hydrogen bonds using graph theory and a recursive distance geometry algorithm. The output structures are reduced in number by cluster analysis based on distance similarities. These structures are submitted to a modified Monte Carlo algorithm using the AMBER-AA molecular mechanics force field with the Generalized Born/Surface Area (GB/SA) continuum model. This solvent model is not only less expensive than an explicit representation, but also yields increased sampling. Sampling is also increased using a rotamer library to direct some of the protein side-chain movements along with large dihedral moves. Finally, a softening function for the nonbonded force field terms is used, enabling the potential energy function to be slowly turned on throughout the course of the simulation. The docking procedure is optimized, and the results are presented for a single complex of the arabinose binding protein. It was found that for a rigid receptor model, the X-ray binding geometry was reproduced and uniquely identified based on the associated potential energy. However, when side-chain flexibility was included, although the X-ray structure was identified, it was one of three possible binding geometries that were energetically indistinguishable. These results suggest that on relaxing the constraint on receptor flexibility, the docking energy hypersurface changes from being funnel-like to rugged. A further 14 complexes were then examined using the optimized protocol. For each complex the docking methodology was tested for a fully flexible ligand, both with and without protein side-chain flexibility. For the rigid protein docking, 13 out of the 15 test cases were able to find the experimental binding mode; this number was reduced to 11 for the flexible protein docking. However, of these 11, in the majority of cases the experimental binding mode was not uniquely identified, but was present in a cluster of low energy structures that were energetically indistinguishable. These results not only support the presence of a rugged docking energy hypersurface, but also suggest that it may be necessary to consider the possibility of more than one binding conformation during ligand optimization.

  • Surflex: Fully Automatic Flexible Molecular Docking Using a Molecular Similarity-Based Search Engine
    Jain, Ajay N
    Journal of medicinal chemistry, 2003, 46(4), 499-511
    doi: 10.1021/jm020406h

  • Comparative study of several algorithms for flexible ligand docking.
    Bursulaya, Badry D and Totrov, Maxim and Abagyan, Ruben and Brooks, Charles L
    Journal of computer-aided molecular design, 2003, 17(11), 755-763
    PMID: 15072435    
    We have performed a comparative assessment of several programs for flexible molecular docking: DOCK 4.0, FlexX 1.8, AutoDock 3.0, GOLD 1.2 and ICM 2.8. This was accomplished using two different studies: docking experiments on a data set of 37 protein-ligand complexes and screening a library containing 10,037 entries against 11 different proteins. The docking accuracy of the methods was judged based on the corresponding rank-one solutions. We have found that the fraction of molecules docked with acceptable accuracy is 0.47, 0.31, 0.35, 0.52 and 0.93 for, respectively, AutoDock, DOCK, FlexX, GOLD and ICM. Thus ICM provided the highest accuracy in ligand docking against these receptors. The results from the other programs are found to be less accurate and of approximately the same quality. A speed comparison demonstrated that FlexX was the fastest and AutoDock was the slowest among the tested docking programs. The database screening was performed using DOCK, FlexX and ICM. ICM was able to identify the original ligands within the top 1% of the total library in 17 cases. The corresponding number for DOCK and FlexX was 7 and 8, respectively. We have estimated that in virtual database screening, 50% of the potentially active compounds will be found among approximately 1.5% of the top scoring solutions found with ICM and among approximately 9% of the top scoring solutions produced by DOCK and FlexX.

  • LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites
    Venkatachalam, CM and Jiang, X and Oldfield, T and Waldman, M
    Journal of molecular graphics & modelling, 2003, 21(4), 289-307
    PMID: 12479928    
    We present a new shape-based method, LigandFit, for accurately docking ligands into protein active sites. The method employs a cavity detection algorithm for detecting invaginations in the protein as candidate active site regions. A shape comparison filter is combined with a Monte Carlo conformational search for generating ligand poses consistent with the active site shape. Candidate poses are minimized in the context of the active site using a grid-based method for evaluating protein-ligand interaction energies. Errors arising from grid interpolation are dramatically reduced using a new non-linear interpolation scheme. Results are presented for 19 diverse protein-ligand complexes. The method appears quite promising, reproducing the X-ray structure ligand pose within an RMS of 2 Å in 14 out of the 19 complexes. A high-throughput screening study applied to the thymidine kinase receptor is also presented in which LigandFit, when combined with LigScore, an internally developed scoring function [1], yields very good hit rates for a ligand pool seeded with known actives.

  • Detailed analysis of grid-based molecular docking: A case study of CDOCKER-A CHARMm-based MD docking algorithm.
    Wu, Guosheng and Robertson, Daniel H and Brooks, Charles L and Vieth, Michal
    Journal of computational chemistry, 2003, 24(13), 1549-1562
    PMID: 12925999     doi: 10.1002/jcc.10306
    The influence of various factors on the accuracy of protein-ligand docking is examined. The factors investigated include the role of a grid representation of protein-ligand interactions, the initial ligand conformation and orientation, the sampling rate of the energy hyper-surface, and the final minimization. A representative docking method is used to study these factors, namely, CDOCKER, a molecular dynamics (MD) simulated-annealing-based algorithm. A major emphasis in these studies is to compare the relative performance and accuracy of various grid-based approximations to explicit all-atom force field calculations. In these docking studies, the protein is kept rigid while the ligands are treated as fully flexible and a final minimization step is used to refine the docked poses. A docking success rate of 74% is observed when an explicit all-atom representation of the protein (full force field) is used, while a lower accuracy of 66-76% is observed for grid-based methods. All docking experiments considered a 41-member protein-ligand validation set. A significant improvement in accuracy (76 vs. 66%) for the grid-based docking is achieved if the explicit all-atom force field is used in a final minimization step to refine the docking poses. Statistical analysis shows that even lower-accuracy grid-based energy representations can be effectively used when followed with full force field minimization. The results of these grid-based protocols are statistically indistinguishable from the detailed atomic dockings and provide up to a sixfold reduction in computation time. For the test case examined here, improving the docking accuracy did not necessarily enhance the ability to estimate binding affinities using the docked structures.

  • Gaussian docking functions.
    McGann, Mark R and Almond, Harold R and Nicholls, Anthony and Grant, J Andrew and Brown, Frank K
    Biopolymers, 2003, 68(1), 76-90
    PMID: 12579581     doi: 10.1002/bip.10207
    A shape-based Gaussian docking function is constructed which uses Gaussian functions to represent the shapes of individual atoms. A set of 20 trypsin ligand-protein complexes are drawn from the Protein Data Bank (PDB), the ligands are separated from the proteins, and then are docked back into the active sites using numerical optimization of this function. It is found that by employing this docking function, quasi-Newton optimization is capable of moving ligands great distances [on average 7 Å root mean square distance (RMSD)] to locate the correctly docked structure. It is also found that a ligand drawn from one PDB file can be docked into a trypsin structure drawn from any of the trypsin PDB files. This implies that this scoring function is not limited to more accurate x-ray structures, as is the case for many of the conventional docking methods, but could be extended to homology models.

  • Ligand binding: functional site location, similarity and docking
    Campbell, S J and Gold, N D and Jackson, R M
    Current opinion in ..., 2003, 13, 389-395
    ... Similarly, the evidence is that structural (or feature) similarity in the binding sites of proteins will ... The possibility that protein docking methods can also be used for site detection and ... tools for the functional characterisation of ligand-binding sites and for structure- based drug design ...

  • Improved protein-ligand docking using GOLD.
    Verdonk, Marcel L and Cole, Jason C and Hartshorn, Michael J and Murray, Christopher W and Taylor, Richard D
    Proteins, 2003, 52(4), 609-623
    PMID: 12910460     doi: 10.1002/prot.10465
    The Chemscore function was implemented as a scoring function for the protein-ligand docking program GOLD, and its performance compared to the original Goldscore function and two consensus docking protocols, "Goldscore-CS" and "Chemscore-GS," in terms of docking accuracy, prediction of binding affinities, and speed. In the "Goldscore-CS" protocol, dockings produced with the Goldscore function are scored and ranked with the Chemscore function; in the "Chemscore-GS" protocol, dockings produced with the Chemscore function are scored and ranked with the Goldscore function. Comparisons were made for a "clean" set of 224 protein-ligand complexes, and for two subsets of this set, one for which the ligands are "drug-like," the other for which they are "fragment-like." For "drug-like" and "fragment-like" ligands, the docking accuracies obtained with Chemscore and Goldscore functions are similar. For larger ligands, Goldscore gives superior results. Docking with the Chemscore function is up to three times faster than docking with the Goldscore function. Both combined docking protocols give significant improvements in docking accuracy over the use of the Goldscore or Chemscore function alone. "Goldscore-CS" gives success rates of up to 81% (top-ranked GOLD solution within 2.0 Å of the experimental binding mode) for the "clean list," but at the cost of long search times. For most virtual screening applications, "Chemscore-GS" seems optimal; search settings that give docking speeds of around 0.25-1.3 min/compound have success rates of about 78% for "drug-like" compounds and 85% for "fragment-like" compounds. In terms of producing binding energy estimates, the Goldscore function appears to perform better than the Chemscore function and the two consensus protocols, particularly for faster search settings. Even at docking speeds of around 1-2 min/compound, the Goldscore function predicts binding energies with a standard deviation of approximately 10.5 kJ/mol.

  • Automated generation of MCSS-derived pharmacophoric DOCK site points for searching multiconformation databases.
    Joseph-McCarthy, Diane and Alvarez, Juan C
    Proteins, 2003, 51(2), 189-202
    PMID: 12660988     doi: 10.1002/prot.10296
    All docking methods employ some sort of heuristic to orient the ligand molecules into the binding site of the target structure. An automated method, MCSS2SPTS, for generating chemically labeled site points for docking is presented. MCSS2SPTS employs the program Multiple Copy Simultaneous Search (MCSS) to determine target-based theoretical pharmacophores. More specifically, chemically labeled site points are automatically extracted from selected low-energy functional-group minima and clustered together. These pharmacophoric site points can then be directly matched to the pharmacophoric features of database molecules with the use of either DOCK or PhDOCK to place the small molecules into the binding site. Several examples of the ability of MCSS2SPTS to reproduce the three-dimensional pharmacophoric features of ligands from known ligand-protein complex structures are discussed. In addition, a site-point set calculated for one human immunodeficiency virus 1 (HIV1) protease structure is used with PhDOCK to dock a set of HIV1 protease ligands; the docked poses are compared to the corresponding complex structures of the ligands. Finally, the use of an MCSS2SPTS-derived site-point set for acyl carrier protein synthase is compared to the use of atomic positions from a bound ligand as site points for a large-scale DOCK search. In general, MCSS2SPTS-generated site points focus the search on the more relevant areas and thereby allow for more effective sampling of the target site.


  • Virtual screening and fast automated docking methods
    Schneider, Gisbert and Böhm, Hans-Joachim
    Drug discovery today, 2002, 7(1), 64-70
    doi: 10.1016/S1359-6446(01)02091-8
    ... molecules which were identified, optimized or designed using virtual screening methods a. Molecular structure, Activity, Method, Refs. Ca 2+ antagonist (T-channel blocker), Pharmacophore similarity searching, [51]. K + channel (kv 1.5) blocker, Fragment based evolutionary de novo ...

  • Approaches to the description and prediction of the binding affinity of small-molecule ligands to macromolecular receptors.
    Gohlke, Holger and Klebe, Gerhard
    Angewandte Chemie (International ed. in English), 2002, 41(15), 2644-2676
    PMID: 12203463     doi: 10.1002/1521-3773(20020802)41:15<2644::AID-ANIE2644>3.0.CO;2-O
    The influence of a xenobiotic compound on an organism is usually summarized by the expression biological activity. If a controlled, therapeutically relevant, and regulatory action is observed the compound has potential as a drug, otherwise its toxicity on the biological system is of interest. However, what do we understand by the biological activity? In principle, the overall effect on an organism has to be considered. However, because of the complexity of the interrelated processes involved, as a simplification primarily the "main action" on the organism is taken into consideration. On the molecular level, biological activity corresponds to the binding of a (low-molecular weight) compound to a macromolecular receptor, usually a protein. Enzymatic reactions or signal-transduction cascades are thereby influenced with respect to their function for the organism. We regard this binding as a process under equilibrium conditions; thus, binding can be described as an association or dissociation process. Accordingly, biological activity is expressed as the affinity of both partners for each other, as a thermodynamic equilibrium quantity. How well do we understand these terms and how well are they theoretically predictable today? The holy grail of rational drug design is the prediction of the biological activity of a compound. The processes involving ligand binding are extremely complicated, both ligand and protein are flexible molecules, and the energy inventory between the bound and unbound states must be considered in aqueous solution. How sophisticated and reliable are our experimental approaches to obtaining the necessary insight? The present review summarizes our current understanding of the binding affinity of a small-molecule ligand to a protein. Both theoretical and empirical approaches for predicting binding affinity, starting from the three-dimensional structure of a protein-ligand complex, will be described and compared. Experimental methods, primarily microcalorimetry, will be discussed. As a perspective, our own knowledge-based approach towards affinity prediction and experimental data on factorizing binding contributions to protein-ligand binding will be presented.

  • Further development and validation of empirical scoring functions for structure-based binding affinity prediction
    Wang, R and Lai, L
    Journal of computer-aided molecular design, 2002, 16, 11-26
    PMID: 12197663    
    New empirical scoring functions have been developed to estimate the binding affinity of a given protein-ligand complex with known three-dimensional structure. These scoring functions include terms accounting for van der Waals interaction, hydrogen bonding, deformation penalty, and hydrophobic effect. A special feature is that three different algorithms have been implemented to calculate the hydrophobic effect term, which results in three parallel scoring functions. All three scoring functions are calibrated through multivariate regression analysis of a set of 200 protein-ligand complexes and they reproduce the binding free energies of the entire training set with standard deviations of 2.2 kcal/mol, 2.1 kcal/mol, and 2.0 kcal/mol, respectively. These three scoring functions are further combined into a consensus scoring function, X-CSCORE. When tested on an independent set of 30 protein-ligand complexes, X-CSCORE is able to predict their binding free energies with a standard deviation of 2.2 kcal/mol. The potential application of X-CSCORE to molecular docking is also investigated. Our results show that this consensus scoring function improves the docking accuracy considerably when compared to the conventional force field computation used for molecular docking.

  • Simple, intuitive calculations of free energy of binding for protein-ligand complexes. 1. Models without explicit constrained water.
    Cozzini, Pietro and Fornabaio, Micaela and Marabotti, Anna and Abraham, Donald J and Kellogg, Glen E and Mozzarelli, Andrea
    Journal of medicinal chemistry, 2002, 45(12), 2469-2483
    PMID: 12036355    
    The prediction of the binding affinity between a protein and ligands is one of the most challenging issues for computational biochemistry and drug discovery. While the enthalpic contribution to binding is routinely available with molecular mechanics methods, the entropic contribution is more difficult to estimate. We describe and apply a relatively simple and intuitive calculation procedure for estimating the free energy of binding for 53 protein-ligand complexes formed by 17 proteins of known three-dimensional structure and characterized by different active site polarity. HINT, a software model based on experimental LogP(o/w) values for small organic molecules, was used to evaluate and score all atom-atom hydropathic interactions between the protein and the ligands. These total scores (H(TOTAL)), which have been previously shown to correlate with ΔG(interaction) for protein-protein interactions, correlate with ΔG(binding) for protein-ligand complexes in the present study with a standard error of ±2.6 kcal mol⁻¹ from the equation ΔG(binding) ...

  • Q-fit: A probabilistic method for docking molecular fragments by sampling low energy conformational space
    Jackson, RM
    Journal of computer-aided molecular design, 2002, 16(1), 43-57
    PMID: 12197665    
    A new method is presented that docks molecular fragments to a rigid protein receptor. It uses a probabilistic procedure based on statistical thermodynamic principles to place ligand atom triplets at the lowest energy sites. The probabilistic method ranks receptor binding modes so that the lowest energy ones are sampled first. This allows constraints to be introduced to limit the depth of the search leading to a computationally efficient method of sampling low energy conformational space. This is combined with energy minimization of the initial fragment placement to arrive at a low energy conformation for the molecular fragment. Two different search methods are tested involving (i) geometric hashing and (ii) pose clustering methods. Ten molecular fragments were docked that have commonly been used to test docking methods. The success rate was 8/10 and 10/10 for generating a close solution ranked first using the two different sampling procedures. In general, all five of the top ranked solutions reproduce the observed binding mode, which increases confidence in the predictions. A set of ten molecular fragments that have previously been identified as problematic were docked. Success was achieved in 3/10 and 4/10 using the two different methods. Again there is a high level of agreement between the two methods and again in the successful cases the top ranked solutions are correct whilst in the case of the failures none are. The geometric hashing and pose clustering methods are fast, averaging ~13 and ~11 s per placement, respectively, using conservative parameters. The results are very encouraging and will facilitate the process of finding novel small molecule lead compounds by virtual screening of chemical databases.

  • A review of protein-small molecule docking methods.
    Taylor, R D and Jewsbury, P J and Essex, J W
    Journal of computer-aided molecular design, 2002, 16(3), 151-166
    PMID: 12363215    
    The binding of small molecule ligands to large protein targets is central to numerous biological processes. The accurate prediction of the binding modes between the ligand and protein (the docking problem) is of fundamental importance in modern structure-based drug design. An overview of current docking techniques is presented with a description of applications including single docking experiments and the virtual screening of databases.

  • Flexible docking under pharmacophore type constraints.
    Hindle, Sally A and Rarey, Matthias and Buning, Christian and Lengauer, Thomas
    Journal of computer-aided molecular design, 2002, 16(2), 129-149
    PMID: 12188022    
    FLEXX-PHARM, an extended version of the flexible docking tool FLEXX, allows the incorporation of information about important characteristics of protein-ligand binding modes into a docking calculation. This information is introduced as a simple set of constraints derived from receptor-based type pharmacophore features. The constraints are determined by selected FLEXX interactions and inclusion volumes in the receptor active site. They guide the docking process to produce a set of docking solutions with particular properties. By applying a series of look-ahead checks during the flexible construction of ligand fragments within the active site, FLEXX-PHARM determines which partially built docking solutions can potentially obey the constraints. Solutions that will not obey the constraints are deleted as early as possible, often decreasing the calculation time and enabling new docking solutions to emerge. FLEXX-PHARM was evaluated on various individual protein-ligand complexes where the top docking solutions generated by FLEXX had high root mean square deviations (RMSD) from the experimentally observed binding modes. FLEXX-PHARM showed an improvement in the RMSD of the top solutions in most cases, along with a reduction in run time. We also tested FLEXX-PHARM as a database screening tool on a small dataset of molecules for three target proteins. In two cases, FLEXX-PHARM missed one or two of the active molecules due to the constraints selected. However, in general FLEXX-PHARM maintained or improved the enrichment shown with FLEXX, while completing the screen in considerably less run time.

  • Consensus scoring for ligand/protein interactions.
    Clark, Robert D and Strizhev, Alexander and Leonard, Joseph M and Blake, James F and Matthew, James B
    Journal of molecular graphics & modelling, 2002, 20(4), 281-295
    PMID: 11858637    
    Several different functions have been put forward for evaluating the energetics of ligand binding to proteins. Those employed in the DOCK, GOLD and FlexX docking programs have been especially widely used, particularly in connection with virtual high-throughput screening (vHTS) projects. Until recently, such evaluation functions were usually considered only in conjunction with the docking programs that relied on them. In such studies, the evaluation function in question actually fills two distinct roles: it serves as the objective function being optimized (fitness function), but is also the scoring function used to compare the candidate docking configurations generated by the program. We have used descriptions available in the open literature to create free-standing scoring functions based on those used in DOCK and GOLD, and have implemented the more recently formulated PMF [J. Med. Chem. 42 (1999) 791] scoring function as well. The performance of these functions was examined individually for each of several data sets for which both crystal structures and affinities are available, as was the performance of the FlexX scoring function. Various ways of combining individual scores into a consensus score (CScore) were also considered. The individual and consensus scores were also used to try to pick out configurations most similar to those found in crystal structures from among a set of candidate configurations produced by FlexX docking runs. We find that the reliability and interpretability of results can be improved by combining results from all four functions into a CScore.

  • A new test set for validating predictions of protein-ligand interaction.
    Nissink, J Willem M and Murray, Chris and Hartshorn, Mike and Verdonk, Marcel L and Cole, Jason C and Taylor, Robin
    Proteins, 2002, 49(4), 457-471
    PMID: 12402356     doi: 10.1002/prot.10232
    We present a large test set of protein-ligand complexes for the purpose of validating algorithms that rely on the prediction of protein-ligand interactions. The set consists of 305 complexes with protonation states assigned by manual inspection. The following checks have been carried out to identify unsuitable entries in this set: (1) assessing the involvement of crystallographically related protein units in ligand binding; (2) identification of bad clashes between protein side chains and ligand; and (3) assessment of structural errors, and/or inconsistency of ligand placement with crystal structure electron density. In addition, the set has been pruned to assure diversity in terms of protein-ligand structures, and subsets are supplied for different protein-structure resolution ranges. A classification of the set by protein type is available. As an illustration, validation results are shown for GOLD and SuperStar. GOLD is a program that performs flexible protein-ligand docking, and SuperStar is used for the prediction of favorable interaction sites in proteins. The new CCDC/Astex test set is freely available to the scientific community (

  • Protein-ligand recognition using spherical harmonic molecular surfaces: towards a fast and efficient filter for large virtual throughput screening.
    Cai, Wensheng and Shao, Xueguang and Maigret, Bernard
    Journal of molecular graphics & modelling, 2002, 20(4), 313-328
    PMID: 11858640    
    Molecular surfaces are important because surface-shape complementarity is often a necessary condition in protein-ligand interactions and docking studies. We have previously described a fast and efficient method to obtain triangulated surface-meshes by topologically mapping ellipsoids on molecular surfaces. In this paper, we present an extension of our work to spherical harmonic surfaces in order to approximate molecular surfaces of both ligands and receptor-cavities and to easily check the surface-shape complementarity. The method consists of (1) finding lobes and holes on both ligand and cavity surfaces using contour maps of radius functions with spherical harmonic expansions, and (2) superposing the surfaces around a given binding site by minimizing the distance between their respective expansion coefficients. The capabilities of this docking procedure were demonstrated by application to 35 protein-ligand complexes of known crystal structures. The method can also be easily and efficiently used as a filter to detect in a large conformational sampling the possible conformations presenting good complementarity with the receptor site, and being, therefore, good candidates for further more elaborate docking studies. This "virtual screening" was demonstrated on the platelet thrombin receptor.

  • Automated docking to multiple target structures: incorporation of protein mobility and structural water heterogeneity in AutoDock.
    Osterberg, Fredrik and Morris, Garrett M and Sanner, Michel F and Olson, Arthur J and Goodsell, David S
    Proteins, 2002, 46(1), 34-40
    PMID: 11746701    
    Protein motion and heterogeneity of structural waters are approximated in ligand-docking simulations, using an ensemble of protein structures. Four methods of combining multiple target structures within a single grid-based lookup table of interaction energies are tested. The method is evaluated using complexes of 21 peptidomimetic inhibitors with human immunodeficiency virus type 1 (HIV-1) protease. Several of these structures show motion of an arginine residue, which is essential for binding of large inhibitors. A structural water is also present in 20 of the structures, but it must be absent in the remaining one for proper binding. Mean and minimum methods perform poorly, but two weighted average methods permit consistent and accurate ligand docking, using a single grid representation of the target protein structures.


  • Evaluation of docking functions for protein-ligand docking
    Perez, C and Ortiz, AR
    Journal of medicinal chemistry, 2001, 44(23), 3768-3785
    doi: 10.1021/jm010141r
    Docking functions are believed to be the essential component of docking algorithms. Both physically and statistically based functions have been proposed, but there is no consensus about their relative performances. Here, we propose an evaluation approach based on exhaustive enumeration of all possible docking solutions obtained with a discretized description of a rigid docking process. We apply the approach to study both molecular mechanics and statistical potentials. It is found that the statistical potential evaluated is less effective than the AMBER molecular mechanics function to provide an accurate description of the docking process when the exact experimental coordinates are used. However, when coordinates of crystal structures obtained with analogous ligands are used, similar performances are obtained in both cases. Possible reasons for the successes and failures of both docking schemes have been uncovered using linear discriminant analysis, on the basis of a set of physicochemical descriptors capturing the main physical effects at play during protein-ligand docking. In both types of potentials steric effects appear critical to obtain a successful docking. Our results also indicate that neglecting desolvation effects and the explicit treatment of hydrogen bonds are the main source of the failures observed with the molecular mechanics potential. On the other hand, detailed consideration of steric interactions, with a careful treatment of dispersive forces, seems to be needed when using statistical potentials derived from a structural database. The possibility of filtering combinatorial libraries in order to maximize the probability of correct docking is discussed.

  • Detailed analysis of scoring functions for virtual screening.
    Stahl, M and Rarey, M
    Journal of medicinal chemistry, 2001, 44(7), 1035-1042
    PMID: 11297450    
    We present a comprehensive study of the performance of fast scoring functions for library docking using the program FlexX as the docking engine. Four scoring functions, among them two recently developed knowledge-based potentials, are evaluated on seven target proteins whose binding sites represent a wide range of size, form, and polarity. The results of these calculations give valuable insight into strengths and weaknesses of current scoring functions. Furthermore, it is shown that a well-chosen combination of two of the tested scoring functions leads to a new, robust scoring scheme with superior performance in virtual screening.

  • High throughput docking for library design and library prioritization.
    Diller, D J and Merz, K M
    Proteins, 2001, 43(2), 113-124
    PMID: 11276081    
    The prioritization of the screening of combinatorial libraries is an extremely important task for the rapid identification of tight binding ligands and ultimately pharmaceutical compounds. When structural information for the target is available, molecular docking is an approach that can be used for prioritization. Here, we present the initial validation of a new rapid approach to molecular docking developed for prioritizing combinatorial libraries. The algorithm is tested on 103 individual cases from the protein data bank and in nearly 90% of these cases docks the ligand to within 2.0 Å of the observed binding mode. Because the mean CPU time is <5 s/mol, this approach can process hundreds of thousands of compounds per week. Furthermore, if a somewhat less thorough search is performed, the search time drops to 1 s/mol, thus allowing millions of compounds to be docked per week and tested for potential activity.

  • Docking ligands onto binding site representations derived from proteins built by homology modelling.
    Schafferhans, A and Klebe, G
    Journal of molecular biology, 2001, 307(1), 407-427
    PMID: 11243828     doi: 10.1006/jmbi.2000.4453
    Due to the abundant sequence information available from genome projects, an increasing number of structurally unknown proteins, homologous to examples of known 3D structure, will be discovered as new targets for drug design. Since homology models do not provide sufficient accuracy to apply common drug design tools, a new approach, DragHome, has been developed to dock ligands into such approximate protein models. DragHome combines information from homology modelling with ligand data, used by and derived from 3D quantitative structure-activity relationships (QSAR). The binding-site of a model-built protein is analysed in terms of putative ligand interaction sites and translated via Gaussian functions into a functional binding-site description represented by physico-chemical properties. Ligands to be docked onto these binding-site representations are similarly translated into a description based on Gaussian functions. The docking is computed by optimising the overlap between the functional description of the binding site and the ligand, generating multiple solutions. For a set of different ligands, these solutions are ranked according to the internal similarity consistency among the various ligands in the binding modes obtained from docking. DragHome has been validated on examples for which crystal structures are available: structurally distinct thrombin inhibitors were docked onto models of thrombin generated from serine proteases of 28 to 40 % sequence identity, yielding ligand binding modes with an average RMS deviation of 1.4 Å. Mostly the near-native solutions are ranked best. Molecular flexibility of ligands can be considered in terms of pre-calculated multiple conformers. DragHome has been used to automatically generate an alignment of 88 thrombin inhibitors, for which a significant 3D QSAR model could be derived. The contribution maps resulting from this analysis can be interpreted with respect to the surrounding protein model. They highlight inconsistencies and deficiencies present in the model. In future developments, this information could be fed back into a subsequent modelling step to improve the protein model.

  • EUDOC: A computer program for identification of drug interaction sites in macromolecules and drug leads from chemical databases
    Pang, YP and Perola, E and Xu, K and Prendergast, FG
    Journal of computational chemistry, 2001, 22(15), 1750-1771
    PMID: 12116409     doi: 10.1002/jcc.1129
    The completion of the Human Genome Project, the growing effort on proteomics, and the Structural Genomics Initiative have recently intensified the attention being paid to reliable computer docking programs able to identify molecules that can affect the function of a macromolecule through molecular complexation. We report herein an automated computer docking program, EUDOC, for prediction of ligand-receptor complexes from 3D receptor structures, including metalloproteins, and for identification of a subset enriched in drug leads from chemical databases. This program was evaluated from the standpoints of force field and sampling issues using 154 experimentally determined ligand-receptor complexes and four "real-life" applications of the EUDOC program. The results provide evidence for the reliability and accuracy of the EUDOC program. In addition, key principles underlying molecular recognition, and the effects of structural water molecules in the active site and different atomic charge models on docking results are discussed.

  • FlexE: efficient molecular docking considering protein structure variations.
    Claussen, H and Buning, C and Rarey, M and Lengauer, T
    Journal of molecular biology, 2001, 308(2), 377-395
    PMID: 11327774     doi: 10.1006/jmbi.2001.4551
    Side-chain or even backbone adjustments upon docking of different ligands to the same protein structure, a phenomenon known as induced fit, are frequently observed. Sometimes point mutations within the active site influence the ligand binding of proteins. Furthermore, for homology derived protein structures there are often ambiguities in side-chain placement and uncertainties in loop modeling which may be critical for docking applications. Nevertheless, only very few molecular docking approaches have taken into account such variations in protein structures. We present the new software tool FlexE which addresses the problem of protein structure variations during docking calculations. FlexE can dock flexible ligands into an ensemble of protein structures which represents the flexibility, point mutations, or alternative models of a protein. The FlexE approach is based on a united protein description generated from the superimposed structures of the ensemble. For varying parts of the protein, discrete alternative conformations are explicitly taken into account, which can be combinatorially joined to create new valid protein structures. FlexE was evaluated using ten protein structure ensembles containing 105 crystal structures from the PDB and one modeled structure with 60 ligands in total. For 50 ligands (83 %) FlexE finds a placement with an RMSD to the crystal structure below 2.0 Å. In all cases our results are of similar quality to the best solution obtained by sequentially docking the ligands into all protein structures (cross docking). In most cases the computing time is significantly lower than the accumulated run times for the single structures. FlexE takes about five and a half minutes on average for placing one ligand into the united protein description on a common workstation. The example of the aldose reductase demonstrates the necessity of considering protein structure variations for docking calculations. We docked three potent inhibitors into four protein structures with substantial conformational changes within the active site. Using only one rigid protein structure for screening would have missed potential inhibitors whereas all inhibitors can be docked taking all protein structures into account.

  • DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases.
    Ewing, T J and Makino, S and Skillman, A G and Kuntz, I D
    Journal of computer-aided molecular design, 2001, 15(5), 411-428
    PMID: 11394736    
    In this paper we describe the search strategies developed for docking flexible molecules to macromolecular sites that are incorporated into the widely distributed DOCK software, version 4.0. The search strategies include incremental construction and random conformation search and utilize the existing Coulombic and Lennard-Jones grid-based scoring function. The incremental construction strategy is tested with a panel of 15 crystallographic testcases, created from 12 unique complexes whose ligands vary in size and flexibility. For all testcases, at least one docked position is generated within 2 A of the crystallographic position. For 7 of 15 testcases, the top scoring position is also within 2 A of the crystallographic position. The algorithm is fast enough to successfully dock a few testcases within seconds and most within 100 s. The incremental construction and the random search strategy are evaluated as database docking techniques with a database of 51 molecules docked to two of the crystallographic testcases. Incremental construction outperforms random search and is fast enough to reliably rank the database of compounds within 15 s per molecule on an SGI R10000 cpu.
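
    The "within 2 A of the crystallographic position" criterion used here, and in several other entries of this list, can be illustrated with a minimal sketch. It assumes that the docked pose and the crystal pose list the same atoms in the same order and share the receptor coordinate frame, so no superposition is applied and symmetry-equivalent atoms are not handled. The function names and the default cutoff are illustrative and are not taken from DOCK.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared per-coordinate differences over all matched atoms.
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_docking_success(pose, crystal, cutoff=2.0):
    """True if the pose lies within `cutoff` Angstrom RMSD of the crystal pose (assumed threshold)."""
    return rmsd(pose, crystal) < cutoff
```

    Published evaluations usually restrict the sum to non-hydrogen ligand atoms; that filtering is left to the caller in this sketch.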


  • A method for including protein flexibility in protein-ligand docking: improving tools for database mining and virtual screening.
    Broughton, H B
    Journal of molecular graphics & modelling, 2000, 18(3), 247-57, 302-4
    PMID: 11021541    
    Second-generation methods for docking ligands into their biological receptors, such as FLOG, provide for flexibility of the ligand but not of the receptor. Molecular dynamics based methods, such as free energy perturbation, account for flexibility, solvent effects, etc., but are very time consuming. We combined the use of statistical analysis of conformational samples from short-run protein molecular dynamics with grid-based docking protocols and demonstrated improved performance in two test cases. Our statistical analysis explores the importance of the average strength of a potential interaction with the biological target and optionally applies a weighting depending on the variability in the strength of the interaction seen during dynamics simulation. Using these methods, we improved the num-top-ranked 10% of a database of drug-like molecules, in searches based on the three-dimensional structure of the protein. These methods are able to match the ability of manual docking to assess likely inactivity on steric grounds and indeed to rank order ligands from a homologous series of cyclooxygenase-2 inhibitors with good correlation to their true activity. Furthermore, these methods reduce the need for human intervention in setting up molecular docking experiments.

  • Knowledge-based scoring function to predict protein-ligand interactions.
    Gohlke, H and Hendlich, M and Klebe, G
    Journal of molecular biology, 2000, 295(2), 337-356
    PMID: 10623530     doi: 10.1006/jmbi.1999.3371
    The development and validation of a new knowledge-based scoring function (DrugScore) to describe the binding geometry of ligands in proteins is presented. It discriminates efficiently between well-docked ligand binding modes (root-mean-square deviation <2.0 A with respect to a crystallographically determined reference complex) and those largely deviating from the native structure, e.g. generated by computer docking programs. Structural information is extracted from crystallographically determined protein-ligand complexes using ReLiBase and converted into distance-dependent pair-preferences and solvent-accessible surface (SAS) dependent singlet preferences for protein and ligand atoms. Definition of an appropriate reference state and accounting for inaccuracies inherently present in experimental data is required to achieve good predictive power. The sum of the pair preferences and the singlet preferences is calculated based on the 3D structure of protein-ligand binding modes generated by docking tools. For two test sets of 91 and 68 protein-ligand complexes, taken from the Protein Data Bank (PDB), the calculated score recognizes poses generated by FlexX deviating <2 A from the crystal structure on rank 1 in three quarters of all possible cases. Compared to FlexX, this is a substantial improvement. For ligand geometries generated by DOCK, DrugScore is superior to the "chemical scoring" implemented into this tool, while comparable results are obtained using the "energy scoring" in DOCK. None of the presently known scoring functions achieves comparable power to extract binding modes in agreement with experiment. It is fast to compute, regards implicitly solvation and entropy contributions and produces correctly the geometry of directional interactions. Small deviations in the 3D structure are tolerated and, since only contacts to non-hydrogen atoms are regarded, it is independent from assumptions of protonation states.

  • Similarity-driven flexible ligand docking.
    Fradera, X and Knegtel, R M and Mestres, J
    Proteins, 2000, 40(4), 623-636
    PMID: 10899786    
    A similarity-driven approach to flexible ligand docking is presented. Given a reference ligand or a pharmacophore positioned in the protein active site, the method allows inclusion of a similarity term during docking. Two different algorithms have been implemented, namely, a similarity-penalized docking (SP-DOCK) and a similarity-guided docking (SG-DOCK). The basic idea is to maximally exploit the structural information about the ligand binding mode present in cases where ligand-bound protein structures are available, information that is usually ignored in standard docking procedures. SP-DOCK and SG-DOCK have been derived as modified versions of the program DOCK 4.0, where the similarity program MIMIC acts as a module for the calculation of similarity indices that correct docking energy scores at certain steps of the calculation. SP-DOCK applies similarity corrections to the set of ligand orientations at the end of the ligand incremental construction process, penalizing the docking energy and, thus, having only an effect on the relative ordering of the final solutions. SG-DOCK applies similarity corrections throughout the entire ligand incremental construction process, thus affecting not only the relative ordering of solutions but also actively guiding the ligand docking. The performance of SP-DOCK and SG-DOCK for binding mode assessment and molecular database screening is discussed. When applied to a set of 32 thrombin ligands for which crystal structures are available, SG-DOCK improves the average RMSD by ca. 1 A when compared with DOCK. When those 32 thrombin ligands are included into a set of 1,000 diverse molecules from the ACD, DIV, and WDI databases, SP-DOCK significantly improves the retrieval of thrombin ligands within the first 10% of each of the three databases with respect to DOCK, with minimal additional computational cost. In all cases, comparison of SP-DOCK and SG-DOCK results with those obtained by DOCK and MIMIC is performed.

  • DoMCoSAR: a novel approach for establishing the docking mode that is consistent with the structure-activity relationship. Application to HIV-1 protease inhibitors and VEGF receptor tyrosine kinase inhibitors.
    Vieth, M and Cummins, D J
    Journal of medicinal chemistry, 2000, 43(16), 3020-3032
    PMID: 10956210    
    DoMCoSAR is a novel approach for statistically determining the docking mode that is consistent with a structure-activity relationship. The approach establishes the binding mode for the compounds in a chemical series with the assumption that all molecules exhibit the same binding mode. It involves three stages. In the first stage all molecules that belong to a given chemical series are docked to the active site of the protein target. The only bias used in the docking at this stage involves the location of the protein binding site. Coordinates of the common substructure (CS) that results from the unbiased docking are then clustered to establish the major substructure docking modes. In the second stage all molecules are docked to the major docking modes (MDMs) with constraints based on the common substructure. The third stage generates, for the major docking modes, interaction-based descriptors that include electrostatic, VDW, strain, and solvation contributions. The problem of docking mode evaluation is now reduced to the question of which descriptor set is more predictive. To establish a quantitative comparison of the descriptor sets associated with the major docking modes, we use 50 instances of random 4-fold cross-validation. For each 4-fold cross-validation the predictive squared correlation coefficient (R(2)) is computed. t-Tests are applied to establish significance of the differences in mean R(2) for one docking mode versus another. We test the methodology on two test cases: HIV-1 protease inhibitors (Holloway et al. J. Med. Chem. 1995, 38, 305-317) and vascular endothelial growth factor (VEGF) receptor tyrosine kinase oxoindoles (Sun et al. J. Med. Chem. 1998, 41, 2588-2603). For both test cases there is statistically significant preference for the binding mode consistent with the X-ray structure. The appeal of this methodology is that researchers gain the objectivity of statistical justification for the selected docking mode. The methodology is relatively insensitive to subtle variations of the protein structure that include, but are not limited to, side chain and small backbone rearrangement during binding. In addition, predictive models that result from the approach can be used to further optimize chemical series.

  • DARWIN: a program for docking flexible molecules.
    Taylor, J S and Burnett, R M
    Proteins, 2000, 41(2), 173-191
    PMID: 10966571    
    A new program named "DARWIN" has been developed to perform docking calculations with proteins and other biological molecules. The program uses the Genetic Algorithm to optimize the molecule's conformation and orientation under the selective pressure of minimizing the potential energy of the complex. A unique feature of DARWIN is that it communicates with the molecular mechanics program CHARMM to make the energy calculations. A second important feature is its parallel interface, which allows simultaneous use of multiple stand-alone copies of CHARMM to rapidly evaluate large numbers of potential solutions. This permits an "accuracy first" approach to docking, which avoids many of the common assumptions and shortcuts often made to reduce computation time. The method was applied to three protein-carbohydrate complexes: the crystallographically determined structures of Concanavalin A and Fab Se155-4; and a model structure for Fab ME36.1. Conformations close to the crystal structures were obtained with this approach, but some "false positive" solutions were also selected. Many of these could be eliminated by introducing different methods for simulating solvent effects. An effective screening method for docking a database of compounds to a single target enzyme using DARWIN is also presented.

  • Protein-based virtual screening of chemical databases. 1. Evaluation of different docking/scoring combinations.
    Bissantz, C and Folkers, G and Rognan, D
    Journal of medicinal chemistry, 2000, 43(25), 4759-4767
    PMID: 11123984    
    Three different database docking programs (Dock, FlexX, Gold) have been used in combination with seven scoring functions (Chemscore, Dock, FlexX, Fresno, Gold, Pmf, Score) to assess the accuracy of virtual screening methods against two protein targets (thymidine kinase, estrogen receptor) of known three-dimensional structure. For both targets, it was generally possible to discriminate about 7 out of 10 true hits from a random database of 990 ligands. The use of consensus lists common to two or three scoring functions clearly enhances hit rates among the top 5% scorers from 10% (single scoring) to 25-40% (double scoring) and up to 65-70% (triple scoring). However, in all tested cases, no clear relationships could be found between docking and ranking accuracies. Moreover, predicting the absolute binding free energy of true hits was not possible whatever docking accuracy was achieved and scoring function used. As the best docking/consensus scoring combination varies with the selected target and the physicochemistry of target-ligand interactions, we propose a two-step protocol for screening large databases: (i) screening of a reduced dataset containing a few known ligands for deriving the optimal docking/consensus scoring scheme, (ii) applying the latter parameters to the screening of the entire database.


  • A general and fast scoring function for protein-ligand interactions: a simplified potential approach.
    Muegge, I and Martin, Y C
    Journal of medicinal chemistry, 1999, 42(5), 791-804
    PMID: 10072678     doi: 10.1021/jm980536j
    A fast, simplified potential-based approach is presented that estimates the protein-ligand binding affinity based on the given 3D structure of a protein-ligand complex. This general, knowledge-based approach exploits structural information of known protein-ligand complexes extracted from the Brookhaven Protein Data Bank and converts it into distance-dependent Helmholtz free interaction energies of protein-ligand atom pairs (potentials of mean force, PMF). The definition of an appropriate reference state and the introduction of a correction term accounting for the volume taken by the ligand were found to be crucial for deriving the relevant interaction potentials that treat solvation and entropic contributions implicitly. A significant correlation between experimental binding affinities and computed score was found for sets of diverse protein-ligand complexes and for sets of different ligands bound to the same target. For 77 protein-ligand complexes taken from the Brookhaven Protein Data Bank, the calculated score showed a standard deviation from observed binding affinities of 1.8 log Ki units and an R2 value of 0.61. The best results were obtained for the subset of 16 serine protease complexes with a standard deviation of 1.0 log Ki unit and an R2 value of 0.86. A set of 33 inhibitors modeled into a crystal structure of HIV-1 protease yielded a standard deviation of 0.8 log Ki units from measured inhibition constants and an R2 value of 0.74. In contrast to empirical scoring functions that show similar or sometimes better correlation with observed binding affinities, our method does not involve deriving specific parameters that fit the observed binding affinities of protein-ligand complexes of a given training set. We compared the performance of the PMF score, Böhm's score (LUDI), and the SMOG score for eight different test sets of protein-ligand complexes. It was found that for the majority of test sets the PMF score performs best. The strength of the new approach presented here lies in its generality as no knowledge about measured binding affinities is needed to derive atomic interaction potentials. The use of the new scoring function in docking studies is outlined.
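
    The conversion of observed pair distances into distance-dependent potentials can be sketched as a Boltzmann inversion, A(r) = -kT ln(rho(r) / rho_bulk). The sketch below is a simplified illustration for a single atom-pair type. It assumes the reference state is the mean pair density inside the cutoff sphere, and it omits the ligand volume correction term that the abstract describes as crucial. The function names, the binning scheme, and the kT value are assumptions, not the published parameterisation.

```python
import math

KT = 0.593  # k_B * T in kcal/mol near 298 K (assumed temperature)


def pair_potential(counts, bin_width, kt=KT):
    """Boltzmann-invert a distance histogram for one protein-ligand atom-pair type.

    counts[i] is the number of pairs observed at distances in
    [i * bin_width, (i + 1) * bin_width). Returns one potential value per bin,
    or None for bins without observations.
    """
    n_bins = len(counts)
    # Volume of each spherical shell, so that counts become number densities.
    shell_volumes = [
        4.0 / 3.0 * math.pi * (((i + 1) * bin_width) ** 3 - (i * bin_width) ** 3)
        for i in range(n_bins)
    ]
    bulk_density = sum(counts) / sum(shell_volumes)  # assumed reference state
    potential = []
    for count, volume in zip(counts, shell_volumes):
        if count == 0:
            potential.append(None)  # no statistics for this bin
        else:
            potential.append(-kt * math.log((count / volume) / bulk_density))
    return potential


def pmf_score(distances, potential, bin_width):
    """Sum the tabulated potential over the observed pair distances of one complex."""
    total = 0.0
    for distance in distances:
        index = int(distance // bin_width)
        if index < len(potential) and potential[index] is not None:
            total += potential[index]
    return total
```

    A full scoring function would hold one such table per protein-ligand atom-type pair and sum over all pairs within the cutoff.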

  • BLEEP - potential of mean force describing protein-ligand interactions: II. Calculation of binding energies and comparison with experimental data
    Alex, A and Forster, MJ and Thornton, JM
    Journal of computational chemistry, 1999, 20(11), 1177-1185
    We have developed BLEEP (biomolecular ligand energy evaluation protocol), an atomic level potential of mean force (PMF) describing protein-ligand interactions. Here, we present four tests designed to assess different attributes of BLEEP. Calculating the energy of a small hydrogen-bonded complex allows us to compare BLEEP's description of this system with a quantum-chemical description. The results suggest that BLEEP gives an adequate description of hydrogen bonding. A study of the relative energies of various heparin binding geometries for human basic fibroblast growth factor (bFGF) demonstrates that BLEEP performs excellently in identifying low-energy binding modes from decoy conformations for a given protein-ligand complex. We also calculate binding energies for a set of 90 protein-ligand complexes, obtaining a correlation coefficient of 0.74 when compared with experiment. This shows that BLEEP can perform well in the difficult area of ranking the interaction energies of diverse complexes. We also study a set of nine serine proteinase-inhibitor complexes; BLEEP's good performance here illustrates its ability to determine the relative energies of a series of similar complexes. We find that a protocol for incorporating solvation does not improve correlation with experiment.

  • The sensitivity of the results of molecular docking to induced fit effects: application to thrombin, thermolysin and neuraminidase.
    Murray, C W and Baxter, C A and Frenkel, A D
    Journal of computer-aided molecular design, 1999, 13(6), 547-562
    PMID: 10584214    
    This paper describes the application of PRO_LEADS to the flexible docking of ligands into crystallographically derived enzyme structures that are assumed to be rigid. PRO_LEADS uses a Tabu search methodology to perform the flexible search and an empirically derived estimate of the binding affinity to drive the docking process. The paper tests the extent to which the assumption of a rigid enzyme compromises the accuracy of the results. All-pairs docking experiments are performed for three enzymes (thrombin, thermolysin and influenza virus neuraminidase) based on six or more ligand-enzyme crystal structures for each enzyme. In 76% of the cases, PRO_LEADS can successfully identify the correct ligand conformation as the lowest energy configuration when the enzyme structure is derived from that ligand's crystal structure, but the methodology only docks 49% of the cases successfully when the ligand is docked against enzyme crystal structures derived from other ligands. Small movements in the enzyme structure lead to an under-prediction in the energy of the correct binding mode by up to 14 kJ/mol and in some cases this under-prediction can lead to the native mode not being recognised as the lowest energy solution. The type of movements responsible for mis-docking are: the movement of sidechains as a result of changes in C alpha position; the movement of sidechains without changes in C alpha position; the movement of flexible portions of main chains to facilitate the formation of hydrogen bonds; and the movement of metal atoms bound to the enzyme active site. The work illustrates that the assumption of a rigid active site can lead to errors in identification of the correct binding mode and the assessment of binding affinity, even for enzymes which show relatively small shift in atomic positions from one ligand to the next. A good docking code, such as PRO_LEADS, can usually dock successfully if there is induced fit in relatively rigid enzymes but there remains the need to develop improved strategies for dealing with enzyme flexibility. The work implies that treatments of enzyme flexibility which focus only on sidechain rotations will not deal with the critical shifts responsible for mis-docking of ligands in thrombin, thermolysin and neuraminidase. The paper demonstrates the utility of all pairs docking experiments as a method of assessing the effectiveness of docking methodologies in dealing with enzyme flexibility.

  • PRODOCK: Software package for protein modeling and docking
    Trosset, JY and Scheraga, HA
    Journal of computational chemistry, 1999, 20(4), 412-427
    A new software package, PRODOCK, for protein modeling and flexible docking is presented. The protein system is described in internal coordinates with an arbitrary level of flexibility for the proteins or ligands. The protein is represented by an all-atom model with the ECEPP/3 or AMBER IV force field, depending on whether the ligand is a peptidic molecule or not. PRODOCK is based on a new residue data dictionary that makes the programming easier and the definition of molecular flexibility more straightforward. Two versions of the dictionary have been constructed for the ECEPP/3 and AMBER IV geometry, respectively. The global optimization of the energy function is carried out with the scaled collective variable Monte Carlo method plus energy minimization. The incorporation of a local minimization during the conformational sampling has been shown to be very important for distinguishing low-energy nonnative conformations from native structures. To make the Monte Carlo minimization method efficient for docking, a new grid-based energy evaluation technique using Bezier splines has been incorporated. This article includes some techniques and simulation tools that significantly improve the efficiency of flexible docking simulations, in particular forward/backward polypeptide chain generation. A comparative study to illustrate the advantage of using quaternions over Euler angles for the rigid-body rotational variables is presented in this paper. Several applications of the program PRODOCK are also discussed. (C) 1999 John Wiley & Sons, Inc.

  • The particle concept: placing discrete water molecules during protein-ligand docking predictions.
    Rarey, M and Kramer, B and Lengauer, T
    Proteins, 1999, 34(1), 17-28
    PMID: 10336380    
    Water is known to play a significant role in the formation of protein-ligand complexes. In this paper, we focus on the influence of water molecules on the structure of protein-ligand complexes. We present an algorithmic approach, called the particle concept, for integrating the placement of single water molecules in the docking algorithm of FLEXX. FLEXX is an incremental construction approach to ligand docking consisting of three phases: the selection of base fragments, the placement of the base fragments, and the incremental reconstruction of the ligand inside the active site of a protein. The goal of the extension is to find water molecules at favorable places in the protein-ligand interface which may guide the placement of the ligand. In a preprocessing phase, favorable positions of water molecules inside the active site are calculated and stored in a list of possible water positions. During the incremental construction phase, water molecules are placed at the precomputed positions if they can form additional hydrogen bonds to the ligand. Steric constraints resulting from the water molecules as well as the geometry of the hydrogen bonds are used to optimize the ligand orientation in the active site during the reconstruction process. We have tested the particle concept on a series of 200 protein-ligand complexes. Although the average improvement of the prediction results is minor, we were able to predict water molecules between the protein and the ligand correctly in several cases. For instance in the case of HIV-1 protease, where a single water molecule between the protein and the ligand is known to be of importance in complex formation, significant improvements can be achieved.

  • MCDOCK: A Monte Carlo simulation approach to the molecular docking problem
    Liu, M and Wang, SM
    Journal of computer-aided molecular design, 1999, 13(5), 435-451
    PMID: 10483527    
    Prediction of the binding mode of a ligand (a drug molecule) to its macromolecular receptor, or molecular docking, is an important problem in rational drug design. We have developed a new docking method in which a non-conventional Monte Carlo (MC) simulation technique is employed. A computer program, MCDOCK, was developed to carry out the molecular docking operation automatically. The current version of the MCDOCK program (version 1.0) allows for the full flexibility of ligands in the docking calculations. The scoring function used in MCDOCK is the sum of the interaction energy between the ligand and its receptor, and the conformational energy of the ligand. To validate the MCDOCK method, 19 small ligands, the binding modes of which had been determined experimentally using X-ray diffraction, were docked into their receptor binding sites. To produce statistically significant results, 20 MCDOCK runs were performed for each protein-ligand complex. It was found that a significant percentage of these MCDOCK runs converge to the experimentally observed binding mode. The root-mean-square (rms) of all non-hydrogen atoms of the ligand between the predicted and experimental binding modes ranges from 0.25 to 1.84 Angstrom for these 19 cases. The computational time for each run on an SGI Indigo2/R10000 varies from less than 1 min to 15 min, depending upon the size and the flexibility of the ligands. Thus MCDOCK may be used to predict the precise binding mode of ligands in lead optimization and to discover novel lead compounds through structure-based database searching.

  • Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins.
    Charifson, P S and Corkery, J J and Murcko, M A and Walters, W P
    Journal of medicinal chemistry, 1999, 42(25), 5100-5109
    PMID: 10602695    
    We present the results of an extensive computational study in which we show that combining scoring functions in an intersection-based consensus approach results in an enhancement in the ability to discriminate between active and inactive enzyme inhibitors. This is illustrated in the context of docking collections of three-dimensional structures into three different enzymes of pharmaceutical interest: p38 MAP kinase, inosine monophosphate dehydrogenase, and HIV protease. An analysis of two different docking methods and thirteen scoring functions provides insights into which functions perform well, both singly and in combination. Our data shows that consensus scoring further provides a dramatic reduction in the number of false positives identified by individual scoring functions, thus leading to a significant enhancement in hit-rates.
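
    An intersection-based consensus of the kind described above can be sketched as follows: a compound is kept only if it appears in the top-ranked fraction of every scoring function. The sketch assumes that lower scores are better and that each scoring function is supplied as a dictionary mapping compound IDs to scores. The function names and the 10% default are illustrative and are not taken from the paper.

```python
def top_fraction(scores, fraction):
    """Return the IDs of the best-scoring compounds (assumption: lower score is better)."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)  # compound IDs, best first
    return set(ranked[:n_keep])


def consensus_hits(score_tables, fraction=0.1):
    """Intersect the top-`fraction` lists of all scoring functions."""
    top_sets = [top_fraction(table, fraction) for table in score_tables]
    return set.intersection(*top_sets)


# Usage with two hypothetical scoring functions over four compounds.
scores_a = {"c1": -9.0, "c2": -8.0, "c3": -1.0, "c4": -2.0}
scores_b = {"c1": -7.0, "c2": -1.0, "c3": -8.0, "c4": -2.0}
hits = consensus_hits([scores_a, scores_b], fraction=0.5)  # {"c1"}
```

    Only c1 is in the top half of both lists, so c2 and c3, each favoured by a single function, are discarded. This is how the intersection reduces false positives, at the cost of a shorter hit list.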

  • Evaluation of the FLEXX incremental construction algorithm for protein-ligand docking.
    Kramer, B and Rarey, M and Lengauer, T
    Proteins, 1999, 37(2), 228-241
    PMID: 10584068    
    We report on a test of FLEXX, a fully automatic docking tool for flexible ligands, on a highly diverse data set of 200 protein-ligand complexes from the Protein Data Bank. In total 46.5% of the complexes of the data set can be reproduced by a FLEXX docking solution at rank 1 with an rms deviation (RMSD) from the observed structure of less than 2 A. This rate rises to 70% if one looks at the entire generated solution set. FLEXX produces reliable results for ligands with up to 15 components which can be docked in 80% of the cases with acceptable accuracy. Ligands with more than 15 components tend to generate wrong solutions more often. The average runtime of FLEXX on this test set is 93 seconds per complex on a SUN Ultra-30 workstation. In addition, we report on "cross-docking" experiments, in which several receptor structures of complexes with identical proteins have been used for docking all cocrystallized ligands of these complexes. In most cases, these experiments show that FLEXX can acceptably dock a ligand into a foreign receptor structure. Finally we report on screening runs of ligands out of a library with 556 entries against ten different proteins. In eight cases FLEXX is able to find the original inhibitor within the top 7% of the total library.

  • Exhaustive docking of molecular fragments with electrostatic solvation.
    Majeux, N and Scarsi, M and Apostolakis, J and Ehrhardt, C and Caflisch, A
    Proteins, 1999, 37(1), 88-105
    PMID: 10451553    
    A new method is presented for docking molecular fragments to a rigid protein with evaluation of the binding energy. Polar fragments are docked with at least one hydrogen bond with the protein while apolar fragments are positioned in the hydrophobic pockets. The electrostatic contribution to the binding energy, which consists of screened intermolecular energy and protein and fragment desolvation terms, is evaluated efficiently by a numerical approach based on the continuum dielectric approximation. The latter is also used to predetermine the hydrophobic pockets of the protein by rolling a low dielectric sphere over the protein surface and calculating the electrostatic desolvation of the protein and van der Waals interaction energy. The method was implemented in the program SEED (solvation energy for exhaustive docking). The SEED continuum electrostatic approach has been successfully validated by a comparison with finite difference solutions of the Poisson equation for more than 2,500 complexes of small molecules with thrombin and the monomer of HIV-1 aspartic proteinase. The fragments docked by SEED in the active site of thrombin reproduce the structural features of the interaction patterns between known inhibitors and thrombin. Moreover, the combinatorial connection of these fragments yields a number of compounds that are very similar to potent inhibitors of thrombin. Proteins 1999;37:88-105.


  • Surface solid angle-based site points for molecular docking.
    Hendrix, D K and Kuntz, I D
    Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 1998, 317-326
    PMID: 9697192    
    We are developing a new site descriptor for the DOCK molecular modeling program suite. Sphgen, the current site description program for the DOCK suite, describes the pockets of a macromolecule by filling a volume with intersecting spheres. DOCK then identifies possible ligand orientations in the pocket by overlapping the atoms of proposed ligands with the sphere centers. Sphgen limits use of the DOCK program to concave binding regions, but macromolecular binding regions can be solvent-exposed rather than buried pockets. We present a more general site descriptor, based on the surface solid angle, which generates site points by determining the solid angle of exposure for points on the surface of the molecule, then identifying patches of surface with similar solid angle values which are then built into site points. We find possible ligand orientations by matching shape-based site points on the ligand and protein and demanding complementary solid angle values. Orientations are evaluated using the DOCK's force field-based score, which evaluates the Coulombic and van der Waals energy. The surface solid angle descriptor displays the complementary characteristics of the interfaces of our test systems: trypsin/trypsin inhibitor, chymotrypsin/turkey ovomucoid third domain, and subtilisin/chymotrypsin inhibitor. The solid angle site points can be used by DOCK to generate orientations within 1.5 A r.m.s.d. of the crystal structure orientation.

  • Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function
    Morris, GM and Goodsell, DS and Halliday, RS
    Journal of computational chemistry, 1998, 19(14), 1639-1662
    A novel and robust automated docking method that predicts the bound conformations of flexible ligands to macromolecular targets has been developed and tested, in combination with a new scoring function that estimates the free energy change upon binding. Interestingly, this method applies a Lamarckian model of genetics, in which environmental adaptations of an individual's phenotype are reverse transcribed into its genotype and become heritable traits (sic). We consider three search methods, Monte Carlo simulated annealing, a traditional genetic algorithm, and the Lamarckian genetic algorithm, and compare their performance in dockings of seven protein-ligand test systems having known three-dimensional structure. We show that both the traditional and Lamarckian genetic algorithms can handle ligands with more degrees of freedom than the simulated annealing method used in earlier versions of AUTODOCK, and that the Lamarckian genetic algorithm is the most efficient, reliable, and successful of the three. The empirical free energy function was calibrated using a set of 30 structurally known protein-ligand complexes with experimentally determined binding constants. Linear regression analysis of the observed binding constants in terms of a wide variety of structure-derived molecular properties was performed. The final model had a residual standard error of 9.11 kJ mol(-1) (2.177 kcal mol(-1)) and was chosen as the new energy
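    The Lamarckian idea in the abstract above (a locally refined phenotype is written back into the genotype before selection) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the two-variable toy `energy` function stands in for a docking score, the random hill descent stands in for whatever local search the paper uses, and the population size, step sizes, and operators are arbitrary. It is not the AutoDock implementation.

```python
import random


def energy(x):
    # Hypothetical toy score (lower is better); minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def local_search(x, rng, steps=20, step_size=0.2):
    # Random-perturbation hill descent: keep a trial only if it lowers the energy.
    best, best_e = list(x), energy(x)
    for _ in range(steps):
        trial = [v + rng.uniform(-step_size, step_size) for v in best]
        e = energy(trial)
        if e < best_e:
            best, best_e = trial, e
    return best


def lamarckian_ga(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: the refined coordinates replace the genotype,
        # so the improvement found by local search is inherited.
        pop = [local_search(ind, rng) for ind in pop]
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]  # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [v + rng.gauss(0, 0.5) for v in child]    # Gaussian mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=energy)


if __name__ == "__main__":
    best = lamarckian_ga()
    print(best, energy(best))
```

    In a traditional (Darwinian) genetic algorithm the local search result would only affect the fitness used for selection; here it also overwrites the stored coordinates, which is the distinction the abstract draws.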

  • Screening a peptidyl database for potential ligands to proteins with side-chain flexibility.
    Schnecke, V and Swanson, C A and Getzoff, E D and Tainer, J A and Kuhn, L A
    Proteins, 1998, 33(1), 74-87
    PMID: 9741846    
    The three key challenges addressed in our development of SPECITOPE, a tool for screening large structural databases for potential ligands to a protein, are to eliminate infeasible candidates early in the search, incorporate ligand and protein side-chain flexibility upon docking, and provide an appropriate rank for potential new ligands. The protein ligand-binding site is modeled by a shell of surface atoms and by hydrogen-bonding template points for the ligand to match, conferring specificity to the interaction. SPECITOPE combinatorially matches all hydrogen-bond donors and acceptors of the screened molecules to the template points. By eliminating molecules that cannot match distance or hydrogen-bond constraints, the transformation of potential docking candidates into the ligand-binding site and the shape and hydrophobic complementarity evaluations are only required for a small subset of the database. SPECITOPE screens 140,000 peptide fragments in about an hour and has identified and docked known inhibitors and potential new ligands to the free structures of four distinct targets: a serine protease, a DNA repair enzyme, an aspartic proteinase, and a glycosyltransferase. For all four, protein side-chain rotations were critical for successful docking, emphasizing the importance of inducible complementarity for accurately modeling ligand interactions. SPECITOPE has a range of potential applications for understanding and engineering protein recognition, from inhibitor and linker design to protein docking and macromolecular assembly.

  • Prediction of binding constants of protein ligands: a fast method for the prioritization of hits obtained from de novo design or 3D database search programs.
    Böhm, H J
    Journal of computer-aided molecular design, 1998, 12(4), 309-323
    PMID: 9777490    
    A dataset of 82 protein-ligand complexes of known 3D structure and binding constant Ki was analysed to elucidate the important factors that determine the strength of protein-ligand interactions. The following parameters were investigated: the number and geometry of hydrogen bonds and ionic interactions between the protein and the ligand, the size of the lipophilic contact surface, the flexibility of the ligand, the electrostatic potential in the binding site, water molecules in the binding site, cavities along the protein-ligand interface and specific interactions between aromatic rings. Based on these parameters, a new empirical scoring function is presented that estimates the free energy of binding for a protein-ligand complex of known 3D structure. The function distinguishes between buried and solvent accessible hydrogen bonds. It tolerates deviations in the hydrogen bond geometry of up to 0.25 A in the length and up to 30 degrees in the hydrogen bond angle without penalizing the score. The new energy function reproduces the binding constants (ranging from 3.7 x 10(-2) M to 1 x 10(-14) M, corresponding to binding energies between -8 and -80 kJ/mol) of the dataset with a standard deviation of 7.3 kJ/mol corresponding to 1.3 orders of magnitude in binding affinity. The function can be evaluated very fast and is therefore also suitable for the application in a 3D database search or de novo ligand design program such as LUDI. The physical significance of the individual contributions is discussed.
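    The correspondence between binding constants, binding energies, and "orders of magnitude" quoted in the abstract above follows from ΔG = RT ln Ki. The short check below is a sketch under one assumption not stated in the abstract: a temperature of 298.15 K. The function names are illustrative only.

```python
import math

R_KJ = 8.314462618e-3   # gas constant in kJ/(mol K)
T = 298.15              # assumed temperature in K (not given in the abstract)


def delta_g_kj_per_mol(ki_molar):
    # Binding free energy from a dissociation/inhibition constant: dG = RT ln(Ki).
    return R_KJ * T * math.log(ki_molar)


def kj_to_log_units(error_kj_per_mol):
    # Convert an energy error into orders of magnitude in Ki: dG / (RT ln 10).
    return error_kj_per_mol / (R_KJ * T * math.log(10.0))


if __name__ == "__main__":
    print(round(delta_g_kj_per_mol(3.7e-2), 1))   # weakest binder in the dataset
    print(round(delta_g_kj_per_mol(1e-14), 1))    # strongest binder in the dataset
    print(round(kj_to_log_units(7.3), 2))         # reported standard deviation
```

    At this temperature one order of magnitude in Ki is about 5.7 kJ/mol, so the quoted range of 3.7 x 10(-2) M to 1 x 10(-14) M maps to roughly -8 to -80 kJ/mol, and 7.3 kJ/mol to roughly 1.3 log units, in line with the figures in the abstract.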

  • Empirical scoring functions. II. The testing of an empirical scoring function for the prediction of ligand-receptor binding affinities and the use of Bayesian regression to improve the quality of the model.
    Murray, C W and Auton, T R and Eldridge, M D
    Journal of computer-aided molecular design, 1998, 12(5), 503-519
    PMID: 9834910    
    This paper tests the performance of a simple empirical scoring function on a set of candidate designs produced by a de novo design package. The scoring function calculates approximate ligand-receptor binding affinities given a putative binding geometry. To our knowledge this is the first substantial test of an empirical scoring function of this type on a set of molecular designs which were then subsequently synthesised and assayed. The performance illustrates that the methods used to construct the scoring function and the reliance on plausible, yet potentially false, binding modes can lead to significant over-prediction of binding affinity in bad cases. This is anticipated on theoretical grounds and provides caveats on the reliance which can be placed when using the scoring function as a screen in the choice of molecular designs. To improve the predictability of the scoring function and to understand experimental results, it is important to perform subsequent Quantitative Structure-Activity Relationship (QSAR) studies. In this paper, Bayesian regression is performed to improve the predictability of the scoring function in the light of the assay results. Bayesian regression provides a rigorous mathematical framework for the incorporation of prior information, in this case information from the original training set, into a regression on the assay results of the candidate molecular designs. The results indicate that Bayesian regression is a useful and practical technique when relevant prior knowledge is available and that the constraints embodied in the prior information can be used to improve the robustness and accuracy of regression models. We believe this to be the first application of Bayesian regression to QSAR analysis in chemistry.

  • An example of a protein ligand found by database mining: description of the docking method and its verification by a 2.3 A X-ray structure of a thrombin-ligand complex.
    Burkhard, P and Taylor, P and Walkinshaw, M D
    Journal of molecular biology, 1998, 277(2), 449-466
    PMID: 9514757     doi: 10.1006/jmbi.1997.1608
    A computer program (SANDOCK) has been developed for the automated docking of small ligands to a target protein. It uses a guided matching algorithm to fit ligand atoms into the protein binding pocket. The protein is described by a modified Lee-Richards dotted surface with each dot coded by chemical property and accessibility. Orientations of the ligand in the active site are generated such that chemical and shape complementarity between the ligand and the active site cavity are fulfilled. The generated fits are evaluated with scoring functions which account for van der Waals, hydrophobic and hydrogen bonding interactions. This newly developed docking program can efficiently screen very large databases in a reasonable time and has been used to successfully identify novel ligands. The X-ray structure of a thrombin-ligand complex predicted by SANDOCK is described. The ligand binds to thrombin with a Kd of 65 microM and has an rmsd of 0.7 A for all ligand atoms from the predicted binding mode by SANDOCK.

  • Assessing search strategies for flexible docking
    Vieth, M and Hirst, JD and Dominy, BN and Daigler, H and Brooks, CL
    Journal of computational chemistry, 1998, 19(14), 1623-1631
    We assess the efficiency of molecular dynamics (MD), Monte Carlo (MC), and genetic algorithms (GA) for docking five representative ligand-receptor complexes. All three algorithms employ a modified CHARMM-based energy function. The algorithms are also compared with an established docking algorithm, AutoDock. The receptors are kept rigid while flexibility of ligands is permitted. To test the efficiency of the algorithms, two search spaces are used: an 11-Angstrom-radius sphere and a 2.5-Angstrom-radius sphere, both centered on the active site. We find MD is most efficient in the case of the large search space, and GA outperforms the other methods in the small search space. We also find that MD provides structures that are, on average, lower in energy and closer to the crystallographic conformation. The GA obtains good solutions over the course of the fewest energy evaluations. However, due to the nature of the nonbonded interaction calculations, the GA requires the longest time for a single energy evaluation, which results in a decreased efficiency. The GA and MC search algorithms are implemented in the CHARMM macromolecular package. (C) 1998 John Wiley & Sons, Inc.

  • Assessing energy functions for flexible docking
    Vieth, M and Hirst, JD and Kolinski, A and Brooks, CL
    Journal of computational chemistry, 1998, 19(14), 1612-1622
    A good docking algorithm requires an energy function that is selective, in that it clearly differentiates correctly docked structures from misdocked ones, and that is efficient, meaning that a correctly docked structure can be identified quickly. We assess the selectivity and efficiency of a broad spectrum of energy functions, derived from systematic modifications of the CHARMM param19/toph19 energy function. In particular, we examine the effects of the dielectric constant, the solvation model, the scaling of surface charges, reduction of van der Waals repulsion, and nonbonded cutoffs. Based on an assessment of the energy functions for the docking of five different ligand-receptor complexes, we find that selective energy functions include a variety of distance-dependent dielectric models together with truncation of the nonbonded interactions at 8 Angstrom. We evaluate the docking efficiency, the mean number of docked structures per unit of time, of the more selective energy functions, using a simulated annealing molecular dynamics protocol. The largest improvements in efficiency come from a reduction of van der Waals repulsion and a reduction of surface charges. We note that the most selective potential is quite inefficient, although a hierarchical approach can be employed to take advantage of both selective and efficient energy functions. (C) 1998 John Wiley & Sons, Inc.


  • Molecular docking to ensembles of protein structures.
    Knegtel, R M and Kuntz, I D and Oshiro, C M
    Journal of molecular biology, 1997, 266(2), 424-440
    PMID: 9047373     doi: 10.1006/jmbi.1996.0776
    Until recently, applications of molecular docking assumed that the macromolecular receptor exists in a single, rigid conformation. However, structural studies involving different ligands bound to the same target biomolecule frequently reveal modest but significant conformational changes in the target. In this paper, two related methods for molecular docking are described that utilize information on conformational variability from ensembles of experimental receptor structures. One method combines the information into an "energy-weighted average" of the interaction energy between a ligand and each receptor structure. The other method performs the averaging on a structural level, producing a "geometry-weighted average" of the inter-molecular force field score used in DOCK 3.5. Both methods have been applied in docking small molecules to ensembles of crystal and solution structures, and we show that experimentally determined binding orientations and computed energies of known ligands can be reproduced accurately. The use of composite grids, when conformationally different protein structures are available, yields an improvement in computational speed for database searches in proportion to the number of structures.

  • SMoG: de Novo Design Method Based on Simple, Fast, and Accurate Free Energy Estimates. 2. Case Studies in Molecular Design
    DeWitte, Robert S and Ishchenko, Alexey V and Shakhnovich, Eugene I
    Journal of the American Chemical Society, 1997, 119(20), 4608-4617
    In this paper, we summarize three ligand design studies performed using the program SMoG, which was developed in our lab. The aim of this presentation is to communicate through examples the potential of this method: the richness of the molecules that can be developed and the ease with which they are found. In particular, we present suggestions for ligands to Src SH3 domain (specificity pocket and LP site) and CD4.

  • Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes.
    Eldridge, M D and Murray, C W and Auton, T R and Paolini, G V and Mee, R P
    Journal of computer-aided molecular design, 1997, 11(5), 425-445
    PMID: 9385547    
    This paper describes the development of a simple empirical scoring function designed to estimate the free energy of binding for a protein-ligand complex when the 3D structure of the complex is known or can be approximated. The function uses simple contact terms to estimate lipophilic and metal-ligand binding contributions, a simple explicit form for hydrogen bonds and a term which penalises flexibility. The coefficients of each term are obtained using a regression based on 82 ligand-receptor complexes for which the binding affinity is known. The function reproduces the binding affinity of the complexes with a cross-validated error of 8.68 kJ/mol. Tests on internal consistency indicate that the coefficients obtained are stable to changes in the composition of the training set. The function is also tested on two test sets containing a further 20 and 10 complexes, respectively. The deficiencies of this type of function are discussed and it is compared to approaches by other workers.

  • QXP: Powerful, rapid computer algorithms for structure-based drug design
    McMartin, C and Bohacek, RS
    Journal of computer-aided molecular design, 1997, 11(4), 333-344
    PMID: 9334900    
    New methods for docking, template fitting and building pseudo-receptors are described. Full conformational searches are carried out for flexible cyclic and acyclic molecules. QXP (quick explore) search algorithms are derived from the method of Monte Carlo perturbation with energy minimization in Cartesian space. An additional fast search step is introduced between the initial perturbation and energy minimization. The fast search produces approximate low-energy structures, which are likely to minimize to a low energy. For template fitting, QXP uses a superposition force field which automatically assigns short-range attractive forces to similar atoms in different molecules. The docking algorithms were evaluated using X-ray data for 12 protein-ligand complexes. The ligands had up to 24 rotatable bonds and ranged from highly polar to mostly nonpolar. Docking searches of the randomly disordered ligands gave rms differences between the lowest energy docked structure and the energy-minimized X-ray structure, of less than 0.76 Angstrom for 10 of the ligands. For all the ligands, the rms difference between the energy-minimized X-ray structure and the closest docked structure was less than 0.4 Angstrom, when parts of one of the molecules which are in the solvent were excluded from the rms calculation. Template fitting was tested using four ACE inhibitors. Three ACE templates have been previously published. A single run using QXP generated a series of templates which contained examples of each of the three. A pseudo-receptor, complementary to an ACE template, was built out of small molecules, such as pyrrole, cyclopentanone and propane. When individually energy minimized in the pseudo-receptor, each of the four ACE inhibitors moved with an rms of less than 0.25 Angstrom. After random perturbation, the inhibitors were docked into the pseudo-receptor. Each lowest energy docked structure matched the energy-minimized geometry with an rms of less than 0.08 Angstrom. Thus, the pseudo-receptor shows steric and chemical complementarity to all four molecules. The QXP program is reliable, easy to use and sufficiently rapid for routine application in structure-based drug design.

  • Development and validation of a genetic algorithm for flexible docking.
    Jones, G and Willett, P and Glen, R C and Leach, A R and Taylor, R
    Journal of molecular biology, 1997, 267(3), 727-748
    PMID: 9126849     doi: 10.1006/jmbi.1996.0897
    Prediction of small molecule binding modes to macromolecules of known three-dimensional structure is a problem of paramount importance in rational drug design (the "docking" problem). We report the development and validation of the program GOLD (Genetic Optimisation for Ligand Docking). GOLD is an automated ligand docking program that uses a genetic algorithm to explore the full range of ligand conformational flexibility with partial flexibility of the protein, and satisfies the fundamental requirement that the ligand must displace loosely bound water on binding. Numerous enhancements and modifications have been applied to the original technique resulting in a substantial increase in the reliability and the applicability of the algorithm. The advanced algorithm has been tested on a dataset of 100 complexes extracted from the Brookhaven Protein DataBank. When used to dock the ligand back into the binding site, GOLD achieved a 71% success rate in identifying the experimental binding mode.


  • SMoG: de Novo Design Method Based on Simple, Fast, and Accurate Free Energy Estimates. 1. Methodology and Supporting Evidence
    DeWitte, Robert S and Shakhnovich, Eugene I
    Journal of the American Chemical Society, 1996, 118(47), 11733-11744
    doi: 10.1021/ja960751u
    In this paper, we present SMoG (Small Molecule Growth), a novel, straightforward method for de novo lead design and the evidence for its effectiveness. It is based on a simple model for ligand-protein interactions and a scoring that is directly related to the free energy through a knowledge-based potential. A large number of structures are examined by an efficient metropolis Monte Carlo molecular growth algorithm that generates molecules through the adjoining of functional groups directly in the binding region. Thus SMoG is a method that is able to rank a large number of potential compounds according to binding free energy in a short time. In this sense, SMoG represents a step toward an ideal computational tool for ligand design.

  • VALIDATE: A New Method for the Receptor-Based Prediction of Binding Affinities of Novel Ligands
    Head, Richard D and Smythe, Mark L and Oprea, Tudor I and Waller, Chris L and Green, Stuart M and Marshall, Garland R
    Journal of the American Chemical Society, 1996, 118(16), 3959-3969
    doi: 10.1021/ja9539002
    VALIDATE is a hybrid approach to predict the binding affinity of novel ligands for receptors of known three-dimensional structure. This approach calculates physicochemical properties of the ligand and the receptor-ligand complex to estimate the free energy of binding. The enthalpy of binding is calculated by molecular mechanics while properties such as complementary hydrophobic surface area are used to estimate the entropy of binding through heuristics. A diverse training set of 51 crystalline complexes was assembled, and their relevant physicochemical properties were computed. These properties were analyzed by partial least squares (PLS) statistics, or neural network analysis (SONNIC), to generate models for the general prediction of the affinity of ligands with receptors of known three-dimensional structure. The ability of the model to predict the affinity of novel complexes not included in the training set was demonstrated with three independent test sets: 14 complexes of known three-dimensional structure including 3 DNA complexes, a class of compound not included in the training set, 13 HIV protease inhibitors fit to HIV-1 protease, and 11 thermolysin inhibitors fit to thermolysin.

  • Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities.
    Jain, A N
    Journal of computer-aided molecular design, 1996, 10(5), 427-440
    PMID: 8951652    
    Exploitation of protein structures for potential drug leads by molecular docking is critically dependent on methods for scoring putative protein-ligand interactions. An ideal function for scoring must exhibit predictive accuracy and high computational speed, and must be tolerant of variations in the relative protein-ligand molecular alignment and conformation. This paper describes the development of an empirically derived scoring function, based on the binding affinities of protein-ligand complexes coupled with their crystallographically determined structures. The function's primary terms involve hydrophobic and polar complementarity, with additional terms for entropic and solvation effects. The issue of alignment/conformation dependence was solved by constructing a continuous differentiable nonlinear function with the requirement that maxima in ligand conformation/alignment space corresponded closely to crystallographically determined structures. The expected error in the predicted affinity based on cross-validation was 1.0 log unit. The function is sufficiently fast and accurate to serve as the objective function of a molecular-docking search engine. The function is particularly well suited to the docking problem, since it has spatially narrow maxima that are broadly accessible via gradient descent.

  • A fast flexible docking method using an incremental construction algorithm.
    Rarey, M and Kramer, B and Lengauer, T and Klebe, G
    Journal of molecular biology, 1996, 261(3), 470-489
    PMID: 8780787     doi: 10.1006/jmbi.1996.0477
    We present an automatic method for docking organic ligands into protein binding sites. The method can be used in the design process of specific protein ligands. It combines an appropriate model of the physico-chemical properties of the docked molecules with efficient methods for sampling the conformational space of the ligand. If the ligand is flexible, it can adopt a large variety of different conformations. Each such minimum in conformational space presents a potential candidate for the conformation of the ligand in the complexed state. Our docking method samples the conformation space of the ligand on the basis of a discrete model and uses a tree-search technique for placing the ligand incrementally into the active site. For placing the first fragment of the ligand into the protein, we use hashing techniques adapted from computer vision. The incremental construction algorithm is based on a greedy strategy combined with efficient methods for overlap detection and for the search of new interactions. We present results on 19 complexes of which the binding geometry has been crystallographically determined. All considered ligands are docked in at most three minutes on a current workstation. The experimentally observed binding mode of the ligand is reproduced with 0.5 to 1.2 A rms deviation. It is almost always found among the highest-ranking conformations computed.

  • Molecular docking using surface complementarity.
    Sobolev, V and Wade, R C and Vriend, G and Edelman, M
    Proteins, 1996, 25(1), 120-129
    PMID: 8727324     doi: 10.1002/(SICI)1097-0134(199605)25:1<120::AID-PROT10>3.0.CO;2-M
    A method is described to dock a ligand into a binding site in a protein on the basis of the complementarity of the intermolecular atomic contacts. Docking is performed by maximization of a complementarity function that is dependent on atomic contact surface area and the chemical properties of the contacting atoms. The generality and simplicity of the complementarity function ensure that a wide range of chemical structures can be handled. The ligand and the protein are treated as rigid bodies, but displacement of a small number of residues lining the ligand binding site can be taken into account. The method can assist in the design of improved ligands by indicating what changes in complementarity may occur as a result of the substitution of an atom in the ligand. The capabilities of the method are demonstrated by application to 14 protein-ligand complexes of known crystal structure.

  • Hammerhead: Fast, fully automated docking of flexible ligands to protein binding sites
    Welch, W and Ruppert, J and Jain, AN
    Chemistry & Biology, 1996, 3(6), 449-462
    PMID: 8807875    
    Background: Molecular docking seeks to predict the geometry and affinity of the binding of a small molecule to a given protein of known structure. Rigid docking has long been used to screen databases of small molecules, because docking techniques that account for ligand flexibility have either been too slow or have required significant human intervention. Here we describe a docking algorithm, Hammerhead, which is a fast, automated tool to screen for the binding of flexible molecules to protein binding sites. Results: We used Hammerhead to successfully dock a variety of positive control ligands into their cognate proteins. The empirically tuned scoring function of the algorithm predicted binding affinities within 1.3 log units of the known affinities for these ligands. Conformations and alignments close to those determined crystallographically received the highest scores. We screened 80,000 compounds for binding to streptavidin, and biotin was predicted as the top-scoring ligand, with other known ligands included among the highest-scoring dockings. The screen ran in a few days on commonly available hardware. Conclusions: Hammerhead is suitable for screening large databases of flexible molecules for binding to a protein of known structure. It correctly docks a variety of known flexible ligands, and it spends an average of only a few seconds on each compound during a screen. The approach is completely automated, from the elucidation of protein binding sites, through the docking of molecules, to the final selection of compounds for assay.

  • Automated docking of flexible ligands: applications of AutoDock.
    Goodsell, D S and Morris, G M and Olson, A J
    Journal of molecular recognition : JMR, 1996, 9(1), 1-5
    PMID: 8723313     doi: 10.1002/(SICI)1099-1352(199601)9:1<1::AID-JMR241>3.0.CO;2-6
    AutoDock is a suite of C programs used to predict the bound conformations of a small, flexible ligand to a macromolecular target of known structure. The technique combines simulated annealing for conformation searching with a rapid grid-based method of energy evaluation. This paper reviews recent applications of the technique and describes the enhancements included in the current release.

  • Distributed automated docking of flexible ligands to proteins: parallel applications of AutoDock 2.4.
    Morris, G M and Goodsell, D S and Huey, R and Olson, A J
    Journal of computer-aided molecular design, 1996, 10(4), 293-304
    PMID: 8877701    
    AutoDock 2.4 predicts the bound conformations of a small, flexible ligand to a nonflexible macromolecular target of known structure. The technique combines simulated annealing for conformation searching with a rapid grid-based method of energy evaluation based on the AMBER force field. AutoDock has been optimized in performance without sacrificing accuracy; it incorporates many enhancements and additions, including an intuitive interface. We have developed a set of tools for launching and analyzing many independent docking jobs in parallel on a heterogeneous network of UNIX-based workstations. This paper describes the current release, and the results of a suite of diverse test systems. We also present the results of a systematic investigation into the effects of varying simulated-annealing parameters on molecular docking. We show that even for ligands with a large number of degrees of freedom, root-mean-square deviations of less than 1 A from the crystallographic conformation are obtained for the lowest-energy dockings, although fewer dockings find the crystallographic conformation when there are more degrees of freedom.


  • Molecular recognition of the inhibitor AG-1343 by HIV-1 protease: conformationally flexible docking by evolutionary programming.
    Gehlhaar, D K and Verkhivker, G M and Rejto, P A and Sherman, C J and Fogel, D B and Fogel, L J and Freer, S T
    Chemistry & Biology, 1995, 2(5), 317-324
    PMID: 9383433    
    BACKGROUND: An important prerequisite for computational structure-based drug design is prediction of the structures of ligand-protein complexes that have not yet been experimentally determined by X-ray crystallography or NMR. For this task, docking of rigid ligands is inadequate because it assumes knowledge of the conformation of the bound ligand. Docking of flexible ligands would be desirable, but requires one to search an enormous conformational space. We set out to develop a strategy for flexible docking by combining a simple model of ligand-protein interactions for molecular recognition with an evolutionary programming search technique.

  • Flexible Ligand Docking Without Parameter Adjustment Across 4 Ligand-Receptor Complexes
    Clark, KP and Ajay
    Journal of computational chemistry, 1995, 16(10), 1210-1226
    Understanding molecular recognition is one of the fundamental problems in molecular biology. Computationally, molecular recognition is formulated as a docking problem. Ideally, a molecular docking algorithm should be computationally efficient, provide reasonably thorough search of conformational space, obtain solutions with reasonable consistency, and not require parameter adjustments. With these goals in mind, we developed DIVALI (Docking with eVolutionary Algorithms), a program which efficiently and reliably searches for the possible binding modes of a ligand within a fixed receptor. We use an AMBER-type potential function and search for good ligand conformations using a genetic algorithm (GA). We apply our system to study the docking of both rigid and flexible ligands in four different complexes. Our results indicate that it is possible to find diverse binding modes, including structures like the crystal structure, all with comparable potential function values. To achieve this, certain modifications to the standard GA recipe are essential. (C) 1995 by John Wiley & Sons, Inc.


  • Rational automatic search method for stable docking models of protein and ligand.
    Mizutani, M Y and Tomioka, N and Itai, A
    Journal of molecular biology, 1994, 243(2), 310-326
    PMID: 7932757     doi: 10.1006/jmbi.1994.1656
    An efficient automatic method has been developed for docking a ligand molecule to a protein molecule. The method can construct energetically favorable docking models, considering specific interactions between the two molecules and conformational flexibility in the ligand. In the first stage of docking, likely binding modes are searched and estimated effectively in terms of hydrogen bonds, together with conformations in part of the ligand structure that includes hydrogen bonding groups. After that part is placed in the protein cavity and is optimized, conformations in the remaining part are also examined systematically. Finally, several stable docking models are obtained after optimization of the position, orientation and conformation of the whole ligand molecule. In all the screening processes, the total potential energy including intra- and intermolecular interaction energy, consisting of van der Waals, electrostatic and hydrogen bonding energies, is used as the index. The characteristics of our docking method are high accuracy of the results, fully automatic generation of models and short computational time. The efficiency of the method was confirmed by four docking trials using two enzyme systems. In two attempts to dock methotrexate to dihydrofolate reductase and 2'-GMP to ribonuclease T1, the exact structures of complexes in crystals were reproduced as the most stable docking models, without any assumptions concerning the binding modes and ligand conformations. The most stable docking models of dihydrofolate and trimethoprim, respectively, to dihydrofolate reductase were also in good agreement with those suggested by experiment. In all test cases, it was shown that our method can accurately predict the correct docking structures, discriminating the correct model from incorrect ones. 
    The efficiency of our method was further tested from the viewpoint of ability to predict the relative stability of the docking structures of two triazine derivatives to dihydrofolate reductase. Our docking method provides a useful tool for rational drug design and investigations of biochemical reaction mechanisms.

  • FLOG: a system to select 'quasi-flexible' ligands complementary to a receptor of known three-dimensional structure.
    Miller, M D and Kearsley, S K and Underwood, D J and Sheridan, R P
    Journal of computer-aided molecular design, 1994, 8(2), 153-174
    PMID: 8064332    
    We present a system, FLOG (Flexible Ligands Oriented on Grid), that searches a database of 3D coordinates to find molecules complementary to a macromolecular receptor of known 3D structure. The philosophy of FLOG is similar to that reported for DOCK [Shoichet, B.K. et al., J. Comput. Chem., 13 (1992) 380]. In common with that system, we use a match center representation of the volume of the binding cavity and we use a clique-finding algorithm to generate trial orientations of each candidate ligand in the binding site. Also we use a grid representation of the receptor to assess the fit of each orientation. We have introduced a number of novel features within this paradigm. First, we address ligand flexibility by including up to 25 explicit conformations of each structure in our databases. Nonhydrogen atoms in each database entry are assigned one of seven atom types (anion, cation, donor, acceptor, polar, hydrophobic and other) based on their local bonded chemical environments. Second, we have devised a new grid-based scoring function compatible with this 'heavy atom' representation of the ligands. This includes several potentials (electrostatic, hydrogen bonding, hydrophobic and van der Waals) calculated from the location of the receptor atoms. Third, we have improved the fitting stage of the search. Initial dockings are generated with a more efficient clique-finding algorithm. This new algorithm includes the concept of 'essential points', match centers that must be paired with a ligand atom. Also, we introduce the use of a rapid simplex-based rigid-body optimizer to refine the orientations. We demonstrate, using dihydrofolate reductase as a sample receptor, that the FLOG system can select known inhibitors from a large database of drug-like compounds.

  • ICM - a New Method for Protein Modeling and Design - Applications to Docking and Structure Prediction From the Distorted Native Conformation
    Abagyan, R and Totrov, M and Kuznetsov, D
    Journal of computational chemistry, 1994, 15(5), 488-506
    An efficient methodology, further referred to as ICM, for versatile modeling operations and global energy optimization on arbitrarily fixed multimolecular systems is described. It is aimed at protein structure prediction, homology modeling, molecular docking, nuclear magnetic resonance (NMR) structure determination, and protein design. The method uses and further develops a previously introduced approach to model biomolecular structures in which bond lengths, bond angles, and torsion angles are considered as independent variables, any subset of them being fixed. Here we simplify and generalize the basic description of the system, introduce the variable dihedral phase angle, and allow arbitrary connections of the molecules and conventional definition of the torsion angles. Algorithms for calculation of energy derivatives with respect to internal variables in the topological tree of the system and for rapid evaluation of accessible surface are presented. Multidimensional variable restraints are proposed to represent the statistical information about the torsion angle distributions in proteins. To incorporate complex energy terms as solvation energy and electrostatics into a structure prediction procedure, a ''double-energy'' Monte Carlo minimization procedure in which these terms are omitted during the minimization stage of the random step and included for the comparison with the previous conformation in a Markov chain is proposed and justified. The ICM method is applied successfully to a molecular docking problem. The procedure finds the correct parallel arrangement of two rigid helixes from a leucine zipper domain as the lowest-energy conformation (0.5 Angstrom root mean square, rms, deviation from the native structure) starting from completely random configuration. Structures with antiparallel helixes or helixes staggered by one helix turn had energies higher by about 7 or 9 kcal/mol, respectively. Soft docking was also attempted. 
    A docking procedure allowing side-chain flexibility also converged to the parallel configuration, starting from the helixes optimized individually. To justify an internal coordinate approach to the structure prediction as opposed to a Cartesian one, energy hypersurfaces around the native structure of the squash seeds trypsin inhibitor were studied. Torsion angle minimization from the optimal conformation randomly distorted up to the rms deviation of 2.2 Angstrom or angular rms deviation of 10 degrees restored the native conformation in most cases. In contrast, Cartesian coordinate minimization did not reach the minimum from deviations as small as 0.3 Angstrom or 2 degrees. We conclude that the most promising detailed approach to the protein-folding problem would consist of some coarse global sampling strategy combined with the local energy minimization in the torsion coordinate space. (C) 1994 by John Wiley & Sons, Inc.


  • Automated docking with grid‐based energy evaluation
    Meng, E C and Shoichet, B K and Kuntz, I D
    Journal of computational chemistry, 1992, 13(4), 505-524
    The ability to generate feasible binding orientations of a small molecule within a site of known structure is important for ligand design. We present a method that combines a rapid, geometric docking algorithm with the evaluation of molecular mechanics interaction energies. The computational costs of evaluation are minimal because we precalculate the receptor-dependent terms in the potential function at points on a three-dimensional grid. In four test cases where the components of crystallographically determined complexes are redocked, the 'force field' score correctly identifies the family of orientations closest to the experimental binding geometry. Scoring functions that consider only steric factors or only electrostatic factors are less successful. The force field function will play an important role in our efforts to search databases for potential lead compounds.
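
The grid idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. The toy 12-6 term, its parameters `a`, `b` and `r_min`, the nearest-grid-point lookup, and all function names are assumptions made for illustration only.

```python
import math

def pair_energy(r, a=1.0, b=2.0, r_min=0.5):
    """Toy 12-6 term (hypothetical parameters); r is clamped at r_min
    so that grid points close to an atom center stay finite."""
    r = max(r, r_min)
    return a / r**12 - b / r**6

def direct_score(receptor, ligand):
    """Reference score: explicit sum over all ligand-receptor atom pairs."""
    return sum(pair_energy(math.dist(p, q)) for p in ligand for q in receptor)

def build_grid(receptor, origin, spacing, shape):
    """Precalculate the receptor-dependent potential at every grid point."""
    grid = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                grid[(i, j, k)] = sum(pair_energy(math.dist(point, q))
                                      for q in receptor)
    return grid

def grid_score(grid, ligand, origin, spacing):
    """Score a ligand pose by looking up the nearest grid point per atom.
    Raises KeyError if an atom falls outside the grid."""
    total = 0.0
    for p in ligand:
        index = tuple(round((c - o) / spacing) for c, o in zip(p, origin))
        total += grid[index]
    return total

# Two receptor atoms and a 9 x 9 x 9 grid covering -2..2 along each axis.
receptor = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
origin, spacing, shape = (-2.0, -2.0, -2.0), 0.5, (9, 9, 9)
grid = build_grid(receptor, origin, spacing, shape)

# Ligand atoms placed exactly on grid points, so both scores should agree.
ligand = [(1.0, 1.0, 0.0), (0.5, -1.0, 1.5)]
print(direct_score(receptor, ligand), grid_score(grid, ligand, origin, spacing))
```

The grid is built once per receptor, so scoring each new orientation costs one lookup per ligand atom instead of a sum over all receptor atoms. A finer grid or interpolation between grid points would reduce the lookup error for atoms that do not sit on a grid point.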

  • A multiple-start Monte Carlo docking method.
    Hart, T N and Read, R J
    Proteins, 1992, 13(3), 206-222
    PMID: 1603810     doi: 10.1002/prot.340130304
    We present a method to search for possible binding modes of molecular fragments at a specific site of a potential drug target of known structure. Our method is based on a Monte Carlo (MC) algorithm applied to the translational and rotational degrees of freedom of the probe fragment. Starting from a randomly generated initial configuration, favorable binding modes are generated using a two-step process. An MC run is first performed in which the energy in the Metropolis algorithm is substituted by a score function that measures the average distance of the probe to the target surface. This has the effect of making buried probes move toward the target surface and also allows enhanced sampling of deep pockets. In a second MC run, a pairwise atom potential function is used, and the temperature parameter is slowly lowered during the run (Simulated Annealing). We repeat this procedure starting from a large number of different randomly generated initial configurations in order to find all energetically favorable docking modes in a specified region around the target. We test this method using two inhibitor-receptor systems: Streptomyces griseus proteinase B in complex with the third domain of the ovomucoid inhibitor from turkey, and dihydrofolate reductase from E. coli in complex with methotrexate. The method could consistently reproduce the complex found in the crystal structure searching from random initial positions in cubes ranging from 25 Å to 50 Å about the binding site. In the case of SGPB, we were also successful in docking to the native structure. In addition, we were successful in docking small probes in a search that included the entire protein surface.
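
The multiple-start annealing stage described in the abstract above can be sketched roughly as follows. It is a simplified illustration, not the published method. It covers only translational moves of a point probe, omits the first surface-distance stage, and uses a hypothetical quadratic `toy_energy` in place of a pairwise atom potential. The geometric cooling schedule and all parameter values are likewise assumptions.

```python
import math
import random

def toy_energy(pos):
    """Hypothetical stand-in for a pairwise potential: one minimum at (1, -2, 0.5)."""
    x, y, z = pos
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + (z - 0.5) ** 2

def anneal(energy, start, rng, steps=2000, t_start=2.0, t_end=0.01, step_size=0.5):
    """One Metropolis run with a geometrically lowered temperature.
    Returns the lowest-energy position visited and its energy."""
    current, e_cur = start, energy(start)
    best, e_best = current, e_cur
    cooling = (t_end / t_start) ** (1.0 / (steps - 1))
    temp = t_start
    for _ in range(steps):
        trial = tuple(c + rng.gauss(0.0, step_size) for c in current)
        e_trial = energy(trial)
        delta = e_trial - e_cur
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = current, e_cur
        temp *= cooling
    return best, e_best

def multi_start(energy, n_starts=20, half_width=5.0, seed=0):
    """Repeat the annealing run from random starts inside a cube and
    return all results sorted by energy, lowest first."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = tuple(rng.uniform(-half_width, half_width) for _ in range(3))
        results.append(anneal(energy, start, rng))
    return sorted(results, key=lambda r: r[1])

results = multi_start(toy_energy)
print(results[0])
```

With a real potential that has several minima, the sorted list would be clustered to identify distinct binding modes. The toy function has a single minimum, so all runs should end near the same point.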


  • Functionality maps of binding sites: A multiple copy simultaneous search method
    Miranker, Andrew and Karplus, Martin
    Proteins, 1991, 11(1), 29-34
    PMID: 1961699     doi: 10.1002/prot.340110104
    A new method is proposed for determining energetically favorable positions and orientations for functional groups on the surface of proteins with known three-dimensional structure. From 1,000 to 5,000 copies of a functional group are randomly placed in the site and subjected to simultaneous energy minimization and/or quenched molecular dynamics. The resulting functionality maps of a protein receptor site, which can take account of its flexibility, can be used for the analysis of protein ligand interactions and rational drug design. Application of the method to the sialic acid binding site of the influenza coat protein, hemagglutinin, yields functional group minima that correspond with those of the ligand in a cocrystal structure.


  • A Network Approach for Computational Drug Repositioning
    Li, Jiao and Lu, Zhiyong
    Journal of molecular biology, 2012, 161(2), 83-83
    PMID: 7154081     doi: 10.1109/HISB.2012.26
    Computational drug repositioning offers promise for discovering new uses of existing drugs, as drug related molecular, chemical, and clinical information has increased over the past decade and become broadly accessible. In this study, we present a new computational approach for identifying potential new indications of an existing drug through its relation to similar drugs in a disease-drug-target network. When measuring drug pairwise similarity, we used a bipartite-graph based method which combined similarity of drug compound structures, similarity of target protein profiles, and interaction between target proteins. In evaluation, our method compared favorably to the state of the art, achieving AUC of 0.888. The results indicated that our method is able to identify drug repositioning opportunities by exploring complex relationships in the disease-drug-target network.