# Bibliography of Computer-Aided Drug Design

Updated on 7/18/2014. Currently 2130 references

## Docking / Methodology

2014 / 2013 / 2012 / 2011 / 2010 / 2009 / 2008 / 2007 / 2006 / 2005 / 2004 / 2003 / 2002 / 2001 / 2000 / 1999 / 1998 / 1997 / 1996 / 1995 / 1994 / 1992 / 1991 / 1982

## 2014

• BP-Dock: A Flexible Docking Scheme for Exploring Protein-Ligand Interactions Based on Unbound Structures.
Bolia, Ashini and Gerek, Z Nevin and Ozkan, S Banu
Journal of chemical information and modeling, 2014, 54(3), 913-925
PMID: 24380381     doi: 10.1021/ci4004927

Molecular docking serves as an important tool in modeling protein-ligand interactions. However, it is still challenging to incorporate overall receptor flexibility, especially backbone flexibility, in docking due to the large conformational space that needs to be sampled. To overcome this problem, we developed a novel flexible docking approach, BP-Dock (Backbone Perturbation-Dock) that can integrate both backbone and side chain conformational changes induced by ligand binding through a multi-scale approach. In the BP-Dock method, we mimic the nature of binding-induced events as a first-order approximation by perturbing the residues along the protein chain with a small Brownian kick one at a time. The response fluctuation profile of the chain upon these perturbations is computed using the perturbation response scanning method. These response fluctuation profiles are then used to generate binding-induced multiple receptor conformations for ensemble docking. To evaluate the performance of BP-Dock, we applied our approach on a large and diverse data set using unbound structures as receptors. We also compared the BP-Dock results with bound and unbound docking, where overall receptor flexibility was not taken into account. Our results highlight the importance of modeling backbone flexibility in docking for recapitulating the experimental binding affinities, especially when an unbound structure is used. With BP-Dock, we can generate a wide range of binding site conformations realized in nature even in the absence of a ligand that can help us to improve the accuracy of unbound docking. We expect that our fast and efficient flexible docking approach may further aid in our understanding of protein-ligand interactions as well as virtual screening of novel targets for rational drug design.

• Incorporating replacement free energy of binding-site waters in molecular docking
Sun, Hanzi and Zhao, Lifeng and Peng, Shiming and Huang, Niu
Proteins, 2014, n/a-n/a
PMID: 24549784     doi: 10.1002/prot.24530

Binding-site water molecules play a crucial role in protein-ligand recognition, either being displaced upon ligand binding or forming water bridges to stabilize the complex. However, rigorously treating explicit binding-site waters is challenging in molecular docking, which requires fully sampling ensembles of waters and considering the free energy cost of replacing waters. Here, we describe a method to incorporate structural and energetic properties of binding-site waters into molecular docking. We first developed a solvent property analysis (SPA) program to compute the replacement free energies of binding-site water molecules by post-processing molecular dynamics trajectories obtained from ligand-free protein structure simulation in explicit water. Next, we implemented a distance-dependent scoring term into the DOCK scoring function to take account of the water replacement free energy cost upon ligand binding. We assessed this approach in protein targets containing important binding-site waters, and we demonstrated that our approach is reliable in reproducing the crystal binding geometries of protein-ligand-water complexes, as well as moderately improving the ligand docking enrichment performance. In addition, the SPA program (freely available to academic users upon request) may be applied in identifying hot-spot binding-site residues and structure-based lead optimization.

• Exhaustive docking and solvated interaction energy scoring: lessons learned from the SAMPL4 challenge.
Hogues, Hervé and Sulea, Traian and Purisima, Enrico O
Journal of computer-aided molecular design, 2014
PMID: 24474162     doi: 10.1007/s10822-014-9715-5

We continued prospective assessments of the Wilma-solvated interaction energy (SIE) platform for pose prediction, binding affinity prediction, and virtual screening on the challenging SAMPL4 data sets including the HIV-integrase inhibitor and two host-guest systems. New features of the docking algorithm and scoring function are tested here prospectively for the first time. Wilma-SIE provides good correlations with actual binding affinities over a wide range of binding affinities that includes strong binders as in the case of SAMPL4 host-guest systems. Absolute binding affinities are also reproduced with appropriate training of the scoring function on available data sets or from comparative estimation of the change in target's vibrational entropy. Even when binding modes are known, SIE predictions lack correlation with experimental affinities within dynamic ranges below 2 kcal/mol as in the case of HIV-integrase ligands, but they correctly signaled the narrowness of the dynamic range. Using a common protein structure for all ligands can reduce the noise, while incorporating a more sophisticated solvation treatment improves absolute predictions. The HIV-integrase virtual screening data set consists of promiscuous weak binders with relatively high flexibility and thus it falls outside of the applicability domain of the Wilma-SIE docking platform. Despite these difficulties, unbiased docking around three known binding sites of the enzyme resulted in over a third of ligands being docked within 2 Å from their actual poses and over half of the ligands docked in the correct site, leading to better-than-random virtual screening results.

• Importance of ligand conformational energies in carbohydrate docking: Sorting the wheat from the chaff.
Nivedha, Anita K and Makeneni, Spandana and Foley, Bethany Lachele and Tessier, Matthew B and Woods, Robert J
Journal of computational chemistry, 2014, 35(7), 526-539
PMID: 24375430     doi: 10.1002/jcc.23517

Docking algorithms that aim to be applicable to a broad range of ligands suffer reduced accuracy because they are unable to incorporate ligand-specific conformational energies. Here, we develop a set of Carbohydrate Intrinsic (CHI) energy functions that quantify the conformational properties of oligosaccharides, based on the values of their glycosidic torsion angles. The relative energies predicted by the CHI energy functions mirror the conformational distributions of glycosidic linkages determined from a survey of oligosaccharide-protein complexes in the protein data bank. Addition of CHI energies to the standard docking scores in Autodock 3, 4.2, and Vina consistently improves pose ranking of oligosaccharides docked to a set of anticarbohydrate antibodies. The CHI energy functions are also independent of docking algorithm, and with minor modifications, may be incorporated into both theoretical modeling methods, and experimental NMR or X-ray structure refinement programs.

• istar: a web platform for large-scale protein-ligand docking.
Li, Hongjian and Leung, Kwong-Sak and Ballester, Pedro J and Wong, Man-Hon
PloS one, 2014, 9(1), e85678
PMID: 24475049     doi: 10.1371/journal.pone.0085678

Protein-ligand docking is a key computational method in the design of starting points for the drug discovery process. We are motivated by the desire to automate large-scale docking using our popular docking engine idock and thus have developed a publicly-accessible web platform called istar. Without tedious software installation, users can submit jobs using our website. Our istar website supports 1) filtering ligands by desired molecular properties and previewing the number of ligands to dock, 2) monitoring job progress in real time, and 3) visualizing ligand conformations and outputting free energy and ligand efficiency predicted by idock, binding affinity predicted by RF-Score, putative hydrogen bonds, and supplier information for easy purchase, three useful features commonly lacked on other online docking platforms like DOCK Blaster or iScreen. We have collected 17,224,424 ligands from the All Clean subset of the ZINC database, and revamped our docking engine idock to version 2.0, further improving docking speed and accuracy, and integrating RF-Score as an alternative rescoring function. To compare idock 2.0 with the state-of-the-art AutoDock Vina 1.1.2, we have carried out a rescoring benchmark and a redocking benchmark on the 2,897 and 343 protein-ligand complexes of PDBbind v2012 refined set and CSAR NRC HiQ Set 24Sept2010 respectively, and an execution time benchmark on 12 diverse proteins and 3,000 ligands of different molecular weight. Results show that, under various scenarios, idock achieves comparable success rates while outperforming AutoDock Vina in terms of docking speed by at least 8.69 times and at most 37.51 times. When evaluated on the PDBbind v2012 core set, our istar platform combining with RF-Score manages to reproduce Pearson's correlation coefficient and Spearman's correlation coefficient of as high as 0.855 and 0.859 respectively between the experimental binding affinity and the predicted binding affinity of the docked conformation. 
istar is freely available at http://istar.cse.cuhk.edu.hk/idock.

## 2013

• Roles for ordered and bulk solvent in ligand recognition and docking in two related cavities.
Barelier, Sarah and Boyce, Sarah E and Fish, Inbar and Fischer, Marcus and Goodin, David B and Shoichet, Brian K
PloS one, 2013, 8(7), e69153
PMID: 23874896     doi: 10.1371/journal.pone.0069153

A key challenge in structure-based discovery is accounting for modulation of protein-ligand interactions by ordered and bulk solvent. To investigate this, we compared ligand binding to a buried cavity in Cytochrome c Peroxidase (CcP), where affinity is dominated by a single ionic interaction, versus a cavity variant partly opened to solvent by loop deletion. This opening had unexpected effects on ligand orientation, affinity, and ordered water structure. Some ligands lost over ten-fold in affinity and reoriented in the cavity, while others retained their geometries, formed new interactions with water networks, and improved affinity. To test our ability to discover new ligands against this opened site prospectively, a 534,000 fragment library was docked against the open cavity using two models of ligand solvation. Using an older solvation model that prioritized many neutral molecules, three such uncharged docking hits were tested, none of which was observed to bind; these molecules were not highly ranked by the new, context-dependent solvation score. Using this new method, another 15 highly-ranked molecules were tested for binding. In contrast to the previous result, 14 of these bound detectably, with affinities ranging from 8 µM to 2 mM. In crystal structures, four of these new ligands superposed well with the docking predictions but two did not, reflecting unanticipated interactions with newly ordered water molecules. Comparing recognition between this open cavity and its buried analog begins to isolate the roles of ordered solvent in a system that lends itself readily to prospective testing and that may be broadly useful to the community.

• VinaMPI: Facilitating multiple receptor high-throughput virtual docking on high-performance computers
Ellingson, Sally R and Smith, Jeremy C and Baudry, Jerome
Journal of computational chemistry, 2013, 34(25), 2212-2221
PMID: 23813626     doi: 10.1002/jcc.23367

• Towards ligand docking including explicit interface water molecules.
Lemmon, Gordon and Meiler, Jens
PloS one, 2013, 8(6), e67536
PMID: 23840735     doi: 10.1371/journal.pone.0067536

Small molecule docking predicts the interaction of a small molecule ligand with a protein at atomic-detail accuracy including position and conformation of the ligand but also conformational changes of the protein upon ligand binding. While successful in the majority of cases, docking algorithms including RosettaLigand fail in some cases to predict the correct protein/ligand complex structure. In this study we show that simultaneous docking of explicit interface water molecules greatly improves Rosetta's ability to distinguish correct from incorrect ligand poses. This result holds true for both protein-centric water docking wherein waters are located relative to the protein binding site and ligand-centric water docking wherein waters move with the ligand during docking. Protein-centric docking is used to model 99 HIV-1 protease/protease inhibitor structures. We find protease inhibitor placement improving at a ratio of 9∶1 when one critical interface water molecule is included in the docking simulation. Ligand-centric docking is applied to 341 structures from the CSAR benchmark of diverse protein/ligand complexes [1]. Across this diverse dataset we see up to 56% recovery of failed docking studies, when waters are included in the docking simulation.

• Small-molecule ligand docking into comparative models with Rosetta.
Combs, Steven A and Deluca, Samuel L and Deluca, Stephanie H and Lemmon, Gordon H and Nannemann, David P and Nguyen, Elizabeth D and Willis, Jordan R and Sheehan, Jonathan H and Meiler, Jens
Nature protocols, 2013, 8(7), 1277-1298
PMID: 23744289     doi: 10.1038/nprot.2013.074

Structure-based drug design is frequently used to accelerate the development of small-molecule therapeutics. Although substantial progress has been made in X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, the availability of high-resolution structures is limited owing to the frequent inability to crystallize or obtain sufficient NMR restraints for large or flexible proteins. Computational methods can be used to both predict unknown protein structures and model ligand interactions when experimental data are unavailable. This paper describes a comprehensive and detailed protocol using the Rosetta modeling suite to dock small-molecule ligands into comparative models. In the protocol presented here, we review the comparative modeling process, including sequence alignment, threading and loop building. Next, we cover docking a small-molecule ligand into the protein comparative model. In addition, we discuss criteria that can improve ligand docking into comparative models. Finally, and importantly, we present a strategy for assessing model quality. The entire protocol is presented on a single example selected solely for didactic purposes. The results are therefore not representative and do not replace benchmarks published elsewhere. We also provide an additional tutorial so that the user can gain hands-on experience in using Rosetta. The protocol should take 5-7 h, with additional time allocated for computer generation of models.

• The MM2QM tool for combining docking, molecular dynamics, molecular mechanics, and quantum mechanics
Nowosielski, Marcin and Hoffmann, Marcin and Kuron, Aneta and Korycka-Machala, Malgorzata and Dziadek, Jaroslaw
Journal of computational chemistry, 2013, 34(9), 750-756
PMID: 23233437     doi: 10.1002/jcc.23192

The use of the MM2QM tool in a combined docking + molecular dynamics (MD) + molecular mechanics (MM) + quantum mechanical (QM) binding affinity prediction study is presented, and the tool itself is discussed. The system of interest is Mycobacterium tuberculosis (MTB) pantothenate synthetase in complexes with three highly similar sulfonamide inhibitors, for which crystal structures are available. Starting from the structure of MTB pantothenate synthetase in the "open" conformation and following the combined docking + MD + MM + QM procedure, we were able to capture the closing of the enzyme binding pocket and to reproduce the position of the ligands with an average root mean square deviation of 1.6 Å. Protein-ligand interaction energies were reproduced with an average error lower than 10%. The MD stage and the importance of protein flexibility are also discussed. The presented approach may be useful especially for finding analog inhibitors or improving drug candidates.

• Docking Challenge: Protein Sampling and Molecular Docking Performance
Elokely, Khaled M and Doerksen, Robert J
Journal of chemical information and modeling, 2013
PMID: 23530568

Computational tools are essential in the drug design process, especially in order to take advantage of the increasing numbers of solved X-ray and NMR protein-ligand structures. Nowadays, molecular docking methods are routinely used for prediction of protein-ligand interactions and to aid in selecting potent molecules as a part of virtual screening of large databases. The improvements and advances in computational capacity in the last decade have allowed for further developments in molecular docking algorithms to address more complicated aspects such as protein flexibility. The effects of incorporation of active site water molecules and implicit or explicit solvation of the binding site are other relevant issues to be addressed in the docking procedures. Using the right docking algorithm at the right stage of virtual screening is most important. We report a staged study to address the effects of various aspects of protein flexibility and inclusion of active site water molecules on docking effectiveness to retrieve (and to be able to predict) correct ligand poses and to rank docked ligands in relation to their biological activity, for CHK1, ERK2, LpxC and UPA. We generated multiple conformers for the ligand, and compared different docking algorithms that use a variety of approaches to protein flexibility, including rigid receptor, soft receptor, flexible side chains, induced-fit, and multiple structure algorithms. Docking accuracy varied from 1 to 84%, demonstrating that the choice of method is important.

• Water PMF for predicting the properties of water molecules in protein binding site
Zheng, Mingyue and Li, Yanlian and Xiong, Bing and Jiang, Hualiang and Shen, Jingkang
Journal of computational chemistry, 2013, 34(7), 583-592
PMID: 23114863     doi: 10.1002/jcc.23170

Water is an important component in living systems and deserves better understanding in chemistry and biology. However, due to the difficulty of investigating the water functions in protein structures, it is usually ignored in computational modeling, especially in the field of computer-aided drug design. Here, using the potential of mean forces (PMFs) approach, we constructed a water PMF (wPMF) based on 3946 non-redundant high resolution crystal structures. The extracted wPMF potential was first used to investigate the structure pattern of water and analyze the residue hydrophilicity. Then, the relationship between wPMF score and the B factor value of crystal waters was studied. It was found that wPMF agrees well with some previously reported experimental observations. In addition, the wPMF score was also tested in parallel with 3D-RISM to measure the ability of retrieving experimentally observed waters, and showed comparable performance but with much less computational cost. In the end, we proposed a grid-based clustering scheme together with a distance weighted wPMF score to further extend wPMF to predict the potential hydration sites of protein structure. From the test, this approach can predict the hydration site with an accuracy of about 80% when the calculated score is lower than -4.0. It also allows the assessment of whether or not a given water molecule should be targeted for displacement in ligand design. Overall, the wPMF presented here provides an optional solution to many water related computational modeling problems, some of which can be highly valuable as part of a rational drug design strategy.

• DOLINA - Docking Based on a Local Induced-Fit Algorithm: Application toward Small-Molecule Binding to Nuclear Receptors.
Smiesko, Martin
Journal of chemical information and modeling, 2013, 53(6), 1415-1423
PMID: 23725336     doi: 10.1021/ci400098y

Docking algorithms allowing for ligand and - to various extent - also protein flexibility are nowadays replacing techniques based on rigid protocols. The algorithm implemented in the Dolina software relies on pharmacophore matching for generating potential ligand poses and treats associated local induced-fit changes by combinatorial rearrangement of side-chains lining the binding site. In Dolina, ligand flexibility is not treated internally, instead a pool of low-energy conformers identified in a conformational search is screened for extended binding-pose candidates. Grouping rearranged residues in sterically independent families and side-chain conformer clustering are employed to achieve efficient use of the computational resources along with a good accuracy of the generated poses. Dolina was applied toward docking of small-molecule ligands to three different nuclear receptor ligand binding domains for which in total 18 high-resolution crystal structures were used as reference. The selected nuclear receptors feature a deeply buried ligand-binding site where local induced-fit is to be expected, particularly for receptor antagonists. For each receptor, a crystal structure with a cocrystallized small steroid ligand (template) was chosen as a target system, to which several synthetic ligands of different sizes were docked. Poses within an RMSD of 2.0 Å from the crystal reference pose were generated in 91% of the cases. In 28%, the pose with the lowest RMSD to the reference pose was ranked as the top one, and in 76% it was ranked among the top five poses. Detailed descriptions of the docking algorithm and observed results are included. Dolina is available free of charge for academic institutions.
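
The pose-accuracy figures quoted in this abstract rest on a heavy-atom RMSD between a docked pose and the crystal pose, followed by a count of poses within the 2.0 Å cutoff. The sketch below illustrates that generic calculation only; it is not taken from Dolina. The function names `rmsd` and `success_rate` are hypothetical, atoms are assumed to be listed in the same order in both poses, and no superposition or symmetry correction is applied (docking RMSD is usually measured in the receptor frame).

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered lists of (x, y, z) tuples.

    Assumption: both poses share the receptor frame, so no fitting is done,
    and symmetry-equivalent atoms are not swapped.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of poses whose RMSD to the reference is within the cutoff (in Å)."""
    if not rmsds:
        raise ValueError("no RMSD values given")
    return sum(1 for value in rmsds if value <= cutoff) / len(rmsds)
```

For example, `success_rate([0.5, 1.9, 2.0, 3.1])` counts three of the four poses as successes under the 2.0 Å criterion.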

• Evaluation and Optimization of Virtual Screening Workflows with DEKOIS 2.0 - A Public Library of Challenging Docking Benchmark Sets.
Bauer, Matthias R and Ibrahim, Tamer M and Vogel, Simon M and Boeckler, Frank M
Journal of chemical information and modeling, 2013, 53(6), 1447-1462
PMID: 23705874     doi: 10.1021/ci400115b

The application of molecular benchmarking sets helps to assess the actual performance of virtual screening (VS) workflows. To improve the efficiency of structure-based VS approaches, the selection and optimization of various parameters can be guided by benchmarking. With the DEKOIS 2.0 library, we aim to further extend and complement the collection of publicly available decoy sets. Based on BindingDB bioactivity data, we provide 81 new and structurally diverse benchmark sets for a wide variety of different target classes. To ensure a meaningful selection of ligands, we address several issues that can be found in bioactivity data. We have improved our previously introduced DEKOIS methodology with enhanced physicochemical matching, now including the consideration of molecular charges, as well as a more sophisticated elimination of latent actives in the decoy set (LADS). We evaluate the docking performance of Glide, GOLD, and AutoDock Vina with our data sets and highlight existing challenges for VS tools. All DEKOIS 2.0 benchmark sets will be made accessible at http://www.dekois.com .

• Are predicted protein structures of any value for binding site prediction and virtual ligand screening?
Skolnick, Jeffrey and Zhou, Hongyi and Gao, Mu
Current opinion in structural biology, 2013
PMID: 23415854     doi: 10.1016/j.sbi.2013.01.009

The recently developed field of ligand homology modeling (LHM) that extends the ideas of protein homology modeling to the prediction of ligand binding sites and for use in virtual ligand screening has emerged as a powerful new approach. Unlike traditional docking methodologies, LHM can be applied to low-to-moderate resolution predicted as well as experimental structures with little if any diminution in performance; thereby enabling ∼75% of an average proteome to have potentially significant virtual screening predictions. In large scale benchmarking, LHM is able to predict off-target ligand binding. Thus, despite the widespread belief to the contrary, low-to-moderate resolution predicted structures have considerable utility for biochemical function prediction.

• Grid-based molecular footprint comparison method for docking and de novo design: Application to HIVgp41
Balius, Trent E and Allen, William J and Mukherjee, Sudipto and Rizzo, Robert C
Journal of computational chemistry, 2013, 34(14), 1226-1240
PMID: 23436713     doi: 10.1002/jcc.23245

Scoring functions are a critically important component of computer-aided screening methods for the identification of lead compounds during early stages of drug discovery. Here, we present a new multigrid implementation of the footprint similarity (FPS) scoring function that was recently developed in our laboratory which has proven useful for identification of compounds which bind to a protein on a per-residue basis in a way that resembles a known reference. The grid-based FPS method is much faster than its Cartesian-space counterpart, which makes it computationally tractable for on-the-fly docking, virtual screening, or de novo design. In this work, we establish that: (i) relatively few grids can be used to accurately approximate Cartesian space footprint similarity, (ii) the method yields improved success over the standard DOCK energy function for pose identification across a large test set of experimental co-crystal structures, for crossdocking, and for database enrichment, and (iii) grid-based FPS scoring can be used to tailor construction of new molecules to have specific properties, as demonstrated in a series of test cases targeting the viral protein HIVgp41. The method is available in the program DOCK6.

• Systematic and efficient side chain optimization for molecular docking using a cheapest-path procedure
Schumann, Marcel and Armen, Roger S
Journal of computational chemistry, 2013, 34(14), 1258-1269
PMID: 23420703     doi: 10.1002/jcc.23251

Molecular docking of small-molecules is an important procedure for computer-aided drug design. Modeling receptor side chain flexibility is often important or even crucial, as it allows the receptor to adopt new conformations as induced by ligand binding. However, the accurate and efficient incorporation of receptor side chain flexibility has proven to be a challenge due to the huge computational complexity required to adequately address this problem. Here we describe a new docking approach with a very fast, graph-based optimization algorithm for assignment of the near-optimal set of residue rotamers. We extensively validate our approach using the 40 DUD target benchmarks commonly used to assess virtual screening performance and demonstrate a large improvement using the developed side chain optimization over rigid receptor docking (average ROC AUC of 0.693 vs. 0.623). Compared to numerous benchmarks, the overall performance is better than nearly all other commonly used procedures. Furthermore, we provide a detailed analysis of the level of receptor flexibility observed in docking results for different classes of residues and elucidate potential avenues for further improvement.
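
The ROC AUC values quoted above (0.693 vs. 0.623) summarize how well a score ranks known actives ahead of decoys. As a rough illustration of the metric itself, and not of the authors' benchmarking code, the sketch below computes ROC AUC as the probability that a randomly chosen active outscores a randomly chosen decoy. The function name `roc_auc` is hypothetical, and higher scores are assumed to be better, so docking energies would need to be negated first.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of actives (label 1) and decoys (label 0).

    Assumption: a higher score means a better-ranked compound.
    Ties between an active and a decoy count as half a win.
    """
    actives = [s for s, is_active in zip(scores, labels) if is_active]
    decoys = [s for s, is_active in zip(scores, labels) if not is_active]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic, which is acceptable for a sketch but slow for large decoy sets.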

• Docking-Based Virtual Screening of Covalently Binding Ligands: An Orthogonal Lead Discovery Approach
Schröder, Jörg and Klinger, Anette and Oellien, Frank and Marhofer, Richard J and Duszenko, Michael and Selzer, Paul M
Journal of medicinal chemistry, 2013, 56(4), 1478-1490
PMID: 23350811

In pharmaceutical industry, lead discovery strategies and screening collections have been predominantly tailored to discover compounds that modulate target proteins through noncovalent interactions. Conversely, covalent linkage formation is an important mechanism for a quantity of successful drugs in the market, which are discovered in most cases by hindsight instead of systematical design. In this article, the implementation of a docking-based virtual screening workflow for the retrieval of covalent binders is presented considering human cathepsin K as a test case. By use of the docking conditions that led to the best enrichment of known actives, 44 candidate compounds with unknown activity on cathepsin K were finally selected for experimental evaluation. The most potent inhibitor, 4-(N-phenylanilino)-6-pyrrolidin-1-yl-1,3,5-triazine-2-carbonitrile (CP243522), showed a K(i) of 21 nM and was confirmed to have a covalent reversible mechanism of inhibition. The presented approach will have great potential in cases where covalent inhibition is the desired drug discovery strategy.

• AutoMap: A tool for analyzing protein-ligand recognition using multiple ligand binding modes.
Agostino, Mark and Mancera, Ricardo L and Ramsland, Paul A and Yuriev, Elizabeth
Journal of molecular graphics & modelling, 2013, 40C, 80-90
PMID: 23376613     doi: 10.1016/j.jmgm.2013.01.001

Prediction of the protein residues most likely to be involved in ligand recognition is of substantial value in structure-based drug design. Considering multiple ligand binding modes is of potential relevance to studying ligand recognition, but is generally ignored by currently available techniques. We have previously presented the site mapping technique, which considers multiple ligand binding modes in its analysis of protein-ligand recognition. AutoMap is a partially automated implementation of our previously developed site mapping procedure. It consists of a series of Perl scripts that utilize the output of molecular docking to generate "site maps" of a protein binding site. AutoMap determines the hydrogen bonding and van der Waals interactions taking place between a target protein and each pose of a ligand ensemble. It tallies these interactions according to the protein residues with which they occur, then normalizes the tallies and maps these to the surface of the protein. The residues involved in interactions are selected according to specific cutoffs. The procedure has been demonstrated to perform well in studying carbohydrate-protein and peptide-antibody recognition. An automated procedure to optimize cutoff selection is demonstrated to rapidly identify the appropriate cutoffs for these previously studied systems. The prediction of key ligand binding residues is compared between AutoMap using automatically optimized cutoffs, AutoMap using a previously selected cutoff, the top ranked pose from docking and the predictions supplied by FTMap. AutoMap using automatically optimized cutoffs is demonstrated to provide improved predictions, compared to other methods, in a set of immunologically relevant test cases. The automated implementation of the site mapping technique provides the opportunity for rapid optimization and deployment of the technique for investigating a broad range of protein-ligand systems.
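
The tally, normalize, and cutoff steps described in this abstract can be sketched as follows. This is only an illustration of the idea and not the AutoMap Perl implementation: the function name `site_map`, the input layout (one list of contacted residue labels per pose), the normalization by the largest tally, and the default cutoff of 0.5 are all assumptions.

```python
from collections import Counter


def site_map(poses, cutoff=0.5):
    """Select residues that interact frequently across a ligand pose ensemble.

    poses: list of poses, each a list of residue labels, one entry per
           hydrogen-bond or van der Waals interaction (hypothetical format).
    Returns a dict of residue -> normalized tally for residues at or above the cutoff.
    """
    tally = Counter(residue for pose in poses for residue in pose)
    if not tally:
        return {}
    top = max(tally.values())
    # Assumed normalization: scale so the most contacted residue scores 1.0.
    normalized = {residue: count / top for residue, count in tally.items()}
    return {residue: value for residue, value in normalized.items() if value >= cutoff}
```

With four poses contacting `TYR33` three times, `TRP91` twice, and `ASP50` once, a cutoff of 0.5 keeps `TYR33` and `TRP91` and drops `ASP50`.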

• Multiple structures for virtual ligand screening: defining binding site properties-based criteria to optimize the selection of the query.
Ben Nasr, Nesrine and Guillemain, Hélène and Lagarde, Nathalie and Zagury, Jean-François and Montes, Matthieu
Journal of chemical information and modeling, 2013, 53(2), 293-311
PMID: 23312043

Virtual ligand screening is an integral part of the modern drug discovery process. Traditional ligand-based, virtual screening approaches are fast but require a set of structurally diverse ligands known to bind to the target. Traditional structure-based approaches require high-resolution target protein structures and are computationally demanding. In contrast, the recently developed threading/structure-based FINDSITE-based approaches have the advantage that they are as fast as traditional ligand-based approaches and yet overcome the limitations of traditional ligand- or structure-based approaches. These new methods can use predicted low-resolution structures and infer the likelihood of a ligand binding to a target by utilizing ligand information excised from the target's remote or close homologous proteins and/or libraries of ligand binding databases. Here, we develop an improved version of FINDSITE, FINDSITEfilt, that filters out false positive ligands in threading identified templates by a better binding site detection procedure that includes information about the binding site amino acid similarity. We then combine FINDSITEfilt with FINDSITEX that uses publicly available binding databases ChEMBL and DrugBank for virtual ligand screening. The combined approach, FINDSITEcomb, is compared to two traditional docking methods, AUTODOCK Vina and DOCK 6, on the DUD benchmark set. It is shown to be significantly better in terms of enrichment factor, dependence on target structure quality, and speed. FINDSITEcomb is then tested for virtual ligand screening on a large set of 3576 generic targets from the DrugBank database as well as a set of 168 Human GPCRs. Excluding close homologues, FINDSITEcomb gives an average enrichment factor of 52.1 for generic targets and 22.3 for GPCRs within the top 1% of the screened compound library. Around 65% of the targets have better than random enrichment factors. The performance is insensitive to target structure quality, as long as it has a TM-score ≥ 0.4 to native. Thus, FINDSITEcomb makes the screening of millions of compounds across entire proteomes feasible. The FINDSITEcomb web service is freely available for academic users at http://cssb.biology.gatech.edu/skolnick/webservice/FINDSITE-COMB/index.html
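The enrichment factors quoted in this abstract can be made concrete with a small sketch. It uses the common definition of the enrichment factor (hit rate in the top fraction of the ranked library divided by the hit rate in the whole library); the function name `enrichment_factor` and the toy ranking are illustrative assumptions and are not taken from the FINDSITEcomb paper.

```python
def enrichment_factor(ranked_is_active, top_fraction=0.01):
    """Enrichment factor for a ranked library.

    ranked_is_active: list of booleans, best-scored compound first.
    EF = (actives in top fraction / size of top fraction)
         / (total actives / library size)
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    # Size of the top slice; at least one compound is always inspected.
    n_top = max(1, round(n_total * top_fraction))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical toy library: 1000 compounds, 10 actives,
# 5 of which are ranked within the top 1% (top 10 compounds).
ranking = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 985
ef_top1 = enrichment_factor(ranking, top_fraction=0.01)
print(f"EF(1%) = {ef_top1:.1f}")
```

A random ranking gives an enrichment factor of about 1, which is the baseline behind the phrase "better than random enrichment factors".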

• CovalentDock: Automated covalent docking with parameterized covalent linkage energy estimation and molecular geometry constraints
Ouyang, Xuchang and Zhou, Shuo and Su, Chinh Tran To and Ge, Zemei and Li, Runtao and Kwoh, Chee Keong
Journal of computational chemistry, 2013, 34(4), 326-336
PMID: 23034731     doi: 10.1002/jcc.23136

Covalent linkage formation is a very important mechanism for many covalent drugs to work. However, partly due to the limitations of proper computational tools for covalent docking, most covalent drugs are not discovered systematically. In this article, we present a new covalent docking package, the CovalentDock, built on top of the source code of Autodock. We developed an empirical model of free energy change estimation for covalent linkage formation, which is compatible with existing scoring functions used in docking, while handling the molecular geometry constraints of the covalent linkage with special atom types and directional grid maps. Integrated preparation scripts are also written for the automation of the whole covalent docking workflow. The result tested on existing crystal structures with covalent linkage shows that CovalentDock can reproduce the native covalent complexes with significantly improved accuracy when compared with the default covalent docking method in Autodock. Experiments also suggest that CovalentDock is capable of covalent virtual screening with satisfactory enrichment performance. In addition, the investigation on the results also shows that the chirality and target selectivity along with the molecular geometry constraints are well preserved by CovalentDock, showing great capability of this method in the application for covalent drug discovery.

• Consensus Docking: Improving the Reliability of Docking in a Virtual Screening Context
Houston, Douglas R and Walkinshaw, Malcolm D
Journal of chemical information and modeling, 2013, 53(2), 384-390
PMID: 23351099

Structure-based virtual screening relies on scoring the predicted binding modes of compounds docked into the target. Because the accuracy of this scoring relies on the accuracy of the docking, methods that increase docking accuracy are valuable. Here, we present a relatively straightforward method for improving the probability of identifying accurately docked poses. The method is similar in concept to consensus scoring schemes, which have been shown to increase ranking power and thus hit rates, but combines information about predicted binding modes rather than predicted binding affinities. The pose prediction success rate of each docking program alone was found in this trial to be 55% for Autodock, 58% for DOCK, and 64% for Vina. By using more than one docking program to predict the binding pose, correct poses were identified in 82% or more of cases, a significant improvement. In a virtual screen, these more reliably posed compounds can be preferentially advanced to subsequent scoring stages to improve hit rates. Consensus docking can be easily introduced into established structure-based virtual screening methodologies.
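The idea of combining predicted binding modes rather than scores can be sketched as a pairwise RMSD check between the top poses from different programs. This is a minimal illustration, not the authors' implementation: the 2.0 Å cutoff, the function names, and the toy coordinates are assumptions, and the RMSD here presumes identical atom ordering with no symmetry correction.

```python
import math
from itertools import combinations


def rmsd(pose_a, pose_b):
    """Plain coordinate RMSD between two poses with the same atom ordering."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def agreeing_programs(top_poses, cutoff=2.0):
    """Return the pairs of programs whose top poses lie within `cutoff` Å."""
    names = sorted(top_poses)
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if rmsd(top_poses[a], top_poses[b]) <= cutoff
    ]


# Hypothetical top poses for one three-atom ligand from three programs.
top_poses = {
    "vina": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "autodock": [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0), (2.0, 0.5, 0.0)],
    "dock": [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (2.0, 0.0, 5.0)],
}
consensus_pairs = agreeing_programs(top_poses)
print(consensus_pairs)
```

In a screening workflow, ligands with at least one agreeing pair would be the "more reliably posed" compounds passed on to later scoring stages.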

• Use of Experimental Design To Optimize Docking Performance: The Case of LiGenDock, the Docking Module of Ligen, a New De Novo Design Program
Beato, Claudia and Beccari, Andrea R and Cavazzoni, Carlo and Lorenzi, Simone and Costantino, Gabriele
Journal of chemical information and modeling, 2013, 53(6), 1503-1517
PMID: 23590204     doi: 10.1021/ci400079k

On route toward a novel de novo design program, called LiGen, we developed a docking program, LiGenDock, based on pharmacophore models of binding sites, including a non-enumerative docking algorithm. In this paper, we present the functionalities of LiGenDock and its accompanying module LiGenPocket, aimed at the binding site analysis and structure-based pharmacophore definition. We also report the optimization procedure we have carried out to improve the cognate docking and virtual screening performance of LiGenDock. In particular, we applied the design of experiments (DoE) methodology to screen the set of user-adjustable parameters to identify those having the largest influence on the accuracy of the results (which ensure the best performance in pose prediction and in virtual screening approaches) and then to choose their optimal values. The results are also compared with those obtained by two popular docking programs, namely, Glide and AutoDock for pose prediction, and Glide and DOCK6 for Virtual Screening.

• Identifying ligand binding sites and poses using GPU-accelerated Hamiltonian replica exchange molecular dynamics.
Wang, Kai and Chodera, John D and Yang, Yanzhi and Shirts, Michael R
Journal of computer-aided molecular design, 2013, 27(12), 989-1007
PMID: 24297454     doi: 10.1007/s10822-013-9689-8

We present a method to identify small molecule ligand binding sites and poses within a given protein crystal structure using GPU-accelerated Hamiltonian replica exchange molecular dynamics simulations. The Hamiltonians used vary from the physical end state of protein interacting with the ligand to an unphysical end state where the ligand does not interact with the protein. As replicas explore the space of Hamiltonians interpolating between these states, the ligand can rapidly escape local minima and explore potential binding sites. Geometric restraints keep the ligands from leaving the vicinity of the protein and an alchemical pathway designed to increase phase space overlap between intermediates ensures good mixing. Because of the rigorous statistical mechanical nature of the Hamiltonian exchange framework, we can also extract binding free energy estimates for all putative binding sites. We present results of this methodology applied to the T4 lysozyme L99A model system for three known ligands and one non-binder as a control, using an implicit solvent. We find that our methodology identifies known crystallographic binding sites consistently and accurately for the small number of ligands considered here and gives free energies consistent with experiment. We are also able to analyze the contribution of individual binding sites to the overall binding affinity. Our methodology points to near term potential applications in early-stage structure-guided drug discovery.
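The exchange step that lets replicas move between Hamiltonians can be illustrated with the standard Metropolis criterion for swapping the configurations of two states. This is a generic sketch of Hamiltonian replica exchange, not the specific exchange scheme of the paper, which may differ; all function and argument names are illustrative, and the inputs are assumed to be reduced (dimensionless) potentials u = U / kT.

```python
import math
import random


def exchange_probability(u_i_xi, u_j_xj, u_i_xj, u_j_xi):
    """Metropolis acceptance probability for swapping configurations.

    u_a_xb is the reduced potential of Hamiltonian a evaluated on the
    configuration currently held by replica b.
    P = min(1, exp(-[(u_i(x_j) + u_j(x_i)) - (u_i(x_i) + u_j(x_j))]))
    """
    delta = (u_i_xj + u_j_xi) - (u_i_xi + u_j_xj)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_swap(u_i_xi, u_j_xj, u_i_xj, u_j_xi, rng=random.random):
    """Return True if the proposed swap is accepted."""
    return rng() < exchange_probability(u_i_xi, u_j_xj, u_i_xj, u_j_xi)


# Hypothetical reduced energies: the swap raises the total by 1 kT.
p_uphill = exchange_probability(1.0, 2.0, 1.5, 2.5)
# A swap that lowers the total reduced energy is always accepted.
p_downhill = exchange_probability(1.5, 2.5, 1.0, 2.0)
print(p_uphill, p_downhill)
```

Good phase-space overlap between neighbouring Hamiltonians keeps `delta` small, which is why the abstract stresses an alchemical pathway designed for overlap.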

• Peptide docking and structure-based characterization of peptide binding: from knowledge to know-how.
London, Nir and Raveh, Barak and Schueler-Furman, Ora
Current opinion in structural biology, 2013, 23(6), 894-902
PMID: 24138780     doi: 10.1016/j.sbi.2013.07.006

Peptide-mediated interactions are gaining increased attention due to their predominant roles in the many regulatory processes that involve dynamic interactions between proteins. The structures of such interactions provide an excellent starting point for their characterization and manipulation, and can provide leads for targeted inhibitor design. The relatively few experimentally determined structures of peptide-protein complexes can be complemented with an outburst of modeling approaches that have been introduced in recent years, with increasing accuracy and applicability to ever more systems. We review different methods to address the considerable challenges in modeling the binding of a short yet highly flexible peptide to its partner. These methods apply an array of sampling strategies and draw from a recent amassing of knowledge about the biophysical nature of peptide-protein interactions. We elaborate on applications of these structure-based approaches and in particular on the characterization of peptide binding specificity to different peptide-binding domains and enzymes. Such applications can identify new biological targets and thus complement our current view of protein-protein interactions in living organisms. Accurate peptide-protein docking is of particular importance in the light of increased appreciation of the crucial functional roles of disordered regions and the many linear binding motifs embedded within.

• Message passing interface and multithreading hybrid for parallel molecular docking of large databases on petascale high performance computing machines
Zhang, Xiaohua and Wong, Sergio E and Lightstone, Felice C
Journal of computational chemistry, 2013, 34(11), 915-927
PMID: 23345155     doi: 10.1002/jcc.23214

A mixed parallel scheme that combines message passing interface (MPI) and multithreading was implemented in the AutoDock Vina molecular docking program. The resulting program, named VinaLC, was tested on the petascale high performance computing (HPC) machines at Lawrence Livermore National Laboratory. To exploit the typical cluster-type supercomputers, thousands of docking calculations were dispatched by the master process to run simultaneously on thousands of slave processes, where each docking calculation takes one slave process on one node, and within the node each docking calculation runs via multithreading on multiple CPU cores and shared memory. Input and output of the program and the data handling within the program were carefully designed to deal with large databases and ultimately achieve HPC on a large number of CPU cores. Parallel performance analysis of the VinaLC program shows that the code scales up to more than 15K CPUs with a very low overhead cost of 3.94%. One million flexible compound docking calculations took only 1.4 h to finish on about 15K CPUs. The docking accuracy of VinaLC has been validated against the DUD data set by the re-docking of X-ray ligands and an enrichment study; 64.4% of the top scoring poses have RMSD values under 2.0 Å. The program has been demonstrated to have good enrichment performance on 70% of the targets in the DUD data set. An analysis of the enrichment factors calculated at various percentages of the screening database indicates VinaLC has very good early recovery of actives.

• Characterizing Binding of Small Molecules. II. Evaluating the Potency of Small Molecules to Combat Resistance Based on Docking Structures
Ding, Bo and Li, Nan and Wang, Wei
Journal of chemical information and modeling, 2013
PMID: 23570305

Drug resistance severely erodes the efficacy of therapeutic treatments for many diseases. Assessing the potency of a drug lead to combat resistance is no doubt critical for designing new drugs or new therapeutic combinations. Virtual screening is often the first step in drug discovery and a challenging problem is to accurately predict the resistant profile of an inhibitor based on the docking structures. Using a well studied system HIV-1 protease, we have illustrated the success of a computational method called MIEC-SVM on tackling this problem. We computed molecular interaction energy components (MIECs) between the ligand and the protease residues to characterize the docking poses, which were input to support vector machine (SVM) to distinguish resistant from nonresistant mutants. More importantly, the method is able to predict resistant profiles for new drugs based on the docking structures as indicated by its satisfactory performance in leave-one-drug-out and leave-drug/mutants-out tests. Therefore, the MIEC-SVM method can also facilitate designing effective therapeutic combinations by combining drugs with complementary resistant profiles.

• Accounting for Conformational Variability in Protein-Ligand Docking with NMR-Guided Rescoring.
Skjærven, Lars and Codutti, Luca and Angelini, Andrea and Grimaldi, Manuela and Latek, Dorota and Monecke, Peter and Dreyer, Matthias K and Carlomagno, Teresa
Journal of the American Chemical Society, 2013, 135(15), 5819-5827
PMID: 23565800     doi: 10.1021/ja4007468

A key component to success in structure-based drug design is reliable information on protein-ligand interactions. Recent development in NMR techniques has accelerated this process by overcoming some of the limitations of X-ray crystallography and computational protein-ligand docking. In this work we present a new scoring protocol based on NMR-derived interligand INPHARMA NOEs to guide the selection of computationally generated docking modes. We demonstrate the performance in a range of scenarios, encompassing traditionally difficult cases such as docking to homology models and ligand dependent domain rearrangements. Ambiguities associated with sparse experimental information are lifted by searching a consensus solution based on simultaneously fitting multiple ligand pairs. This study provides a previously unexplored integration between molecular modeling and experimental data, in which interligand NOEs represent the key element in the rescoring algorithm. The presented protocol should be widely applicable for protein-ligand docking also in a different context from drug design and highlights the important role of NMR-based approaches to describe intermolecular ligand-receptor interactions.

• Automated docking with protein flexibility in the design of femtomolar "click chemistry" inhibitors of acetylcholinesterase.
Morris, Garrett M and Green, Luke G and Radić, Zoran and Taylor, Palmer and Sharpless, K Barry and Olson, Arthur J and Grynszpan, Flavio
Journal of chemical information and modeling, 2013, 53(4), 898-906
PMID: 23451944     doi: 10.1021/ci300545a

The use of computer-aided structure-based drug design prior to synthesis has proven to be generally valuable in suggesting improved binding analogues of existing ligands. (1) Here we describe the application of the program AutoDock (2) to the design of a focused library that was used in the "click chemistry in-situ" generation of the most potent noncovalent inhibitor of the native enzyme acetylcholinesterase (AChE) yet developed (Kd

• Optimization of molecular docking scores with support vector rank regression
Wang, Wei and He, Wanlin and Zhou, Xi and Chen, Xin
Proteins, 2013
PMID: 23504920     doi: 10.1002/prot.24282

This work introduces the support vector rank regression (SVRR) algorithm for the optimization of molecular docking scores. Seven original docking scores reported by two docking programs were integrated by the SVRR algorithm. The resulting SVRR scores showed an average of 12.1% improvement (59.5% to 66.7%) in binding conformation prediction tests to rank the correctly computed conformation in the first place, along with 16.7% RMSD improvement (2.5414 Å vs. 2.1162 Å) for the top ranked conformations. In compound library screening tests, an average of 46.3% improvement (18.2% to 26.6%) was also observed to rank the correct ligand in the first place. Furthermore, it was shown that SVRR scores trained with different example datasets, using different training strategies, all exhibited exceedingly consistent accuracies, suggesting that the SVRR algorithm is highly robust and generalizable. In contrast, using the same training datasets, traditional support vector classification and regression algorithms failed to comparably improve the accuracy of library screening and conformation prediction. These results suggested that, with additional features to indicate the comparative fitness between computed binding conformations, the SVRR algorithm holds the potential to create a new category of more accurate integrative docking scores.
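The improvement percentages in this abstract are relative changes, not differences in percentage points. The short check below recomputes them from the rounded figures quoted in the abstract; the helper name `relative_change` is an illustrative assumption.

```python
def relative_change(before, after):
    """Relative change of `after` with respect to `before` (0.1 means +10%)."""
    return (after - before) / before


# Success rate for ranking the correct conformation first: 59.5% -> 66.7%.
pose_gain = relative_change(59.5, 66.7)
# RMSD of the top-ranked conformation: 2.5414 Å -> 2.1162 Å (lower is better).
rmsd_gain = -relative_change(2.5414, 2.1162)
# Success rate for ranking the correct ligand first: 18.2% -> 26.6%.
screen_gain = relative_change(18.2, 26.6)

for label, value in [("pose", pose_gain), ("rmsd", rmsd_gain), ("screen", screen_gain)]:
    print(f"{label}: {100 * value:.1f}%")
```

The first two values reproduce the quoted 12.1% and 16.7%. The third comes out near 46.2% rather than 46.3%, presumably because the abstract's figure was computed from unrounded success rates.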

• An Automated Docking Protocol for hERG Channel Blockers
Di Martino, Giovanni Paolo and Masetti, Matteo and Ceccarini, Luisa and Cavalli, Andrea and Recanatini, Maurizio
Journal of chemical information and modeling, 2013, 53(1), 159-175
PMID: 23259741

A docking protocol aimed at obtaining a consistent qualitative and quantitative picture of binding for a series of hERG channel blockers is presented. To overcome the limitations experienced by standard procedures when docking blockers at hERG binding site, we designed a strategy that explicitly takes into account the conformations of the channel, their possible intrinsic symmetry, and the role played by the configurational entropy of ligands. The protocol was developed on a series of congeneric sertindole derivatives, allowing us to satisfactorily explain the structure-activity relationships for this set of blockers. In addition, we show that the performance of structure-based models relying on multiple-receptor conformations statistically increases when the protein conformations are chosen in such a way as to capture relevant structural features at the binding site. The protocol was then successfully applied to a series of structurally unrelated blockers.

• S4MPLE - Sampler For Multiple Protein-Ligand Entities: Simultaneous Docking of Several Entities
Hoffer, Laurent and Horvath, Dragos
Journal of chemical information and modeling, 2013, 53(1), 88-102
PMID: 23215156

S4MPLE is a conformational sampling tool, based on a hybrid genetic algorithm, simulating one (conformer enumeration) or more molecules (docking). Energy calculations are based on the AMBER force field [Cornell et al. J. Am. Chem. Soc. 1995, 117, 5179.] for biological macromolecules and its generalized version GAFF [Wang et al. J. Comput. Chem. 2004, 25, 1157.] for ligands. This paper describes more advanced, specific applications of S4MPLE to problems more complex than classical redocking of drug-like compounds [Hoffer et al. J. Mol. Graphics Modell. 2012, submitted for publication.]. Here, simultaneous docking of multiple entities is addressed in two different important contexts. First, simultaneous docking of two fragment-like ligands was attempted, as such ternary complexes are the basis of fragment-based drug design by linkage of the independent binders. As a preliminary, the capacity of S4MPLE to dock fragment-like compounds has been assessed, since this class of small probes used in fragment-based drug design covers a different chemical space than drug-like molecules. Herein reported success rates from fragments redocking are as good as classical benchmarking results on drug-like compounds (Astex Diverse Set [Hartshorn et al. J. Med. Chem. 2007, 50, 726.]). Then, S4MPLE is successfully challenged to predict locations of fragments involved in ternary complexes by means of multientity docking. Second, the key problem of predicting water-mediated interaction is addressed by considering explicit water molecules as additional entities to be docked in the presence of the "main" ligand. Blind prediction of solvent molecule positions, reproducing relevant ligand-water-site mediated interactions, is achieved in 76% of cases over saved poses. S4MPLE also succeeded in predicting crystallographic water displacement by a functional group tailored for that purpose in the optimized ligand. However, water localization is an extremely delicate issue in terms of weighing of electrostatic and desolvation terms and also introduces a significant increase of required sampling efforts. Yet, the herein reported results - not making use of massively parallel deployment of the software - are very encouraging.

• Incorporating Backbone Flexibility in MedusaDock Improves Ligand-Binding Pose Prediction in the CSAR2011 Docking Benchmark
Ding, Feng and Dokholyan, Nikolay V
Journal of chemical information and modeling, 2013
PMID: 23237273

Solution of the structures of ligand-receptor complexes via computational docking is an integral step in many structural modeling efforts as well as in rational drug discovery. A major challenge in ligand-receptor docking is the modeling of both receptor and ligand flexibilities in order to capture receptor conformational changes induced by ligand binding. In the molecular docking suite MedusaDock, both ligand and receptor side chain flexibilities are modeled simultaneously with sets of discrete rotamers, where the ligand rotamer library is generated "on the fly" in a stochastic manner. Here, we introduce backbone flexibility into MedusaDock by implementing ensemble docking in a sequential manner for a set of distinct receptor backbone conformations. We generate corresponding backbone ensembles to capture backbone changes upon binding to different ligands, as observed experimentally. We develop a simple clustering and ranking approach to select the top poses as blind predictions. We applied our method in the CSAR2011 benchmark exercise. In 28 out of 35 cases (80%) where the ligand-receptor complex structures were released, we were able to predict near-native poses (<2.5 Å RMSD), the highest success rate reported for CSAR2011. This result highlights the importance of modeling receptor backbone flexibility to the accurate docking of ligands to flexible targets. We expect a broad application of our fully flexible docking approach in biological studies as well as in rational drug design.

## 2012

• Application of Drug-perturbed Essential Dynamics/Molecular Dynamics (ED/MD) to Virtual Screening and Rational Drug Design.
Chaudhuri, Rima and Carrillo, Oliver and Laughton, Charles Anthony and Orozco, Modesto
Journal of chemical theory and computation, 2012, 8(7), 2204-2214
doi: 10.1021/ct300223c

We present here the first application of a new algorithm, essential dynamics/molecular dynamics (ED/MD), to the field of small molecule docking. The method uses a previously existing molecular dynamics (MD) ensemble of a protein or protein-drug complex to generate, with a very small computational cost, perturbed ensembles which represent ligand-induced binding site flexibility in a more accurate way than the original trajectory. The use of these perturbed ensembles in a standard docking program leads to superior performance than the same docking procedure using the crystal structure or ensembles obtained from conventional MD simulations as templates. The simplicity and accuracy of the method opens up the possibility of introducing protein flexibility in high-throughput docking experiments.

• A Force Field with Discrete Displaceable Waters and Desolvation Entropy for Hydrated Ligand Docking.
Forli, Stefano and Olson, Arthur J
Journal of medicinal chemistry, 2012, 55(2), 623-638
PMID: 22148468     doi: 10.1021/jm2005145

In modeling ligand-protein interactions, the representation and role of water are of great importance. We introduce a force field and hydration docking method that enables the automated prediction of waters mediating the binding of ligands with target proteins. The method presumes no prior knowledge of the apo or holo protein hydration state and is potentially useful in the process of structure-based drug discovery. The hydration force field accounts for the entropic and enthalpic contributions of discrete waters to ligand binding, improving energy estimation accuracy and docking performance. The force field has been calibrated and validated on a total of 417 complexes (197 training set; 220 test set), then tested in cross-docking experiments, for a total of 1649 ligand-protein complexes evaluated. The method is computationally efficient and was used to model up to 35 waters during docking. The method was implemented and tested using unaltered AutoDock4 with new force field tables.

• Utilizing Experimental Data for Reducing Ensemble Size in Flexible-Protein Docking.
Xu, Mengang and Lill, Markus A
Journal of chemical information and modeling, 2012, 52, 187-198
PMID: 22146074     doi: 10.1021/ci200428t

Efficient and sufficient incorporation of protein flexibility into docking is still a challenging task. Docking to an ensemble of protein structures has proven its utility for docking, but using a large ensemble of structures can reduce the efficiency of docking and can increase the number of false positives in virtual screening. In this paper, we describe the application of our new methodology, Limoc, to generate an ensemble of holo-like protein structures in combination with the relaxed complex scheme (RCS), to virtual screening. We describe different schemes to reduce the ensemble of protein structures to increase efficiency and enrichment quality. Utilizing experimental knowledge about actives for a target protein allows the reduction of ensemble members to a minimum of three protein structures, increasing enrichment quality and efficiency simultaneously.

• Using RosettaLigand for Small Molecule Docking into Comparative Models.
Kaufmann, Kristian W and Meiler, Jens
PloS one, 2012, 7(12), e50769
PMID: 23239984     doi: 10.1371/journal.pone.0050769

Computational small molecule docking into comparative models of proteins is widely used to query protein function and in the development of small molecule therapeutics. We benchmark RosettaLigand docking into comparative models for nine proteins built during CASP8 that contain ligands. We supplement the study with 21 additional protein/ligand complexes to cover a wider space of chemotypes. During a full docking run in 21 of the 30 cases, RosettaLigand successfully found a native-like binding mode among the top ten scoring binding modes. From the benchmark cases we find that careful template selection based on ligand occupancy provides the best chance of success while overall sequence identity between template and target does not appear to improve results. We also find that binding energy normalized by atom number is often less than -0.4 in native-like binding modes.
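The last observation in this abstract suggests a simple post-docking filter. The sketch below is an assumption-laden illustration: it takes "atom number" to mean the ligand atom count and the energy to be in Rosetta energy units, neither of which the abstract specifies, and the function names are hypothetical.

```python
def normalized_binding_energy(binding_energy, n_ligand_atoms):
    """Binding energy divided by the (assumed) number of ligand atoms."""
    if n_ligand_atoms <= 0:
        raise ValueError("ligand must contain at least one atom")
    return binding_energy / n_ligand_atoms


def passes_native_like_filter(binding_energy, n_ligand_atoms, threshold=-0.4):
    """True if the normalized energy falls below the -0.4 value from the abstract."""
    return normalized_binding_energy(binding_energy, n_ligand_atoms) < threshold


# Hypothetical docking results: (binding energy, ligand atom count).
strong = passes_native_like_filter(-12.0, 24)  # -0.50 per atom
weak = passes_native_like_filter(-6.0, 24)     # -0.25 per atom
print(strong, weak)
```

Because the abstract says the value is "often" below -0.4, a filter like this is best treated as a heuristic for prioritizing models rather than a hard acceptance rule.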

• Rosetta Ligand docking with flexible XML protocols.
Lemmon, Gordon and Meiler, Jens
Methods in molecular biology (Clifton, N.J.), 2012, 819, 143-155
PMID: 22183535     doi: 10.1007/978-1-61779-465-0_10

RosettaLigand is premier software for predicting how a protein and a small molecule interact. Benchmark studies demonstrate that 70% of the top scoring RosettaLigand predicted interfaces are within 2 Å RMSD from the crystal structure [1]. The latest release of Rosetta ligand software includes many new features, such as (1) docking of multiple ligands simultaneously, (2) representing ligands as fragments for greater flexibility, (3) redesign of the interface during docking, and (4) an XML script based interface that gives the user full control of the ligand docking protocol.

• GalaxyDock: Protein-ligand docking with flexible protein side-chains.
Shin, Woong-Hee and Seok, Chaok
Journal of chemical information and modeling, 2012, 52(12), 3225-3232
PMID: 23198780     doi: 10.1021/ci300342z

An important issue in developing protein-ligand docking methods is how to incorporate receptor flexibility. Consideration of receptor flexibility using an ensemble of pre-compiled receptor conformations or by employing an effectively enlarged binding pocket has been reported to be useful. However, direct consideration of receptor flexibility during energy optimization of the docked conformation has been less popular because of the large increase in computational complexity. In this paper, we present a new docking program called GalaxyDock that accounts for the flexibility of pre-selected receptor side-chains by global optimization of an AutoDock-based energy function trained for flexible side-chain docking. This method was tested on 3 sets of protein-ligand complexes (HIV-PR, LXRβ, cAPK) and a diverse set of 16 proteins that involve side-chain conformational changes upon ligand binding. The cross-docking tests show that the performance of GalaxyDock is higher or comparable to previous flexible docking methods tested on the same sets, increasing the binding conformation prediction accuracy by 10%-60% compared to rigid-receptor docking. This encouraging result suggests that this powerful global energy optimization method may be further extended to incorporate larger magnitudes of receptor flexibility in the future. The program is available at http://galaxy.seoklab.org/softwares/galaxydock.html.

• Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data.
Barros, Rodrigo C and Winck, Ana T and Machado, Karina S and Basgalupp, Márcio P and de Carvalho, André Cplf and Ruiz, Duncan D and Norberto de Souza, Osmar
Bmc Bioinformatics, 2012, 13(1), 310
PMID: 23171000     doi: 10.1186/1471-2105-13-310

Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor.

• Current Assessment of Docking into GPCR Crystal Structures and Homology Models: Successes, Challenges, and Guidelines.
Beuming, Thijs and Sherman, Woody
Journal of chemical information and modeling, 2012, 52(12), 3263-3277
PMID: 23121495     doi: 10.1021/ci300411b

The growing availability of novel structures for several G protein-coupled receptors (GPCRs) has provided new opportunities for structure-based drug design of ligands against this important class of targets. Here, we report a systematic analysis of the accuracy of docking small molecules into GPCR structures and homology models using both rigid receptor (Glide SP and Glide XP) and flexible receptor (Induced Fit Docking; IFD) methods. The ability to dock ligands into different structures of the same target (cross-docking) is evaluated for both agonist and inverse agonist structures of the A2A receptor and the β1- and β2-adrenergic receptors. In addition, we have produced homology models for the β1-adrenergic, β2-adrenergic, D3 dopamine, H1 histamine, M2 muscarine, M3 muscarine, A2A adenosine, S1P1, κ-opioid, and C-X-C chemokine 4 receptors using multiple templates and investigated the ability of docking to predict the binding mode of ligands in these models. Clear correlations are observed between the docking accuracy and the similarity of the sequence of interest to the template, suggesting regimes in which docking can correctly identify ligand binding modes.

• A Network Approach for Computational Drug Repositioning
Li, Jiao and Lu, Zhiyong
Journal of molecular biology, 2012, 161(2), 83-83
PMID: 7154081     doi: 10.1109/HISB.2012.26

Computational drug repositioning offers promise for discovering new uses of existing drugs, as drug related molecular, chemical, and clinical information has increased over the past decade and become broadly accessible. In this study, we present a new computational approach for identifying potential new indications of an existing drug through its relation to similar drugs in a disease-drug-target network. When measuring drug pairwise similarity, we used a bipartite-graph based method which combined similarity of drug compound structures, similarity of target protein profiles, and interaction between target proteins. In evaluation, our method compared favorably to the state of the art, achieving AUC of 0.888. The results indicated that our method is able to identify drug repositioning opportunities by exploring complex relationships in a disease-drug-target network.

• Structural insights into the molecular basis of the ligand promiscuity.
Sturm, Noé and Desaphy, Jérémy and Quinn, Ronald J and Rognan, Didier and Kellenberger, Esther
Journal of chemical information and modeling, 2012, 52(9), 2410-2421
PMID: 22920885     doi: 10.1021/ci300196g

Selectivity is a key factor in drug development. In this paper, we queried the Protein Data Bank to better understand the reasons for the promiscuity of bioactive compounds. We assembled a data set of >1000 pairs of three-dimensional structures of complexes between a "drug-like" ligand (as its physicochemical properties overlap that of approved drugs) and two distinct "druggable" protein targets (as their binding sites are likely to accommodate "drug-like" ligands). Studying the similarity between the ligand-binding sites in the different targets revealed that the lack of selectivity of a ligand can be due (i) to the fact that Nature has created the same binding pocket in different proteins, which do not necessarily have otherwise sequence or fold similarity, or (ii) to specific characteristics of the ligand itself. In particular, we demonstrated that many ligands can adapt to different protein environments by changing their conformation, by using different chemical moieties to anchor to different targets, or by adopting unusual extreme binding modes (e.g., only apolar contact between the ligand and the protein, even though polar groups are present on the ligand or at the protein surface). Lastly, we provided new elements in support of the recent studies which suggest that the promiscuity of a ligand might be inferred from its molecular complexity.

• Multiple ligand docking by Glide: implications for virtual second-site screening.
Vass, Márton and Tarcsay, Akos and Keseru, György M
Journal of computer-aided molecular design, 2012, 26(7), 821-834
PMID: 22639078     doi: 10.1007/s10822-012-9578-6

Performance of Glide was evaluated in a sequential multiple ligand docking paradigm predicting the binding modes of 129 protein-ligand complexes crystallized with clusters of 2-6 cooperative ligands. Three sampling protocols (single precision-SP, extra precision-XP, and SP without scaling ligand atom radii-SP hard) combined with three different scoring functions (GlideScore, Emodel and Glide Energy) were tested. The effects of ligand number, docking order and druglikeness of ligands and closeness of the binding site were investigated. On average 36 % of all structures were reproduced with RMSDs lower than 2 Å. Correctly docked structures reached 50 % when docking druglike ligands into closed binding sites by the SP hard protocol. Cooperative binding to metabolic and transport proteins can dramatically alter pharmacokinetic parameters of drugs. Analyzing the cytochrome P450 subset the SP hard protocol with Emodel ranking reproduced two-thirds of the structures well. Multiple ligand binding is also exploited by the fragment linking approach in lead discovery settings. The HSP90 subset from real life fragment optimization programs revealed that Glide is able to reproduce the positions of multiple bound fragments if conserved water molecules are considered. These case studies assess the utility of Glide in sequential multiple docking applications.

• Consensus Induced Fit Docking (cIFD): methodology, validation, and application to the discovery of novel Crm1 inhibitors.
Kalid, Ori and Toledo Warshaviak, Dora and Shechter, Sharon and Sherman, Woody and Shacham, Sharon
Journal of computer-aided molecular design, 2012, 26(11), 1217-1228
PMID: 23053738     doi: 10.1007/s10822-012-9611-9

We present the Consensus Induced Fit Docking (cIFD) approach for adapting a protein binding site to accommodate multiple diverse ligands for virtual screening. This novel approach results in a single binding site structure that can bind diverse chemotypes and is thus highly useful for efficient structure-based virtual screening. We first describe the cIFD method and its validation on three targets that were previously shown to be challenging for docking programs (COX-2, estrogen receptor, and HIV reverse transcriptase). We then demonstrate the application of cIFD to the challenging discovery of irreversible Crm1 inhibitors. We report the identification of 33 novel Crm1 inhibitors, which resulted from the testing of 402 purchased compounds selected from a screening set containing 261,680 compounds. This corresponds to a hit rate of 8.2 %. The novel Crm1 inhibitors reveal diverse chemical structures, validating the utility of the cIFD method in a real-world drug discovery project. This approach offers a pragmatic way to implicitly account for protein flexibility without the additional computational costs of ensemble docking or including full protein flexibility during virtual screening.

• Can the Energy Gap in the Protein-Ligand Binding Energy Landscape Be Used as a Descriptor in Virtual Ligand Screening?
Grigoryan, Arsen V and Wang, Hong and Cardozo, Timothy J
PloS one, 2012, 7(10), e46532
doi: 10.1371/journal.pone.0046532

The ranking of scores of individual chemicals within a large screening library is a crucial step in virtual screening (VS) for drug discovery. Previous studies showed that the quality of protein-ligand recognition can be improved using spectrum properties and the shape of ...

• PRL-dock: Protein-ligand docking based on hydrogen bond matching and probabilistic relaxation labeling.
Wu, Meng-Yun and Dai, Dao-Qing and Yan, Hong
Proteins, 2012, 80(9), 2137-2153
PMID: 22544808     doi: 10.1002/prot.24104

Protein-ligand docking is widely applied to structure-based virtual screening for drug discovery. This paper presents a novel docking technique, PRL-Dock, based on hydrogen bond matching and probabilistic relaxation labeling. It deals with multiple hydrogen bonds and can match many acceptors and donors simultaneously. In the matching process, the initial probability of matching an acceptor with a donor is estimated by an efficient scoring function and the compatibility coefficients are assigned according to the coexisting condition of two hydrogen bonds. After hydrogen bond matching, the geometric complementarity of the interacting donor and acceptor sites is taken into account for displacement of the ligand. It is reduced to an optimization problem to calculate the optimal translation and rotation matrices that minimize the root mean square deviation between two sets of points, which can be solved using the Kabsch algorithm. In addition to the van der Waals interaction, the contribution of intermolecular hydrogen bonds in a complex is included in the scoring function to evaluate the docking quality. A modified Lennard-Jones 12-6 dispersion-repulsion term is used to estimate the van der Waals interaction to make the scoring function fairly 'soft' so that ligands are not heavily penalized for small errors in the binding geometry. The calculation of this scoring function is very convenient. The evaluation is carried out on rigid complexes and 93 flexible ones where there is at least one intermolecular hydrogen bond. The experimental results of docking accuracy and prediction of binding affinity demonstrate that the proposed method is highly effective.

• Ligand Aligning Method for Molecular Docking: Alignment of Property-Weighted Vectors.
Joung, Jong Young and Nam, Ky-Youb and Cho, Kwang-Hwi and No, Kyoung Tai
Journal of chemical information and modeling, 2012, 52(4), 984-995
PMID: 22471323     doi: 10.1021/ci200501p

To reduce searching effort in conformational space of ligand docking positions, we propose an algorithm that generates initial binding positions of the ligand in a target protein, based on the property-weighted vector (P-weiV), the three-dimensional orthogonal vector determined by the molecular property of hydration-free energy density. The alignment of individual P-weiVs calculated separately for the ligand and the protein gives the initial orientation of a given ligand conformation relative to an active site; these initial orientations are then ranked by simple energy functions, including solvation. Because we are using three-dimensional orthogonal vectors to be aligned, only four orientations of ligand positions are possible for each ligand conformation, which reduces the search space dramatically. We found that the performance of P-weiV compared favorably to the use of principal moment of inertia (PMI) as implemented in LigandFit when we tested the abilities of the two approaches to correctly predict 205 protein-ligand complex data sets from the PDBBind database. P-weiV correctly predicted the alignment of ligands (within rmsd of 2.5 Å) with 57.6% reliability (118/205) for the top 10 ranked conformations and with 74.1% reliability (152/205) for the top 50 ranked conformations of Catalyst-generated conformers, as compared to 22.9% (47/205) and 31.2% (64/205), respectively, in the case of PMI with the same conformer set.

• 3D-RISM-Dock: a new fragment-based drug design protocol
Nikolić, Dragan and Blinov, Nikolay and Wishart, David S. and Kovalenko, Andriy
Journal of chemical theory and computation, 2012, 8(9), 3356-3372
doi: 10.1021/ct300257v

We explore a new approach in the rational design of specificity in molecular recognition of small molecules based on statistical-mechanical integral equation theory of molecular liquids in the form of the three-dimensional reference interaction site model with the Kovalenko-Hirata closure (3D-RISM-KH). The numerically stable iterative solution of conventional 3D-RISM equations includes the fragmental decomposition of flexible ligands, which are treated as distinct species in solvent mixtures of arbitrary complexity. The computed density functions for solution (including ligand) molecules are obtained as a set of discrete spatial grids that uniquely describe the continuous solvent-site distribution around the protein solute. Potentials of mean force derived from these distributions define the scoring function interfaced with the AutoDock program for an automated ranking of docked conformations. As a case study in terms of solvent composition, we analyze cooperative interactions encountered in the binding o...

• Fast force field-based optimization of protein-ligand complexes with graphics processor.
Heinzerling, Lennart and Klein, Robert and Rarey, Matthias
Journal of computational chemistry, 2012, 33(32), 2554-2565
PMID: 22911510     doi: 10.1002/jcc.23094

Usually based on molecular mechanics force fields, the post-optimization of ligand poses is typically the most time-consuming step in protein-ligand docking procedures. In return, it bears the potential to overcome the limitations of discretized conformation models. Because of the parallel nature of the problem, recent graphics processing units (GPUs) can be applied to address this dilemma. We present a novel algorithmic approach for parallelizing and thus massively speeding up protein-ligand complex optimizations with GPUs. The method, customized to pose-optimization, performs at least 100 times faster than widely used CPU-based optimization tools. An improvement in Root-Mean-Square Distance (RMSD) compared to the original docking pose of up to 42% can be achieved.

• Computational Approach for Fast Screening of Small Molecular Candidates To Inhibit Crystallization in Amorphous Drugs.
Pajula, Katja and Lehto, Vesa-Pekka and Ketolainen, Jarkko and Korhonen, Ossi
Molecular Pharmaceutics, 2012, 9(10), 2844-2855
PMID: 22867030     doi: 10.1021/mp300135h

The applicability of the computational docking approach was investigated to create a novel method for quick additive screening to inhibit the crystallization taking place in amorphous drugs. Surface energy and attachment energy were utilized to recognize the morphologically most important crystal faces. The surfaces (100), (001), and (010) were identified as target faces, and the estimated free energies of binding of additives on these surfaces were computationally determined. The molecule of the crystallizing compound was included in the group of the modeled additives as the reference and for the validation of the approach. Additives having a lower estimated free energy of binding than the reference molecule itself were considered as potential crystallization inhibitors. Salicylamide, salicylic acid, and sulfanilamide with computationally prescreened additives were melt-quenched, and the nucleation and crystal growth rates were subsequently monitored by polarized light microscopy. As a result, computationally screened additives decelerated the nucleation and crystal growth rates of the studied drugs while the pure drugs crystallized too fast to be measured. The use of a computational approach enabled fast and cost-effective additive selection to retard nucleation and crystal growth, thus facilitating the production of amorphous binary small molecular compounds with stabilized disordered structures.

• Molecular Docking Using the Molecular Lipophilicity Potential as Hydrophobic Descriptor: Impact on GOLD Docking Performance.
Nurisso, Alessandra and Bravo, Juan and Carrupt, Pierre-Alain and Daina, Antoine
Journal of chemical information and modeling, 2012, 52(5), 1319-1327
PMID: 22462609     doi: 10.1021/ci200515g

GOLD is a molecular docking software widely used in drug design. In the initial steps of docking, it creates a list of hydrophobic fitting points inside protein cavities that steer the positioning of ligand hydrophobic moieties. These points are generated based on the Lennard-Jones potential between a carbon probe and each atom of the residues delimitating the binding site. To thoroughly describe hydrophobic regions in protein pockets and properly guide ligand hydrophobic moieties toward favorable areas, an in-house tool, the MLP filter, was developed and herein applied. This strategy only retains GOLD hydrophobic fitting points that match the rigorous definition of hydrophobicity given by the molecular lipophilicity potential (MLP), a molecular interaction field that relies on an atomic fragmental system based on 1-octanol/water experimental partition coefficients (log P(oct)). MLP computations in the binding sites of crystallographic protein structures revealed that a significant number of points considered hydrophobic by GOLD were actually polar according to the MLP definition of hydrophobicity. To examine the impact of this new tool, ligand-protein complexes from the Astex Diverse Set and the PDB bind core database were redocked with and without the use of the MLP filter. Reliable docking results were obtained by using the MLP filter that increased the quality of docking in nonpolar cavities and outperformed the standard GOLD docking approach.

• Potential and Limitations of Ensemble Docking.
Korb, Oliver and Olsson, Tjelvar S G and Bowden, Simon J and Hall, Richard J and Verdonk, Marcel L and Liebeschuetz, John W and Cole, Jason C
Journal of chemical information and modeling, 2012, 52(5), 1262-1274
PMID: 22482774     doi: 10.1021/ci2005934

A major problem in structure-based virtual screening applications is the appropriate selection of a single or even multiple protein structures to be used in the virtual screening process. A priori it is unknown which protein structure(s) will perform best in a virtual screening experiment. We investigated the performance of ensemble docking, as a function of ensemble size, for eight targets of pharmaceutical interest. Starting from single protein structure docking results, for each ensemble size up to 500 000 combinations of protein structures were generated, and, for each ensemble, pose prediction and virtual screening results were derived. Comparison of single to multiple protein structure results suggests improvements when looking at the performance of the worst and the average over all single protein structures to the performance of the worst and average over all protein ensembles of size two or greater, respectively. We identified several key factors affecting ensemble docking performance, including the sampling accuracy of the docking algorithm, the choice of the scoring function, and the similarity of database ligands to the cocrystallized ligands of ligand-bound protein structures in an ensemble. Due to these factors, the prospective selection of optimum ensembles is a challenging task, shown by a reassessment of published ensemble selection protocols.

• Rigid Body Energy Minimization on Manifolds for Molecular Docking
Mirzaei, Hanieh and Beglov, Dmitri and Paschalidis, Ioannis Ch and Vajda, Sandor and Vakili, Pirooz and Kozakov, Dima
Journal of chemical theory and computation, 2012, 8(11), 4374-4380
doi: 10.1021/ct300272j

Virtually all docking methods include some local continuous minimization of an energy/scoring function in order to remove steric clashes and obtain more reliable energy values. In this paper, we describe an efficient rigid-body optimization algorithm that, compared to the most widely used algorithms, converges approximately an order of magnitude faster to conformations with equal or slightly lower energy. The space of rigid body transformations is a nonlinear manifold, namely, a space which locally resembles a Euclidean space. We use a canonical parametrization of the manifold, called the exponential parametrization, to map the Euclidean tangent space of the manifold onto the manifold itself. Thus, we locally transform the rigid body optimization to an optimization over a Euclidean space where basic optimization algorithms are applicable. Compared to commonly used methods, this formulation substantially reduces the dimension of the search space. As a result, it requires far fewer costly function and gradi...

• Protein-Ligand Binding Free Energies from Exhaustive Docking.
Purisima, Enrico O and Hogues, Hervé
The journal of physical chemistry. B, 2012, 116(23), 6872-6879
PMID: 22432509     doi: 10.1021/jp212646s

We explore the use of exhaustive docking as an alternative to Monte Carlo and molecular dynamics sampling for the direct integration of the partition function for protein-ligand binding. We enumerate feasible poses for the ligand and calculate the Boltzmann factor contribution of each pose to the partition function. From the partition function, the free energy, enthalpy, and entropy can be derived. All our calculations are done with a continuum solvation model that includes solving the Poisson equation. In contrast to Monte Carlo and molecular dynamics simulations, exhaustive docking avoids (within the limitations of a discrete sampling) the question of "Have we run long enough?" due to its deterministic complete enumeration of states. We tested the method on the T4 lysozyme L99A mutant, which has a nonpolar cavity that can accommodate a number of small molecules. We tested two electrostatic models. Model 1 used a solute dielectric of 2.25 for the complex apoprotein and free ligand and 78.5 for the solvent. Model 2 used a solute dielectric of 2.25 for the complex and apoprotein but 1.0 for the free ligand. For our test set of eight molecules, we obtain a reasonable correlation with a Pearson r(2)
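
The bookkeeping this abstract describes (summing Boltzmann factors over enumerated poses, then deriving free energy, mean energy, and entropy from the partition function) can be illustrated with a minimal sketch. The pose energies, the temperature, and the function name `thermo_from_poses` are hypothetical and not taken from the paper; each pose is assumed to carry equal weight, and the paper's solvation model and reference-state terms are not reproduced.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def thermo_from_poses(energies, temperature=298.15):
    """Free energy, mean energy and -T*S from a discrete list of pose energies.

    Hypothetical helper: energies are in kcal/mol, one value per enumerated pose.
    """
    kt = K_B * temperature
    e_min = min(energies)
    # Shift by the minimum energy so the exponentials cannot overflow.
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z_shifted = sum(weights)
    free_energy = e_min - kt * math.log(z_shifted)  # F = -kT ln Z
    mean_energy = sum(e * w for e, w in zip(energies, weights)) / z_shifted
    minus_ts = free_energy - mean_energy            # from F = <E> - T*S
    return free_energy, mean_energy, minus_ts

# Made-up pose energies for illustration only.
poses = [-6.0, -5.5, -5.0, -2.0]
f, e_avg, minus_ts = thermo_from_poses(poses)
print(f"F = {f:.2f}, <E> = {e_avg:.2f}, -TS = {minus_ts:.2f} kcal/mol")
```

Because every pose contributes a positive Boltzmann factor, the free energy in this sketch can never lie above the lowest pose energy, and the entropic term `-TS` is never positive.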

• Numerical Errors and Chaotic Behavior in Docking Simulations.
Feher, Miklos and Williams, Christopher I
Journal of chemical information and modeling, 2012, 52(3), 724-738
PMID: 22379951     doi: 10.1021/ci200598m

This work examines the sensitivity of docking programs to tiny changes in ligand input files. The results show that nearly identical ligand input structures can produce dramatically different top-scoring docked poses. Even changing the atom order in a ligand input file can produce significantly different poses and scores. In well-behaved cases the docking variations are small and follow a normal distribution around a central pose and score, but in many cases the variations are large and reflect wildly different top scores and binding modes. The docking variations are characterized by statistical methods, and the sensitivities of high-throughput and more precise docking methods are compared. The results demonstrate that part of docking variation is due to numerical sensitivity and potentially chaotic effects in current docking algorithms and not solely due to incomplete ligand conformation and pose searching. These results have major implications for the way docking is currently used for pose prediction, ranking, and virtual screening.

• Rapid and Accurate Prediction and Scoring of Water Molecules in Protein Binding Sites
Ross, Gregory A and Morris, Garrett M and Biggin, Philip C
PloS one, 2012, 7(3), e32036
doi: 10.1371/journal.pone.0032036

Water plays a critical role in ligand-protein interactions. However, it is still challenging to predict accurately not only where water molecules prefer to bind, but also which of those water molecules might be displaceable. The latter is often seen as a route to optimizing affinity of potential drug candidates. Using a protocol we call WaterDock, we show that the freely available AutoDock Vina tool can be used to predict accurately the binding sites of water molecules. WaterDock was validated using data from X-ray crystallography, neutron diffraction and molecular dynamics simulations and correctly predicted 97% of the water molecules in the test set. In addition, we combined data-mining, heuristic and machine learning techniques to develop probabilistic water molecule classifiers. When applied to WaterDock predictions in the Astex Diverse Set of protein ligand complexes, we could identify whether a water molecule was conserved or displaced to an accuracy of 75%. A second model predicted whether water molecules were displaced by polar groups or by non-polar groups to an accuracy of 80%. These results should prove useful for anyone wishing to undertake rational design of new compounds where the displacement of water molecules is being considered as a route to improved affinity.

• Flexible protein-ligand docking using the Fleksy protocol.
Wagener, Markus and Vlieg, Jacob de and Nabuurs, Sander B
Journal of computational chemistry, 2012, 33(12), 1215-1217
PMID: 22371008     doi: 10.1002/jcc.22948

Considering protein plasticity is important in accurately predicting the three-dimensional geometry of protein-ligand complexes. Here, we present the first public release of our flexible docking tool Fleksy, which is able to consider both ligand and protein flexibility in the docking process. We describe the workflow and different features of the software and present its performance on two cross-docking benchmark datasets.

• On the Applicability of Elastic Network Normal Modes in Small-Molecule Docking.
Dietzen, Matthias Michael and Hildebrandt, Andreas and Zotenko, Elena and Lengauer, Thomas
Journal of chemical information and modeling, 2012, 52(3), 844-856
PMID: 22320151     doi: 10.1021/ci2004847

Incorporating backbone flexibility into protein-ligand docking is still a challenging problem. In protein-protein docking, normal mode analysis (NMA) has become increasingly popular as it can be used to describe the collective motions of a biological system, but the question whether NMA can also be useful in predicting the conformational changes observed upon small-molecule binding has only been addressed in a few case studies. Here, we describe a large-scale study on the applicability of NMA for protein-ligand docking using 433 apo/holo pairs of the Astex data sets. Based on sets of the first normal modes from the apo structure, we first generated for each paired holo structure a set of conformations that optimally reproduce its Cα trace w.r.t. the underlying normal mode subspace. Using AutoDock, GOLD, and FlexX we then docked the original ligands into these conformations to assess how the docking performance depends on the number of modes used to reproduce the holo structure. The results of our study indicate that, even for such a best-case scenario, the use of normal mode analysis in small-molecule docking is restricted, and that a general rule on how many modes to use does not seem to exist or at least is not easy to find.

• Virtual fragment screening: exploration of MM-PBSA re-scoring.
Kawatkar, Sameer and Moustakas, Demetri and Miller, Matthew and Joseph-McCarthy, Diane
Journal of computer-aided molecular design, 2012, 26(8), 921-934
PMID: 22869295     doi: 10.1007/s10822-012-9590-x

An NMR fragment screening dataset with known binders and decoys was used to evaluate the ability of docking and re-scoring methods to identify fragment binders. Re-scoring docked poses using the Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) implicit solvent model identifies additional active fragments relative to either docking or random fragment screening alone. Early enrichment, which is clearly most important in practice for selecting relatively small sets of compounds for experimental testing, is improved by MM-PBSA re-scoring. In addition, the value in MM-PBSA re-scoring of docked poses for virtual screening may be in lessening the effect of the variation in the protein complex structure used.

• Modeling loop backbone flexibility in receptor-ligand docking simulations.
Flick, Johannes and Tristram, Frank and Wenzel, Wolfgang
Journal of computational chemistry, 2012, 33(31), 2504-2515
PMID: 22886372     doi: 10.1002/jcc.23087

The relevance of receptor conformational change during ligand binding is well documented for many pharmaceutically relevant receptors, but is still not fully accounted for in in silico docking methods. While there has been significant progress in the treatment of receptor side chain flexibility, sampling of backbone flexibility remains challenging because the conformational space expands dramatically and the scoring function must balance protein-protein and protein-ligand contributions. Here, we investigate an efficient multistage backbone reconstruction algorithm for large loop regions in the receptor and demonstrate that treatment of backbone receptor flexibility significantly improves binding mode prediction starting from apo structures and in cross docking simulations. For three different kinase receptors in which large flexible loops reconstruct upon ligand binding, we demonstrate that treatment of backbone flexibility results in accurate models of the complexes in simulations starting from the apo structure. Using the example of the DFG-motif in the p38 kinase, we also show how loop reconstruction can be used to model allosteric binding. Our approach thus paves the way to treat the complex process of receptor reconstruction upon ligand binding in docking simulations and may help to design new ligands with high specificity by exploitation of allosteric mechanisms.

• FIPSDock: A new molecular docking technique driven by fully informed swarm optimization algorithm.
Liu, Yu and Zhao, Lei and Li, Wentao and Zhao, Dongyu and Song, Miao and Yang, Yongliang
Journal of computational chemistry, 2012, 34(1), 67-75
PMID: 22961860     doi: 10.1002/jcc.23108

The accurate prediction of protein-ligand binding is of great importance for rational drug design. We present herein a novel docking algorithm called FIPSDock, which implements a variant of the Fully Informed Particle Swarm (FIPS) optimization method and adopts the newly developed energy function of the AutoDock 4.20 suite for solving flexible protein-ligand docking problems. The search ability and docking accuracy of FIPSDock were first evaluated by multiple cognate docking experiments. In a benchmarking test for 77 protein/ligand complex structures derived from the GOLD benchmark set, FIPSDock obtained a successful prediction rate of 93.5% and outperformed a few docking programs including particle swarm optimization (PSO)@AutoDock, SODOCK, AutoDock, DOCK, Glide, GOLD, FlexX, Surflex, and MolDock. More importantly, FIPSDock was evaluated against PSO@AutoDock, SODOCK, and the AutoDock 4.20 suite by cross-docking experiments of 74 protein-ligand complexes among eight protein targets (CDK2, ESR1, F2, MAPK14, MMP8, MMP13, PDE4B, and PDE5A) derived from the Sutherland-crossdock-set. Remarkably, FIPSDock is superior to PSO@AutoDock, SODOCK, and AutoDock in seven out of eight cross-docking experiments. The results reveal that the FIPS algorithm might be more suitable than the conventional genetic algorithm-based algorithms in dealing with highly flexible docking problems.
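
For readers unfamiliar with the optimizer named here, the following is a minimal sketch of a textbook fully informed particle swarm, in which every neighbour's personal best pulls on a particle rather than only the single best neighbour. It is not the FIPSDock variant: the toy `sphere` objective stands in for a docking energy function, and the ring neighbourhood, the constriction settings (`chi`, `phi`), and all function names are assumptions chosen for illustration.

```python
import random

def sphere(x):
    """Toy objective standing in for a docking energy function (assumption)."""
    return sum(v * v for v in x)

def fips_minimize(objective, dim=3, n_particles=12, n_iter=200, seed=0,
                  chi=0.7298, phi=4.1):
    """Textbook-style fully informed particle swarm on a ring topology."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    initial_best = min(pbest_val)
    for _ in range(n_iter):
        for i in range(n_particles):
            # Ring neighbourhood: the particle itself and its two neighbours.
            neigh = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            for d in range(dim):
                # Every neighbour's personal best contributes to the pull.
                pull = sum(
                    rng.uniform(0, phi / len(neigh)) * (pbest[k][d] - pos[i][d])
                    for k in neigh
                )
                vel[i][d] = chi * (vel[i][d] + pull)
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
    best = min(range(n_particles), key=lambda i: pbest_val[i])
    return pbest[best], pbest_val[best], initial_best

best_pos, best_val, start_val = fips_minimize(sphere)
print(f"best value improved from {start_val:.3g} to {best_val:.3g}")
```

In a docking setting the position vector would instead encode ligand translation, orientation, and torsion angles, and the objective would be the scoring function; those details are outside this sketch.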

• Virtual Target Screening: Validation Using Kinase Inhibitors.
Santiago, Daniel N and Pevzner, Yuri and Durand, Ashley A and Tran, Minhphuong and Scheerer, Rachel R and Daniel, Kenyon and Sung, Shen-Shu and Lee Woodcock, H and Guida, Wayne C and Brooks, Wesley H
Journal of chemical information and modeling, 2012, 52(8), 2192-2203
PMID: 22747098     doi: 10.1021/ci300073m

Computational methods involving virtual screening could potentially be employed to discover new biomolecular targets for an individual molecule of interest (MOI). However, existing scoring functions may not accurately differentiate proteins to which the MOI binds from a larger set of macromolecules in a protein structural database. An MOI will most likely have varying degrees of predicted binding affinities to many protein targets. However, correctly interpreting a docking score as a hit for the MOI docked to any individual protein can be problematic. In our method, which we term "Virtual Target Screening (VTS)", a set of small drug-like molecules is docked against each structure in the protein library to produce benchmark statistics. This calibration provides a reference for each protein so that hits can be identified for an MOI. VTS can then be used as a tool for drug repositioning (repurposing), specificity and toxicity testing, identifying potential metabolites, probing protein structures for allosteric sites, and testing focused libraries (collections of MOIs with similar chemotypes) for selectivity. To validate our VTS method, twenty kinase inhibitors were docked to a collection of calibrated protein structures. Here, we report our results where VTS predicted protein kinases as hits in preference to other proteins in our database. Concurrently, a graphical interface for VTS was developed.
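
The calibration idea in this abstract, judging an MOI's docking score against per-protein benchmark statistics rather than on its raw value, can be sketched as follows. The abstract does not state which statistic or cutoff VTS uses, so the z-score criterion, the `z_cutoff` value, the protein names, and all scores below are assumptions made purely for illustration.

```python
from statistics import mean, stdev

def calibrated_hits(calibration_scores, moi_scores, z_cutoff=-1.0):
    """Return {protein: z-score} where the MOI scores unusually well.

    Hypothetical helper. Lower (more negative) docking scores are assumed
    to be better, so a hit is a z-score at or below z_cutoff.
    """
    hits = {}
    for protein, scores in calibration_scores.items():
        mu, sigma = mean(scores), stdev(scores)
        z = (moi_scores[protein] - mu) / sigma
        if z <= z_cutoff:
            hits[protein] = z
    return hits

# Made-up benchmark scores from docking a calibration set to each protein.
calibration = {
    "kinase_A": [-6.0, -7.0, -8.0],
    "protease_B": [-9.0, -10.0, -11.0],
}
# The MOI gets the same raw score on both proteins.
moi = {"kinase_A": -9.5, "protease_B": -9.5}
print(calibrated_hits(calibration, moi))
```

In this toy input the same raw score counts as a hit only for the protein whose calibration set typically scores worse, which is the kind of distinction a per-protein reference is meant to expose.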

• CRDOCK: An Ultrafast Multipurpose Protein-Ligand Docking Tool.
Cabrera, Alvaro Cortés and Klett, Javier and G Dos Santos, Helena and Perona, Almudena and Gil-Redondo, Rubén and Francis, Sandrea M and Priego, Eva M and Gago, Federico and Morreale, Antonio
Journal of chemical information and modeling, 2012, 52(8), 2300-2309
PMID: 22764680     doi: 10.1021/ci300194a

An ultrafast docking and virtual screening program, CRDOCK, is presented that contains (1) a search engine that can use a variety of sampling methods and an initial energy evaluation function, (2) several energy minimization algorithms for fine tuning the binding poses, and (3) different scoring functions. This modularity ensures the easy configuration of custom-made protocols that can be optimized depending on the problem in hand. CRDOCK employs a precomputed library of ligand conformations that are initially generated from one-dimensional SMILES strings. Testing CRDOCK on two widely used benchmarks, the ASTEX diverse set and the Directory of Useful Decoys, yielded a success rate of ∼75% in pose prediction and an average AUC of 0.66. A typical ligand can be docked, on average, in just ∼13 s. Extension to a representative group of pharmacologically relevant G protein-coupled receptors that have been recently cocrystallized with some selective ligands allowed us to demonstrate the utility of this tool and also highlight some current limitations. CRDOCK is now included within VSDMIP, our integrated platform for drug discovery.

• Automatic modeling of mammalian olfactory receptors and docking of odorants.
Launay, Guillaume and Téletchéa, Stéphane and Wade, Fallou and Pajot-Augy, Edith and Gibrat, Jean-François and Sanz, Guenhaël
Protein engineering, design & selection : PEDS, 2012, 25(8), 377-386
PMID: 22691703     doi: 10.1093/protein/gzs037

We present a procedure that (i) automates the homology modeling of mammalian olfactory receptors (ORs) based on the six three-dimensional (3D) structures of G protein-coupled receptors (GPCRs) available so far and (ii) performs the docking of odorants on these models, using the concept of colony energy to score the complexes. ORs exhibit low sequence similarity with other GPCRs, and current alignment methods often fail to provide a reliable alignment. Here, we use a fold recognition technique to obtain a robust initial alignment. We then apply our procedure to a human OR that we have previously functionally characterized. The analysis of the resulting in silico complexes, supported by receptor mutagenesis and functional assays in a heterologous expression system, suggests that antagonists dock in the upper part of the binding pocket whereas agonists dock in the narrow lower part. We propose that the potency of agonists in activating receptors depends on their ability to establish tight interactions with the floor of the binding pocket. We developed a web site that allows the user to upload a GPCR sequence, choose a ligand in a library and obtain the 3D structure of the free receptor and ligand-receptor complex (http://genome.jouy.inra.fr/GPCRautomodel).

• BSP-SLIM: a blind low-resolution ligand-protein docking approach using predicted protein structures.
Lee, Hui Sun and Zhang, Yang
Proteins, 2012, 80(1), 93-110
PMID: 21971880     doi: 10.1002/prot.23165

We developed BSP-SLIM, a new method for ligand-protein blind docking using low-resolution protein structures. For a given sequence, protein structures are first predicted by I-TASSER; putative ligand binding sites are transferred from holo-template structures which are analogous to the I-TASSER models; ligand-protein docking conformations are then constructed by shape and chemical match of ligand with the negative image of binding pockets. BSP-SLIM was tested on 71 ligand-protein complexes from the Astex diverse set where the protein structures were predicted by I-TASSER with an average RMSD of 2.92 Å on the binding residues. Using I-TASSER models, the median ligand RMSD of BSP-SLIM docking is 3.99 Å, which is 5.94 Å lower than that by AutoDock; the median binding-site error by BSP-SLIM is 1.77 Å, which is 6.23 Å lower than that by AutoDock and 3.43 Å lower than that by LIGSITE(CSC). Compared to the models using crystal protein structures, the median ligand RMSD by BSP-SLIM using I-TASSER models increases by 0.87 Å while that by AutoDock increases by 8.41 Å; the median binding-site error by BSP-SLIM increases by 0.69 Å while that by AutoDock and LIGSITE(CSC) increases by 7.31 Å and 1.41 Å, respectively. As case studies, BSP-SLIM was used in virtual screening for six target proteins, which prioritized actives of 25% and 50% in the top 9.2% and 17% of the library on average, respectively. These results demonstrate the usefulness of the template-based coarse-grained algorithms in the low-resolution ligand-protein docking and drug-screening. An on-line BSP-SLIM server is freely available at http://zhanglab.ccmb.med.umich.edu/BSP-SLIM.

• idTarget: a web server for identifying protein targets of small chemical molecules with robust scoring functions and a divide-and-conquer docking approach.
Wang, Jui-Chih and Chu, Pei-Ying and Chen, Chung-Ming and Lin, Jung-Hsin
Nucleic acids research, 2012, 40(Web Server issue), W393-9
PMID: 22649057     doi: 10.1093/nar/gks496

Identification of possible protein targets of small chemical molecules is an important step for unravelling their underlying causes of actions at the molecular level. To this end, we construct a web server, idTarget, which can predict possible binding targets of a small chemical molecule via a divide-and-conquer docking approach, in combination with our recently developed scoring functions based on robust regression analysis and quantum chemical charge models. Affinity profiles of the protein targets are used to provide the confidence levels of prediction. The divide-and-conquer docking approach uses adaptively constructed small overlapping grids to constrain the searching space, thereby achieving better docking efficiency. Unlike previous approaches that screen against a specific class of targets or a limited number of targets, idTarget screens against nearly all protein structures deposited in the Protein Data Bank (PDB). We show that idTarget is able to reproduce known off-targets of drugs or drug-like compounds, and the suggested new targets could be prioritized for further investigation. idTarget is freely available as a web-based server at http://idtarget.rcas.sinica.edu.tw.

## 2011

• Transferable scoring function based on semiempirical quantum mechanical PM6-DH2 method: CDK2 with 15 structurally diverse inhibitors.
Dobes, Petr and Fanfrlík, Jindrich and Rezác, Jan and Otyepka, Michal and Hobza, Pavel
Journal of computer-aided molecular design, 2011, 25(3), 223-235
PMID: 21286784     doi: 10.1007/s10822-011-9413-5

A semiempirical quantum mechanical PM6-DH2 method accurately covering the dispersion interaction and H-bonding was used to score fifteen structurally diverse CDK2 inhibitors. The geometries of all the complexes were taken from the X-ray structures and were reoptimised by the PM6-DH2 method in continuum water. The total scoring function was constructed as an estimate of the binding free energy, i.e., as a sum of the interaction enthalpy, interaction entropy and the corrections for the inhibitor desolvation and deformation energies. The applied scoring function contains clear thermodynamic terms and does not involve any adjustable empirical parameter. The best correlations with the experimental inhibition constants (ln K(i)) were found for bare interaction enthalpy (r(2)

• DockoMatic: automated peptide analog creation for high throughput virtual screening.
Jacob, Reed B. and Bullock, Casey W and Andersen, Tim and McDougal, Owen M.
Journal of computational chemistry, 2011, 32(13), 2936-2941
PMID: 21717479     doi: 10.1002/jcc.21864

The purpose of this manuscript is threefold: (1) to describe an update to DockoMatic that allows the user to generate cyclic peptide analog structure files based on protein database (pdb) files, (2) to test the accuracy of the peptide analog structure generation utility, and (3) to evaluate the high throughput capacity of DockoMatic. The DockoMatic graphical user interface interfaces with the software program Treepack to create user defined peptide analogs. To validate this approach, DockoMatic-produced cyclic peptide analogs were tested for three-dimensional structure consistency and binding affinity against four experimentally determined peptide structure files available in the Research Collaboratory for Structural Bioinformatics database. The peptides used to evaluate this new functionality were alpha-conotoxins ImI, PnIA, and their published analogs. Peptide analogs were generated by DockoMatic and tested for their ability to bind to X-ray crystal structure models of the acetylcholine binding protein originating from Aplysia californica. The results, consisting of more than 300 simulations, demonstrate that DockoMatic predicts the binding energy of peptide structures to within 3.5 kcal mol(-1), and the orientation of bound ligand compares to within 1.8 Å root mean square deviation for ligand structures as compared to experimental data. Evaluation of high throughput virtual screening capacity demonstrated that DockoMatic can collect, evaluate, and summarize the output of 10,000 AutoDock jobs in less than 2 hours of computational time, while 100,000 jobs require approximately 15 hours and 1,000,000 jobs are estimated to take up to a week.

• Predicting the accuracy of protein-ligand docking on homology models.
Bordogna, Annalisa and Pandini, Alessandro and Bonati, Laura
Journal of computational chemistry, 2011, 32(1), 81-98
PMID: 20607693     doi: 10.1002/jcc.21601

Ligand-protein docking is increasingly used in Drug Discovery. The initial limitations imposed by a reduced availability of target protein structures have been overcome by the use of theoretical models, especially those derived by homology modeling techniques. While this greatly extended the use of docking simulations, it also introduced the need for general and robust criteria to estimate the reliability of docking results given the model quality. To this end, a large-scale experiment was performed on a diverse set including experimental structures and homology models for a group of representative ligand-protein complexes. A wide spectrum of model quality was sampled using templates at different evolutionary distances and different strategies for target-template alignment and modeling. The obtained models were scored by a selection of the most used model quality indices. The binding geometries were generated using AutoDock, one of the most common docking programs. An important result of this study is that indeed quantitative and robust correlations exist between the accuracy of docking results and the model quality, especially in the binding site. Moreover, state-of-the-art indices for model quality assessment are already an effective tool for an a priori prediction of the accuracy of docking experiments in the context of groups of proteins with conserved structural characteristics.

• Quantum mechanics/molecular mechanics strategies for docking pose refinement: distinguishing between binders and decoys in cytochrome C peroxidase.
Burger, Steven K and Thompson, David C and Ayers, Paul W
Journal of chemical information and modeling, 2011, 51(1), 93-101
PMID: 21133348     doi: 10.1021/ci100329z

We investigate the effect of systematically applying molecular dynamics (MD) and quantum mechanics/molecular mechanics (QM/MM) to docked poses in an attempt to improve the correspondence between theoretical prediction and experimental observation. The proposed scheme involves running a short time scale MD simulation on a docked ligand pose (and any known structurally important crystal structure waters in the active site), followed by QM/MM minimization. Both of these steps are relatively fast for moderately sized ligands; longer time scale MD involving the protein is not found to improve the results. The final binding energy is given in terms of the QM/MM total energy, a van der Waals correction, and a term to account for desolvation effects. This methodology is first tested with a trypsin inhibitor, for which we establish the importance of running MD before reoptimizing with QM/MM. The method is then applied to cytochrome c peroxidase using a set of binders and decoys. In this example, the proposed methodology affords much better discrimination between binders and decoys than the traditional docking approach used. For both systems presented, application of this protocol results in a significantly better energetic ranking and a smaller root mean squared deviation from known crystallographic ligand poses. This work highlights the importance of including polarization effects through QM/MM and of sampling with MD to refine a set of initial docked poses.

• Virtual decoy sets for molecular docking benchmarks.
Wallach, Izhar and Lilien, Ryan
Journal of chemical information and modeling, 2011, 51(2), 196-202
PMID: 21207928     doi: 10.1021/ci100374f

Virtual docking algorithms are often evaluated on their ability to separate active ligands from decoy molecules. The current state-of-the-art benchmark, the Directory of Useful Decoys (DUD), minimizes bias by including decoys from a library of synthetically feasible molecules that are physically similar yet chemically dissimilar to the active ligands. We show that by ignoring synthetic feasibility, we can compile a benchmark that is comparable to the DUD and less biased with respect to physical similarity.

• FRED pose prediction and virtual screening accuracy.
McGann, Mark
Journal of chemical information and modeling, 2011, 51(3), 578-596
PMID: 21323318     doi: 10.1021/ci100436p

Results of a previous docking study are reanalyzed and extended to include results from the docking program FRED and a detailed statistical analysis of both structure reproduction and virtual screening results. FRED is run both in a traditional docking mode and in a hybrid mode that makes use of the structure of a bound ligand in addition to the protein structure to screen molecules. This analysis shows that most docking programs are effective overall but highly inconsistent, tending to do well on one system and poorly on the next. Comparing methods, the difference in mean performance on DUD is found to be statistically significant (95% confidence) 61% of the time when using a global enrichment metric (AUC). Early enrichment metrics are found to have relatively poor statistical power, with 0.5% early enrichment only able to distinguish methods to 95% confidence 14% of the time.
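
For readers unfamiliar with the global enrichment metric mentioned above, the following is a minimal sketch of a ROC AUC computed by direct pairwise comparison of active and decoy scores. It is a generic illustration, not the analysis code of the paper; the function name is hypothetical, and lower (more negative) scores are assumed to be better.

```python
def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outscores a random decoy.

    Ties count as half a win. Lower scores are assumed to be better,
    which is a convention chosen for this sketch.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


if __name__ == "__main__":
    actives = [-9.0, -8.0]
    decoys = [-8.5, -5.0, -4.0]
    print(roc_auc(actives, decoys))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic loop is adequate for small sets; a rank-based formulation would scale better.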

• Assessing the performance of the molecular mechanics/Poisson Boltzmann surface area and molecular mechanics/generalized Born surface area methods. II. The accuracy of ranking poses generated from docking.
Hou, Tingjun and Wang, Junmei and Li, Youyong and Wang, Wei
Journal of computational chemistry, 2011, 32(5), 866-877
PMID: 20949517     doi: 10.1002/jcc.21666

In molecular docking, it is challenging to develop a scoring function that is accurate to conduct high-throughput screenings. Most scoring functions implemented in popular docking software packages were developed with many approximations for computational efficiency, which sacrifices the accuracy of prediction. With advanced technology and powerful computational hardware nowadays, it is feasible to use rigorous scoring functions, such as molecular mechanics/Poisson Boltzmann surface area (MM/PBSA) and molecular mechanics/generalized Born surface area (MM/GBSA) in molecular docking studies. Here, we systematically investigated the performance of MM/PBSA and MM/GBSA to identify the correct binding conformations and predict the binding free energies for 98 protein-ligand complexes. Comparison studies showed that MM/GBSA (69.4%) outperformed MM/PBSA (45.5%) and many popular scoring functions to identify the correct binding conformations. Moreover, we found that molecular dynamics simulations are necessary for some systems to identify the correct binding conformations. Based on our results, we proposed the guideline for MM/GBSA to predict the binding conformations. We then tested the performance of MM/GBSA and MM/PBSA to reproduce the binding free energies of the 98 protein-ligand complexes. The best prediction of MM/GBSA model with internal dielectric constant 2.0, produced a Spearman's correlation coefficient of 0.66, which is better than MM/PBSA (0.49) and almost all scoring functions used in molecular docking. In summary, MM/GBSA performs well for both binding pose predictions and binding free-energy estimations and is efficient to re-score the top-hit poses produced by other less-accurate scoring functions.

• Significant enhancement of docking sensitivity using implicit ligand sampling.
Xu, Mengang and Lill, Markus A
Journal of chemical information and modeling, 2011, 51(3), 693-706
PMID: 21375306     doi: 10.1021/ci100457t

The efficient and accurate quantification of protein-ligand interactions using computational methods is still a challenging task. Two factors strongly contribute to the failure of docking methods to predict free energies of binding accurately: the insufficient incorporation of protein flexibility coupled to ligand binding and the neglected dynamics of the protein-ligand complex in current scoring schemes. We have developed a new methodology, named the 'ligand-model' concept, to sample protein conformations that are relevant for binding structurally diverse sets of ligands. In the ligand-model concept, molecular-dynamics (MD) simulations are performed with a virtual ligand, represented by a collection of functional groups that binds to the protein and dynamically changes its shape and properties during the simulation. The ligand model essentially represents a large ensemble of different chemical species binding to the same target protein. Representative protein structures were obtained from the MD simulation, and docking was performed into this ensemble of protein conformation. Similar binding poses were clustered, and the averaged score was utilized to rerank the poses. We demonstrate that the ligand-model approach yields significant improvements in predicting native-like binding poses and quantifying binding affinities compared to static docking and ensemble docking simulations into protein structures generated from an apo MD simulation.

• Task-parallel message passing interface implementation of Autodock4 for docking of very large databases of compounds using high-performance super-computers.
Collignon, Barbara and Schulz, Roland and Smith, Jeremy C and Baudry, Jerome
Journal of computational chemistry, 2011, 32(6), 1202-1209
PMID: 21387347     doi: 10.1002/jcc.21696

A message passing interface (MPI)-based implementation (Autodock4.lga.MPI) of the grid-based docking program Autodock4 has been developed to allow simultaneous and independent docking of multiple compounds on up to thousands of central processing units (CPUs) using the Lamarckian genetic algorithm. The MPI version reads a single binary file containing precalculated grids that represent the protein-ligand interactions, i.e., van der Waals, electrostatic, and desolvation potentials, and needs only two input parameter files for the entire docking run. In comparison, the serial version of Autodock4 reads ASCII grid files and requires one parameter file per compound. The modifications performed result in significantly reduced input/output activity compared with the serial version. Autodock4.lga.MPI scales up to 8192 CPUs with a maximal overhead of 16.3%, of which two thirds is due to input/output operations and one third originates from MPI operations. The optimal docking strategy, which minimizes docking CPU time without lowering the quality of the database enrichments, comprises the docking of ligands preordered from the most to the least flexible and the assignment of the number of energy evaluations as a function of the number of rotatable bonds. In 24 h, on 8192 high-performance computing CPUs, the present MPI version would allow docking to a rigid protein of about 300K small flexible compounds or 11 million rigid compounds.

• Substantial improvements in large-scale redocking and screening using the novel HYDE scoring function.
Schneider, Nadine and Hindle, Sally and Lange, Gudrun and Klein, Robert and Albrecht, Jürgen and Briem, Hans and Beyer, Kristin and Claußen, Holger and Gastreich, Marcus and Lemmen, Christian and Rarey, Matthias
Journal of computer-aided molecular design, 2011, 26(6), 701-723
PMID: 22203423     doi: 10.1007/s10822-011-9531-0

The HYDE scoring function consistently describes hydrogen bonding, the hydrophobic effect and desolvation. It relies on HYdration and DEsolvation terms which are calibrated using octanol/water partition coefficients of small molecules. We do not use affinity data for calibration, therefore HYDE is generally applicable to all protein targets. HYDE reflects the Gibbs free energy of binding while only considering the essential interactions of protein-ligand complexes. The greatest benefit of HYDE is that it yields a very intuitive atom-based score, which can be mapped onto the ligand and protein atoms. This allows the direct visualization of the score and consequently facilitates analysis of protein-ligand complexes during the lead optimization process. In this study, we validated our new scoring function by applying it in large-scale docking experiments. We could successfully predict the correct binding mode in 93% of complexes in redocking calculations on the Astex diverse set, while our performance in virtual screening experiments using the DUD dataset showed significant enrichment values with a mean AUC of 0.77 across all protein targets with little or no structural defects. As part of these studies, we also carried out a very detailed analysis of the data that revealed interesting pitfalls, which we highlight here and which should be addressed in future benchmark datasets.

• Molecular docking with ligand attached water molecules.
Lie, Mette A and Thomsen, René and Pedersen, Christian N S and Schiøtt, Birgit and Christensen, Mikael H
Journal of chemical information and modeling, 2011, 51(4), 909-917
PMID: 21452852     doi: 10.1021/ci100510m

A novel approach to incorporate water molecules in protein-ligand docking is proposed. In this method, the water molecules display the same flexibility during the docking simulation as the ligand. The method solvates the ligand with the maximum number of water molecules, and these are then retained or displaced depending on energy contributions during the docking simulation. Instead of being a static part of the receptor, each water molecule is a flexible on/off part of the ligand and is treated with the same flexibility as the ligand itself. To favor exclusion of the water molecules, a constant entropy penalty is added for each included water molecule. The method was evaluated using 12 structurally diverse protein-ligand complexes from the PDB, where several water molecules bridge the ligand and the protein. A considerable improvement in successful docking simulations was found when including flexible water molecules solvating hydrogen bonding groups of the ligand. The method has been implemented in the docking program Molegro Virtual Docker (MVD).

• Correction to "A Machine Learning-Based Method To Improve Docking Scoring Functions and Its Application to Drug Repurposing"
Kinnings, Sarah L and Liu, Nina and Tonge, Peter J and Jackson, Richard M and Xie, Lei and Bourne, Philip E
Journal of chemical information and modeling, 2011, 51(5), 1195-1197
PMID: 21526828     doi: 10.1021/ci2001346

• MiniMuDS: a new optimizer using knowledge-based potentials improves scoring of docking solutions.
Spitzmüller, Andreas and Velec, Hans F G and Klebe, Gerhard
Journal of chemical information and modeling, 2011, 51(6), 1423-1430
PMID: 21528908     doi: 10.1021/ci200098v

In small molecule docking, the scoring and ranking of generated conformations is an important, though still not a completely resolved problem. Rescoring schemes often improve the quality of the obtained rankings. It is known that a local optimization is essential before a valid rescore value can be calculated. Here, we present a method that improves rescoring results obtained with the DrugScore function due to a new optimization technique. The method implements a more sophisticated search algorithm compared to the classic local optimization procedures used in this context. We validated the proposed method on a set of 192 protein-ligand complexes. Results show substantial improvements compared to original docking results with success rates increased by up to 10% for top scored solutions below 2 Å root-mean-square deviation to the native state and up to 18% increase below 1 Å, respectively.

• Efficient incorporation of protein flexibility and dynamics into molecular docking simulations.
Lill, Markus A
Biochemistry, 2011, 50(28), 6157-6169
PMID: 21678954     doi: 10.1021/bi2004558

Flexibility and dynamics are protein characteristics that are essential for the process of molecular recognition. Conformational changes in the protein that are coupled to ligand binding are described by the biophysical models of induced fit and conformational selection. Different concepts that incorporate protein flexibility into protein-ligand docking within the context of these two models are reviewed. Several computational studies that discuss the validity and possible limitations of such approaches will be presented. Finally, different approaches that incorporate protein dynamics, e.g., configurational entropy, and solvation effects into docking will be highlighted.

• Construction and test of ligand decoy sets using MDock: community structure-activity resource benchmarks for binding mode prediction.
Huang, Sheng-You and Zou, Xiaoqin
Journal of chemical information and modeling, 2011, 51(9), 2107-2114
PMID: 21755952     doi: 10.1021/ci200080g

Two sets of ligand binding decoys have been constructed for the community structure-activity resource (CSAR) benchmark by using the MDock and DOCK programs for rigid- and flexible-ligand docking, respectively. The decoys generated for each complex in the benchmark thoroughly cover the binding site and also contain a certain number of near-native binding modes. A few scoring functions have been evaluated using the ligand binding decoy sets for their ability to predict near-native binding modes. Among them, ITScore achieved a success rate of 86.7% for the rigid-ligand decoys and 79.7% for the flexible-ligand decoys, under the common definition of a successful prediction as root-mean-square deviation <2.0 Å from the native structure if the top-scored binding mode was considered. The decoy sets may serve as benchmarks for binding mode prediction of a scoring function, which are available at the CSAR Web site (http://www.csardock.org/).
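
The success criterion quoted above (top-scored binding mode within 2.0 Å RMSD of the native structure) can be sketched as follows. The input layout, a list of `(score, rmsd)` pairs per complex, and the function name are assumptions made for illustration, as is the convention that lower scores are better.

```python
def success_rate(complexes, rmsd_cutoff=2.0):
    """Fraction of complexes whose top-scored pose is near-native.

    complexes: list of pose lists, one per complex; each pose is a
    (score, rmsd_to_native) tuple (hypothetical input format).
    Lower score is assumed to be better.
    """
    if not complexes:
        raise ValueError("no complexes given")
    successes = 0
    for poses in complexes:
        # Pick the pose the scoring function ranks first.
        _top_score, top_rmsd = min(poses, key=lambda pose: pose[0])
        if top_rmsd < rmsd_cutoff:
            successes += 1
    return successes / len(complexes)


if __name__ == "__main__":
    data = [
        [(-9.0, 1.2), (-7.0, 0.5)],  # top pose at 1.2 Å: success
        [(-8.0, 3.4), (-6.5, 1.0)],  # top pose at 3.4 Å: failure
    ]
    print(success_rate(data))
```

Note that the second complex counts as a failure even though a near-native pose exists in its decoy set, because only the top-scored pose is considered.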

• Docking performance of fragments and druglike compounds.
Verdonk, Marcel L and Giangreco, Ilenia and Hall, Richard J and Korb, Oliver and Mortenson, Paul N and Murray, Christopher W
Journal of medicinal chemistry, 2011, 54(15), 5422-5431
PMID: 21692478     doi: 10.1021/jm200558u

This paper addresses two questions of key interest to researchers working with protein-ligand docking methods: (i) Why is there such a large variation in docking performance between different test sets reported in the literature? (ii) Are fragments more difficult to dock than druglike compounds? To answer these, we construct a test set of in-house X-ray structures of protein-ligand complexes from drug discovery projects, half of which contain fragment ligands, the other half druglike ligands. We find that a key factor affecting docking performance is ligand efficiency (LE). High LE compounds are significantly easier to dock than low LE compounds, which we believe could explain the differences observed between test sets reported in the literature. There is no significant difference in docking performance between fragments and druglike compounds, but the reasons why dockings fail appear to be different.
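
As a reminder of what ligand efficiency measures, here is a minimal sketch using the widely used definition LE = -ΔG / (number of heavy atoms), with ΔG estimated as RT ln K(d). The abstract does not state which exact definition the authors applied, so the formula, the temperature default, and the function names are assumptions of this example.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Estimate the binding free energy (kcal/mol) as RT * ln(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature * math.log(kd_molar)


def ligand_efficiency(kd_molar, heavy_atoms, temperature=298.15):
    """Binding free energy per heavy atom, sign-flipped so higher is better."""
    if heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return -binding_free_energy(kd_molar, temperature) / heavy_atoms


if __name__ == "__main__":
    # A weakly binding fragment can still be more efficient per atom
    # than a tighter-binding, larger druglike compound.
    print(ligand_efficiency(1e-4, 10))  # hypothetical fragment
    print(ligand_efficiency(1e-6, 20))  # hypothetical druglike compound
```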

• Ligand and Decoy Sets for Docking to G Protein-Coupled Receptors.
Gatica, Edgar A and Cavasotto, Claudio N
Journal of chemical information and modeling, 2011, 52(1), 1-6
PMID: 22168315     doi: 10.1021/ci200412p

We compiled a G protein-coupled receptor (GPCR) ligand library (GLL) for 147 targets, selecting for each ligand 39 decoy molecules, collected in the GPCR Decoy Database (GDD). Decoys were chosen ensuring a ligand-decoy similarity of six physical properties, while enforcing ligand-decoy chemical dissimilarity. The performance in docking of the GDD was evaluated on 19 GPCRs, showing a marked decrease in enrichment compared to bias-uncorrected decoy sets. Both the GLL and GDD are freely available for the scientific community.

• Fragment-Based Drug Design and Drug Repositioning Using Multiple Ligand Simultaneous Docking (MLSD): Identifying Celecoxib and Template Compounds as Novel Inhibitors of Signal Transducer and Activator of Transcription 3 (STAT3).
Li, Huameng and Liu, Aiguo and Zhao, Zhenjiang and Xu, Yufang and Lin, Jiayuh and Jou, David and Li, Chenglong
Journal of medicinal chemistry, 2011, 54(15), 5592-5596
PMID: 21678971     doi: 10.1021/jm101330h

We describe a novel method of drug discovery using MLSD and drug repositioning, with cancer target STAT3 being used as a test case. Multiple drug scaffolds were simultaneously docked into hot spots of STAT3 by MLSD, followed by tethering to generate virtual template compounds. Similarity search of virtual hits on drug database identified celecoxib as a novel inhibitor of STAT3. Furthermore, we designed two novel lead inhibitors based on one of the lead templates and celecoxib.

• Normalizing Molecular Docking Rankings using Virtually Generated Decoys.
Wallach, Izhar and Jaitly, Navdeep and Nguyen, Kong and Schapira, Matthieu and Lilien, Ryan
Journal of chemical information and modeling, 2011, 51(8), 1817-1830
PMID: 21699246     doi: 10.1021/ci200175h

Drug discovery research often relies on the use of virtual screening via molecular docking to identify active hits in compound libraries. An area for improvement among many state-of-the-art docking methods is the accuracy of the scoring functions used to differentiate active from nonactive ligands. Many contemporary scoring functions are influenced by the physical properties of the docked molecule. This bias can cause molecules with certain physical properties to incorrectly score better than others. Since variation in physical properties is inevitable in large screening libraries, it is desirable to account for this bias. In this paper, we present a method of normalizing docking scores using virtually generated decoy sets with matched physical properties. First, our method generates a set of property-matched decoys for every molecule in the screening library. Each library molecule and its decoy set are docked using a state-of-the-art method, producing a set of raw docking scores. Next, the raw docking score of each library molecule is normalized against the scores of its decoys. The normalized score represents the probability that the raw docking score was drawn from the background distribution of nonactive property-matched decoys. Assuming that the distribution of scores of active molecules differs from the nonactive score distribution, we expect that the score of an active compound will have a low probability of having been drawn from the nonactive score distribution. In addition to the use of decoys in normalizing docking scores, we suggest that decoy sets may be a useful tool to evaluate, improve, or develop scoring functions. We show that by analyzing docking scores of library molecules with respect to the docking scores of their virtually generated property-matched decoys, one can gain insight into the advantages, limitations, and reliability of scoring functions.
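
The normalization step described above can be illustrated with an empirical estimate: the fraction of property-matched decoys that score at least as well as the library molecule. This is a simplified stand-in, not the authors' estimator. The add-one correction and the function name are assumptions of this sketch, and lower scores are assumed to be better.

```python
def normalized_score(raw_score, decoy_scores):
    """Empirical probability that a decoy scores at least as well.

    A small value suggests the raw score is unlikely to come from the
    background distribution of nonactive, property-matched decoys.
    The +1 terms keep the estimate above zero (an assumed convention).
    """
    if not decoy_scores:
        raise ValueError("need at least one decoy score")
    at_least_as_good = sum(1 for d in decoy_scores if d <= raw_score)
    return (at_least_as_good + 1) / (len(decoy_scores) + 1)


def rank_library(library):
    """Rank molecules by normalized score, most promising first.

    library: dict mapping a molecule name to (raw_score, decoy_scores).
    """
    return sorted(library, key=lambda name: normalized_score(*library[name]))


if __name__ == "__main__":
    library = {
        "mol_a": (-10.0, [-6.0, -7.0, -8.0, -9.0]),
        "mol_b": (-10.5, [-10.0, -11.0, -12.0, -13.0]),
    }
    print(rank_library(library))
```

In the usage example, `mol_b` has the better raw score but ranks below `mol_a` after normalization, because its own decoys score even better. This is the kind of property-driven bias the method is meant to correct.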

• DEKOIS: Demanding Evaluation Kits for Objective in Silico Screening - A Versatile Tool for Benchmarking Docking Programs and Scoring Functions.
Vogel, Simon M and Bauer, Matthias R and Boeckler, Frank M
Journal of chemical information and modeling, 2011, 51(10), 2650-2665
PMID: 21774552     doi: 10.1021/ci2001549

For widely applied in silico screening techniques, success depends on the rational selection of an appropriate method. We herein present a fast, versatile, and robust method to construct demanding evaluation kits for objective in silico screening (DEKOIS). This automated process enables creating tailor-made decoy sets for any given sets of bioactives. It facilitates a target-dependent validation of docking algorithms and scoring functions helping to save time and resources. We have developed metrics for assessing and improving decoy set quality and employ them to investigate how decoy embedding affects docking. We demonstrate that screening performance is target-dependent and can be impaired by latent actives in the decoy set (LADS) or enhanced by poor decoy embedding. The presented method allows extending and complementing the collection of publicly available high quality decoy sets toward new target space. All present and future DEKOIS data sets will be made accessible at www.dekois.com.

• Estimating binding affinities by docking/scoring methods using variable protonation states.
Park, Min-Sun and Gao, Cen and Stern, Harry A
Proteins, 2011, 79(1), 304-314
PMID: 21058298     doi: 10.1002/prot.22883

To investigate the effects of multiple protonation states on protein-ligand recognition, we generated alternative protonation states for selected titratable groups of ligands and receptors. The selection of states was based on the predicted pK(a) of the unbound receptor and ligand and the proximity of titratable groups of the receptor to the binding site. Various ligand tautomer states were also considered. An independent docking calculation was run for each state. Several protocols were examined: using an ensemble of all generated states of ligand and receptor, using only the most probable state of the unbound ligand/receptor, and using only the state giving the most favorable docking score. The accuracies of these approaches were compared, using a set of 176 protein-ligand complexes (15 receptors) for which crystal structures and measured binding affinities are available. The best agreement with experiment was obtained when ligand poses from experimental crystal structures were used. For 9 of 15 receptors, using an ensemble of all generated protonation states of the ligand and receptor gave the best correlation between calculated and measured affinities.

• Evaluation of Several Two-Step Scoring Functions Based on Linear Interaction Energy, Effective Ligand Size, and Empirical Pair Potentials for Prediction of Protein-Ligand Binding Geometry and Free Energy.
Rahaman, Obaidur and Estrada, Trilce P and Doren, Douglas J and Taufer, Michela and Brooks, Charles L and Armen, Roger S
Journal of chemical information and modeling, 2011, 51(9), 2047-2065
PMID: 21644546     doi: 10.1021/ci1003009

The performances of several two-step scoring approaches for molecular docking were assessed for their ability to predict binding geometries and free energies. Two new scoring functions designed for "step 2 discrimination" were proposed and compared to our CHARMM implementation of the linear interaction energy (LIE) approach using the Generalized-Born with Molecular Volume (GBMV) implicit solvation model. A scoring function S1 was proposed by considering only "interacting" ligand atoms as the "effective size" of the ligand and extended to an empirical regression-based pair potential S2. The S1 and S2 scoring schemes were trained and 5-fold cross-validated on a diverse set of 259 protein-ligand complexes from the Ligand Protein Database (LPDB). The regression-based parameters for S1 and S2 also demonstrated reasonable transferability in the CSARdock 2010 benchmark using a new data set (NRC HiQ) of diverse protein-ligand complexes. The ability of the scoring functions to accurately predict ligand geometry was evaluated by calculating the discriminative power (DP) of the scoring functions to identify native poses. The parameters for the LIE scoring function with the optimal discriminative power (DP) for geometry (step 1 discrimination) were found to be very similar to the best-fit parameters for binding free energy over a large number of protein-ligand complexes (step 2 discrimination). Reasonable performance of the scoring functions in enrichment of active compounds in four different protein target classes established that the parameters for S1 and S2 provided reasonable accuracy and transferability. Additional analysis was performed to definitively separate scoring function performance from molecular weight effects. This analysis included the prediction of ligand binding efficiencies for a subset of the CSARdock NRC HiQ data set where the number of ligand heavy atoms ranged from 17 to 35. This range of ligand heavy atoms is where improved accuracy of predicted ligand efficiencies is most relevant to real-world drug design efforts.
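
For readers unfamiliar with the term, ligand efficiency is commonly taken as the binding free energy divided by the number of heavy atoms. The sketch below uses that common definition as an assumption; the abstract does not state the exact formula the authors used, and the function name and example values are hypothetical.

```python
def ligand_efficiency(delta_g_kcal_mol, n_heavy_atoms):
    """Binding free energy per heavy atom, reported as a positive number
    for favorable (negative) binding free energies."""
    if n_heavy_atoms <= 0:
        raise ValueError("number of heavy atoms must be positive")
    return -delta_g_kcal_mol / n_heavy_atoms


# Hypothetical ligand: -8.5 kcal/mol binding free energy, 25 heavy atoms
# (inside the 17 to 35 heavy-atom range discussed in the abstract).
le = ligand_efficiency(-8.5, 25)
```

Dividing by size in this way is what lets scoring performance be compared separately from molecular weight effects.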

• PLS-DA - Docking Optimized Combined Energetic Terms (PLSDA-DOCET) protocol: a brief evaluation.
Avram, Sorin and Pacureanu, Liliana Mioara and Seclaman, Edward and Bora, Alina and Kurunczi, Ludovic G
Journal of chemical information and modeling, 2011, 51(12), 3169-3179
PMID: 22066983     doi: 10.1021/ci2002268

Docking studies have become popular approaches in drug design, where the binding energy of the ligand in the active site of the protein is estimated by a scoring function. Many promising techniques were developed to enhance the performance of scoring functions including the fusion of multiple scoring functions outcomes into a so-called consensus scoring function. Hereby, we evaluated the target oriented consensus technique using the energetic terms of several scoring functions. The approach was denoted PLSDA-DOCET. Optimization strategies for consensus energetic terms and scoring functions based on ROC metric were compared to classical rigid docking and to ligand-based similarity search methods comprising 2D fingerprints and ROCS. The ROCS results indicate large performance variations depending on the biological target. The AUC-based strategy of PLSDA-DOCET outperformed the other docking approaches regarding simple retrieval and scaffold-hopping. The superior performance of PLSDA-DOCET protocol relative to single and combined scoring functions was validated on an external test set. We found a relative low mean correlation of the ranks of the chemotypes retrieved by the PLSDA-DOCET protocol and all the other methods employed here.

• Statistical potential for modeling and ranking of protein-ligand interactions.
Fan, Hao and Schneidman-Duhovny, Dina and Irwin, John J and Dong, Guangqiang and Shoichet, Brian K and Sali, Andrej
Journal of chemical information and modeling, 2011, 51(12), 3078-3092
PMID: 22014038     doi: 10.1021/ci200377u

Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).
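
The early-enrichment metric mentioned above can be illustrated with a generic enrichment-factor calculation. This sketch assumes EF(1) denotes the enrichment factor in the top 1% of the ranked library, which the abstract does not spell out; the function name and the toy ranking are hypothetical, and logAUC is not computed here.

```python
def enrichment_factor(ranked_is_active, fraction=0.01):
    """Hit rate in the top-ranked slice divided by the hit rate of the
    whole library. `ranked_is_active` is ordered best score first."""
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, int(round(n_total * fraction)))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Toy ranking: 200 molecules, 10 actives, two of them ranked first.
ranking = [True, True] + [False] * 190 + [True] * 8
ef1 = enrichment_factor(ranking, fraction=0.01)
```

An enrichment factor of 1 corresponds to random selection, so values well above 1 in the top slice indicate that the scoring function ranks actives early.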

• Improving molecular docking through eHiTS' tunable scoring function.
Ravitz, Orr and Zsoldos, Zsolt and Simon, Aniko
Journal of computer-aided molecular design, 2011, 25(11), 1033-1051
PMID: 22076470     doi: 10.1007/s10822-011-9482-5

We present three complementary approaches for score-tuning that improve docking performance in pose prediction, virtual screening and binding affinity assessment. The methodology utilizes experimental data to customize the scoring function for the system of interest considering the specific docking scenario. The tuning approach, which has been implemented as an automated utility in eHiTS, is introduced as a solution to one of the conundrums of the molecular docking paradigm, namely, the lack of a universally well performing scoring function. The accuracy of scoring functions has been shown to be generally system-dependent, and particularly lacking for binding energy and bio-activity predictions. In the proposed approach, pose and energy predictions are enhanced by adjusting the relative weights of the eHiTS energy terms to improve score-RMSD or score-affinity correlations. In a virtual screening context ligand-based similarity is used to rescale the docking score such that better enrichment factors are achieved. We discuss the algorithmic details of the methods, and demonstrate the effects of score tuning on a variety of targets, including CDK2, BACE1 and neuraminidase, as well as on the popular benchmarks: the Directory of Useful Decoys and the PDBBind database.

• BEAR, a novel virtual screening methodology for drug discovery.
Degliesposti, Gianluca and Portioli, Corinne and Parenti, Marco Daniele and Rastelli, Giulio
Journal of biomolecular screening, 2011, 16(1), 129-133
PMID: 21084717     doi: 10.1177/1087057110388276

BEAR (binding estimation after refinement) is a new virtual screening technology based on the conformational refinement of docking poses through molecular dynamics and prediction of binding free energies using accurate scoring functions. Here, the authors report the results of an extensive benchmark of the BEAR performance in identifying a smaller subset of known inhibitors seeded in a large (1.5 million) database of compounds. BEAR performance proved strikingly better if compared with standard docking screening methods. The validations performed so far showed that BEAR is a reliable tool for drug discovery. It is fast, modular, and automated, and it can be applied to virtual screenings against any biological target with known structure and any database of compounds.

• Knowledge-Based Scoring Functions in Drug Design: 3. A Two-Dimensional Knowledge-Based Hydrogen-Bonding Potential for the Prediction of Protein-Ligand Interactions.
Zheng, Mingyue and Xiong, Bing and Luo, Cheng and Li, Shanshan and Liu, Xian and Shen, Qianchen and Li, Jing and Zhu, Weiliang and Luo, Xiaomin and Jiang, Hualiang
Journal of chemical information and modeling, 2011, 51(11), 2994-3004
PMID: 21999432     doi: 10.1021/ci2003939

Hydrogen bonding is a key contributor to the molecular recognition between ligands and their host molecules in biological systems. Here we develop a novel orientation-dependent hydrogen bonding potential based on the geometric characteristics of hydrogen bonds observed in 44,585 protein-ligand complexes. We find a close correspondence between the empirical knowledge and the energy landscape inferred from the distribution of HBs. A scoring function based on the resultant hydrogen-bonding potentials discriminates native protein-ligand structures from incorrectly docked decoys with remarkable predictive power.

• A machine learning-based method to improve docking scoring functions and its application to drug repurposing.
Kinnings, Sarah L and Liu, Nina and Tonge, Peter J and Jackson, Richard M and Xie, Lei and Bourne, Philip E
Journal of chemical information and modeling, 2011, 51(2), 408-419
PMID: 21291174     doi: 10.1021/ci100369f

Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family dependent. In addition, they incorrectly assume that individual interactions contribute toward the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper, we show how the use of support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC(50) values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of Mycobacterium tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.

• Accelerating molecular docking calculations using graphics processing units.
Korb, Oliver and Stutzle, Thomas and Exner, Thomas E.
Journal of chemical information and modeling, 2011, 51(4), 865-876
PMID: 21434638     doi: 10.1021/ci100459b

The generation of molecular conformations and the evaluation of interaction potentials are common tasks in molecular modeling applications, particularly in protein-ligand or protein-protein docking programs. In this work, we present a GPU-accelerated approach capable of speeding up these tasks considerably. For the evaluation of interaction potentials in the context of rigid protein-protein docking, the GPU-accelerated approach reached speedup factors of up to over 50 compared to an optimized CPU-based implementation. Treating the ligand and donor groups in the protein binding site as flexible, speedup factors of up to 16 can be observed in the evaluation of protein-ligand interaction potentials. Additionally, we introduce a parallel version of our protein-ligand docking algorithm PLANTS that can take advantage of this GPU-accelerated scoring function evaluation. We compared the GPU-accelerated parallel version to the same algorithm running on the CPU and also to the highly optimized sequential CPU-based version. In terms of dependence of the ligand size and the number of rotatable bonds, speedup factors of up to 10 and 7, respectively, can be observed. Finally, a fitness landscape analysis in the context of rigid protein-protein docking was performed. Using a systematic grid-based search methodology, the GPU-accelerated version outperformed the CPU-based version with speedup factors of up to 60.

• AADS - An Automated Active Site Identification, Docking, and Scoring Protocol for Protein Targets Based on Physicochemical Descriptors.
Singh, Tanya and Biswas, D and Jayaram, B.
Journal of chemical information and modeling, 2011, 51(10), 2515-2527
PMID: 21877713     doi: 10.1021/ci200193z

We report here a robust automated active site detection, docking, and scoring (AADS) protocol for proteins with known structures. The active site finder identifies all cavities in a protein and scores them based on the physicochemical properties of functional groups lining the cavities in the protein. The accuracy realized on 620 proteins with sizes ranging from 100 to 600 amino acids with known drug active sites is 100% when the top ten cavity points are considered. These top ten cavity points identified are then submitted for an automated docking of an input ligand/candidate molecule. The docking protocol uses an all atom energy based Monte Carlo method. Eight low energy docked structures corresponding to different locations and orientations of the candidate molecule are stored at each cavity point giving 80 docked structures overall which are then ranked using an effective free energy function and top five structures are selected. The predicted structure and energetics of the complexes agree quite well with experiment when tested on a data set of 170 protein-ligand complexes with known structures and binding affinities. The AADS methodology is implemented on an 80 processor cluster and presented as a freely accessible, easy to use tool at http://www.scfbio-iitd.res.in/dock/ActiveSite_new.jsp .
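
The pooling and selection step described above (eight docked structures at each of ten cavity points, ranked together, top five kept) can be sketched as follows. The energies are placeholder numbers rather than AADS output, and `pool_and_rank` is a hypothetical helper name; the actual ranking uses the authors' effective free energy function.

```python
def pool_and_rank(poses_by_cavity, n_keep=5):
    """Pool docked poses from all cavity points and keep the lowest-energy ones.

    `poses_by_cavity[c][p]` is the energy of pose p at cavity point c.
    Returns (all pooled poses sorted by energy, the n_keep best poses),
    each pose as a tuple (energy, cavity_id, pose_id).
    """
    pooled = [
        (energy, cavity_id, pose_id)
        for cavity_id, energies in enumerate(poses_by_cavity)
        for pose_id, energy in enumerate(energies)
    ]
    pooled.sort()  # lowest (most favorable) energy first
    return pooled, pooled[:n_keep]


# Placeholder energies: 10 cavity points with 8 poses each.
poses_by_cavity = [[float(c * 8 + p) - 40.0 for p in range(8)] for c in range(10)]
pooled, top_five = pool_and_rank(poses_by_cavity)
```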

• Efficient inclusion of receptor flexibility in grid-based protein-ligand docking.
Leis, Simon and Zacharias, Martin
Journal of computational chemistry, 2011, 32(16), 3433-3439
PMID: 21919015     doi: 10.1002/jcc.21923

Accounting for receptor flexibility is an essential component of successful protein-ligand docking but still marks a major computational challenge. For many target molecules of pharmaceutical relevance, global backbone conformational changes are relevant during the ligand binding process. However, popular methods that represent the protein receptor molecule as a potential grid typically assume a rigid receptor structure during ligand-receptor docking. A new approach has been developed that combines inclusion of global receptor flexibility with the efficient potential grid representation of the receptor molecule. This is achieved using interpolation between grid representations of the receptor protein deformed in selected collective degrees of freedom. The method was tested on the docking of three ligands to apo protein kinase A (PKA), an enzyme that undergoes global structural changes upon inhibitor binding. Structural variants of PKA were generated along the softest normal mode of an elastic network representation of apo PKA. Inclusion of receptor deformability during docking resulted in a significantly improved docking performance compared with rigid PKA docking, thus allowing for systematic virtual screening applications at small additional computational cost.
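
The idea of interpolating between grid representations of a deformed receptor can be sketched with a simple linear blend. Linear interpolation, flattened grids on an identical lattice, and the name `interpolate_grids` are all assumptions made for illustration; the abstract does not specify the interpolation scheme used.

```python
def interpolate_grids(grid_a, grid_b, t):
    """Blend two potential grids sampled on the same lattice.

    t = 0 returns grid_a, t = 1 returns grid_b; intermediate values
    approximate a receptor deformed partway along the collective mode.
    """
    if len(grid_a) != len(grid_b):
        raise ValueError("grids must have the same number of points")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * a + t * b for a, b in zip(grid_a, grid_b)]


# Hypothetical flattened potential grids for two receptor deformations.
grid_undeformed = [0.0, -1.0, 2.0]
grid_deformed = [1.0, -3.0, 2.0]
grid_quarter = interpolate_grids(grid_undeformed, grid_deformed, 0.25)
```

The appeal of such a scheme is that only a few grids need to be precomputed, and the deformation coordinate `t` becomes one more variable to optimize during docking.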

• BetaDock: shape-priority docking method based on beta-complex.
Kim, Deok-Soo and Kim, Chong-Min and Won, Chung-In and Kim, Jae-Kwan and Ryu, Joonghyun and Cho, Youngsong and Lee, Changhee and Bhak, Jong
Journal of biomolecular structure & dynamics, 2011, 29(1), 219-242
PMID: 21696235

This paper presents an approach and a software, BetaDock, to the docking problem by putting the priority on shape complementarity between a receptor and a ligand. The approach is based on the theory of the β-complex. Given the Voronoi diagram of the receptor whose topology is stored in the quasi-triangulation, the β-complex corresponding to water molecule is computed. Then, the boundary of the β-complex defines the β-shape which has the complete proximity information among all atoms on the receptor boundary. From the β-shape, we first compute pockets where the ligand may bind. Then, we quickly place the ligand within each pocket by solving the singular value decomposition problem and the assignment problem. Using the conformations of the ligands within the pockets as the initial solutions, we run the genetic algorithm to find the optimal solution for the docking problem. The performance of the proposed algorithm was verified through a benchmark test and showed that BetaDock is superior to a popular docking software AutoDock 4.

• Virtual screening using molecular simulations.
Yang, Tianyi and Wu, Johnny C and Yan, Chunli and Wang, Yuanfeng and Luo, Ray and Gonzales, Michael B and Dalby, Kevin N and Ren, Pengyu
Proteins, 2011, 79(6), 1940-1951
PMID: 21491494     doi: 10.1002/prot.23018

Effective virtual screening relies on our ability to make accurate prediction of protein-ligand binding, which remains a great challenge. In this work, utilizing the molecular-mechanics Poisson-Boltzmann (or Generalized Born) surface area approach, we have evaluated the binding affinity of a set of 156 ligands to seven families of proteins, trypsin β, thrombin α, cyclin-dependent kinase (CDK), cAMP-dependent kinase (PKA), urokinase-type plasminogen activator, β-glucosidase A, and coagulation factor Xa. The effect of protein dielectric constant in the implicit-solvent model on the binding free energy calculation is shown to be important. The statistical correlations between the binding energy calculated from the implicit-solvent approach and experimental free energy are in the range of 0.56-0.79 across all the families. This performance is better than that of typical docking programs especially given that the latter is directly trained using known binding data whereas the molecular mechanics is based on general physical parameters. Estimation of entropic contribution remains the barrier to accurate free energy calculation. We show that the traditional rigid rotor harmonic oscillator approximation is unable to improve the binding free energy prediction. Inclusion of conformational restriction seems to be promising but requires further investigation. On the other hand, our preliminary study suggests that implicit-solvent based alchemical perturbation, which offers explicit sampling of configuration entropy, can be a viable approach to significantly improve the prediction of binding free energy. Overall, the molecular mechanics approach has the potential for medium to high-throughput computational drug discovery.

• Rosetta FlexPepDock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors.
Raveh, Barak and London, Nir and Zimmerman, Lior and Schueler-Furman, Ora
PloS one, 2011, 6(4), e18934
PMID: 21572516     doi: 10.1371/journal.pone.0018934

Flexible peptides that fold upon binding to another protein molecule mediate a large number of regulatory interactions in the living cell and may provide highly specific recognition modules. We present Rosetta FlexPepDock ab-initio, a protocol for simultaneous docking and de-novo folding of peptides, starting from an approximate specification of the peptide binding site. Using the Rosetta fragments library and a coarse-grained structural representation of the peptide and the receptor, FlexPepDock ab-initio samples efficiently and simultaneously the space of possible peptide backbone conformations and rigid-body orientations over the receptor surface of a given binding site. The subsequent all-atom refinement of the coarse-grained models includes full side-chain modeling of both the receptor and the peptide, resulting in high-resolution models in which key side-chain interactions are recapitulated. The protocol was applied to a benchmark in which peptides were modeled over receptors in either their bound backbone conformations or in their free, unbound form. Near-native peptide conformations were identified in 18/26 of the bound cases and 7/14 of the unbound cases. The protocol performs well on peptides from various classes of secondary structures, including coiled peptides with unusual turns and kinks. The results presented here significantly extend the scope of state-of-the-art methods for high-resolution peptide modeling, which can now be applied to a wide variety of peptide-protein interactions where no prior information about the peptide backbone conformation is available, enabling detailed structure-based studies and manipulation of those interactions.

• Docking-based virtual screening for ligands of G protein-coupled receptors: not only crystal structures but also in silico models.
Vilar, Santiago and Ferino, Giulio and Phatak, Sharangdhar S and Berk, Barkin and Cavasotto, Claudio N and Costanzi, Stefano
Journal of molecular graphics & modelling, 2011, 29(5), 614-623
PMID: 21146435     doi: 10.1016/j.jmgm.2010.11.005

G protein-coupled receptors (GPCRs) regulate a wide range of physiological functions and hold great pharmaceutical interest. Using the β(2)-adrenergic receptor as a case study, this article explores the applicability of docking-based virtual screening to the discovery of GPCR ligands and defines methods intended to improve the screening performance. Our controlled computational experiments were performed on a compound dataset containing known agonists and blockers of the receptor as well as a large number of decoys. The screening based on the structure of the receptor crystallized in complex with its inverse agonist carazolol yielded excellent results, with a clearly delineated prioritization of ligands over decoys. Blockers generally were preferred over agonists; however, agonists were also well distinguished from decoys. A method was devised to increase the screening yields by generating an ensemble of alternative conformations of the receptor that accounts for its flexibility. Moreover, a method was devised to improve the retrieval of agonists, based on the optimization of the receptor around a known agonist. Finally, the applicability of docking-based virtual screening also to homology models endowed with different levels of accuracy was proved. This last point is of uttermost importance, since crystal structures are available only for a limited number of GPCRs, and extends our conclusions to the entire superfamily. The outcome of this analysis definitely supports the application of computer-aided techniques to the discovery of novel GPCR ligands, especially in light of the fact that, in the near future, experimental structures are expected to be solved and become available for an ever increasing number of GPCRs.

• Toward prediction of functional protein pockets using blind docking and pocket search algorithms
Hetenyi, Csaba and van der Spoel, David
Protein science : a publication of the Protein Society, 2011, 20(5), 880-893
PMID: 21413095     doi: 10.1002/pro.618

Location of functional binding pockets of bioactive ligands on protein molecules is essential in structural genomics and drug design projects. If the experimental determination of ligand-protein complex structures is complicated, blind docking (BD) and pocket search (PS) calculations can help in the prediction of atomic resolution binding mode and the location of the pocket of a ligand on the entire protein surface. Whereas the number of successful predictions by these methods is increasing even for the complicated cases of exosites or allosteric binding sites, their reliability has not been fully established. For a critical assessment of reliability, we use a set of ligand-protein complexes, which were found to be problematic in previous studies. The robustness of BD and PS methods is addressed in terms of success of the selection of truly functional pockets from among the many putative ones identified on the surfaces of ligand-bound and ligand-free (holo and apo) protein forms. Issues related to BD such as effect of hydration, existence of multiple pockets, and competition of subsidiary ligands are considered. Practical cases of PS are discussed, categorized and strategies are recommended for handling the different situations. PS can be used in conjunction with BD, as we find that a consensus approach combining the techniques improves predictive power.

• Fast docking using the CHARMM force field with EADock DSS.
Grosdidier, Aurélien and Zoete, Vincent and Michielin, Olivier
Journal of computational chemistry, 2011, 32(10), 2149-2159
PMID: 21541955     doi: 10.1002/jcc.21797

The prediction of binding modes (BMs) occurring between a small molecule and a target protein of biological interest has become of great importance for drug development. The overwhelming diversity of needs leaves room for docking approaches addressing specific problems. Nowadays, the universe of docking software ranges from fast and user friendly programs to algorithmically flexible and accurate approaches. EADock2 is an example of the latter. Its multiobjective scoring function was designed around the CHARMM22 force field and the FACTS solvation model. However, the major drawback of such a software design lies in its computational cost. EADock dihedral space sampling (DSS) is built on the most efficient features of EADock2, namely its hybrid sampling engine and multiobjective scoring function. Its performance is equivalent to that of EADock2 for drug-like ligands, while the CPU time required has been reduced by several orders of magnitude. This huge improvement was achieved through a combination of several innovative features including an automatic bias of the sampling toward putative binding sites, and a very efficient tree-based DSS algorithm. When the top-scoring prediction is considered, 57% of BMs of a test set of 251 complexes were reproduced within 2 Å RMSD to the crystal structure. Up to 70% were reproduced when considering the five top scoring predictions. The success rate is lower in cross-docking assays but remains comparable with that of the latest version of AutoDock that accounts for the protein flexibility.

• SwissParam: a fast force field generation tool for small organic molecules.
Zoete, Vincent and Cuendet, Michel A and Grosdidier, Aurélien and Michielin, Olivier
Journal of computational chemistry, 2011, 32(11), 2359-2368
PMID: 21541964     doi: 10.1002/jcc.21816

The drug discovery process has been deeply transformed recently by the use of computational ligand-based or structure-based methods, helping the lead compounds identification and optimization, and finally the delivery of new drug candidates more quickly and at lower cost. Structure-based computational methods for drug discovery mainly involve ligand-protein docking and rapid binding free energy estimation, both of which require force field parameterization for many drug candidates. Here, we present a fast force field generation tool, called SwissParam, able to generate, for arbitrary small organic molecule, topologies, and parameters based on the Merck molecular force field, but in a functional form that is compatible with the CHARMM force field. Output files can be used with CHARMM or GROMACS. The topologies and parameters generated by SwissParam are used by the docking software EADock2 and EADock DSS to describe the small molecules to be docked, whereas the protein is described by the CHARMM force field, and allow them to reach success rates ranging from 56 to 78%. We have also developed a rapid binding free energy estimation approach, using SwissParam for ligands and CHARMM22/27 for proteins, which requires only a short minimization to reproduce the experimental binding free energy of 214 ligand-protein complexes involving 62 different proteins, with a standard error of 2.0 kcal mol(-1), and a correlation coefficient of 0.74. Together, these results demonstrate the relevance of using SwissParam topologies and parameters to describe small organic molecules in computer-aided drug design applications, together with a CHARMM22/27 description of the target protein. SwissParam is available free of charge for academic users at www.swissparam.ch.

• LigDockCSA: Protein-ligand docking using conformational space annealing.
Shin, Woong-Hee and Heo, Lim and Lee, Juyong and Ko, Junsu and Seok, Chaok and Lee, Jooyoung
Journal of computational chemistry, 2011, 32(15), 3226-3232
PMID: 21837636     doi: 10.1002/jcc.21905

Protein-ligand docking techniques are one of the essential tools for structure-based drug design. Two major components of a successful docking program are an efficient search method and an accurate scoring function. In this work, a new docking method called LigDockCSA is developed by using a powerful global optimization technique, conformational space annealing (CSA), and a scoring function that combines the AutoDock energy and the piecewise linear potential (PLP) torsion energy. It is shown that the CSA search method can find lower energy binding poses than the Lamarckian genetic algorithm of AutoDock. However, lower-energy solutions CSA produced with the AutoDock energy were often less native-like. The loophole in the AutoDock energy was fixed by adding a torsional energy term, and the CSA search on the refined energy function is shown to improve the docking performance. The performance of LigDockCSA was tested on the Astex diverse set which consists of 85 protein-ligand complexes. LigDockCSA finds the best scoring poses within 2 Å root-mean-square deviation (RMSD) from the native structures for 84.7% of the test cases, compared to 81.7% for AutoDock and 80.5% for GOLD. The results improve further to 89.4% by incorporating the conformational entropy.
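
The success criterion quoted above (best-scoring pose within 2 Å RMSD of the native structure) can be illustrated with a minimal calculation. The sketch assumes both poses are already in the receptor frame with atoms listed in the same order, so it does no superposition or symmetry correction; the function names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsd_values, cutoff=2.0):
    """Fraction of test cases whose top-scoring pose lies within the cutoff."""
    return sum(r <= cutoff for r in rmsd_values) / len(rmsd_values)


# Hypothetical two-atom ligand displaced by 1 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
pose_rmsd = rmsd(native, docked)

# Hypothetical top-pose RMSDs for four complexes.
rate = success_rate([0.8, 1.9, 2.5, 4.0])
```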

• VoteDock: Consensus Docking Method for Prediction of Protein-Ligand Interactions
Plewczynski, Dariusz and Lazniewski, Michal and Von Grotthuss, Marcin and Rychlewski, Leszek and Ginalski, Krzysztof
Journal of computational chemistry, 2011, 32(4), 568-581
PMID: 20812324     doi: 10.1002/jcc.21642

Molecular recognition plays a fundamental role in all biological processes, and that is why great efforts have been made to understand and predict protein-ligand interactions. Finding a molecule that can potentially bind to a target protein is particularly essential in drug discovery and still remains an expensive and time-consuming task. In silico tools are frequently used to screen molecular libraries to identify new lead compounds, and if the protein structure is known, various protein-ligand docking programs can be used. The aim of the docking procedure is to predict correct poses of the ligand in the binding site of the protein as well as to score them according to the strength of interaction in a reasonable time frame. The purpose of our studies was to present a novel consensus approach to predict both the protein-ligand complex structure and its corresponding binding affinity. Our method used as input the results from seven docking programs (Surflex, LigandFit, Glide, GOLD, FlexX, eHiTS, and AutoDock) that are widely used for docking of ligands. We evaluated it on an extensive benchmark dataset of 1300 protein-ligand pairs from the refined PDBbind database for which the structural and affinity data were available. We compared independently its ability of proper scoring and posing to the previously proposed methods. In most cases, our method is able to dock properly approximately 20% of pairs more than docking methods on average, and over 10% of pairs more than the best single program. The RMSD value of the predicted complex conformation versus its native one is reduced by 0.5 Å. Finally, we were able to increase the Pearson correlation of the predicted binding affinity in comparison with the experimental value up to 0.5.

• NNScore 2.0: a neural-network receptor-ligand scoring function.
Durrant, Jacob D and McCammon, J Andrew
Journal of chemical information and modeling, 2011, 51(11), 2897-2903
PMID: 22017367     doi: 10.1021/ci2003889

NNScore is a neural-network-based scoring function designed to aid the computational identification of small-molecule ligands. While the test cases included in the original NNScore article demonstrated the utility of the program, the application examples were limited. The purpose of the current work is to further confirm that neural-network scoring functions are effective, even when compared to the scoring functions of state-of-the-art docking programs, such as AutoDock, the most commonly cited program, and AutoDock Vina, thought to be two orders of magnitude faster. Aside from providing additional validation of the original NNScore function, we here present a second neural-network scoring function, NNScore 2.0. NNScore 2.0 considers many more binding characteristics when predicting affinity than does the original NNScore. The network output of NNScore 2.0 also differs from that of NNScore 1.0; rather than a binary classification of ligand potency, NNScore 2.0 provides a single estimate of the pK(d). To facilitate use, NNScore 2.0 has been implemented as an open-source python script. A copy can be obtained from http://www.nbcr.net/software/nnscore/ .
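
Because NNScore 2.0 reports a pK(d) estimate rather than an energy, comparing it with energy-based scores requires the standard thermodynamic relation ΔG° = -RT ln(10) · pKd (1 M standard state). The sketch below only illustrates that conversion; the function names are assumptions and it is not part of the NNScore script.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def pkd_to_delta_g(pkd, temperature=298.15):
    """Standard binding free energy in kcal/mol from pKd = -log10(Kd / 1 M)."""
    return -R_KCAL * temperature * math.log(10.0) * pkd

def delta_g_to_pkd(delta_g, temperature=298.15):
    """Inverse conversion: pKd from a binding free energy in kcal/mol."""
    return -delta_g / (R_KCAL * temperature * math.log(10.0))

# A micromolar binder (pKd = 6) corresponds to roughly -8.2 kcal/mol at 298 K.
print(round(pkd_to_delta_g(6.0), 2))
```

At room temperature each pKd unit is worth about 1.36 kcal/mol, which helps when reading the 2 kcal/mol error bars quoted in several entries of this section.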

• Implementation and evaluation of a docking-rescoring method using molecular footprint comparisons.
Balius, Trent E and Mukherjee, Sudipto and Rizzo, Robert C
Journal of computational chemistry, 2011, 32(10), 2273-2289
PMID: 21541962     doi: 10.1002/jcc.21814

A docking-rescoring method, based on per-residue van der Waals (VDW), electrostatic (ES), or hydrogen bond (HB) energies has been developed to aid discovery of ligands that have interaction signatures with a target (footprints) similar to that of a reference. Biologically useful references could include known drugs, inhibitors, substrates, transition states, or side-chains that mediate protein-protein interactions. Termed footprint similarity (FPS) score, the method, as implemented in the program DOCK, was validated and characterized using: (1) pose identification, (2) crossdocking, (3) enrichment, and (4) virtual screening. Improvements in pose identification (6-12%) were obtained using footprint-based (FPS_VDW+ES) vs. standard DOCK (DCE_VDW+ES) scoring as evaluated on three large datasets (680-775 systems) from the SB2010 database. Enhanced pose identification was also observed using FPS (45.4% or 70.9%) compared with DCE (17.8%) methods to rank challenging crossdocking ensembles from carbonic anhydrase. Enrichment tests, for three representative systems, revealed FPS_VDW+ES scoring yields significant early fold enrichment in the top 10% of ranked databases. For EGFR, top FPS poses are nicely accommodated in the molecular envelope defined by the reference in comparison with DCE, which yields distinct molecular weight bias toward larger molecules. Results from a representative virtual screen of ca. 1 million compounds additionally illustrate how ligands with footprints similar to a known inhibitor can readily be identified from within large commercially available databases. By providing an alternative way to rank ligand poses in a simple yet directed manner we anticipate that FPS scoring will be a useful tool for docking and structure-based design.

• Exhaustive search and solvated interaction energy (SIE) for virtual screening and affinity prediction.
Sulea, Traian and Hogues, Hervé and Purisima, Enrico O
Journal of computer-aided molecular design, 2011, 26(5), 617-633
PMID: 22198519     doi: 10.1007/s10822-011-9529-7

We carried out a prospective evaluation of the utility of the SIE (solvated interaction energy) scoring function for virtual screening and binding affinity prediction. Since experimental structures of the complexes were not provided, this was an exercise in virtual docking as well. We used our exhaustive docking program, Wilma, to provide high-quality poses that were rescored using SIE to provide binding affinity predictions. We also tested the combination of SIE with our latest solvation model, first shell of hydration (FiSH), which captures some of the discrete properties of water within a continuum model. We achieved good enrichment in virtual screening of fragments against trypsin, with an area under the curve of about 0.7 for the receiver operating characteristic curve. Moreover, the early enrichment performance was quite good with 50% of true actives recovered with a 15% false positive rate in a prospective calculation and with a 3% false positive rate in a retrospective application of SIE with FiSH. Binding affinity predictions for both trypsin and host-guest complexes were generally within 2 kcal/mol of the experimental values. However, the rank ordering of affinities differing by 2 kcal/mol or less was not well predicted. On the other hand, it was encouraging that the incorporation of a more sophisticated solvation model into SIE resulted in better discrimination of true binders from nonbinders. This suggests that the inclusion of proper physics in our models is a fruitful strategy for improving the reliability of our binding affinity predictions.
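
The area under the ROC curve reported here has a simple interpretation: it is the probability that a randomly chosen active is scored better than a randomly chosen inactive. A minimal sketch of that pairwise definition follows; the scores are made-up numbers, and the convention that lower (more negative) scores are better is an assumption of the example.

```python
def roc_auc(active_scores, decoy_scores):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly.

    Lower score means better predicted binding; ties count as half a win.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))

# Hypothetical predicted binding energies in kcal/mol.
actives = [-9.1, -7.4, -6.0]
decoys = [-8.0, -6.5, -5.2, -4.9]
print(roc_auc(actives, decoys))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the value of about 0.7 quoted above indicates useful but imperfect discrimination.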

• Predicting Fragment Binding Poses Using a Combined MCSS MM-GBSA Approach.
Haider, Muhammad K and Bertrand, Hugues-Olivier and Hubbard, Roderick E
Journal of chemical information and modeling, 2011, 51(5), 1092-1105
PMID: 21528911     doi: 10.1021/ci100469n

Improved methods are required to predict the position and orientation (pose) of binding to the target protein of low molecular weight compounds identified in fragment screening campaigns. This is particularly important to guide initial chemistry to generate structure-activity relationships for the cases where a high resolution structure cannot be obtained. We have assessed the benefit of an implicit solvent method for assessment of fragment binding poses generated by the Multiple Copy Simultaneous Search (MCSS) method in CHARMm. Additionally, the effect of using multiple receptor structures for a flexible receptor is investigated. The original MCSS performance (50% of fragment positions accurately predicted and scored) was increased up to 67% by scoring MCSS energy minima with a Molecular Mechanics Generalized Born approach with molecular volume integration and Surface Area model (MM-GBSA). The same increase in performance (but occasionally for different targets) was observed when using the docking program GOLD followed by MM-GBSA rescoring. The combined results from both methods resulted in a higher success rate emphasizing that a comparison of different docking methods can increase the correct identification of binding poses. For a receptor where multiple structures are available, Hsp90, the average performance on randomly adding receptor structures was also investigated. The results suggest that predictions using these docking methods can be used with some confidence to guide chemical optimization, if the structure of the target either remains relatively fixed on ligand binding, or if a number of crystal structures are available with diverse ligands bound and there is information on the positions of key water molecules in the binding site.

## 2010

• Homology modeling and metabolism prediction of human carboxylesterase-2 using docking analyses by GriDock: a parallelized tool based on AutoDock 4.0.
Vistoli, Giulio and Pedretti, Alessandro and Mazzolari, Angelica and Testa, Bernard
Journal of computer-aided molecular design, 2010, 24(9), 771-787
PMID: 20623318     doi: 10.1007/s10822-010-9373-1

Metabolic problems lead to numerous failures during clinical trials, and much effort is now devoted to developing in silico models predicting metabolic stability and metabolites. Such models are well known for cytochromes P450 and some transferases, whereas less has been done to predict the activity of human hydrolases. The present study was undertaken to develop a computational approach able to predict the hydrolysis of novel esters by human carboxylesterase hCES2. The study involved first a homology modeling of the hCES2 protein based on the model of hCES1 since the two proteins share a high degree of homology (≅ 73%). A set of 40 known substrates of hCES2 was taken from the literature; the ligands were docked in both their neutral and ionized forms using GriDock, a parallel tool based on the AutoDock4.0 engine which can perform efficient and easy virtual screening analyses of large molecular databases exploiting multi-core architectures. Useful statistical models (e.g., r (2)

• Rapid flexible docking using a stochastic rotamer library of ligands.
Ding, Feng and Yin, Shuangye and Dokholyan, Nikolay V
Journal of chemical information and modeling, 2010, 50(9), 1623-1632
PMID: 20712341     doi: 10.1021/ci100218t

Existing flexible docking approaches model the ligand and receptor flexibility either separately or in a loosely coupled manner, which captures the conformational changes inefficiently. Here, we propose a flexible docking approach, MedusaDock, which models both ligand and receptor flexibility simultaneously with sets of discrete rotamers. We developed an algorithm to build the ligand rotamer library "on-the-fly" during docking simulations. MedusaDock benchmarks demonstrate a rapid sampling efficiency and high prediction accuracy in both self- (to the cocrystallized state) and cross-docking (to a state cocrystallized with a different ligand), the latter of which mimics the virtual screening procedure in computational drug discovery. We also perform a virtual screening test of four flexible kinase targets, including cyclin-dependent kinase 2, vascular endothelial growth factor receptor 2, HIV reverse transcriptase, and HIV protease. We find significant improvements of virtual screening enrichments when compared to rigid-receptor methods. The predictive power of MedusaDock in cross-docking and preliminary virtual-screening benchmarks highlights the importance to model both ligand and receptor flexibility simultaneously in computational docking.

• Rapid context-dependent ligand desolvation in molecular docking.
Mysinger, Michael M. and Shoichet, Brian K
Journal of chemical information and modeling, 2010, 50(9), 1561-1573
PMID: 20735049     doi: 10.1021/ci100214a

In structure-based screens for new ligands, a molecular docking algorithm must rapidly score many molecules in multiple configurations, accounting for both the ligand's interactions with receptor and its competing interactions with solvent. Here we explore a context-dependent ligand desolvation scoring term for molecular docking. We relate the Generalized-Born effective Born radii for every ligand atom to a fractional desolvation and then use this fraction to scale an atom-by-atom decomposition of the full transfer free energy. The fractional desolvation is precomputed on a scoring grid by numerically integrating over the volume of receptor proximal to a ligand atom, weighted by distance. To test this method's performance, we dock ligands versus property-matched decoys over 40 DUD targets. Context-dependent desolvation better enriches ligands compared to both the raw full transfer free energy penalty and compared to ignoring desolvation altogether, though the improvement is modest. More compellingly, the new method improves docking performance across receptor types. Thus, whereas entirely ignoring desolvation works best for charged sites and overpenalizing with full desolvation works well for neutral sites, the physically more correct context-dependent ligand desolvation is competitive across both types of targets. The method also reliably discriminates ligands from highly charged molecules, where ignoring desolvation performs poorly. Since this context-dependent ligand desolvation may be precalculated, it improves docking reliability with minimal cost to calculation time and may be readily incorporated into any physics-based docking program.
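
The three desolvation treatments compared in this abstract (none, full, and context-dependent) differ only in how much of each atom's transfer free energy is charged. The sketch below illustrates the bookkeeping only; the per-atom costs and fractions are invented inputs, and the published method derives the fractions from Generalized-Born radii on a precomputed grid, which is not reproduced here.

```python
def desolvation_penalty(atom_costs, fractions=None):
    """Sum per-atom desolvation costs (kcal/mol), optionally scaled per atom.

    atom_costs: cost of fully desolvating each ligand atom (assumed positive here).
    fractions:  fractional desolvation of each atom in the bound pose, in [0, 1];
                None means every atom is treated as fully desolvated.
    """
    if fractions is None:
        return sum(atom_costs)
    if len(fractions) != len(atom_costs):
        raise ValueError("need one fraction per atom")
    if any(f < 0.0 or f > 1.0 for f in fractions):
        raise ValueError("fractions must lie in [0, 1]")
    return sum(c * f for c, f in zip(atom_costs, fractions))

# Hypothetical three-atom ligand: one charged atom left mostly solvent-exposed.
costs = [4.0, 1.0, 0.5]
buried = [0.25, 0.5, 1.0]

no_desolvation = 0.0
full_desolvation = desolvation_penalty(costs)
context_dependent = desolvation_penalty(costs, buried)
print(no_desolvation, context_dependent, full_desolvation)
```

The context-dependent value always falls between the two extremes, which is consistent with the abstract's observation that it is competitive on both charged and neutral sites.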

• A reliable docking/scoring scheme based on the semiempirical quantum mechanical PM6-DH2 method accurately covering dispersion and H-bonding: HIV-1 protease with 22 ligands.
Fanfrlík, Jindrich and Bronowska, Agnieszka K and Rezác, Jan and Prenosil, Ondrej and Konvalinka, Jan and Hobza, Pavel
The journal of physical chemistry. B, 2010, 114(39), 12666-12678
PMID: 20839830     doi: 10.1021/jp1032965

In this study, we introduce a fast and reliable rescoring scheme for docked complexes based on a semiempirical quantum mechanical PM6-DH2 method. The method utilizes a PM6-based Hamiltonian with corrections for dispersion energy and hydrogen bonds. The total score is constructed as the sum of the PM6-DH2 interaction enthalpy, the empirical force field (AMBER) interaction entropy, and the sum of the deformation (PM6-DH2, SMD) and the desolvation (SMD) energies of the ligand. The main advantage of the procedure is the fact that we do not add any empirical parameter for either an individual component of the total score or an individual protein-ligand complex. This rescoring method is applied to a very challenging system, namely, the HIV-1 protease with a set of ligands. As opposed to the conventional DOCK procedure, the PM6-DH2 rescoring based on all of the terms distinguishes between binders and nonbinders and provides a reliable correlation of the theoretical and experimental binding free energies. Such a dramatic improvement, resulting from the PM6-DH2 rescoring of all the complexes, provides a valuable yet inexpensive tool for rational drug discovery and de novo ligand design.

• Dockomatic - automated ligand creation and docking.
Bullock, Casey W and Jacob, Reed B. and McDougal, Owen M. and Hampikian, Greg and Andersen, Tim
BMC research notes, 2010, 3, 289
PMID: 21059259     doi: 10.1186/1756-0500-3-289

BACKGROUND: The application of computational modeling to rationally design drugs and characterize macro biomolecular receptors has proven increasingly useful due to the accessibility of computing clusters and clouds. AutoDock is a well-known and powerful software program used to model ligand to receptor binding interactions. In its current version, AutoDock requires significant amounts of user time to set up and run jobs, and collect results. This paper presents DockoMatic, a user-friendly Graphical User Interface (GUI) application that eases and automates the creation and management of AutoDock jobs for high throughput screening of ligand to receptor interactions.

• pK(a) based protonation states and microspecies for protein-ligand docking.
ten Brink, Tim and Exner, Thomas E.
Journal of computer-aided molecular design, 2010, 24(11), 935-942
PMID: 20882397     doi: 10.1007/s10822-010-9385-x

In this paper we present our reworked approach to generate ligand protonation states with our structure preparation tool SPORES (Structure PrOtonation and REcognition System). SPORES can be used for the preprocessing of proteins and protein-ligand complexes as e.g. taken from the Protein Data Bank as well as for the setup of 3D ligand databases. It automatically assigns atom and bond types, generates different protonation, tautomeric states as well as different stereoisomers. In the revised version, pKa calculations with the ChemAxon software MARVIN are used either to determine the likeliness of a combinatorial generated protonation state or to determine the titrable atoms used in the combinatorial approach. Additionally, the MARVIN software is used to predict microspecies distributions of ligand molecules. Docking studies were performed with our recently introduced program PLANTS (Protein-Ligand ANT System) on all protomers resulting from the three different selection methods for the well established CCDC/ASTEX clean data set demonstrating the usefulness of especially the latter approach.

• HarmonyDOCK: The Structural Analysis of Poses in Protein-Ligand Docking
Plewczynski, Dariusz and Philips, Anna and Von Grotthuss, Marcin and Rychlewski, Leszek and Ginalski, Krzysztof
Journal of computational biology : a journal of computational molecular cell biology, 2010, 18(00), 1-10
PMID: 21091053     doi: 10.1089/cmb.2009.0111

Molecular docking is a widely used method for lead optimization. However, docking tools often fail to predict how a ligand (the smaller molecule, such as a substrate or drug candidate) binds to a receptor (the accepting part of a protein). We present here HarmonyDOCK, a novel method for assessing the accuracy of docking software and creating a scoring function that determines the consensus protein-ligand pose among those generated by available docking programs. Conformations for a few hundred protein-ligand complexes with known three-dimensional structure were predicted on a benchmark set by a set of different docking programs. On the basis of the derived ranking, the point of reference and the lower score limit were determined for subsequent investigations. The focus of the methodology is on the top-ranked poses, with the assumption being that the conformation of the docked molecules is the most accurate. We found that some docking programs perform considerably better than others, yet in all cases the proper selection of decoys, namely HarmonyDOCK, is needed for a successful docking procedure.

• Ensemble docking into multiple crystallographically derived protein structures: an evaluation based on the statistical analysis of enrichments.
Craig, Ian R and Essex, Jonathan W and Spiegel, Katrin
Journal of chemical information and modeling, 2010, 50(4), 511-524
PMID: 20222690     doi: 10.1021/ci900407c

Docking into multiple receptor conformations ("ensemble docking") has been proposed, and employed, in the hope that it may account for receptor flexibility in virtual screening and thus provide higher enrichments than docking into single rigid receptor structures. The statistical analyses presented in this paper provide quantitative evidence that in some cases docking into a crystallographically derived conformational ensemble does indeed yield better enrichment than docking into any of the individual members of the ensemble. However, these "successful" ensembles account for only a minority of those examined and it would not have been possible to prospectively predict their identity using only protein structural information. A more frequently observed outcome is that the ensemble enrichment is higher than the mean of the enrichments provided by its individual members. An additional and promising finding is that, if a set of known active compounds is available, an approach based on induced-fit docking appears to be a reliable way to construct ensembles which provide relatively high enrichments.
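
The comparison made in this abstract, ensemble enrichment versus the enrichment of individual ensemble members, can be sketched as follows. Taking the best score per compound across structures is one common way to merge ensemble results and is an assumption here, as are the toy scores; the paper's own protocol and statistics are not reproduced.

```python
def enrichment_factor(scores, is_active, top_fraction=0.1):
    """Active rate in the top-ranked fraction divided by the overall active rate.

    Lower score means better predicted binding.
    """
    n = len(scores)
    n_top = max(1, round(n * top_fraction))
    ranked = sorted(range(n), key=lambda i: scores[i])
    hits = sum(is_active[i] for i in ranked[:n_top])
    return (hits / n_top) / (sum(is_active) / n)

def ensemble_scores(per_structure_scores):
    """Merge several score lists by keeping each compound's best (lowest) score."""
    return [min(column) for column in zip(*per_structure_scores)]

# Hypothetical scores for 10 compounds (the first two are actives) on two structures.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
struct_a = [-10, -5, -8, -7, -6, -6, -5, -4, -4, -3]
struct_b = [-5, -10, -7, -8, -6, -5, -5, -4, -3, -3]
merged = ensemble_scores([struct_a, struct_b])

for name, s in [("A", struct_a), ("B", struct_b), ("ensemble", merged)]:
    print(name, enrichment_factor(s, labels, top_fraction=0.2))
```

The toy data were chosen so that the ensemble beats both members; the abstract stresses that this favorable outcome was observed only for a minority of the ensembles examined.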

• Q-Dock(LHM): Low-resolution refinement for ligand comparative modeling.
Brylinski, Michal and Skolnick, Jeffrey
Journal of computational chemistry, 2010, 31(5), 1093-1105
PMID: 19827144     doi: 10.1002/jcc.21395

The success of ligand docking calculations typically depends on the quality of the receptor structure. Given improvements in protein structure prediction approaches, approximate protein models now can be routinely obtained for the majority of gene products in a given proteome. Structure-based virtual screening of large combinatorial libraries of lead candidates against theoretically modeled receptor structures requires fast and reliable docking techniques capable of dealing with structural inaccuracies in protein models. Here, we present Q-Dock(LHM), a method for low-resolution refinement of binding poses provided by FINDSITE(LHM), a ligand homology modeling approach. We compare its performance to that of classical ligand docking approaches in ligand docking against a representative set of experimental (both holo and apo) as well as theoretically modeled receptor structures. Docking benchmarks reveal that unlike all-atom docking, Q-Dock(LHM) exhibits the desired tolerance to the receptor's structure deformation. Our results suggest that the use of an evolution-based approach to ligand homology modeling followed by fast low-resolution refinement is capable of achieving satisfactory performance in ligand-binding pose prediction with promising applicability to proteome-scale applications.

• Chemical space sampling by different scoring functions and crystal structures.
Brooijmans, Natasja and Humblet, Christine
Journal of computer-aided molecular design, 2010, 24(5), 433-447
PMID: 20401681     doi: 10.1007/s10822-010-9356-2

Virtual screening has become a popular tool to identify novel leads in the early phases of drug discovery. A variety of docking and scoring methods used in virtual screening have been the subject of active research in an effort to gauge limitations and articulate best practices. However, how to best utilize different scoring functions and various crystal structures, when available, is not yet well understood. In this work we use multiple crystal structures of PI3K-gamma in both prospective and retrospective virtual screening experiments. Both Glide SP scoring and Prime MM-GBSA rescoring are utilized in the prospective and retrospective virtual screens, and consensus scoring is investigated in the retrospective virtual screening experiments. The results show that each of the different crystal structures that was used samples a different chemical space, i.e. different chemotypes are prioritized by each structure. In addition, the different (re)scoring functions prioritize different chemotypes as well. Somewhat surprisingly, the Prime MM-GBSA scoring function generally gives lower enrichments than Glide SP. Finally we investigate the impact of different ligand preparation protocols on virtual screening enrichment factors. In summary, different crystal structures and different scoring functions are complementary to each other and allow for a wider variety of chemotypes to be considered for experimental follow-up.

• Improving performance of docking-based virtual screening by structural filtration.
Novikov, Fedor N and Stroylov, Viktor S and Stroganov, Oleg V and Chilov, Ghermes G
Journal of Molecular Modeling, 2010, 16(7), 1223-1230
PMID: 20041273     doi: 10.1007/s00894-009-0633-8

In the current study an innovative method of structural filtration of docked ligand poses is introduced and applied to improve the virtual screening results. The structural filter is defined by a protein-specific set of interactions that are a) structurally conserved in available structures of a particular protein with its bound ligands, and b) that can be viewed as playing the crucial role in protein-ligand binding. The concept was evaluated on a set of 10 diverse proteins, for which the corresponding structural filters were developed and applied to the results of virtual screening obtained with the Lead Finder software. The application of structural filtration resulted in a considerable improvement of the enrichment factor, ranging from several-fold to hundreds-fold depending on the protein target. It appeared that the structural filtration had effectively repaired the deficiencies of the scoring functions that used to overestimate decoy binding, resulting in a considerably lower false positive rate. In addition, the structural filters were also effective in dealing with some deficiencies of the protein structure models that would lead to false negative predictions otherwise. The ability of structural filtration to recover relatively small but specifically bound molecules holds promise for the application of this technology in fragment-based drug discovery.
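
A structural filter of the kind described here amounts to keeping only those docked poses that reproduce every interaction in a protein-specific required set. The sketch below illustrates that idea with plain Python sets; the interaction labels, pose records, and function name are hypothetical and do not reflect Lead Finder's actual data model.

```python
def apply_structural_filter(poses, required_interactions):
    """Keep poses whose detected interactions include every required one."""
    return [p for p in poses if required_interactions <= p["interactions"]]

# Hypothetical filter for a kinase: both hinge hydrogen bonds must be present.
required = {("hinge_NH", "hbond"), ("hinge_CO", "hbond")}

poses = [
    {"name": "cmpd1", "score": -9.8,
     "interactions": {("hinge_NH", "hbond"), ("hinge_CO", "hbond"), ("gatekeeper", "vdw")}},
    {"name": "cmpd2", "score": -10.5,  # best score, but misses one hinge contact
     "interactions": {("hinge_NH", "hbond"), ("back_pocket", "vdw")}},
    {"name": "cmpd3", "score": -6.1,  # small, weakly scored, but specifically bound
     "interactions": {("hinge_NH", "hbond"), ("hinge_CO", "hbond")}},
]

kept = apply_structural_filter(poses, required)
print([p["name"] for p in kept])
```

In this toy case the best-scoring compound is rejected while a small, specifically bound one survives, which mirrors the fragment-recovery behavior the abstract highlights.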

• Prediction of protein-ligand binding affinities using multiple instance learning.
Teramoto, Reiji and Kashima, Hisashi
Journal of molecular graphics & modelling, 2010, 29(3), 492-497
PMID: 20965757     doi: 10.1016/j.jmgm.2010.09.006

Accurate prediction of protein-ligand binding affinities for lead optimization in drug discovery remains an important and challenging problem for scoring functions used in docking simulations. In this paper, we propose a data-driven approach that integrates multiple scoring functions to predict protein-ligand binding affinity directly. We then propose a new method called multiple instance regression based scoring (MIRS) that incorporates unbound ligand conformations using multiple scoring functions. We evaluated the predictive performance of MIRS using 100 protein-ligand complexes and their binding affinities. The experimental results showed that MIRS outperformed the 11 conventional scoring functions including LigScore, PLP, AutoDock, G-Score, D-Score, LUDI, F-Score, ChemScore, X-Score, PMF, and DrugScore. In addition, we confirmed that MIRS performed well on binding pose prediction. Our results reveal that it is indispensable to incorporate unbound ligand conformations in both binding affinity prediction and binding pose prediction. The proposed method will accelerate efficient lead optimization in structure-based drug design and provide a new direction for the design of new scoring functions.

• A fast protein-ligand docking algorithm based on hydrogen bond matching and surface shape complementarity.
Luo, Wenjia and Pei, Jianfeng and Zhu, Yushan
Journal of Molecular Modeling, 2010, 16(5), 903-913
PMID: 19823881     doi: 10.1007/s00894-009-0598-7

With the rapid development of structural determination of target proteins for human diseases, drug discovery based on high-throughput virtual screening is gradually gaining popularity. In this paper, a fast docking algorithm (H-DOCK) based on hydrogen bond matching and surface shape complementarity was developed. In H-DOCK, first a divide-and-conquer enumeration approach is applied to rank the intermolecular modes between protein and ligand by maximizing their hydrogen bond matching; then each docked conformation of the ligand is calculated according to the matched hydrogen bonding geometry; finally a simple but effective scoring function reflecting mainly the van der Waals interaction is used to evaluate the docked conformations of the ligand. H-DOCK is tested for both rigid and flexible ligand docking; the latter is implemented by repeating rigid docking for multiple conformations of a small molecule and ranking all results together. For rigid ligands, H-DOCK was tested on a set of 271 complexes where there is at least one intermolecular hydrogen bond, and H-DOCK achieved a success rate (RMSD < 2.0 Å) of 91.1%. For flexible ligands, H-DOCK was tested on another set of 93 complexes, where each case was a conformation ensemble containing the native ligand conformation as well as 100 decoy ones generated by AutoDock, and the success rate reached 81.7%. The high success rate of H-DOCK indicates that hydrogen bonding and steric hindrance can capture the key interactions between protein and ligand. H-DOCK is quite efficient compared with conventional docking algorithms, taking on average only about 0.14 seconds for a rigid ligand docking and about 8.25 seconds for a flexible one. The preliminary docking results imply that H-DOCK can potentially be used for large scale virtual screening as a pre-filter for a more accurate but less efficient docking algorithm.

• Use of the FACTS solvation model for protein-ligand docking calculations. Application to EADock.
Zoete, Vincent and Grosdidier, Aurélien and Cuendet, Michel and Michielin, Olivier
Journal of molecular recognition : JMR, 2010, 23(5), 457-461
PMID: 20101644     doi: 10.1002/jmr.1012

Protein-ligand docking has made important progress during the last decade and has become a powerful tool for drug development, opening the way to virtual high throughput screening and in silico structure-based ligand design. Despite the flattering picture that has been drawn, recent publications have shown that the docking problem is far from being solved, and that more developments are still needed to achieve high successful prediction rates and accuracy. Introducing an accurate description of the solvation effect upon binding is thought to be essential to achieve this goal. In particular, EADock uses the Generalized Born Molecular Volume 2 (GBMV2) solvent model, which has been shown to reproduce accurately the desolvation energies calculated by solving the Poisson equation. Here, the implementation of the Fast Analytical Continuum Treatment of Solvation (FACTS) as an implicit solvation model in small molecules docking calculations has been assessed using the EADock docking program. Our results strongly support the use of FACTS for docking. The success rates of EADock/FACTS and EADock/GBMV2 are similar, i.e. around 75% for local docking and 65% for blind docking. However, these results come at a much lower computational cost: FACTS is 10 times faster than GBMV2 in calculating the total electrostatic energy, and allows a speed up of EADock by a factor of 4. This study also supports the EADock development strategy relying on the CHARMM package for energy calculations, which enables straightforward implementation and testing of the latest developments in the field of Molecular Modeling.

• Reducing docking score variations arising from input differences.
Feher, Miklos and Williams, Christopher I
Journal of chemical information and modeling, 2010, 50(9), 1549-1560
PMID: 20698562     doi: 10.1021/ci100204x

The variability of docking results as a function of variations in ligand input conformations was studied for the GOLD, Glide, FlexX, and Surflex programs. It is concluded that there are two major effects leading to such variability: the adequacy of conformational search during docking and random "chaotic" effects arising from sensitivity to small input perturbations. It is shown that although the former is generally the stronger effect, the latter is also highly significant for almost all docking engines. The strong target-to-target variation of the magnitude of these effects is emphasized. The performance of different packages is compared using these measures. Guidelines are provided for different programs to reduce variability and improve reproducibility, which involve using a small number of input conformations as starting points for docking, followed by the selection of the top scoring docked pose from the results as the best docked solution.
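
The guideline in this abstract (dock from a few different input conformations, then keep the top-scoring pose) can be expressed as a small wrapper. This is a sketch under stated assumptions: `dock` stands for any user-supplied callable returning a `(score, pose)` pair with lower scores being better, and the lookup table below is a stand-in for a real docking engine.

```python
def dock_from_multiple_inputs(input_conformations, dock):
    """Dock each input conformation and return (best_result, score_spread).

    best_result is the (score, pose) pair with the lowest score; score_spread is
    the gap between the worst and best scores, a crude measure of input sensitivity.
    """
    results = [dock(conf) for conf in input_conformations]
    scores = [score for score, _ in results]
    best = min(results, key=lambda r: r[0])
    return best, max(scores) - min(scores)

# Stand-in for a docking engine: precomputed (score, pose) per input conformation.
toy_results = {
    "conf_a": (-7.2, "pose_a"),
    "conf_b": (-8.5, "pose_b"),
    "conf_c": (-6.9, "pose_c"),
}

best, spread = dock_from_multiple_inputs(["conf_a", "conf_b", "conf_c"], toy_results.__getitem__)
print(best, round(spread, 2))
```

Reporting the spread alongside the best pose makes the input sensitivity visible instead of hiding it behind a single docking run.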

• SKATE: a docking program that decouples systematic sampling from scoring.
Feng, Jianwen A and Marshall, Garland R.
Journal of computational chemistry, 2010, 31(14), 2540-2554
PMID: 20740553     doi: 10.1002/jcc.21545

SKATE is a docking prototype that decouples systematic sampling from scoring. This novel approach removes any interdependence between sampling and scoring functions to achieve better sampling and, thus, improves docking accuracy. SKATE systematically samples a ligand's conformational, rotational and translational degrees of freedom, as constrained by a receptor pocket, to find sterically allowed poses. Efficient systematic sampling is achieved by pruning the combinatorial tree using aggregate assembly, discriminant analysis, adaptive sampling, radial sampling, and clustering. Because systematic sampling is decoupled from scoring, the poses generated by SKATE can be ranked by any published, or in-house, scoring function. To test the performance of SKATE, ligands from the Astex/CCDC set, the Surflex set, and the Vertex set, a total of 266 complexes, were redocked to their respective receptors. The results show that SKATE was able to sample poses within 2 Å RMSD of the native structure for 98, 95, and 98% of the cases in the Astex/CCDC, Surflex, and Vertex sets, respectively. Cross-docking accuracy of SKATE was also assessed by docking 10 ligands to thymidine kinase and 73 ligands to cyclin-dependent kinase.

• AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.
Trott, Oleg and Olson, Arthur J
Journal of computational chemistry, 2010, 31(2), 455-461
PMID: 19499576     doi: 10.1002/jcc.21334

AutoDock Vina, a new program for molecular docking and virtual screening, is presented. AutoDock Vina achieves an approximately two orders of magnitude speed-up compared with the molecular docking software previously developed in our lab (AutoDock 4), while also significantly improving the accuracy of the binding mode predictions, judging by our tests on the training set used in AutoDock 4 development. Further speed-up is achieved from parallelism, by using multithreading on multicore machines. AutoDock Vina automatically calculates the grid maps and clusters the results in a way transparent to the user.

• ParaDockS: a framework for molecular docking with population-based metaheuristics.
Meier, René and Pippel, Martin and Brandt, Frank and Sippl, Wolfgang and Baldauf, Carsten
Journal of chemical information and modeling, 2010, 50(5), 879-889
PMID: 20415499     doi: 10.1021/ci900467x

Molecular docking is a simulation technique that aims to predict the binding pose between a ligand and a receptor. The resulting multidimensional continuous optimization problem is practically unsolvable in an exact way. One possible approach is the combination of an optimization algorithm and an objective function that describes the interaction. The software ParaDockS is designed to hold different optimization algorithms and objective functions. At the current stage, an adapted particle-swarm optimizer (PSO) is implemented. Available objective functions are (i) the empirical objective function p-Score and (ii) an adapted version of the knowledge-based potential PMF04. We tested the docking accuracy in terms of reproducing known crystal structures from the PDBbind core set. For 73% of the test instances the native binding mode was found with an rmsd below 2 Å. The virtual screening efficiency was tested with a subset of 13 targets and the respective ligands and decoys from the directory of useful decoys (DUD). ParaDockS with PMF04 shows a superior early enrichment. The approach presented here can be employed for molecular docking experiments and virtual screenings of large compound libraries in academia as well as in industrial research and development. The performance in terms of accuracy and enrichment is close to the results of commercial software solutions.
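
To illustrate the idea of pairing a population-based optimizer with an interchangeable objective function, the following is a minimal textbook particle-swarm sketch. It is not the adapted PSO implemented in ParaDockS: the function names, the parameter values (`inertia`, `cognitive`, `social`), and the `toy_score` objective (a simple quadratic bowl standing in for a docking score) are all assumptions chosen for illustration.

```python
import random


def particle_swarm_minimize(objective, dim, bounds, n_particles=30, n_iter=200,
                            inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Textbook global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to personal best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_score(x):
    """Hypothetical stand-in for a docking objective: minimum 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    position, value = particle_swarm_minimize(toy_score, dim=3, bounds=(-5.0, 5.0))
    print(position, value)
```

Because the optimizer only calls `objective(position)`, any scoring function with that shape could be swapped in, which mirrors the separation of optimizer and objective described in the abstract. A real docking search space would also include ligand rotation and torsion angles rather than plain Cartesian coordinates.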

• Virtual fragment docking by Glide: a validation study on 190 protein-fragment complexes.
Sándor, Márk and Kiss, Róbert and Keseru, György M
Journal of chemical information and modeling, 2010, 50(6), 1165-1172
PMID: 20459088     doi: 10.1021/ci1000407

The docking accuracy of Glide was evaluated using 16 different docking protocols on 190 protein-fragment complexes representing 78 targets. Standard precision docking (Glide SP) based protocols showed the best performance. The average root-mean-square deviation (rmsd) between the docked and cocrystallized poses achieved by Glide SP with pre- and postprocessing was 1.17 Å, and an acceptable binding mode with rmsd < 2 Å could be found in 80% of the cases. Comparison of the docking results produced by different protocols suggests that the sampling efficacy of Glide is adequate for fragment docking. The docking accuracy seems to be limited by the performance of scoring schemes, which is supported by the weak correlation between experimental binding affinities and GlideScores. Cross-docking experiments performed on 8 targets represented by 63 complexes revealed that Glide SP gave similar results to that of the computationally more intensive Glide XP. The average rmsd achieved by Glide SP with pre- and postprocessing was 2.06 Å, and an acceptable binding mode with rmsd < 2 Å could be found in 63% of the cases. These cross-docking results were improved significantly selecting the optimal X-ray structure for each target (average rmsd

• Comparison of three preprocessing filters efficiency in virtual screening: identification of new putative LXRbeta regulators as a test case.
Ghemtio, Léo and Devignes, Marie-Dominique and Smaïl-Tabbone, Malika and Souchet, Michel and Leroux, Vincent and Maigret, Bernard
Journal of chemical information and modeling, 2010, 50(5), 701-715
PMID: 20420434     doi: 10.1021/ci900356m

In silico screening methodologies are widely recognized as efficient approaches in early steps of drug discovery. However, in the virtual high-throughput screening (VHTS) context, where hit compounds are searched among millions of candidates, three-dimensional comparison techniques and knowledge discovery from databases should find novel drug leads more efficiently than computationally expensive molecular docking. Therefore, the present study aims at developing a filtering methodology to efficiently eliminate unsuitable compounds in the VHTS process. Several filters are evaluated in this paper. The first two are structure-based and rely on either geometrical docking or pharmacophore depiction. The third filter is ligand-based and uses knowledge-based and fingerprint similarity techniques. These filtering methods were tested with the Liver X Receptor (LXR) as a target of therapeutic interest, as LXR is a key regulator in maintaining cholesterol homeostasis. The results show that the three considered filters are complementary so that their combination should generate consistent compound lists of potential hits.

• VSDocker: a tool for parallel high-throughput virtual screening using AutoDock on Windows-based computer clusters
Prakhov, Nikita D. and Chernorudskiy, Alexander L. and Gainullin, Murat R.
Bioinformatics (Oxford, England), 2010, 26(10), 1374-1375
PMID: 20378556     doi: 10.1093/bioinformatics/btq149

VSDocker is an original program that allows using AutoDock4 for optimized virtual ligand screening on computer clusters or multiprocessor workstations. This tool is the first implementation of parallel high-performance virtual screening of ligands for MS Windows-based computer systems.

• NNScore: a neural-network-based scoring function for the characterization of protein-ligand complexes.
Durrant, Jacob D and McCammon, J Andrew
Journal of chemical information and modeling, 2010, 50(10), 1865-1871
PMID: 20845954     doi: 10.1021/ci100244v

As high-throughput biochemical screens are both expensive and labor intensive, researchers in academia and industry are turning increasingly to virtual-screening methodologies. Virtual screening relies on scoring functions to quickly assess ligand potency. Although useful for in silico ligand identification, these scoring functions generally give many false positives and negatives; indeed, a properly trained human being can often assess ligand potency by visual inspection with greater accuracy. Given the success of the human mind at protein-ligand complex characterization, we present here a scoring function based on a neural network, a computational model that attempts to simulate, albeit inadequately, the microscopic organization of the brain. Computer-aided drug design depends on fast and accurate scoring functions to aid in the identification of small-molecule ligands. The scoring function presented here, used either on its own or in conjunction with other more traditional functions, could prove useful in future drug-discovery efforts.

• Docking Validation Resources: Protein Family and Ligand Flexibility Experiments
Mukherjee, Sudipto and Balius, Trent E and Rizzo, Robert C
Journal of chemical information and modeling, 2010, 50(11), 1986-2000
PMID: 21033739     doi: 10.1021/ci1001982

A database consisting of 780 ligand-receptor complexes, termed SB2010, has been derived from the Protein Databank to evaluate the accuracy of docking protocols for regenerating bound ligand conformations. The goal is to provide easily accessible community resources for development of improved procedures to aid virtual screening for ligands with a wide range of flexibilities. Three core experiments using the program DOCK, which employ rigid (RGD), fixed anchor (FAD), and flexible (FLX) protocols, were used to gauge performance by several different metrics: (1) global results, (2) ligand flexibility, (3) protein family, and (4) cross-docking. Global spectrum plots of successes and failures vs rmsd reveal well-defined inflection regions, which suggest the commonly used 2 angstrom criteria is a reasonable choice for defining success. Across all 780 systems, success tracks with the relative difficulty of the calculations: RGD (82.3%) > FAD (78.1%) > FLX (63.8%). In general, failures due to scoring strongly outweigh those due to sampling. Subsets of SB2010 grouped by ligand flexibility (7-or-less, 8-to-15, and 15-plus rotatable bonds) reveal that success degrades linearly for FAD and FLX protocols, in contrast to RGD, which remains constant. Despite the challenges associated with FLX anchor orientation and on-the-fly flexible growth, success rates for the 7-or-less (74.5%) and, in particular, the 8-to-15 (55.2%) subset are encouraging. Poorer results for the very flexible 15-plus set (39.3%) indicate substantial room for improvement. Family-based success appears largely independent of ligand flexibility, suggesting a strong dependence on the binding site environment. For example, zinc-containing proteins are generally problematic, despite moderately flexible ligands. Finally, representative cross-docking examples, for carbonic anhydrase, thermolysin, and neuraminidase families, show the utility of family-based analysis for rapid identification of particularly good or bad docking trends, and the type of failures involved (scoring/sampling), which will likely be of interest to researchers making specific receptor choices for virtual screening. SB2010 is available for download at http://rizzolab.org.
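
The 2 angstrom success criterion that recurs throughout these benchmarks can be sketched as follows. This is a minimal illustration and not code from DOCK or SB2010: the function names are hypothetical, the RMSD assumes both poses list the same atoms in the same order and share a coordinate frame (no superposition and no symmetry correction), and treating an RMSD exactly at the cutoff as a success is an assumption.

```python
import math


def pose_rmsd(pose, reference):
    """RMSD (same units as the input) between two equally ordered (x, y, z) lists."""
    if not pose or len(pose) != len(reference):
        raise ValueError("pose and reference must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, r in zip(pose, reference)
                  for a, b in zip(p, r))
    return math.sqrt(squared / len(pose))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of docking runs whose RMSD is at or below the cutoff (angstroms)."""
    if not rmsds:
        raise ValueError("rmsds must not be empty")
    return sum(1 for r in rmsds if r <= cutoff) / len(rmsds)


if __name__ == "__main__":
    docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(pose_rmsd(docked, crystal))           # square root of 2
    print(success_rate([0.5, 1.9, 2.5, 4.0]))   # two of four runs succeed
```

Success rates such as those quoted above are this fraction computed over every complex in a test set, usually taking the top-scored pose of each run.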

• Blind docking method combining search of low-resolution binding sites with ligand pose refinement by molecular dynamics-based global optimization.
Vorobjev, Yury N
Journal of computational chemistry, 2010, 31(5), 1080-1092
PMID: 19821514     doi: 10.1002/jcc.21394

This study describes the development of a new blind hierarchical docking method, bhDock, its implementation, and accuracy assessment. The bhDock method uses a two-step algorithm. First, a comprehensive set of low-resolution binding sites is determined by analyzing the entire protein surface and ranked by a simple score function. Second, the ligand position is determined via a molecular dynamics-based method of global optimization starting from a small set of high-ranked low-resolution binding sites. The refinement of the ligand binding pose starts from uniformly distributed multiple initial ligand orientations and uses simulated annealing molecular dynamics coupled with guided force-field deformation of protein-ligand interactions to find the global minimum. Assessment of the bhDock method on the set of 37 protein-ligand complexes has shown a prediction success rate of 78%, which is better than the rate reported for the most cited docking methods, such as AutoDock, DOCK, GOLD, and FlexX, on the same set of complexes.

• Exploring hierarchical refinement techniques for induced fit docking with protein and ligand flexibility.
Borrelli, Kenneth W and Cossins, Benjamin and Guallar, Victor
Journal of computational chemistry, 2010, 31(6), 1224-1235
PMID: 19885871     doi: 10.1002/jcc.21409

We present a series of molecular-mechanics-based protein refinement methods, including two novel ones, applied as part of an induced fit docking procedure. The methods used include minimization; protein and ligand sidechain prediction; a hierarchical ligand placement procedure similar to a-priori protein loop predictions; and a minimized Monte Carlo approach using normal mode analysis as a move step. The results clearly indicate the importance of a proper opening of the active site backbone, which might not be accomplished when the ligand degrees of freedom are prioritized. The most accurate method consisted of the minimized Monte Carlo procedure designed to open the active site followed by a hierarchical optimization of the sidechain packing around a mobile flexible ligand. The methods have been used on a series of 88 protein-ligand complexes including both cross-docking and apo-docking members resulting in complex conformations determined to within 2.0 Å heavy-atom RMSD in 75% of cases where the protein backbone rearrangement upon binding is less than 1.0 Å alpha-carbon RMSD. We also demonstrate that physics-based all-atom potentials can be more accurate than docking-style potentials when complexes are sufficiently refined.

• A new Lamarckian genetic algorithm for flexible ligand-receptor docking.
Fuhrmann, Jan and Rurainski, Alexander and Lenhof, Hans-Peter and Neumann, Dirk
Journal of computational chemistry, 2010, 31(9), 1911-1918
PMID: 20082382     doi: 10.1002/jcc.21478

We present a Lamarckian genetic algorithm (LGA) variant for flexible ligand-receptor docking that can handle a large number of degrees of freedom. Our hybrid method combines a multi-deme LGA with a recently published gradient-based method for local optimization of molecular complexes. We compared the performance of our new hybrid method to two non-gradient-based search heuristics on the Astex diverse set for flexible ligand-receptor docking. Our results show that the novel approach is clearly superior to other LGAs employing a stochastic optimization method. The new algorithm features a shorter run time and gives substantially better results, especially with increasing complexity of the ligands. Thus, it may be used to dock ligands with many rotatable bonds with high efficiency.

• Improved docking, screening and selectivity prediction for small molecule nuclear receptor modulators using conformational ensembles.
Park, So-Jung and Kufareva, Irina and Abagyan, Ruben
Journal of computer-aided molecular design, 2010, 24(5), 459-471
PMID: 20455005     doi: 10.1007/s10822-010-9362-4

Nuclear receptors (NRs) are ligand-dependent transcription factors that play a key role in the reproduction, development, and homeostasis of organisms. NRs are potential targets for treatment of cancer and other diseases such as inflammatory diseases and diabetes. In this study, we present a comprehensive library of pocket conformational ensembles of thirteen human nuclear receptors (NRs), and test the ability of these ensembles to recognize their ligands in virtual screening, as well as predict their binding geometry, functional type, and relative binding affinity. 157 known NR modulators and 66 structures were used as a benchmark. Our pocket ensemble library correctly predicted the ligand binding poses in 94% of the cases. The models were also highly selective for the active ligands in virtual screening, with the areas under the ROC curves ranging from 82 to a remarkable 99%. Using the computationally determined receptor-specific binding energy offsets, we showed that the ensembles can be used for predicting selectivity profiles of NR ligands. Our results evaluate and demonstrate the advantages of using receptor ensembles for compound docking, screening, and profiling.
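
The area under the ROC curve used above to measure screening selectivity can be computed directly from docking scores. The sketch below uses the pairwise (Mann-Whitney) formulation; it is an illustration rather than the authors' procedure, the function name is hypothetical, and it assumes that lower (more negative) scores indicate better predicted binding.

```python
def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outscores a random decoy (ties count half).

    Assumes lower scores are better, as with binding-energy-like docking scores.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0   # active ranked ahead of decoy
            elif a == d:
                wins += 0.5   # tie: half credit
    return wins / (len(active_scores) * len(decoy_scores))


if __name__ == "__main__":
    print(roc_auc([-9.0, -8.0], [-7.0, -6.0]))  # perfect separation
    print(roc_auc([-9.0, -5.0], [-7.0, -6.0]))  # no better than random
```

A value of 1.0 means every active ranks ahead of every decoy and 0.5 corresponds to random ranking; multiplying by 100 gives the percentage form quoted in the abstract.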

• Multiple ligand simultaneous docking: orchestrated dancing of ligands in binding sites of protein.
Li, Huameng and Li, Chenglong
Journal of computational chemistry, 2010, 31(10), 2014-2022
PMID: 20166125     doi: 10.1002/jcc.21486

Present docking methodologies simulate only a single ligand at a time during the docking process. In reality, the molecular recognition process always involves multiple molecular species. Typical protein-ligand interactions are, for example, substrate and cofactor in a catalytic cycle; metal ion coordination together with ligand(s); and ligand binding with water molecules. To simulate the real molecular binding processes, we propose a novel multiple ligand simultaneous docking (MLSD) strategy, which can deal with all the above processes, vastly improving docking sampling and binding free energy scoring. The work also compares two search strategies: Lamarckian genetic algorithm and particle swarm optimization, which have respective advantages depending on the specific systems. The methodology proves robust through systematic testing against several diverse model systems: E. coli purine nucleoside phosphorylase (PNP) complex with two substrates, SHP2NSH2 complex with two peptides and Bcl-xL complex with ABT-737 fragments. In all cases, the final correct docking poses and relative binding free energies were obtained. In the PNP case, the simulations also capture the binding intermediates and reveal the binding dynamics during the recognition processes, which are consistent with the proposed enzymatic mechanism. In the other two cases, conventional single-ligand docking fails due to energetic and dynamic coupling among ligands, whereas MLSD results in the correct binding modes. These three cases also represent potential applications in the areas of exploring enzymatic mechanism, interpreting noisy X-ray crystallographic maps, and aiding fragment-based drug design, respectively.

• An interaction-motif-based scoring function for protein-ligand docking.
Xie, Zhong-Ru and Hwang, Ming-Jing
BMC Bioinformatics, 2010, 11, 298
PMID: 20525216     doi: 10.1186/1471-2105-11-298

BACKGROUND: A good scoring function is essential for molecular docking computations. In conventional scoring functions, energy terms modeling pairwise interactions are cumulatively summed, and the best docking solution is selected. Here, we propose to transform protein-ligand interactions into three-dimensional geometric networks, from which recurring network substructures, or network motifs, are selected and used to provide probability-ranked interaction templates with which to score docking solutions.

## 2009

• Elastic potential grids: accurate and efficient representation of intermolecular interactions for fully flexible docking.
Kazemi, Sina and Krüger, Dennis M and Sirockin, Finton and Gohlke, Holger
ChemMedChem, 2009, 4(8), 1264-1268
PMID: 19514026     doi: 10.1002/cmdc.200900146

• Scoring confidence index: statistical evaluation of ligand binding mode predictions.
Zavodszky, Maria I and Stumpff-Kane, Andrew W and Lee, David J and Feig, Michael
Journal of computer-aided molecular design, 2009, 23(5), 289-299
PMID: 19153808     doi: 10.1007/s10822-008-9258-8

Protein-ligand docking programs can generate a large number of possible binding orientations for each ligand candidate. The challenge is to identify the orientations closest to the native binding mode using a scoring method. Many different scoring functions have been developed for protein-ligand scoring, but their performance on binding mode prediction is often target-dependent. In this study, a statistical approach was employed to provide a confidence measure of scoring performance in finding close to the correct docked ligand orientations. It exploits the fact that the scores provided by an adequately performing scoring function generally improve as the ligand binding modes get closer to the correct native orientation. For such cases, the correlation coefficient of scores versus distances is expected to be highest when the most native-like orientation is used as a reference. This correlation coefficient, called the correlation-based score (CBScore), was used as an indicator of how far the docked pose was from the native orientation. The correlation between the original scores and CBScores as well as the range of CBScores were found to be good measures of scoring performance. They were combined into a single quantity, called the scoring confidence index. High values of the scoring confidence index were indicative of pronounced and relatively smooth binding energy landscapes with easily discernable global minima, resulting in reliable binding mode predictions. Low values of this index reflected rugged energy landscapes making the prediction of the correct binding mode very difficult and often unreliable. The diagnostic ability of the scoring confidence index was tested on a non-redundant set of 50 protein-ligand complexes scored with three commonly employed scoring functions: AffiScore, DrugScore and X-Score. Binding mode predictions were found to be three times more reliable for complexes with scoring confidence indices in the upper half than for cases with values in the lower half of the resulting range of 0-1.6. This new confidence measure of scoring performance is expected to be a valuable tool for virtual screening applications.
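
The correlation-based score described above can be sketched as follows: each docked pose is taken in turn as the reference, and the Pearson correlation between all poses' scores and their distances to that reference is computed. This is an interpretation of the abstract, not the authors' implementation. The function names are hypothetical, the input is assumed to be a precomputed symmetric pose-to-pose distance matrix (for example RMSD), lower scores are assumed to be better, and the further combination into the scoring confidence index is not reproduced because the abstract does not give its formula.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_based_scores(scores, distances):
    """For each pose i, correlate all scores with the distances to pose i.

    distances[i][j] is the distance (e.g. RMSD) between poses i and j.
    With lower-is-better scores, a native-like reference gives a high value.
    """
    return [pearson(distances[i], scores) for i in range(len(scores))]


if __name__ == "__main__":
    # Four hypothetical poses on a line; pose 0 has the best (lowest) score.
    scores = [-10.0, -8.0, -6.0, -4.0]
    distances = [[abs(i - j) for j in range(4)] for i in range(4)]
    print(correlation_based_scores(scores, distances))
```

In this toy landscape the score worsens steadily with distance from pose 0, so pose 0 receives the highest correlation, which is the funnel-like behavior the abstract associates with reliable binding mode predictions.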

• FINDSITE: a threading-based approach to ligand homology modeling.
Brylinski, Michal and Skolnick, Jeffrey
PLoS computational biology, 2009, 5(6), e1000405
PMID: 19503616     doi: 10.1371/journal.pcbi.1000405

Ligand virtual screening is a widely used tool to assist in new pharmaceutical discovery. In practice, virtual screening approaches have a number of limitations, and the development of new methodologies is required. Previously, we showed that remotely related proteins identified by threading often share a common binding site occupied by chemically similar ligands. Here, we demonstrate that across an evolutionarily related, but distant family of proteins, the ligands that bind to the common binding site contain a set of strongly conserved anchor functional groups as well as a variable region that accounts for their binding specificity. Furthermore, the sequence and structure conservation of residues contacting the anchor functional groups is significantly higher than those contacting ligand variable regions. Exploiting these insights, we developed FINDSITE(LHM) that employs structural information extracted from weakly related proteins to perform rapid ligand docking by homology modeling. In large scale benchmarking, using the predicted anchor-binding mode and the crystal structure of the receptor, FINDSITE(LHM) outperforms classical docking approaches with an average ligand RMSD from native of approximately 2.5 Å. For weakly homologous receptor protein models, using FINDSITE(LHM), the fraction of recovered binding residues and specific contacts is 0.66 (0.55) and 0.49 (0.38) for highly confident (all) targets, respectively. Finally, in virtual screening for HIV-1 protease inhibitors, using similarity to the ligand anchor region yields significantly improved enrichment factors. Thus, the rather accurate, computationally inexpensive FINDSITE(LHM) algorithm should be a useful approach to assist in the discovery of novel biopharmaceuticals.

• Four-dimensional docking: a fast and accurate account of discrete receptor flexibility in ligand docking.
Bottegoni, Giovanni and Kufareva, Irina and Totrov, Maxim and Abagyan, Ruben
Journal of medicinal chemistry, 2009, 52(2), 397-406
PMID: 19090659     doi: 10.1021/jm8009958

Many available methods aimed at incorporating the receptor flexibility in ligand docking are computationally expensive, require a high level of user intervention, and were tested only on benchmarks of limited size and diversity. Here we describe the four-dimensional (4D) docking approach that allows seamless incorporation of receptor conformational ensembles in a single docking simulation and reduces the sampling time while preserving the accuracy of traditional ensemble docking. The approach was tested on a benchmark of 99 therapeutically relevant proteins and 300 diverse ligands (half of them experimental or marketed drugs). The conformational variability of the binding pockets was represented by the available crystallographic data, with the total of 1113 receptor structures. The 4D docking method reproduced the correct ligand binding geometry in 77.3% of the benchmark cases, matching the success rate of the traditional approach but employed on average only one-fourth of the time during the ligand sampling phase.

• Predicting multiple ligand binding modes using self-consistent pharmacophore hypotheses.
Wallach, Izhar and Lilien, Ryan
Journal of chemical information and modeling, 2009, 49(9), 2116-2128
PMID: 19711952     doi: 10.1021/ci900199e

The ability to predict ligand binding modes without the aid of wet-lab experiments may accelerate and reduce the cost of drug discovery research. Despite significant recent progress, virtual screening has not yet eliminated the need for wet-lab experiments. For example, after a lead compound has been identified, the precise binding mode is still typically determined by experimental structural biology. This structural knowledge is then employed to guide lead optimization. We present a step toward improving protein-ligand binding mode prediction for a set of ligands known to interact with a common protein. There is thus an important distinction between this work and traditional virtual screening algorithms. Whereas traditional approaches attempt to identify binding ligands from a large database of available compounds, our approach aims to more accurately predict the binding mode for a set of ligands which are already known to bind the target protein. The approach is based on the hypothesis that each active site contains a set of interaction points which binding ligands tend to exploit. In a more traditional context, these interaction points make up a pharmacophoric map. Our algorithm first performs traditional protein-ligand docking for each known binder. The ranked lists of candidate binding modes are then evaluated to identify a set of poses maximally self-consistent with respect to a pharmacophoric map generated from the same poses. We have extensively demonstrated the application of the algorithm to four protein systems (thrombin, cyclin-dependent kinase 2, dihydrofolate reductase, and HIV-1 protease) and attained predictions with an average RMSD < 2.5 Å for all tested systems. This represents a typical improvement of 0.5-1.0 Å (up to 25%) RMSD over the naive virtual docking predictions. Our algorithm is independent of the docking method and may significantly improve binding mode prediction of virtual docking experiments.

• Improving virtual screening performance against conformational variations of receptors by shape matching with ligand binding pocket.
Lee, Hui Sun and Lee, Cheol Soon and Kim, Jeong Sook and Kim, Dong Hou and Choe, Han
Journal of chemical information and modeling, 2009, 49(11), 2419-2428
PMID: 19852439     doi: 10.1021/ci9002365

In this report, we present a novel virtual high-throughput screening methodology to assist in computer-aided drug discovery. Our method, designated as SLIM, involves ligand-free shape and chemical feature matching. The procedure takes advantage of a negative image of a binding pocket in a target receptor. The negative image is a set of virtual atoms representing the inner shape and chemical features of the binding pocket. Using this image, SLIM implements a shape-based similarity search based on molecular volume superposition for the ensemble of conformers of each molecule. The superposed structures, prioritized by shape similarity, are subjected to comparison of chemical feature similarities. To validate the merits of the SLIM method, we compared its performance with those of three distinct widely used tools ROCS, GLIDE, and GOLD. ROCS was selected as a representative of the ligand-centric methods, and docking programs GLIDE and GOLD as representatives of the receptor-centric methods. Our data suggest that SLIM has overall hit ranking ability that is comparable to that of the docking method, retaining the high computational speed of the ligand-centric method. It is notable that the SLIM method offers consistently reliable screening quality against conformational variations of receptors, whereas the docking methods have limited screening performance.

• Scoring ligand similarity in structure-based virtual screening.
Zavodszky, Maria I and Rohatgi, Anjali and Van Voorst, Jeffrey R and Yan, Honggao and Kuhn, Leslie A
Journal of molecular recognition : JMR, 2009, 22(4), 280-292
PMID: 19235177     doi: 10.1002/jmr.942

Scoring to identify high-affinity compounds remains a challenge in virtual screening. On one hand, protein-ligand scoring focuses on weighting favorable and unfavorable interactions between the two molecules. Ligand-based scoring, on the other hand, focuses on how well the shape and chemistry of each ligand candidate overlay on a three-dimensional reference ligand. Our hypothesis is that a hybrid approach, using ligand-based scoring to rank dockings selected by protein-ligand scoring, can ensure that high-ranking molecules mimic the shape and chemistry of a known ligand while also complementing the binding site. Results from applying this approach to screen nearly 70 000 National Cancer Institute (NCI) compounds for thrombin inhibitors tend to support the hypothesis. EON ligand-based ranking of docked molecules yielded the majority (4/5) of newly discovered, low to mid-micromolar inhibitors from a panel of 27 assayed compounds, whereas ranking docked compounds by protein-ligand scoring alone resulted in one new inhibitor. Since the results depend on the choice of scoring function, an analysis of properties was performed on the top-scoring docked compounds according to five different protein-ligand scoring functions, plus EON scoring using three different reference compounds. The results indicate that the choice of scoring function, even among scoring functions measuring the same types of interactions, can have an unexpectedly large effect on which compounds are chosen from screening. Furthermore, there was almost no overlap between the top-scoring compounds from protein-ligand versus ligand-based scoring, indicating the two approaches provide complementary information. Matchprint analysis, a new addition to the SLIDE (Screening Ligands by Induced-fit Docking, Efficiently) screening toolset, facilitated comparison of docked molecules' interactions with those of known inhibitors. The majority of interactions conserved among top-scoring compounds for a given scoring function, and from the different scoring functions, proved to be conserved interactions in known inhibitors. This was particularly true in the S1 pocket, which was occupied by all the docked compounds.

• PLATINUM: a web tool for analysis of hydrophobic/hydrophilic organization of biomolecular complexes.
Pyrkov, Timothy V and Chugunov, Anton O and Krylov, Nikolay A and Nolde, Dmitry E and Efremov, Roman G
Bioinformatics (Oxford, England), 2009, 25(9), 1201-1202
PMID: 19244385     doi: 10.1093/bioinformatics/btp111

The PLATINUM (Protein-Ligand ATtractions Investigation NUMerically) web service is designed for analysis and visualization of hydrophobic/hydrophilic properties of biomolecules supplied as 3D-structures. Furthermore, PLATINUM provides a number of tools for quantitative characterization of the hydrophobic/hydrophilic match in biomolecular complexes, e.g., in docking poses. These complement standard scoring functions. The calculations are based on the concept of empirical Molecular Hydrophobicity Potential (MHP). AVAILABILITY: The PLATINUM web tool as well as detailed documentation and tutorial are available free of charge for academic users at http://model.nmr.ru/platinum/. PLATINUM requires Java 5 or higher and Adobe Flash Player 9. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

• Assessment of QM/MM scoring functions for molecular docking to HIV-1 protease.
Fong, Pedro and McNamara, Jonathan P and Hillier, Ian H and Bryce, Richard A
Journal of chemical information and modeling, 2009, 49(4), 913-924
PMID: 19309119     doi: 10.1021/ci800432s

We explore the ability of four quantum mechanical (QM)/molecular mechanical (MM) models to accurately identify the native pose of six HIV-1 protease inhibitors and compare them with the AMBER force field and ChemScore and GoldScore scoring functions. Three QM/MM scoring functions treated the ligand at the HF/6-31G*, AM1d, and PM3 levels; the fourth QM/MM function modeled the ligand and active site at the PM3-D level. For the discrimination of native from non-native poses, solvent-corrected HF/6-31G*:AMBER and AMBER functions exhibited the best overall performance. While the electrostatic component of the MM and QM/MM functions appears important for discriminating the native pose of the ligand, the polarization contribution in the QM/MM functions was relatively insensitive to a ligand's binding mode and, for one ligand, actually hindered discrimination. The inclusion of a desolvation penalty, here using a generalized Born solvent model, improved discrimination for the MM and QM/MM methods. There appeared to be no advantage to binding mode prediction by incorporating active site polarization at the PM3-D level. Finally, we found that choice of the protonation state of the aspartyl dyad in the HIV-1 protease active site influenced the ability of scoring methods to determine the native binding pose.

• Docking ligands into flexible and solvated macromolecules. 5. Force-field-based prediction of binding affinities of ligands to proteins.
Englebienne, Pablo and Moitessier, Nicolas
Journal of chemical information and modeling, 2009, 49(11), 2564-2571
PMID: 19928836     doi: 10.1021/ci900251k

We report herein our efforts in the development of three empirical scoring functions with application in protein-ligand docking. A first scoring function was developed from 209 crystal structures of protein-ligand complexes and a second one from 946 cross-docked complexes. Tuning of the coefficients for the different terms making up these functions was performed by an iterative approach to optimize the correlations between observed activities and calculated scores. A third scoring function was developed from libraries of known actives and decoys docked to six different protein conformational ensembles. In the latter case, the tuning of the coefficients was performed so as to optimize the area under the curve of a receiver operating characteristic (ROC) for the discrimination of actives and inactives. The newly developed scoring functions were next assessed on independent sets of protein-ligand complexes for their ability to predict binding affinities and to discriminate actives from inactives. In the first validation the first function, which was trained on active compounds only, performed as well as other commonly used ones. On a high-throughput virtual screening validation on five protein conformational ensembles, the third scoring function that included data from inactive compounds performed significantly better. This validation showed that the inclusion of data from inactive compounds is critical for performance in virtual high-throughput screening applications.
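
The ROC-based tuning criterion mentioned above can be illustrated with a minimal sketch. It computes the area under the ROC curve as the probability that a randomly chosen active outscores a randomly chosen decoy. The function name `roc_auc` and the convention that a higher score means a better-ranked compound are assumptions made for illustration; this is not the authors' implementation.

```python
def roc_auc(active_scores, decoy_scores):
    """Area under the ROC curve for separating actives from decoys.

    Assumption: a HIGHER score means a better-ranked compound.
    Equivalent to the fraction of (active, decoy) pairs in which the
    active outscores the decoy, with ties counted as half a win.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. For scoring functions where lower is better, negate the scores before calling.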

• APIF: a new interaction fingerprint based on atom pairs and its application to virtual screening.
Pérez-Nueno, Violeta I and Rabal, Obdulia and Borrell, José I and Teixidó, Jordi
Journal of chemical information and modeling, 2009, 49(5), 1245-1260
PMID: 19364101     doi: 10.1021/ci900043r

A new interaction fingerprint (IF) called APIF (atom-pairs-based interaction fingerprint) has been developed for postprocessing protein-ligand docking results. Unlike other existing fingerprints which employ absolute locations of individual interactions, APIF considers the relative positions of pairs of interacting atoms. Docking-based virtual screening was performed with GOLD using the crystal structures of trypsin, rhinovirus, HIV protease, carboxypeptidase, and estrogen receptor-alpha as targets. A score derived from the similarity of the bit strings for each docking solution to that of a known reference binding mode was obtained. Comparisons between APIF, GoldScore function, and standard interaction fingerprint (CHIF) scores were performed using enrichment plots. Superior recovery rates were observed in the IF score cases. Comparable results were achieved by using either of the two interaction fingerprints, substantially improving GoldScore function enrichment factors. Binding mode analyses were also carried out in order to study the best method for selecting conformations with a binding mode similar to that of the reference crystallized complex. These showed that the first conformations retrieved by interaction fingerprint scores had a more similar binding mode to the reference complex than those retrieved by the GoldScore function.
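
Scoring a docking solution by the similarity of its fingerprint bit string to that of a reference binding mode can be sketched as follows. The abstract does not name the similarity measure, so the Tanimoto coefficient used here is an assumption, as are the names `tanimoto` and `rank_poses` and the encoding of fingerprints as strings of `0` and `1`.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two equal-length bit strings.

    Assumption: fingerprints are strings of '0' and '1'; the metric
    actually used by APIF is not stated in the abstract.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" and b == "1")
    either = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" or b == "1")
    return both / either if either else 0.0


def rank_poses(reference_fp, pose_fps):
    """Return pose names sorted from most to least similar to the reference.

    pose_fps maps a (hypothetical) pose name to its fingerprint string.
    """
    return sorted(
        pose_fps,
        key=lambda name: tanimoto(reference_fp, pose_fps[name]),
        reverse=True,
    )
```

For example, `rank_poses("1100", {"a": "0011", "b": "1100", "c": "1000"})` places the pose that reproduces both reference interactions first.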

• LigMatch: a multiple structure-based ligand matching method for 3D virtual screening.
Kinnings, Sarah L and Jackson, Richard M
Journal of chemical information and modeling, 2009, 49(9), 2056-2066
PMID: 19685924     doi: 10.1021/ci900204y

We have developed a new virtual screening (VS) method called LigMatch and evaluated its performance on 13 protein targets using a filtered and clustered version of the directory of useful decoys (DUD). The method uses 3D structural comparison to a crystallographically determined ligand in a bioactive 'template' conformation, using a geometric hashing method, in order to prioritize each database compound. We show that LigMatch outperforms several other widely used VS methods on the 13 DUD targets. We go on to demonstrate that improved VS performance can be gained from using multiple, structurally diverse templates rather than a single template ligand for a particular protein target. In this case, a 2D fingerprint-based method is used to select a ligand template from a set of known bioactive conformations. Furthermore, we show that LigMatch performs well even in the absence of 2D similarity to the template ligands, thereby demonstrating its robustness with respect to purely 2D methods and its potential for scaffold hopping.

• Blind docking of 260 protein-ligand complexes with EADock 2.0.
Grosdidier, Aurélien and Zoete, Vincent and Michielin, Olivier
Journal of computational chemistry, 2009, 30(13), 2021-2030
PMID: 19130502     doi: 10.1002/jcc.21202

Molecular docking software is one of the important tools of modern drug development pipelines. The promising achievements of the last 10 years emphasize the need for further improvement, as reflected by several recent publications (Leach et al., J Med Chem 2006, 49, 5851; Warren et al., J Med Chem 2006, 49, 5912). Our initial approach, EADock, showed a good performance in reproducing the experimental binding modes for a set of 37 different ligand-protein complexes (Grosdidier et al., Proteins 2007, 67, 1010). This article presents recent improvements regarding the scoring and sampling aspects over the initial implementation, as well as a new seeding procedure based on the detection of cavities, opening the door to blind docking with EADock. These enhancements were validated on 260 complexes taken from the high quality Ligand Protein Database [LPDB, (Roche et al., J Med Chem 2001, 44, 3592)]. Two issues were identified: first, the quality of the initial structures cannot be assumed and a manual inspection and/or a search in the literature are likely to be required to achieve the best performance. Second, the description of interactions involving metal ions still has to be improved. Nonetheless, a remarkable success rate of 65% was achieved for a large scale blind docking assay, when considering only the top ranked binding mode and a success threshold of 2 Å RMSD to the crystal structure. When looking at the five top-ranked binding modes, the success rate increases up to 76%. In a standard local docking assay, success rates of 75 and 83% were obtained, considering only the top ranked binding mode, or the five top binding modes, respectively.
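
The success-rate figures quoted above (top-ranked pose versus the five top-ranked poses, with a 2 Å RMSD threshold) follow a common evaluation pattern, sketched here. The function names, the input layout, and the choice of `<=` at the threshold are assumptions; the RMSD is computed on matched atoms without superposition or symmetry correction, which real benchmarks may handle differently.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z).

    No superposition and no symmetry correction are applied (assumption).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def success_rate(rmsds_per_complex, top_n=1, threshold=2.0):
    """Fraction of complexes with at least one of the top_n poses within threshold.

    rmsds_per_complex: one list per complex, holding the RMSD of each
    predicted pose to the crystal structure, in rank order (best first).
    """
    if not rmsds_per_complex:
        raise ValueError("no complexes given")
    hits = sum(
        1
        for ranked in rmsds_per_complex
        if any(r <= threshold for r in ranked[:top_n])
    )
    return hits / len(rmsds_per_complex)
```

Calling `success_rate(data, top_n=1)` and `success_rate(data, top_n=5)` on the same data gives the two kinds of figures reported in the abstract.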

• Docking to heme proteins.
Röhrig, Ute F and Grosdidier, Aurélien and Zoete, Vincent and Michielin, Olivier
Journal of computational chemistry, 2009, 30(14), 2305-2315
PMID: 19288474     doi: 10.1002/jcc.21244

In silico screening has become a valuable tool in drug design, but some drug targets represent real challenges for docking algorithms. This is especially true for metalloproteins, whose interactions with ligands are difficult to parametrize. Our docking algorithm, EADock, is based on the CHARMM force field, which assures a physically sound scoring function and a good transferability to a wide range of systems, but also exhibits difficulties in case of some metalloproteins. Here, we consider the therapeutically important case of heme proteins featuring an iron core at the active site. Using a standard docking protocol, where the iron-ligand interaction is underestimated, we obtained a success rate of 28% for a test set of 50 heme-containing complexes with iron-ligand contact. By introducing Morse-like metal binding potentials (MMBP), which are fitted to reproduce density functional theory calculations, we are able to increase the success rate to 62%. The remaining failures are mainly due to specific ligand-water interactions in the X-ray structures. Testing of the MMBP on a second data set of non iron binders (14 cases) demonstrates that they do not introduce a spurious bias towards metal binding, which suggests that they may reliably be used also for cross-docking studies.

• An Evaluation of Explicit Receptor Flexibility in Molecular Docking Using Molecular Dynamics and Torsion Angle Molecular Dynamics.
Armen, Roger S and Chen, Jianhan and Brooks, Charles L
Journal of chemical theory and computation, 2009, 5(10), 2909-2923
PMID: 20160879     doi: 10.1021/ct900262t

Incorporating receptor flexibility into molecular docking should improve results for flexible proteins. However, the incorporation of explicit all-atom flexibility with molecular dynamics for the entire protein chain may also introduce significant error and "noise" that could decrease docking accuracy and deteriorate the ability of a scoring function to rank native-like poses. We address this apparent paradox by comparing the success of several flexible receptor models in cross-docking and multiple receptor ensemble docking for p38$\alpha$ mitogen-activated protein (MAP) kinase. Explicit all-atom receptor flexibility has been incorporated into a CHARMM-based molecular docking method (CDOCKER) using both molecular dynamics (MD) and torsion angle molecular dynamics (TAMD) for the refinement of predicted protein-ligand binding geometries. These flexible receptor models have been evaluated, and the accuracy and efficiency of TAMD sampling is directly compared to MD sampling. Several flexible receptor models are compared, encompassing flexible side chains, flexible loops, multiple flexible backbone segments, and treatment of the entire chain as flexible. We find that although including side chain and some backbone flexibility is required for improved docking accuracy as expected, docking accuracy also diminishes as additional and unnecessary receptor flexibility is included into the conformational search space. Ensemble docking results demonstrate that including protein flexibility leads to improved agreement with binding data for 227 active compounds. This comparison also demonstrates that a flexible receptor model enriches high affinity compound identification without significantly increasing the number of false positives from low affinity compounds.

• RosettaLigand docking with full ligand and receptor flexibility.
Davis, Ian W and Baker, David
Journal of molecular biology, 2009, 385(2), 381-392
PMID: 19041878     doi: 10.1016/j.jmb.2008.11.010

Computational docking of small-molecule ligands into protein receptors is an important tool for modern drug discovery. Although conformational adjustments are frequently observed between the free and ligand-bound states, the conformational flexibility of the protein is typically ignored in protein-small molecule docking programs. We previously described the program RosettaLigand, which leverages the Rosetta energy function and side-chain repacking algorithm to account for flexibility of all side chains in the binding site. Here we present extensions to RosettaLigand that incorporate full ligand flexibility as well as receptor backbone flexibility. Including receptor backbone flexibility is found to produce more correct docked complexes and to lower the average RMSD of the best-scoring docked poses relative to the rigid-backbone results. On a challenging set of retrospective and prospective cross-docking tests, we find that the top-scoring ligand pose is correctly positioned within 2 Å RMSD for 64% (54/85) of cases overall.

• Blind docking of pharmaceutically relevant compounds using RosettaLigand.
Davis, Ian W and Raha, Kaushik and Head, Martha S and Baker, David
Protein science : a publication of the Protein Society, 2009, 18(9), 1998-2002
PMID: 19554568     doi: 10.1002/pro.192

It is difficult to properly validate algorithms that dock a small molecule ligand into its protein receptor using data from the public domain: the predictions are not blind because the correct binding mode is already known, and public test cases may not be representative of compounds of interest such as drug leads. Here, we use private data from a real drug discovery program to carry out a blind evaluation of the RosettaLigand docking methodology and find that its performance is on average comparable with that of the best commercially available current small molecule docking programs. The strength of RosettaLigand is the use of the Rosetta sampling methodology to simultaneously optimize protein sidechain, protein backbone and ligand degrees of freedom; the extensive benchmark test described here identifies shortcomings in other aspects of the protocol and suggests clear routes to improving the method.

• Ligand mapping on protein surfaces by the 3D-RISM theory: toward computational fragment-based drug design.
Imai, Takashi and Oda, Koji and Kovalenko, Andriy and Hirata, Fumio and Kidera, Akinori
Journal of the American Chemical Society, 2009, 131(34), 12430-12440
PMID: 19655800     doi: 10.1021/ja905029t

In line with the recent development of fragment-based drug design, a new computational method for mapping of small ligand molecules on protein surfaces is proposed. The method uses three-dimensional (3D) spatial distribution functions of the atomic sites of the ligand calculated using the molecular theory of solvation, known as the 3D reference interaction site model (3D-RISM) theory, to identify the most probable binding modes of ligand molecules. The 3D-RISM-based method is applied to the binding of several small organic molecules to thermolysin, in order to show its efficiency and accuracy in detecting binding sites. The results demonstrate that our method can reproduce the major binding modes found by X-ray crystallographic studies with sufficient precision. Moreover, the method can successfully identify some binding modes associated with a known inhibitor, which could not be detected by X-ray analysis. The dependence of ligand-binding modes on the ligand concentration, which essentially cannot be treated with other existing computational methods, is also investigated. The results indicate that some binding modes are readily affected by the ligand concentration, whereas others are not significantly altered. In the former case, it is the subtle balance in the binding affinity between the ligand and water that determines the dominant ligand-binding mode.

• 3-D clustering: a tool for high throughput docking
Priestle, John P.
Journal of Molecular Modeling, 2009, 15(5), 551-560
PMID: 19085027     doi: 10.1007/s00894-008-0360-6

This report describes a computer program for clustering docking poses based on their 3-dimensional (3D) coordinates as well as on their chemical structures. This is chiefly intended for reducing a set of hits coming from high throughput docking, since the capacity to prepare and biologically test such molecules is generally far more limited than the capacity to generate such hits. The advantage of clustering molecules based on 3D, rather than 2D, criteria is that small variations on a scaffold may bring about different binding modes for molecules that would not be predicted by 2D similarity alone. The program does a pose-by-pose/atom-by-atom comparison of a set of docking hits (poses), scoring both spatial and chemical similarity. Using these pair-wise similarities, the whole set is clustered based on a user-supplied similarity threshold. An output coordinate file is created that mirrors the input coordinate file, but contains two new properties: a cluster number and similarity to the cluster center. Poses in this output file can easily be sorted by cluster and displayed together for visual inspection with any standard molecular viewing program, and decisions made about which molecule should be selected for biological testing as the best representative of this group of similar molecules with similar binding modes.
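
The clustering step described above (pairwise similarities, a user-supplied threshold, and an output that records a cluster number and the similarity to the cluster center) can be illustrated with a simple greedy scheme. The abstract does not specify the clustering algorithm, so the leader-style procedure below, the function name `cluster_poses`, and the input layout are all assumptions.

```python
def cluster_poses(similarity, threshold):
    """Greedy leader-style clustering of poses (an assumed, illustrative scheme).

    similarity: square symmetric matrix (list of lists) of pairwise pose
    similarities, e.g. in the range 0..1.
    Returns one (cluster_number, similarity_to_center) tuple per pose,
    with cluster numbers starting at 1.
    """
    centers = []      # index of the pose acting as center of each cluster
    assignment = []
    for i in range(len(similarity)):
        best_cluster, best_sim = None, threshold
        # Pick the most similar existing center that meets the threshold.
        for c_idx, center in enumerate(centers):
            s = similarity[i][center]
            if s >= best_sim:
                best_cluster, best_sim = c_idx, s
        if best_cluster is None:
            # No center is similar enough: this pose starts a new cluster.
            centers.append(i)
            assignment.append((len(centers), 1.0))
        else:
            assignment.append((best_cluster + 1, best_sim))
    return assignment
```

Because the procedure is order-dependent, feeding poses sorted by docking score makes the best-scoring member the center of each cluster.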

• An improved adaptive genetic algorithm for protein-ligand docking.
Kang, Ling and Li, Honglin and Jiang, Hualiang and Wang, Xicheng
Journal of computer-aided molecular design, 2009, 23(1), 1-12
PMID: 18777161     doi: 10.1007/s10822-008-9232-5

A new optimization model of molecular docking is proposed, and a fast flexible docking method based on an improved adaptive genetic algorithm is developed in this paper. The algorithm employs several advanced techniques, such as a multi-population genetic strategy, an entropy-based searching technique with self-adaptation, and the quasi-exact penalty. A new iteration scheme in conjunction with the above techniques is employed to speed up the optimization process and to ensure very rapid and steady convergence. The docking accuracy and efficiency of the method are evaluated by docking results from the GOLD test data set, which contains 134 protein-ligand complexes. In over 66.2% of the complexes, the docked pose was within 2.0 Å root-mean-square deviation (RMSD) of the X-ray structure. Docking time is approximately proportional to the number of rotatable bonds of the ligands.

• SeleX-CS: a new consensus scoring algorithm for hit discovery and lead optimization.
Bar-Haim, Shay and Aharon, Ayelet and Ben-Moshe, Tal and Marantz, Yael and Senderowitz, Hanoch
Journal of chemical information and modeling, 2009, 49(3), 623-633
PMID: 19231809     doi: 10.1021/ci800335j

Identifying active compounds (hits) that bind to biological targets of pharmaceutical relevance is the cornerstone of drug design efforts. Structure based virtual screening, namely, the in silico evaluation of binding energies and geometries between a protein and its putative ligands, has emerged over the past few years as a promising approach in this field. The success of the method relies on the availability of reliable 3-dimensional (3D) structures of the target protein and its candidate ligands (the screening library), a reliable docking method that can fit the different ligands into the protein's binding site, and an accurate scoring function that can rank the resulting binding modes in accord with their binding affinities. This last requirement is arguably the most difficult to meet due to the complexity of the binding process. A potential solution to this so-called scoring problem is the usage of multiple scoring functions in an approach known as consensus scoring. Several consensus scoring methods were suggested in the literature and have generally demonstrated an improved ranking of screening libraries relative to individual scoring functions. Nevertheless, current consensus scoring strategies suffer from several shortcomings, in particular, strong dependence on the initial parameters and an incomplete treatment of inactive compounds. In this work we present a new consensus scoring algorithm (SeleX-Consensus Scoring abbreviated to SeleX-CS) specifically designed to address these limitations: (i) A subset of the initial set of the scoring functions is allowed to form the consensus score, and this subset is optimized via a Monte Carlo/Simulated Annealing procedure. (ii) Rank redundancy between the members of the screening library is removed. (iii) The method explicitly considers the presence of inactive compounds. The new algorithm was applied to the ranking of screening libraries targeting two G-protein coupled receptors (GPCR). 
Excellent enrichment factors were obtained in both cases: For the cannabinoid receptor 1 (CB1), SeleX-CS outperformed the best single score and afforded an enrichment factor of 41 at 1% of the screening library compared with the best single score value of 15 (GOLD_Fitness). For the chemokine receptor type 2 (CCR2) SeleX-CS afforded an enrichment factor of 72 (again at 1% of the screening library) once more outperforming any single score (enrichment factor of 20 by GSCORE). Moreover, SeleX-CS demonstrated success rates of 67% (CCR2) and 73% (CB1) when applied to ranking an external test set. In both cases, the new algorithm also afforded good derichment of inactive compounds (i.e., the ability to push inactive compounds to the bottom of the ranked library). The method was then extended to rank a lead optimization series targeting the Kv4.3 potassium ion channel, resulting in a Spearman's correlation coefficient, p

• Comparative assessment of scoring functions on a diverse test set.
Cheng, Tiejun and Li, Xun and Li, Yan and Liu, Zhihai and Wang, Renxiao
Journal of chemical information and modeling, 2009, 49(4), 1079-1093
PMID: 19358517     doi: 10.1021/ci9000053

Scoring functions are widely applied to the evaluation of protein-ligand binding in structure-based drug design. We have conducted a comparative assessment of 16 popular scoring functions implemented in main-stream commercial software or released by academic research groups. A set of 195 diverse protein-ligand complexes with high-resolution crystal structures and reliable binding constants were selected through a systematic nonredundant sampling of the PDBbind database and used as the primary test set in our study. All scoring functions were evaluated in three aspects, that is, "docking power", "ranking power", and "scoring power", and all evaluations were independent from the context of molecular docking or virtual screening. As for "docking power", six scoring functions, including GOLD::ASP, DS::PLP1, DrugScore(PDB), GlideScore-SP, DS::LigScore, and GOLD::ChemScore, achieved success rates over 70% when the acceptance cutoff was root-mean-square deviation < 2.0 Å. Combining these scoring functions into consensus scoring schemes improved the success rates to 80% or even higher. As for "ranking power" and "scoring power", the top four scoring functions on the primary test set were X-Score, DrugScore(CSD), DS::PLP, and SYBYL::ChemScore. They were able to correctly rank the protein-ligand complexes containing the same type of protein with success rates around 50%. Correlation coefficients between the experimental binding constants and the binding scores computed by these scoring functions ranged from 0.545 to 0.644. Besides the primary test set, each scoring function was also tested on four additional test sets, each consisting of a certain number of protein-ligand complexes containing one particular type of protein. Our study serves as an updated benchmark for evaluating the general performance of today's scoring functions. Our results indicate that no single scoring function consistently outperforms others in all three aspects. 
Thus, it is important in practice to choose the appropriate scoring functions for different purposes.

• Testing assumptions and hypotheses for rescoring success in protein-ligand docking.
O'Boyle, Noel M. and Liebeschuetz, John W and Cole, Jason C
Journal of chemical information and modeling, 2009, 49(8), 1871-1878
PMID: 19645429     doi: 10.1021/ci900164f

In protein-ligand docking, the scoring function is responsible for identifying the correct pose of a particular ligand as well as separating ligands from nonligands. Recently there has been considerable interest in schemes that combine results from several scoring functions in an effort to achieve improved performance in virtual screens. One such scheme is consensus scoring, which involves combining the results from several rescoring experiments. Although there have been a number of studies that have investigated factors affecting success in consensus scoring, these studies have not addressed the question of why a rescoring strategy works in the first place. Here we propose and test two alternative hypotheses for why rescoring has the potential to improve results, using GOLD 4.0. The "consensus" hypothesis is that rescoring is a way of combining results from two scoring functions such that only true positives are likely to score highly. The "complementary" hypothesis is that the two scoring functions used in rescoring have complementary strengths; one is better at ranking actives with respect to inactives while the other is better at ranking poses of actives. We find that in general it is this hypothesis that explains success in a rescoring experiment. We also test an assumption of any rescoring method, which is that the scores obtained are representative of the fitness of the docked pose. We find that although rescored poses tended to have slightly higher clash values than their docked equivalents, in general the scores were representative.

• Empirical scoring functions for advanced protein-ligand docking with PLANTS.
Korb, Oliver and Stützle, Thomas and Exner, Thomas E.
Journal of chemical information and modeling, 2009, 49(1), 84-96
PMID: 19125657     doi: 10.1021/ci800298z

In this paper we present two empirical scoring functions, PLANTS(CHEMPLP) and PLANTS(PLP), designed for our docking algorithm PLANTS (Protein-Ligand ANT System), which is based on ant colony optimization (ACO). They are related, regarding their functional form, to parts of already published scoring functions and force fields. The parametrization procedure described here was able to identify several parameter settings showing an excellent performance for the task of pose prediction on two test sets comprising 298 complexes in total. Up to 87% of the complexes of the Astex diverse set and 77% of the CCDC/Astex clean listnc (noncovalently bound complexes of the clean list) could be reproduced with root-mean-square deviations of less than 2 Å with respect to the experimentally determined structures. A comparison with the state-of-the-art docking tool GOLD clearly shows that this is, especially for the druglike Astex diverse set, an improvement in pose prediction performance. Additionally, optimized parameter settings for the search algorithm were identified, which can be used to balance pose prediction reliability and search speed.

• DOCK 6: combining techniques to model RNA-small molecule complexes.
Lang, P Therese and Brozell, Scott R and Mukherjee, Sudipto and Pettersen, Eric F and Meng, Elaine C and Thomas, Veena and Rizzo, Robert C and Case, David A and James, Thomas L and Kuntz, Irwin D
RNA (New York, N.Y.), 2009, 15(6), 1219-1230
PMID: 19369428     doi: 10.1261/rna.1563609

With an increasing interest in RNA therapeutics and for targeting RNA to treat disease, there is a need for the tools used in protein-based drug design, particularly DOCKing algorithms, to be extended or adapted for nucleic acids. Here, we have compiled a test set of RNA-ligand complexes to validate the ability of the DOCK suite of programs to successfully recreate experimentally determined binding poses. With the optimized parameters and a minimal scoring function, 70% of the test set with less than seven rotatable ligand bonds and 26% of the test set with less than 13 rotatable bonds can be successfully recreated within 2 Å heavy-atom RMSD. When DOCKed conformations are rescored with the implicit solvent models AMBER generalized Born with solvent-accessible surface area (GB/SA) and Poisson-Boltzmann with solvent-accessible surface area (PB/SA) in combination with explicit water molecules and sodium counterions, the success rate increases to 80% with PB/SA for less than seven rotatable bonds and 58% with AMBER GB/SA and 47% with PB/SA for less than 13 rotatable bonds. These results indicate that DOCK can indeed be useful for structure-based drug design aimed at RNA. Our studies also suggest that RNA-directed ligands often differ from typical protein-ligand complexes in their electrostatic properties, but these differences can be accommodated through the choice of potential function. In addition, in the course of the study, we explore a variety of newly added DOCK functions, demonstrating the ease with which new functions can be added to address new scientific questions.

• AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility.
Morris, Garrett M and Huey, Ruth and Lindstrom, William and Sanner, Michel F and Belew, Richard K and Goodsell, David S and Olson, Arthur J
Journal of computational chemistry, 2009, 30(16), 2785-2791
PMID: 19399780     doi: 10.1002/jcc.21256

We describe the testing and release of AutoDock4 and the accompanying graphical user interface AutoDockTools. AutoDock4 incorporates limited flexibility in the receptor. Several tests are reported here, including a redocking experiment with 188 diverse ligand-protein complexes and a cross-docking experiment using flexible sidechains in 87 HIV protease complexes. We also report its utility in analysis of covalently bound ligands, using both a grid-based docking method and a modification of the flexible sidechain technique.

• Energetic analysis of fragment docking and application to structure-based pharmacophore hypothesis generation.
Loving, Kathryn and Salam, Noeris K. and Sherman, Woody
Journal of computer-aided molecular design, 2009, 23(8), 541-554
PMID: 19421721     doi: 10.1007/s10822-009-9268-1

We have developed a method that uses energetic analysis of structure-based fragment docking to elucidate key features for molecular recognition. This hybrid ligand- and structure-based methodology uses an atomic breakdown of the energy terms from the Glide XP scoring function to locate key pharmacophoric features from the docked fragments. First, we show that Glide accurately docks fragments, producing a root mean squared deviation (RMSD) of <1.0 Å for the top scoring pose to the native crystal structure. We then describe fragment-specific docking settings developed to generate poses that explore every pocket of a binding site while maintaining the docking accuracy of the top scoring pose. Next, we describe how the energy terms from the Glide XP scoring function are mapped onto pharmacophore sites from the docked fragments in order to rank their importance for binding. Using this energetic analysis we show that the most energetically favorable pharmacophore sites are consistent with features from known tight binding compounds. Finally, we describe a method to use the energetically selected sites from fragment docking to develop a pharmacophore hypothesis that can be used in virtual database screening to retrieve diverse compounds. We find that this method produces viable hypotheses that are consistent with known active compounds. In addition to retrieving diverse compounds that are not biased by the co-crystallized ligand, the method is able to recover known active compounds from a database screen, with an average enrichment of 8.1 in the top 1% of the database.
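
Enrichment in the top 1% of a ranked database, as quoted above, is commonly computed as the hit rate in the top fraction divided by the hit rate in the whole database. The sketch below follows that common definition; whether the authors used exactly this formula is an assumption, as are the function name and the rounding of the top-fraction size.

```python
def enrichment_factor(ranked_is_active, fraction=0.01):
    """Enrichment factor at a given fraction of a ranked database.

    ranked_is_active: list of booleans, best-ranked compound first,
    True where the compound is a known active.
    Assumption: the top-fraction size is rounded to the nearest whole
    compound, with a minimum of one.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty ranking with at least one active")
    n_top = max(1, int(round(fraction * n_total)))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)
```

A value of 1 means no better than random selection; with 10 actives among 1000 compounds, finding 5 of them in the top 10 gives an enrichment factor of 50.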

• Virtual fragment screening: an exploration of various docking and scoring protocols for fragments using Glide.
Kawatkar, Sameer and Wang, Hongming and Czerminski, Ryszard and Joseph-McCarthy, Diane
Journal of computer-aided molecular design, 2009, 23(8), 527-539
PMID: 19495993     doi: 10.1007/s10822-009-9281-4

Fragment-based drug discovery approaches allow for a greater coverage of chemical space and generally produce high efficiency ligands. As such, virtual and experimental fragment screening are increasingly being coupled in an effort to identify new leads for specific therapeutic targets. Fragment docking is employed to create a target-focussed subset of compounds for testing alongside generic fragment libraries. The utility of the program Glide with various scoring schemes for fragment docking is discussed. Fragment docking results for two test cases, prostaglandin D2 synthase and DNA ligase, are presented and compared to experimental screening data. Self-docking, cross-docking, and enrichment studies are performed. For the enrichment runs, experimental data exists indicating that the docking decoys in fact do not inhibit the corresponding enzyme being examined. Results indicate that even for difficult test cases fragment docking can yield enrichments significantly better than random.

## 2008

• Bootstrap-based consensus scoring method for protein-ligand docking.
Journal of chemical information and modeling, 2008, 48(5), 988-996
PMID: 18426197     doi: 10.1021/ci700204v

To improve the performance of a single scoring function used in a protein-ligand docking program, we developed a bootstrap-based consensus scoring (BBCS) method, which is based on ensemble learning. BBCS combines multiple scorings, each of which has the same function form but different energy-parameter sets. These multiple energy-parameter sets are generated in two steps: (1) generation of training sets by a bootstrap method and (2) optimization of energy-parameter set by a Z-score approach, which is based on energy landscape theory as used in protein folding, against each training set. In this study, we applied BBCS to the FlexX scoring function. Using given 50 complexes, we generated 100 training sets and obtained 100 optimized energy-parameter sets. These parameter sets were tested against 48 complexes different from the training sets. BBCS was shown to be an improvement over single scoring when using a parameter set optimized by the same Z-score approach. Comparing BBCS with the original FlexX scoring function, we found that (1) the success rate of recognizing the crystal structure at the top relative to decoys increased from 33.3% to 52.1% and that (2) the rank of the crystal structure improved for 54.2% of the complexes and worsened for none. We also found that BBCS performed better than conventional consensus scoring (CS).
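
The two-step idea in the abstract above (bootstrap resampling of the training complexes, then one optimized parameter set per resample, combined into a consensus) can be sketched as follows. This is only an illustration under stated assumptions: the Z-score optimization is not reproduced and is represented by a user-supplied `fit` callable, the scoring form is assumed to be a weighted sum of pose features, and the consensus is assumed to be a plain average. None of these names come from the paper.

```python
import random


def bootstrap_training_sets(complexes, n_sets, rng):
    """Draw n_sets resamples with replacement, each as large as the original set."""
    return [[rng.choice(complexes) for _ in complexes] for _ in range(n_sets)]


def train_ensemble(complexes, fit, n_sets, seed=0):
    """Fit one parameter set per bootstrap resample.

    fit is a hypothetical optimizer: it takes a list of complexes and
    returns a list of weights (stand-in for the Z-score approach).
    """
    rng = random.Random(seed)
    return [fit(sample) for sample in bootstrap_training_sets(complexes, n_sets, rng)]


def consensus_score(pose_features, parameter_sets):
    """Average the same linear scoring form over all parameter sets (assumed rule)."""
    scores = [sum(w * x for w, x in zip(params, pose_features))
              for params in parameter_sets]
    return sum(scores) / len(scores)
```

With 50 complexes and `n_sets=100`, `train_ensemble` would return 100 parameter sets, matching the counts quoted in the abstract; how the paper actually combines the 100 scorings is not stated there.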

• Q-Dock: Low-resolution flexible ligand docking with pocket-specific threading restraints.
Brylinski, Michal and Skolnick, Jeffrey
Journal of computational chemistry, 2008, 29(10), 1574-1588
PMID: 18293308     doi: 10.1002/jcc.20917

The rapidly growing number of theoretically predicted protein structures requires robust methods that can utilize low-quality receptor structures as targets for ligand docking. Typically, docking accuracy falls off dramatically when apo or modeled receptors are used in docking experiments. Low-resolution ligand docking techniques have been developed to deal with structural inaccuracies in predicted receptor models. In this spirit, we describe the development and optimization of a knowledge-based potential implemented in Q-Dock, a low-resolution flexible ligand docking approach. Self-docking experiments using crystal structures reveal satisfactory accuracy, comparable with all-atom docking. All-atom models reconstructed from Q-Dock's low-resolution models can be further refined by even a simple all-atom energy minimization. In decoy-docking against distorted receptor models with a root-mean-square deviation, RMSD, from native of approximately 3 A, Q-Dock recovers on average 15-20% more specific contacts and 25-35% more binding residues than all-atom methods. To further improve docking accuracy against low-quality protein models, we propose a pocket-specific protein-ligand interaction potential derived from weakly homologous threading holo-templates. The success rate of Q-Dock employing a pocket-specific potential is 6.3 times higher than that previously reported for the Dolores method, another low-resolution docking approach.

• Consensus scoring with feature selection for structure-based virtual screening
Teramoto, Reiji and Fukunishi, Hiroaki
Journal of chemical information and modeling, 2008, 48(2), 288-295
doi: 10.1021/ci700239t

The evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, and scoring functions play significant roles in it. While consensus scoring (CS) generally improves enrichment by compensating for the deficiencies of each scoring function, the strategy of how individual scoring functions are selected remains a challenging task when few known active compounds are available. To address this problem, we propose feature selection-based consensus scoring (FSCS), which performs supervised feature selection with docked native ligand conformations to select complementary scoring functions. We evaluated the enrichments of five scoring functions (F-Score, D-Score, PMF, G-Score, and ChemScore), FSCS, and RCS (rank-by-rank consensus scoring) for four different target proteins: acetylcholine esterase (AChE), thrombin (thrombin), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPAR gamma). The results indicated that FSCS was able to select the complementary scoring functions and enhance ligand enrichments and that it outperformed RCS and the individual scoring functions for all target proteins. They also indicated that the performances of the single scoring functions were strongly dependent on the target protein. An especially favorable result with implications for practical drug screening is that FSCS performs well even if only one 3D structure of the protein-ligand complex is known. Moreover, we found that one can infer which scoring functions significantly enrich active compounds by using feature selection before actual docking and that the selected scoring functions are complementary.
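
For readers unfamiliar with the rank-by-rank baseline (RCS) mentioned above, here is a minimal sketch. It assumes the usual reading of the term: each compound is ranked separately by every scoring function and the ranks are averaged. Lower scores are assumed to be better, and ties are broken by input order; both choices are assumptions made for illustration.

```python
def ranks(scores, lower_is_better=True):
    """Return the 1-based rank of each compound under one scoring function."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=not lower_is_better)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_by_rank(score_table):
    """Mean rank per compound across all scoring functions.

    score_table maps a scoring-function name to a list of scores,
    one per compound, in the same compound order for every function.
    """
    per_function = [ranks(scores) for scores in score_table.values()]
    n_compounds = len(per_function[0])
    return [sum(r[i] for r in per_function) / len(per_function)
            for i in range(n_compounds)]
```

A feature-selection variant such as FSCS would first choose which entries of `score_table` to keep; that selection step is not shown here.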

• DOVIS: an implementation for high-throughput virtual screening using AutoDock.
Zhang, Shuxing and Kumar, Kamal and Jiang, Xiaohui and Wallqvist, Anders and Reifman, Jaques
BMC Bioinformatics, 2008, 9, 126
PMID: 18304355     doi: 10.1186/1471-2105-9-126

BACKGROUND: Molecular-docking-based virtual screening is an important tool in drug discovery that is used to significantly reduce the number of possible chemical compounds to be investigated. In addition to the selection of a sound docking strategy with appropriate scoring functions, another technical challenge is to in silico screen millions of compounds in a reasonable time. To meet this challenge, it is necessary to use high performance computing (HPC) platforms and techniques. However, the development of an integrated HPC system that makes efficient use of its elements is not trivial.

• Flexible ligand docking to multiple receptor conformations: a practical alternative.
Totrov, Maxim and Abagyan, Ruben
Current opinion in structural biology, 2008, 18(2), 178-184
PMID: 18302984     doi: 10.1016/j.sbi.2008.01.004

State of the art docking algorithms predict an incorrect binding pose for about 50-70% of all ligands when only a single fixed receptor conformation is considered. In many more cases, lack of receptor flexibility results in meaningless ligand binding scores, even when the correct pose is obtained. Incorporating conformational rearrangements of the receptor binding pocket into predictions of both ligand binding pose and binding score is crucial for improving structure-based drug design and virtual ligand screening methodologies. However, direct modeling of protein binding site flexibility remains challenging because of the large conformational space that must be sampled, and difficulties remain in constructing a suitably accurate energy function. Here we show that using multiple fixed receptor conformations, either experimentally determined by crystallography or NMR, or computationally generated, is a practical shortcut that may improve docking calculations. In several cases, such an approach has led to experimentally validated predictions.

• An anchor-dependent molecular docking process for docking small flexible molecules into rigid protein receptors.
Lin, Thy-Hou and Lin, Guan-Liang
Journal of chemical information and modeling, 2008, 48(8), 1638-1655
PMID: 18642894     doi: 10.1021/ci800124g

A molecular docking method designated as ADDock, anchor-dependent molecular docking process for docking small flexible molecules into rigid protein receptors, is presented in this article. ADDock makes the bond connection lists for atoms based on anchors chosen for building molecular structures for docking small flexible molecules or ligands into rigid active sites of protein receptors. ADDock employs an extended version of piecewise linear potential for scoring the docked structures. Since no translational motion for small molecules is implemented during the docking process, ADDock searches the best docking result by systematically changing the anchors chosen, which are usually the single-edge connected nodes or terminal hydrogen atoms of ligands. ADDock takes intact ligand structures generated during the docking process for computing the docked scores; therefore, no energy minimization is required in the evaluation phase of docking. The docking accuracy by ADDock for 92 receptor-ligand complexes docked is 91.3%. All these complexes have been docked by other groups using other docking methods. The receptor-ligand steric interaction energies computed by ADDock for some sets of active and inactive compounds selected and docked into the same receptor active sites are apparently separated. These results show that based on the steric interaction energies computed between the docked structures and receptor active sites, ADDock is able to separate active from inactive compounds for both being docked into the same receptor.

• High quality binding modes in docking ligands to proteins.
Gorelik, Boris and Goldblum, Amiram
Proteins, 2008, 71(3), 1373-1386
PMID: 18058908     doi: 10.1002/prot.21847

Multiple near-optimal conformations of protein-ligand complexes provide a better chance for accurate representation of biomolecular interactions, compared with a single structure. We present ISE-dock, a docking program which is based on the iterative stochastic elimination (ISE) algorithm. ISE eliminates values that consistently lead to the worst results, thus optimizing the search for docking poses. It constructs large sets of such poses with no additional computational cost compared with single poses. ISE-dock is validated using 81 protein-ligand complexes from the PDB and its performance was compared with those of Glide, GOLD, and AutoDock. ISE-dock has a better chance than the other three to find more than 60% top single poses under RMSD

• Bias, reporting, and sharing: computational evaluations of docking methods.
Jain, Ajay N
Journal of computer-aided molecular design, 2008, 22(3-4), 201-212
PMID: 18075713     doi: 10.1007/s10822-007-9151-x

Computational methods for docking ligands to protein binding sites have become ubiquitous in drug discovery. Despite the age of the field, no standards have been established with respect to methodological evaluation of docking accuracy, virtual screening utility, or scoring accuracy. There are critical issues relating to data sharing, data set design and preparation, and statistical reporting that have an impact on the degree to which a report will translate into real-world performance. These issues also have an impact on whether there is a transparent relationship between methodological changes and reported performance improvements. This paper presents detailed examples of pitfalls in each area and makes recommendations as to best practices.

• Protein-ligand docking accounting for receptor side chain and global flexibility in normal modes: evaluation on kinase inhibitor cross docking.
May, Andreas and Zacharias, Martin
Journal of medicinal chemistry, 2008, 51(12), 3499-3506
PMID: 18517186     doi: 10.1021/jm800071v

Efficient treatment of conformational changes during docking of drug-like ligands to receptor molecules is a major computational challenge. A new docking methodology has been developed that includes ligand flexibility and both global backbone flexibility and side chain flexibility of the protein receptor. Whereas side chain flexibility is based on a discrete rotamer approach, global backbone conformational changes are modeled by relaxation in a few precalculated soft collective degrees of freedom of the receptor. The method was applied to docking of several known cyclin dependent kinase 2 inhibitors to the unbound kinase structure and to cross-docking of inhibitors to several bound kinase structures. Significant improvement of ranking and deviation of predicted binding geometries from experiment was obtained compared to docking to a rigid receptor. The inclusion of only the soft collective degrees of freedom during docking resulted in improved docking performance at a very modest increase (doubling) of the computational demand.

• Docking ligands into flexible and solvated macromolecules. 2. Development and application of fitted 1.5 to the virtual screening of potential HCV polymerase inhibitors.
Corbeil, Christopher R and Englebienne, Pablo and Yannopoulos, Constantin G and Chan, Laval and Das, Sanjoy K and Bilimoria, Darius and L'heureux, Lucille and Moitessier, Nicolas
Journal of chemical information and modeling, 2008, 48(4), 902-909
PMID: 18341269     doi: 10.1021/ci700398h

HCV NS5B polymerase is a validated target for the treatment of hepatitis C, known to be one of the most challenging enzymes for docking programs. In order to improve the low accuracy of existing docking methods observed with this challenging enzyme, we have significantly modified and updated Fitted 1.0, a recently reported docking program, into Fitted 1.5. This enhanced version is now applicable to the virtual screening of compound libraries and includes new features such as filters and pharmacophore- or interaction-site-oriented docking. As a first validation, Fitted 1.5 was applied to the testing set previously developed for Fitted 1.0 and extended to include hepatitis C virus (HCV) polymerase inhibitors. This first validation showed an increased accuracy as well as an increase in speed. It also shows that the accuracy toward HCV polymerase is better than previously observed with other programs. Next, application of Fitted 1.5 to the virtual screening of the Maybridge library seeded with known HCV polymerase inhibitors revealed its ability to recover most of these actives in the top 5% of the hit list. As a third validation, further biological assays uncovered HCV polymerase inhibition for selected Maybridge compounds ranked in the top of the hit list.

• PDTD: a web-accessible protein database for drug target identification
Gao, Zhenting and Li, Honglin and Zhang, Hailei and Liu, Xiaofeng and Kang, Ling and Luo, Xiaomin and Zhu, Weiliang and Chen, Kaixian and Wang, Xicheng and Jiang, Hualiang
BMC Bioinformatics, 2008, 9, -
PMID: 18282303     doi: 10.1186/1471-2105-9-104

Background: Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking) http://www.dddc.ac.cn/tarfisdock, which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation. Description: PDTD is a web-accessible protein database for in silico target identification. It currently contains > 1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literatures and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information of > 830 known or potential drug targets, including protein and active sites structures in both PDB and mol2 formats, related diseases, biological functions as well as associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword search function, such as PDB ID, target name, and disease name. Data set generated by PDTD can be viewed with the plug-in of molecular visualization tools and also can be downloaded freely. Remarkably, PDTD is specially designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded in the form of mol2 file with the binding pose of the probe compound and a list of potential binding targets according to their ranking scores. Conclusion: PDTD serves as a comprehensive and unique repository of drug targets. Integrated with TarFisDock, PDTD is a useful resource to identify binding proteins for active compounds or existing drugs. Its potential applications include in silico drug target identification, virtual screening, and the discovery of the secondary effects of an old drug (i.e. new pharmacological usage) or an existing target (i.e. new pharmacological or toxic relevance), thus it may be a valuable platform for the pharmaceutical researchers. PDTD is available online at http://www.dddc.ac.cn/pdtd/.

• MS-DOCK: Accurate multiple conformation generator and rigid docking protocol for multi-step virtual ligand screening
Sauton, Nicolas and Lagorce, David and Villoutreix, Bruno O. and Miteva, Maria A.
BMC Bioinformatics, 2008, 9, -
PMID: 18402678     doi: 10.1186/1471-2105-9-184

Background: The number of protein targets with a known or predicted tri-dimensional structure and of drug-like chemical compounds is growing rapidly and so is the need for new therapeutic compounds or chemical probes. Performing flexible structure-based virtual screening computations on thousands of targets with millions of molecules is intractable to most laboratories nor indeed desirable. Since shape complementarity is of primary importance for most protein-ligand interactions, we have developed a tool/protocol based on rigid-body docking to select compounds that fit well into binding sites. Results: Here we present an efficient multiple conformation rigid-body docking approach, MS-DOCK, which is based on the program DOCK. This approach can be used as the first step of a multi-stage docking/scoring protocol. First, we developed and validated the Multiconf-DOCK tool that generates several conformers per input ligand. Then, each generated conformer (bioactives and 37970 decoys) was docked rigidly using DOCK6 with our optimized protocol into seven different receptor-binding sites. MS-DOCK was able to significantly reduce the size of the initial input library for all seven targets, thereby facilitating subsequent more CPU demanding flexible docking procedures. Conclusion: MS-DOCK can be easily used for the generation of multi-conformer libraries and for shape-based filtering within a multi-step structure-based screening protocol in order to shorten computation times.

• Lead finder: an approach to improve accuracy of protein-ligand docking, binding energy estimation, and virtual screening.
Stroganov, Oleg V and Novikov, Fedor N and Stroylov, Viktor S and Kulkov, Val and Chilov, Ghermes G
Journal of chemical information and modeling, 2008, 48(12), 2371-2385
PMID: 19007114     doi: 10.1021/ci800166p

An innovative molecular docking algorithm and three specialized high accuracy scoring functions are introduced in the Lead Finder docking software. Lead Finder's algorithm for ligand docking combines the classical genetic algorithm with various local optimization procedures and resourceful exploitation of the knowledge generated during docking process. Lead Finder's scoring functions are based on a molecular mechanics functional which explicitly accounts for different types of energy contributions scaled with empiric coefficients to produce three scoring functions tailored for (a) accurate binding energy predictions; (b) correct energy-ranking of docked ligand poses; and (c) correct rank-ordering of active and inactive compounds in virtual screening experiments. The predicted values of the free energy of protein-ligand binding were benchmarked against a set of experimentally measured binding energies for 330 diverse protein-ligand complexes yielding rmsd of 1.50 kcal/mol. The accuracy of ligand docking was assessed on a set of 407 structures, which included almost all published test sets of the following programs: FlexX, Glide SP, Glide XP, Gold, LigandFit, MolDock, and Surflex. rmsd of 2 A or less was observed for 80-96% of the structures in the test sets (80.0% on the Glide XP and FlexX test sets, 96.0% on the Surflex and MolDock test sets). The ability of Lead Finder to distinguish between active and inactive compounds during virtual screening experiments was benchmarked against 34 therapeutically relevant protein targets. Impressive enrichment factors were obtained for almost all of the targets with the average area under receiver operator curve being equal to 0.92.

• Similarity based docking.
Marialke, J and Tietze, S and Apostolakis, Joannis
Journal of chemical information and modeling, 2008, 48(1), 186-196
PMID: 18044949     doi: 10.1021/ci700124r

We have recently introduced GMA, a highly efficient method for flexible molecular alignment. Here we show how this approach can be used to improve docking accuracy and efficiency, in cases where a complex structure of a ligand with the target protein is known. In cases where a known ligand exists, yet the complex structure is unknown it is possible to make use of the advantages offered by this approach, by combining it with standard ligand docking.

• Exploiting ordered waters in molecular docking.
Huang, Niu and Shoichet, Brian K
Journal of medicinal chemistry, 2008, 51(16), 4862-4865
PMID: 18680357     doi: 10.1021/jm8006239

A current weakness in docking is the treatment of water-mediated protein-ligand interactions. We explore switching ordered water molecules "on" and "off" during docking screens of a large library. The method assumes additivity and scales linearly with the number of waters sampled despite the exponential growth in configurations. It is tested for ligand enrichment against 24 targets, exploring up to 256 water configurations. Water inclusion increased enrichment substantially for 12 targets, while most others were largely unaffected.
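
The scaling claim in the abstract above can be illustrated with a toy calculation. The sketch assumes, purely for illustration, that a pose score is a base term plus one independent contribution per ordered water that is switched on, and that lower is better. Under that additivity assumption the best on/off pattern follows from N independent decisions, whereas explicit enumeration visits 2**N patterns (256 for 8 waters). The function names and the form of the score are assumptions, not the authors' implementation.

```python
from itertools import product


def best_configuration_additive(base_score, water_terms):
    """Decide each water independently: N comparisons in total."""
    switches = tuple(term < 0.0 for term in water_terms)
    total = base_score + sum(t for t, on in zip(water_terms, switches) if on)
    return total, switches


def best_configuration_bruteforce(base_score, water_terms):
    """Enumerate every on/off pattern: 2**N evaluations."""
    best = None
    for switches in product((False, True), repeat=len(water_terms)):
        total = base_score + sum(t for t, on in zip(water_terms, switches) if on)
        if best is None or total < best[0]:
            best = (total, switches)
    return best
```

Both functions return the same optimum whenever the score really is additive, which is the condition that makes the linear-cost shortcut valid.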

• Ligand-protein docking with water molecules.
Roberts, Benjamin C and Mancera, Ricardo L
Journal of chemical information and modeling, 2008, 48(2), 397-408
PMID: 18211049     doi: 10.1021/ci700285e

The presence of water molecules plays an important role in the accuracy of ligand-protein docking predictions. Comprehensive docking simulations have been performed on a large set of ligand-protein complexes whose crystal structures contain water molecules in their binding sites. Only those water molecules found in the immediate vicinity of both the ligand and the protein were considered. We have investigated whether prior optimization of the orientation of water molecules in either the presence or absence of the bound ligand has any effect on the accuracy of docking predictions. We have observed a statistically significant overall increase in accuracy when water molecules are included during docking simulations and have found this to be independent of the method of optimization of the orientation of water molecules. These results confirm the importance of including water molecules whenever possible in a ligand-protein docking simulation. Our findings also reveal that prior optimization of the orientation of water molecules, in the absence of any bound ligand, does not have a detrimental effect on the improved accuracy of ligand-protein docking. This is important, given the use of docking simulations to predict the binding modes of new ligands or drug molecules.

• ASEDock-docking based on alpha spheres and excluded volumes
Goto, Junichi and Kataoka, Ryoichi and Muta, Hajime and Hirayama, Noriaki
Journal of chemical information and modeling, 2008, 48(3), 583-590
PMID: 18278891     doi: 10.1021/ci700352q

ASEDock is a novel docking program based on a shape similarity assessment between a concave portion (i.e., concavity) on a protein and the ligand. We have introduced two novel concepts into ASEDock. One is an ASE model, which is defined by the combination of alpha spheres generated at a concavity in a protein and the excluded volumes around the concavity. The other is an ASE score, which evaluates the shape similarity between the ligand and the ASE model. The ASE score selects and refines the initial pose by maximizing the overlap between the alpha spheres and the ligand, and minimizing the overlap between the excluded volume and the ligand. Because the ASE score makes good use of the Gaussian-type function for evaluating and optimizing the overlap between the ligand and the site model, it can pose a ligand onto the docking site relatively faster and more effectively than using potential energy functions. The posing stage through the use of the ASE score is followed by full atomistic energy minimization. Because the posing algorithm of ASEDock is free from any bias except for shape, it is a very robust docking method. A validation study using 59 high-quality X-ray structures of the complexes between drug-like molecules and the target proteins has demonstrated that ASEDock can faithfully reproduce experimentally determined docking modes of various druglike molecules in their target proteins. Almost 80% of the structures were reconstructed within the estimated experimental error. The success rate of ~98% was attained based on the docking criterion of the root-mean-square deviation (RMSD) of non-hydrogen atoms (<

• Molecular docking with multi-objective Particle Swarm Optimization
Janson, Stefan and Merkle, Daniel and Middendorf, Martin
Applied Soft Computing, 2008, 8(1), 666-675
doi: 10.1016/j.asoc.2007.05.005

The molecular docking problem is to find a good position and orientation for docking a small molecule (ligand) to a larger receptor molecule. In the first part of this paper we propose a new algorithm for solving the docking problem. This algorithm - called ClustMPSO - is based on Particle Swarm Optimization (PSO) and follows a multi-objective approach for comparing the quality of solutions. For the energy evaluation the algorithm uses the binding free energy function that is provided by the Autodock 3.05 tool. The experimental results show that ClustMPSO computes a more diverse set of possible docking conformations than the standard Simulated Annealing and Lamarckian Genetic Algorithm that are incorporated into Autodock. Moreover, ClustMPSO is significantly faster and more reliable in finding good solutions. In the second part of this paper a new approach for the prediction of a docking trajectory is proposed. In this approach the ligand is "un-docked" via a controlled random walk that can be biased into a given direction and where only positions are accepted that have an energy level that is below a given threshold.

• Using buriedness to improve discrimination between actives and inactives in docking
O'Boyle, Noel M. and Brewerton, Suzanne C. and Taylor, Robin
Journal of chemical information and modeling, 2008, 48(6), 1269-1278
PMID: 18533645     doi: 10.1021/ci8000452

A continuing problem in protein-ligand docking is the correct relative ranking of active molecules versus inactives. Using the ChemScore scoring function as implemented in the GOLD docking software, we have investigated the effect of scaling hydrogen bond, metal-ligand, and lipophilic interactions based on the buriedness of the interaction. Buriedness was measured using the receptor density, the number of protein heavy atoms within 8.0 angstrom. Terms in the scaling functions were optimized using negative data, represented by docked poses of inactive molecules. The objective function was the mean rank of the scores of the active poses in the Astex Diverse Set (Hartshorn et al. J. Med. Chem., 2007, 50, 726) with respect to the docked poses of 99 inactives. The final four-parameter model gave a substantial improvement in the average rank from 18.6 to 12.5. Similar results were obtained for an independent test set. Receptor density scaling is available as an option in the recent GOLD release.
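
The buriedness measure described above, the receptor density, is simple enough to sketch directly: count the protein heavy atoms within 8.0 angstrom of a point of interest. The sketch below assumes coordinates are given as (x, y, z) tuples in angstrom and that the cutoff is inclusive; the scaling functions that GOLD applies on top of this count are not described in the abstract and are not reproduced.

```python
import math


def receptor_density(point, protein_heavy_atoms, cutoff=8.0):
    """Number of protein heavy atoms within `cutoff` angstrom of `point`.

    point               -- (x, y, z) of the interaction site or ligand atom
    protein_heavy_atoms -- iterable of (x, y, z) coordinates, hydrogens excluded
    """
    return sum(1 for atom in protein_heavy_atoms
               if math.dist(point, atom) <= cutoff)
```

A higher count indicates a more buried location; a point on the protein surface sees fewer heavy atoms inside the same sphere.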

• MedusaScore: an accurate force field-based scoring function for virtual drug screening.
Yin, Shuangye and Biedermannova, Lada and Vondrasek, Jiri and Dokholyan, Nikolay V
Journal of chemical information and modeling, 2008, 48(8), 1656-1662
PMID: 18672869     doi: 10.1021/ci8001167

Virtual screening is becoming an important tool for drug discovery. However, the application of virtual screening has been limited by the lack of accurate scoring functions. Here, we present a novel scoring function, MedusaScore, for evaluating protein-ligand binding. MedusaScore is based on models of physical interactions that include van der Waals, solvation, and hydrogen bonding energies. To ensure the best transferability of the scoring function, we do not use any protein-ligand experimental data for parameter training. We then test the MedusaScore for docking decoy recognition and binding affinity prediction and find superior performance compared to other widely used scoring functions. Statistical analysis indicates that one source of inaccuracy of MedusaScore may arise from the unaccounted entropic loss upon ligand binding, which suggests avenues of approach for further MedusaScore improvement.

## 2007

• Ensemble docking of multiple protein structures: considering protein structural variations in molecular docking.
Huang, Sheng-You and Zou, Xiaoqin
Proteins, 2007, 66(2), 399-421
PMID: 17096427     doi: 10.1002/prot.21214

One approach to incorporate protein flexibility in molecular docking is the use of an ensemble consisting of multiple protein structures. Sequentially docking each ligand into a large number of protein structures is computationally too expensive to allow large-scale database screening. It is challenging to achieve a good balance between docking accuracy and computational efficiency. In this work, we have developed a fast, novel docking algorithm utilizing multiple protein structures, referred to as ensemble docking, to account for protein structural variations. The algorithm can simultaneously dock a ligand into an ensemble of protein structures and automatically select an optimal protein structure that best fits the ligand by optimizing both ligand coordinates and the conformational variable m, where m represents the m-th structure in the protein ensemble. The docking algorithm was validated on 10 protein ensembles containing 105 crystal structures and 87 ligands in terms of binding mode and energy score predictions. A success rate of 93% was obtained with the criterion of root-mean-square deviation <2.5 Å if the top five orientations for each ligand were considered, comparable to that of sequential docking in which scores for individual docking are merged into one list by re-ranking, and significantly better than that of single rigid-receptor docking (75% on average). Similar trends were also observed in binding score predictions and enrichment tests of virtual database screening. The ensemble docking algorithm is computationally efficient, with a computational time comparable to that for docking a ligand into a single protein structure. In contrast, the computational time for the sequential docking method increases linearly with the number of protein structures in the ensemble. The algorithm was further evaluated using a more realistic ensemble in which the corresponding bound protein structures of inhibitors were excluded. The results show that ensemble docking successfully predicts the binding modes of the inhibitors, and discriminates the inhibitors from a set of noninhibitors with similar chemical properties. Although multiple experimental structures were used in the present work, our algorithm can be easily applied to multiple protein conformations generated by computational methods, and helps improve the efficiency of other existing multiple protein structure (MPS)-based methods to accommodate protein flexibility.
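
The success criterion quoted in this abstract (a ligand counts as correctly docked if any of its top five ranked orientations lies within a given RMSD of the crystal pose) can be sketched as follows. This is only an illustration of the metric, not the authors' evaluation code; the function name `top_k_success_rate` and the RMSD values are hypothetical.

```python
from typing import Sequence


def top_k_success_rate(
    ranked_rmsds: Sequence[Sequence[float]], k: int = 5, cutoff: float = 2.5
) -> float:
    """Fraction of ligands with at least one of the k best-ranked poses below the RMSD cutoff.

    ranked_rmsds[i] holds the RMSD (in angstroms) of each pose of ligand i,
    ordered from best-scored to worst-scored.
    """
    if not ranked_rmsds:
        raise ValueError("need at least one ligand")
    hits = sum(1 for poses in ranked_rmsds if any(r < cutoff for r in poses[:k]))
    return hits / len(ranked_rmsds)


# Hypothetical RMSD values for four ligands (not data from the paper).
example = [
    [0.8, 3.1, 4.0],                 # correct pose at rank 1
    [3.5, 2.9, 1.9, 6.0],            # correct pose at rank 3
    [4.2, 5.0, 3.3, 2.8, 2.6, 1.0],  # correct pose only at rank 6
    [7.1, 6.4],                      # no correct pose
]

rate_top5 = top_k_success_rate(example)        # 2 of 4 ligands succeed
rate_top1 = top_k_success_rate(example, k=1)   # 1 of 4 ligands succeeds
```

Reporting the rate at several values of `k` separates sampling failures (no correct pose generated at all) from ranking failures (a correct pose exists but is not scored first).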

• WinDock: structure-based drug discovery on Windows-based PCs.
Hu, Zengjian and Southerland, William
Journal of computational chemistry, 2007, 28(14), 2347-2351
PMID: 17476686     doi: 10.1002/jcc.20756

In recent years, virtual database screening using high-throughput docking (HTD) has emerged as a very important tool and a well-established method for finding new lead compounds in the drug discovery process. With the advent of powerful personal computers (PCs), it is now plausible to perform HTD investigations on these inexpensive PCs. To make HTD more accessible to a broad community, we present here WinDock, an integrated application designed to help researchers perform structure-based drug discovery tasks under a uniform, user friendly graphical interface for Windows-based PCs. WinDock combines existing small molecule searchable three-dimensional (3D) libraries, homology modeling tools, and ligand-protein docking programs in a semi-automatic, interactive manner, which guides the user through the use of each integrated software component. WinDock is coded in C++.

• Solvated interaction energy (SIE) for scoring protein-ligand binding affinities. 1. Exploring the parameter space.
Naïm, Marwen and Bhat, Sathesh and Rankin, Kathryn N and Dennis, Sheldon and Chowdhury, Shafinaz F and Siddiqi, Imran and Drabik, Piotr and Sulea, Traian and Bayly, Christopher I and Jakalian, Araz and Purisima, Enrico O
Journal of chemical information and modeling, 2007, 47(1), 122-133
PMID: 17238257     doi: 10.1021/ci600406v

We present a binding free energy function that consists of force field terms supplemented by solvation terms. We used this function to calibrate the solvation model along with the binding interaction terms in a self-consistent manner. The motivation for this approach was that the solute dielectric-constant dependence of calculated hydration gas-to-water transfer free energies is markedly different from that of binding free energies (J. Comput. Chem. 2003, 24, 954). Hence, we sought to calibrate directly the solvation terms in the context of a binding calculation. The five parameters of the model were systematically scanned to best reproduce the absolute binding free energies for a set of 99 protein-ligand complexes. We obtained a mean unsigned error of 1.29 kcal/mol for the predicted absolute binding affinity in a parameter space that was fairly shallow near the optimum. The lowest errors were obtained with solute dielectric values of Din

• A flexible approach to induced fit docking.
Nabuurs, Sander B and Wagener, Markus and de Vlieg, Jacob
Journal of medicinal chemistry, 2007, 50(26), 6507-6518
PMID: 18031000     doi: 10.1021/jm070593p

We present Fleksy, a new approach to consider both ligand and receptor flexibility in small molecule docking. Pivotal to our method is the use of a receptor ensemble to describe protein flexibility. To construct these ensembles, we use a backbone-dependent rotamer library and implement the concept of interaction sampling. The latter allows the evaluation of different orientations of ambivalent interaction partners. The docking stage consists of an ensemble-based soft-docking experiment using FlexX-Ensemble, followed by an effective flexible receptor-ligand complex optimization using Yasara. Fleksy produces a set of receptor-ligand complexes ranked using a consensus scoring function combining docking scores and force field energies. Averaged over three cross-docking datasets, containing 35 different receptor-ligand complexes in total, Fleksy reproduces the observed binding mode within 2.0 Å for 78% of the complexes. This compares favorably to the rigid receptor FlexX program, which on average reaches a success rate of 44% for these datasets.

• Surflex-Dock 2.1: robust performance from ligand energetic modeling, ring flexibility, and knowledge-based search.
Jain, Ajay N
Journal of computer-aided molecular design, 2007, 21(5), 281-306
PMID: 17387436     doi: 10.1007/s10822-007-9114-2

The Surflex flexible molecular docking method has been generalized and extended in two primary areas related to the search component of docking. First, incorporation of a small-molecule force-field extends the search into Cartesian coordinates constrained by internal ligand energetics. Whereas previous versions searched only the alignment and acyclic torsional space of the ligand, the new approach supports dynamic ring flexibility and all-atom optimization of docked ligand poses. Second, knowledge of well established molecular interactions between ligand fragments and a target protein can be directly exploited to guide the search process. This offers advantages in some cases over the search strategy where ligand alignment is guided solely by a "protomol" (a pre-computed molecular representation of an idealized ligand). Results are presented on both docking accuracy and screening utility using multiple publicly available benchmark data sets that place Surflex's performance in the context of other molecular docking methods. In terms of docking accuracy, Surflex-Dock 2.1 performs as well as the best available methods. In the area of screening utility, Surflex's performance is extremely robust, and it is clearly superior to other methods within the set of cases for which comparative data are available, with roughly double the screening enrichment performance.

• SODOCK: swarm optimization for highly flexible protein-ligand docking.
Chen, Hung-Ming and Liu, Bo-Fu and Huang, Hui-Ling and Hwang, Shiow-Fen and Ho, Shinn-Ying
Journal of computational chemistry, 2007, 28(2), 612-623
PMID: 17186483     doi: 10.1002/jcc.20542

Protein-ligand docking can be formulated as a parameter optimization problem associated with an accurate scoring function, which aims to identify the translation, orientation, and conformation of a docked ligand with the lowest energy. The parameter optimization problem for highly flexible ligands with many rotatable bonds is more difficult than that for less flexible ligands using genetic algorithm (GA)-based approaches, due to the large numbers of parameters and high correlations among these parameters. This investigation presents a novel optimization algorithm SODOCK based on particle swarm optimization (PSO) for solving flexible protein-ligand docking problems. To improve efficiency and robustness of PSO, an efficient local search strategy is incorporated into SODOCK. The implementation of SODOCK adopts the environment and energy function of AutoDock 3.05. Computer simulation results reveal that SODOCK is superior to the Lamarckian genetic algorithm (LGA) of AutoDock, in terms of convergence performance, robustness, and obtained energy, especially for highly flexible ligands. The results also reveal that PSO is more suitable than the conventional GA in dealing with flexible docking problems with high correlations among parameters. This investigation also compared SODOCK with four state-of-the-art docking methods, namely GOLD 1.2, DOCK 4.0, FlexX 1.8, and LGA of AutoDock 3.05. SODOCK obtained the smallest RMSD in 19 of 37 cases. The average of the 37 RMSD values of SODOCK, 2.29 Å, was better than those of the other docking programs, which were all above 3.0 Å.
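
For readers unfamiliar with particle swarm optimization, the sketch below shows a plain global-best PSO minimizing a toy function. It is not SODOCK: the local search step and the AutoDock energy function described in the abstract are omitted, and `pso_minimize`, `toy_energy`, and all parameter values are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_particles: int = 30,
    n_iter: int = 200,
    w: float = 0.729,      # inertia weight (assumed value)
    c1: float = 1.49445,   # pull toward each particle's own best (assumed value)
    c2: float = 1.49445,   # pull toward the swarm's best (assumed value)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Plain global-best PSO; returns the best position found and its value."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_energy(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a docking energy: a quadratic bowl with minimum at (1, -2, 0.5)."""
    target = (1.0, -2.0, 0.5)
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


best_pose, best_energy = pso_minimize(toy_energy, dim=3, bounds=(-5.0, 5.0))
```

In a docking setting the position vector would encode ligand translation, orientation, and torsion angles, and `f` would be the scoring function; the update rule itself stays the same.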

• Docking ligands into flexible and solvated macromolecules. 1. Development and validation of FITTED 1.0.
Corbeil, Christopher R and Englebienne, Pablo and Moitessier, Nicolas
Journal of chemical information and modeling, 2007, 47(2), 435-449
PMID: 17305329     doi: 10.1021/ci6002637

We report the development and validation of a novel suite of programs, FITTED 1.0, for the docking of flexible ligands into flexible proteins. This docking tool is unique in that it can deal with both the flexibility of macromolecules (side chains and main chains) and the presence of bridging water molecules while treating protein/ligand complexes as realistically dynamic systems. This software relies on a genetic algorithm to account for the flexibility of the two molecules as well as the location of bridging water molecules. In addition, FITTED 1.0 features a novel application of a switching function to retain or displace key water molecules from the protein-ligand complexes. Two independent modules, ProCESS and SMART, were developed to set up the proteins and the ligands prior to the docking stage. Validation of the accuracy of the software was achieved via the application of FITTED 1.0 to the docking of inhibitors of HIV-1 protease, thymidine kinase, trypsin, factor Xa, and MMP to their respective proteins.

• ParDOCK: An all atom energy based Monte Carlo docking protocol for protein-ligand complexes
Gupta, A. and Gandhimathi, A. and Sharma, P. and Jayaram, B.
Protein and peptide letters, 2007, 14(7), 632-646
PMID: 17897088

We report here an all-atom energy based Monte Carlo docking procedure tested on a dataset of 226 protein-ligand complexes. Average root mean square deviation (RMSD) from crystal conformation was observed to be approximately 0.53 Å. The correlation coefficient (r²) for the predicted binding free energies calculated using the docked structures against experimental binding affinities was 0.72. The docking protocol is web-enabled as free software at www.scfbio-iitd.res.in/dock.
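
The two figures of merit reported in this abstract, pose RMSD and the squared correlation between predicted and experimental binding free energies, can be computed as in the minimal sketch below. It assumes the docked and crystal poses share the same atom order and receptor frame, so no superposition is applied. The function names and all numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square deviation between two poses with matching atom order (no fitting)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    total = sum(
        (a - b) ** 2 for pa, pb in zip(coords_a, coords_b) for a, b in zip(pa, pb)
    )
    return math.sqrt(total / len(coords_a))


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Square of the Pearson correlation coefficient between x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical three-atom ligand: the docked pose is the crystal pose shifted by (0.3, 0.4, 0.0).
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked = [(x + 0.3, y + 0.4, z) for x, y, z in crystal]
pose_rmsd = rmsd(crystal, docked)  # every atom moved by 0.5, so the RMSD is 0.5

# Hypothetical predicted vs. experimental binding free energies (kcal/mol).
predicted = [-7.0, -8.5, -9.0, -10.5]
experimental = [-6.8, -8.9, -8.7, -11.0]
fit_quality = r_squared(predicted, experimental)
```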

• EADock: docking of small molecules into protein active sites with a multiobjective evolutionary optimization.
Grosdidier, Aurélien and Zoete, Vincent and Michielin, Olivier
Proteins, 2007, 67(4), 1010-1025
PMID: 17380512     doi: 10.1002/prot.21367

In recent years, protein-ligand docking has become a powerful tool for drug development. Although several approaches suitable for high throughput screening are available, there is a need for methods able to identify binding modes with high accuracy. This accuracy is essential to reliably compute the binding free energy of the ligand. Such methods are needed when the binding mode of lead compounds is not determined experimentally but is needed for structure-based lead optimization. We present here a new docking software, called EADock, that aims at this goal. It uses a hybrid evolutionary algorithm with two fitness functions, in combination with a sophisticated management of the diversity. EADock is interfaced with the CHARMM package for energy calculations and coordinate handling. A validation was carried out on 37 crystallized protein-ligand complexes featuring 11 different proteins. The search space was defined as a sphere of 15 Å around the center of mass of the ligand position in the crystal structure, and contrary to other benchmarks, our algorithm was fed with optimized ligand positions up to 10 Å root mean square deviation (RMSD) from the crystal structure, excluding the latter. This validation illustrates the efficiency of our sampling strategy, as correct binding modes, defined by an RMSD to the crystal structure lower than 2 Å, were identified and ranked first for 68% of the complexes. The success rate increases to 78% when considering the five best ranked clusters, and 92% when all clusters present in the last generation are taken into account. Most failures could be explained by the presence of crystal contacts in the experimental structure. Finally, the ability of EADock to accurately predict binding modes on a real application was illustrated by the successful docking of the RGD cyclic pentapeptide on the alphaVbeta3 integrin, starting far away from the binding pocket.

• FLIPDock: docking flexible ligands into flexible receptors.
Zhao, Yong and Sanner, Michel F
Proteins, 2007, 68(3), 726-737
PMID: 17523154     doi: 10.1002/prot.21423

Conformational changes of biological macromolecules when binding with ligands have long been observed and remain a challenge for automated docking methods. Here we present a novel protein-ligand docking software called FLIPDock (Flexible LIgand-Protein Docking) allowing the automated docking of flexible ligand molecules into active sites of flexible receptor molecules. In FLIPDock, conformational spaces of molecules are encoded using a data structure that we have developed recently called the Flexibility Tree (FT). While the FT can represent fully flexible ligands, it was initially designed as a hierarchical and multiresolution data structure for the selective encoding of conformational subspaces of large biological macromolecules. These conformational subspaces can be built to span a range of conformations important for the biological activity of a protein. A variety of motions can be combined, ranging from domains moving as rigid bodies or backbone atoms undergoing normal mode-based deformations, to side chains assuming rotameric conformations. In addition, these conformational subspaces are parameterized by a small number of variables which can be searched during the docking process, thus effectively modeling the conformational changes in a flexible receptor. FLIPDock searches the variables using genetic algorithm-based search techniques and evaluates putative docking complexes with a scoring function based on the AutoDock3.05 force-field. In this paper, we describe the concepts behind FLIPDock and the overall architecture of the program. We demonstrate FLIPDock's ability to solve docking problems in which the assumption of a rigid receptor previously prevented the successful docking of known ligands. In particular, we repeat an earlier cross docking experiment and demonstrate an increased success rate of 93.5%, compared to the original 72% success rate achieved by AutoDock over the 400 cross-docking calculations. We also demonstrate FLIPDock's ability to handle conformational changes involving backbone motion by docking balanol to an adenosine-binding pocket of protein kinase A.

• Alternative to consensus scoring-a new approach toward the qualitative combination of docking algorithms.
Wolf, Antje and Zimmermann, Marc and Hofmann-Apitius, Martin
Journal of chemical information and modeling, 2007, 47(3), 1036-1044
PMID: 17492829     doi: 10.1021/ci6004965

Since the development of the first docking algorithm in the early 1980s a variety of different docking approaches and tools has been created in order to solve the docking problem. Subsequent studies have shown that the docking performance of most tools strongly depends on the considered target. Thus it is hard to choose the best algorithm in the situation at hand. The docking tools FlexX and AutoDock are among the most popular programs for docking flexible ligands into target proteins. Their analysis, comparison, and combination are the topics of this study. In contrast to standard consensus scoring techniques which integrate different scoring algorithms usually only by their rank, we focus on a more general approach. Our new combined docking workflow, AutoxX, unifies the interaction models of AutoDock and FlexX rather than combining the scores afterward which allows interpretability of the results. The performance of FlexX, AutoDock, and the combined algorithm AutoxX was evaluated on the basis of a test set of 204 structures from the Protein Data Bank (PDB). AutoDock and FlexX show a highly diverse redocking accuracy at the different complexes which assures again the usefulness of taking several docking algorithms into account. With the combined docking the number of complexes reproduced below an rmsd of 2.5 Å could be raised by 10. AutoxX had a strong positive effect on several targets. The highest performance increase could be found when redocking 20 protein-ligand complexes of alpha-thrombin, plasmepsin, neuraminidase, and d-xylose isomerase. A decrease was found for gamma-chymotrypsin. The results show that, applied to the right target, AutoxX can improve the docking performance compared to AutoDock and FlexX alone.

• pso@autodock: a fast flexible molecular docking program based on Swarm intelligence.
Namasivayam, Vigneshwaran and Günther, Robert
Chemical biology & drug design, 2007, 70(6), 475-484
PMID: 17986206     doi: 10.1111/j.1747-0285.2007.00588.x

On the quest of novel therapeutics, molecular docking methods have proven to be valuable tools for screening large libraries of compounds determining the interactions of potential drugs with the target proteins. A widely used docking approach is the simulation of the docking process guided by a binding energy function. On the basis of the molecular docking program autodock, we present pso@autodock as a tool for fast flexible molecular docking. Our novel Particle Swarm Optimization (PSO) algorithms varCPSO and varCPSO-ls are suited for rapid docking of highly flexible ligands. Thus, a ligand with 23 rotatable bonds was successfully docked within as few as 100 000 computing steps (rmsd

• A multivariate approach to investigate docking parameters' effects on docking performance
Andersson, C. David and Thysell, Elin and Lindstrom, Anton and Bylesjo, Max and Raubacher, Florian and Linusson, Anna
Journal of chemical information and modeling, 2007, 47(4), 1673-1687
PMID: 17559207     doi: 10.1021/ci6005596

Increasingly powerful docking programs for analyzing and estimating the strength of protein-ligand interactions have been developed in recent decades, and they are now valuable tools in drug discovery. Software used to perform dockings relies on a number of parameters that affect various steps in the docking procedure. However, identifying the best choices of the settings for these parameters is often challenging. Therefore, the settings of the parameters are quite often left at their default values, even though scientists with long experience with a specific docking tool know that modifying certain parameters can improve the results. In the study presented here, we have used statistical experimental design and subsequent regression based on root-mean-square deviation values using partial least-square projections to latent structures (PLS) to scrutinize the effects of different parameters on the docking performance of two software packages: FRED and GOLD. Protein-ligand complexes with a high level of ligand diversity were selected from the PDBbind database for the study, using principal component analysis based on 1D and 2D descriptors, and space-filling design. The PLS models showed quantitative relationships between the docking parameters and the ability of the programs to reproduce the ligand crystallographic conformation. The PLS models also revealed which of the parameters and what parameter settings were important for the docking performance of the two programs. Furthermore, the variation in docking results obtained with specific parameter settings for different protein-ligand complexes in the diverse set examined indicates that there is great potential for optimizing the parameter settings for selected sets of proteins.

• eHiTS: A new fast, exhaustive flexible ligand docking system
Zsoldos, Zsolt and Reid, Darryl and Simon, Aniko and Sadjad, Sayyed Bashir and Johnson, A. Peter
Journal of molecular graphics & modelling, 2007, 26(1), 198-212
PMID: 16860582     doi: 10.1016/j.jmgm.2006.06.002

The flexible ligand docking problem is divided into two subproblems: pose/conformation search and scoring function. For successful virtual screening the search algorithm must be fast and able to find the optimal binding pose and conformation of the ligand. Statistical analysis of experimental data of bound ligand conformations is presented with conclusions about the sampling requirements for docking algorithms. eHiTS is an exhaustive flexible-docking method that systematically covers the part of the conformational and positional search space that avoids severe steric clashes, producing highly accurate docking poses at a speed practical for virtual high-throughput screening. The customizable scoring function of eHiTS combines novel terms (based on local surface point contact evaluation) with traditional empirical and statistical approaches. Validation results of eHiTS are presented and compared to three other docking software on a set of 91 PDB structures that are common to the validation sets published for the other programs.

• Optimizing fragment and scaffold docking by use of molecular interaction fingerprints.
Marcou, Gilles and Rognan, Didier
Journal of chemical information and modeling, 2007, 47(1), 195-207
PMID: 17238265     doi: 10.1021/ci600342e

Protein-ligand interaction fingerprints have been used to postprocess docking poses of three ligand data sets: a set of 40 low-molecular-weight compounds from the Protein Data Bank, a collection of 40 scaffolds from pharmaceutically relevant protein ligands, and a database of 19 scaffolds extracted from true cdk2 inhibitors seeded in 2230 scaffold decoys. Four popular docking tools (FlexX, Glide, Gold, and Surflex) were used to generate poses for ligands of the three data sets. In all cases, scoring by the similarity of interaction fingerprints to a given reference was statistically superior to conventional scoring functions in posing low-molecular-weight fragments, predicting protein-bound scaffold coordinates according to the known binding mode of related ligands, and screening a scaffold library to enrich a hit list in true cdk2-targeted scaffolds.
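
Rescoring poses by fingerprint similarity to a reference binding mode can be sketched as below. The abstract does not say which similarity measure was used; the Tanimoto coefficient on binary fingerprints is assumed here as a common choice, and the fingerprints, pose names, and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def rank_poses_by_fingerprint(
    reference: Sequence[int], poses: Dict[str, Sequence[int]]
) -> List[str]:
    """Pose names sorted from most to least similar to the reference fingerprint."""
    return sorted(poses, key=lambda name: tanimoto(reference, poses[name]), reverse=True)


# Each bit marks one hypothetical interaction (e.g. an H-bond to a given residue).
reference = [1, 1, 0, 1, 0, 0, 1, 0]
poses = {
    "pose_a": [1, 0, 0, 1, 0, 1, 0, 0],  # shares 2 of 5 set bits with the reference
    "pose_b": [1, 1, 0, 1, 0, 0, 0, 0],  # shares 3 of 4 set bits with the reference
    "pose_c": [0, 0, 1, 0, 1, 0, 0, 1],  # shares no interaction with the reference
}

ranking = rank_poses_by_fingerprint(reference, poses)
```

Unlike an energy-based score, this ranking depends only on which contacts a pose reproduces, which is why it needs a known binding mode of a related ligand as the reference.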

• Supervised scoring models with docked ligand conformations for structure-based virtual screening.
Teramoto, Reiji and Fukunishi, Hiroaki
Journal of chemical information and modeling, 2007, 47(5), 1858-1867
PMID: 17685604     doi: 10.1021/ci700116z

Protein-ligand docking programs have been used to efficiently discover novel ligands for target proteins from large-scale compound databases. However, better scoring methods are needed. Generally, scoring functions are optimized by means of various techniques that affect their fitness for reproducing X-ray structures and protein-ligand binding affinities. However, these scoring functions do not always work well for all target proteins. A scoring function should be optimized for a target protein to enhance enrichment for structure-based virtual screening. To address this problem, we propose the supervised scoring model (SSM), which takes into account the protein-ligand binding process using docked ligand conformations with supervised learning for optimizing scoring functions against a target protein. SSM employs a rough linear correlation between binding free energy and the root mean square deviation of a native ligand for predicting binding energy. We applied SSM to the FlexX scoring function, that is, F-Score, with five different target proteins: thymidine kinase (TK), estrogen receptor (ER), acetylcholine esterase (AChE), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPARgamma). For these five proteins, SSM always enhanced enrichment better than F-Score, exhibiting superior performance that was particularly remarkable for TK, AChE, and PPARgamma. We also demonstrated that SSM is especially good at enhancing enrichments of the top ranks of screened compounds, which is useful in practical drug screening.

• GlamDock: development and validation of a new docking tool on several thousand protein-ligand complexes.
Tietze, Simon and Apostolakis, Joannis
Journal of chemical information and modeling, 2007, 47(4), 1657-1672
PMID: 17585857     doi: 10.1021/ci7001236

In this study, we present GlamDock, a new docking tool for flexible ligand docking. GlamDock (version 1.0) is based on a simple Monte Carlo with minimization procedure. The main features of the method are the energy function, which is a continuously differentiable empirical potential, and the definition of the search space, which combines internal coordinates for the conformation of the ligand, with a mapping-based description of the rigid body translation and rotation. First, we validate GlamDock on a standard benchmark, a set of 100 protein-ligand complexes, which allows comparative evaluation to existing docking tools. The results on this benchmark show that GlamDock is at least comparable in efficiency and accuracy to the best existing docking tools. The main focus of this work is the validation on the scPDB database of protein-ligand complexes. The size of this data set allows a thorough analysis of the dependencies of docking accuracy on features of the protein-ligand system. In particular, it allows a two-dimensional analysis of the results, which identifies a number of interesting dependencies that are generally lost or even misinterpreted in the one-dimensional approach. The overall result that GlamDock correctly predicts the complex structure in practically half of the cases in the scPDB is important not only for screening ligands against a particular protein but even more so for inverse screening, that is, the identification of the correct targets for a particular ligand.

• A semiempirical free energy force field with charge-based desolvation.
Huey, Ruth and Morris, Garrett M and Olson, Arthur J and Goodsell, David S
Journal of computational chemistry, 2007, 28(6), 1145-1152
PMID: 17274016     doi: 10.1002/jcc.20634

The authors describe the development and testing of a semiempirical free energy force field for use in AutoDock4 and similar grid-based docking methods. The force field is based on a comprehensive thermodynamic model that allows incorporation of intramolecular energies into the predicted free energy of binding. It also incorporates a charge-based method for evaluation of desolvation designed to use a typical set of atom types. The method has been calibrated on a set of 188 diverse protein-ligand complexes of known structure and binding energy, and tested on a set of 100 complexes of ligands with retroviral proteases. The force field shows improvement in redocking simulations over the previous AutoDock3 force field.

## 2006

• M-Score: A Knowledge-Based Potential Scoring Function Accounting for Protein Atom Mobility
Yang, Chao-Yie and Wang, Renxiao and Wang, Shaomeng
Journal of medicinal chemistry, 2006, 49(20), 5903-5911
doi: 10.1021/jm050043w

• Information-driven protein-DNA docking using HADDOCK: it is a matter of flexibility.
van Dijk, Marc and van Dijk, Aalt D J and Hsu, Victor and Boelens, Rolf and Bonvin, Alexandre M J J
Nucleic acids research, 2006, 34(11), 3317-3325
PMID: 16820531     doi: 10.1093/nar/gkl412

Intrinsic flexibility of DNA has hampered the development of efficient protein-DNA docking methods. In this study we extend HADDOCK (High Ambiguity Driven DOCKing) [C. Dominguez, R. Boelens and A. M. J. J. Bonvin (2003) J. Am. Chem. Soc. 125, 1731-1737] to explicitly deal with DNA flexibility. HADDOCK uses non-structural experimental data to drive the docking during a rigid-body energy minimization, and semi-flexible and water refinement stages. The latter allow for flexibility of all DNA nucleotides and the residues of the protein at the predicted interface. We evaluated our approach on the monomeric repressor-DNA complexes formed by bacteriophage 434 Cro, the Escherichia coli Lac headpiece and bacteriophage P22 Arc. Starting from unbound proteins and canonical B-DNA we correctly predict the correct spatial disposition of the complexes and the specific conformation of the DNA in the published complexes. This information is subsequently used to generate a library of pre-bent and twisted DNA structures that served as input for a second docking round. The resulting top ranking solutions exhibit high similarity to the published complexes in terms of root mean square deviations, intermolecular contacts and DNA conformation. Our two-stage docking method is thus able to successfully predict protein-DNA complexes from unbound constituents using non-structural experimental data to drive the docking.

• A method for induced-fit docking, scoring, and ranking of flexible ligands. Application to peptidic and pseudopeptidic beta-secretase (BACE 1) inhibitors.
Moitessier, Nicolas and Therrien, Eric and Hanessian, Stephen
Journal of medicinal chemistry, 2006, 49(20), 5885-5894
PMID: 17004704     doi: 10.1021/jm050138y

Inhibition of beta-secretase (BACE 1) has recently been investigated as a promising therapeutic approach in the treatment of Alzheimer's disease, and a growing number of BACE 1 inhibitors and crystal structures of BACE 1/inhibitors complexes have been reported. We report herein a predictive computational method and its application to potential BACE 1 inhibitors. Using a training set of 50 known highly flexible inhibitors, we developed a docking method that accounts for the flexibility of both the protein and the inhibitors. Protein flexibility is accounted for using a specifically designed genetic algorithm. We next developed a scoring function consisting of force field evaluation of the inhibitor/protein interactions and two additional terms for hydrogen bonding and entropy change upon binding. Discarding three outliers from the training set, our protocol was found to perform well with an rmsd of 1.19 kcal/mol. Evaluation of the predictive power was next carried out by virtual screening of 80 synthetic compounds. The significant enrichment at the top of the ranking list in active compounds demonstrated the ability of the docking and scoring protocol to rank the compounds relative to their activities.

• Parameter estimation for scoring protein-ligand interactions using negative training data.
Pham, Tuan A and Jain, Ajay N
Journal of medicinal chemistry, 2006, 49(20), 5856-5868
PMID: 17004701     doi: 10.1021/jm050040j

Surflex-Dock employs an empirically derived scoring function to rank putative protein-ligand interactions by flexible docking of small molecules to proteins of known structure. The scoring function employed by Surflex was developed purely on the basis of positive data, comprising noncovalent protein-ligand complexes with known binding affinities. Consequently, scoring function terms for improper interactions received little weight in parameter estimation, and an ad hoc scheme for avoiding protein-ligand interpenetration was adopted. We present a generalized method for incorporating synthetically generated negative training data, which allows for rigorous estimation of all scoring function parameters. Geometric docking accuracy remained excellent under the new parametrization. In addition, a test of screening utility covering a diverse set of 29 proteins and corresponding ligand sets showed improved performance. Maximal enrichment of true ligands over nonligands exceeded 20-fold in over 80% of cases, with enrichment of greater than 100-fold in over 50% of cases.
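
Screening enrichment of the kind reported here is often summarized with an enrichment factor: the proportion of true ligands in the top-ranked fraction of the list divided by their proportion in the whole library. The sketch below uses that common definition as an assumption; the "maximal enrichment" measure in the paper may be defined differently, and the ranked list is made up.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], top_fraction: float) -> float:
    """Enrichment of actives in the top fraction of a score-ranked list.

    ranked_is_active[i] is True if the compound at rank i (best score first) is a true ligand.
    """
    n = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    n_top = max(1, int(round(n * top_fraction)))
    hits = sum(ranked_is_active[:n_top])
    # (hits / n_top) / (n_actives / n), rearranged to avoid rounding noise.
    return hits * n / (n_top * n_actives)


# Hypothetical screen: 100 compounds, 5 actives found at ranks 1, 2, 4, 30 and 70.
active_ranks = {1, 2, 4, 30, 70}
ranked = [(rank in active_ranks) for rank in range(1, 101)]

ef_5_percent = enrichment_factor(ranked, 0.05)   # 3 of the top 5 are active: 12-fold
ef_10_percent = enrichment_factor(ranked, 0.10)  # 3 of the top 10 are active: 6-fold
```

A value of 1 means the ranking does no better than random selection, so the full list always gives `enrichment_factor(ranked, 1.0) == 1.0`.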

• PSI-DOCK: towards highly efficient and accurate flexible ligand docking.
Pei, Jianfeng and Wang, Qi and Liu, Zhenming and Li, Qingliang and Yang, Kun and Lai, Luhua
Proteins, 2006, 62(4), 934-946
PMID: 16395666     doi: 10.1002/prot.20790

We have developed a new docking method, Pose-Sensitive Inclined (PSI)-DOCK, for flexible ligand docking. An improved SCORE function has been developed and used in PSI-DOCK for binding free energy evaluation. The improved SCORE function was able to reproduce the absolute binding free energies of a training set of 200 protein-ligand complexes with a correlation coefficient of 0.788 and a standard error of 8.13 kJ/mol. For ligand binding pose exploration, a unique searching strategy was designed in PSI-DOCK. In the first step, a tabu-enhanced genetic algorithm with a rapid shape-complementary scoring function is used to roughly explore and store potential binding poses of the ligand. Then, these predicted binding poses are optimized and compete against each other by using a genetic algorithm with the accurate SCORE function to determine the binding pose with the lowest docking energy. The PSI-DOCK 1.0 program is highly efficient in identifying the experimental binding pose. For a test dataset of 194 complexes, PSI-DOCK 1.0 achieved a 67% success rate (RMSD < 2.0 Å) for only one run and a 74% success rate for 10 runs. PSI-DOCK can also predict the docking binding free energy with high accuracy. For a test set of 64 complexes, the correlation between the experimentally observed binding free energies and the docking binding free energies for 64 complexes is r

• Protein Alpha Shape (PAS) Dock: a new gaussian-based score function suitable for docking in homology modelled protein structures.
Tøndel, Kristin and Anderssen, Endre and Drabløs, Finn
Journal of computer-aided molecular design, 2006, 20(3), 131-144
PMID: 16652207     doi: 10.1007/s10822-006-9041-7

Protein Alpha Shape (PAS) Dock is a new empirical score function suitable for virtual library screening using homology modelled protein structures. Here, the score function is used in combination with the geometry search method Tabu search. A description of the protein binding site is generated using Gaussian property fields, as in Protein Alpha Shape Similarity Analysis (PASSA). Gaussian property fields are also used to describe the ligand properties. The overlap between the receptor and ligand hydrophilicity and lipophilicity fields is maximised, while minimising steric clashes. Gaussian functions introduce a smoothing of the property fields. This makes the score function robust against small structural variations, and therefore suitable for use with homology models. This also makes it less critical to include protein flexibility in the docking calculations. We use a fast and simplified version of the score function in the geometry search, while a more detailed version is used for the final prediction of the binding free energies. This use of a two-level scoring makes PAS-Dock computationally efficient, and well suited for virtual screening. The PAS-Dock score function is trained on 218 X-ray structures of protein-ligand complexes with experimental binding affinities. The performance of PAS-Dock is compared to two other docking methods, AutoDock and MOE-Dock, with respect to both accuracy and computational efficiency. According to this study, PAS-Dock is more computationally efficient than both AutoDock and MOE-Dock, and gives a better prediction of the free energies of binding. PAS-Dock is also more robust against structural variations than AutoDock.

• kinDOCK: a tool for comparative docking of protein kinase ligands.
Martin, Laetitia and Catherinot, Vincent and Labesse, Gilles
Nucleic acids research, 2006, 34(Web Server issue), W325-9
PMID: 16845019     doi: 10.1093/nar/gkl211

KinDOCK is a new web server for the analysis of ATP-binding sites of protein kinases. This characterization is based on the docking of ligands already co-crystallized with other protein kinases. A structural library of protein kinase-ligand complexes has been extracted from the Protein Data Bank (PDB). This library can provide both potential ligands and their putative binding orientation for a given protein kinase. After protein-protein structural superposition, the ligands are transferred from the template complexes to the target protein kinase. The resulting complexes are evaluated using the program SCORE to compute a theoretical affinity. They can be dynamically visualized to allow a rapid mapping of important steric clashes and potential substitutions relevant for specificity and affinity. These characteristics allow a quick characterization of protein kinase active sites including conformation changes potentially required to accommodate particular ligands. Additionally, promising pharmacophores can be identified in the focussed library. These features will help to rationalize or optimize virtual screening (VS) on larger chemical compound libraries. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/kindock/.

• Effective handling of induced-fit motion in flexible docking.
Mizutani, Miho Yamada and Takamatsu, Yoshihiro and Ichinose, Tazuko and Nakamura, Kensuke and Itai, Akiko
Proteins, 2006, 63(4), 878-891
PMID: 16532451     doi: 10.1002/prot.20931

For structure-based drug design, where various ligand structures need to be docked to a target protein structure, a docking method that can handle conformational flexibility of not only the ligand, but also the protein, is indispensable. We have developed a simple and effective approach for dealing with the local induced-fit motion of the target protein, and implemented it in our docking tool, ADAM. Our approach efficiently combines the following two strategies: a vdW-offset grid in which the protein cavity is enlarged uniformly, and structure optimization allowing the motion of ligand and protein atoms. To examine the effectiveness of our approach, we performed docking validation studies, including redocking in 18 test cases and foreign-docking, in which various ligands from foreign crystal structures of complexes are docked into a target protein structure, in 22 cases (on five target proteins). With the original ADAM, the correct docking modes (RMSD < 2.0 Å) were not present among the top 20 models in one case of redocking and four cases of foreign-docking. When the handling of induced-fit motion was implemented, the correct solutions were acquired in all 40 test cases. In foreign-docking on thymidine kinase, the correct docking modes were obtained as the top-ranked solutions for all 10 test ligands by our combinatorial approach, and this appears to be the best result ever reported with any docking tool. The results of docking validation have thus confirmed the effectiveness of our approach, which can provide reliable docking models even in the case of foreign-docking, where conformational change of the target protein cannot be ignored. We expect that this approach will contribute substantially to actual drug design, including virtual screening.

• TarFisDock: a web server for identifying drug targets with docking approach
Li, Honglin and Gao, Zhenting and Kang, Ling and Zhang, Hailei and Yang, Kun and Yu, Kunqian and Luo, Xiaomin and Zhu, Weiliang and Chen, Kaixian and Shen, Jianhua and Wang, Xicheng and Jiang, Hualiang
Nucleic acids research, 2006, 34(Web Server issue), W219-W224
PMID: 16844997     doi: 10.1093/nar/gkl114

TarFisDock is a web-based tool for automating the procedure of searching for small molecule-protein interactions over a large repertoire of protein structures. It offers PDTD (potential drug target database), a target database containing 698 protein structures covering 15 therapeutic areas and a reverse ligand protein docking program. In contrast to conventional ligand-protein docking, reverse ligand-protein docking aims to seek potential protein targets by screening an appropriate protein database. The input file of this web server is the small molecule to be tested, in standard mol2 format; TarFisDock then searches for possible binding proteins for the given small molecule by use of a docking approach. The ligand-protein interaction energy terms of the program DOCK are adopted for ranking the proteins. To test the reliability of the TarFisDock server, we searched the PDTD for putative binding proteins for vitamin E and 4H-tamoxifen. The top 2 and 10% candidates of vitamin E binding proteins identified by TarFisDock respectively cover 30 and 50% of reported targets verified or implicated by experiments; and 30 and 50% of experimentally confirmed targets for 4H-tamoxifen appear amongst the top 2 and 5% of the TarFisDock predicted candidates, respectively. Therefore, TarFisDock may be a useful tool for target identification, mechanism study of old drugs and probes discovered from natural products. TarFisDock and PDTD are available at http://www.dddc.ac.cn/tarfisdock/.

• ROSETTALIGAND: protein-small molecule docking with full side-chain flexibility.
Meiler, Jens and Baker, David
Proteins, 2006, 65(3), 538-548
PMID: 16972285     doi: 10.1002/prot.21086

Protein-small molecule docking algorithms provide a means to model the structure of protein-small molecule complexes in structural detail and play an important role in drug development. In recent years the necessity of simulating protein side-chain flexibility for an accurate prediction of the protein-small molecule interfaces has become apparent, and an increasing number of docking algorithms probe different approaches to include protein flexibility. Here we describe a new method for docking small molecules into protein binding sites employing a Monte Carlo minimization procedure in which the rigid body position and orientation of the small molecule and the protein side-chain conformations are optimized simultaneously. The energy function comprises van der Waals (VDW) interactions, an implicit solvation model, an explicit orientation hydrogen bonding potential, and an electrostatics model. In an evaluation of the scoring function the computed energy correlated with experimental small molecule binding energy with a correlation coefficient of 0.63 across a diverse set of 229 protein-small molecule complexes. The docking method produced lowest energy models with a root mean square deviation (RMSD) smaller than 2 Å in 71 out of 100 protein-small molecule crystal structure complexes (self-docking). In cross-docking calculations in which both protein side-chain and small molecule internal degrees of freedom were varied the lowest energy predictions had RMSDs less than 2 Å in 14 of 20 test cases.

• Multiple target screening method for robust and accurate in silico ligand screening.
Fukunishi, Yoshifumi and Mikami, Yoshiaki and Kubota, Satoru and Nakamura, Haruki
Journal of molecular graphics & modelling, 2006, 25(1), 61-70
PMID: 16376595     doi: 10.1016/j.jmgm.2005.11.006

We developed a new in silico multiple target screening (MTS) method, based on a multi-receptor versus multi-ligand docking affinity matrix, and examined its robustness against changes in the scoring system. According to this method, compounds in a database are docked to multiple proteins. The compounds that are likely to bind to the target protein, rather than to the other proteins, are selected as the members of the candidate-hit compound group. Then, the compounds in the group are sorted into descending order using the docking score: the first (n-th) compound is expected to be the most (n-th) probable hit compound. This method was applied to the analysis of a set of 142 receptors and 142 compounds using a receptor-ligand docking program, Sievgene [Y. Fukunishi, Y. Mikami, H. Nakamura, Similarities among receptor pockets and among compounds: analysis and application to in silico ligand screening, J. Mol. Graphics Modelling, 24 (2005) 34-45], and the results demonstrated that this method achieves a high hit ratio compared to uniform sampling. We prepared two new scores: the DeltaG score, designed to reproduce the protein-ligand binding free energy, and the hit-optimized score, designed to maximize the hit ratio of in silico screening. Using the Sievgene docking score, DeltaG score and hit-optimized score, the MTS method is more robust than the multiple active-site correction scoring method [G.P.A. Vigers, J.P. Rizzi, Multiple active site corrections for docking and virtual screening, J. Med. Chem., 47 (2004) 80-89].

• Automatic and efficient decomposition of two-dimensional structures of small molecules for fragment-based high-throughput docking.
Kolb, Peter and Caflisch, Amedeo
Journal of medicinal chemistry, 2006, 49(25), 7384-7392
PMID: 17149868     doi: 10.1021/jm060838i

The computer program DAIM (Decomposition and Identification of Molecules) has been developed to automatically break up compounds in small-molecule libraries for fragment-based docking as well as database analysis. Here, DAIM is evaluated on 130 ligands derived from known crystal structures of ligand-protein complexes. The decomposition and a new fingerprint-based identification technique are used to select anchor fragments for docking. The docking results show that the DAIM selection is superior to size-based or random selection of fragments. To evaluate the usefulness for analyzing the fragment composition of a large library, DAIM is applied to a collection of about 1.85 million commercially available compounds. Interestingly, it is found that the set of most frequent cyclic and acyclic fragments originating from the decomposition of the 1.85 million molecules shows a large overlap with the most frequent fragments in a library of 5120 known drugs. DAIM has been successfully used in the in silico screening for inhibitors of beta-secretase and EphB4 kinase by fragment-based high-throughput docking. Possible future applications for de novo ligand design are briefly discussed.

• Critical assessment of the automated AutoDock as a new docking tool for virtual screening.
Park, Hwangseo and Lee, Jinuk and Lee, Sangyoub
Proteins, 2006, 65(3), 549-554
PMID: 16988956     doi: 10.1002/prot.21183

A major problem in virtual screening concerns the accuracy of the binding free energy between a target protein and a putative ligand. Here we report an example in which the AutoDock scoring function outperforms other popular docking programs in virtual screening. The original AutoDock program is itself too inefficient for virtual screening because the grids of interaction energy have to be calculated for each putative ligand in a chemical database. However, the automation of the AutoDock program with the potential grids defined in common for all putative ligands leads to a more than twofold increase in the speed of virtual database screening. The utility of the automated AutoDock in virtual screening is further demonstrated by identifying the actual inhibitors of various target enzymes in chemical databases with accuracy higher than the other docking tools including DOCK and FlexX. These results exemplify the usefulness of the automated AutoDock as a promising new tool in structure-based virtual screening.

• eHiTS: An innovative approach to the docking and scoring function problems
Zsoldos, Zsolt and Reid, Darryl and Simon, Aniko and Sadjad, Bashir S and Johnson, A. Peter
Current Protein & Peptide Science, 2006, 7(5), 421-435
PMID: 17073694

Virtual Ligand Screening (VLS) has become an integral part of the drug design process for many pharmaceutical companies. In protein structure based VLS the aim is to find a ligand that has a high binding affinity to the target receptor whose 3D structure is known. This review will describe the docking tool eHiTS. eHiTS is an exhaustive and systematic docking tool which contains many automated features that simplify the drug design workflow. A description of the unique docking algorithm and novel approach to scoring used within eHiTS is presented. In addition a validation study is presented that demonstrates the accuracy and wide applicability of eHiTS in re-docking bound ligands into their receptors.

• MolDock: A new technique for high-accuracy molecular docking
Thomsen, R and Christensen, MH
Journal of medicinal chemistry, 2006, 49(11), 3315-3321
PMID: 16722650     doi: 10.1021/jm051197e

In this article we introduce a molecular docking algorithm called MolDock. MolDock is based on a new heuristic search algorithm that combines differential evolution with a cavity prediction algorithm. The docking scoring function of MolDock is an extension of the piecewise linear potential (PLP) including new hydrogen bonding and electrostatic terms. To further improve docking accuracy, a re-ranking scoring function is introduced, which identifies the most promising docking solution from the solutions obtained by the docking algorithm. The docking accuracy of MolDock has been evaluated by docking flexible ligands to 77 protein targets. MolDock was able to identify the correct binding mode of 87% of the complexes. In comparison, the accuracy of Glide and Surflex is 82% and 75%, respectively. FlexX obtained 58% and GOLD 78% on subsets containing 76 and 55 cases, respectively.
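
For readers unfamiliar with differential evolution, the search heuristic named in this abstract, here is a minimal generic sketch (the classic rand/1/bin scheme) minimizing a toy function. It is not MolDock's variant: the cavity prediction step, the PLP-based score, and the pose encoding are all omitted, and `toy_energy`, the parameter names, and the default values are assumptions made for illustration.

```python
import random


def toy_energy(x):
    """Stand-in objective (sum of squares); a real docking score would go here."""
    return sum(v * v for v in x)


def differential_evolution(objective, bounds, pop_size=30, generations=200,
                           f_weight=0.5, crossover=0.9, seed=0):
    """Minimize objective over box bounds with rand/1/bin differential evolution."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    fitness = [objective(x) for x in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members, none of them the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    value = population[a][d] + f_weight * (
                        population[b][d] - population[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clamp to the search box
                else:
                    value = population[i][d]
                trial.append(value)
            trial_fitness = objective(trial)
            # Greedy selection: the trial replaces its parent only if not worse.
            if trial_fitness <= fitness[i]:
                population[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=lambda i: fitness[i])
    return population[best], fitness[best]
```

In a docking setting each vector would encode ligand position, orientation, and torsion angles, and the objective would be the docking score.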

• Fully automated flexible docking of ligands into flexible synthetic receptors using forward and inverse docking strategies.
Kämper, Andreas and Apostolakis, Joannis and Rarey, Matthias and Marian, Christel M and Lengauer, Thomas
Journal of chemical information and modeling, 2006, 46(2), 903-911
PMID: 16563022     doi: 10.1021/ci050467z

The prediction of the structure of host-guest complexes is one of the most challenging problems in supramolecular chemistry. Usual procedures for docking of ligands into receptors do not take full conformational freedom of the host molecule into account. We describe and apply a new docking approach which performs a conformational sampling of the host and then sequentially docks the ligand into all receptor conformers using the incremental construction technique of the FlexX software platform. The applicability of this approach is validated on a set of host-guest complexes with known crystal structure. Moreover, we demonstrate that due to the interchangeability of the roles of host and guest, the docking process can be inverted. In this inverse docking mode, the receptor molecule is docked around its ligand. For all investigated test cases, the predicted structures are in good agreement with the experiment for both normal (forward) and inverse docking. Since the ligand is often smaller than the receptor and, thus, its conformational space is more restricted, the inverse docking approach leads in most cases to considerable speed-up. By having the choice between two alternative docking directions, the application range of the method is significantly extended. Finally, an important result of this study is the suitability of the simple energy function used here for structure prediction of complexes in organic media.

• Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes.
Friesner, Richard A and Murphy, Robert B and Repasky, Matthew P and Frye, Leah L and Greenwood, Jeremy R and Halgren, Thomas A and Sanschagrin, Paul C and Mainz, Daniel T
Journal of medicinal chemistry, 2006, 49(21), 6177-6196
PMID: 17034125     doi: 10.1021/jm051256o

A novel scoring function to estimate protein-ligand binding affinities has been developed and implemented as the Glide 4.0 XP scoring function and docking protocol. In addition to unique water desolvation energy terms, protein-ligand structural motifs leading to enhanced binding affinity are included: (1) hydrophobic enclosure where groups of lipophilic ligand atoms are enclosed on opposite faces by lipophilic protein atoms, (2) neutral-neutral single or correlated hydrogen bonds in a hydrophobically enclosed environment, and (3) five categories of charged-charged hydrogen bonds. The XP scoring function and docking protocol have been developed to reproduce experimental binding affinities for a set of 198 complexes (RMSDs of 2.26 and 1.73 kcal/mol over all and well-docked ligands, respectively) and to yield quality enrichments for a set of fifteen screens of pharmaceutical importance. Enrichment results demonstrate the importance of the novel XP molecular recognition and water scoring in separating active and inactive ligands and avoiding false positives.

• Development and validation of a modular, extensible docking program: DOCK 5.
Moustakas, Demetri T and Lang, P Therese and Pegg, Scott and Pettersen, Eric and Kuntz, Irwin D and Brooijmans, Natasja and Rizzo, Robert C
Journal of computer-aided molecular design, 2006, 20(10-11), 601-619
PMID: 17149653     doi: 10.1007/s10822-006-9060-4

We report on the development and validation of a new version of DOCK. The algorithm has been rewritten in a modular format, which allows for easy implementation of new scoring functions, sampling methods and analysis tools. We validated the sampling algorithm with a test set of 114 protein-ligand complexes. Using an optimized parameter set, we are able to reproduce the crystal ligand pose to within 2 Å of the crystal structure for 79% of the test cases using our rigid ligand docking algorithm with an average run time of 1 min per complex and for 72% of the test cases using our flexible ligand docking algorithm with an average run time of 5 min per complex. Finally, we perform an analysis of the docking failures in the test set and determine that the sampling algorithm is generally sufficient for the binding pose prediction problem for up to 7 rotatable bonds; i.e. 99% of the rigid ligand docking cases and 95% of the flexible ligand docking cases are sampled successfully. We point out that success rates could be improved through more advanced modeling of the receptor prior to docking and through improvement of the force field parameters, particularly for structures containing metal-based cofactors.
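
Success rates such as "within 2 Å for 79% of the test cases" rest on a pose RMSD and a cutoff. A minimal sketch follows, assuming that atoms are listed in the same order in both poses, that coordinates share the receptor frame (no superposition), and that no symmetry correction is applied. The function names are illustrative and not taken from DOCK.

```python
import math


def rmsd(pose_a, pose_b):
    """Root mean square deviation between two poses with matched atom order."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))


def success_rate(pose_pairs, cutoff=2.0):
    """Fraction of (docked, crystal) pairs whose RMSD is within the cutoff."""
    if not pose_pairs:
        raise ValueError("at least one pose pair is required")
    hits = sum(1 for docked, crystal in pose_pairs
               if rmsd(docked, crystal) <= cutoff)
    return hits / len(pose_pairs)
```

Symmetric ligands can make this naive RMSD overestimate the error, so published benchmarks often apply a symmetry-aware atom matching before this step.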

• sc-PDB: an annotated database of druggable binding sites from the Protein Data Bank.
Kellenberger, Esther and Muller, Pascal and Schalon, Claire and Bret, Guillaume and Foata, Nicolas and Rognan, Didier
Journal of chemical information and modeling, 2006, 46(2), 717-727
PMID: 16563002     doi: 10.1021/ci050372x

The sc-PDB is a collection of 6 415 three-dimensional structures of binding sites found in the Protein Data Bank (PDB). Binding sites were extracted from all high-resolution crystal structures in which a complex between a protein cavity and a small-molecular-weight ligand could be identified. Importantly, ligands are considered from a pharmacological and not a structural point of view. Therefore, solvents, detergents, and most metal ions are not stored in the sc-PDB. Ligands are classified into four main categories: nucleotides (< 4-mer), peptides (< 9-mer), cofactors, and organic compounds. The corresponding binding site is formed by all protein residues (including amino acids, cofactors, and important metal ions) with at least one atom within 6.5 angstroms of any ligand atom. The database was carefully annotated by browsing several protein databases (PDB, UniProt, and GO) and storing, for every sc-PDB entry, the following features: protein name, function, source, domain and mutations, ligand name, and structure. The repository of ligands has also been archived by diversity analysis of molecular scaffolds, and several chemoinformatics descriptors were computed to better understand the chemical space covered by stored ligands. The sc-PDB may be used for several purposes: (i) screening a collection of binding sites for predicting the most likely target(s) of any ligand, (ii) analyzing the molecular similarity between different cavities, and (iii) deriving rules that describe the relationship between ligand pharmacophoric points and active-site properties. The database is periodically updated and accessible on the web at http://bioinfo-pharma.u-strasbg.fr/scPDB/.
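
The binding-site rule stated in this abstract (every residue with at least one atom within 6.5 angstroms of any ligand atom) can be sketched in a few lines. The input layout, a list of `(residue_id, (x, y, z))` pairs, and the function name are assumptions for illustration; parsing of PDB files is left out.

```python
import math


def binding_site_residues(protein_atoms, ligand_coords, cutoff=6.5):
    """Return sorted residue ids having any atom within cutoff of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples, one per atom.
    ligand_coords: list of (x, y, z) tuples, one per ligand atom.
    """
    site = set()
    for residue_id, coord in protein_atoms:
        if residue_id in site:
            continue  # residue already selected through another atom
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_coords):
            site.add(residue_id)
    return sorted(site)
```

This brute-force loop is quadratic in the number of atoms; for database-scale extraction a spatial index such as a cell list or k-d tree would be the usual replacement.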

## 2005

• In silico prediction of harmful effects triggered by drugs and chemicals.
Vedani, Angelo and Dobler, Max and Lill, Markus A
Toxicology and applied pharmacology, 2005, 207(2 Suppl), 398-407
PMID: 16045954     doi: 10.1016/j.taap.2005.01.055

While the computer-assisted discovery and optimization of drug candidates based on the known three-dimensional structure of the macromolecular target (structure-based design) or a binding-site surrogate (receptor modeling) is doubtless one of the more potent approaches in rational drug design, the simulation and quantification of side effects triggered by drugs and chemicals are still in their infancy. Major obstacles include the frequently unavailable 3D structure of the molecular target, the low specificity of the involved bioregulators and the identification of the controlling metabolic pathways. In the recent past, our laboratory has explored concepts for simulating receptor-mediated toxic phenomena by developing algorithms that construct realistic 3D binding-site surrogates of receptors known or assumed to trigger adverse effects and validating them against large batches of molecular data. The underlying technology (software Quasar and Raptor, respectively) specifically allows for induced fit, solvation phenomena and entropic effects. It has been applied to various systems both of pharmacological and toxicological interest including the neurokinin-1, chemokine-3, bradykinin B(2), steroid, 5-HT(2A), aryl hydrocarbon, estrogen and androgen receptor, respectively. In this account, we describe the design of a virtual laboratory allowing for a reliable estimation of harmful effects triggered by drugs, chemicals and their metabolites in silico. In the recent past, the Biographics Laboratory 3R has compiled a 3D database including the surrogates of three major receptor systems known to mediate adverse effects (the aryl hydrocarbon, the estrogen and the androgen receptor, respectively) and validated them against a total of 345 compounds (drugs, chemicals, toxins) using multidimensional QSAR technologies. Within this pilot project, we could demonstrate that our virtual laboratory is able both to recognize toxic compounds substantially different from those used in the training set and to classify harmless compounds as being nontoxic. This suggests that our approach may be used for the prediction of adverse effects of drug molecules and chemicals. It is the aim to provide cost-covering access to this technology (particularly to universities, hospitals and regulatory bodies), as it bears a significant potential to recognize hazardous compounds early in the development process and hence improve resource and waste management as well as reduce animal testing. The Biographics Laboratory 3R is a non-profit-oriented organization aimed at reducing animal experimentation in the biomedical sciences by computational approaches (cf. http://www.biograf.ch).

• Validation and use of the MM-PBSA approach for drug discovery.
Kuhn, Bernd and Gerber, Paul and Schulz-Gasch, Tanja and Stahl, Martin
Journal of medicinal chemistry, 2005, 48(12), 4040-4048
PMID: 15943477     doi: 10.1021/jm049081q

The MM-PBSA approach has become a popular method for calculating binding affinities of biomolecular complexes. Published application examples focus on small test sets and few proteins and, hence, are of limited relevance in assessing the general validity of this method. To further characterize MM-PBSA, we report on a more extensive study involving a large number of ligands and eight different proteins. Our results show that applying the MM-PBSA energy function to a single, relaxed complex structure is an adequate and sometimes more accurate approach than the standard free energy averaging over molecular dynamics snapshots. The use of MM-PBSA on a single structure is shown to be valuable (a) as a postdocking filter in further enriching virtual screening results, (b) as a helpful tool to prioritize de novo design solutions, and (c) for distinguishing between good and weak binders (DeltapIC(50) > or

• DrugScore(CSD)-knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction.
Velec, Hans F G and Gohlke, Holger and Klebe, Gerhard
Journal of medicinal chemistry, 2005, 48(20), 6296-6303
PMID: 16190756     doi: 10.1021/jm050436v

Following the formalism used for the development of the knowledge-based scoring function DrugScore, new distance-dependent pair potentials are obtained from nonbonded interactions in small organic molecule crystal packings. Compared to potentials derived from protein-ligand complexes, the better resolved small molecule structures provide relevant contact data in a more balanced distribution of atom types and produce potentials of superior statistical significance and more detailed shape. Applied to recognizing binding geometries of ligands docked into proteins, this new scoring function (DrugScore(CSD)) ranks the crystal structures of 100 protein-ligand complexes best among up to 100 generated decoy geometries in 77% of all cases. Accepting root-mean-square deviations (rmsd) of up to 2 angstroms from the native pose as well-docked solutions, a correct binding mode is found in 87% of the cases. This translates into an improvement of the new scoring function of 57% with respect to the retrieval of the crystal structure and 20% with respect to the identification of a well-docked ligand pose compared to the original Protein Data Bank-based DrugScore. In the analysis of decoy geometries of cross-docking studies, DrugScore(CSD) shows equivalent or increased performance compared to the original PDB-based DrugScore. Furthermore, DrugScore(CSD) predicts binding affinities convincingly. Reducing the set of docking solutions to examples that deviate increasingly from the native pose results in a loss of performance of DrugScore(CSD). This indicates that a necessary prerequisite to successfully resolving the scoring problem with a more discriminative scoring function is the generation of highly accurate ligand poses, which approximate the native pose to below 1 angstrom rmsd, in a docking run.
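
Distance-dependent pair potentials of this kind are usually obtained by an inverse-Boltzmann conversion of observed contact frequencies. The sketch below shows that general idea only: the potential in each distance bin is the negative logarithm of the ratio between the normalized distribution for one atom-type pair and a reference distribution pooled over all pairs. Whether DrugScore(CSD) uses exactly this normalization is an assumption, and the function name, the binning, and the pseudocount are hypothetical.

```python
import math


def pair_potential(pair_counts, reference_counts, pseudocount=1e-6):
    """Convert per-bin contact counts into a statistical pair potential.

    pair_counts[k]: contacts of one atom-type pair in distance bin k.
    reference_counts[k]: contacts of all atom-type pairs in distance bin k.
    Returns one value per bin, in units of kT (negative = favourable).
    """
    if len(pair_counts) != len(reference_counts):
        raise ValueError("both histograms must use the same distance bins")
    pair_total = sum(pair_counts)
    ref_total = sum(reference_counts)
    if pair_total == 0 or ref_total == 0:
        raise ValueError("histograms must contain at least one contact")
    potential = []
    for n_pair, n_ref in zip(pair_counts, reference_counts):
        # Normalized frequencies; the pseudocount avoids log(0) in empty bins.
        g_pair = n_pair / pair_total + pseudocount
        g_ref = n_ref / ref_total + pseudocount
        potential.append(-math.log(g_pair / g_ref))
    return potential
```

Bins where the pair is seen more often than the reference come out negative, which is why better-populated small-molecule crystal data can yield smoother and more detailed potentials.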

• Yucca: an efficient algorithm for small-molecule docking.
Choi, Vicky
Chemistry & biodiversity, 2005, 2(11), 1517-1524
PMID: 17191951     doi: 10.1002/cbdv.200590123

In this paper, we present a new algorithm, which is based on an efficient heuristic for local search, for rigid protein-small-molecule docking. We tested our algorithm, called Yucca, on the recent 100-complex benchmark, using the conformer generator OMEGA to generate a set of low-energy conformers. The results showed that Yucca is competitive both in terms of algorithm efficiency and docking accuracy.

• Receptor flexibility in de novo ligand design and docking.
Alberts, Ian L and Todorov, Nikolay P and Dean, Philip M
Journal of medicinal chemistry, 2005, 48(21), 6585-6596
PMID: 16220975     doi: 10.1021/jm050196j

One of the major problems in computational drug design is incorporation of the intrinsic flexibility of protein binding sites. This is particularly crucial in ligand binding events, when induced fit can lead to protein structure rearrangements. As a consequence of the huge conformational space available to protein structures, receptor flexibility is rarely considered in ligand design procedures. In this work, we present an algorithm for integrating protein binding-site flexibility into de novo ligand design and docking processes. The approach allows dynamic rearrangement of amino acid side chains during the docking and design simulations. The impact of protein conformational flexibility is investigated in the docking of highly active inhibitors in the binding sites of acetylcholinesterase and human collagenase (matrix metalloproteinase-1) and in the design of ligands in the S1' pocket of MMP-1. The results of corresponding simulations for both rigid and flexible binding sites are compared in order to gauge the influence of receptor flexibility in drug discovery protocols.

• Side-chain flexibility in protein-ligand binding: the minimal rotation hypothesis.
Zavodszky, Maria I and Kuhn, Leslie A
Protein science : a publication of the Protein Society, 2005, 14(4), 1104-1114
PMID: 15772311     doi: 10.1110/ps.041153605

The goal of this work is to learn from nature about the magnitudes of side-chain motions that occur when proteins bind small organic molecules, and model these motions to improve the prediction of protein-ligand complexes. Following analysis of protein side-chain motions upon ligand binding in 63 complexes, we tested the ability of the docking tool SLIDE to model these motions without being restricted to rotameric transitions or deciding which side chains should be considered as flexible. The model tested is that side-chain conformational changes involving more atoms or larger rotations are likely to be more costly and less prevalent than small motions due to energy barriers between rotamers and the potential of large motions to cause new steric clashes. Accordingly, SLIDE adjusts the protein and ligand side groups as little as necessary to achieve steric complementarity. We tested the hypothesis that small motions are sufficient to achieve good dockings using 63 ligands and the apo structures of 20 different proteins and compared SLIDE side-chain rotations to those experimentally observed. None of these proteins undergoes major main-chain conformational change upon ligand binding, ensuring that side-chain flexibility modeling is not required to compensate for main-chain motions. Although more frugal in the number of side-chain rotations performed, this model substantially mimics the experimentally observed motions. Most side chains do not shift to a new rotamer, and small motions are both necessary and sufficient to predict the correct binding orientation and most protein-ligand interactions for the 20 proteins analyzed.

• ProPose: steered virtual screening by simultaneous protein-ligand docking and ligand-ligand alignment.
Seifert, Markus H J
Journal of chemical information and modeling, 2005, 45(2), 449-460
PMID: 15807511     doi: 10.1021/ci0496393

The 'model-free' screening engine ProPose implements a general method for performing simultaneous protein-ligand docking, ligand-ligand alignment, pharmacophore queries, and combinations thereof, in order to incorporate a priori information into screening protocols. In this manuscript we describe a case study on herpes simplex virus thymidine kinase, an important antiviral drug target, where we evaluate different approaches for handling a specific type of a priori information, i.e., multiple target structures. We demonstrate that a simultaneous alignment on two target structures, in conjunction with logic operations on interactions and docking constraints derived from protein structure, is an effective means of (i) improving the enrichment of chemical substructures that are compatible with the a priori known ligands, (ii) ensuring the steric fit into the target protein, and (iii) handling target flexibility. The combination of ligand- and receptor-based methods steers the virtual screening by ranking molecules according to the similarity of their interaction pattern with known ligands, thereby, to some extent, outweighing the deficiencies of simple scoring functions often used in initial virtual screening.

• MEDock: a web server for efficient prediction of ligand binding sites based on a novel optimization algorithm
Chang, DTH and Oyang, YJ and Lin, JH
Nucleic acids research, 2005, 33(Web Server issue), W233-W238
PMID: 15991337     doi: 10.1093/nar/gki586

The prediction of ligand binding sites is an essential part of the drug discovery process. Knowing the location of binding sites greatly facilitates the search for hits, the lead optimization process, the design of site-directed mutagenesis experiments and the hunt for structural features that influence the selectivity of binding in order to minimize the drug's adverse effects. However, docking is still the rate-limiting step for such predictions; consequently, much more efficient algorithms are required. In this article, the design of the MEDock web server is described. The goal of this server is to provide an efficient utility for predicting ligand binding sites. The MEDock web server incorporates a global search strategy that exploits the maximum entropy property of the Gaussian probability distribution in the context of information theory. As a result of the global search strategy, the optimization algorithm incorporated in MEDock is significantly superior when dealing with very rugged energy landscapes, which usually have insurmountable barriers. This article describes four different benchmark cases that span a diverse set of different types of ligand binding interactions. These benchmarks were compared with the use of the Lamarckian genetic algorithm (LGA), which is the major workhorse of the well-known AutoDock program. These results demonstrate that MEDock consistently converged to the correct binding modes with significantly smaller numbers of energy evaluations than the LGA required. When judged by a threshold of the number of energy evaluations consumed in the docking simulation, MEDock also greatly elevates the rate of accurate predictions for all benchmark cases. MEDock is available at http://medock.csie.ntu.edu.tw/ and http://bioinfo.mc.ntu.edu.tw/medock/.

• PatchDock and SymmDock: servers for rigid and symmetric docking.
Schneidman-Duhovny, Dina and Inbar, Yuval and Nussinov, Ruth and Wolfson, Haim J
Nucleic acids research, 2005, 33(Web Server issue), W363-W367
PMID: 15980490     doi: 10.1093/nar/gki481

Here, we describe two freely available web servers for molecular docking. The PatchDock method performs structure prediction of protein-protein and protein-small molecule complexes. The SymmDock method predicts the structure of a homomultimer with cyclic symmetry given the structure of the monomeric unit. The inputs to the servers are either protein PDB codes or uploaded protein structures. The services are available at http://bioinfo3d.cs.tau.ac.il. The methods behind the servers are very efficient, allowing large-scale docking experiments.

• Improved FlexX docking using FlexS-determined base fragment placement.
Cross, Simon S J
Journal of chemical information and modeling, 2005, 45(4), 993-1001
PMID: 16045293     doi: 10.1021/ci050026f

We report on a novel hybrid FlexX/FlexS docking approach, whereby the base fragment of the test ligand is chosen by FlexS superposition onto a cocrystallized template ligand and then fed into FlexX for the incremental construction of the final solution. The new approach is tested on the diverse 200 protein-ligand complex dataset that has been previously described for FlexX validation. In total, 62.9% of the complexes can be reproduced at rank 1 by our approach, which compares favorably with 46.9% when using FlexX alone. In addition, we report "cross-docking" experiments in which several receptor structures of complexes with identical proteins have been used for docking all cocrystallized ligands of these complexes. The results show that, in almost all cases, the hybrid approach can acceptably dock a ligand into a foreign receptor structure using a different ligand template, can give solutions where FlexX alone fails, and tends to give solutions that are more accurately positioned.

• Binding mode prediction of cytochrome p450 and thymidine kinase protein-ligand complexes by consideration of water and rescoring in automated docking.
de Graaf, Chris and Pospisil, Pavel and Pos, Wouter and Folkers, Gerd and Vermeulen, Nico P E
Journal of medicinal chemistry, 2005, 48(7), 2308-2318
PMID: 15801824     doi: 10.1021/jm049650u

The popular docking programs AutoDock, FlexX, and GOLD were used to predict binding modes of ligands in crystallographic complexes including X-ray water molecules or computationally predicted water molecules. Isoenzymes of two different enzyme systems were used, namely cytochromes P450 (n

• Modeling water molecules in protein-ligand docking using GOLD.
Verdonk, Marcel L and Chessari, Gianni and Cole, Jason C and Hartshorn, Michael J and Murray, Christopher W and Nissink, J Willem M and Taylor, Richard D and Taylor, Robin
Journal of medicinal chemistry, 2005, 48(20), 6504-6515
PMID: 16190776     doi: 10.1021/jm050543p

We implemented a novel approach to score water mediation and displacement in the protein-ligand docking program GOLD. The method allows water molecules to switch on and off and to rotate around their three principal axes. A constant penalty, sigma(p), representing the loss of rigid-body entropy, is added for water molecules that are switched on, hence rewarding water displacement. We tested the methodology in an extensive validation study. First, sigma(p) is optimized against a training set of 58 protein-ligand complexes. For this training set, our algorithm correctly predicts water mediation/displacement in approximately 92% of the cases. We observed small improvements in the quality of the predicted binding modes for water-mediated complexes. In the second part of this work, an entirely independent set of 225 complexes is used. For this test set, our algorithm correctly predicts water mediation/displacement in approximately 93% of the cases. Improvements in binding mode quality were observed for individual water-mediated complexes.

## 2004

• Native atom types for knowledge-based potentials: application to binding energy prediction.
Dominy, Brian N and Shakhnovich, Eugene I
Journal of medicinal chemistry, 2004, 47(18), 4538-4558
PMID: 15317465     doi: 10.1021/jm0498046

Knowledge-based potentials have been found useful in a variety of biophysical studies of macromolecules. Recently, it has also been shown in self-consistent studies that it is possible to extract quantities consistent with pair potentials from model structural databases. In this study, we attempt to extend the results obtained from these self-consistent studies toward the extraction of realistic pair potentials from the Protein Data Bank (PDB). The new method utilizes a clustering approach to define atom types within the PDB consistent with the optimal effective pairwise potential. The method has been integrated into the SMoG drug design package, resulting in an improved approach for the rapid and accurate estimation of binding affinities from structural information. Using this approach, it is possible to generate simple knowledge-based potentials that correlate (R

• OptiDock: virtual HTS of combinatorial libraries by efficient sampling of binding modes in product space.
Sprous, Dennis G and Lowis, David R and Leonard, Joseph M and Heritage, Trevor and Burkett, Steven N and Baker, David S and Clark, Robert D
Journal of combinatorial chemistry, 2004, 6(4), 530-539
PMID: 15244414     doi: 10.1021/cc034068x

Products from combinatorial libraries generally share a common core structure that can be exploited to improve the efficiency of virtual high-throughput screening (vHTS). In general, it is more efficient to find a method that scales with the total number of reagents (Sigma growth) rather than with the number of products (Pi growth). The OptiDock methodology described herein entails selecting a diverse but representative subset of compounds that span the structural space encompassed by the full library. These compounds are docked individually using the FlexX program (Rarey, M.; Kramer, B.; Lengauer, T.; Klebe, G. J. Mol. Biol. 1995, 251, 470-489) to define distinct docking modes in terms of reference placements for combinatorial core atoms. Thereafter, substituents in R-cores (consisting of the core structure substituted at a single variation site) are docked, keeping the core atoms fixed at the coordinates dictated by each reference placement. Interaction energies are calculated for each docked R-core with respect to the target protein, and energies for whole compounds are calculated by finding the reference core placement for which the sum of corresponding R-core energies is most negative. The use of diverse whole compounds to define binding modes is a key advantage of the protocol over other combinatorial docking programs. As a result, OptiDock returns better-scoring conformers than does serially applied FlexX. OptiDock is also better able to find a viable docked pose for each library member than are other combinatorial approaches.
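
The whole-compound scoring step described in this abstract can be sketched as follows. This is a minimal illustration only, not OptiDock's implementation: the function name `score_product`, the nested-dictionary layout, and all energy values are hypothetical. It assumes the R-core energies have already been computed once per (placement, site, substituent), which is what lets the work scale with the number of reagents rather than the number of products.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: energies[placement][site_index][substituent] is the
# interaction energy of the R-core docked with the core held at that placement.
RCoreEnergies = Dict[str, List[Dict[str, float]]]


def score_product(energies: RCoreEnergies,
                  substituents: Sequence[str]) -> Tuple[str, float]:
    """Return (placement, energy) for one product of the library.

    For each reference core placement, sum the R-core energies of the
    product's substituents, then keep the placement with the most
    negative sum.
    """
    best = None
    for placement, per_site in energies.items():
        total = sum(per_site[site][sub] for site, sub in enumerate(substituents))
        if best is None or total < best[1]:
            best = (placement, total)
    if best is None:
        raise ValueError("no reference placements supplied")
    return best


# Made-up energies for two placements and two variation sites.
energies = {
    "mode_A": [{"R1a": -3.0, "R1b": -1.0}, {"R2a": -2.0, "R2b": -4.5}],
    "mode_B": [{"R1a": -1.5, "R1b": -5.0}, {"R2a": -2.5, "R2b": -0.5}],
}
print(score_product(energies, ("R1a", "R2b")))
print(score_product(energies, ("R1b", "R2a")))
```

With two substituents at each of two sites, the table holds 8 R-core energies per library while covering 4 products; the gap between the sum and the product of reagent counts widens quickly as the library grows.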

• SDOCKER: a method utilizing existing X-ray structures to improve docking accuracy.
Wu, Guosheng and Vieth, Michal
Journal of medicinal chemistry, 2004, 47(12), 3142-3148
PMID: 15163194     doi: 10.1021/jm040015y

This paper introduces a new strategy for structure-based drug design that combines high-quality docking with data from existing ligand-protein cocrystal X-ray structures. The main goal of SDOCKER, a new algorithm that implements this strategy, is docking accuracy improvement. In this new paradigm, simulated annealing molecular dynamics is used for conformational sampling and optimization and an additional similarity force is applied on the basis of the positions of ligands from X-ray data that focus the sampling on relevant regions of the active site. Because the structural information from both the ligand and protein active site is included, this approach is more effective in finding the optimal conformation for a ligand-protein complex than the classical docking or similarity overlays. Interestingly, it was found that a 3D similarity-only approach gives comparable docking accuracy to the regular force field approach used in classical docking, given the final structures are minimized in the presence of the protein. The combination of both, as implemented in SDOCKER, is shown here to be more accurate. A significant improvement in docking accuracy has been observed for three different test systems. Specifically an improvement of 10%, 17.5%, and 10% is seen for 37 HIV-1 protease, 32 thrombin, and 23 CDK2 ligands, respectively, compared to docking using the force field alone. In addition, SDOCKER's accuracy performance dependence on the similarity template is discussed. The strategy of utilizing existing ligand X-ray information should prove effective in light of the multitude of structures available from structural genomics approaches.

• Validation of an empirical RNA-ligand scoring function for fast flexible docking using Ribodock.
Morley, S David and Afshar, Mohammad
Journal of computer-aided molecular design, 2004, 18(3), 189-208
PMID: 15368919

We report the design and validation of a fast empirical function for scoring RNA-ligand interactions, and describe its implementation within RiboDock, a virtual screening system for automated flexible docking. Building on well-known protein-ligand scoring function foundations, features were added to describe the interactions of common RNA-binding functional groups that were not handled adequately by conventional terms, to disfavour non-complementary polar contacts, and to control non-specific charged interactions. The results of validation experiments against known structures of RNA-ligand complexes compare favourably with previously reported methods. Binding modes were well predicted in most cases and good discrimination was achieved between native and non-native ligands for each binding site, and between native and non-native binding sites for each ligand. Further evidence of the ability of the method to identify true RNA binders is provided by compound selection ('enrichment factor') experiments based around a series of HIV-1 TAR RNA-binding ligands. Significant enrichment in true binders was achieved amongst high scoring docking hits, even when selection was from a library of structurally related, positively charged molecules. Coupled with a semi-automated cavity detection algorithm for identification of putative ligand binding sites, also described here, the method is suitable for the screening of very large databases of molecules against RNA and RNA-protein interfaces, such as those presented by the bacterial ribosome.

• Ph4Dock: Pharmacophore-based protein-ligand docking
Goto, J and Kataoka, R and Hirayama, N
Journal of medicinal chemistry, 2004, 47(27), 6804-6811
PMID: 15615529     doi: 10.1021/jm0493818

The development and validation of the program Ph4Dock is presented. Ph4Dock is a novel automated ligand docking program that makes best use of pharmacophoric features both in a ligand and at concave portions of a protein. By mapping the pharmacophores of the ligand to the pharmacophoric features that represent the concave regions of the target protein, Ph4Dock achieves an efficient and accurate prediction of the binding modes between the ligand and the protein. To validate the potential of this docking algorithm, we have selected 43 reliable crystal structures of protein-ligand complexes. All of the ligands are druglike, and they are varied in nature. The diffraction-component precision index (DPI) originally used in crystallography was applied in this study in order to evaluate the docking results quantitatively. The root-mean-square deviations (rmsd) between non-hydrogen atoms of the ligand in the predicted and experimental structures were analyzed using DPI. The rmsd values for 25 structures, almost 60% of the dataset, are less than three times the corresponding DPI values. This means that the precision of the docking results obtained by Ph4Dock is mostly equivalent to the experimental error in these cases. The present study has demonstrated that Ph4Dock can accurately reproduce the experimentally determined docking modes if reliable crystal structures are used. Normally the success rate of docking is judged using rmsd less than or equal to 2.0 Angstrom as the criterion. Ph4Dock achieved a good success rate of 86% based on this criterion.
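
The rmsd-based success criterion mentioned in this abstract (and used by several other entries in this section) can be illustrated with a short sketch. It is not taken from Ph4Dock; the function names are hypothetical, and it assumes the predicted and experimental ligand poses share the receptor's coordinate frame, list the same non-hydrogen atoms in the same order, and need no symmetry correction.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def heavy_atom_rmsd(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Root-mean-square deviation between matched non-hydrogen atoms.

    No superposition is performed: both poses are assumed to be expressed
    in the same receptor frame, with atoms in the same order.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pred, ref)
    )
    return math.sqrt(squared / len(pred))


def success_rate(rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Fraction of complexes whose pose rmsd is at or below the cutoff (Angstrom)."""
    if not rmsds:
        raise ValueError("no rmsd values supplied")
    return sum(r <= cutoff for r in rmsds) / len(rmsds)


# Toy example: every atom is displaced by 3 Angstrom along z.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(heavy_atom_rmsd(pred, ref))
print(success_rate([0.5, 2.0, 3.1, 1.2]))
```

The 2.0 Angstrom default mirrors the conventional threshold quoted in the abstract; a DPI-based criterion would replace the fixed cutoff with a per-structure value.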

• HierVLS hierarchical docking protocol for virtual ligand screening of large-molecule databases
Floriano, WB and Vaidehi, N and Zamanakos, G and Goddard, WA
Journal of medicinal chemistry, 2004, 47(1), 56-71
PMID: 14695820     doi: 10.1021/jm030271v

To provide practical means for rapidly scanning the extensive experimental combinatorial chemistry libraries now available for high-throughput screening (HTS), it is essential to establish computational virtual ligand screening (VLS) techniques to rapidly identify out of a large library all active compounds against a particular protein target. Toward this goal we developed HierVLS, a fast hierarchical docking approach that starts with a coarse grain conformational search over a large number of configurations filtered with a fast but crude energy function, followed by a succession of finer grain levels, using successively more accurate but more expensive descriptions of the ligand-protein-solvent interactions to filter successively fewer cases. The final step of this procedure optimizes one configuration of the ligand in the protein site using our most accurate energy expression and description of the solvent, which would be impractical for all conformations and sites sampled in the coarse level. HierVLS is based on the HierDock approach, but rather than allowing an hour or more to determine the best binding site and energy for each ligand (as in HierDock), we have adapted our procedure so that it can lead to reliable results while using only 4 min (866 MHz Pentium III processor) per ligand. To validate the accuracy for HierVLS to predict the experimentally observed binding conformation, we considered 37 cocrystal structures comprising 11 target proteins. We find that HierVLS identifies the correct binding mode for all 37 cocrystals. In addition, the calculated binding energies correlate well with available experimental binding constants. To validate how well HierVLS can identify the correct ligand in an extensive library of decoys, we considered a library of over 10 000 molecules. HierVLS identifies 26 out of the 37 cases in the top 2% ranked by binding affinity among the 10 037 molecules. The failures result from either metal-containing sites on the protein or water-mediated ligand-protein interactions, which we anticipate can be solved within the constraints of practical VLS. We then applied HierVLS to screen a 55000-compound virtual library against the target protein-tyrosine phosphatase 1B (ptp1b). The top 250 compounds by binding affinity included all six ptp1b cocrystal ligands added to the library plus three other experimentally confirmed binders. The best (top 1) binder is an experimentally confirmed positive. We conclude that HierVLS is useful for selecting leads for a particular target out of large combinatorial databases.

• GAsDock: a new approach for rapid flexible docking based on an improved multi-population genetic algorithm
Li, HL and Li, CL and Gui, CS and Luo, XM and Chen, KX and Shen, JH and Wang, XC and Jiang, HL
Bioorganic & Medicinal Chemistry Letters, 2004, 14(18), 4671-4676
PMID: 15324886     doi: 10.1016/j.bmcl.2004.06.091

Based on an improved multi-population genetic algorithm, a new fast flexible docking program, GAsDock, was developed. The docking accuracy, screening efficiency, and docking speed of GAsDock were evaluated by the docking results of thymidine kinase (TK) and HIV-1 reverse transcriptase (RT) enzyme with 10 available inhibitors of each protein and 990 randomly selected ligands. Nine of the ten known inhibitors of TK were accurately docked into the protein active site, with root-mean-square deviation (RMSD) values between the docked and X-ray crystal structures of less than 1.7 Angstrom; binding poses (conformation and orientation) of 9 of the 10 known inhibitors of RT were reproduced by GAsDock with RMSD values less than 2.0 Angstrom. The docking time is approximately in proportion to the number of rotatable bonds of ligands; GAsDock can finish a docking simulation within 60 s for a ligand with no more than 20 rotatable bonds. Results indicate that GAsDock is an accurate and remarkably faster docking program in comparison with other docking programs, which is advantageous for virtual screening applications.

• Calculation of ligand-nucleic acid binding free energies with the generalized-born model in DOCK.
Kang, Xinshan and Shafer, Richard H and Kuntz, Irwin D
Biopolymers, 2004, 73(2), 192-204
PMID: 14755577     doi: 10.1002/bip.10541

The calculation of ligand-nucleic acid binding free energies is investigated by including solvation effects computed with the generalized-Born model. Modifications of the solvation module in DOCK, including introduction of all-atom parameters and revision of coefficients in front of different terms, are shown to improve calculations involving nucleic acids. This computing scheme is capable of calculating binding energies, with reasonable accuracy, for a wide variety of DNA-ligand complexes, RNA-ligand complexes, and even for the formation of double-stranded DNA. This implementation of GB/SA is also shown to be capable of discriminating strong ligands from poor ligands for a series of RNA aptamers without sacrificing the high efficiency of the previous implementation. These results validate this approach to screening large databases against nucleic acid targets.

• Rapid protein-ligand docking using soft modes from molecular dynamics simulations to account for protein deformability: binding of FK506 to FKBP.
Zacharias, Martin
Proteins, 2004, 54(4), 759-767
PMID: 14997571     doi: 10.1002/prot.10637

Most current docking methods to identify possible ligands and putative binding sites on a receptor molecule assume a rigid receptor structure to allow virtual screening of large ligand databases. However, binding of a ligand can lead to changes in the receptor protein conformation that are sterically necessary to accommodate a bound ligand. An approach is presented that allows relaxation of the protein conformation in precalculated soft flexible degrees of freedom during ligand-receptor docking. For the immunosuppressant FK506-binding protein FKBP, the soft flexible modes are extracted as principal components of motion from a molecular dynamics simulation. A simple penalty function for deformations in the soft flexible mode is used to limit receptor protein deformations during docking that avoids a costly recalculation of the receptor energy by summing over all receptor atom pairs at each step. Rigid docking of the FK506 ligand binding to an unbound FKBP conformation failed to identify a geometry close to experiment as favorable binding site. In contrast, inclusion of the flexible soft modes during systematic docking runs selected a binding geometry close to experiment as lowest energy conformation. This has been achieved at a modest increase of computational cost compared to rigid docking. The approach could provide a computationally efficient way to approximately account for receptor flexibility during docking of large numbers of putative ligands and putative docking geometries.

• Assessment of docking poses: interactions-based accuracy classification (IBAC) versus crystal structure deviations.
Kroemer, Romano T and Vulpetti, Anna and McDonald, Joseph J and Rohrer, Douglas C and Trosset, Jean-Yves and Giordanetto, Fabrizio and Cotesta, Simona and McMartin, Colin and Kihlén, Mats and Stouten, Pieter F W
Journal of Chemical Information and Computer Sciences, 2004, 44(3), 871-881
PMID: 15154752     doi: 10.1021/ci049970m

Six docking programs (FlexX, GOLD, ICM, LigandFit, the Northwestern University version of DOCK, and QXP) were evaluated in terms of their ability to reproduce experimentally observed binding modes (poses) of small-molecule ligands to macromolecular targets. The accuracy of a pose was assessed in two ways: First, the RMS deviation of the predicted pose from the crystal structure was calculated. Second, the predicted pose was compared to the experimentally observed one regarding the presence of key interactions with the protein. The latter assessment is referred to as interactions-based accuracy classification (IBAC). In a number of cases significant discrepancies were found between IBAC and RMSD-based classifications. Despite being more subjective, the IBAC proved to be a more meaningful measure of docking accuracy in all these cases.

• Multiple active site corrections for docking and virtual screening.
Vigers, Guy P A and Rizzi, James P
Journal of medicinal chemistry, 2004, 47(1), 80-89
PMID: 14695822     doi: 10.1021/jm030161o

Several docking programs are now available that can reproduce the bound conformation of a ligand in an active site, for a wide variety of experimentally determined complexes. However, these programs generally perform less well at ranking multiple possible ligands in one site. Since accurate identification of potential ligands is a prerequisite for many aspects of structure-based drug design, this is a serious limitation. We have tested the ability of two docking programs, FlexX and Gold, to match ligands and active sites for multiple complexes. We show that none of the docking scores from either program are able to match consistently ligands and active sites in our tests. We propose a simple statistical correction, the multiple active site correction (MASC), which greatly ameliorates this problem. We have also tested the correction method against an extended set of 63 cocrystals and in a virtual screening experiment. In all cases, MASC significantly improves the results of the docking experiments.
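
The abstract does not spell out the correction itself. The sketch below assumes it takes the form of a per-ligand z-score: each ligand's docking score in a given site is re-expressed relative to the mean and standard deviation of that ligand's scores across all sites. The function name `masc_scores`, the data layout, the use of the sample standard deviation, and the scores themselves are assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import Dict

ScoreTable = Dict[str, Dict[str, float]]  # raw[ligand][site] -> docking score


def masc_scores(raw: ScoreTable) -> ScoreTable:
    """Z-score each ligand's docking scores across the active sites it was docked into.

    Assumed form: corrected = (score - mean over sites) / stdev over sites.
    More negative still means better, but a ligand that scores well in
    every site no longer dominates the ranking in any one of them.
    """
    corrected: ScoreTable = {}
    for ligand, by_site in raw.items():
        values = list(by_site.values())
        if len(values) < 2:
            raise ValueError(f"{ligand}: need scores in at least two sites")
        mu = mean(values)
        sigma = stdev(values)
        if sigma == 0:
            raise ValueError(f"{ligand}: identical scores in every site")
        corrected[ligand] = {site: (s - mu) / sigma for site, s in by_site.items()}
    return corrected


# Made-up scores: the large ligand scores well everywhere, while the small
# one scores modestly but shows a clear preference for siteB.
raw = {
    "big_ligand": {"siteA": -40.0, "siteB": -38.0, "siteC": -36.0},
    "small_ligand": {"siteA": -20.0, "siteB": -26.0, "siteC": -20.0},
}
corrected = masc_scores(raw)
print(corrected["big_ligand"]["siteB"], corrected["small_ligand"]["siteB"])
```

In this toy table the raw scores rank `big_ligand` first in `siteB`, whereas the corrected scores rank `small_ligand` first, which is the kind of ligand-to-site matching the abstract describes.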

• Automated docking of highly flexible ligands by genetic algorithms: a critical assessment.
Cecchini, Marco and Kolb, Peter and Majeux, Nicolas and Caflisch, Amedeo
Journal of computational chemistry, 2004, 25(3), 412-422
PMID: 14696075     doi: 10.1002/jcc.10384

An improved version of the fragment-based flexible ligand docking approach SEED-FFLD is tested on inhibitors of human immunodeficiency virus type 1 protease, human alpha-thrombin and the estrogen receptor beta. The docking results indicate that it is possible to correctly reproduce the binding mode of inhibitors with more than ten rotatable bonds if the strain in their covalent geometry upon binding is not large. A high degree of convergence towards a unique binding mode in multiple runs of the genetic algorithm is proposed as a necessary condition for successful docking.

• FlexX-Scan: Fast, structure-based virtual screening
Schellhammer, I and Rarey, M
Proteins, 2004, 57(3), 504-517
PMID: 15382244     doi: 10.1002/prot.20217

We present a new software module, FlexX-Scan, for high-throughput, structure-based virtual screening. FlexX-Scan was developed with the aim to further speed up the virtual screening process. Based on the incremental construction docking tool FlexX (Rarey et al., J Mol Biol 1996;261: 470-489), a compact descriptor for representing favorable protein interaction spots within the protein binding site has been developed. The descriptor is calculated using special-purpose clustering techniques applied to the usual interaction points created by FlexX. The algorithm automatically detects a small set of interaction spots in the binding site for positioning ligand functional groups. The parametrizations of the base placement and incremental construction algorithms have been adapted to the new interaction model. We tested the software tool on a diverse set of 200 protein-ligand complexes from the Protein Data Bank (PDB) (Kramer et al., Proteins 1999;37:228-241). On average, the algorithm proposes about 90 interaction spots per binding site compared to about 1000 interaction dots in FlexX. We observe that the docking solutions of FlexX-Scan have a root-mean-square deviation from the crystal structure similar to the deviation of docking solutions of standard FlexX. For further validation we also performed virtual screening experiments for cyclin-dependent kinase 2, thrombin, angiotensin-converting enzyme, and dihydrofolate reductase. In these experiments, we screened a set of 34,000 random compounds and a number of known actives for each target. With FlexX-Scan, we achieved comparable enrichments to standard FlexX, with an averaged computing time of 5-10 s per compound, depending on parametrization.

• GEMDOCK: A generic evolutionary method for molecular docking
Yang, JM and Chen, CC
Proteins, 2004, 55(2), 288-304
PMID: 15048822     doi: 10.1002/prot.20035

We have developed an evolutionary approach for flexible ligand docking. This approach, GEMDOCK, uses a Generic Evolutionary Method for molecular DOCKing and an empirical scoring function. The former combines both discrete and continuous global search strategies with local search strategies to speed up convergence, whereas the latter results in rapid recognition of potential ligands. GEMDOCK was tested on a diverse data set of 100 protein-ligand complexes from the Protein Data Bank. In 79% of these complexes, the docked lowest energy ligand structures had root-mean-square deviations (RMSDs) below 2.0 Angstrom with respect to the corresponding crystal structures. The success rate increased to 85% if the structural water molecules were retained. We evaluated GEMDOCK on two cross-docking experiments in which each ligand of a protein ensemble was docked into each protein of the ensemble. Seventy-six percent of the docked structures had RMSDs below 2.0 Angstrom when the ligands were docked into foreign structures. We analyzed and validated GEMDOCK with respect to various search spaces and scoring functions, and found that if the scoring function was perfect, then the predicted accuracy was also essentially perfect. This study suggests that GEMDOCK is a useful tool for molecular recognition and may be used to systematically evaluate and thus improve scoring functions.

• Glide: A new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening
Halgren, TA and Murphy, RB and Friesner, RA and Beard, HS and Frye, LL and Pollard, WT and Banks, JL
Journal of medicinal chemistry, 2004, 47(7), 1750-1759
PMID: 15027866     doi: 10.1021/jm030644s

Glide's ability to identify active compounds in a database screen is characterized by applying Glide to a diverse set of nine protein receptors. In many cases, two, or even three, protein sites are employed to probe the sensitivity of the results to the site geometry. To make the database screens as realistic as possible, the screens use sets of "druglike" decoy ligands that have been selected to be representative of what we believe is likely to be found in the compound collection of a pharmaceutical or biotechnology company. Results are presented for releases 1.8, 2.0, and 2.5 of Glide. The comparisons show that average measures for both "early" and "global" enrichment for Glide 2.5 are 3 times higher than for Glide 1.8 and more than 2 times higher than for Glide 2.0 because of better results for the least well-handled screens. This improvement in enrichment stems largely from the better balance of the more widely parametrized GlideScore 2.5 function and the inclusion of terms that penalize ligand-protein interactions that violate established principles of physical chemistry, particularly as it concerns the exposure to solvent of charged protein and ligand groups. Comparisons to results for the thymidine kinase and estrogen receptors published by Rognan and co-workers (J. Med. Chem. 2000, 43, 4759-4767) show that Glide 2.5 performs better than GOLD 1.1, FlexX 1.8, or DOCK 4.01.

• Glide: A new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy
Friesner, RA and Banks, JL and Murphy, RB and Halgren, TA and Klicic, JJ and Mainz, DT and Repasky, MP and Knoll, EH and Shelley, M and Perry, JK and Shaw, DE and Francis, P and Shenkin, PS
Journal of medicinal chemistry, 2004, 47(7), 1739-1749
PMID: 15027865     doi: 10.1021/jm0306430

Unlike other methods for docking ligands to the rigid 3D structure of a known protein receptor, Glide approximates a complete systematic search of the conformational, orientational, and positional space of the docked ligand. In this search, an initial rough positioning and scoring phase that dramatically narrows the search space is followed by torsionally flexible energy optimization on an OPLS-AA nonbonded potential grid for a few hundred surviving candidate poses. The very best candidates are further refined via a Monte Carlo sampling of pose conformation; in some cases, this is crucial to obtaining an accurate docked pose. Selection of the best docked pose uses a model energy function that combines empirical and force-field-based terms. Docking accuracy is assessed by redocking ligands from 282 cocrystallized PDB complexes starting from conformationally optimized ligand geometries that bear no memory of the correctly docked pose. Errors in geometry for the top-ranked pose are less than 1 Angstrom in nearly half of the cases and are greater than 2 Angstrom in only about one-third of them. Comparisons to published data on rms deviations show that Glide is nearly twice as accurate as GOLD and more than twice as accurate as FlexX for ligands having up to 20 rotatable bonds. Glide is also found to be more accurate than the recently described Surflex method.
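
Note on the accuracy figures in the abstract above: docking accuracy is reported as the rms deviation between the docked and crystallographic ligand poses. A minimal sketch of that quantity follows, assuming both poses list the same atoms in the same order, in the same receptor frame, with no superposition and no correction for symmetry-equivalent atoms; the function name `pose_rmsd` and the coordinates are hypothetical.

```python
import math


def pose_rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses given as lists of (x, y, z)."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(pose_a))


# Hypothetical three-atom ligand; the docked pose is the reference shifted by
# (0.6, 0.0, 0.8), so every atom is displaced by exactly 1 Angstrom.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked = [(x + 0.6, y, z + 0.8) for (x, y, z) in reference]
rmsd = pose_rmsd(reference, docked)
```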

• Virtual screening using protein-ligand docking: Avoiding artificial enrichment
Verdonk, ML and Berdini, V and Hartshorn, MJ and Mooij, WTM and Murray, CW and Taylor, RD and Watson, P
Journal of Chemical Information and Computer Sciences, 2004, 44(3), 793-806
PMID: 15154744     doi: 10.1021/ci034289q

This study addresses a number of topical issues around the use of protein-ligand docking in virtual screening. We show that, for the validation of such methods, it is key to use focused libraries (containing compounds with one-dimensional properties, similar to the actives), rather than "random" or "drug-like" libraries to test the actives against. We also show that, to obtain good enrichments, the docking program needs to produce reliable binding modes. We demonstrate how pharmacophores can be used to guide the dockings and improve enrichments, and we compare the performance of three consensus-ranking protocols against ranking based on individual scoring functions. Finally, we show that protein-ligand docking can be an effective aid in the screening for weak, fragment-like binders, which has rapidly become a popular strategy for hit identification. All results presented are based on carefully constructed virtual screening experiments against four targets, using the protein-ligand docking program GOLD.

• Assessing scoring functions for protein-ligand interactions.
Ferrara, Philippe and Gohlke, Holger and Price, Daniel J and Klebe, Gerhard and Brooks, Charles L
Journal of medicinal chemistry, 2004, 47(12), 3032-3047
PMID: 15163185     doi: 10.1021/jm030489h

An assessment of nine scoring functions commonly applied in docking using a set of 189 protein-ligand complexes is presented. The scoring functions include the CHARMm potential, the scoring function DrugScore, the scoring function used in AutoDock, the three scoring functions implemented in DOCK, as well as three scoring functions implemented in the CScore module in SYBYL (PMF, Gold, ChemScore). We evaluated the abilities of these scoring functions to recognize near-native configurations among a set of decoys and to rank binding affinities. Binding site decoys were generated by molecular dynamics with restraints. To investigate whether the scoring functions can also be applied for binding site detection, decoys on the protein surface were generated. The influence of the assignment of protonation states was probed by either assigning "standard" protonation states to binding site residues or adjusting protonation states according to experimental evidence. The role of solvation models in conjunction with CHARMm was explored in detail. These include a distance-dependent dielectric function, a generalized Born model, and the Poisson equation. We evaluated the effect of using a rigid receptor on the outcome of docking by generating all-pairs decoys ("cross-decoys") for six trypsin and seven HIV-1 protease complexes. The scoring functions perform well to discriminate near-native from misdocked conformations, with CHARMm, DOCK-energy, DrugScore, ChemScore, and AutoDock yielding recognition rates of around 80%. Significant degradation in performance is observed in going from decoy to cross-decoy recognition for CHARMm in the case of HIV-1 protease, whereas DrugScore and ChemScore, as well as CHARMm in the case of trypsin, show only small deterioration. In contrast, the prediction of binding affinities remains problematic for all of the scoring functions. ChemScore gives the highest correlation value with R(2)

## 2003

• Pharmacophore-based molecular docking to account for ligand flexibility.
Joseph-McCarthy, Diane and Thomas, Bert E and Belmarsh, Michael and Moustakas, Demetri and Alvarez, Juan C
Proteins, 2003, 51(2), 172-188
PMID: 12660987     doi: 10.1002/prot.10266

Rapid computational mining of large 3D molecular databases is central to generating new drug leads. Accurate virtual screening of large 3D molecular databases requires consideration of the conformational flexibility of the ligand molecules. Ligand flexibility can be included without prohibitively increasing the search time by docking ensembles of precomputed conformers from a conformationally expanded database. A pharmacophore-based docking method whereby conformers of the same or different molecules are overlaid by their largest 3D pharmacophore and simultaneously docked by partial matches to that pharmacophore is presented. The method is implemented in DOCK 4.0.

• Discovery of a novel family of CDK inhibitors with the program LIDAEUS: structural basis for ligand-induced disordering of the activation loop.
Wu, Su Ying and McNae, Iain and Kontopidis, George and McClue, Steven J and McInnes, Campbell and Stewart, Kevin J and Wang, Shudong and Zheleva, Daniella I and Marriage, Howard and Lane, David P and Taylor, Paul and Fischer, Peter M and Walkinshaw, Malcolm D
Structure (London, England : 1993), 2003, 11(4), 399-410
PMID: 12679018

A family of 4-heteroaryl-2-amino-pyrimidine CDK2 inhibitor lead compounds was discovered with the new database-mining program LIDAEUS through in silico screening. Four compounds with IC(50) values ranging from 17 to 0.9 microM were selected for X-ray crystal analysis. Two distinct binding modes are observed, one of which resembles the hydrogen bonding pattern of bound ATP. In the second binding mode, the ligands trigger a conformational change in the activation T loop by inducing movement of Lys(33) and Asp(145) side chains. The family of molecules discovered provides an excellent starting point for the design and synthesis of tight binding inhibitors, which may lead to a new class of antiproliferative drugs.

• FDS: flexible ligand and receptor docking with a continuum solvent model and soft-core energy function.
Taylor, Richard D and Jewsbury, Philip J and Essex, Jonathan W
Journal of computational chemistry, 2003, 24(13), 1637-1656
PMID: 12926007     doi: 10.1002/jcc.10295

The docking of flexible small molecule ligands to large flexible protein targets is addressed in this article using a two-stage simulation-based method. The methodology presented is a hybrid approach where the first component is a dock of the ligand to the protein binding site, based on deriving sets of simultaneously satisfied intermolecular hydrogen bonds using graph theory and a recursive distance geometry algorithm. The output structures are reduced in number by cluster analysis based on distance similarities. These structures are submitted to a modified Monte Carlo algorithm using the AMBER-AA molecular mechanics force field with the Generalized Born/Surface Area (GB/SA) continuum model. This solvent model is not only less expensive than an explicit representation, but also yields increased sampling. Sampling is also increased using a rotamer library to direct some of the protein side-chain movements along with large dihedral moves. Finally, a softening function for the nonbonded force field terms is used, enabling the potential energy function to be slowly turned on throughout the course of the simulation. The docking procedure is optimized, and the results are presented for a single complex of the arabinose binding protein. It was found that for a rigid receptor model, the X-ray binding geometry was reproduced and uniquely identified based on the associated potential energy. However, when side-chain flexibility was included, although the X-ray structure was identified, it was one of three possible binding geometries that were energetically indistinguishable. These results suggest that on relaxing the constraint on receptor flexibility, the docking energy hypersurface changes from being funnel-like to rugged. A further 14 complexes were then examined using the optimized protocol. For each complex the docking methodology was tested for a fully flexible ligand, both with and without protein side-chain flexibility. 
For the rigid protein docking, 13 out of the 15 test cases were able to find the experimental binding mode; this number was reduced to 11 for the flexible protein docking. However, of these 11, in the majority of cases the experimental binding mode was not uniquely identified, but was present in a cluster of low energy structures that were energetically indistinguishable. These results not only support the presence of a rugged docking energy hypersurface, but also suggest that it may be necessary to consider the possibility of more than one binding conformation during ligand optimization.

• Surflex: Fully Automatic Flexible Molecular Docking Using a Molecular Similarity-Based Search Engine
Jain, Ajay N
Journal of medicinal chemistry, 2003, 46(4), 499-511
doi: 10.1021/jm020406h

• LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites
Venkatachalam, CM and Jiang, X and Oldfield, T and Waldman, M
Journal of molecular graphics & modelling, 2003, 21(4), 289-307
PMID: 12479928

We present a new shape-based method, LigandFit, for accurately docking ligands into protein active sites. The method employs a cavity detection algorithm for detecting invaginations in the protein as candidate active site regions. A shape comparison filter is combined with a Monte Carlo conformational search for generating ligand poses consistent with the active site shape. Candidate poses are minimized in the context of the active site using a grid-based method for evaluating protein-ligand interaction energies. Errors arising from grid interpolation are dramatically reduced using a new non-linear interpolation scheme. Results are presented for 19 diverse protein-ligand complexes. The method appears quite promising, reproducing the X-ray structure ligand pose within an RMS of 2 Angstrom in 14 out of the 19 complexes. A high-throughput screening study applied to the thymidine kinase receptor is also presented in which LigandFit, when combined with LigScore, an internally developed scoring function [1], yields very good hit rates for a ligand pool seeded with known actives.

• Detailed analysis of grid-based molecular docking: A case study of CDOCKER-A CHARMm-based MD docking algorithm.
Wu, Guosheng and Robertson, Daniel H and Brooks, Charles L and Vieth, Michal
Journal of computational chemistry, 2003, 24(13), 1549-1562
PMID: 12925999     doi: 10.1002/jcc.10306

The influence of various factors on the accuracy of protein-ligand docking is examined. The factors investigated include the role of a grid representation of protein-ligand interactions, the initial ligand conformation and orientation, the sampling rate of the energy hyper-surface, and the final minimization. A representative docking method is used to study these factors, namely, CDOCKER, a molecular dynamics (MD) simulated-annealing-based algorithm. A major emphasis in these studies is to compare the relative performance and accuracy of various grid-based approximations to explicit all-atom force field calculations. In these docking studies, the protein is kept rigid while the ligands are treated as fully flexible and a final minimization step is used to refine the docked poses. A docking success rate of 74% is observed when an explicit all-atom representation of the protein (full force field) is used, while a lower accuracy of 66-76% is observed for grid-based methods. All docking experiments considered a 41-member protein-ligand validation set. A significant improvement in accuracy (76 vs. 66%) for the grid-based docking is achieved if the explicit all-atom force field is used in a final minimization step to refine the docking poses. Statistical analysis shows that even lower-accuracy grid-based energy representations can be effectively used when followed with full force field minimization. The results of these grid-based protocols are statistically indistinguishable from the detailed atomic dockings and provide up to a sixfold reduction in computation time. For the test case examined here, improving the docking accuracy did not necessarily enhance the ability to estimate binding affinities using the docked structures.

• Gaussian docking functions.
McGann, Mark R and Almond, Harold R and Nicholls, Anthony and Grant, J Andrew and Brown, Frank K
Biopolymers, 2003, 68(1), 76-90
PMID: 12579581     doi: 10.1002/bip.10207

A shape-based Gaussian docking function is constructed which uses Gaussian functions to represent the shapes of individual atoms. A set of 20 trypsin ligand-protein complexes are drawn from the Protein Data Bank (PDB), the ligands are separated from the proteins, and then are docked back into the active sites using numerical optimization of this function. It is found that by employing this docking function, quasi-Newton optimization is capable of moving ligands great distances [on average 7 A root mean square distance (RMSD)] to locate the correctly docked structure. It is also found that a ligand drawn from one PDB file can be docked into a trypsin structure drawn from any of the trypsin PDB files. This implies that this scoring function is not limited to more accurate x-ray structures, as is the case for many of the conventional docking methods, but could be extended to homology models.

• Improved protein-ligand docking using GOLD.
Verdonk, Marcel L and Cole, Jason C and Hartshorn, Michael J and Murray, Christopher W and Taylor, Richard D
Proteins, 2003, 52(4), 609-623
PMID: 12910460     doi: 10.1002/prot.10465

The Chemscore function was implemented as a scoring function for the protein-ligand docking program GOLD, and its performance compared to the original Goldscore function and two consensus docking protocols, "Goldscore-CS" and "Chemscore-GS," in terms of docking accuracy, prediction of binding affinities, and speed. In the "Goldscore-CS" protocol, dockings produced with the Goldscore function are scored and ranked with the Chemscore function; in the "Chemscore-GS" protocol, dockings produced with the Chemscore function are scored and ranked with the Goldscore function. Comparisons were made for a "clean" set of 224 protein-ligand complexes, and for two subsets of this set, one for which the ligands are "drug-like," the other for which they are "fragment-like." For "drug-like" and "fragment-like" ligands, the docking accuracies obtained with Chemscore and Goldscore functions are similar. For larger ligands, Goldscore gives superior results. Docking with the Chemscore function is up to three times faster than docking with the Goldscore function. Both combined docking protocols give significant improvements in docking accuracy over the use of the Goldscore or Chemscore function alone. "Goldscore-CS" gives success rates of up to 81% (top-ranked GOLD solution within 2.0 A of the experimental binding mode) for the "clean list," but at the cost of long search times. For most virtual screening applications, "Chemscore-GS" seems optimal; search settings that give docking speeds of around 0.25-1.3 min/compound have success rates of about 78% for "drug-like" compounds and 85% for "fragment-like" compounds. In terms of producing binding energy estimates, the Goldscore function appears to perform better than the Chemscore function and the two consensus protocols, particularly for faster search settings. Even at docking speeds of around 1-2 min/compound, the Goldscore function predicts binding energies with a standard deviation of approximately 10.5 kJ/mol.

• Automated generation of MCSS-derived pharmacophoric DOCK site points for searching multiconformation databases.
Joseph-McCarthy, Diane and Alvarez, Juan C
Proteins, 2003, 51(2), 189-202
PMID: 12660988     doi: 10.1002/prot.10296

All docking methods employ some sort of heuristic to orient the ligand molecules into the binding site of the target structure. An automated method, MCSS2SPTS, for generating chemically labeled site points for docking is presented. MCSS2SPTS employs the program Multiple Copy Simultaneous Search (MCSS) to determine target-based theoretical pharmacophores. More specifically, chemically labeled site points are automatically extracted from selected low-energy functional-group minima and clustered together. These pharmacophoric site points can then be directly matched to the pharmacophoric features of database molecules with the use of either DOCK or PhDOCK to place the small molecules into the binding site. Several examples of the ability of MCSS2SPTS to reproduce the three-dimensional pharmacophoric features of ligands from known ligand-protein complex structures are discussed. In addition, a site-point set calculated for one human immunodeficiency virus 1 (HIV1) protease structure is used with PhDOCK to dock a set of HIV1 protease ligands; the docked poses are compared to the corresponding complex structures of the ligands. Finally, the use of an MCSS2SPTS-derived site-point set for acyl carrier protein synthase is compared to the use of atomic positions from a bound ligand as site points for a large-scale DOCK search. In general, MCSS2SPTS-generated site points focus the search on the more relevant areas and thereby allow for more effective sampling of the target site.

## 2002

• Further development and validation of empirical scoring functions for structure-based binding affinity prediction
Wang, R and Lai, L
Journal of computer-aided molecular design, 2002, 16(1), 11-26
PMID: 12197663

New empirical scoring functions have been developed to estimate the binding affinity of a given protein-ligand complex with known three-dimensional structure. These scoring functions include terms accounting for van der Waals interaction, hydrogen bonding, deformation penalty, and hydrophobic effect. A special feature is that three different algorithms have been implemented to calculate the hydrophobic effect term, which results in three parallel scoring functions. All three scoring functions are calibrated through multivariate regression analysis of a set of 200 protein-ligand complexes and they reproduce the binding free energies of the entire training set with standard deviations of 2.2 kcal/mol, 2.1 kcal/mol, and 2.0 kcal/mol, respectively. These three scoring functions are further combined into a consensus scoring function, X-CSCORE. When tested on an independent set of 30 protein-ligand complexes, X-CSCORE is able to predict their binding free energies with a standard deviation of 2.2 kcal/mol. The potential application of X-CSCORE to molecular docking is also investigated. Our results show that this consensus scoring function improves the docking accuracy considerably when compared to the conventional force field computation used for molecular docking.

• Simple, intuitive calculations of free energy of binding for protein-ligand complexes. 1. Models without explicit constrained water.
Cozzini, Pietro and Fornabaio, Micaela and Marabotti, Anna and Abraham, Donald J and Kellogg, Glen E and Mozzarelli, Andrea
Journal of medicinal chemistry, 2002, 45(12), 2469-2483
PMID: 12036355

The prediction of the binding affinity between a protein and ligands is one of the most challenging issues for computational biochemistry and drug discovery. While the enthalpic contribution to binding is routinely available with molecular mechanics methods, the entropic contribution is more difficult to estimate. We describe and apply a relatively simple and intuitive calculation procedure for estimating the free energy of binding for 53 protein-ligand complexes formed by 17 proteins of known three-dimensional structure and characterized by different active site polarity. HINT, a software model based on experimental LogP(o/w) values for small organic molecules, was used to evaluate and score all atom-atom hydropathic interactions between the protein and the ligands. These total scores (H(TOTAL)), which have been previously shown to correlate with DeltaG(interaction) for protein-protein interactions, correlate with DeltaG(binding) for protein-ligand complexes in the present study with a standard error of +/-2.6 kcal mol(-1) from the equation DeltaG(binding)

• Q-fit: A probabilistic method for docking molecular fragments by sampling low energy conformational space
Jackson, RM
Journal of computer-aided molecular design, 2002, 16(1), 43-57
PMID: 12197665

A new method is presented that docks molecular fragments to a rigid protein receptor. It uses a probabilistic procedure based on statistical thermodynamic principles to place ligand atom triplets at the lowest energy sites. The probabilistic method ranks receptor binding modes so that the lowest energy ones are sampled first. This allows constraints to be introduced to limit the depth of the search leading to a computationally efficient method of sampling low energy conformational space. This is combined with energy minimization of the initial fragment placement to arrive at a low energy conformation for the molecular fragment. Two different search methods are tested involving (i) geometric hashing and (ii) pose clustering methods. Ten molecular fragments were docked that have commonly been used to test docking methods. The success rate was 8/10 and 10/10 for generating a close solution ranked first using the two different sampling procedures. In general, all five of the top ranked solutions reproduce the observed binding mode, which increases confidence in the predictions. A set of ten molecular fragments that have previously been identified as problematic were docked. Success was achieved in 3/10 and 4/10 using the two different methods. Again there is a high level of agreement between the two methods and again in the successful cases the top ranked solutions are correct whilst in the case of the failures none are. The geometric hashing and pose clustering methods are fast, averaging ~13 and ~11 s per placement respectively using conservative parameters. The results are very encouraging and will facilitate the process of finding novel small molecule lead compounds by virtual screening of chemical databases.

• Flexible docking under pharmacophore type constraints.
Hindle, Sally A and Rarey, Matthias and Buning, Christian and Lengauer, Thomas
Journal of computer-aided molecular design, 2002, 16(2), 129-149
PMID: 12188022

FLEXX-PHARM, an extended version of the flexible docking tool FLEXX, allows the incorporation of information about important characteristics of protein-ligand binding modes into a docking calculation. This information is introduced as a simple set of constraints derived from receptor-based type pharmacophore features. The constraints are determined by selected FLEXX interactions and inclusion volumes in the receptor active site. They guide the docking process to produce a set of docking solutions with particular properties. By applying a series of look-ahead checks during the flexible construction of ligand fragments within the active site, FLEXX-PHARM determines which partially built docking solutions can potentially obey the constraints. Solutions that will not obey the constraints are deleted as early as possible, often decreasing the calculation time and enabling new docking solutions to emerge. FLEXX-PHARM was evaluated on various individual protein-ligand complexes where the top docking solutions generated by FLEXX had high root mean square deviations (RMSD) from the experimentally observed binding modes. FLEXX-PHARM showed an improvement in the RMSD of the top solutions in most cases, along with a reduction in run time. We also tested FLEXX-PHARM as a database screening tool on a small dataset of molecules for three target proteins. In two cases, FLEXX-PHARM missed one or two of the active molecules due to the constraints selected. However, in general FLEXX-PHARM maintained or improved the enrichment shown with FLEXX, while completing the screen in considerably less run time.

• Consensus scoring for ligand/protein interactions.
Clark, Robert D and Strizhev, Alexander and Leonard, Joseph M and Blake, James F and Matthew, James B
Journal of molecular graphics & modelling, 2002, 20(4), 281-295
PMID: 11858637

Several different functions have been put forward for evaluating the energetics of ligand binding to proteins. Those employed in the DOCK, GOLD and FlexX docking programs have been especially widely used, particularly in connection with virtual high-throughput screening (vHTS) projects. Until recently, such evaluation functions were usually considered only in conjunction with the docking programs that relied on them. In such studies, the evaluation function in question actually fills two distinct roles: it serves as the objective function being optimized (fitness function), but is also the scoring function used to compare the candidate docking configurations generated by the program. We have used descriptions available in the open literature to create free-standing scoring functions based on those used in DOCK and GOLD, and have implemented the more recently formulated PMF [J. Med. Chem. 42 (1999) 791] scoring function as well. The performance of these functions was examined individually for each of several data sets for which both crystal structures and affinities are available, as was the performance of the FlexX scoring function. Various ways of combining individual scores into a consensus score (CScore) were also considered. The individual and consensus scores were also used to try to pick out configurations most similar to those found in crystal structures from among a set of candidate configurations produced by FlexX docking runs. We find that the reliability and interpretability of results can be improved by combining results from all four functions into a CScore.
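
Note on combining individual scores into a consensus, as described in the abstract above: one simple scheme is to let each scoring function "vote" for the compounds it ranks in its top fraction and to use the vote count as the consensus score. The sketch below illustrates that idea only; it is an assumption and may differ from the CScore definitions examined in the paper. The function names `f1`, `f2`, `f3` and all scores are hypothetical, and lower scores are assumed to be better (a fitness-style function where higher is better would need its sign flipped first).

```python
def consensus_votes(scores_by_function, top_fraction=0.5):
    """Count, for each compound, how many scoring functions rank it in their top fraction.

    scores_by_function: dict mapping function name -> {compound: score},
    with lower scores assumed to be better.
    """
    votes = {}
    for scores in scores_by_function.values():
        ranked = sorted(scores, key=scores.get)  # best (lowest) score first
        n_top = max(1, round(len(ranked) * top_fraction))
        for compound in scores:
            votes.setdefault(compound, 0)
        for compound in ranked[:n_top]:
            votes[compound] += 1
    return votes


# Hypothetical scores for four compounds from three scoring functions.
scores = {
    "f1": {"a": -9.0, "b": -7.0, "c": -3.0, "d": -1.0},
    "f2": {"a": -50.0, "b": -10.0, "c": -40.0, "d": -5.0},
    "f3": {"a": -6.0, "b": -6.5, "c": -2.0, "d": -1.0},
}
votes = consensus_votes(scores, top_fraction=0.5)
```

Voting on ranks rather than raw values sidesteps the fact that different scoring functions use different units and scales.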

• Protein-ligand recognition using spherical harmonic molecular surfaces: towards a fast and efficient filter for large virtual throughput screening.
Cai, Wensheng and Shao, Xueguang and Maigret, Bernard
Journal of molecular graphics & modelling, 2002, 20(4), 313-328
PMID: 11858640

Molecular surfaces are important because surface-shape complementarity is often a necessary condition in protein-ligand interactions and docking studies. We have previously described a fast and efficient method to obtain triangulated surface-meshes by topologically mapping ellipsoids on molecular surfaces. In this paper, we present an extension of our work to spherical harmonic surfaces in order to approximate molecular surfaces of both ligands and receptor-cavities and to easily check the surface-shape complementarity. The method consists of (1) finding lobes and holes on both ligand and cavity surfaces using contour maps of radius functions with spherical harmonic expansions, and (2) superposing the surfaces around a given binding site by minimizing the distance between their respective expansion coefficients. The capabilities of this docking procedure were demonstrated by application to 35 protein-ligand complexes of known crystal structures. The method can also be easily and efficiently used as a filter to detect in a large conformational sampling the possible conformations presenting good complementarity with the receptor site, and being, therefore, good candidates for further more elaborate docking studies. This "virtual screening" was demonstrated on the platelet thrombin receptor.

• Automated docking to multiple target structures: incorporation of protein mobility and structural water heterogeneity in AutoDock.
Osterberg, Fredrik and Morris, Garrett M and Sanner, Michel F and Olson, Arthur J and Goodsell, David S
Proteins, 2002, 46(1), 34-40
PMID: 11746701

Protein motion and heterogeneity of structural waters are approximated in ligand-docking simulations, using an ensemble of protein structures. Four methods of combining multiple target structures within a single grid-based lookup table of interaction energies are tested. The method is evaluated using complexes of 21 peptidomimetic inhibitors with human immunodeficiency virus type 1 (HIV-1) protease. Several of these structures show motion of an arginine residue, which is essential for binding of large inhibitors. A structural water is also present in 20 of the structures, but it must be absent in the remaining one for proper binding. Mean and minimum methods perform poorly, but two weighted average methods permit consistent and accurate ligand docking, using a single grid representation of the target protein structures.
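
Note on combining several target structures into a single grid, as described in the abstract above: the sketch below merges per-structure interaction-energy grids point by point using a mean, a minimum, and a Boltzmann-style weighted average. The abstract does not give the weighting formulas, so the Boltzmann form, the `kT` value (about 0.593 kcal/mol near room temperature), the function name `combine_grids`, and the toy energies are assumptions for illustration, not the AutoDock implementation.

```python
import math


def combine_grids(grids, method="mean", kT=0.593):
    """Combine equally sized energy grids (flat lists, kcal/mol), one per structure."""
    n_points = len(grids[0])
    if any(len(grid) != n_points for grid in grids):
        raise ValueError("all grids must have the same number of points")
    combined = []
    for values in zip(*grids):  # energies at one grid point across all structures
        if method == "mean":
            combined.append(sum(values) / len(values))
        elif method == "min":
            combined.append(min(values))
        elif method == "boltzmann":
            # Shift by the minimum so the exponentials stay numerically stable.
            e_min = min(values)
            weights = [math.exp(-(e - e_min) / kT) for e in values]
            combined.append(sum(w * e for w, e in zip(weights, values)) / sum(weights))
        else:
            raise ValueError("unknown method: " + method)
    return combined


# Two hypothetical structures, two grid points each. At the second point one
# structure has a steric clash (100.0) that the other does not.
grids = [[-1.0, 5.0], [-3.0, 100.0]]
mean_grid = combine_grids(grids, "mean")
min_grid = combine_grids(grids, "min")
boltz_grid = combine_grids(grids, "boltzmann")
```

In this toy case the plain mean lets the single clash dominate the second grid point, while the weighted average stays close to the favorable value without collapsing entirely to the minimum.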

## 2001

• Detailed analysis of scoring functions for virtual screening.
Stahl, M and Rarey, M
Journal of medicinal chemistry, 2001, 44(7), 1035-1042
PMID: 11297450

We present a comprehensive study of the performance of fast scoring functions for library docking using the program FlexX as the docking engine. Four scoring functions, among them two recently developed knowledge-based potentials, are evaluated on seven target proteins whose binding sites represent a wide range of size, form, and polarity. The results of these calculations give valuable insight into strengths and weaknesses of current scoring functions. Furthermore, it is shown that a well-chosen combination of two of the tested scoring functions leads to a new, robust scoring scheme with superior performance in virtual screening.

• High throughput docking for library design and library prioritization.
Diller, D J and Merz, K M
Proteins, 2001, 43(2), 113-124
PMID: 11276081

The prioritization of the screening of combinatorial libraries is an extremely important task for the rapid identification of tight binding ligands and ultimately pharmaceutical compounds. When structural information for the target is available, molecular docking is an approach that can be used for prioritization. Here, we present the initial validation of a new rapid approach to molecular docking developed for prioritizing combinatorial libraries. The algorithm is tested on 103 individual cases from the protein data bank and in nearly 90% of these cases docks the ligand to within 2.0 A of the observed binding mode. Because the mean CPU time is <5 s/mol, this approach can process hundreds of thousands of compounds per week. Furthermore, if a somewhat less thorough search is performed, the search time drops to 1 s/mol, thus allowing millions of compounds to be docked per week and tested for potential activity.

• Docking ligands onto binding site representations derived from proteins built by homology modelling.
Schafferhans, A and Klebe, G
Journal of molecular biology, 2001, 307(1), 407-427
PMID: 11243828     doi: 10.1006/jmbi.2000.4453

Due to the abundant sequence information available from genome projects, an increasing number of structurally unknown proteins, homologous to examples of known 3D structure, will be discovered as new targets for drug design. Since homology models do not provide sufficient accuracy to apply common drug design tools, a new approach, DragHome, has been developed to dock ligands into such approximate protein models. DragHome combines information from homology modelling with ligand data, used by and derived from 3D quantitative structure-activity relationships (QSAR). The binding-site of a model-built protein is analysed in terms of putative ligand interaction sites and translated via Gaussian functions into a functional binding-site description represented by physico-chemical properties. Ligands to be docked onto these binding-site representations are similarly translated into a description based on Gaussian functions. The docking is computed by optimising the overlap between the functional description of the binding site and the ligand, generating multiple solutions. For a set of different ligands, these solutions are ranked according to the internal similarity consistency among the various ligands in the binding modes obtained from docking. DragHome has been validated on examples for which crystal structures are available: structurally distinct thrombin inhibitors were docked onto models of thrombin generated from serine proteases of 28 to 40% sequence identity, yielding ligand binding modes with an average RMS deviation of 1.4 A. Mostly the near-native solutions are ranked best. Molecular flexibility of ligands can be considered in terms of pre-calculated multiple conformers. DragHome has been used to automatically generate an alignment of 88 thrombin inhibitors, for which a significant 3D QSAR model could be derived. The contribution maps resulting from this analysis can be interpreted with respect to the surrounding protein model. 
They highlight inconsistencies and deficiencies present in the model. In future developments, this information could be fed back into a subsequent modelling step to improve the protein model.
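
As a rough illustration of the overlap optimisation described in the DragHome abstract, the sketch below scores a ligand pose by summing the overlap integrals of equal-width spherical Gaussians that carry the same property label. The function name, the property labels and the width parameter `alpha` are assumptions made for this example and are not taken from the paper.

```python
import math

def gaussian_overlap(site_points, ligand_points, alpha=0.5):
    """Sum the overlap integrals of same-property spherical Gaussians.

    Each point is (property_label, (x, y, z)); every Gaussian is
    exp(-alpha * |r - center|^2). Only matching labels contribute.
    The labels and the single shared alpha are illustrative assumptions.
    """
    # Analytic overlap of two equal-width Gaussians separated by d:
    # (pi / (2 * alpha))^(3/2) * exp(-alpha * d^2 / 2)
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for site_label, site_xyz in site_points:
        for lig_label, lig_xyz in ligand_points:
            if site_label != lig_label:
                continue
            d2 = sum((s - l) ** 2 for s, l in zip(site_xyz, lig_xyz))
            total += prefactor * math.exp(-0.5 * alpha * d2)
    return total

# Hypothetical two-point binding site and two candidate ligand poses.
site = [("donor", (0.0, 0.0, 0.0)), ("hydrophobic", (3.0, 0.0, 0.0))]
pose_a = [("donor", (0.2, 0.0, 0.0)), ("hydrophobic", (3.1, 0.0, 0.0))]
pose_b = [("donor", (3.0, 0.0, 0.0)), ("hydrophobic", (0.0, 0.0, 0.0))]
best = max([pose_a, pose_b], key=lambda pose: gaussian_overlap(site, pose))
```

Here `pose_a` wins because its property points sit close to the matching site points. A real implementation would optimise the ligand's rigid-body placement rather than compare two fixed poses.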

• EUDOC: A computer program for identification of drug interaction sites in macromolecules and drug leads from chemical databases
Pang, YP and Perola, E and Xu, K and Prendergast, FG
Journal of computational chemistry, 2001, 22(15), 1750-1771
PMID: 12116409     doi: 10.1002/jcc.1129

The completion of the Human Genome Project, the growing effort on proteomics, and the Structural Genomics Initiative have recently intensified the attention being paid to reliable computer docking programs able to identify molecules that can affect the function of a macromolecule through molecular complexation. We report herein an automated computer docking program, EUDOC, for prediction of ligand-receptor complexes from 3D receptor structures, including metalloproteins, and for identification of a subset enriched in drug leads from chemical databases. This program was evaluated from the standpoints of force field and sampling issues using 154 experimentally determined ligand-receptor complexes and four "real-life" applications of the EUDOC program. The results provide evidence for the reliability and accuracy of the EUDOC program. In addition, key principles underlying molecular recognition, and the effects of structural water molecules in the active site and different atomic charge models on docking results are discussed. (C) 2001 John Wiley & Sons, Inc.

• FlexE: efficient molecular docking considering protein structure variations.
Claussen, H and Buning, C and Rarey, M and Lengauer, T
Journal of molecular biology, 2001, 308(2), 377-395
PMID: 11327774     doi: 10.1006/jmbi.2001.4551

Side-chain or even backbone adjustments upon docking of different ligands to the same protein structure, a phenomenon known as induced fit, are frequently observed. Sometimes point mutations within the active site influence the ligand binding of proteins. Furthermore, for homology derived protein structures there are often ambiguities in side-chain placement and uncertainties in loop modeling which may be critical for docking applications. Nevertheless, only very few molecular docking approaches have taken into account such variations in protein structures. We present the new software tool FlexE which addresses the problem of protein structure variations during docking calculations. FlexE can dock flexible ligands into an ensemble of protein structures which represents the flexibility, point mutations, or alternative models of a protein. The FlexE approach is based on a united protein description generated from the superimposed structures of the ensemble. For varying parts of the protein, discrete alternative conformations are explicitly taken into account, which can be combinatorially joined to create new valid protein structures. FlexE was evaluated using ten protein structure ensembles containing 105 crystal structures from the PDB and one modeled structure with 60 ligands in total. For 50 ligands (83 %) FlexE finds a placement with an RMSD to the crystal structure below 2.0 A. In all cases our results are of similar quality to the best solution obtained by sequentially docking the ligands into all protein structures (cross docking). In most cases the computing time is significantly lower than the accumulated run times for the single structures. FlexE takes about five and a half minutes on average for placing one ligand into the united protein description on a common workstation. The example of the aldose reductase demonstrates the necessity of considering protein structure variations for docking calculations.
We docked three potent inhibitors into four protein structures with substantial conformational changes within the active site. Using only one rigid protein structure for screening would have missed potential inhibitors whereas all inhibitors can be docked taking all protein structures into account.

• DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases.
Ewing, T J and Makino, S and Skillman, A G and Kuntz, I D
Journal of computer-aided molecular design, 2001, 15(5), 411-428
PMID: 11394736

In this paper we describe the search strategies developed for docking flexible molecules to macromolecular sites that are incorporated into the widely distributed DOCK software, version 4.0. The search strategies include incremental construction and random conformation search and utilize the existing Coulombic and Lennard-Jones grid-based scoring function. The incremental construction strategy is tested with a panel of 15 crystallographic testcases, created from 12 unique complexes whose ligands vary in size and flexibility. For all testcases, at least one docked position is generated within 2 A of the crystallographic position. For 7 of 15 testcases, the top scoring position is also within 2 A of the crystallographic position. The algorithm is fast enough to successfully dock a few testcases within seconds and most within 100 s. The incremental construction and the random search strategy are evaluated as database docking techniques with a database of 51 molecules docked to two of the crystallographic testcases. Incremental construction outperforms random search and is fast enough to reliably rank the database of compounds within 15 s per molecule on an SGI R10000 cpu.
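
The Coulombic and Lennard-Jones terms mentioned in the abstract can be sketched as a direct pairwise sum. This is only an illustration: DOCK evaluates the terms on a precomputed grid, whereas the code below loops over atom pairs, and the single `epsilon`, `r_min` and `dielectric` values are hypothetical placeholders rather than DOCK parameters.

```python
import math

COULOMB_K = 332.06  # kcal*A/(mol*e^2), a commonly used conversion factor

def pair_energy(r, q_i, q_j, epsilon, r_min, dielectric=4.0):
    """12-6 Lennard-Jones plus Coulomb energy for one atom pair (kcal/mol)."""
    ratio6 = (r_min / r) ** 6
    lennard_jones = epsilon * (ratio6 * ratio6 - 2.0 * ratio6)
    coulomb = COULOMB_K * q_i * q_j / (dielectric * r)
    return lennard_jones + coulomb

def interaction_score(ligand_atoms, receptor_atoms, epsilon=0.1, r_min=3.8):
    """Sum pair energies over all ligand-receptor atom pairs.

    Atoms are (x, y, z, charge). Using one epsilon and r_min for every
    pair is an assumption for brevity; force fields use per-type values.
    """
    total = 0.0
    for lx, ly, lz, lq in ligand_atoms:
        for rx, ry, rz, rq in receptor_atoms:
            r = math.dist((lx, ly, lz), (rx, ry, rz))
            total += pair_energy(r, lq, rq, epsilon, r_min)
    return total
```

Precomputing the receptor contribution on a grid, as the abstract describes, replaces the inner loop with a lookup, which is what makes database-scale docking affordable.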

## 2000

• A method for including protein flexibility in protein-ligand docking: improving tools for database mining and virtual screening.
Broughton, H B
Journal of molecular graphics & modelling, 2000, 18(3), 247-57, 302-4
PMID: 11021541

Second-generation methods for docking ligands into their biological receptors, such as FLOG, provide for flexibility of the ligand but not of the receptor. Molecular dynamics based methods, such as free energy perturbation, account for flexibility, solvent effects, etc., but are very time consuming. We combined the use of statistical analysis of conformational samples from short-run protein molecular dynamics with grid-based docking protocols and demonstrated improved performance in two test cases. Our statistical analysis explores the importance of the average strength of a potential interaction with the biological target and optionally applies a weighting depending on the variability in the strength of the interaction seen during dynamics simulation. Using these methods, we improved the num-top-ranked 10% of a database of drug-like molecules, in searches based on the three-dimensional structure of the protein. These methods are able to match the ability of manual docking to assess likely inactivity on steric grounds and indeed to rank order ligands from a homologous series of cyclooxygenase-2 inhibitors with good correlation to their true activity. Furthermore, these methods reduce the need for human intervention in setting up molecular docking experiments.

• Knowledge-based scoring function to predict protein-ligand interactions.
Gohlke, H and Hendlich, M and Klebe, G
Journal of molecular biology, 2000, 295(2), 337-356
PMID: 10623530     doi: 10.1006/jmbi.1999.3371

The development and validation of a new knowledge-based scoring function (DrugScore) to describe the binding geometry of ligands in proteins is presented. It discriminates efficiently between well-docked ligand binding modes (root-mean-square deviation <2.0 A with respect to a crystallographically determined reference complex) and those largely deviating from the native structure, e.g. generated by computer docking programs. Structural information is extracted from crystallographically determined protein-ligand complexes using ReLiBase and converted into distance-dependent pair-preferences and solvent-accessible surface (SAS) dependent singlet preferences for protein and ligand atoms. Definition of an appropriate reference state and accounting for inaccuracies inherently present in experimental data is required to achieve good predictive power. The sum of the pair preferences and the singlet preferences is calculated based on the 3D structure of protein-ligand binding modes generated by docking tools. For two test sets of 91 and 68 protein-ligand complexes, taken from the Protein Data Bank (PDB), the calculated score recognizes poses generated by FlexX deviating <2 A from the crystal structure on rank 1 in three quarters of all possible cases. Compared to FlexX, this is a substantial improvement. For ligand geometries generated by DOCK, DrugScore is superior to the "chemical scoring" implemented into this tool, while comparable results are obtained using the "energy scoring" in DOCK. None of the presently known scoring functions achieves comparable power to extract binding modes in agreement with experiment. It is fast to compute, regards implicitly solvation and entropy contributions and produces correctly the geometry of directional interactions. Small deviations in the 3D structure are tolerated and, since only contacts to non-hydrogen atoms are regarded, it is independent from assumptions of protonation states.

• Similarity-driven flexible ligand docking.
Fradera, X and Knegtel, R M and Mestres, J
Proteins, 2000, 40(4), 623-636
PMID: 10899786

A similarity-driven approach to flexible ligand docking is presented. Given a reference ligand or a pharmacophore positioned in the protein active site, the method allows inclusion of a similarity term during docking. Two different algorithms have been implemented, namely, a similarity-penalized docking (SP-DOCK) and a similarity-guided docking (SG-DOCK). The basic idea is to maximally exploit the structural information about the ligand binding mode present in cases where ligand-bound protein structures are available, information that is usually ignored in standard docking procedures. SP-DOCK and SG-DOCK have been derived as modified versions of the program DOCK 4.0, where the similarity program MIMIC acts as a module for the calculation of similarity indices that correct docking energy scores at certain steps of the calculation. SP-DOCK applies similarity corrections to the set of ligand orientations at the end of the ligand incremental construction process, penalizing the docking energy and, thus, having only an effect on the relative ordering of the final solutions. SG-DOCK applies similarity corrections throughout the entire ligand incremental construction process, thus affecting not only the relative ordering of solutions but also actively guiding the ligand docking. The performance of SP-DOCK and SG-DOCK for binding mode assessment and molecular database screening is discussed. When applied to a set of 32 thrombin ligands for which crystal structures are available, SG-DOCK improves the average RMSD by ca. 1 A when compared with DOCK. When those 32 thrombin ligands are included into a set of 1,000 diverse molecules from the ACD, DIV, and WDI databases, SP-DOCK significantly improves the retrieval of thrombin ligands within the first 10% of each of the three databases with respect to DOCK, with minimal additional computational cost. In all cases, comparison of SP-DOCK and SG-DOCK results with those obtained by DOCK and MIMIC is performed.

• DoMCoSAR: a novel approach for establishing the docking mode that is consistent with the structure-activity relationship. Application to HIV-1 protease inhibitors and VEGF receptor tyrosine kinase inhibitors.
Vieth, M and Cummins, D J
Journal of medicinal chemistry, 2000, 43(16), 3020-3032
PMID: 10956210

DoMCoSAR is a novel approach for statistically determining the docking mode that is consistent with a structure-activity relationship. The approach establishes the binding mode for the compounds in a chemical series with the assumption that all molecules exhibit the same binding mode. It involves three stages. In the first stage all molecules that belong to a given chemical series are docked to the active site of the protein target. The only bias used in the docking at this stage involves the location of the protein binding site. Coordinates of the common substructure (CS) that results from the unbiased docking are then clustered to establish the major substructure docking modes. In the second stage all molecules are docked to the major docking modes (MDMs) with constraints based on the common substructure. The third stage generates, for the major docking modes, interaction-based descriptors that include electrostatic, VDW, strain, and solvation contributions. The problem of docking mode evaluation is now reduced to the question of which descriptor set is more predictive. To establish a quantitative comparison of the descriptor sets associated with the major docking modes, we use 50 instances of random 4-fold cross-validation. For each 4-fold cross-validation the predictive squared correlation coefficient (R(2)) is computed. t-Tests are applied to establish significance of the differences in mean R(2) for one docking mode versus another. We test the methodology on two test cases: HIV-1 protease inhibitors (Holloway et al. J. Med. Chem. 1995, 38, 305-317) and vascular endothelial growth factor (VEGF) receptor tyrosine kinase oxoindoles (Sun et al. J. Med. Chem. 1998, 41, 2588-2603). For both test cases there is statistically significant preference for the binding mode consistent with the X-ray structure. The appeal of this methodology is that researchers gain the objectivity of statistical justification for the selected docking mode. 
The methodology is relatively insensitive to subtle variations of the protein structure that include, but are not limited to, side chain and small backbone rearrangement during binding. In addition, predictive models that result from the approach can be used to further optimize chemical series.

• DARWIN: a program for docking flexible molecules.
Taylor, J S and Burnett, R M
Proteins, 2000, 41(2), 173-191
PMID: 10966571

A new program named "DARWIN" has been developed to perform docking calculations with proteins and other biological molecules. The program uses the Genetic Algorithm to optimize the molecule's conformation and orientation under the selective pressure of minimizing the potential energy of the complex. A unique feature of DARWIN is that it communicates with the molecular mechanics program CHARMM to make the energy calculations. A second important feature is its parallel interface, which allows simultaneous use of multiple stand-alone copies of CHARMM to rapidly evaluate large numbers of potential solutions. This permits an "accuracy first" approach to docking, which avoids many of the common assumptions and shortcuts often made to reduce computation time. The method was applied to three protein-carbohydrate complexes: the crystallographically determined structures of Concanavalin A and Fab Se155-4; and a model structure for Fab ME36.1. Conformations close to the crystal structures were obtained with this approach, but some "false positive" solutions were also selected. Many of these could be eliminated by introducing different methods for simulating solvent effects. An effective screening method for docking a database of compounds to a single target enzyme using DARWIN is also presented.

## 1999

• A general and fast scoring function for protein-ligand interactions: a simplified potential approach.
Muegge, I and Martin, Y C
Journal of medicinal chemistry, 1999, 42(5), 791-804
PMID: 10072678     doi: 10.1021/jm980536j

A fast, simplified potential-based approach is presented that estimates the protein-ligand binding affinity based on the given 3D structure of a protein-ligand complex. This general, knowledge-based approach exploits structural information of known protein-ligand complexes extracted from the Brookhaven Protein Data Bank and converts it into distance-dependent Helmholtz free interaction energies of protein-ligand atom pairs (potentials of mean force, PMF). The definition of an appropriate reference state and the introduction of a correction term accounting for the volume taken by the ligand were found to be crucial for deriving the relevant interaction potentials that treat solvation and entropic contributions implicitly. A significant correlation between experimental binding affinities and computed score was found for sets of diverse protein-ligand complexes and for sets of different ligands bound to the same target. For 77 protein-ligand complexes taken from the Brookhaven Protein Data Bank, the calculated score showed a standard deviation from observed binding affinities of 1.8 log Ki units and an R2 value of 0.61. The best results were obtained for the subset of 16 serine protease complexes with a standard deviation of 1.0 log Ki unit and an R2 value of 0.86. A set of 33 inhibitors modeled into a crystal structure of HIV-1 protease yielded a standard deviation of 0.8 log Ki units from measured inhibition constants and an R2 value of 0.74. In contrast to empirical scoring functions that show similar or sometimes better correlation with observed binding affinities, our method does not involve deriving specific parameters that fit the observed binding affinities of protein-ligand complexes of a given training set. We compared the performance of the PMF score, Böhm's score (LUDI), and the SMOG score for eight different test sets of protein-ligand complexes. It was found that for the majority of test sets the PMF score performs best. 
The strength of the new approach presented here lies in its generality as no knowledge about measured binding affinities is needed to derive atomic interaction potentials. The use of the new scoring function in docking studies is outlined.
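
To make the conversion from observed structures to distance-dependent free energies concrete, here is a minimal sketch that bins pair distances and applies A(r) = -kT ln(rho_shell / rho_reference). It omits the ligand volume correction that the abstract calls crucial, and the cutoff, bin width and reference-state choice are assumptions for illustration only.

```python
import math

KT = 0.593  # kcal/mol at roughly 298 K

def pmf_from_distances(distances, cutoff=6.0, bin_width=1.0):
    """Turn observed pair distances into a binned potential of mean force.

    For each radial shell, A = -kT * ln(rho_shell / rho_reference), where
    rho_reference is the mean pair density inside the cutoff sphere
    (an assumed reference state). Empty shells get None.
    """
    n_bins = int(round(cutoff / bin_width))
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < cutoff:
            counts[min(int(d // bin_width), n_bins - 1)] += 1
    total = sum(counts)
    sphere_volume = 4.0 / 3.0 * math.pi * cutoff ** 3
    rho_reference = total / sphere_volume
    potential = []
    for i, count in enumerate(counts):
        inner, outer = i * bin_width, (i + 1) * bin_width
        shell_volume = 4.0 / 3.0 * math.pi * (outer ** 3 - inner ** 3)
        if count == 0:
            potential.append(None)
        else:
            rho_shell = count / shell_volume
            potential.append(-KT * math.log(rho_shell / rho_reference))
    return potential

# Hypothetical distances for one protein-ligand atom-type pair.
example = pmf_from_distances([2.8] * 30 + [5.5] * 10)
```

In this toy input the 2 to 3 A shell is enriched relative to the reference and gets a negative (favourable) value, while the sparsely populated outer shell gets a positive one. A complex would then be scored by summing the binned values over all protein-ligand atom pairs.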

• BLEEP - potential of mean force describing protein-ligand interactions: II. Calculation of binding energies and comparison with experimental data
Alex, A and Forster, MJ and Thornton, JM
Journal of computational chemistry, 1999, 20(11), 1177-1185

We have developed BLEEP (biomolecular ligand energy evaluation protocol), an atomic level potential of mean force (PMF) describing protein-ligand interactions. Here, we present four tests designed to assess different attributes of BLEEP. Calculating the energy of a small hydrogen-bonded complex allows us to compare BLEEP's description of this system with a quantum-chemical description. The results suggest that BLEEP gives an adequate description of hydrogen bonding. A study of the relative energies of various heparin binding geometries for human basic fibroblast growth factor (bFGF) demonstrates that BLEEP performs excellently in identifying low-energy binding modes from decoy conformations for a given protein-ligand complex. We also calculate binding energies for a set of 90 protein-ligand complexes, obtaining a correlation coefficient of 0.74 when compared with experiment. This shows that BLEEP can perform well in the difficult area of ranking the interaction energies of diverse complexes. We also study a set of nine serine proteinase-inhibitor complexes; BLEEP's good performance here illustrates its ability to determine the relative energies of a series of similar complexes. We find that a protocol for incorporating solvation does not improve correlation with experiment.

• The sensitivity of the results of molecular docking to induced fit effects: application to thrombin, thermolysin and neuraminidase.
Murray, C W and Baxter, C A and Frenkel, A D
Journal of computer-aided molecular design, 1999, 13(6), 547-562
PMID: 10584214

This paper describes the application of PRO_LEADS to the flexible docking of ligands into crystallographically derived enzyme structures that are assumed to be rigid. PRO_LEADS uses a Tabu search methodology to perform the flexible search and an empirically derived estimate of the binding affinity to drive the docking process. The paper tests the extent to which the assumption of a rigid enzyme compromises the accuracy of the results. All-pairs docking experiments are performed for three enzymes (thrombin, thermolysin and influenza virus neuraminidase) based on six or more ligand-enzyme crystal structures for each enzyme. In 76% of the cases, PRO_LEADS can successfully identify the correct ligand conformation as the lowest energy configuration when the enzyme structure is derived from that ligand's crystal structure, but the methodology only docks 49% of the cases successfully when the ligand is docked against enzyme crystal structures derived from other ligands. Small movements in the enzyme structure lead to an under-prediction in the energy of the correct binding mode by up to 14 kJ/mol and in some cases this under-prediction can lead to the native mode not being recognised as the lowest energy solution. The type of movements responsible for mis-docking are: the movement of sidechains as a result of changes in C alpha position; the movement of sidechains without changes in C alpha position; the movement of flexible portions of main chains to facilitate the formation of hydrogen bonds; and the movement of metal atoms bound to the enzyme active site. The work illustrates that the assumption of a rigid active site can lead to errors in identification of the correct binding mode and the assessment of binding affinity, even for enzymes which show relatively small shift in atomic positions from one ligand to the next. 
A good docking code, such as PRO_LEADS, can usually dock successfully if there is induced fit in relatively rigid enzymes but there remains the need to develop improved strategies for dealing with enzyme flexibility. The work implies that treatments of enzyme flexibility which focus only on sidechain rotations will not deal with the critical shifts responsible for mis-docking of ligands in thrombin, thermolysin and neuraminidase. The paper demonstrates the utility of all pairs docking experiments as a method of assessing the effectiveness of docking methodologies in dealing with enzyme flexibility.

• PRODOCK: Software package for protein modeling and docking
Trosset, JY and Scheraga, HA
Journal of computational chemistry, 1999, 20(4), 412-427

A new software package, PRODOCK, for protein modeling and flexible docking is presented. The protein system is described in internal coordinates with an arbitrary level of flexibility for the proteins or ligands. The protein is represented by an all-atom model with the ECEPP/3 or AMBER IV force field, depending on whether the ligand is a peptidic molecule or not. PRODOCK is based on a new residue data dictionary that makes the programming easier and the definition of molecular flexibility more straightforward. Two versions of the dictionary have been constructed for the ECEPP/3 and AMBER IV geometry, respectively. The global optimization of the energy function is carried out with the scaled collective variable Monte Carlo method plus energy minimization. The incorporation of a local minimization during the conformational sampling has been shown to be very important for distinguishing low-energy non-native conformations from native structures. To make the Monte Carlo minimization method efficient for docking, a new grid-based energy evaluation technique using Bezier splines has been incorporated. This article includes some techniques and simulation tools that significantly improve the efficiency of flexible docking simulations, in particular forward/backward polypeptide chain generation. A comparative study to illustrate the advantage of using quaternions over Euler angles for the rigid-body rotational variables is presented in this paper. Several applications of the program PRODOCK are also discussed. (C) 1999 John Wiley & Sons, Inc.
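
For readers unfamiliar with quaternions as rigid-body rotational variables, the following sketch builds a unit quaternion from an axis and an angle and uses it to rotate a coordinate. The function names are hypothetical and the code is a generic textbook construction, not PRODOCK's implementation.

```python
import math

def quaternion_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    norm = math.sqrt(sum(c * c for c in axis))
    half = 0.5 * angle
    s = math.sin(half) / norm
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)

def rotate(q, v):
    """Rotate vector v by unit quaternion q via the equivalent rotation matrix."""
    w, x, y, z = q
    vx, vy, vz = v
    return (
        (1 - 2 * (y * y + z * z)) * vx + 2 * (x * y - w * z) * vy + 2 * (x * z + w * y) * vz,
        2 * (x * y + w * z) * vx + (1 - 2 * (x * x + z * z)) * vy + 2 * (y * z - w * x) * vz,
        2 * (x * z - w * y) * vx + 2 * (y * z + w * x) * vy + (1 - 2 * (x * x + y * y)) * vz,
    )

# Quarter turn about the z axis applied to the x unit vector.
q = quaternion_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
rotated = rotate(q, (1.0, 0.0, 0.0))
```

One practical reason to prefer this parameterisation in a Monte Carlo search is that a quaternion can be perturbed and renormalised without the singular orientations (gimbal lock) that Euler angles suffer from.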

• MCDOCK: A Monte Carlo simulation approach to the molecular docking problem
Liu, M and Wang, SM
Journal of computer-aided molecular design, 1999, 13(5), 435-451
PMID: 10483527

Prediction of the binding mode of a ligand (a drug molecule) to its macromolecular receptor, or molecular docking, is an important problem in rational drug design. We have developed a new docking method in which a non-conventional Monte Carlo (MC) simulation technique is employed. A computer program, MCDOCK, was developed to carry out the molecular docking operation automatically. The current version of the MCDOCK program (version 1.0) allows for the full flexibility of ligands in the docking calculations. The scoring function used in MCDOCK is the sum of the interaction energy between the ligand and its receptor, and the conformational energy of the ligand. To validate the MCDOCK method, 19 small ligands, the binding modes of which had been determined experimentally using X-ray diffraction, were docked into their receptor binding sites. To produce statistically significant results, 20 MCDOCK runs were performed for each protein-ligand complex. It was found that a significant percentage of these MCDOCK runs converge to the experimentally observed binding mode. The root-mean-square (rms) of all non-hydrogen atoms of the ligand between the predicted and experimental binding modes ranges from 0.25 to 1.84 Angstrom for these 19 cases. The computational time for each run on an SGI Indigo2/R10000 varies from less than 1 min to 15 min, depending upon the size and the flexibility of the ligands. Thus MCDOCK may be used to predict the precise binding mode of ligands in lead optimization and to discover novel lead compounds through structure-based database searching.
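
The abstract describes a non-conventional Monte Carlo technique without giving details. For orientation, the sketch below shows the conventional Metropolis scheme that such methods build on, so it should not be read as the MCDOCK algorithm itself. The pose representation, step size, temperature and the toy energy function are all assumptions.

```python
import math
import random

def metropolis_search(energy, start, step=0.5, temperature=1.0, n_steps=2000, seed=0):
    """Minimise energy(pose) with random moves and the Metropolis criterion.

    A pose is a list of floats (for example translations, rotations and
    torsions). Returns the lowest-energy pose seen and its energy.
    """
    rng = random.Random(seed)
    current, current_e = list(start), energy(start)
    best, best_e = list(current), current_e
    for _ in range(n_steps):
        trial = [x + rng.uniform(-step, step) for x in current]
        trial_e = energy(trial)
        delta = trial_e - current_e
        # Always accept downhill moves; accept uphill ones with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e

def toy_energy(pose):
    """Hypothetical stand-in for a scoring function, minimum at (1, -2, 0.5)."""
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target))

best_pose, best_energy = metropolis_search(toy_energy, [0.0, 0.0, 0.0])
```

Repeating the search with different seeds and counting how many runs end near the same pose mirrors the 20-run convergence check reported in the abstract.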

• Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins.
Charifson, P S and Corkery, J J and Murcko, M A and Walters, W P
Journal of medicinal chemistry, 1999, 42(25), 5100-5109
PMID: 10602695

We present the results of an extensive computational study in which we show that combining scoring functions in an intersection-based consensus approach results in an enhancement in the ability to discriminate between active and inactive enzyme inhibitors. This is illustrated in the context of docking collections of three-dimensional structures into three different enzymes of pharmaceutical interest: p38 MAP kinase, inosine monophosphate dehydrogenase, and HIV protease. An analysis of two different docking methods and thirteen scoring functions provides insights into which functions perform well, both singly and in combination. Our data shows that consensus scoring further provides a dramatic reduction in the number of false positives identified by individual scoring functions, thus leading to a significant enhancement in hit-rates.
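
An intersection-based consensus of the kind described above can be sketched in a few lines: each scoring function nominates its top-ranked fraction of the database, and only compounds nominated by every function are kept. The compound names, scores and the 50% cutoff are invented for the example, and lower scores are assumed to be better.

```python
def top_fraction(scores, fraction):
    """Return the names of the best-scoring `fraction` of compounds.

    `scores` maps compound name to score; lower is taken to be better.
    """
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)
    return set(ranked[:n_keep])

def consensus_hits(score_tables, fraction=0.5):
    """Keep only compounds in the top fraction of every scoring function."""
    selections = [top_fraction(table, fraction) for table in score_tables]
    return set.intersection(*selections)

# Hypothetical scores from three scoring functions for four compounds.
scores_a = {"cpd1": -9.1, "cpd2": -7.4, "cpd3": -8.8, "cpd4": -5.0}
scores_b = {"cpd1": -42.0, "cpd2": -20.0, "cpd3": -25.0, "cpd4": -39.0}
scores_c = {"cpd1": -3.3, "cpd2": -3.1, "cpd3": -1.2, "cpd4": -0.9}
hits = consensus_hits([scores_a, scores_b, scores_c])
```

Only `cpd1` is in the top half of all three lists, so it is the sole consensus hit. Compounds favoured by a single function are dropped, which is how the intersection suppresses false positives.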

• Evaluation of the FLEXX incremental construction algorithm for protein-ligand docking.
Kramer, B and Rarey, M and Lengauer, T
Proteins, 1999, 37(2), 228-241
PMID: 10584068

We report on a test of FLEXX, a fully automatic docking tool for flexible ligands, on a highly diverse data set of 200 protein-ligand complexes from the Protein Data Bank. In total 46.5% of the complexes of the data set can be reproduced by a FLEXX docking solution at rank 1 with an rms deviation (RMSD) from the observed structure of less than 2 A. This rate rises to 70% if one looks at the entire generated solution set. FLEXX produces reliable results for ligands with up to 15 components which can be docked in 80% of the cases with acceptable accuracy. Ligands with more than 15 components tend to generate wrong solutions more often. The average runtime of FLEXX on this test set is 93 seconds per complex on a SUN Ultra-30 workstation. In addition, we report on "cross-docking" experiments, in which several receptor structures of complexes with identical proteins have been used for docking all cocrystallized ligands of these complexes. In most cases, these experiments show that FLEXX can acceptably dock a ligand into a foreign receptor structure. Finally we report on screening runs of ligands out of a library with 556 entries against ten different proteins. In eight cases FLEXX is able to find the original inhibitor within the top 7% of the total library.
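
The success rates quoted in this abstract (rank 1 versus the entire solution set, with a 2 A cutoff) follow from a simple calculation, sketched here under the assumption that atoms are listed in the same order in both structures and that no symmetry correction is applied. The function names are hypothetical.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(pose) != len(reference):
        raise ValueError("pose and reference must have the same number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pose, reference))
    return math.sqrt(squared / len(pose))

def success_rates(ranked_rmsds_per_complex, threshold=2.0):
    """Fraction of complexes solved at rank 1 and at any rank.

    Each entry lists the RMSD of every docking solution for one complex,
    ordered from best-scored to worst-scored.
    """
    n = len(ranked_rmsds_per_complex)
    top1 = sum(1 for r in ranked_rmsds_per_complex if r and r[0] < threshold)
    any_rank = sum(1 for r in ranked_rmsds_per_complex if any(x < threshold for x in r))
    return top1 / n, any_rank / n

# Hypothetical RMSD lists for four complexes.
rank1_rate, any_rate = success_rates([[0.8, 3.0], [4.1, 1.5], [5.0, 6.2], [1.9]])
```

The gap between the two rates separates sampling failures (no solution under the cutoff at any rank) from scoring failures (a good solution exists but is not ranked first).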

• Exhaustive docking of molecular fragments with electrostatic solvation.
Majeux, N and Scarsi, M and Apostolakis, J and Ehrhardt, C and Caflisch, A
Proteins, 1999, 37(1), 88-105
PMID: 10451553

A new method is presented for docking molecular fragments to a rigid protein with evaluation of the binding energy. Polar fragments are docked with at least one hydrogen bond with the protein while apolar fragments are positioned in the hydrophobic pockets. The electrostatic contribution to the binding energy, which consists of screened intermolecular energy and protein and fragment desolvation terms, is evaluated efficiently by a numerical approach based on the continuum dielectric approximation. The latter is also used to predetermine the hydrophobic pockets of the protein by rolling a low dielectric sphere over the protein surface and calculating the electrostatic desolvation of the protein and van der Waals interaction energy. The method was implemented in the program SEED (solvation energy for exhaustive docking). The SEED continuum electrostatic approach has been successfully validated by a comparison with finite difference solutions of the Poisson equation for more than 2,500 complexes of small molecules with thrombin and the monomer of HIV-1 aspartic proteinase. The fragments docked by SEED in the active site of thrombin reproduce the structural features of the interaction patterns between known inhibitors and thrombin. Moreover, the combinatorial connection of these fragments yields a number of compounds that are very similar to potent inhibitors of thrombin. Proteins 1999;37:88-105.

## 1998

• Surface solid angle-based site points for molecular docking.
Hendrix, D K and Kuntz, I D
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 1998, 317-326
PMID: 9697192

We are developing a new site descriptor for the DOCK molecular modeling program suite. Sphgen, the current site description program for the DOCK suite, describes the pockets of a macromolecule by filling a volume with intersecting spheres. DOCK then identifies possible ligand orientations in the pocket by overlapping the atoms of proposed ligands with the sphere centers. Sphgen limits use of the DOCK program to concave binding regions, but macromolecular binding regions can be solvent-exposed rather than buried pockets. We present a more general site descriptor, based on the surface solid angle, which generates site points by determining the solid angle of exposure for points on the surface of the molecule, then identifying patches of surface with similar solid angle values which are then built into site points. We find possible ligand orientations by matching shape-based site points on the ligand and protein and demanding complementary solid angle values. Orientations are evaluated using the DOCK's force field-based score, which evaluates the Coulombic and van der Waals energy. The surface solid angle descriptor displays the complementary characteristics of the interfaces of our test systems: trypsin/trypsin inhibitor, chymotrypsin/turkey ovomucoid third domain, and subtilisin/chymotrypsin inhibitor. The solid angle site points can be used by DOCK to generate orientations within 1.5 A r.m.s.d. of the crystal structure orientation.

• Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function
Morris, GM and Goodsell, DS and Halliday, RS
Journal of computational chemistry, 1998, 19(14), 1639-1662

A novel and robust automated docking method that predicts the bound conformations of flexible ligands to macromolecular targets has been developed and tested, in combination with a new scoring function that estimates the free energy change upon binding. Interestingly, this method applies a Lamarckian model of genetics, in which environmental adaptations of an individual's phenotype are reverse transcribed into its genotype and become heritable traits (sic). We consider three search methods, Monte Carlo simulated annealing, a traditional genetic algorithm, and the Lamarckian genetic algorithm, and compare their performance in dockings of seven protein-ligand test systems having known three-dimensional structure. We show that both the traditional and Lamarckian genetic algorithms can handle ligands with more degrees of freedom than the simulated annealing method used in earlier versions of AUTODOCK, and that the Lamarckian genetic algorithm is the most efficient, reliable, and successful of the three. The empirical free energy function was calibrated using a set of 30 structurally known protein-ligand complexes with experimentally determined binding constants. Linear regression analysis of the observed binding constants in terms of a wide variety of structure-derived molecular properties was performed. The final model had a residual standard error of 9.11 kJ/mol (2.177 kcal/mol) and was chosen as the new energy
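
The Lamarckian idea described in this abstract (locally optimized phenotypes written back into the genotype) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the one-dimensional `energy` function replaces a real docking score, and `local_search`, the selection scheme, and all parameter values are assumptions chosen for brevity, not the AutoDock implementation.

```python
import random

def energy(x):
    # Toy 1-D "docking energy" with its minimum at x = 2.0 (hypothetical stand-in).
    return (x - 2.0) ** 2

def local_search(x, step=0.1, iters=20):
    # Simple descent: try a small move in each direction, keep improvements.
    best, best_e = x, energy(x)
    for _ in range(iters):
        for cand in (best - step, best + step):
            e = energy(cand)
            if e < best_e:
                best, best_e = cand, e
    return best

def lamarckian_ga(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: the locally optimized value replaces the genotype,
        # so the adaptation is inherited by the next generation.
        pop = [local_search(x) for x in pop]
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover (average) plus Gaussian mutation.
            children.append(0.5 * (a + b) + rng.gauss(0.0, 0.5))
        pop = parents + children
    return min(pop, key=energy)

best = lamarckian_ga()
```

A traditional (Darwinian) genetic algorithm would use the local search result only to score an individual, leaving its genotype unchanged; the single line that overwrites `pop` is what makes this sketch Lamarckian.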

• Screening a peptidyl database for potential ligands to proteins with side-chain flexibility.
Schnecke, V and Swanson, C A and Getzoff, E D and Tainer, J A and Kuhn, L A
Proteins, 1998, 33(1), 74-87
PMID: 9741846

The three key challenges addressed in our development of SPECITOPE, a tool for screening large structural databases for potential ligands to a protein, are to eliminate infeasible candidates early in the search, incorporate ligand and protein side-chain flexibility upon docking, and provide an appropriate rank for potential new ligands. The protein ligand-binding site is modeled by a shell of surface atoms and by hydrogen-bonding template points for the ligand to match, conferring specificity to the interaction. SPECITOPE combinatorially matches all hydrogen-bond donors and acceptors of the screened molecules to the template points. By eliminating molecules that cannot match distance or hydrogen-bond constraints, the transformation of potential docking candidates into the ligand-binding site and the shape and hydrophobic complementarity evaluations are only required for a small subset of the database. SPECITOPE screens 140,000 peptide fragments in about an hour and has identified and docked known inhibitors and potential new ligands to the free structures of four distinct targets: a serine protease, a DNA repair enzyme, an aspartic proteinase, and a glycosyltransferase. For all four, protein side-chain rotations were critical for successful docking, emphasizing the importance of inducible complementarity for accurately modeling ligand interactions. SPECITOPE has a range of potential applications for understanding and engineering protein recognition, from inhibitor and linker design to protein docking and macromolecular assembly.

• Prediction of binding constants of protein ligands: a fast method for the prioritization of hits obtained from de novo design or 3D database search programs.
Böhm, H J
Journal of computer-aided molecular design, 1998, 12(4), 309-323
PMID: 9777490

A dataset of 82 protein-ligand complexes of known 3D structure and binding constant Ki was analysed to elucidate the important factors that determine the strength of protein-ligand interactions. The following parameters were investigated: the number and geometry of hydrogen bonds and ionic interactions between the protein and the ligand, the size of the lipophilic contact surface, the flexibility of the ligand, the electrostatic potential in the binding site, water molecules in the binding site, cavities along the protein-ligand interface and specific interactions between aromatic rings. Based on these parameters, a new empirical scoring function is presented that estimates the free energy of binding for a protein-ligand complex of known 3D structure. The function distinguishes between buried and solvent accessible hydrogen bonds. It tolerates deviations in the hydrogen bond geometry of up to 0.25 A in the length and up to 30 degrees in the hydrogen bond angle without penalizing the score. The new energy function reproduces the binding constants (ranging from 3.7 x 10(-2) M to 1 x 10(-14) M, corresponding to binding energies between -8 and -80 kJ/mol) of the dataset with a standard deviation of 7.3 kJ/mol corresponding to 1.3 orders of magnitude in binding affinity. The function can be evaluated very fast and is therefore also suitable for the application in a 3D database search or de novo ligand design program such as LUDI. The physical significance of the individual contributions is discussed.
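
The correspondence between binding constants and binding energies quoted in this abstract follows from the standard relation between free energy and Ki, ΔG = RT ln Ki. The short check below reproduces the quoted figures; the temperature of 298.15 K is an assumption, since the abstract does not state one, and the function names are illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)
T = 298.15             # assumed temperature in K (not stated in the abstract)

def binding_free_energy(ki_molar):
    """Return Delta G = RT ln(Ki) in kJ/mol for a binding constant in mol/L."""
    return R_KJ * T * math.log(ki_molar)

def error_in_log_units(sigma_kj):
    """Convert a free-energy error in kJ/mol into orders of magnitude in Ki."""
    return sigma_kj / (R_KJ * T * math.log(10))

weak = binding_free_energy(3.7e-2)   # weakest binder in the dataset
tight = binding_free_energy(1e-14)   # tightest binder in the dataset
spread = error_in_log_units(7.3)     # reported standard deviation
```

Under this assumption, `weak` is about -8.2 kJ/mol, `tight` is about -79.9 kJ/mol, and `spread` is about 1.28, consistent with the "-8 and -80 kJ/mol" range and the "1.3 orders of magnitude" stated above.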

• Empirical scoring functions. II. The testing of an empirical scoring function for the prediction of ligand-receptor binding affinities and the use of Bayesian regression to improve the quality of the model.
Murray, C W and Auton, T R and Eldridge, M D
Journal of computer-aided molecular design, 1998, 12(5), 503-519
PMID: 9834910

This paper tests the performance of a simple empirical scoring function on a set of candidate designs produced by a de novo design package. The scoring function calculates approximate ligand-receptor binding affinities given a putative binding geometry. To our knowledge this is the first substantial test of an empirical scoring function of this type on a set of molecular designs which were then subsequently synthesised and assayed. The performance illustrates that the methods used to construct the scoring function and the reliance on plausible, yet potentially false, binding modes can lead to significant over-prediction of binding affinity in bad cases. This is anticipated on theoretical grounds and provides caveats on the reliance which can be placed when using the scoring function as a screen in the choice of molecular designs. To improve the predictability of the scoring function and to understand experimental results, it is important to perform subsequent Quantitative Structure-Activity Relationship (QSAR) studies. In this paper, Bayesian regression is performed to improve the predictability of the scoring function in the light of the assay results. Bayesian regression provides a rigorous mathematical framework for the incorporation of prior information, in this case information from the original training set, into a regression on the assay results of the candidate molecular designs. The results indicate that Bayesian regression is a useful and practical technique when relevant prior knowledge is available and that the constraints embodied in the prior information can be used to improve the robustness and accuracy of regression models. We believe this to be the first application of Bayesian regression to QSAR analysis in chemistry.

• An example of a protein ligand found by database mining: description of the docking method and its verification by a 2.3 A X-ray structure of a thrombin-ligand complex.
Burkhard, P and Taylor, P and Walkinshaw, M D
Journal of molecular biology, 1998, 277(2), 449-466
PMID: 9514757     doi: 10.1006/jmbi.1997.1608

A computer program (SANDOCK) has been developed for the automated docking of small ligands to a target protein. It uses a guided matching algorithm to fit ligand atoms into the protein binding pocket. The protein is described by a modified Lee-Richards dotted surface with each dot coded by chemical property and accessibility. Orientations of the ligand in the active site are generated such that both chemical and shape complementarity between the ligand and the active site cavity are fulfilled. The generated fits are evaluated with scoring functions which account for van der Waals, hydrophobic and hydrogen bonding interactions. This newly developed docking program can efficiently screen very large databases in a reasonable time and has been used to successfully identify novel ligands. The X-ray structure of a thrombin-ligand complex predicted by SANDOCK is described. The ligand binds to thrombin with a Kd of 65 microM and has an rmsd of 0.7 A for all ligand atoms from the predicted binding mode by SANDOCK.

• Assessing search strategies for flexible docking
Vieth, M and Hirst, JD and Dominy, BN and Daigler, H and Brooks, CL
Journal of computational chemistry, 1998, 19(14), 1623-1631

We assess the efficiency of molecular dynamics (MD), Monte Carlo (MC), and genetic algorithms (GA) for docking five representative ligand-receptor complexes. All three algorithms employ a modified CHARMM-based energy function. The algorithms are also compared with an established docking algorithm, AutoDock. The receptors are kept rigid while flexibility of ligands is permitted. To test the efficiency of the algorithms, two search spaces are used: an 11-Angstrom-radius sphere and a 2.5-Angstrom-radius sphere, both centered on the active site. We find MD is most efficient in the case of the large search space, and GA outperforms the other methods in the small search space. We also find that MD provides structures that are, on average, lower in energy and closer to the crystallographic conformation. The GA obtains good solutions over the course of the fewest energy evaluations. However, due to the nature of the nonbonded interaction calculations, the GA requires the longest time for a single energy evaluation, which results in a decreased efficiency. The GA and MC search algorithms are implemented in the CHARMM macromolecular package. (C) 1998 John Wiley & Sons, Inc.

## 1997

• Molecular docking to ensembles of protein structures.
Knegtel, R M and Kuntz, I D and Oshiro, C M
Journal of molecular biology, 1997, 266(2), 424-440
PMID: 9047373     doi: 10.1006/jmbi.1996.0776

Until recently, applications of molecular docking assumed that the macromolecular receptor exists in a single, rigid conformation. However, structural studies involving different ligands bound to the same target biomolecule frequently reveal modest but significant conformational changes in the target. In this paper, two related methods for molecular docking are described that utilize information on conformational variability from ensembles of experimental receptor structures. One method combines the information into an "energy-weighted average" of the interaction energy between a ligand and each receptor structure. The other method performs the averaging on a structural level, producing a "geometry-weighted average" of the inter-molecular force field score used in DOCK 3.5. Both methods have been applied in docking small molecules to ensembles of crystal and solution structures, and we show that experimentally determined binding orientations and computed energies of known ligands can be reproduced accurately. The use of composite grids, when conformationally different protein structures are available, yields an improvement in computational speed for database searches in proportion to the number of structures.

• SMoG: de Novo Design Method Based on Simple, Fast, and Accurate Free Energy Estimates. 2. Case Studies in Molecular Design
DeWitte, Robert S and Ishchenko, Alexey V and Shakhnovich, Eugene I
Journal of the American Chemical Society, 1997, 119(20), 4608-4617

In this paper, we summarize three ligand design studies performed using the program SMoG, which was developed in our lab. The aim of this presentation is to communicate through examples the potential of this method: the richness of the molecules that can be developed and the ease with which they are found. In particular, we present suggestions for ligands to Src SH3 domain (specificity pocket and LP site) and CD4.

• Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes.
Eldridge, M D and Murray, C W and Auton, T R and Paolini, G V and Mee, R P
Journal of computer-aided molecular design, 1997, 11(5), 425-445
PMID: 9385547

This paper describes the development of a simple empirical scoring function designed to estimate the free energy of binding for a protein-ligand complex when the 3D structure of the complex is known or can be approximated. The function uses simple contact terms to estimate lipophilic and metal-ligand binding contributions, a simple explicit form for hydrogen bonds and a term which penalises flexibility. The coefficients of each term are obtained using a regression based on 82 ligand-receptor complexes for which the binding affinity is known. The function reproduces the binding affinity of the complexes with a cross-validated error of 8.68 kJ/mol. Tests on internal consistency indicate that the coefficients obtained are stable to changes in the composition of the training set. The function is also tested on two test sets containing a further 20 and 10 complexes, respectively. The deficiencies of this type of function are discussed and it is compared to approaches by other workers.

• QXP: Powerful, rapid computer algorithms for structure-based drug design
McMartin, C and Bohacek, RS
Journal of computer-aided molecular design, 1997, 11(4), 333-344
PMID: 9334900

New methods for docking, template fitting and building pseudo-receptors are described. Full conformational searches are carried out for flexible cyclic and acyclic molecules. QXP (quick explore) search algorithms are derived from the method of Monte Carlo perturbation with energy minimization in Cartesian space. An additional fast search step is introduced between the initial perturbation and energy minimization. The fast search produces approximate low-energy structures, which are likely to minimize to a low energy. For template fitting, QXP uses a superposition force field which automatically assigns short-range attractive forces to similar atoms in different molecules. The docking algorithms were evaluated using X-ray data for 12 protein-ligand complexes. The ligands had up to 24 rotatable bonds and ranged from highly polar to mostly nonpolar. Docking searches of the randomly disordered ligands gave rms differences between the lowest energy docked structure and the energy-minimized X-ray structure, of less than 0.76 Angstrom for 10 of the ligands. For all the ligands, the rms difference between the energy-minimized X-ray structure and the closest docked structure was less than 0.4 Angstrom, when parts of one of the molecules which are in the solvent were excluded from the rms calculation. Template fitting was tested using four ACE inhibitors. Three ACE templates have been previously published. A single run using QXP generated a series of templates which contained examples of each of the three. A pseudo-receptor, complementary to an ACE template, was built out of small molecules, such as pyrrole, cyclo-pentanone and propane. When individually energy minimized in the pseudo-receptor, each of the four ACE inhibitors moved with an rms of less than 0.25 Angstrom. After random perturbation, the inhibitors were docked into the pseudo-receptor. Each lowest energy docked structure matched the energy-minimized geometry with an rms of less than 0.08 Angstrom. 
Thus, the pseudo-receptor shows steric and chemical complementarity to all four molecules. The QXP program is reliable, easy to use and sufficiently rapid for routine application in structure-based drug design.

• Development and validation of a genetic algorithm for flexible docking.
Jones, G and Willett, P and Glen, R C and Leach, A R and Taylor, R
Journal of molecular biology, 1997, 267(3), 727-748
PMID: 9126849     doi: 10.1006/jmbi.1996.0897

Prediction of small molecule binding modes to macromolecules of known three-dimensional structure is a problem of paramount importance in rational drug design (the "docking" problem). We report the development and validation of the program GOLD (Genetic Optimisation for Ligand Docking). GOLD is an automated ligand docking program that uses a genetic algorithm to explore the full range of ligand conformational flexibility with partial flexibility of the protein, and satisfies the fundamental requirement that the ligand must displace loosely bound water on binding. Numerous enhancements and modifications have been applied to the original technique resulting in a substantial increase in the reliability and the applicability of the algorithm. The advanced algorithm has been tested on a dataset of 100 complexes extracted from the Brookhaven Protein DataBank. When used to dock the ligand back into the binding site, GOLD achieved a 71% success rate in identifying the experimental binding mode.

## 1996

• SMoG: de Novo Design Method Based on Simple, Fast, and Accurate Free Energy Estimates. 1. Methodology and Supporting Evidence
DeWitte, Robert S and Shakhnovich, Eugene I
Journal of the American Chemical Society, 1996, 118(47), 11733-11744
doi: 10.1021/ja960751u

In this paper, we present SMoG (Small Molecule Growth), a novel, straightforward method for de novo lead design and the evidence for its effectiveness. It is based on a simple model for ligand-protein interactions and a scoring that is directly related to the free energy through a knowledge-based potential. A large number of structures are examined by an efficient Metropolis Monte Carlo molecular growth algorithm that generates molecules through the adjoining of functional groups directly in the binding region. Thus SMoG is a method that is able to rank a large number of potential compounds according to binding free energy in a short time. In this sense, SMoG represents a step toward an ideal computational tool for ligand design.

• VALIDATE: A New Method for the Receptor-Based Prediction of Binding Affinities of Novel Ligands
Head, Richard D and Smythe, Mark L and Oprea, Tudor I and Waller, Chris L and Green, Stuart M and Marshall, Garland R
Journal of the American Chemical Society, 1996, 118(16), 3959-3969
doi: 10.1021/ja9539002

VALIDATE is a hybrid approach to predict the binding affinity of novel ligands for receptors of known three-dimensional structure. This approach calculates physicochemical properties of the ligand and the receptor-ligand complex to estimate the free energy of binding. The enthalpy of binding is calculated by molecular mechanics while properties such as complementary hydrophobic surface area are used to estimate the entropy of binding through heuristics. A diverse training set of 51 crystalline complexes was assembled, and their relevant physicochemical properties were computed. These properties were analyzed by partial least squares (PLS) statistics, or neural network analysis (SONNIC), to generate models for the general prediction of the affinity of ligands with receptors of known three-dimensional structure. The ability of the model to predict the affinity of novel complexes not included in the training set was demonstrated with three independent test sets: 14 complexes of known three-dimensional structure including 3 DNA complexes, a class of compound not included in the training set, 13 HIV protease inhibitors fit to HIV-1 protease, and 11 thermolysin inhibitors fit to thermolysin.

• Scoring noncovalent protein-ligand interactions: a continuous differentiable function tuned to compute binding affinities.
Jain, A N
Journal of computer-aided molecular design, 1996, 10(5), 427-440
PMID: 8951652

Exploitation of protein structures for potential drug leads by molecular docking is critically dependent on methods for scoring putative protein-ligand interactions. An ideal function for scoring must exhibit predictive accuracy and high computational speed, and must be tolerant of variations in the relative protein-ligand molecular alignment and conformation. This paper describes the development of an empirically derived scoring function, based on the binding affinities of protein-ligand complexes coupled with their crystallographically determined structures. The function's primary terms involve hydrophobic and polar complementarity, with additional terms for entropic and solvation effects. The issue of alignment/conformation dependence was solved by constructing a continuous differentiable nonlinear function with the requirement that maxima in ligand conformation/alignment space corresponded closely to crystallographically determined structures. The expected error in the predicted affinity based on cross-validation was 1.0 log unit. The function is sufficiently fast and accurate to serve as the objective function of a molecular-docking search engine. The function is particularly well suited to the docking problem, since it has spatially narrow maxima that are broadly accessible via gradient descent.

• A fast flexible docking method using an incremental construction algorithm.
Rarey, M and Kramer, B and Lengauer, T and Klebe, G
Journal of molecular biology, 1996, 261(3), 470-489
PMID: 8780787     doi: 10.1006/jmbi.1996.0477

We present an automatic method for docking organic ligands into protein binding sites. The method can be used in the design process of specific protein ligands. It combines an appropriate model of the physico-chemical properties of the docked molecules with efficient methods for sampling the conformational space of the ligand. If the ligand is flexible, it can adopt a large variety of different conformations. Each such minimum in conformational space presents a potential candidate for the conformation of the ligand in the complexed state. Our docking method samples the conformation space of the ligand on the basis of a discrete model and uses a tree-search technique for placing the ligand incrementally into the active site. For placing the first fragment of the ligand into the protein, we use hashing techniques adapted from computer vision. The incremental construction algorithm is based on a greedy strategy combined with efficient methods for overlap detection and for the search of new interactions. We present results on 19 complexes of which the binding geometry has been crystallographically determined. All considered ligands are docked in at most three minutes on a current workstation. The experimentally observed binding mode of the ligand is reproduced with 0.5 to 1.2 A rms deviation. It is almost always found among the highest-ranking conformations computed.

• Molecular docking using surface complementarity.
Sobolev, V and Wade, R C and Vriend, G and Edelman, M
Proteins, 1996, 25(1), 120-129
PMID: 8727324     doi: 10.1002/(SICI)1097-0134(199605)25:1<120::AID-PROT10>3.0.CO;2-M

A method is described to dock a ligand into a binding site in a protein on the basis of the complementarity of the intermolecular atomic contacts. Docking is performed by maximization of a complementarity function that is dependent on atomic contact surface area and the chemical properties of the contacting atoms. The generality and simplicity of the complementarity function ensure that a wide range of chemical structures can be handled. The ligand and the protein are treated as rigid bodies, but displacement of a small number of residues lining the ligand binding site can be taken into account. The method can assist in the design of improved ligands by indicating what changes in complementarity may occur as a result of the substitution of an atom in the ligand. The capabilities of the method are demonstrated by application to 14 protein-ligand complexes of known crystal structure.

• Hammerhead: Fast, fully automated docking of flexible ligands to protein binding sites
Welch, W and Ruppert, J and Jain, AN
Chemistry & Biology, 1996, 3(6), 449-462
PMID: 8807875

Background: Molecular docking seeks to predict the geometry and affinity of the binding of a small molecule to a given protein of known structure. Rigid docking has long been used to screen databases of small molecules, because docking techniques that account for ligand flexibility have either been too slow or have required significant human intervention. Here we describe a docking algorithm, Hammerhead, which is a fast, automated tool to screen for the binding of flexible molecules to protein binding sites. Results: We used Hammerhead to successfully dock a variety of positive control ligands into their cognate proteins. The empirically tuned scoring function of the algorithm predicted binding affinities within 1.3 log units of the known affinities for these ligands. Conformations and alignments close to those determined crystallographically received the highest scores. We screened 80 000 compounds for binding to streptavidin, and biotin was predicted as the top-scoring ligand, with other known ligands included among the highest-scoring dockings. The screen ran in a few days on commonly available hardware. Conclusions: Hammerhead is suitable for screening large databases of flexible molecules for binding to a protein of known structure. It correctly docks a variety of known flexible ligands, and it spends an average of only a few seconds on each compound during a screen. The approach is completely automated, from the elucidation of protein binding sites, through the docking of molecules, to the final selection of compounds for assay.

• Automated docking of flexible ligands: applications of AutoDock.
Goodsell, D S and Morris, G M and Olson, A J
Journal of molecular recognition : JMR, 1996, 9(1), 1-5
PMID: 8723313     doi: 10.1002/(SICI)1099-1352(199601)9:1<1::AID-JMR241>3.0.CO;2-6

AutoDock is a suite of C programs used to predict the bound conformations of a small, flexible ligand to a macromolecular target of known structure. The technique combines simulated annealing for conformation searching with a rapid grid-based method of energy evaluation. This paper reviews recent applications of the technique and describes the enhancements included in the current release.
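
The combination named in this abstract, simulated annealing over a precomputed energy grid, can be sketched in one dimension. This is a toy under stated assumptions: the quadratic energy, grid bounds and spacing, cooling schedule, and step size are all hypothetical choices, and the function names do not correspond to anything in AutoDock.

```python
import math
import random

def build_grid(energy_fn, lo, hi, spacing):
    # Precompute the energy once at regularly spaced points.
    n = int(round((hi - lo) / spacing)) + 1
    return [energy_fn(lo + i * spacing) for i in range(n)]

def grid_energy(grid, lo, spacing, x):
    # Look up the energy by linear interpolation, clamping x to the grid.
    t = min(max((x - lo) / spacing, 0.0), len(grid) - 1)
    i = min(int(t), len(grid) - 2)
    frac = t - i
    return grid[i] * (1.0 - frac) + grid[i + 1] * frac

def anneal(grid, lo, spacing, start, t0=5.0, cooling=0.95,
           cycles=100, steps=50, seed=0):
    rng = random.Random(seed)
    hi = lo + spacing * (len(grid) - 1)
    x, e = start, grid_energy(grid, lo, spacing, start)
    best_x, best_e = x, e
    temp = t0
    for _ in range(cycles):
        for _ in range(steps):
            cand = min(max(x + rng.gauss(0.0, 0.5), lo), hi)
            ce = grid_energy(grid, lo, spacing, cand)
            # Metropolis criterion: always accept downhill moves,
            # accept uphill moves with probability exp(-dE / T).
            if ce <= e or rng.random() < math.exp(-(ce - e) / temp):
                x, e = cand, ce
                if e < best_e:
                    best_x, best_e = x, e
        temp *= cooling  # lower the temperature after each cycle
    return best_x, best_e

# Hypothetical toy energy with its minimum at x = 1.0.
grid = build_grid(lambda x: (x - 1.0) ** 2, -5.0, 5.0, 0.25)
best_x, best_e = anneal(grid, -5.0, 0.25, start=-4.0)
```

The point of the grid is that each energy evaluation during the search becomes a table lookup and interpolation rather than a sum over receptor atoms; a real docking grid would be three-dimensional and would store one map per ligand atom type.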

• Distributed automated docking of flexible ligands to proteins: parallel applications of AutoDock 2.4.
Morris, G M and Goodsell, D S and Huey, R and Olson, A J
Journal of computer-aided molecular design, 1996, 10(4), 293-304
PMID: 8877701

AutoDock 2.4 predicts the bound conformations of a small, flexible ligand to a nonflexible macromolecular target of known structure. The technique combines simulated annealing for conformation searching with a rapid grid-based method of energy evaluation based on the AMBER force field. AutoDock has been optimized in performance without sacrificing accuracy; it incorporates many enhancements and additions, including an intuitive interface. We have developed a set of tools for launching and analyzing many independent docking jobs in parallel on a heterogeneous network of UNIX-based workstations. This paper describes the current release, and the results of a suite of diverse test systems. We also present the results of a systematic investigation into the effects of varying simulated-annealing parameters on molecular docking. We show that even for ligands with a large number of degrees of freedom, root-mean-square deviations of less than 1 A from the crystallographic conformation are obtained for the lowest-energy dockings, although fewer dockings find the crystallographic conformation when there are more degrees of freedom.
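
The root-mean-square deviation used here (and in several other entries) to compare a docked pose with the crystallographic one can be computed as in the sketch below. It assumes both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is performed; symmetry-equivalent atoms are not handled. The function name and sample coordinates are illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared distance between corresponding atoms.
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical two-atom ligand displaced by 1 A along z.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(crystal, docked)
```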

## 1995

• Molecular recognition of the inhibitor AG-1343 by HIV-1 protease: conformationally flexible docking by evolutionary programming.
Gehlhaar, D K and Verkhivker, G M and Rejto, P A and Sherman, C J and Fogel, D B and Fogel, L J and Freer, S T
Chemistry & Biology, 1995, 2(5), 317-324
PMID: 9383433

BACKGROUND: An important prerequisite for computational structure-based drug design is prediction of the structures of ligand-protein complexes that have not yet been experimentally determined by X-ray crystallography or NMR. For this task, docking of rigid ligands is inadequate because it assumes knowledge of the conformation of the bound ligand. Docking of flexible ligands would be desirable, but requires one to search an enormous conformational space. We set out to develop a strategy for flexible docking by combining a simple model of ligand-protein interactions for molecular recognition with an evolutionary programming search technique.

• Flexible Ligand Docking Without Parameter Adjustment Across 4 Ligand-Receptor Complexes
Clark, KP and Ajay
Journal of computational chemistry, 1995, 16(10), 1210-1226

Understanding molecular recognition is one of the fundamental problems in molecular biology. Computationally, molecular recognition is formulated as a docking problem. Ideally, a molecular docking algorithm should be computationally efficient, provide reasonably thorough search of conformational space, obtain solutions with reasonable consistency, and not require parameter adjustments. With these goals in mind, we developed DIVALI (Docking with eVolutionary Algorithms), a program which efficiently and reliably searches for the possible binding modes of a ligand within a fixed receptor. We use an AMBER-type potential function and search for good ligand conformations using a genetic algorithm (GA). We apply our system to study the docking of both rigid and flexible ligands in four different complexes. Our results indicate that it is possible to find diverse binding modes, including structures like the crystal structure, all with comparable potential function values. To achieve this, certain modifications to the standard GA recipe are essential. (C) 1995 by John Wiley & Sons, Inc.

## 1994

• Rational automatic search method for stable docking models of protein and ligand.
Mizutani, M Y and Tomioka, N and Itai, A
Journal of molecular biology, 1994, 243(2), 310-326
PMID: 7932757     doi: 10.1006/jmbi.1994.1656

An efficient automatic method has been developed for docking a ligand molecule to a protein molecule. The method can construct energetically favorable docking models, considering specific interactions between the two molecules and conformational flexibility in the ligand. In the first stage of docking, likely binding modes are searched and estimated effectively in terms of hydrogen bonds, together with conformations in part of the ligand structure that includes hydrogen bonding groups. After that part is placed in the protein cavity and is optimized, conformations in the remaining part are also examined systematically. Finally, several stable docking models are obtained after optimization of the position, orientation and conformation of the whole ligand molecule. In all the screening processes, the total potential energy including intra- and intermolecular interaction energy, consisting of van der Waals, electrostatic and hydrogen bonding energies, is used as the index. The characteristics of our docking method are high accuracy of the results, fully automatic generation of models and short computational time. The efficiency of the method was confirmed by four docking trials using two enzyme systems. In two attempts to dock methotrexate to dihydrofolate reductase and 2'-GMP to ribonuclease T1, the exact structures of complexes in crystals were reproduced as the most stable docking models, without any assumptions concerning the binding modes and ligand conformations. The most stable docking models of dihydrofolate and trimethoprim, respectively, to dihydrofolate reductase were also in good agreement with those suggested by experiment. In all test cases, it was shown that our method can accurately predict the correct docking structures, discriminating the correct model from incorrect ones. 
The efficiency of our method was further tested from the viewpoint of ability to predict the relative stability of the docking structures of two triazine derivatives to dihydrofolate reductase. Our docking method provides a useful tool for rational drug design and investigations of biochemical reaction mechanisms.

• FLOG: a system to select 'quasi-flexible' ligands complementary to a receptor of known three-dimensional structure.
Miller, M D and Kearsley, S K and Underwood, D J and Sheridan, R P
Journal of computer-aided molecular design, 1994, 8(2), 153-174
PMID: 8064332

We present a system, FLOG (Flexible Ligands Oriented on Grid), that searches a database of 3D coordinates to find molecules complementary to a macromolecular receptor of known 3D structure. The philosophy of FLOG is similar to that reported for DOCK [Shoichet, B.K. et al., J. Comput. Chem., 13 (1992) 380]. In common with that system, we use a match center representation of the volume of the binding cavity and we use a clique-finding algorithm to generate trial orientations of each candidate ligand in the binding site. Also we use a grid representation of the receptor to assess the fit of each orientation. We have introduced a number of novel features within this paradigm. First, we address ligand flexibility by including up to 25 explicit conformations of each structure in our databases. Nonhydrogen atoms in each database entry are assigned one of seven atom types (anion, cation, donor, acceptor, polar, hydrophobic and other) based on their local bonded chemical environments. Second, we have devised a new grid-based scoring function compatible with this 'heavy atom' representation of the ligands. This includes several potentials (electrostatic, hydrogen bonding, hydrophobic and van der Waals) calculated from the location of the receptor atoms. Third, we have improved the fitting stage of the search. Initial dockings are generated with a more efficient clique-finding algorithm. This new algorithm includes the concept of 'essential points', match centers that must be paired with a ligand atom. Also, we introduce the use of a rapid simplex-based rigid-body optimizer to refine the orientations. We demonstrate, using dihydrofolate reductase as a sample receptor, that the FLOG system can select known inhibitors from a large database of drug-like compounds.

• ICM - a New Method for Protein Modeling and Design - Applications to Docking and Structure Prediction From the Distorted Native Conformation
Abagyan, R and Totrov, M and Kuznetsov, D
Journal of computational chemistry, 1994, 15(5), 488-506

An efficient methodology, further referred to as ICM, for versatile modeling operations and global energy optimization on arbitrarily fixed multimolecular systems is described. It is aimed at protein structure prediction, homology modeling, molecular docking, nuclear magnetic resonance (NMR) structure determination, and protein design. The method uses and further develops a previously introduced approach to model biomolecular structures in which bond lengths, bond angles, and torsion angles are considered as independent variables, any subset of them being fixed. Here we simplify and generalize the basic description of the system, introduce the variable dihedral phase angle, and allow arbitrary connections of the molecules and conventional definition of the torsion angles. Algorithms for calculation of energy derivatives with respect to internal variables in the topological tree of the system and for rapid evaluation of accessible surface are presented. Multidimensional variable restraints are proposed to represent the statistical information about the torsion angle distributions in proteins. To incorporate complex energy terms such as solvation energy and electrostatics into a structure prediction procedure, a "double-energy" Monte Carlo minimization procedure in which these terms are omitted during the minimization stage of the random step and included for the comparison with the previous conformation in a Markov chain is proposed and justified. The ICM method is applied successfully to a molecular docking problem. The procedure finds the correct parallel arrangement of two rigid helixes from a leucine zipper domain as the lowest-energy conformation (0.5 Angstrom root mean square, rms, deviation from the native structure) starting from a completely random configuration. Structures with antiparallel helixes or helixes staggered by one helix turn had energies higher by about 7 or 9 kcal/mol, respectively. Soft docking was also attempted. A docking procedure allowing side-chain flexibility also converged to the parallel configuration, starting from the helixes optimized individually. To justify an internal coordinate approach to the structure prediction as opposed to a Cartesian one, energy hypersurfaces around the native structure of the squash seeds trypsin inhibitor were studied. Torsion angle minimization from the optimal conformation randomly distorted up to the rms deviation of 2.2 Angstrom or angular rms deviation of 10 degrees restored the native conformation in most cases. In contrast, Cartesian coordinate minimization did not reach the minimum from deviations as small as 0.3 Angstrom or 2 degrees. We conclude that the most promising detailed approach to the protein-folding problem would consist of some coarse global sampling strategy combined with the local energy minimization in the torsion coordinate space.

## 1992

• Automated docking with grid‐based energy evaluation
Meng, EC and Shoichet, BK and Kuntz, ID
Journal of computational chemistry, 1992, 13(4), 505-524

The ability to generate feasible binding orientations of a small molecule within a site of known structure is important for ligand design. We present a method that combines a rapid, geometric docking algorithm with the evaluation of molecular mechanics interaction energies. The computational costs of evaluation are minimal because we precalculate the receptor-dependent terms in the potential function at points on a three-dimensional grid. In four test cases where the components of crystallographically determined complexes are redocked, the 'force field' score correctly identifies the family of orientations closest to the experimental binding geometry. Scoring functions that consider only steric factors or only electrostatic factors are less successful. The force field function will play an important role in our efforts to search databases for potential lead compounds.
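
The grid idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy 12-6 potential with a single hypothetical well distance `R0`, nearest-grid-point lookup, and atoms given as bare coordinates. The names `build_grid` and `score_pose` are invented for this example.

```python
import math

R0 = 4.0  # hypothetical well distance of the toy 12-6 potential (assumption)

def build_grid(receptor_atoms, origin, spacing, shape):
    """Precompute the receptor-dependent energy at every grid point (done once)."""
    grid = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                energy = 0.0
                for atom in receptor_atoms:
                    # Clamp short distances to avoid division by zero.
                    r = max(math.dist(point, atom), 1.0)
                    energy += (R0 / r) ** 12 - 2.0 * (R0 / r) ** 6
                grid[(i, j, k)] = energy
    return grid

def score_pose(grid, ligand_atoms, origin, spacing, shape):
    """Score one ligand orientation by table lookup; no receptor loop is needed."""
    total = 0.0
    for atom in ligand_atoms:
        index = tuple(round((c - o) / spacing) for c, o in zip(atom, origin))
        if any(n < 0 or n >= m for n, m in zip(index, shape)):
            return math.inf  # atom lies outside the precomputed grid
        total += grid[index]
    return total
```

Because the loop over receptor atoms runs only when the grid is built, scoring each trial orientation costs one lookup per ligand atom, which is what makes it practical to evaluate many orientations.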

• A multiple-start Monte Carlo docking method.
Hart, T N and Read, R J
Proteins, 1992, 13(3), 206-222
PMID: 1603810     doi: 10.1002/prot.340130304

We present a method to search for possible binding modes of molecular fragments at a specific site of a potential drug target of known structure. Our method is based on a Monte Carlo (MC) algorithm applied to the translational and rotational degrees of freedom of the probe fragment. Starting from a randomly generated initial configuration, favorable binding modes are generated using a two-step process. An MC run is first performed in which the energy in the Metropolis algorithm is substituted by a score function that measures the average distance of the probe to the target surface. This has the effect of making buried probes move toward the target surface and also allows enhanced sampling of deep pockets. In a second MC run, a pairwise atom potential function is used, and the temperature parameter is slowly lowered during the run (Simulated Annealing). We repeat this procedure starting from a large number of different randomly generated initial configurations in order to find all energetically favorable docking modes in a specified region around the target. We test this method using two inhibitor-receptor systems: Streptomyces griseus proteinase B in complex with the third domain of the ovomucoid inhibitor from turkey, and dihydrofolate reductase from E. coli in complex with methotrexate. The method could consistently reproduce the complex found in the crystal structure searching from random initial positions in cubes ranging from 25 Å to 50 Å about the binding site. In the case of SGPB, we were also successful in docking to the native structure. In addition, we were successful in docking small probes in a search that included the entire protein surface.
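
The two-step, multiple-start procedure described in this abstract might be sketched as follows. This is a simplified illustration under stated assumptions, not the published method: the probe is a single point with translational moves only (rotations are omitted), the two scoring functions are supplied by the caller, and the step sizes, run lengths, and cooling schedule are arbitrary. The names `metropolis` and `multi_start_dock` are hypothetical.

```python
import math
import random

def metropolis(pos, score, temperatures, step, rng):
    """Run one Metropolis walk; one trial move per entry in `temperatures`."""
    current = score(pos)
    for t in temperatures:
        trial = tuple(c + rng.uniform(-step, step) for c in pos)
        new = score(trial)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if new <= current or rng.random() < math.exp(-(new - current) / t):
            pos, current = trial, new
    return pos, current

def multi_start_dock(surface_score, energy, n_starts=20, box=10.0, seed=0):
    """Repeat the two-stage search from many random starting positions."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = tuple(rng.uniform(-box, box) for _ in range(3))
        # Stage 1: constant temperature, score = distance to the target surface.
        pos, _ = metropolis(start, surface_score, [1.0] * 200, 1.0, rng)
        # Stage 2: energy function with a slowly lowered temperature (annealing).
        cooling = [2.0 * 0.99 ** i for i in range(500)]
        pos, e = metropolis(pos, energy, cooling, 0.5, rng)
        results.append((e, pos))
    results.sort()  # lowest-energy binding modes first
    return results
```

A caller would pass, for example, a function returning the distance of the probe from a spherical "surface" as `surface_score` and a pairwise potential as `energy`, then inspect the lowest-energy entries of the returned list.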

## 1991

• Functionality maps of binding sites: A multiple copy simultaneous search method
Miranker, Andrew and Karplus, Martin
Proteins, 1991, 11(1), 29-34
PMID: 1961699     doi: 10.1002/prot.340110104

A new method is proposed for determining energetically favorable positions and orientations for functional groups on the surface of proteins with known three-dimensional structure. From 1,000 to 5,000 copies of a functional group are randomly placed in the site and subjected to simultaneous energy minimization and/or quenched molecular dynamics. The resulting functionality maps of a protein receptor site, which can take account of its flexibility, can be used for the analysis of protein-ligand interactions and rational drug design. Application of the method to the sialic acid binding site of the influenza coat protein, hemagglutinin, yields functional group minima that correspond with those of the ligand in a cocrystal structure.
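
As a rough illustration of the multiple-copy idea, the sketch below places many non-interacting copies of a point probe at random, minimizes each one, and collects the distinct minima. It assumes a toy two-dimensional double-well potential, f(x, y) = (x² − 1)² + y², in place of a real protein force field, and uses plain gradient descent rather than the minimizer or quenched dynamics of the paper. All function names are hypothetical.

```python
import random

def gradient(p):
    """Gradient of the toy potential f(x, y) = (x^2 - 1)^2 + y^2."""
    x, y = p
    return (4.0 * x * (x * x - 1.0), 2.0 * y)

def minimize(p, steps=500, rate=0.05):
    """Plain gradient descent for a single copy."""
    for _ in range(steps):
        g = gradient(p)
        p = (p[0] - rate * g[0], p[1] - rate * g[1])
    return p

def functionality_map(n_copies=50, seed=0):
    """Minimize many randomly placed copies and count copies per minimum."""
    rng = random.Random(seed)
    copies = [(rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
              for _ in range(n_copies)]
    minima = {}
    for copy in copies:
        m = minimize(copy)
        key = (round(m[0], 2), round(m[1], 2))  # merge copies in the same well
        minima[key] = minima.get(key, 0) + 1
    return minima
```

The keys of the returned dictionary play the role of the functional group minima, and the counts indicate how many copies ended in each well.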

## 1982

• A Network Approach for Computational Drug Repositioning
Li, Jiao and Lu, Zhiyong
Journal of molecular biology, 2012, 161(2), 83-83
PMID: 7154081     doi: 10.1109/HISB.2012.26

Computational drug repositioning offers promise for discovering new uses of existing drugs, as drug related molecular, chemical, and clinical information has increased over the past decade and become broadly accessible. In this study, we present a new computational approach for identifying potential new indications of an existing drug through its relation to similar drugs in a disease-drug-target network. When measuring drug pairwise similarity, we used a bipartite-graph based method which combined similarity of drug compound structures, similarity of target protein profiles, and interaction between target proteins. In evaluation, our method compared favorably to the state of the art, achieving AUC of 0.888. The results indicated that our method is able to identify drug repositioning opportunities by exploring complex relationships in a disease-drug-target network.