Bibliography of Computer-Aided Drug Design

Updated on 7/18/2014. Currently 2130 references

Screening / All


  • The optimization of running time for a maximum common substructure-based algorithm and its application in drug design.
    Chen, Jian and Sheng, Jia and Lv, Dijing and Zhong, Yang and Zhang, Guoqing and Nan, Peng
    Computational biology and chemistry, 2014, 48, 14-20
    PMID: 24291488     doi: 10.1016/j.compbiolchem.2013.10.003
    In the field of drug discovery, it is particularly important to discover bioactive compounds through high-throughput virtual screening. The maximum common substructure-based (MCS) algorithm is a promising method for the virtual screening of drug candidates. However, in practical applications, there is always a trade-off between efficiency and accuracy. In this paper, we optimized this method by evaluating its running time on the WHO-defined essential drugs and on FDA-approved small-molecule drugs. The running time allocated to the MCS-based virtual screening was varied, and statistical analysis was conducted to study the impact of computation time on the screening results. We determined that running-time efficiency can be improved without compromising accuracy by setting proper running-time thresholds. In addition, the similarity of compound structures and its relevance to biological activity were analyzed quantitatively, which highlights the applicability of MCS-based methods in predicting the functions of small molecules. A range of 15-30 s was established as reasonable for selecting a candidate running-time threshold. The effect of CPU speed was considered and the conclusion generalized. The potential biological activity of small molecules with unknown functions can be predicted by MCS-based methods.

  • HTS navigator: freely accessible cheminformatics software for analyzing high-throughput screening data.
    Fourches, Denis and Sassano, Maria F and Roth, Bryan L and Tropsha, Alexander
    Bioinformatics (Oxford, England), 2014, 30(4), 588-589
    PMID: 24376084     doi: 10.1093/bioinformatics/btt718
    SUMMARY: We report on the development of the high-throughput screening (HTS) Navigator software to analyze and visualize the results of HTS of chemical libraries. The HTS Navigator processes output files in different plate-reader formats, computes the overall HTS matrix, automatically detects hits and offers different types of baseline navigation and correction features. The software incorporates advanced cheminformatics capabilities such as chemical structure storage and visualization, fast similarity search and chemical neighborhood analysis for retrieved hits. The software is freely available for academic laboratories.

  • Combining in silico and in cerebro approaches for virtual screening and pose prediction in SAMPL4.
    Voet, Arnout R D and Kumar, Ashutosh and Berenger, Francois and Zhang, Kam Y J
    Journal of computer-aided molecular design, 2014
    PMID: 24446075     doi: 10.1007/s10822-013-9702-2
    The SAMPL challenges provide an ideal opportunity for unbiased evaluation and comparison of different approaches used in computational drug design. During the fourth round of the SAMPL challenge, we participated in virtual screening and binding pose prediction for inhibitors targeting the HIV-1 integrase enzyme. For virtual screening, we used well-known and widely used in silico methods combined with personal in cerebro insights and experience. Regular docking performed only slightly better than random selection, but performance improved significantly upon incorporation of additional filters based on pharmacophore queries and electrostatic similarities. The best performance was achieved when logical selection was added. For pose prediction, we used a similar consensus approach that combined the results of Glide-XP docking with structural knowledge and rescoring. The pose prediction results revealed that docking displayed reasonable performance in predicting the binding poses. However, prediction performance can be improved by using scientific experience and rescoring approaches. In both the virtual screening and pose prediction challenges, the top performance was achieved by our approaches. Here we describe the methods and strategies used in our approaches and discuss the rationale behind their performance.

  • DiSCuS: an open platform for (not only) virtual screening results management.
    Wójcikowski, Maciej and Zielenkiewicz, Piotr and Siedlecki, Pawel
    Journal of chemical information and modeling, 2014, 54(1), 347-354
    PMID: 24364790     doi: 10.1021/ci400587f
    DiSCuS, a "Database System for Compound Selection", has been developed. The primary goal of DiSCuS is to aid researchers in the steps subsequent to generating high-throughput virtual screening (HTVS) results, such as selection of compounds for further study, purchase, or synthesis. To do so, DiSCuS provides (1) a storage facility for ligand-receptor complexes (generated with external programs), (2) a number of tools for validating these complexes, such as scoring functions, potential energy contributions, and med-chem features with ligand similarity estimates, and (3) powerful searching and filtering options with logical operators. DiSCuS supports multiple receptor targets for a single ligand, so it can be used either to evaluate different variants of an active site or for selectivity studies. DiSCuS documentation, installation instructions, and source code can be found at .

  • SABRE: ligand/structure-based virtual screening approach using consensus molecular-shape pattern recognition.
    Wei, Ning-Ning and Hamza, Adel
    Journal of chemical information and modeling, 2014, 54(1), 338-346
    PMID: 24328054     doi: 10.1021/ci4005496
    We present an efficient and rational ligand/structure shape-based virtual screening approach combining our previous ligand shape-based similarity SABRE (shape-approach-based routines enhanced) and the 3D shape of the receptor binding site. Our approach exploits the pharmacological preferences of a number of known active ligands to take advantage of the structural diversities and chemical similarities, using a linear combination of weighted molecular shape density. Furthermore, the algorithm generates a consensus molecular-shape pattern recognition that is used to filter and place the candidate structure into the binding pocket. The descriptor pool used to construct the consensus molecular-shape pattern consists of four-dimensional (4D) fingerprints generated from the distribution of conformer states available to a molecule and the 3D shapes of a set of active ligands computed using SABRE software. The virtual screening efficiency of SABRE was validated using the Database of Useful Decoys (DUD) and the filtered version (WOMBAT) of 10 DUD targets. The ligand/structure shape-based similarity SABRE algorithm outperforms several other widely used virtual screening methods that use data fusion of multiscreening tools (2D and 3D fingerprints) and demonstrates a superior early retrieval rate of active compounds (EF(0.1%)

  • istar: a web platform for large-scale protein-ligand docking.
    Li, Hongjian and Leung, Kwong-Sak and Ballester, Pedro J and Wong, Man-Hon
    PloS one, 2014, 9(1), e85678
    PMID: 24475049     doi: 10.1371/journal.pone.0085678
    Protein-ligand docking is a key computational method in the design of starting points for the drug discovery process. We are motivated by the desire to automate large-scale docking using our popular docking engine idock and thus have developed a publicly accessible web platform called istar. Without tedious software installation, users can submit jobs using our website. Our istar website supports 1) filtering ligands by desired molecular properties and previewing the number of ligands to dock, 2) monitoring job progress in real time, and 3) visualizing ligand conformations and outputting free energy and ligand efficiency predicted by idock, binding affinity predicted by RF-Score, putative hydrogen bonds, and supplier information for easy purchase, three useful features commonly lacking in other online docking platforms like DOCK Blaster or iScreen. We have collected 17,224,424 ligands from the All Clean subset of the ZINC database, and revamped our docking engine idock to version 2.0, further improving docking speed and accuracy, and integrating RF-Score as an alternative rescoring function. To compare idock 2.0 with the state-of-the-art AutoDock Vina 1.1.2, we carried out a rescoring benchmark and a redocking benchmark on the 2,897 and 343 protein-ligand complexes of the PDBbind v2012 refined set and the CSAR NRC HiQ Set 24Sept2010, respectively, and an execution time benchmark on 12 diverse proteins and 3,000 ligands of different molecular weight. Results show that, under various scenarios, idock achieves comparable success rates while outperforming AutoDock Vina in docking speed by a factor of 8.69 to 37.51. When evaluated on the PDBbind v2012 core set, our istar platform combined with RF-Score achieves Pearson's and Spearman's correlation coefficients as high as 0.855 and 0.859, respectively, between the experimental binding affinity and the predicted binding affinity of the docked conformation. istar is freely available at
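istar reports Pearson's and Spearman's correlation coefficients between experimental and predicted binding affinities. The minimal Python sketch below shows how the two statistics are computed (Spearman's rho as Pearson's r on average ranks, with tied values sharing a rank). It is an illustration only and is not code taken from istar, idock, or RF-Score.

```python
from statistics import mean


def pearson(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman([4.0, 5.5, 6.1, 7.3], [4.2, 5.0, 6.5, 7.0])` returns 1.0 because both lists have the same ordering, while `pearson` on the same (made-up) values is slightly below 1. Spearman's rho only rewards correct ranking; Pearson's r also rewards a linear relationship.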


  • SMIfp (SMILES fingerprint) Chemical Space for Virtual Screening and Visualization of Large Databases of Organic Molecules
    Schwartz, Julian and Awale, Mahendra and Reymond, Jean-Louis
    Journal of chemical information and modeling, 2013, 53(8), 1979-1989
    PMID: 23845040     doi: 10.1021/ci400206h
    SMIfp (SMILES fingerprint) is defined here as a scalar fingerprint describing organic molecules by counting the occurrences of 34 different symbols in their SMILES strings, which creates a 34-dimensional chemical space. Ligand-based virtual screening using the city-block distance CBDSMIfp as similarity measure provides good AUC values and enrichment factors for recovering series of actives from the directory of useful decoys (DUD-E) and from ZINC. DrugBank, ChEMBL, ZINC, PubChem, GDB-11, GDB-13, and GDB-17 can be searched by CBDSMIfp using an online SMIfp-browser at . Visualization of the SMIfp chemical space was performed by principal component analysis and color-coded maps of the (PC1, PC2)-planes, with interactive access to the molecules enabled by the Java application SMIfp-MAPPLET available from . These maps spread molecules according to their fraction of aromatic atoms, size and polarity. SMIfp provides a new and relevant entry to explore the small molecule chemical space.
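The SMIfp idea can be sketched as follows: count how often each symbol occurs in a SMILES string and compare molecules by city-block (Manhattan) distance between the count vectors. The abstract does not list the 34 symbols, so the short `SYMBOLS` list below is a hypothetical subset chosen for illustration, and the greedy tokenization (two-letter symbols such as "Cl" matched before "C") is an assumption rather than the published definition.

```python
import re
from collections import Counter

# Hypothetical symbol subset; the published SMIfp uses 34 symbols not listed here.
SYMBOLS = ["Cl", "Br", "C", "c", "N", "n", "O", "o", "S", "s", "F",
           "=", "#", "(", "[", "1", "2", "3"]

# Try longer symbols first so "Cl" is not read as "C" followed by "l".
_TOKEN = re.compile("|".join(re.escape(s) for s in sorted(SYMBOLS, key=len, reverse=True)))


def smifp(smiles):
    """Return a count vector over SYMBOLS; characters not in SYMBOLS are ignored."""
    counts = Counter(_TOKEN.findall(smiles))
    return [counts[s] for s in SYMBOLS]


def city_block(a, b):
    """City-block (Manhattan) distance between two equal-length count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

With this subset, ethanol ("CCO") and ethylamine ("CCN") differ only in one O and one N, so `city_block(smifp("CCO"), smifp("CCN"))` is 2. Because the fingerprint is a plain integer vector, distances are cheap to compute, which is what makes browsing very large databases practical.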

  • Comparing neural-network scoring functions and the state of the art: applications to common library screening.
    Durrant, Jacob D and Friedman, Aaron J and Rogers, Kathleen E and McCammon, J Andrew
    Journal of chemical information and modeling, 2013, 53(7), 1726-1735
    PMID: 23734946     doi: 10.1021/ci400042y
    We compare established docking programs, AutoDock Vina and Schrödinger's Glide, to the recently published NNScore scoring functions. As expected, the best protocol to use in a virtual-screening project is highly dependent on the target receptor being studied. However, the mean screening performance obtained when candidate ligands are docked with Vina and rescored with NNScore 1.0 is not statistically different from the mean performance obtained when docking and scoring with Glide. We further demonstrate that the Vina and NNScore docking scores both correlate with chemical properties like small-molecule size and polarizability. Compensating for these potential biases leads to improvements in virtual-screening performance. Composite NNScore-based scoring functions suited to a specific receptor further improve performance. We are hopeful that the current study will prove useful for those interested in computer-aided drug design.

  • Pathway-based Screening Strategy for Multitarget Inhibitors of Diverse Proteins in Metabolic Pathways.
    Hsu, Kai-Cheng and Cheng, Wen-Chi and Chen, Yen-Fu and Wang, Wen-Ching and Yang, Jinn-Moon
    PLoS computational biology, 2013, 9(7), e1003127
    PMID: 23861662     doi: 10.1371/journal.pcbi.1003127
    Many virtual screening methods have been developed for identifying single-target inhibitors based on the strategy of "one-disease, one-target, one-drug". The hit rates of these methods are often low because they cannot capture the features that play key roles in the biological functions of the target protein. Furthermore, single-target inhibitors are often susceptible to drug resistance and are ineffective for complex diseases such as cancers. Therefore, a new strategy is required for enriching the hit rate and identifying multitarget inhibitors. To address these issues, we propose a pathway-based screening strategy (called PathSiMMap) that uses site-moiety maps to derive binding mechanisms, increase the hit rate and discover multitarget inhibitors. This strategy simultaneously screens multiple target proteins in the same pathway; these proteins bind intermediates with common substructures. When the product of one protein is the substrate of the next protein in the pathway, these proteins possess similar conserved binding environments (pathway anchors) despite their low sequence identity and structural similarity. We successfully discovered two multitarget inhibitors with IC50 of <10 µM for shikimate dehydrogenase and shikimate kinase in the shikimate pathway of Helicobacter pylori. Furthermore, we found two selective inhibitors (IC50 of <10 µM) for shikimate dehydrogenase using the specific anchors derived by our method. Our experimental results reveal that this strategy can enhance hit rates and that the pathway anchors are highly conserved and important for biological functions. We believe that our strategy provides great value for elucidating protein binding mechanisms and discovering multitarget inhibitors.

  • The future of virtual compound screening.
    Heikamp, Kathrin and Bajorath, Jürgen
    Chemical biology & drug design, 2013, 81(1), 33-40
    PMID: 23253129     doi: 10.1111/cbdd.12054
    We provide a future perspective on the virtual screening field. We highlight a number of challenges that virtual screening will likely face as compound data continue to grow at or beyond current rates and as much more target information becomes available. These challenges go beyond computational efficiency issues (which will, of course, also play a critical role). For example, for structure-based approaches, the accuracy of scoring functions and energy calculations will need to be improved. For ligand-based approaches, the compound class-dependence of similarity methods needs to be further explored and relationships between molecular similarity and activity similarity need to be established. We also comment on the current and future value of virtual screening. Opportunities for further development in a postgenome era are also discussed. It is hoped that some of the views and hypotheses we articulate might stimulate further discussion about the virtual screening field going forward.

  • Comparison of confirmed inactive and randomly selected compounds as negative training examples in support vector machine-based virtual screening.
    Heikamp, Kathrin and Bajorath, Jürgen
    Journal of chemical information and modeling, 2013, 53(7), 1595-1601
    PMID: 23799269     doi: 10.1021/ci4002712
    The choice of negative training data for machine learning is a little-explored issue in chemoinformatics. In this study, the influence of alternative sets of negative training data and different background databases on support vector machine (SVM) modeling and virtual screening has been investigated. Target-directed SVM models have been derived on the basis of differently composed training sets containing confirmed inactive molecules or randomly selected database compounds as negative training instances. These models were then applied to search background databases consisting of biological screening data or randomly assembled compounds for available hits. Negative training data were found to systematically influence compound recall in virtual screening. In addition, different background databases had a strong influence on the search results. Our findings also indicated that typical benchmark settings lead to an overestimation of SVM-based virtual screening performance compared to search conditions that are more relevant for practical applications.

  • DockoMatic 2.0: High Throughput Inverse Virtual Screening and Homology Modeling
    Bullock, Casey and Cornia, Nic and Jacob, Reed and Remm, Andrew and Peavey, Thomas and Weekes, Ken and Mallory, Chris and Oxford, Julia T and McDougal, Owen M. and Andersen, Timothy L
    Journal of chemical information and modeling, 2013, 53(8), 2161-2170
    PMID: 23808933     doi: 10.1021/ci400047w
    DockoMatic is a free and open-source application that unifies a suite of software programs within a user-friendly graphical user interface (GUI) to facilitate molecular docking experiments. Here we describe the release of DockoMatic 2.0; significant software advances include the ability to (1) conduct high-throughput inverse virtual screening (IVS); (2) construct 3D homology models; and (3) customize the user interface. Users can now efficiently set up, start, and manage IVS experiments through the DockoMatic GUI by specifying receptor(s), ligand(s), grid parameter file(s), and docking engine (either AutoDock or AutoDock Vina). DockoMatic automatically generates the needed experiment input files and output directories and allows the user to manage and monitor job progress. Upon job completion, DockoMatic generates a summary of results to facilitate interpretation by the user. DockoMatic functionality has also been expanded to facilitate the construction of 3D protein homology models using the Timely Integrated Modeler (TIM) wizard. The TIM wizard provides an interface that accesses the basic local alignment search tool (BLAST) and MODELLER programs and guides the user through the necessary steps to easily and efficiently create 3D homology models for biomacromolecular structures. The DockoMatic GUI can be customized by the user, and the software design makes it relatively easy to integrate additional docking engines, scoring functions, or third-party programs. DockoMatic is a free, comprehensive molecular docking program for scientists at all levels in both research and education.

  • Broad Coverage of Commercially Available Lead-like Screening Space with Fewer than 350,000 Compounds
    Baell, Jonathan B
    Journal of chemical information and modeling, 2013, 53(1), 39-55
    PMID: 23198812    
    In establishing what we propose is the globe's highest quality collection of available screening compounds, it is convincingly shown that the globe's pool of such compounds is extremely shallow and can be represented by fewer than 350,000 compounds. To support our argument, we discuss and fully disclose our extensive battery of functional group filters. We discuss the use of PAINS filters and also show the effect of similarity exclusion on structure-activity relationships. We show why limited analogue representation requires screening at higher concentrations to capture hit classes for difficult targets that otherwise may be prosecuted unsuccessfully. We construct our arguments in a structurally focused manner to be most useful to medicinal chemists, the key players in drug discovery.

  • Evaluation and Optimization of Virtual Screening Workflows with DEKOIS 2.0 - A Public Library of Challenging Docking Benchmark Sets.
    Bauer, Matthias R and Ibrahim, Tamer M and Vogel, Simon M and Boeckler, Frank M
    Journal of chemical information and modeling, 2013, 53(6), 1447-1462
    PMID: 23705874     doi: 10.1021/ci400115b
    The application of molecular benchmarking sets helps to assess the actual performance of virtual screening (VS) workflows. To improve the efficiency of structure-based VS approaches, the selection and optimization of various parameters can be guided by benchmarking. With the DEKOIS 2.0 library, we aim to further extend and complement the collection of publicly available decoy sets. Based on BindingDB bioactivity data, we provide 81 new and structurally diverse benchmark sets for a wide variety of different target classes. To ensure a meaningful selection of ligands, we address several issues that can be found in bioactivity data. We have improved our previously introduced DEKOIS methodology with enhanced physicochemical matching, now including the consideration of molecular charges, as well as a more sophisticated elimination of latent actives in the decoy set (LADS). We evaluate the docking performance of Glide, GOLD, and AutoDock Vina with our data sets and highlight existing challenges for VS tools. All DEKOIS 2.0 benchmark sets will be made accessible at .

  • HitPick: a web server for hit identification and target prediction of chemical screenings
    Liu, X and Vogt, I and Haque, T and Campillos, M
    Bioinformatics (Oxford, England), 2013, 29(15), 1910-1912
    PMID: 23716196     doi: 10.1093/bioinformatics/btt303
    MOTIVATION: High-throughput phenotypic assays reveal information about the molecules that modulate biological processes, such as disease phenotypes and signaling pathways. In these assays, identifying hits along with their molecular targets is critical to understanding the chemical activities that modulate the biological system. Here, we present HitPick, a web server for the identification of hits in high-throughput chemical screenings and the prediction of their molecular targets. HitPick applies the B-score method for hit identification and a newly developed approach combining 1-nearest-neighbor (1NN) similarity searching and Laplacian-modified naïve Bayesian target models to predict targets of identified hits. The performance of the HitPick web server is presented and discussed. AVAILABILITY: The server can be accessed at
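The B-score used by HitPick is commonly defined as the residual of a two-way (row/column) median polish of each plate, divided by the median absolute deviation (MAD) of those residuals. The sketch below follows that common definition; the fixed number of polishing iterations and the absence of a MAD scaling constant are assumptions, since the abstract does not give HitPick's exact settings.

```python
from statistics import median


def median_polish(plate, n_iter=10):
    """Tukey's two-way median polish; returns the residual matrix.

    plate: list of rows (lists of floats) from one assay plate.
    """
    resid = [list(row) for row in plate]
    n_rows, n_cols = len(resid), len(resid[0])
    for _ in range(n_iter):
        # Sweep out row effects (e.g. a row dispensed by a drifting pipette).
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [x - m for x in resid[i]]
        # Sweep out column effects.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= m
    return resid


def b_scores(plate, n_iter=10):
    """B-score = median-polish residual / MAD of all residuals on the plate."""
    resid = median_polish(plate, n_iter)
    flat = [x for row in resid for x in row]
    center = median(flat)
    mad = median(abs(x - center) for x in flat)
    if mad == 0:
        raise ValueError("MAD of residuals is zero; B-scores are undefined")
    return [[x / mad for x in row] for row in resid]
```

Using medians rather than means keeps a single strong hit from distorting the row and column corrections, so wells with large absolute B-scores stand out even on plates with positional bias.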

  • Hit Identification and Optimization in Virtual Screening: Practical Recommendations Based on a Critical Literature Analysis
    Zhu, Tian and Cao, Shuyi and Su, Pin-Chih and Patel, Ram and Shah, Darshan and Chokshi, Heta B and Szukala, Richard and Johnson, Michael E and Hevener, Kirk E
    Journal of medicinal chemistry, 2013, 56(17), 6560-6572
    PMID: 23688234     doi: 10.1021/jm301916b
    A critical analysis of virtual screening results published between 2007 and 2011 was performed. The activity of reported hit compounds from over 400 studies was compared to their hit identification criteria. Hit rates and ligand efficiencies were calculated to assist in these analyses, and the results were compared with factors such as the size of the virtual library and the number of compounds tested. A series of promiscuity, druglike, and ADMET filters were applied to the reported hits to assess the quality of compounds reported, and a careful analysis of a subset of the studies that presented hit optimization was performed. These data allowed us to make several practical recommendations with respect to selection of compounds for experimental testing, definition of hit identification criteria, and general virtual screening hit criteria to allow for realistic hit optimization. A key recommendation is the use of size-targeted ligand efficiency values as hit identification criteria.
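Ligand efficiency, which this study uses (in a size-targeted form) as a hit identification criterion, is usually defined as binding free energy per heavy atom, LE = -ΔG / N_heavy. The sketch below computes the basic, non-size-targeted LE and assumes ΔG ≈ RT ln(IC50), i.e. that the IC50 is an acceptable stand-in for the dissociation constant; the size-targeted thresholds recommended in the paper are not reproduced here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(ic50_molar, heavy_atoms, temperature=298.15):
    """LE = -dG / N_heavy, in kcal/mol per heavy atom.

    Assumes dG ~ RT ln(IC50), i.e. the IC50 (in mol/L) stands in for Kd.
    """
    if ic50_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("IC50 and heavy-atom count must be positive")
    delta_g = R_KCAL * temperature * math.log(ic50_molar)  # negative when IC50 < 1 M
    return -delta_g / heavy_atoms
```

For a hypothetical 1 µM hit with 25 heavy atoms, `ligand_efficiency(1e-6, 25)` gives roughly 0.33 kcal/mol per heavy atom; the same potency spread over 50 heavy atoms halves the value, which is why larger hits need proportionally higher potency to look equally attractive.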

  • Ligand-Optimized Homology Models of D1 and D2 Dopamine Receptors: Application for Virtual Screening
    Kolaczkowski, Marcin and Bucki, Adam and Feder, Marcin and Pawlowski, Maciej
    Journal of chemical information and modeling, 2013, 53(3), 638-648
    PMID: 23398329    
    Recent breakthroughs in crystallographic studies of G protein-coupled receptors (GPCRs), together with continuous progress in molecular modeling methods, have opened new perspectives for structure-based drug discovery. A crucial enhancement in this area was the development of induced fit docking procedures that allow optimization of binding pocket conformation guided by the features of its active ligands. In the course of our research program aimed at the discovery of novel antipsychotic agents, our attention focused on dopaminergic D2 and D1 receptors (D2R and D1R). Thus we decided to investigate whether the availability of a novel structure of the closely related D3 receptor and the application of induced fit docking procedures for binding pocket refinement would permit the building of models of D2R and D1R that enable successful virtual screening (VS). Here, we provide an in-depth description of the modeling procedure and discuss the results of a VS benchmark we performed to compare the efficiency of the ligand-optimized receptors with that of regular homology models. We observed that application of the ligand-optimized models significantly improved VS performance both in terms of BEDROC (0.325 vs. 0.182 for D1R and 0.383 vs. 0.301 for D2R) and EF1% (17 vs. 11 for D1R and 18 vs. 7.1 for D2R). In contrast, no improvement was observed for the performance of a D2R model built on the D3R template when compared with that derived from the structure of the previously published and more evolutionarily distant β2-adrenergic receptor. The comparison of results for receptors built according to various protocols and templates revealed that the most significant factor for receptor performance was the proper selection of the "tool ligand" used in the induced fit docking procedure. Taken together, our results suggest that the described homology modeling procedure could be a viable tool for structure-based GPCR ligand design, even for targets for which only a relatively distant structural template is available.
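The EF1% values quoted above follow the usual enrichment-factor definition: the fraction of actives among the top x% of a ranked list divided by the fraction of actives in the whole library. A minimal sketch, assuming higher scores rank earlier and that the selection size is rounded up (rounding conventions differ between papers):

```python
import math


def enrichment_factor(scores, is_active, fraction=0.01):
    """EF at `fraction` of the ranked library (e.g. 0.01 for EF1%).

    Assumes higher scores rank earlier; ties keep input order (stable sort).
    """
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    # Small tolerance guards against float error such as 0.07 * 100 = 7.000000000000001.
    n_selected = max(1, math.ceil(fraction * n_total - 1e-9))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, active in ranked[:n_selected] if active)
    return (hits / n_selected) / (n_actives / n_total)
```

For a library of 1,000 compounds containing 10 actives, finding 5 of them in the top 10 gives EF1% = (5/10) / (10/1000) = 50. The maximum possible EF at a given fraction is capped by the number of actives, which is one reason BEDROC is often reported alongside EF.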

  • Are predicted protein structures of any value for binding site prediction and virtual ligand screening?
    Skolnick, Jeffrey and Zhou, Hongyi and Gao, Mu
    Current opinion in structural biology, 2013
    PMID: 23415854     doi: 10.1016/
    The recently developed field of ligand homology modeling (LHM) that extends the ideas of protein homology modeling to the prediction of ligand binding sites and for use in virtual ligand screening has emerged as a powerful new approach. Unlike traditional docking methodologies, LHM can be applied to low-to-moderate resolution predicted as well as experimental structures with little if any diminution in performance; thereby enabling ∼75% of an average proteome to have potentially significant virtual screening predictions. In large scale benchmarking, LHM is able to predict off-target ligand binding. Thus, despite the widespread belief to the contrary, low-to-moderate resolution predicted structures have considerable utility for biochemical function prediction.

  • Scaffold-Focused Virtual Screening: Prospective Application to the Discovery of TTK Inhibitors.
    Langdon, Sarah R and Westwood, Isaac M and van Montfort, Rob L M and Brown, Nathan and Blagg, Julian
    Journal of chemical information and modeling, 2013, 53(5), 1100-1112
    PMID: 23672464     doi: 10.1021/ci400100c
    We describe and apply a scaffold-focused virtual screen based upon scaffold trees to the mitotic kinase TTK (MPS1). Using level 1 of the scaffold tree, we perform both 2D and 3D similarity searches between a query scaffold and a level 1 scaffold library derived from a 2 million compound library; 98 compounds from 27 unique top-ranked level 1 scaffolds are selected for biochemical screening. We show that this scaffold-focused virtual screen prospectively identifies eight confirmed active compounds that are structurally differentiated from the query compound. In comparison, 100 compounds were selected for biochemical screening using a virtual screen based upon whole molecule similarity resulting in 12 confirmed active compounds that are structurally similar to the query compound. We elucidated the binding mode for four of the eight confirmed scaffold hops to TTK by determining their protein-ligand crystal structures; each represents a ligand-efficient scaffold for inhibitor design.

  • Cheminformatics aspects of high throughput screening: from robots to models: symposium summary
    Tseng, Y Jane and Martin, Eric and Bologa, Cristian and Shelat, Anang A
    Journal of computer-aided molecular design, 2013, 27(5), 443-453
    PMID: 23636795     doi: 10.1007/s10822-013-9646-6
    The "Cheminformatics aspects of high throughput screening (HTS): from robots to models" symposium was part of the computers in chemistry technical program at the American Chemical Society National Meeting in Denver, Colorado during the fall of 2011. This symposium brought together researchers from high throughput screening centers and molecular modelers from academia and industry to discuss the integration of currently available high throughput screening data and assays with computational analysis. The topics discussed at this symposium covered the data-infrastructure at various academic, hospital, and National Institutes of Health-funded high throughput screening centers, the cheminformatics and molecular modeling methods used in real world examples to guide screening and hit-finding, and how academic and non-profit organizations can benefit from current high throughput screening cheminformatics resources. Specifically, this article also covers the remarks and discussions in the open panel discussion of the symposium and summarizes the following talks on "Accurate Kinase virtual screening: biochemical, cellular and selectivity", "Selective, privileged and promiscuous chemical patterns in high-throughput screening" and "Visualizing and exploring relationships among HTS hits using network graphs".

  • Docking-Based Virtual Screening of Covalently Binding Ligands: An Orthogonal Lead Discovery Approach
    Schröder, Jörg and Klinger, Anette and Oellien, Frank and Marhofer, Richard J and Duszenko, Michael and Selzer, Paul M
    Journal of medicinal chemistry, 2013, 56(4), 1478-1490
    PMID: 23350811    
    In the pharmaceutical industry, lead discovery strategies and screening collections have been predominantly tailored to discover compounds that modulate target proteins through noncovalent interactions. Conversely, covalent linkage formation is an important mechanism for a number of successful drugs on the market, which in most cases were identified in hindsight rather than through systematic design. In this article, the implementation of a docking-based virtual screening workflow for the retrieval of covalent binders is presented, with human cathepsin K as a test case. Using the docking conditions that led to the best enrichment of known actives, 44 candidate compounds with unknown activity on cathepsin K were finally selected for experimental evaluation. The most potent inhibitor, 4-(N-phenylanilino)-6-pyrrolidin-1-yl-1,3,5-triazine-2-carbonitrile (CP243522), showed a K(i) of 21 nM and was confirmed to have a covalent reversible mechanism of inhibition. The presented approach will have great potential in cases where covalent inhibition is the desired drug discovery strategy.

  • Multiple structures for virtual ligand screening: defining binding site properties-based criteria to optimize the selection of the query.
    Ben Nasr, Nesrine and Guillemain, Hélène and Lagarde, Nathalie and Zagury, Jean-François and Montes, Matthieu
    Journal of chemical information and modeling, 2013, 53(2), 293-311
    PMID: 23312043    
    Virtual ligand screening is an integral part of the modern drug discovery process. Traditional ligand-based virtual screening approaches are fast but require a set of structurally diverse ligands known to bind to the target. Traditional structure-based approaches require high-resolution target protein structures and are computationally demanding. In contrast, the recently developed threading/structure-based FINDSITE-based approaches have the advantage that they are as fast as traditional ligand-based approaches and yet overcome the limitations of traditional ligand- or structure-based approaches. These new methods can use predicted low-resolution structures and infer the likelihood of a ligand binding to a target by utilizing ligand information excised from the target's remote or close homologous proteins and/or libraries of ligand binding databases. Here, we develop an improved version of FINDSITE, FINDSITEfilt, that filters out false positive ligands in threading-identified templates by a better binding site detection procedure that includes information about binding site amino acid similarity. We then combine FINDSITEfilt with FINDSITEX, which uses the publicly available binding databases ChEMBL and DrugBank, for virtual ligand screening. The combined approach, FINDSITEcomb, is compared to two traditional docking methods, AutoDock Vina and DOCK 6, on the DUD benchmark set. It is shown to be significantly better in terms of enrichment factor, dependence on target structure quality, and speed. FINDSITEcomb is then tested for virtual ligand screening on a large set of 3576 generic targets from the DrugBank database as well as a set of 168 human GPCRs. Excluding close homologues, FINDSITEcomb gives an average enrichment factor of 52.1 for generic targets and 22.3 for GPCRs within the top 1% of the screened compound library. Around 65% of the targets have better than random enrichment factors. The performance is insensitive to target structure quality, as long as the structure has a TM-score ≥ 0.4 to the native structure. Thus, FINDSITEcomb makes the screening of millions of compounds across entire proteomes feasible. The FINDSITEcomb web service is freely available for academic users at

  • Discovery of Novel Acetohydroxyacid Synthase Inhibitors as Active Agents against Mycobacterium tuberculosis by Virtual Screening and Bioassay
    Wang, Di and Zhu, Xuelian and Cui, Changjun and Dong, Mei and Jiang, Hualiang and Li, Zhengming and Liu, Zhen and Zhu, Weiliang and Wang, Jian-Guo
    Journal of chemical information and modeling, 2013, 53(2), 343-353
    Acetohydroxyacid synthase (AHAS) has been regarded as a promising drug target against Mycobacterium tuberculosis (MTB) because it catalyzes the biosynthesis of branched-chain amino acids. In this study, 23 novel AHAS inhibitors were identified through molecular docking followed by similarity search. The determined IC50 values against bacterial AHAS range from 0.385 ± 0.026 μM to >200 μM. Five of the identified compounds show significant in vitro activity against the H37Rv strain (MICs in the range of 2.5–80 mg/L) and against clinical MTB strains, including MDR and XDR isolates. More impressively, compounds 5 and 7 remarkably enhance the killing of the pathogen in infected macrophages. This study suggests that the discovered inhibitors can be further developed as novel anti-MTB therapeutics targeting AHAS.

  • Consensus Docking: Improving the Reliability of Docking in a Virtual Screening Context
    Houston, Douglas R and Walkinshaw, Malcolm D
    Journal of chemical information and modeling, 2013, 53(2), 384-390
    PMID: 23351099    
    Structure-based virtual screening relies on scoring the predicted binding modes of compounds docked into the target. Because the accuracy of this scoring relies on the accuracy of the docking, methods that increase docking accuracy are valuable. Here, we present a relatively straightforward method for improving the probability of identifying accurately docked poses. The method is similar in concept to consensus scoring schemes, which have been shown to increase ranking power and thus hit rates, but combines information about predicted binding modes rather than predicted binding affinities. The pose prediction success rate of each docking program alone was found in this trial to be 55% for Autodock, 58% for DOCK, and 64% for Vina. By using more than one docking program to predict the binding pose, correct poses were identified in 82% or more of cases, a significant improvement. In a virtual screen, these more reliably posed compounds can be preferentially advanced to subsequent scoring stages to improve hit rates. Consensus docking can be easily introduced into established structure-based virtual screening methodologies.

  • Fragment-based Shape Signatures: a new tool for virtual screening and drug discovery.
    Zauhar, Randy J and Gianti, Eleonora and Welsh, William J
    Journal of computer-aided molecular design, 2013, 27(12), 1009-1036
    PMID: 24366428     doi: 10.1007/s10822-013-9698-7
    Since its introduction in 2003, the Shape Signatures method has been successfully applied in a number of drug design projects. Because it uses a ray-tracing approach to directly measure molecular shape and properties (as opposed to relying on chemical structure), it excels at scaffold hopping, and is extraordinarily easy to use. Despite its advantages, a significant drawback of the method has hampered its application to certain classes of problems; namely, when the chemical structures considered are large and contain heterogeneous ring-systems, the method produces descriptors that tend to merely measure the overall size of the molecule, and begin to lose selective power. To remedy this, the approach has been reformulated to automatically decompose compounds into fragments using ring systems as anchors, and to likewise partition the ray-trace in accordance with the fragment assignments. Subsequently, descriptors are generated that are fragment-based, and query and target molecules are compared by mapping query fragments onto target fragments in all ways consistent with the underlying chemical connectivity. This has proven to greatly extend the selective power of the method, while maintaining the ease of use and scaffold-hopping capabilities that characterized the original implementation. In this work, we provide a full conceptual description of the next generation Shape Signatures, and we underline the advantages of the method by discussing its practical applications to ligand-based virtual screening. The new approach can also be applied in receptor-based mode, where protein-binding sites (partitioned into subsites) can be matched against the new fragment-based Shape Signatures descriptors of library compounds.

  • A novel and efficient ligand-based virtual screening approach using the HWZ scoring function and an enhanced shape-density model.
    Hamza, Adel and Wei, Ning-Ning and Hao, Ce and Xiu, Zhilong and Zhan, Chang-Guo
    Journal of biomolecular structure & dynamics, 2013, 31(11), 1236-1250
    PMID: 23140256     doi: 10.1080/07391102.2012.732341
    In this work, we extend our previous ligand shape-based virtual screening approach by using the scoring function Hamza-Wei-Zhan (HWZ) score and an enhanced molecular shape-density model for the ligands. The performance of the method has been tested against the 40 targets in the Directory of Useful Decoys and compared with the performance of our previous HWZ score method. The virtual screening results using the novel ligand shape-based approach demonstrated a favorable improvement (area under the receiver operating characteristic curve AUC 

  • Similarity searching for potent compounds using feature selection.
    Vogt, Martin and Bajorath, Jürgen
    Journal of chemical information and modeling, 2013, 53(7), 1613-1619
    PMID: 23808911     doi: 10.1021/ci4003206
    In similarity searching, compound potency is usually not taken into account. Given a set of active reference compounds, similarity to database molecules is calculated using different metrics without considering compound potency as a search parameter. Herein, we introduce a feature selection method for fingerprint similarity searching to maximize compound recall and preferentially detect potent compounds. On the basis of training examples, fingerprint features are selected that identify potent compounds and produce high recall. Using the reduced fingerprint representations, potent hits are preferentially detected, even if reference compounds have only moderate or low potency. Small sets of simple chemical features are found to yield high search performance.

  • Visualization and Virtual Screening of the Chemical Universe Database GDB-17.
    Ruddigkeit, Lars and Blum, Lorenz C and Reymond, Jean-Louis
    Journal of chemical information and modeling, 2013, 53(1), 56-65
    PMID: 23259841     doi: 10.1021/ci300535x
    The chemical universe database GDB-17 contains 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens obeying rules for chemical stability, synthetic feasibility, and medicinal chemistry. GDB-17 was analyzed using 42 integer value descriptors of molecular structure which we term "Molecular Quantum Numbers" (MQN). Principal component analysis and representation of the (PC1, PC2)-plane provided a graphical overview of the GDB-17 chemical space. Rapid ligand-based virtual screening (LBVS) of GDB-17 using the city-block distance CBD(MQN) as a similarity search measure was enabled by a hashed MQN-fingerprint. LBVS of the entire GDB-17 and of selected subsets identified shape similar, scaffold hopping analogs (ROCS > 1.6 and T(SF) < 0.5) of 15 drugs. Over 97% of these analogs occurred within CBD(MQN) ≤ 12 from each drug, a constraint which might help focus advanced virtual screening. An MQN-searchable 50 million subset of GDB-17 is publicly available at .

  • Characterization of small molecule binding. I. Accurate identification of strong inhibitors in virtual screening.
    Ding, Bo and Wang, Jian and Li, Nan and Wang, Wei
    Journal of chemical information and modeling, 2013, 53(1), 114-122
    PMID: 23259763     doi: 10.1021/ci300508m
    Accurately ranking docking poses remains a great challenge in computer-aided drug design. In this study, we present an integrated approach called MIEC-SVM that combines structure modeling and statistical learning to characterize protein-ligand binding based on the complex structure generated from docking. Using HIV-1 protease as a model system, we showed that MIEC-SVM can successfully rank the docking poses and consistently outperforms state-of-the-art scoring functions when the true positives account for only 1% or 0.5% of all the compounds under consideration. More excitingly, we found that MIEC-SVM can achieve significant enrichment in virtual screening even when trained on as few as 50 known inhibitors, especially when enhanced by a model averaging approach. Given these features, we believe MIEC-SVM provides a powerful tool for searching for and designing new drugs.

  • Structure-Based Fragment Screening Is Demonstrated To Be a Practical Lead Discovery Method for a Representative G-Protein-Coupled Receptor
    Stevens, Benjamin D
    Journal of medicinal chemistry, 2013
    PMID: 23614494    

  • Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments.
    Madhavi Sastry, G and Adzhigirey, Matvey and Day, Tyler and Annabhimoju, Ramakrishna and Sherman, Woody
    Journal of computer-aided molecular design, 2013, 27(3), 221-234
    PMID: 23579614     doi: 10.1007/s10822-013-9644-8
    Structure-based virtual screening plays an important role in drug discovery and complements other screening approaches. In general, protein crystal structures are prepared prior to docking in order to add hydrogen atoms, optimize hydrogen bonds, remove atomic clashes, and perform other operations that are not part of the x-ray crystal structure refinement process. In addition, ligands must be prepared to create 3-dimensional geometries, assign proper bond orders, and generate accessible tautomer and ionization states prior to virtual screening. While the prerequisite for proper system preparation is generally accepted in the field, an extensive study of the preparation steps and their effect on virtual screening enrichments has not been performed. In this work, we systematically explore each of the steps involved in preparing a system for virtual screening. We first explore a large number of parameters using the Glide validation set of 36 crystal structures and 1,000 decoys. We then apply a subset of protocols to the DUD database. We show that database enrichment is improved with proper preparation and that neglecting certain steps of the preparation process produces a systematic degradation in enrichments, which can be large for some targets. We provide examples illustrating the structural changes introduced by the preparation that impact database enrichment. While the work presented here was performed with the Protein Preparation Wizard and Glide, the insights and guidance are expected to be generalizable to structure-based virtual screening with other docking methods.

  • Fragment-Based Drug Discovery Using a Multidomain, Parallel MD-MM/PBSA Screening Protocol
    Zhu, Tian and Lee, Hyun and Lei, Hao and Jones, Christopher and Patel, Kavankumar and Johnson, Michael E and Hevener, Kirk E
    Journal of chemical information and modeling, 2013, 53(3), 560-572
    PMID: 23432621    
    We have developed a rigorous computational screening protocol to identify novel fragment-like inhibitors of N(5)-CAIR mutase (PurE), a key enzyme involved in de novo purine synthesis that represents a novel target for the design of antibacterial agents. This computational screening protocol utilizes molecular docking, graphics processing unit (GPU)-accelerated molecular dynamics, and Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) free energy estimations to investigate the binding modes and energies of fragments in the active sites of PurE. PurE is a functional octamer composed of identical subunits. The octameric structure, with its eight active sites, provided a distinct advantage in these studies because, for a given simulation length, we were able to place eight separate fragment compounds in the active sites to increase the throughput of the MM/PBSA analysis. To validate this protocol, we have screened an in-house fragment library consisting of 352 compounds. The theoretical results were then compared with the results of two experimental fragment screens, Nuclear Magnetic Resonance (NMR) and Surface Plasmon Resonance (SPR) binding analyses. In these validation studies, the protocol was able to effectively identify the competitive binders that had been independently identified by experimental testing, suggesting the potential utility of this method for the identification of novel fragments for future development as PurE inhibitors.


  • Directory of Useful Decoys, Enhanced (DUD-E) - Better Ligands and Decoys for Better Benchmarking.
    Mysinger, Michael and Carchia, Michael and Irwin, John J and Shoichet, Brian K
    Journal of medicinal chemistry, 2012, 55(14), 6582-6594
    PMID: 22716043     doi: 10.1021/jm300687e
    A key metric to assess molecular docking remains ligand enrichment against challenging decoys. Whereas the directory of useful decoys (DUD) has been widely used, clear areas for optimization have emerged. Here we describe an improved benchmarking set that includes more diverse targets such as GPCRs and ion channels, totaling 102 proteins with 22,886 clustered ligands drawn from ChEMBL, each with 50 property-matched decoys drawn from ZINC. To ensure chemotype diversity we cluster each target's ligands by their Bemis-Murcko atomic frameworks. We add net charge to the matched physico-chemical properties, and include only the most dissimilar decoys, by topology, from the ligands. An online automated tool generates these improved matched decoys for user-supplied ligands. We test this dataset by docking all 102 targets, using the results to improve the balance between ligand desolvation and electrostatics in DOCK 3.6. The complete DUD-E benchmarking set is freely available at

  • Application of Drug-perturbed Essential Dynamics/Molecular Dynamics (ED/MD) to Virtual Screening and Rational Drug Design.
    Chaudhuri, Rima and Carrillo, Oliver and Laughton, Charles Anthony and Orozco, Modesto
    Journal of chemical theory and computation, 2012, 8(7), 2204-2214
    doi: 10.1021/ct300223c
    We present here the first application of a new algorithm, essential dynamics/molecular dynamics (ED/MD), to the field of small molecule docking. The method uses a previously existing molecular dynamics (MD) ensemble of a protein or protein-drug complex to generate, at very small computational cost, perturbed ensembles that represent ligand-induced binding site flexibility more accurately than the original trajectory. The use of these perturbed ensembles in a standard docking program leads to better performance than the same docking procedure using the crystal structure or ensembles obtained from conventional MD simulations as templates. The simplicity and accuracy of the method open up the possibility of introducing protein flexibility in high-throughput docking experiments.

  • Accessible high-throughput virtual screening molecular docking software for students and educators.
    Jacob, Reed B. and Andersen, Tim and McDougal, Owen M.
    PLoS computational biology, 2012, 8(5), e1002499
    PMID: 22693435     doi: 10.1371/journal.pcbi.1002499
    We survey low-cost high-throughput virtual screening (HTVS) computer programs for instructors who wish to demonstrate molecular docking in their courses. Since HTVS programs are a useful adjunct to the time-consuming and expensive wet bench experiments necessary to discover new drug therapies, the topic of molecular docking is core to the instruction of biochemistry and molecular biology. The availability of HTVS programs, coupled with decreasing costs and advances in computer hardware, has made computational approaches to drug discovery possible at institutional and non-profit budgets. This paper focuses on HTVS programs with graphical user interfaces (GUIs) that use either DOCK or AutoDock for the prediction of ligand-protein binding. We selected DockoMatic, PyRx, DockingServer, and MOLA because their utility has been proven by the research community, they are free or affordable, and the programs operate on a range of computer platforms.

  • FRED and HYBRID docking performance on standardized datasets.
    McGann, Mark
    Journal of computer-aided molecular design, 2012, 26(8), 897-906
    PMID: 22669221     doi: 10.1007/s10822-012-9584-8
    The docking performance of the FRED and HYBRID programs is evaluated on two standardized datasets from the Docking and Scoring Symposium of the ACS Spring 2011 national meeting. The evaluation includes cognate docking and virtual screening performance. FRED docks 70% of the structures to within 2 Å in the cognate docking test. In the virtual screening test, FRED is found to have a mean AUC of 0.75. The HYBRID program uses a modified version of FRED's algorithm that uses both ligand- and structure-based information to dock molecules, which increases its mean AUC to 0.78. HYBRID can also implicitly account for protein flexibility by making use of multiple crystal structures. Using multiple crystal structures improves HYBRID's performance (mean AUC 0.80) with a negligible increase in docking time (~15%).

  • A comparative analysis of pharmacophore screening tools.
    Sanders, Marijn P A and Barbosa, Arménio J M and Zarzycka, Barbara and Nicolaes, Gerry A F and Klomp, Jan P G and de Vlieg, Jacob and Del Rio, Alberto
    Journal of chemical information and modeling, 2012, 52(6), 1607-1620
    PMID: 22646988     doi: 10.1021/ci2005274
    The pharmacophore concept is of central importance in computer-aided drug design (CADD), mainly due to its successful application in medicinal chemistry and, in particular, high-throughput virtual screening (HTVS). The simplicity of the pharmacophore definition enables the complexity of molecular interactions between ligand and receptor to be reduced to a handful of features. With many pharmacophore screening tools available, it is of the utmost interest to explore the behavior of these tools when applied to different biological systems. In this work we present a comparative analysis of eight pharmacophore screening algorithms (Catalyst, Unity, LigandScout, Phase, Pharao, MOE, Pharmer and POT) for their use in typical HTVS campaigns against four different biological targets using default settings. The results show how the performance of each pharmacophore screening tool might be specifically related to factors such as the characteristics of the binding pocket, the use of specific pharmacophore features and the use of these techniques in specific steps/contexts of the drug discovery pipeline. Algorithms with RMSD-based scoring functions are able to predict more compound poses correctly than overlay-based scoring functions. However, the ratio of correctly to incorrectly predicted compound poses is better for overlay-based scoring functions, which also ensure better performance in compound library enrichment. While the ensemble of these observations can be used to choose the most appropriate class of algorithm for specific virtual screening projects, we observed that pharmacophore algorithms are often equally good, and in this respect we also analyzed how pharmacophore algorithms can be combined in order to increase the success of hit compound identification. This study provides a valuable benchmark set for further developments in the field of pharmacophore search algorithms, e.g. by using pose prediction and compound library enrichment criteria.

  • Structure-based drug screening for G-protein-coupled receptors.
    Shoichet, Brian K and Kobilka, Brian K
    Trends in pharmacological sciences, 2012, 33(5), 268-272
    PMID: 22503476     doi: 10.1016/
    G-protein-coupled receptors (GPCRs) represent a large family of signaling proteins that includes many therapeutic targets; however, progress in identifying new small molecule drugs has been disappointing. The past 4 years have seen remarkable progress in the structural biology of GPCRs, raising the possibility of applying structure-based approaches to GPCR drug discovery efforts. Of the various structure-based approaches that have been applied to soluble protein targets, such as proteases and kinases, in silico docking is among the most readily applicable to GPCRs. Early studies suggest that GPCR binding pockets are well suited to docking, and docking screens have identified potent and novel compounds for these targets. This review will focus on the current state of in silico docking for GPCRs.

  • In silico design of small molecules.
    Bernardo, Paul H and Tong, Joo Chuan
    Methods in molecular biology (Clifton, N.J.), 2012, 800, 25-31
    PMID: 21964780     doi: 10.1007/978-1-61779-349-3_3
    Computational methods now play an integral role in modern drug discovery, and include the design and management of small molecule libraries, initial hit identification through virtual screening, optimization of the affinity and selectivity of hits, and improving the physicochemical properties of the lead compounds. In this chapter, we survey the most important data sources for the discovery of new molecular entities, and discuss the key considerations and guidelines for virtual chemical library design.

  • AMMOS software: method and application.
    Pencheva, T and Lagorce, D and Pajeva, I and Villoutreix, B O and Miteva, M A
    Methods in molecular biology (Clifton, N.J.), 2012, 819, 127-141
    PMID: 22183534     doi: 10.1007/978-1-61779-465-0_9
    Recent advances in computational sciences have enabled extensive use of in silico methods in projects at the interface between chemistry and biology. Among them, virtual ligand screening, a modern set of approaches, facilitates hit identification and lead optimization in drug discovery programs. Most of these approaches require the preparation of libraries containing the small organic molecules to be screened or a refinement of the virtual screening results. Here we present an overview of the open source AMMOS software, a platform that performs an automatic procedure for the structural generation and optimization of drug-like molecules in compound collections, as well as the structural refinement of protein-ligand complexes to assist in silico screening exercises.

  • Inverse Virtual Screening allows the discovery of the biological activity of natural compounds.
    Lauro, Gianluigi and Masullo, Milena and Piacente, Sonia and Riccio, Raffaele and Bifulco, Giuseppe
    Bioorganic & Medicinal Chemistry, 2012, 20(11), 3596-3602
    PMID: 22537682     doi: 10.1016/j.bmc.2012.03.072
    A small library of phenolic natural compounds belonging to different chemical classes was screened on a panel of targets involved in the genesis and progression of cancer. The re-investigation of their potential activity was achieved through the Inverse Virtual Screening approach. The normalization of the predicted binding energies permitted the selection of promising compounds on definite targets, avoiding the selection of false positive results. In vitro biological tests revealed the inhibitory activity of xanthohumol and isoxanthohumol on PDK1 and PKC protein kinases. This study validates the robustness of the Inverse Virtual Screening in silico approach as a useful tool for the identification of the specific biological activity of a given set of compounds.

  • Automated recycling of chemistry for virtual screening and library design.
    Vainio, Mikko and Kogej, Thierry and Raubacher, Florian
    Journal of chemical information and modeling, 2012, 52(7), 1777-1786
    PMID: 22657574     doi: 10.1021/ci300157m
    An early stage drug discovery project needs to identify a number of chemically diverse and attractive compounds. These hit compounds are typically found through high-throughput screening campaigns. The diversity of the chemical libraries used in screening is therefore important. In this study, we describe a virtual high-throughput screening system called Virtual Library. The system automatically "recycles" validated synthetic protocols and available starting materials to generate a large number of virtual compound libraries, and allows for fast searches in the generated libraries using a 2D fingerprint based screening method. Virtual Library links the returned virtual hit compounds back to experimental protocols to quickly assess the synthetic accessibility of the hits. The system can be used as an idea generator for library design to enrich the screening collection, and to explore the structure-activity landscape around a specific active compound.

  • Virtual fragment screening: Discovery of histamine H(3) receptor ligands using ligand-based and protein-based molecular fingerprints.
    Sirci, Francesco and Istyastono, Enade P and Vischer, Henry F and Kooistra, Albert J and Nijmeijer, Saskia and Kuijer, Martien and Wijtmans, Maikel and Mannhold, Raimund and Leurs, Rob and de Esch, Iwan J P and de Graaf, Chris
    Journal of chemical information and modeling, 2012, 52(12), 3308-3324
    PMID: 23140085     doi: 10.1021/ci3004094
    Virtual Fragment Screening (VFS) is a promising new method that uses computer models to identify small, fragment-like biologically active molecules as useful starting points for Fragment-Based Drug Discovery (FBDD). However, training sets of true active and inactive fragment-like molecules with which to construct and validate target-customized VFS methods are lacking. We have for the first time explored the possibilities and challenges of VFS using molecular fingerprints derived from a unique set of fragment affinity data for the histamine H(3) receptor (H(3)R), a pharmaceutically relevant G protein-coupled receptor (GPCR). Optimized FLAP (Fingerprint of Ligands And Proteins) models containing essential molecular interaction fields that discriminate known H(3)R binders from inactive molecules were successfully used for the identification of new H(3)R ligands. Prospective virtual screening of 156,090 molecules yielded a high hit rate of 62%: 18 of the 29 tested compounds were experimentally confirmed as novel fragment-like H(3)R ligands that offer new potential starting points for the design of H(3)R-targeting drugs. The first construction and application of customized FLAP models for the discovery of fragment-like biologically active molecules demonstrates that VFS is an efficient way to explore protein-fragment interaction space in silico.

  • Shaping a Screening File for Maximal Lead Discovery Efficiency and Effectiveness: Elimination of Molecular Redundancy.
    Bakken, Gregory A and Boehm, Markus and Bell, Andrew S and Everett, Jeremy R and Gonzales, Rosalia and Hepworth, David and Klug-McLeod, Jacquelyn L and Lanfear, Jeremy and Loesel, Jens and Mathias, John and Wood, Terence P
    Journal of chemical information and modeling, 2012, 52(11), 2937-2949
    PMID: 23062111     doi: 10.1021/ci300372a
    High Throughput Screening (HTS) is a successful strategy for finding hits and leads that have the opportunity to be converted into drugs. In this paper we highlight novel computational methods used to select compounds to build a new screening file at Pfizer and the analytical methods we used to assess their quality. We also introduce the novel concept of molecular redundancy to help decide on the density of compounds required in any region of chemical space in order to be confident of running successful HTS campaigns.

  • Integrated Virtual Screening for the Identification of Novel and Selective Peroxisome Proliferator-Activated Receptor (PPAR) Scaffolds.
    Nevin, Daniel K and Peters, Martin B and Carta, Giorgio and Fayne, Darren and Lloyd, David G
    Journal of medicinal chemistry, 2012, 55(11), 4978-4989
    PMID: 22582973     doi: 10.1021/jm300068n
    We describe a fully customizable and integrated target-specific "tiered" virtual screening approach tailored to identifying and characterizing novel peroxisome proliferator-activated receptor γ (PPARγ) scaffolds. Built on structure- and ligand-based computational techniques, a consensus protocol was developed for use in the virtual screening of chemical databases, focused on the retrieval of novel bioactive chemical scaffolds for PPARγ. As a result, three novel PPAR scaffolds displaying distinct chemotypes have been identified, namely, 5-(4-(benzyloxy)-3-chlorobenzylidene)dihydro-2-thioxopyrimidine-4,6(1H,5H)-dione (MDG 548), 3-((4-bromophenoxy)methyl)-N-(4-nitro-1H-pyrazol-1-yl)benzamide (MDG 559), and ethyl 2-[3-hydroxy-5-(5-methyl-2-furyl)-2-oxo-4-(2-thienylcarbonyl)-2,5-dihydro-1H-pyrrol-1-yl]-4-methyl-1,3-thiazole-5-carboxylate (MDG 582). Fluorescence polarization (FP) and time-resolved fluorescence resonance energy transfer (TR-FRET) assays show that these compounds display high-affinity competitive binding to the PPARγ-LBD (EC(50) of 215 nM to 5.45 μM). Subsequent characterization by a TR-FRET activation reporter assay demonstrated agonism of PPARγ by all three compounds (EC(50) of 467–594 nM). Additionally, differential PPAR isotype specificity was demonstrated through assays against the PPARα and PPARδ subtypes. This work showcases the ability of target-specific "tiered screen" protocols to successfully identify novel scaffolds of individual receptor subtypes with greater efficacy than isolated screening methods.

  • Recent Trends and Applications in 3D Virtual Screening.
    Ghemtio, Léo and Pérez-Nueno, Violeta I and Leroux, Vincent and Asses, Yasmine and Souchet, Michel and Mavridis, Lazaros and Maigret, Bernard and Ritchie, David W
    Combinatorial chemistry & high throughput screening, 2012, 15(9), 749-769
    PMID: 22934947    
    Virtual screening (VS) is becoming an increasingly important approach for identifying and selecting biologically active molecules against specific pharmaceutically relevant targets. Compared to conventional high throughput screening techniques, in silico screening is fast and inexpensive, and is increasing in popularity in early-stage drug discovery endeavours. This paper reviews and discusses recent trends and developments in three-dimensional (3D) receptor-based and ligand-based VS methodologies. First, we describe the concept of accessible chemical space and its exploration. We then describe 3D structural ligand-based VS techniques, hybrid approaches, and new approaches to exploit additional knowledge that can now be found in large chemogenomic databases. We also briefly discuss some potential issues relating to pharmacokinetics, toxicity profiling, target identification and validation, inverse docking, scaffold-hopping and drug re-purposing. We propose that the best way to advance the state of the art in 3D VS is to integrate complementary strategies in a single drug discovery pipeline, rather than to focus only on theoretical or computational improvements of individual techniques. Two recent 3D VS case studies concerning the LXR-β receptor and the CCR5/CXCR4 HIV co-receptors are presented as examples, which implement some of the complementary methods and strategies that are reviewed here.

  • Structure-based virtual screening for drug discovery: a problem-centric review.
    Cheng, Tiejun and Li, Qingliang and Zhou, Zhigang and Wang, Yanli and Bryant, Stephen H
    The AAPS journal, 2012, 14(1), 133-141
    PMID: 22281989     doi: 10.1208/s12248-012-9322-0
    Structure-based virtual screening (SBVS) has been widely applied in early-stage drug discovery. From a problem-centric perspective, we reviewed the recent advances and applications in SBVS with a special focus on docking-based virtual screening. We emphasized the researchers' practical efforts in real projects by understanding the ligand-target binding interactions as a premise. We also highlighted the recent progress in developing target-biased scoring functions by optimizing current generic scoring functions toward certain target classes, as well as in developing novel ones by means of machine learning techniques.

  • FINDSITE X: A Structure-Based, Small Molecule Virtual Screening Approach with Application to All Identified Human GPCRs
    Zhou, Hongyi and Skolnick, Jeffrey
    Molecular Pharmaceutics, 2012, 9(6), 1775-1784
    PMID: 22574683     doi: 10.1021/mp3000716
    We have developed FINDSITEX, an extension of FINDSITE, a protein threading based algorithm for the inference of protein binding sites, biochemical function and virtual ligand screening, that removes the limitation that holo protein structures (those containing bound ligands) of a sufficiently large set of distant evolutionarily related proteins to the target be solved; rather, predicted protein structures and experimental ligand binding information are employed. To provide the predicted protein structures, a fast and accurate version of our recently developed TASSERVMT, TASSERVMT-lite, for template-based protein structural modeling applicable up to 1000 residues is developed and tested, with comparable performance to the top CASP9 servers. Then, a hybrid approach that combines structure alignments with an evolutionary similarity score for identifying functional relationships between target and proteins with binding data has been developed. By way of illustration, FINDSITEX is applied to 998 identified human G-protein coupled receptors (GPCRs). First, TASSERVMT-lite provides updates of all human GPCR structures previously modeled in our lab. We then use these structures and the new function similarity detection algorithm to screen all human GPCRs against the ZINC8 nonredundant (TC < 0.7) ligand set combined with ligands from the GLIDA database (a total of 88,949 compounds). Testing (excluding GPCRs whose sequence identity > 30% to the target from the binding data library) on a 168 human GPCR set with known binding data, the average enrichment factor in the top 1% of the compound library (EF0.01) is 22.7, whereas EF0.01 by FINDSITE is 7.1. For virtual screening when just the target and its native ligands are excluded, the average EF0.01 reaches 41.4. We also analyze off-target interactions for the 168 protein test set. All predicted structures, virtual screening data and off-target interactions for the 998 human GPCRs are available at
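    The EF0.01 values quoted above follow the usual definition of an enrichment factor: the active rate among the top 1% of the ranked library divided by the active rate in the whole library. A minimal Python sketch, assuming higher scores mean "more likely active" and using toy data rather than the paper's:

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[int],
                      fraction: float = 0.01) -> float:
    """Enrichment factor in the top `fraction` of a ranked library.

    Assumes higher scores mean "more likely active"; negate docking
    energies first if lower is better.
    """
    n = len(scores)
    n_actives = sum(is_active)
    if n == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, int(round(n * fraction)))  # size of the top slice
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(is_active[i] for i in ranked[:n_top])
    # (active rate in the top slice) / (active rate in the whole library)
    return (hits_top / n_top) / (n_actives / n)


# Toy data: 1,000 compounds, 10 actives, 5 of which land in the top 1%.
toy_scores = [1000 - i for i in range(1000)]
toy_actives = [1 if i < 5 or 500 <= i < 505 else 0 for i in range(1000)]
print(enrichment_factor(toy_scores, toy_actives, 0.01))  # (5/10) / (10/1000) = 50
```

    With 10 actives in 1,000 compounds, the largest possible EF at 1% is 100, which is why EF values are only comparable between benchmarks with similar active/decoy ratios.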

  • Consensus Induced Fit Docking (cIFD): methodology, validation, and application to the discovery of novel Crm1 inhibitors.
    Kalid, Ori and Toledo Warshaviak, Dora and Shechter, Sharon and Sherman, Woody and Shacham, Sharon
    Journal of computer-aided molecular design, 2012, 26(11), 1217-1228
    PMID: 23053738     doi: 10.1007/s10822-012-9611-9
    We present the Consensus Induced Fit Docking (cIFD) approach for adapting a protein binding site to accommodate multiple diverse ligands for virtual screening. This novel approach results in a single binding site structure that can bind diverse chemotypes and is thus highly useful for efficient structure-based virtual screening. We first describe the cIFD method and its validation on three targets that were previously shown to be challenging for docking programs (COX-2, estrogen receptor, and HIV reverse transcriptase). We then demonstrate the application of cIFD to the challenging discovery of irreversible Crm1 inhibitors. We report the identification of 33 novel Crm1 inhibitors, which resulted from the testing of 402 purchased compounds selected from a screening set containing 261,680 compounds. This corresponds to a hit rate of 8.2 %. The novel Crm1 inhibitors reveal diverse chemical structures, validating the utility of the cIFD method in a real-world drug discovery project. This approach offers a pragmatic way to implicitly account for protein flexibility without the additional computational costs of ensemble docking or including full protein flexibility during virtual screening.

  • Novel Inhibitor Discovery through Virtual Screening against Multiple Protein Conformations Generated via Ligand-Directed Modeling: A Maternal Embryonic Leucine Zipper Kinase Example.
    Mahasenan, Kiran V and Li, Chenglong
    Journal of chemical information and modeling, 2012, 52(5), 1345-1355
    PMID: 22540736     doi: 10.1021/ci300040c
    Kinase targets have been demonstrated to undergo major conformational reorganization upon ligand binding. Such protein conformational plasticity remains a significant challenge in structure-based virtual screening methodology and may be approximated by screening against an ensemble of diverse protein conformations. Maternal embryonic leucine zipper kinase (MELK), a member of the serine-threonine kinase family, has recently been found to be involved in the tumorigenic state of glioblastoma, breast, ovarian, and colon cancers. We therefore modeled several conformers of MELK utilizing the available chemogenomic and crystallographic data of homologous kinases. We carried out docking pose prediction and virtual screening enrichment studies with these conformers. The performances of the ensembles were evaluated by their ability to reproduce known inhibitor bioactive conformations and to efficiently recover known active compounds early in the virtual screen when seeded with decoy sets. A few of the individual MELK conformers performed satisfactorily, reproducing the native protein-ligand pharmacophoric interactions in up to 50% of the cases. By selecting an ensemble of a few representative conformational states, most of the known inhibitor binding poses could be rationalized. For example, a four-conformer ensemble is able to recover 95% of the studied actives, especially with imperfect scoring function(s). The virtual screening enrichment varied considerably among different MELK conformers. Enrichment appears to improve by selection of a proper protein conformation. For example, several holo and unliganded active conformations are better able to accommodate diverse chemotypes than the ATP-bound conformer. These results show that using an ensemble of diverse conformations could give better performance. Applying this approach, we were able to screen a commercially available library of half a million compounds against three conformers to discover three novel inhibitors of MELK, one from each template. Among the three compounds validated via experimental enzyme inhibition assays, one is relatively potent (15; K(d)
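    Screening against several conformers needs a rule for merging per-conformer scores. A common choice, assumed here and not necessarily the rule used in this study, is to keep each compound's best (lowest) docking score over the ensemble and record which conformer produced it. The conformer names and scores below are hypothetical:

```python
from typing import Dict, List, Tuple


def ensemble_rank(docking_scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, str, float]]:
    """Rank compounds by their best score over a conformer ensemble.

    docking_scores maps compound id -> {conformer id -> docking score};
    lower scores are assumed to be better, as with docking energies.
    Returns (compound, best conformer, best score), best first.
    """
    best = []
    for compound, per_conformer in docking_scores.items():
        conformer, score = min(per_conformer.items(), key=lambda kv: kv[1])
        best.append((compound, conformer, score))
    best.sort(key=lambda t: t[2])
    return best


# Hypothetical scores for three compounds docked into three conformers A, B, C.
toy = {
    "cpd1": {"A": -7.0, "B": -9.5, "C": -6.0},
    "cpd2": {"A": -8.0, "B": -7.5, "C": -8.8},
    "cpd3": {"A": -5.0, "B": -5.5, "C": -4.0},
}
for compound, conformer, score in ensemble_rank(toy):
    print(compound, conformer, score)
```

    Keeping the winning conformer alongside the score makes it easy to see which template each hit came from.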

  • Can the Energy Gap in the Protein-Ligand Binding Energy Landscape Be Used as a Descriptor in Virtual Ligand Screening?
    Grigoryan, Arsen V and Wang, Hong and Cardozo, Timothy J
    PloS one, 2012, 7(10), e46532
    doi: 10.1371/journal.pone.0046532
    The ranking of scores of individual chemicals within a large screening library is a crucial step in virtual screening (VS) for drug discovery. Previous studies showed that the quality of protein-ligand recognition can be improved using spectrum properties and the shape of ...

  • On the Value of Homology Models for Virtual Screening: Discovering hCXCR3 Antagonists by Pharmacophore-Based and Structure-Based Approaches.
    Huang, Dane and Gu, Qiong and Ge, Hu and Ye, Jiming and Salam, Noeris K. and Hagler, Arnie and Chen, Hongzhuan and Xu, Jun
    Journal of chemical information and modeling, 2012, 52(5), 1356-1366
    PMID: 22545675     doi: 10.1021/ci300067q
    Human chemokine receptor CXCR3 (hCXCR3) antagonists have potential therapeutic applications as antivirus, antitumor, and anti-inflammatory agents. A novel virtual screening protocol, which combines pharmacophore-based and structure-based approaches, was proposed. A three-dimensional QSAR pharmacophore model and a structure-based docking model were built to virtually screen for hCXCR3 antagonists. The hCXCR3 antagonist binding site was constructed by homology modeling and molecular dynamics (MD) simulation. By combining the structure-based and ligand-based screening results, 95% of the compounds satisfied either pharmacophore or docking score criteria and would be chosen as hits if the union of the two searches was taken. The false negative rates were 15% for the pharmacophore model, 14% for the homology model, and 5% for the combined model. Therefore, the consistency of the pharmacophore model and the structural binding model is 219/273
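    Taking the union of two screens lowers the false-negative rate because a known active is lost only if both methods miss it. A minimal sketch with invented compound ids (not the paper's data), contrasting union and intersection:

```python
def false_negative_rate(selected: set, actives: set) -> float:
    """Fraction of known actives that a screen fails to select."""
    return len(actives - selected) / len(actives)


# Toy example: 20 known actives (ids 1-20) plus a few decoys (ids >= 100).
actives = set(range(1, 21))
pharmacophore_hits = (actives - {1, 2, 3}) | {101, 102}
docking_hits = (actives - {3, 4, 5}) | {103}

union_hits = pharmacophore_hits | docking_hits          # passes EITHER criterion
intersection_hits = pharmacophore_hits & docking_hits   # passes BOTH criteria

for label, hits in [("pharmacophore", pharmacophore_hits), ("docking", docking_hits),
                    ("union", union_hits), ("intersection", intersection_hits)]:
    print(label, len(hits), false_negative_rate(hits, actives))
```

    The union misses only compound 3 here (5%), but it also lets every decoy from either screen through; requiring both criteria does the opposite (25% missed, no decoys).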

  • Performance Evaluation of 2D Fingerprint and 3D Shape Similarity Methods in Virtual Screening.
    Hu, Guoping and Kuang, Guanglin and Xiao, Wen and Li, Weihua and Liu, Guixia and Tang, Yun
    Journal of chemical information and modeling, 2012, 52(5), 1103-1113
    PMID: 22551340     doi: 10.1021/ci300030u
    Virtual screening (VS) can be accomplished by either ligand- or structure-based methods. In recent times, an increasing number of 2D fingerprint and 3D shape similarity methods have been used in ligand-based VS. To evaluate the performance of these ligand-based methods, retrospective VS was performed on a tailored directory of useful decoys (DUD). The VS performances of 14 2D fingerprints and four 3D shape similarity methods were compared. The results revealed that 2D fingerprints ECFP_2 and FCFP_4 yielded better performance than the 3D Phase Shape methods. These ligand-based methods were also compared with structure-based methods, such as Glide docking and Prime molecular mechanics generalized Born surface area rescoring, which demonstrated that both 2D fingerprint and 3D shape similarity methods could yield higher enrichment during early retrieval of active compounds. The results demonstrated the superiority of ligand-based methods over the docking-based screening in terms of both speed and hit enrichment. Therefore, considering ligand-based methods first in any VS workflow would be a wise option.
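    The 2D methods compared here rank a library by fingerprint similarity to known actives. A minimal sketch, assuming fingerprints are already computed and stored as sets of on-bit indices (the bit sets below are hypothetical stand-ins for ECFP/FCFP output), with max-over-queries fusion as one common choice rather than the paper's exact protocol:

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(queries: List[FrozenSet[int]],
                       library: Dict[str, FrozenSet[int]]) -> List[Tuple[str, float]]:
    """Score each library compound by its maximum similarity to any query."""
    scored = [(cid, max(tanimoto(fp, q) for q in queries)) for cid, fp in library.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored


# Hypothetical bit sets standing in for real fingerprints.
query = frozenset({1, 2, 3, 4})
library = {
    "x": frozenset({1, 2, 3}),
    "y": frozenset({7, 8}),
    "z": frozenset({1, 2, 3, 4}),
}
print(rank_by_similarity([query], library))
```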

  • Evaluation of DOCK 6 as a pose generation and database enrichment tool.
    Brozell, Scott R and Mukherjee, Sudipto and Balius, Trent E and Roe, Daniel R and Case, David A and Rizzo, Robert C
    Journal of computer-aided molecular design, 2012, 26(6), 749-773
    PMID: 22569593     doi: 10.1007/s10822-012-9565-y
    In conjunction with the recent American Chemical Society symposium titled "Docking and Scoring: A Review of Docking Programs" the performance of the DOCK6 program was evaluated through (1) pose reproduction and (2) database enrichment calculations on a common set of organizer-specified systems and datasets (ASTEX, DUD, WOMBAT). Representative baseline grid score results averaged over five docking runs yield a relatively high pose identification success rate of 72.5 % (symmetry corrected rmsd) and sampling rate of 91.9 % for the multi site ASTEX set (N

  • Lead Finder docking and virtual screening evaluation with Astex and DUD test sets.
    Novikov, Fedor N and Stroylov, Viktor S and Zeifman, Alexey A and Stroganov, Oleg V and Kulkov, Val and Chilov, Ghermes G
    Journal of computer-aided molecular design, 2012, 26(6), 725-735
    PMID: 22569592     doi: 10.1007/s10822-012-9549-y
    Lead Finder is a molecular docking software. Sampling uses an original implementation of the genetic algorithm that involves a number of additional optimization procedures. Lead Finder's scoring functions employ a set of semi-empiric molecular mechanics functionals that have been parameterized independently for docking, binding energy predictions and rank-ordering for virtual screening. Sampling and scoring both utilize a staged approach, moving from fast but less accurate algorithm versions to computationally more intensive but more accurate versions. Lead Finder includes tools for the preparation of full atom protein and ligand models. In this exercise, Lead Finder achieved 72.9% docking success rate on the Astex test set when the original author-prepared full atom models were used, and 74.1% success rate when the structures were prepared by Lead Finder. The major cause of docking failures was scoring errors resulting from the use of imperfect solvation models. In many cases, docking errors could be corrected by the proper protonation and the use of correct cyclic conformations of ligands. In virtual screening experiments on the DUD test set, early enrichment factors of several tens were achieved on average. However, the area under the ROC curve ("AUC ROC") ranged from 0.70 to 0.74 depending on the screening protocol used, and the separation from the null model was not perfect: 0.12-0.15 units of AUC ROC. We assume that effective virtual screening in the whole range of the enrichment curve, and not just at the early enrichment stages, requires more accurate solvation modeling and accounting for protein backbone flexibility.
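    The ROC AUC reported above can be read as the probability that a randomly chosen active is ranked above a randomly chosen decoy; 0.5 corresponds to a random ranking. A minimal pairwise (Mann-Whitney) sketch, assuming higher scores are better. It is O(actives x decoys), so a sort-based version would be preferable for full DUD-sized sets:

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """ROC AUC as the probability that a random active outscores a random decoy.

    Higher scores are assumed to be better; ties count as half a win.
    """
    pos = [s for s, a in zip(scores, is_active) if a]
    neg = [s for s, a in zip(scores, is_active) if not a]
    if not pos or not neg:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.7, 0.6], [True, False, True, False]))  # 3 of 4 pairs -> 0.75
```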

  • ChemBioServer: a web-based pipeline for filtering, clustering and visualization of chemical compounds used in drug discovery
    Athanasiadis, Emmanouil and Cournia, Zoe and Spyrou, George
    Bioinformatics (Oxford, England), 2012, 28(22), 3002-3003
    PMID: 22962344     doi: 10.1093/bioinformatics/bts551
    Summary: ChemBioServer is a publicly available web application for effectively mining and filtering chemical compounds used in drug discovery. It provides researchers with the ability to (i) browse and visualize compounds along with their properties, (ii) filter chemical compounds for a variety of properties such as steric clashes and toxicity, (iii) apply perfect match substructure search, (iv) cluster compounds according to their physicochemical properties providing representative compounds for each cluster, (v) build custom compound mining pipelines and (vi) quantify through property graphs the top ranking compounds in drug discovery procedures. ChemBioServer allows for pre-processing of compounds prior to an in silico screen, as well as for post-processing of top-ranked molecules resulting from a docking exercise with the aim to increase the efficiency and the quality of compound selection that will pass to the experimental test phase. Availability: The ChemBioServer web application is available at:

  • Surflex-Dock: Docking benchmarks and real-world application.
    Spitzer, Russell and Jain, Ajay N
    Journal of computer-aided molecular design, 2012, 26(6), 687-699
    PMID: 22569590     doi: 10.1007/s10822-011-9533-y
    Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein-ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by symposium organizers. The re-prepared data sets yielded results not significantly different from previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate ligand re-docking for pose prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, produced substantially better performance for pose prediction. Virtual screening performance was shown to benefit from employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.

  • Docking performance of the glide program as evaluated on the Astex and DUD datasets: a complete set of glide SP results and selected results for a new scoring function integrating WaterMap and glide.
    Repasky, Matthew P and Murphy, Robert B and Banks, Jay L and Greenwood, Jeremy R and Tubert-Brohman, Ivan and Bhat, Sathesh and Friesner, Richard A
    Journal of computer-aided molecular design, 2012, 26(6), 787-799
    PMID: 22576241     doi: 10.1007/s10822-012-9575-9
    Glide SP mode enrichment results for two preparations of the DUD dataset and native ligand docking RMSDs for two preparations of the Astex dataset are presented. Following a best-practices preparation scheme, an average RMSD of 1.140 Å for native ligand docking with Glide SP is computed. Following the same best-practices preparation scheme for the DUD dataset an average area under the ROC curve (AUC) of 0.80 and average early enrichment via the ROC (0.1 %) metric of 0.12 were observed. 74 and 56 % of the 39 best-practices prepared targets showed AUC over 0.7 and 0.8, respectively. Average AUC was greater than 0.7 for all best-practices protein families demonstrating consistent enrichment performance across a broad range of proteins and ligand chemotypes. In both Astex and DUD datasets, docking performance is significantly improved employing a best-practices preparation scheme over using minimally-prepared structures from the PDB. Enrichment results for WScore, a new scoring function and sampling methodology integrating WaterMap and Glide, are presented for four DUD targets, hivrt, hsp90, cdk2, and fxa. WScore performance in early enrichment is consistently strong and all systems examined show AUC > 0.9 and superior early enrichment to DUD best-practices Glide SP results.

  • e-Drug3D: 3D structure collections dedicated to drug repurposing and fragment-based drug design.
    Pihan, Emilie and Colliandre, Lionel and Guichou, Jean-François and Douguet, Dominique
    Bioinformatics (Oxford, England), 2012, 28(11), 1540-1541
    PMID: 22539672     doi: 10.1093/bioinformatics/bts186
    MOTIVATION: In the drug discovery field, new uses for old drugs, selective optimization of side activities and Fragment-Based Drug Design (FBDD) have proved to be successful alternatives to high throughput screening (HTS). e-Drug3D is a database of 3D chemical structures of drugs that provides several collections of ready-to-screen SD Files of drugs and commercial drug fragments. They are natural inputs in studies dedicated to drug repurposing and FBDD. AVAILABILITY: e-Drug3D collections are freely available at either for download or for direct in silico web-based screenings. CONTACT: SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • QSAR Classification Model for Antibacterial Compounds and Its Use in Virtual Screening.
    Singh, Narender and Chaudhury, Sidhartha and Liu, Ruifeng and Abdulhameed, Mohamed Diwan M and Tawa, Gregory and Wallqvist, Anders
    Journal of chemical information and modeling, 2012, 52(10), 2559-2569
    PMID: 23013546     doi: 10.1021/ci300336v
    As novel and drug-resistant bacterial strains continue to present an emerging health threat, the development of new antibacterial agents is critical. This includes making improvements to existing antibacterial scaffolds as well as identifying novel ones. The aim of this study is to apply a Bayesian classification QSAR approach to rapidly screen chemical libraries for compounds predicted to have antibacterial activity. Toward this end we assembled a data set of 317 known antibacterial compounds as well as a second data set of diverse, well-validated, non-antibacterial compounds from 215 PubChem Bioassays against various bacterial species. We constructed a Bayesian classification model using structural fingerprints and physicochemical property descriptors and achieved an accuracy of 84% and precision of 86% on an independent test set in identifying antibacterial compounds. To demonstrate the practical applicability of the model in virtual screening, we screened an independent data set of ∼200k compounds. The results show that the model can screen top hits of PubChem Bioassay actives with accuracy up to ∼76%, representing a 1.5-2-fold enrichment. The top screened hits represented a mixture of both known antibacterial scaffolds as well as novel scaffolds. Our study suggests that a well-validated Bayesian classification QSAR approach could complement other screening approaches in identifying novel and promising hits. The data sets used in constructing and validating this model have been made publicly available.
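    Accuracy, precision and fold enrichment, the figures quoted in this abstract, all follow from a confusion matrix. A minimal sketch with toy labels; fold enrichment is taken here as precision divided by the base rate of actives, a standard definition that the paper may compute differently:

```python
from typing import Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, precision, fold enrichment) for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    base_rate = sum(y_true) / len(y_true)  # fraction of actives in the whole set
    enrichment = precision / base_rate if base_rate else 0.0
    return accuracy, precision, enrichment


# Toy labels: 4 actives among 8 compounds; the model flags 5, of which 3 are active.
acc, prec, enr = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 1, 1, 0, 1])
print(acc, prec, enr)  # accuracy 5/8, precision 3/5, enrichment (3/5) / (4/8)
```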

  • Exploring Protein Flexibility: Incorporating Structural Ensembles From Crystal Structures and Simulation into Virtual Screening Protocols.
    Osguthorpe, David J and Sherman, Woody and Hagler, Arnold T
    The journal of physical chemistry. B, 2012, 116(23), 6952-6959
    PMID: 22424156     doi: 10.1021/jp3003992
    The capacity of proteins to adapt their structure in response to various perturbations including covalent modifications, and interactions with ligands and other proteins plays a key role in biological processes. Here, we explore the ability of molecular dynamics (MD), replica exchange molecular dynamics (REMD), and a library of structures of crystal-ligand complexes, to sample the protein conformational landscape and especially the accessible ligand binding site geometry. The extent of conformational space sampled is measured by the diversity of the shapes of the ligand binding sites. Since our focus here is the effect of this plasticity on the ability to identify active compounds through virtual screening, we use the structures generated by these techniques to generate a small ensemble for further docking studies, using binding site shape hierarchical clustering to determine four structures for each ensemble. These are then assessed for their capacity to optimize enrichment and diversity in docking. We test these protocols on three different receptors: androgen receptor (AR), HIV protease, and CDK2. We show that REMD enhances structural sampling slightly compared both with MD and with the ligand-induced distortions reflected in the crystal structures. The improved sampling of the simulation methods does not translate directly into improved docking performance, however. The ensemble approach did improve enrichment and diversity, and the ensemble derived from the crystal structures performed somewhat better than those derived from the simulations.

  • GA(M)E-QSAR: A Novel, Fully Automatic Genetic-Algorithm-(Meta)-Ensembles Approach for Binary Classification in Ligand-Based Drug Design.
    Pérez-Castillo, Yunierkis and Lazar, Cosmin and Taminau, Jonatan and Froeyen, Mathy and Cabrera-Pérez, Miguel Ángel and Nowé, Ann
    Journal of chemical information and modeling, 2012, 52(9), 2366-2386
    PMID: 22856471     doi: 10.1021/ci300146h
    Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, no single modeling approach can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding classification results similar to or better than those reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state of the art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the Adaboost scores can be used as ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
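    The abstract notes that the final boosted models are weighted sums of single-feature classifiers whose scores can rank compounds. A minimal sketch of that scoring step only, assuming decision stumps as the single-feature classifiers and weights that have already been learned; the genetic-algorithm feature selection and the boosting itself are not shown, and the model below is hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Stump:
    """A single-feature weak classifier with its AdaBoost weight."""
    feature: int       # index of the one descriptor this learner looks at
    threshold: float
    polarity: int      # +1: predict active (+1) when value > threshold; -1: reversed
    alpha: float       # weight assigned by boosting

    def predict(self, x: Sequence[float]) -> int:
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def ensemble_score(stumps: List[Stump], x: Sequence[float]) -> float:
    """Weighted vote of the stumps; a positive score means 'predicted active'."""
    return sum(s.alpha * s.predict(x) for s in stumps)


def prioritize(stumps: List[Stump], compounds: Dict[str, Sequence[float]]) -> List[str]:
    """Order compounds by ensemble score, highest first."""
    return sorted(compounds, key=lambda name: ensemble_score(stumps, compounds[name]), reverse=True)


# Hypothetical two-stump model and three compounds described by two descriptors.
model = [Stump(feature=0, threshold=2.0, polarity=1, alpha=0.8),
         Stump(feature=1, threshold=0.5, polarity=-1, alpha=0.4)]
compounds = {"x": [3.0, 0.2], "y": [1.0, 0.9], "z": [3.0, 0.9]}
print(prioritize(model, compounds))
```

    Because each term depends on one descriptor and one threshold, the contribution of every feature to a compound's rank can be read off directly, which is the simplicity the authors point to.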

  • Computational Approach for Fast Screening of Small Molecular Candidates To Inhibit Crystallization in Amorphous Drugs.
    Pajula, Katja and Lehto, Vesa-Pekka and Ketolainen, Jarkko and Korhonen, Ossi
    Molecular Pharmaceutics, 2012, 9(10), 2844-2855
    PMID: 22867030     doi: 10.1021/mp300135h
    The applicability of the computational docking approach was investigated to create a novel method for quick additive screening to inhibit the crystallization taking place in amorphous drugs. Surface energy and attachment energy were utilized to recognize the morphologically most important crystal faces. The surfaces (100), (001), and (010) were identified as target faces, and the estimated free energies of binding of additives on these surfaces were computationally determined. The molecule of the crystallizing compound was included in the group of the modeled additives as the reference and for the validation of the approach. Additives having a lower estimated free energy of binding than the reference molecule itself were considered as potential crystallization inhibitors. Salicylamide, salicylic acid, and sulfanilamide with computationally prescreened additives were melt-quenched, and the nucleation and crystal growth rates were subsequently monitored by polarized light microscopy. As a result, computationally screened additives decelerated the nucleation and crystal growth rates of the studied drugs while the pure drugs crystallized too fast to be measured. The use of a computational approach enabled fast and cost-effective additive selection to retard nucleation and crystal growth, thus facilitating the production of amorphous binary small molecular compounds with stabilized disordered structures.

  • DecoyFinder: an easy-to-use python GUI application for building target-specific decoy sets.
    Cereto-Massagué, Adrià and Guasch, Laura and Valls, Cristina and Mulero, Miquel and Pujadas, Gerard and Garcia-Vallvé, Santiago
    Bioinformatics (Oxford, England), 2012, 28(12), 1661-1662
    PMID: 22539671     doi: 10.1093/bioinformatics/bts249
    Decoys are molecules that are presumed to be inactive against a target (i.e. will not likely bind to the target) and are used to validate the performance of molecular docking or a virtual screening workflow. The Directory of Useful Decoys database provides a free directory of decoys for use in virtual screening, though it only contains a limited set of decoys for 40 targets. To overcome this limitation, we have developed an application called DecoyFinder that selects, for a given collection of active ligands of a target, a set of decoys from a database of compounds. Decoys are selected if they are similar to active ligands according to five physical descriptors (molecular weight, number of rotatable bonds, total hydrogen bond donors, total hydrogen bond acceptors and the octanol-water partition coefficient) without being chemically similar to any of the active ligands used as an input (according to the Tanimoto coefficient between MACCS fingerprints). To the best of our knowledge, DecoyFinder is the first application designed to build target-specific decoy sets. AVAILABILITY: A complete description of the software is included on the application home page. A validation of DecoyFinder on 10 DUD targets is provided as Supplementary Table S1. DecoyFinder is freely available at
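    The selection rule described above (match five physical descriptors, but stay dissimilar by fingerprint Tanimoto) can be sketched in a few lines. This is not DecoyFinder's code: the tolerances, the 0.75 cut-off, the per-active quota and the precomputed fingerprint bit sets are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Mol:
    name: str
    mw: float              # molecular weight
    rot_bonds: int         # rotatable bonds
    hbd: int               # hydrogen bond donors
    hba: int               # hydrogen bond acceptors
    logp: float            # octanol-water partition coefficient
    keys: FrozenSet[int]   # on-bits of a precomputed MACCS-style fingerprint


# Illustrative tolerances and cut-off only; not DecoyFinder's actual defaults.
TOLERANCES = {"mw": 25.0, "rot_bonds": 1, "hbd": 1, "hba": 1, "logp": 1.0}
MAX_TANIMOTO = 0.75


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def property_matched(candidate: Mol, active: Mol) -> bool:
    """True if every descriptor is within its tolerance of the active's value."""
    return all(abs(getattr(candidate, p) - getattr(active, p)) <= tol
               for p, tol in TOLERANCES.items())


def select_decoys(actives: List[Mol], candidates: List[Mol], per_active: int = 2) -> Dict[str, List[str]]:
    chosen: Dict[str, List[str]] = {a.name: [] for a in actives}
    for cand in candidates:
        # Reject anything chemically similar to ANY active (it might be a true binder).
        if any(tanimoto(cand.keys, a.keys) >= MAX_TANIMOTO for a in actives):
            continue
        # Assign the candidate to the first active it matches that still needs decoys.
        for a in actives:
            if len(chosen[a.name]) < per_active and property_matched(cand, a):
                chosen[a.name].append(cand.name)
                break
    return chosen


active = Mol("act1", 300.0, 4, 2, 4, 2.5, frozenset({1, 2, 3, 4}))
pool = [
    Mol("d1", 310.0, 5, 2, 4, 3.0, frozenset({10, 11, 12})),     # matched, dissimilar
    Mol("d2", 305.0, 4, 2, 4, 2.0, frozenset({1, 2, 3, 4, 5})),  # too similar (0.8)
    Mol("d3", 400.0, 4, 2, 4, 2.5, frozenset({20})),             # MW too far off
    Mol("d4", 290.0, 3, 1, 5, 2.0, frozenset({1, 30, 31})),      # matched, dissimilar
    Mol("d5", 300.0, 4, 2, 4, 2.5, frozenset({40})),             # matched, but quota full
]
print(select_decoys([active], pool))
```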

  • Integrating Ligand-Based and Protein-Centric Virtual Screening of Kinase Inhibitors Using Ensembles of Multiple Protein Kinase Genes and Conformations.
    Dixit, Anshuman and Verkhivker, Gennady M
    Journal of chemical information and modeling, 2012, 52(10), 2501-2515
    PMID: 22992037     doi: 10.1021/ci3002638
    The rapidly growing wealth of structural and functional information about kinase genes and kinase inhibitors that is fueled by a significant therapeutic role of this protein family provides a significant impetus for development of targeted computational screening approaches. In this work, we explore an ensemble-based, protein-centric approach that allows for simultaneous virtual ligand screening against multiple kinase genes and multiple kinase receptor conformations. We systematically analyze and compare the results of ligand-based and protein-centric screening approaches using both single-receptor and ensemble-based docking protocols. A panel of protein kinase targets that includes ABL, EGFR, P38, CDK2, TK, and VEGFR2 kinases is used in this comparative analysis. By applying various performance metrics we have shown that ligand-centric shape matching can provide an effective enrichment of active compounds outperforming single-receptor docking screening. However, ligand-based approaches can be highly sensitive to the choice of inhibitor queries. Employment of multiple inhibitor queries combined with parallel selection ranking criteria can improve the performance and efficiency of ligand-based virtual screening. We also demonstrated that replica-exchange Monte Carlo docking with kinome-based ensembles of multiple crystal structures can provide a superior early enrichment on the kinase targets. The central finding of this study is that incorporation of the template-based structural information about kinase inhibitors and protein kinase structures in diverse functional states can significantly enhance the overall performance and robustness of both ligand and protein-centric screening strategies. The results of this study may be useful in virtual screening of kinase inhibitors potentially offering a beneficial spectrum of therapeutic activities across multiple disease states.

  • Potential and Limitations of Ensemble Docking.
    Korb, Oliver and Olsson, Tjelvar S G and Bowden, Simon J and Hall, Richard J and Verdonk, Marcel L and Liebeschuetz, John W and Cole, Jason C
    Journal of chemical information and modeling, 2012, 52(5), 1262-1274
    PMID: 22482774     doi: 10.1021/ci2005934
    A major problem in structure-based virtual screening applications is the appropriate selection of a single or even multiple protein structures to be used in the virtual screening process. A priori it is unknown which protein structure(s) will perform best in a virtual screening experiment. We investigated the performance of ensemble docking, as a function of ensemble size, for eight targets of pharmaceutical interest. Starting from single protein structure docking results, for each ensemble size up to 500 000 combinations of protein structures were generated, and, for each ensemble, pose prediction and virtual screening results were derived. Comparison of single to multiple protein structure results suggests improvements when looking at the performance of the worst and the average over all single protein structures to the performance of the worst and average over all protein ensembles of size two or greater, respectively. We identified several key factors affecting ensemble docking performance, including the sampling accuracy of the docking algorithm, the choice of the scoring function, and the similarity of database ligands to the cocrystallized ligands of ligand-bound protein structures in an ensemble. Due to these factors, the prospective selection of optimum ensembles is a challenging task, shown by a reassessment of published ensemble selection protocols.
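    The ensemble-size experiment can be sketched as follows: enumerate every combination of structures of a given size (subsampling when there are too many), score each ligand by its best score over the ensemble members, and summarize the worst, mean and best outcome. The best-score merge rule, the "actives in top N" metric and the toy data are assumptions for illustration; the study used its own pose-prediction and screening metrics.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple


def ensemble_scores(scores: Dict[str, List[float]], members: Sequence[str]) -> List[float]:
    """Per-ligand best (lowest) score over the structures in one ensemble."""
    n_ligands = len(scores[members[0]])
    return [min(scores[s][i] for s in members) for i in range(n_ligands)]


def actives_in_top(ligand_scores: List[float], is_active: List[int], top_n: int) -> int:
    order = sorted(range(len(ligand_scores)), key=lambda i: ligand_scores[i])
    return sum(is_active[i] for i in order[:top_n])


def evaluate_ensembles(scores: Dict[str, List[float]], is_active: List[int], size: int,
                       top_n: int, max_combos: int = 500_000, seed: int = 0) -> Tuple[int, float, int]:
    """Worst, mean and best screening result over ensembles of a given size."""
    combos = list(itertools.combinations(sorted(scores), size))
    if len(combos) > max_combos:  # subsample when there are too many ensembles
        combos = random.Random(seed).sample(combos, max_combos)
    results = [actives_in_top(ensemble_scores(scores, c), is_active, top_n) for c in combos]
    return min(results), sum(results) / len(results), max(results)


# Hypothetical docking scores (lower = better) of 4 ligands against 3 structures.
toy_scores = {
    "S1": [-9.0, -5.0, -7.0, -4.0],
    "S2": [-5.0, -9.0, -4.0, -7.0],
    "S3": [-6.0, -6.0, -8.0, -8.0],
}
toy_active = [1, 1, 0, 0]
for k in (1, 2, 3):
    print(k, evaluate_ensembles(toy_scores, toy_active, size=k, top_n=2))
```

    In this toy case the worst and mean results both improve as the ensemble grows, the kind of trend the abstract describes; the list of combinations is built in memory, so very large structure sets would need a streaming sampler instead.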

  • Recognizing Pitfalls in Virtual Screening: A Critical Review.
    Scior, Thomas and Bender, Andreas and Tresadern, Gary and Medina-Franco, José L and Martínez-Mayorga, Karina and Langer, Thierry and Cuanalo-Contreras, Karina and Agrafiotis, Dimitris K
    Journal of chemical information and modeling, 2012, 52(4), 867-881
    PMID: 22435959     doi: 10.1021/ci200528d
    The aim of virtual screening (VS) is to identify bioactive compounds through computational means, by employing knowledge about the protein target (structure-based VS) or known bioactive ligands (ligand-based VS). In VS, a large number of molecules are ranked according to their likelihood of being bioactive, with the aim of enriching the top fraction of the resulting list (which can then be tested in bioassays). At its core, VS attempts to improve the odds of identifying bioactive molecules by maximizing the true positive rate, that is, by ranking the truly active molecules as high as possible (and, correspondingly, the truly inactive ones as low as possible). In choosing the right approach, the researcher faces many questions: where does the optimal balance between efficiency and accuracy lie when evaluating a particular algorithm; do some methods perform better than others, and in what particular situations; and what do retrospective results tell us about the prospective utility of a particular method? Given the multitude of settings, parameters, and data sets the practitioner can choose from, many pitfalls lurk along the way that might render VS less efficient or downright useless. This review attempts to catalogue published and unpublished problems, shortcomings, failures, and technical traps of VS methods, with the aim of helping users avoid these pitfalls by making them aware of them in the first place.

  • Comprehensive predictions of target proteins based on protein-chemical interaction using virtual screening and experimental verifications.
    Kobayashi, Hiroki and Harada, Hiroko and Nakamura, Masaomi and Futamura, Yushi and Ito, Akihiro and Yoshida, Minoru and Iemura, Shun-Ichiro and Shin-Ya, Kazuo and Doi, Takayuki and Takahashi, Takashi and Natsume, Tohru and Imoto, Masaya and Sakakibara, Yasubumi
    BMC chemical biology, 2012, 12(1), 2
    PMID: 22480302     doi: 10.1186/1472-6769-12-2
    BACKGROUND: Identification of the target proteins of bioactive compounds is critical for elucidating their mode of action; however, target identification has generally been difficult, mostly due to the low sensitivity of detection using affinity chromatography followed by CBB staining and MS/MS analysis. RESULTS: We applied our protocol for predicting target proteins, which combines in silico screening and experimental verification, to incednine, a compound that inhibits the anti-apoptotic function of Bcl-xL by an unknown mechanism. One hundred eighty-two target protein candidates were computationally predicted to bind to incednine by the statistical prediction method, and the predictions were verified by in vitro binding of incednine to seven proteins whose expression could be confirmed in our cell system. The computational predictions achieved 40% accuracy, and we newly identified three incednine-binding proteins. CONCLUSIONS: This study shows that the proposed protocol for predicting target proteins, combining in silico screening and experimental verification, is useful, and it provides new insight into strategies for identifying the target proteins of small molecules.

  • Ligand-Based Virtual Screening Approach Using a New Scoring Function.
    Hamza, Adel and Wei, Ning-Ning and Zhan, Chang-Guo
    Journal of chemical information and modeling, 2012, 52(4), 963-974
    PMID: 22486340     doi: 10.1021/ci200617d
    In this study, we aimed to develop a new ligand-based virtual screening approach using an effective shape-overlapping procedure and a more robust scoring function (denoted the HWZ score for convenience). The HWZ score-based approach was tested against the compounds for 40 protein targets available in the Database of Useful Decoys (DUD), and virtual screening performance was evaluated in terms of the area under the receiver operating characteristic (ROC) curve (AUC), enrichment factor (EF), and hit rate (HR), demonstrating improved overall performance compared with other commonly used approaches. In particular, HWZ score-based virtual screening gave an average AUC of 0.84 ± 0.02 (95% confidence interval) for the 40 targets. The average HR values at the top 1% and 10% of the active compounds for the 40 targets were 46.3% ± 6.7% and 59.2% ± 4.7%, respectively. In addition, the performance of the HWZ score-based approach is less sensitive to the choice of target.
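
    The AUC and EF metrics named above have standard definitions; the sketch computes ROC AUC via the rank-based (Mann-Whitney) formulation and the enrichment factor at a chosen fraction of the ranked list. The higher-is-better score convention, the ceiling rounding of the top subset size, and the toy data are assumptions; the sketch does not reproduce the HWZ score itself.

```python
import math


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random active outscores a random decoy.

    Higher scores are assumed to mean "more likely active"; ties count as half.
    """
    actives = [s for s, y in zip(scores, labels) if y]
    decoys = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction):
    """EF at `fraction`: active rate in the top-ranked subset over the overall active rate."""
    n = len(scores)
    n_top = max(1, math.ceil(fraction * n))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(1 for _, y in ranked[:n_top] if y)
    return (actives_in_top / n_top) / (sum(labels) / n)


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(roc_auc(scores, labels), 3))  # 0.619
print(round(enrichment_factor(scores, labels, 0.2), 3))  # 1.667
```

    Ties in the sorted list are broken by input order here; published benchmarks may handle ties differently.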

  • Chemical and biological properties of frequent screening hits.
    Che, Jianwei and King, Frederick and Zhou, Bin and Zhou, Yingyao
    Journal of chemical information and modeling, 2012, 52(4), 913-926
    PMID: 22435989     doi: 10.1021/ci300005y
    High-throughput screening (HTS) has become an important technology in the drug discovery process. It has been noted that certain compounds frequently appear as hits in many screening campaigns. By mining an HTS database covering a large chemical space and diverse biological functions, we identified many novel chemical features, as well as several biological processes, that were associated with a significant portion of frequent hits. However, we also noted that several marketed drugs contain characteristics commonly associated with frequent hits. This observation suggests that the compound triage strategies currently in general use may remove compounds with desirable properties. We therefore developed a novel strategy that overlays chemical scaffolds and biological processes with empirical hit-frequency data to provide a more functional frequent-hit triage; compared with the typical filtering strategy based on empirical hit frequency alone, it reduces the risk of removing biologically relevant frequent hits.

  • Assessment of a Rule-Based Virtual Screening Technology (INDDEx) on a Benchmark Data Set.
    Reynolds, Christopher R and Amini, Ata C and Muggleton, Stephen H and Sternberg, Michael J. E.
    The journal of physical chemistry. B, 2012, 116(23), 6732-6739
    PMID: 22380596     doi: 10.1021/jp212084f
    The Investigational Novel Drug Discovery by Example (INDDEx) package has been developed to find active compounds by linking activity to chemical substructure and to guide the process of further drug development. INDDEx is a machine-learning technique that forms qualitative logical rules about substructural features of active molecules, weights the rules to form a quantitative model, and then uses the model to screen a molecular database. INDDEx is shown to learn from multiple active compounds and to be useful for scaffold hopping in virtual screening, giving high retrieval rates even when learning from a small number of compounds. Across the data sets tested, at 1% of the data, INDDEx had average enrichment factors of 69.2, 82.7, and 90.4 when learning from 2, 4, and 8 active ligands, respectively. At 0.1% of the data, INDDEx had average enrichment factors of 492, 631, and 707 when learning from 2, 4, and 8 active ligands, respectively. When all ligands with a Tanimoto maximum common substructure similarity greater than 0.5 were excluded, INDDEx had average enrichment factors at 1% of 52.3, 63.6, and 66.9 when learning from 2, 4, and 8 active ligands, respectively. The performance of INDDEx is compared with that of eHiTS LASSO, PharmaGist, and DOCK.

  • Application of Support Vector Machine to Three-Dimensional Shape-Based Virtual Screening Using Comprehensive Three-Dimensional Molecular Shape Overlay with Known Inhibitors.
    Sato, Tomohiro and Yuki, Hitomi and Takaya, Daisuke and Sasaki, Shunta and Tanaka, Akiko and Honma, Teruki
    Journal of chemical information and modeling, 2012, 52(4), 1015-1026
    PMID: 22424085     doi: 10.1021/ci200562p
    In this study, machine learning with a support vector machine (SVM) was combined with three-dimensional (3D) molecular shape overlay to improve screening efficiency. Unlike 2D similarity methods, 3D molecular shape overlay does not use fingerprints or descriptors to compare two compounds, so the application of machine learning to 3D shape-based methods has not been extensively investigated. The 3D similarity profile of a compound is defined as the array of its 3D shape similarities to multiple known active compounds of the target protein and is used as the explanatory variable of the SVM. To choose the 3D shape similarity measures for the new prediction models, the prediction performance of the similarity metrics implemented in ROCS, such as ShapeTanimoto and ScaledColor, was validated using the known inhibitors of 15 target proteins derived from the ChEMBL database. The learning models based on 3D similarity profiles consistently outperformed the original ROCS when more than 10 known inhibitors were available as queries. The results demonstrate the advantages of combining machine learning with the 3D similarity profile to process the 3D shape information of multiple active compounds.

  • Improving Classical Substructure-Based Virtual Screening to Handle Extrapolation Challenges.
    Biniashvili, Tammy and Schreiber, Ehud and Kliger, Yossef
    Journal of chemical information and modeling, 2012, 52(3), 678-685
    PMID: 22360790     doi: 10.1021/ci200472s
    Target-oriented substructure-based virtual screening (sSBVS) of molecules is a promising approach in drug discovery. Yet there are doubts about whether sSBVS is also suitable for extrapolation, that is, for detecting molecules that are very different from those used for training. Herein, we evaluate the predictive power of classic virtual screening methods, namely similarity searching using the Tanimoto coefficient (MTC) and naive Bayes (NB). As could be expected, these classic methods perform better in interpolation than in extrapolation tasks. To enhance predictive ability for extrapolation tasks, we therefore introduce the Shadow approach, in which inclusion relations between substructures are considered, as opposed to the classic sSBVS methods that assume independence between substructures. Specifically, we discard contributions from substructures that are included in ("shaded" by) others which are, in turn, included in the molecule of interest. Indeed, the Shadow classifier significantly outperforms both MTC (pValue
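
    Similarity searching with the Tanimoto coefficient, one of the classic baselines mentioned above, can be sketched with fingerprints represented as sets of on-bit indices. Ranking by the maximum similarity to any training active is an assumed fusion rule chosen for illustration; the Shadow classifier is not implemented here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(database, references):
    """Rank database entries by their highest Tanimoto similarity to any reference."""
    scored = {name: max(tanimoto(fp, ref) for ref in references) for name, fp in database.items()}
    return sorted(scored, key=scored.get, reverse=True)


print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
db = {"m1": {1, 2, 9}, "m2": {7, 8}, "m3": {1, 2, 3}}
print(similarity_search(db, [{1, 2, 3}, {7, 8, 9}]))  # ['m3', 'm2', 'm1']
```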

  • A reverse combination of structure-based and ligand-based strategies for virtual screening.
    Cortés-Cabrera, Alvaro and Gago, Federico and Morreale, Antonio
    Journal of computer-aided molecular design, 2012, 26(3), 319-327
    PMID: 22395903     doi: 10.1007/s10822-012-9558-x
    A new approach is presented that combines structure- and ligand-based virtual screening in reverse order. Unlike most methods, it first uses a docking protocol to prioritize small ligands ("fragments"), which are then used as queries to search a database for similar larger ligands. For a given chemical library, a three-step strategy is followed: (1) contraction of the library into a representative, non-redundant set of fragments, (2) selection of the three best-scoring fragments docked into a given macromolecular target site, and (3) expansion of the fragments' structures back into ligands by using them as queries to search the library with fingerprint descriptions and similarity criteria. We tested the performance of this approach on a collection of fragments and ligands from the ZINC database and the Directory of Useful Decoys, and compared the results with those obtained using a standard docking protocol. The new method provided better overall results and was several times faster. We also studied the chemical diversity covered by both methods using an in-house compound library and concluded that the novel approach performs similarly but at a much smaller computational cost.

  • Core Site-Moiety Maps Reveal Inhibitors and Binding Mechanisms of Orthologous Proteins by Screening Compound Libraries.
    Hsu, Kai-Cheng and Cheng, Wen-Chi and Chen, Yen-Fu and Wang, Hung-Jung and Li, Ling-Ting and Wang, Wen-Ching and Yang, Jinn-Moon
    PloS one, 2012, 7(2), e32142
    doi: 10.1371/journal.pone.0032142
    Members of protein families often share conserved structural subsites for interaction with chemically similar moieties despite low sequence identity. We propose a core site-moiety map of multiple proteins (called CoreSiMMap) to discover inhibitors and mechanisms by profiling the subsite-moiety interactions of large numbers of screening compounds. A consensus anchor of a CoreSiMMap, that is, a statistically significant subsite-moiety interaction, can be regarded as a "hot spot" that represents the conserved binding environments involved in biological functions. Here, we derive a CoreSiMMap with six consensus anchors and identify six inhibitors (IC50,8.0 mM) of shikimate kinases (SKs) of Mycobacterium tuberculosis and Helicobacter pylori from the NCI database (236,962 compounds). Site-directed mutagenesis and analogue studies reveal that these conserved interacting residues and moieties contribute to pocket-moiety interaction spots and biological functions. These results show that our multi-target screening strategy and the CoreSiMMap can increase screening accuracy in identifying novel inhibitors and subsite-moiety environments for elucidating the binding mechanisms of targets.

  • Pose prediction and virtual screening performance of GOLD scoring functions in a standardized test.
    Liebeschuetz, John W and Cole, Jason C and Korb, Oliver
    Journal of computer-aided molecular design, 2012, 26(6), 737-748
    PMID: 22371207     doi: 10.1007/s10822-012-9551-4
    The performance of all four GOLD scoring functions has been evaluated for pose prediction and virtual screening under the standardized conditions of the comparative docking and scoring experiment reported in this Edition. Excellent pose prediction and good virtual screening performance were demonstrated using unmodified protein models and default parameter settings. The best-performing scoring function for both pose prediction and virtual screening was the recently introduced ChemPLP. We conclude that existing docking programs already perform close to optimally in the cognate pose prediction experiments currently carried out and that more stringent pose prediction tests, employing cross-docking sets, should be used in the future. Evaluation of virtual screening performance remains problematic, and much remains to be done to improve the usefulness of publicly available active and decoy sets for virtual screening. Finally, we suggest that, for certain target/scoring function combinations, good enrichment may sometimes be a consequence of 2D property recognition rather than of modelling the correct 3D interactions.

  • Virtual fragment screening: exploration of MM-PBSA re-scoring.
    Kawatkar, Sameer and Moustakas, Demetri and Miller, Matthew and Joseph-McCarthy, Diane
    Journal of computer-aided molecular design, 2012, 26(8), 921-934
    PMID: 22869295     doi: 10.1007/s10822-012-9590-x
    An NMR fragment screening dataset with known binders and decoys was used to evaluate the ability of docking and re-scoring methods to identify fragment binders. Re-scoring docked poses using the Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) implicit solvent model identifies additional active fragments relative to either docking or random fragment screening alone. Early enrichment, which is clearly most important in practice for selecting relatively small sets of compounds for experimental testing, is improved by MM-PBSA re-scoring. In addition, the value of MM-PBSA re-scoring of docked poses for virtual screening may lie in reducing the effect of variation in the protein complex structure used.

  • Systematic assessment of scaffold distances in ChEMBL: prioritization of compound data sets for scaffold hopping analysis in virtual screening.
    Li, Ruifang and Bajorath, Jürgen
    Journal of computer-aided molecular design, 2012, 26(10), 1101-1109
    PMID: 22972561     doi: 10.1007/s10822-012-9603-9
    The evaluation of the scaffold hopping potential of computational methods is highly relevant for virtual screening. For benchmark calculations, classes of known active compounds are utilized. Ideally, such classes should have a well-defined content of structurally diverse scaffolds. However, in reported benchmark investigations, the choice of activity classes is often difficult to rationalize. To provide a compendium of well-characterized test cases for the assessment of scaffold hopping potential, structural distances between scaffolds were systematically calculated for compound classes available in the ChEMBL database. Nearly seven million scaffold pairs were evaluated. On the basis of the global scaffold distance distribution, a threshold value for large scaffold distances was determined. Compound data sets were ranked by the proportion of scaffold pairs with large distances they contained, taking into account additional criteria relevant for virtual screening. A set of 50 activity classes that represent attractive test cases for scaffold hopping analysis and benchmark calculations is provided.
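
    The data set prioritization described above amounts to computing pairwise scaffold distances, fixing a "large distance" cut-off from the global distribution, and ranking activity classes by the share of pairs beyond it. The distance function (1 minus Tanimoto on scaffold fingerprint sets), the 90th-percentile cut-off, and the toy data are hypothetical stand-ins; the paper's actual distance measure and threshold are not reproduced.

```python
import statistics
from itertools import combinations


def scaffold_distance(a, b):
    """Hypothetical distance: 1 - Tanimoto similarity of scaffold fingerprints (sets of bits)."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 0.0)


def large_distance_threshold(all_distances, percentile=90):
    """Cut-off for 'large' distances taken from a global distribution (percentile is assumed)."""
    return statistics.quantiles(all_distances, n=100)[percentile - 1]


def proportion_large(scaffolds, threshold):
    """Share of scaffold pairs in one class whose distance exceeds the threshold."""
    pairs = list(combinations(scaffolds, 2))
    return sum(scaffold_distance(a, b) > threshold for a, b in pairs) / len(pairs)


def rank_datasets(datasets, threshold):
    """Order activity classes by their share of scaffold pairs above the threshold."""
    return sorted(datasets, key=lambda name: proportion_large(datasets[name], threshold), reverse=True)


datasets = {
    "congeneric": [{1, 2}, {1, 2, 3}, {1, 2, 4}],
    "diverse": [{1, 2}, {3, 4}, {5, 6}],
}
print(rank_datasets(datasets, threshold=0.6))  # ['diverse', 'congeneric']
```

    In practice the threshold would come from `large_distance_threshold` applied to the distances of all scaffold pairs pooled across classes.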

  • Virtual Target Screening: Validation Using Kinase Inhibitors.
    Santiago, Daniel N and Pevzner, Yuri and Durand, Ashley A and Tran, Minhphuong and Scheerer, Rachel R and Daniel, Kenyon and Sung, Shen-Shu and Lee Woodcock, H and Guida, Wayne C and Brooks, Wesley H
    Journal of chemical information and modeling, 2012, 52(8), 2192-2203
    PMID: 22747098     doi: 10.1021/ci300073m
    Computational methods involving virtual screening could potentially be employed to discover new biomolecular targets for an individual molecule of interest (MOI). However, existing scoring functions may not accurately differentiate proteins to which the MOI binds from a larger set of macromolecules in a protein structural database. An MOI will most likely have varying degrees of predicted binding affinity to many protein targets, and correctly interpreting a docking score as a hit for the MOI docked to any individual protein can be problematic. In our method, which we term "Virtual Target Screening (VTS)", a set of small drug-like molecules is docked against each structure in the protein library to produce benchmark statistics. This calibration provides a reference for each protein so that hits can be identified for an MOI. VTS can then be used as a tool for drug repositioning (repurposing), specificity and toxicity testing, identifying potential metabolites, probing protein structures for allosteric sites, and testing focused libraries (collections of MOIs with similar chemotypes) for selectivity. To validate our VTS method, twenty kinase inhibitors were docked to a collection of calibrated protein structures. Here, we report results in which VTS predicted protein kinases as hits in preference to other proteins in our database. Concurrently, a graphical interface for VTS was developed.
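
    The calibration idea can be sketched by standardizing a molecule of interest's docking score against each protein's own benchmark distribution. Using a z-score, the 2.0 cut-off, and the function names are assumptions for illustration; the abstract does not specify which benchmark statistics VTS uses.

```python
import statistics


def calibrate(benchmark_scores):
    """Per-protein mean and standard deviation of docking scores for a reference compound set."""
    return {p: (statistics.mean(s), statistics.stdev(s)) for p, s in benchmark_scores.items()}


def vts_hits(moi_scores, calibration, z_threshold=2.0):
    """Proteins where the MOI scores unusually well relative to that protein's own baseline.

    Lower (more negative) docking scores are assumed better, so z > 0 means "better than typical".
    """
    hits = {}
    for protein, score in moi_scores.items():
        mean, sd = calibration[protein]
        z = (mean - score) / sd
        if z >= z_threshold:
            hits[protein] = z
    return hits


calibration = calibrate({"kinaseA": [-6.0, -7.0, -8.0], "proteaseB": [-9.0, -10.0, -11.0]})
print(vts_hits({"kinaseA": -10.0, "proteaseB": -10.0}, calibration))  # {'kinaseA': 3.0}
```

    The same raw score of -10.0 counts as a hit for one protein but not the other, which is the point of calibrating each structure separately.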

  • Cheminformatics Meets Molecular Mechanics: A Combined Application of Knowledge-Based Pose Scoring and Physical Force Field-Based Hit Scoring Functions Improves the Accuracy of Structure-Based Virtual Screening.
    Hsieh, Jui-Hua and Yin, Shuangye and Wang, Xiang S and Liu, Shubin and Dokholyan, Nikolay V and Tropsha, Alexander
    Journal of chemical information and modeling, 2012, 52(1), 16-28
    PMID: 22017385     doi: 10.1021/ci2002507

  • Virtual screening data fusion using both structure- and ligand-based methods.
    Svensson, Fredrik and Karlén, Anders and Sköld, Christian
    Journal of chemical information and modeling, 2012, 52(1), 225-232
    PMID: 22148635     doi: 10.1021/ci2004835
    Virtual screening is widely applied in drug discovery, and significant effort has been put into improving current methods. In this study, we evaluated the performance of compound ranking in virtual screening using five different data fusion algorithms on a total of 16 data sets. The data were generated by docking, pharmacophore search, shape similarity, and electrostatic similarity, spanning both structure- and ligand-based methods. The algorithms used for data fusion were sum rank, rank vote, sum score, Pareto ranking, and parallel selection. None of the fusion methods requires any prior knowledge or input other than the results from the single methods, so they are readily applicable. The results show that compound ranking using data fusion improves the performance and consistency of virtual screening compared with the single methods alone. The best-performing data fusion algorithm was parallel selection, but rank voting and Pareto ranking also perform well.
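
    Two of the fusion rules listed above can be sketched on best-first ranked lists of compound identifiers. These are generic formulations assumed for illustration (including alphabetical tie-breaking in sum rank and the round-robin order in parallel selection); the paper's exact implementations may differ.

```python
def sum_rank(rankings):
    """Fuse best-first rankings by summing each compound's positions (lower total is better).

    Every ranking is assumed to cover the same set of compounds.
    """
    totals = {}
    for ranking in rankings:
        for position, compound in enumerate(ranking, start=1):
            totals[compound] = totals.get(compound, 0) + position
    return sorted(totals, key=lambda c: (totals[c], c))


def parallel_selection(rankings):
    """Fuse rankings by taking the next unseen compound from each method in turn."""
    fused, seen = [], set()
    for depth in range(max(len(r) for r in rankings)):
        for ranking in rankings:
            if depth < len(ranking) and ranking[depth] not in seen:
                seen.add(ranking[depth])
                fused.append(ranking[depth])
    return fused


docking = ["a", "b", "c"]
pharmacophore = ["b", "c", "a"]
shape = ["c", "b", "a"]
print(sum_rank([docking, pharmacophore, shape]))  # ['b', 'c', 'a']
print(parallel_selection([docking, pharmacophore, shape]))  # ['a', 'b', 'c']
```

    The toy example shows that the two rules can disagree: parallel selection keeps every method's top pick near the top of the fused list, while sum rank favors compounds that all methods rank consistently well.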

  • ZINC: A Free Tool to Discover Chemistry for Biology.
    Irwin, John J and Sterling, Teague and Mysinger, Michael M. and Bolstad, Erin S and Coleman, Ryan G
    Journal of chemical information and modeling, 2012, 52(7), 1757-1768
    PMID: 22587354     doi: 10.1021/ci3001277
    ZINC is a free public resource for ligand discovery. The database contains over twenty million commercially available molecules in biologically relevant representations that may be downloaded in popular ready-to-dock formats and subsets. The Web site also enables searches by structure, biological activity, physical property, vendor, catalog number, name, and CAS number. Small custom subsets may be created, edited, shared, docked, downloaded, and conveyed to a vendor for purchase. The database is maintained and curated for a high purchasing success rate and is freely available online.

  • Kinase-Kernel Models: Accurate In silico Screening of 4 Million Compounds Across the Entire Human Kinome.
    Martin, Eric and Mukherjee, Prasenjit
    Journal of chemical information and modeling, 2012, 52(1), 156-170
    PMID: 22133092     doi: 10.1021/ci200314j
    Reliable in silico prediction methods promise many advantages over experimental high-throughput screening (HTS): vastly lower time and cost, affinity magnitude estimates, no requirement for a physical sample, and knowledge-driven exploration of chemical space. For the specific case of kinases, given several hundred experimental IC50 training measurements, the empirically parametrized profile-quantitative structure-activity relationship (profile-QSAR) and surrogate AutoShim methods developed at Novartis can predict IC50 with a reliability approaching that of experimental HTS. However, in the absence of training data, prediction is much harder. The most common a priori prediction method is docking, which suffers from many limitations: it requires a protein structure, is slow, and cannot predict affinity. Highly accurate profile-QSAR models have now been built for roughly 100 kinases covering most of the kinome. Analyzing correlations among neighboring kinases shows that near neighbors share a high degree of SAR similarity. The novel chemogenomic kinase-kernel method reported here predicts activity for new kinases as a weighted average of the activities predicted by profile-QSAR models for nearby neighbor kinases. Three different factors for weighting the neighbors were evaluated: binding site sequence identity to the kinase neighbors, similarity of the training set for each neighbor model to the compound being predicted, and accuracy of each neighbor model. Binding site sequence identity was by far the most important, followed by chemical similarity; model quality had almost no relevance. The median R(2)
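
    A minimal sketch of the weighted-average step, assuming weights derived only from binding-site sequence identity (the factor the abstract reports as most important). The `exponent` parameter and the example values are hypothetical; how the paper combines identity, chemical similarity, and model accuracy into weights is not reproduced.

```python
def kinase_kernel_predict(neighbor_predictions, site_identity, exponent=1.0):
    """Weighted average of neighbor-model predictions for a kinase without its own model.

    neighbor_predictions: kinase -> activity predicted by that kinase's model (e.g. pIC50)
    site_identity: kinase -> binding-site sequence identity (0-1) to the new kinase
    exponent: hypothetical knob that sharpens the weighting toward closer neighbors
    """
    weights = {k: site_identity[k] ** exponent for k in neighbor_predictions}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one neighbor needs a non-zero weight")
    return sum(weights[k] * neighbor_predictions[k] for k in neighbor_predictions) / total


prediction = kinase_kernel_predict({"K1": 7.0, "K2": 5.0}, {"K1": 0.9, "K2": 0.3})
print(round(prediction, 2))  # 6.5
```

    The closer neighbor (K1) pulls the estimate toward its own prediction; raising `exponent` would pull it further.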

  • Do crystal structures obviate the need for theoretical models of GPCRs for structure-based virtual screening?
    Tang, Hao and Wang, Xiang Simon and Hsieh, Jui-Hua and Tropsha, Alexander
    Proteins, 2012, 80(6), 1503-1521
    PMID: 22275072     doi: 10.1002/prot.24035
    The recent, highly anticipated structural characterizations of agonist-bound and antagonist-bound beta-2 adrenoreceptor (β2AR) by X-ray crystallography have been widely regarded as critical advances enabling more effective structure-based discovery of GPCR ligands. It appears that this very important development may have undermined many previous efforts to develop 3D theoretical models of GPCRs. To address this question directly, we compared several historical β2AR models with the inactive-state and nanobody-stabilized active-state β2AR crystal structures in terms of structural similarity and effectiveness in virtual screening for β2AR-specific agonists and antagonists. Theoretical models, including both homology and de novo types, were collected from five different groups who have published extensively in the field of GPCR modeling; all models were built before the X-ray structures became available. In general, β2AR theoretical models differ significantly from the crystal structure in terms of TMH definition and global packing. Nevertheless, surprisingly, several models afforded hit rates in virtual screening of a large chemical library enriched with known β2AR ligands that exceeded those obtained using the X-ray structures; the advantage was particularly pronounced for agonists. Furthermore, the screening performance of the models is associated with local structural quality, such as the RMSDs of binding pocket residues and the ability to capture accurately most if not all critical protein/ligand interactions. These results suggest that carefully built models of GPCRs can capture critical chemical and structural features of the binding pocket and thus may be even more useful for practical structure-based drug discovery than X-ray structures.

  • Ligand expansion in ligand-based virtual screening using relevance feedback.
    Abdo, Ammar and Saeed, Faisal and Hamza, Hentabli and Ahmed, Ali and Salim, Naomie
    Journal of computer-aided molecular design, 2012, 26(3), 279-287
    PMID: 22249773     doi: 10.1007/s10822-012-9543-4
    Query expansion is the process of reformulating an original query to improve retrieval performance in information retrieval systems. Relevance feedback is one of the most useful query modification techniques in information retrieval systems. In this paper, we introduce query expansion into ligand-based virtual screening (LBVS) using the relevance feedback technique. In this approach, a few high-ranking molecules of unknown activity are selected from the output of a Bayesian inference network based on a single ligand molecule to form a set of ligand molecules, and this set is then used to form a new ligand molecule. Simulated virtual screening experiments with the MDL Drug Data Report and maximum unbiased validation data sets show that ligand expansion provides a very simple way of improving LBVS, especially when the active molecules being sought have a high degree of structural heterogeneity. However, ligand expansion is slightly less effective when structurally homogeneous sets of actives are being sought.
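
    The expansion idea can be illustrated with a deliberately simpler retrieval model than the paper's: Tanimoto similarity replaces the Bayesian inference network, and the new query is a bit-frequency consensus of the original query and its top-ranked neighbors. Both substitutions, the `min_frequency` rule, and the toy fingerprints are assumptions made for this sketch.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def expand_query(query, database, top_k=3, min_frequency=0.5):
    """Relevance-feedback style query expansion on set-based fingerprints.

    The top_k nearest database molecules (activity unknown) are pooled with the original
    query, and the new query keeps bits present in at least `min_frequency` of the pool.
    """
    ranked = sorted(database, key=lambda name: tanimoto(query, database[name]), reverse=True)
    pool = [query] + [database[name] for name in ranked[:top_k]]
    counts = Counter(bit for fp in pool for bit in fp)
    return {bit for bit, count in counts.items() if count / len(pool) >= min_frequency}


db = {"m1": {1, 2, 4}, "m2": {2, 3, 4}, "m3": {7, 8}}
print(sorted(expand_query({1, 2, 3}, db, top_k=2)))  # [1, 2, 3, 4]
```

    The expanded query can then be used for a second search round; a stricter `min_frequency` keeps only bits shared by nearly every molecule in the pool.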

  • Flexibility and binding affinity in protein-ligand, protein-protein and multi-component protein interactions: limitations of current computational approaches.
    Tuffery, Pierre and Derreumaux, Philippe
    Journal of the Royal Society, Interface / the Royal Society, 2012, 9(66), 20-33
    PMID: 21993006     doi: 10.1098/rsif.2011.0584
    The recognition process between a protein and a partner represents a significant theoretical challenge. In silico structure-based drug design carried out with nothing more than the three-dimensional structure of the protein has led to the introduction of many compounds into clinical trials and numerous drug approvals. Central to guiding the discovery process is the ability to distinguish active from inactive compounds. While large-scale computer simulations of compounds taken from a library (virtual screening) or designed de novo are highly desirable in the post-genomic era, many technical problems remain to be adequately addressed. This article presents an overview and discusses the limits of current computational methods for predicting the correct binding pose and accurate binding affinity. It also presents the performance of the most popular algorithms for exploring binary and multi-body protein interactions.

  • Computational fragment-based screening using RosettaLigand: the SAMPL3 challenge.
    Kumar, Ashutosh and Zhang, Kam Y J
    Journal of computer-aided molecular design, 2012, 26(5), 603-616
    PMID: 22246345     doi: 10.1007/s10822-011-9523-0
    The SAMPL3 fragment-based virtual screening challenge provides a valuable opportunity for researchers to test their programs, methods, and screening protocols in a blind testing environment. We participated in the SAMPL3 challenge and evaluated our virtual fragment screening protocol, which has RosettaLigand as its core component, by screening a 500-fragment Maybridge library against bovine pancreatic trypsin. Our study reaffirmed that the real test for any virtual screening approach is a blind testing environment. The analyses presented in this paper also showed that virtual screening performance can be improved if a set of known active compounds is available and the parameters and methods that yield better enrichment are selected. Our study also highlighted that selecting an appropriate method to calculate partial charges is important for achieving accurate orientation and conformation of ligands within a binding site. Another finding is that using multiple receptor ensembles in docking does not always yield better enrichment than individual receptors. On the basis of our results and retrospective analyses from the SAMPL3 fragment screening challenge, we anticipate that the chances of success in a fragment screening process could be increased significantly by careful selection of receptor structures, consideration of protein flexibility, sufficient conformational sampling within the binding pocket, and accurate assignment of ligand and protein partial charges.

  • COPICAT: A software system for predicting interactions between proteins and chemical compounds.
    Sakakibara, Yasubumi and Hachiya, Tsuyoshi and Uchida, Miho and Nagamine, Nobuyoshi and Sugawara, Yohei and Yokota, Masahiro and Nakamura, Masaomi and Popendorf, Kris and Komori, Takashi and Sato, Kengo
    Bioinformatics (Oxford, England), 2012, 28(5), 745-746
    PMID: 22257668     doi: 10.1093/bioinformatics/bts031
    SUMMARY: Since tens of millions of chemical compounds have accumulated in public chemical databases, fast, comprehensive computational methods to predict interactions between chemical compounds and proteins are needed for virtual screening of lead compounds. Previously, we proposed a novel method for predicting protein-chemical interactions using two-layer support vector machine classifiers that require only readily available biochemical data, i.e., amino acid sequences of proteins and structural formulas of chemical compounds. In this paper, the method has been implemented as the COPICAT web service, with an easy-to-use front-end interface. Users can simply submit a protein-chemical interaction prediction job using a pre-trained classifier, or even train their own classification model by uploading training data. COPICAT's fast and accurate computational prediction has enhanced lead compound discovery against a database of tens of millions of chemical compounds, implying that the search space for drug discovery is extended more than 1000-fold compared with currently widely used high-throughput screening methodologies. AVAILABILITY: The COPICAT server is available online. All functions, including the prediction function, are freely available via anonymous login without registration; registered users, however, can use the system more intensively.

  • Enrichment of virtual hits by progressive shape-matching and docking.
    Choi, Jiwon and He, Ningning and Kim, Nayoung and Yoon, Sukjoon
    Journal of molecular graphics & modelling, 2012, 32, 82-88
    PMID: 22088763     doi: 10.1016/j.jmgm.2011.10.002
    The main applications of virtual chemical screening include the selection of a minimal receptor-relevant subset of a chemical library with a maximal chemical diversity. We have previously reported that the combination of ligand-centric and receptor-centric virtual screening methods may provide a compromise between computational time and accuracy during the hit enrichment process. In the present work, we propose a "progressive distributed docking" method that improves the virtual screening process using an iterative combination of shape-matching and docking steps. Known ligands with low docking scores were used as initial 3D templates for the shape comparisons with the chemical library. Next, new compounds with good template shape matches and low receptor docking scores were selected for the next round of shape searching and docking. The present iterative virtual screening process was tested for enriching peroxisome proliferator-activated receptor and phosphoinositide 3-kinase relevant compounds from a selected subset of the chemical libraries. It was demonstrated that the iterative combination improved the lead-hopping practice by improving the chemical diversity in the selected list of virtual hits.

  • Virtual screening for compounds that mimic protein-protein interface epitopes.
    Geppert, Tim and Reisen, Felix and Pillong, Max and Hähnke, Volker and Tanrikulu, Yusuf and Koch, Christian P and Perna, Anna Maria and Perez, Tatiana Batista and Schneider, Petra and Schneider, Gisbert
    Journal of computational chemistry, 2012, 33(5), 573-579
    PMID: 22162049     doi: 10.1002/jcc.22894
    Modulation of protein-protein interactions (PPI) has emerged as a new concept in rational drug design. Here, we present a computational protocol for identifying potential PPI inhibitors. Relevant regions of interfaces (epitopes) are predicted for three-dimensional protein models and serve as queries for virtual compound screening. We present a computational screening protocol that incorporates two different pharmacophore models. One model is based on the mathematical concept of autocorrelation vectors and the other utilizes fuzzy labeled graphs. In a proof-of-concept study, we were able to identify serine protease inhibitors using a predicted trypsin epitope as query. Our virtual screening framework may be suited for rapid identification of PPI inhibitors and suggesting bioactive tool compounds.

  • Potency-directed similarity searching using support vector machines.
    Wassermann, Anne M and Heikamp, Kathrin and Bajorath, Jürgen
    Chemical biology & drug design, 2011, 77(1), 30-38
    PMID: 21114788     doi: 10.1111/j.1747-0285.2010.01059.x
    Support vector machine modeling has become increasingly popular in chemoinformatics. Recently, several advanced support vector machine applications have been reported including, among others, multitask learning for ligand-target prediction. Here, we introduce another support vector machine approach to add compound potency information to similarity searching and enrich database selection sets with potent hits. For this purpose, we introduce a structure-activity kernel function and a potency-oriented support vector machine linear combination approach. Using fingerprint descriptors, potency-directed support vector machine searching has been successfully applied to four high-throughput screening data sets, and different support vector machine strategies have been compared. For potency-balanced compound reference sets, potency-directed support vector machine searching meets or exceeds recall rates of standard support vector machine calculations but detects many more potent hits.

  • How do 2D fingerprints detect structurally diverse active compounds? Revealing compound subset-specific fingerprint features through systematic selection.
    Heikamp, Kathrin and Bajorath, Jürgen
    Journal of chemical information and modeling, 2011, 51(9), 2254-2265
    PMID: 21793563     doi: 10.1021/ci200275m
    In independent studies it has previously been demonstrated that two-dimensional (2D) fingerprints have scaffold hopping ability in virtual screening, although these descriptors primarily emphasize structural and/or topological resemblance of reference and database compounds. However, the mechanism by which such fingerprints enrich structurally diverse molecules in database selection sets is currently little understood. In order to address this question, similarity search calculations on 120 compound activity classes of varying structural diversity were carried out using atom environment fingerprints. Two feature selection methods, Kullback-Leibler divergence and gain ratio analysis, were applied to systematically reduce these fingerprints and generate alternative versions for searching. Gain ratio is a feature selection method from information theory that has thus far not been considered in fingerprint analysis. However, it is shown here to be an effective fingerprint feature selection approach. Following comparative feature selection and similarity searching, the compound recall characteristics of original and reduced fingerprint versions were analyzed in detail. Small sets of fingerprint features were found to distinguish subsets of active compounds from other database molecules. The compound recall of fingerprint similarity searching often resulted from a cumulative detection of distinct compound subsets by different fingerprint features, which provided a rationale for the scaffold hopping potential of these 2D fingerprints.
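
    The gain ratio criterion named above can be illustrated for a single binary fingerprint bit: the information the bit carries about the active/inactive label, divided by the entropy of the bit itself (its split information). This is a minimal sketch using the textbook C4.5-style definition; the function names are illustrative, and the exact formulation used in the study is not given in the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gain_ratio(feature, labels):
    """Gain ratio of a discrete feature with respect to class labels.

    feature: feature values per compound (e.g. 0/1 for one fingerprint bit)
    labels:  class label per compound (e.g. 1 = active, 0 = database compound)
    """
    n = len(labels)
    # Conditional entropy H(Y | F): label entropy within each feature value.
    cond = 0.0
    for value, count in Counter(feature).items():
        subset = [y for f, y in zip(feature, labels) if f == value]
        cond += (count / n) * entropy(subset)
    info_gain = entropy(labels) - cond
    # Split information H(F) normalizes the gain; a constant bit carries nothing.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0
```

A bit that is set exactly in the actives scores 1.0, while a bit that is set independently of activity scores 0.0.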

  • Virtual decoy sets for molecular docking benchmarks.
    Wallach, Izhar and Lilien, Ryan
    Journal of chemical information and modeling, 2011, 51(2), 196-202
    PMID: 21207928     doi: 10.1021/ci100374f
    Virtual docking algorithms are often evaluated on their ability to separate active ligands from decoy molecules. The current state-of-the-art benchmark, the Directory of Useful Decoys (DUD), minimizes bias by including decoys from a library of synthetically feasible molecules that are physically similar yet chemically dissimilar to the active ligands. We show that by ignoring synthetic feasibility, we can compile a benchmark that is comparable to the DUD and less biased with respect to physical similarity.

  • SHAFTS: A Hybrid Approach for 3D Molecular Similarity Calculation. 1. Method and Assessment of Virtual Screening.
    Liu, Xiaofeng and Jiang, Hualiang and Li, Honglin
    Journal of chemical information and modeling, 2011, 51(9), 2372-2385
    PMID: 21819157     doi: 10.1021/ci200060s
    We developed a novel approach called SHAFTS (SHApe-FeaTure Similarity) for 3D molecular similarity calculation and ligand-based virtual screening. SHAFTS adopts a hybrid similarity metric combined with molecular shape and colored (labeled) chemistry groups annotated by pharmacophore features for 3D similarity calculation and ranking, which is designed to integrate the strength of pharmacophore matching and volumetric overlay approaches. A feature triplet hashing method is used for fast enumeration of molecular alignment poses, and the optimal superposition between the target and the query molecules can be prioritized by calculating corresponding "hybrid similarities". SHAFTS is suitable for large-scale virtual screening with single or multiple bioactive compounds as the query "templates" regardless of whether corresponding experimentally determined conformations are available. Two public test sets (DUD and Jain's sets) including active and decoy molecules from a panel of useful drug targets were adopted to evaluate the virtual screening performance. SHAFTS outperformed several other widely used virtual screening methods in terms of enrichment of known active compounds as well as novel chemotypes, thereby indicating its robustness in hit compound identification and its potential for scaffold hopping in virtual screening.

  • FRED pose prediction and virtual screening accuracy.
    McGann, Mark
    Journal of chemical information and modeling, 2011, 51(3), 578-596
    PMID: 21323318     doi: 10.1021/ci100436p
    Results of a previous docking study are reanalyzed and extended to include results from the docking program FRED and a detailed statistical analysis of both structure reproduction and virtual screening results. FRED is run both in a traditional docking mode and in a hybrid mode that makes use of the structure of a bound ligand in addition to the protein structure to screen molecules. This analysis shows that most docking programs are effective overall but highly inconsistent, tending to do well on one system and poorly on the next. Comparing methods, the difference in mean performance on DUD is found to be statistically significant (95% confidence) 61% of the time when using a global enrichment metric (AUC). Early enrichment metrics are found to have relatively poor statistical power, with 0.5% early enrichment only able to distinguish methods to 95% confidence 14% of the time.
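
    The global AUC metric discussed above has a simple probabilistic reading: it is the chance that a randomly chosen active is ranked above a randomly chosen decoy. A minimal sketch, assuming higher scores are better (docking energies would be negated first) and counting ties as one half:

```python
def roc_auc(active_scores, decoy_scores):
    """ROC AUC as P(random active scores above random decoy), ties counted as 0.5.

    Assumes higher score = better. O(n_active * n_decoy), which is fine for
    benchmark-sized sets.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))
```

For docking energies where lower is better, pass the negated values, e.g. `roc_auc([-e for e in active_energies], [-e for e in decoy_energies])`. A sort-based rank-sum (Mann-Whitney U) formulation gives the same value in O(n log n) for very large sets.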

  • Task-parallel message passing interface implementation of Autodock4 for docking of very large databases of compounds using high-performance super-computers.
    Collignon, Barbara and Schulz, Roland and Smith, Jeremy C and Baudry, Jerome
    Journal of computational chemistry, 2011, 32(6), 1202-1209
    PMID: 21387347     doi: 10.1002/jcc.21696
    A message passing interface (MPI)-based implementation (Autodock4.lga.MPI) of the grid-based docking program Autodock4 has been developed to allow simultaneous and independent docking of multiple compounds on up to thousands of central processing units (CPUs) using the Lamarckian genetic algorithm. The MPI version reads a single binary file containing precalculated grids that represent the protein-ligand interactions, i.e., van der Waals, electrostatic, and desolvation potentials, and needs only two input parameter files for the entire docking run. In comparison, the serial version of Autodock4 reads ASCII grid files and requires one parameter file per compound. The modifications performed result in significantly reduced input/output activity compared with the serial version. Autodock4.lga.MPI scales up to 8192 CPUs with a maximal overhead of 16.3%, of which two thirds is due to input/output operations and one third originates from MPI operations. The optimal docking strategy, which minimizes docking CPU time without lowering the quality of the database enrichments, comprises the docking of ligands preordered from the most to the least flexible and the assignment of the number of energy evaluations as a function of the number of rotatable bonds. In 24 h, on 8192 high-performance computing CPUs, the present MPI version would allow docking to a rigid protein of about 300K small flexible compounds or 11 million rigid compounds.
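
    The preordering strategy described above can be sketched as a task farm in which each idle process takes the next ligand from a queue sorted from most to least flexible, so the longest jobs start first and short jobs fill the gaps at the end. The linear energy-evaluation budget in the hypothetical `energy_evals` (its base, per-bond and cap values) is a stand-in, not the rule used in Autodock4.lga.MPI, and the simulation ignores I/O and MPI overhead.

```python
import heapq


def energy_evals(n_rot, base=250_000, per_bond=250_000, cap=2_500_000):
    """Hypothetical budget: energy evaluations grow linearly with rotatable bonds."""
    return min(base + per_bond * n_rot, cap)


def schedule(ligands, n_workers):
    """Simulate a queue of ligands ordered from most to least flexible.

    ligands: dict mapping ligand id -> number of rotatable bonds.
    Each ligand goes to the currently least-loaded worker, which is what happens
    when idle processes pull the next job. Returns (assignment, loads).
    """
    order = sorted(ligands, key=lambda lig: ligands[lig], reverse=True)
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker index)
    assignment = {w: [] for w in range(n_workers)}
    loads = [0] * n_workers
    for lig in order:
        load, w = heapq.heappop(heap)  # least-loaded worker becomes free first
        assignment[w].append(lig)
        loads[w] = load + energy_evals(ligands[lig])
        heapq.heappush(heap, (loads[w], w))
    return assignment, loads
```

Starting with the most flexible ligands avoids the situation where one expensive ligand is picked up last and leaves every other processor idle while it finishes.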

  • Substantial improvements in large-scale redocking and screening using the novel HYDE scoring function.
    Schneider, Nadine and Hindle, Sally and Lange, Gudrun and Klein, Robert and Albrecht, Jürgen and Briem, Hans and Beyer, Kristin and Claußen, Holger and Gastreich, Marcus and Lemmen, Christian and Rarey, Matthias
    Journal of computer-aided molecular design, 2011, 26(6), 701-723
    PMID: 22203423     doi: 10.1007/s10822-011-9531-0
    The HYDE scoring function consistently describes hydrogen bonding, the hydrophobic effect and desolvation. It relies on HYdration and DEsolvation terms which are calibrated using octanol/water partition coefficients of small molecules. We do not use affinity data for calibration, therefore HYDE is generally applicable to all protein targets. HYDE reflects the Gibbs free energy of binding while only considering the essential interactions of protein-ligand complexes. The greatest benefit of HYDE is that it yields a very intuitive atom-based score, which can be mapped onto the ligand and protein atoms. This allows the direct visualization of the score and consequently facilitates analysis of protein-ligand complexes during the lead optimization process. In this study, we validated our new scoring function by applying it in large-scale docking experiments. We could successfully predict the correct binding mode in 93% of complexes in redocking calculations on the Astex diverse set, while our performance in virtual screening experiments using the DUD dataset showed significant enrichment values with a mean AUC of 0.77 across all protein targets with little or no structural defects. As part of these studies, we also carried out a very detailed analysis of the data that revealed interesting pitfalls, which we highlight here and which should be addressed in future benchmark datasets.

  • Evaluation of docking performance in a blinded virtual screening of fragment-like trypsin inhibitors.
    Surpateanu, Georgiana and Iorga, Bogdan I
    Journal of computer-aided molecular design, 2011, 26(5), 595-601
    PMID: 22180049     doi: 10.1007/s10822-011-9526-x
    In this study, we have "blindly" assessed the ability of several combinations of docking software and scoring functions to predict the binding of a fragment-like library of bovine trypsin inhibitors. The most suitable protocols (involving Gold software and GoldScore scoring function, with or without rescoring) were selected for this purpose using a training set of compounds with known biological activities. The selected virtual screening protocols provided good results with the SAMPL3-VS dataset, showing enrichment factors of about 10 for the top 20 compounds. This methodology should be useful in difficult cases of docking, with a special emphasis on fragment-based virtual screening campaigns.
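
    An enrichment factor for the top N compounds compares the hit rate in that selection with the hit rate of the whole library. A minimal sketch; the usage example uses an invented ranking of a 500-compound library with 10 actives, not SAMPL3 data:

```python
def enrichment_factor(ranked_labels, top_n):
    """Enrichment factor for the top_n entries of a ranked list.

    ranked_labels: booleans (True = active), best-ranked compound first.
    EF = (actives in top_n / top_n) / (all actives / all compounds)
    """
    total = len(ranked_labels)
    total_actives = sum(ranked_labels)
    if top_n <= 0 or top_n > total or total_actives == 0:
        raise ValueError("top_n must be in 1..len(list) and at least one active is needed")
    hits = sum(ranked_labels[:top_n])
    return (hits / top_n) / (total_actives / total)


# Hypothetical ranking: 4 of the 10 actives land in the top 20 of 500 compounds.
ranking = [True] * 4 + [False] * 16 + [True] * 6 + [False] * 474
print(round(enrichment_factor(ranking, 20), 6))  # (4/20) / (10/500) = 10.0
```

An EF of 1 means the selection is no better than random picking; the maximum possible value is capped by the library's active fraction and the selection size.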

  • Ligand and Decoy Sets for Docking to G Protein-Coupled Receptors.
    Gatica, Edgar A and Cavasotto, Claudio N
    Journal of chemical information and modeling, 2011, 52(1), 1-6
    PMID: 22168315     doi: 10.1021/ci200412p
    We compiled a G protein-coupled receptor (GPCR) ligand library (GLL) for 147 targets, selecting for each ligand 39 decoy molecules, collected in the GPCR Decoy Database (GDD). Decoys were chosen ensuring a ligand-decoy similarity of six physical properties, while enforcing ligand-decoy chemical dissimilarity. The performance in docking of the GDD was evaluated on 19 GPCRs, showing a marked decrease in enrichment compared to bias-uncorrected decoy sets. Both the GLL and GDD are freely available for the scientific community.
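
    The decoy criterion described above (property-matched to the ligand, yet chemically dissimilar from all ligands, 39 decoys per ligand) reduces to a filter followed by a ranking. In this sketch the property tuple, the per-property scales, the set-of-bits fingerprint and the 0.3 Tanimoto cutoff are all assumptions for illustration; the abstract does not state the fingerprint, how the six properties are weighted, or the cutoff used for the GDD.

```python
import math


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_decoys(ligand, all_ligands, candidates, scales, n_decoys=39, max_tc=0.3):
    """Return up to n_decoys property-matched, chemically dissimilar decoys.

    ligand, all_ligands, candidates: dicts with "props" (tuple of floats) and
    "fp" (set of fingerprint bits). scales: per-property divisor so that no
    single property (e.g. molecular weight) dominates the distance.
    """
    # 1. Chemical dissimilarity: drop candidates too similar to ANY known ligand.
    dissimilar = [
        c for c in candidates
        if all(tanimoto(c["fp"], lig["fp"]) < max_tc for lig in all_ligands)
    ]

    # 2. Physical similarity: rank survivors by scaled distance in property space.
    def prop_dist(c):
        return math.sqrt(sum(((x - y) / s) ** 2
                             for x, y, s in zip(c["props"], ligand["props"], scales)))

    return sorted(dissimilar, key=prop_dist)[:n_decoys]
```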

  • DEKOIS: Demanding Evaluation Kits for Objective in Silico Screening - A Versatile Tool for Benchmarking Docking Programs and Scoring Functions.
    Vogel, Simon M and Bauer, Matthias R and Boeckler, Frank M
    Journal of chemical information and modeling, 2011, 51(10), 2650-2665
    PMID: 21774552     doi: 10.1021/ci2001549
    For widely applied in silico screening techniques, success depends on the rational selection of an appropriate method. We herein present a fast, versatile, and robust method to construct demanding evaluation kits for objective in silico screening (DEKOIS). This automated process enables creating tailor-made decoy sets for any given set of bioactives. It facilitates a target-dependent validation of docking algorithms and scoring functions, helping to save time and resources. We have developed metrics for assessing and improving decoy set quality and employ them to investigate how decoy embedding affects docking. We demonstrate that screening performance is target-dependent and can be impaired by latent actives in the decoy set (LADS) or enhanced by poor decoy embedding. The presented method allows extending and complementing the collection of publicly available high-quality decoy sets toward new target space. All present and future DEKOIS data sets will be made accessible at .

  • Virtual Screening for Lead Discovery
    Tang, Yat T. and Marshall, Garland R.
    , 2011, 1-22
    doi: 10.1007/978-1-61779-012-6_1
    Abstract The identification of small drug-like compounds that selectively inhibit the function of biological targets has historically been a major focus in the pharmaceutical industry, and in recent years, has generated much interest in academia as well. Drug-like compounds ...

  • PLS-DA - Docking Optimized Combined Energetic Terms (PLSDA-DOCET) protocol: a brief evaluation.
    Avram, Sorin and Pacureanu, Liliana Mioara and Seclaman, Edward and Bora, Alina and Kurunczi, Ludovic G
    Journal of chemical information and modeling, 2011, 51(12), 3169-3179
    PMID: 22066983     doi: 10.1021/ci2002268
    Docking studies have become popular approaches in drug design, where the binding energy of the ligand in the active site of the protein is estimated by a scoring function. Many promising techniques were developed to enhance the performance of scoring functions, including the fusion of the outcomes of multiple scoring functions into a so-called consensus scoring function. Here, we evaluated the target-oriented consensus technique using the energetic terms of several scoring functions. The approach was denoted PLSDA-DOCET. Optimization strategies for consensus energetic terms and scoring functions based on the ROC metric were compared to classical rigid docking and to ligand-based similarity search methods comprising 2D fingerprints and ROCS. The ROCS results indicate large performance variations depending on the biological target. The AUC-based strategy of PLSDA-DOCET outperformed the other docking approaches regarding simple retrieval and scaffold hopping. The superior performance of the PLSDA-DOCET protocol relative to single and combined scoring functions was validated on an external test set. We found a relatively low mean correlation of the ranks of the chemotypes retrieved by the PLSDA-DOCET protocol and all the other methods employed here.
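
    PLSDA-DOCET itself fits a PLS-DA model on energetic terms, which is beyond a short example. For comparison, the simplest form of the consensus scoring the abstract refers to, fusing the outcomes of several scoring functions, can be sketched as rank averaging; this is a generic baseline with illustrative names, not the PLSDA-DOCET protocol.

```python
def rank_average_consensus(score_table):
    """Fuse several scoring functions by averaging each compound's rank.

    score_table: dict mapping scoring-function name -> dict(compound id -> score),
    with lower scores better (as for docking energies). Only compounds scored by
    every function are kept. Returns compound ids from best to worst mean rank.
    """
    compounds = set.intersection(*(set(scores) for scores in score_table.values()))
    n_functions = len(score_table)
    mean_rank = dict.fromkeys(compounds, 0.0)
    for scores in score_table.values():
        # Rank 1 = best (lowest) score; ties broken by compound id for determinism.
        ordered = sorted(compounds, key=lambda c: (scores[c], c))
        for rank, compound in enumerate(ordered, start=1):
            mean_rank[compound] += rank / n_functions
    return sorted(compounds, key=lambda c: (mean_rank[c], c))
```

Working on ranks rather than raw scores sidesteps the fact that different scoring functions report values on incompatible scales.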

  • Predicting the performance of fingerprint similarity searching.
    Vogt, Martin and Bajorath, Jürgen
    Methods in molecular biology (Clifton, N.J.), 2011, 672, 159-173
    PMID: 20838968     doi: 10.1007/978-1-60761-839-3_6
    Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are utilized in virtual screening and their search performance essentially depends on three parameters: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. It is of considerable interest and practical relevance to predict the performance of fingerprint similarity searching. A quantitative assessment of the potential that a fingerprint search might successfully retrieve active compounds, if available in the screening database, would substantially help to select the type of fingerprint most suitable for a given search problem. The method presented herein utilizes concepts from information theory to relate the fingerprint feature distributions of reference compounds to screening libraries. If these feature distributions do not sufficiently differ, active database compounds that are similar to reference molecules cannot be retrieved because they disappear in the "background." By quantifying the difference in feature distribution using the Kullback-Leibler divergence and relating the divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be quantitatively predicted.
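
    One way to make the comparison of feature distributions concrete is to turn per-bit counts in the reference set and in the screening database into relative-frequency distributions and compute their Kullback-Leibler divergence. In this sketch the normalization (over all bits) and the pseudocount are assumptions; per the abstract, a larger divergence means the reference features stand out more clearly from the database "background".

```python
import math


def feature_distribution(fingerprints, n_bits, pseudocount=1e-6):
    """Relative frequency of each fingerprint bit across a compound set.

    fingerprints: iterable of sets of on-bit indices (0 <= index < n_bits).
    The pseudocount keeps every probability positive so the divergence is finite.
    """
    counts = [pseudocount] * n_bits
    for fp in fingerprints:
        for bit in fp:
            counts[bit] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """D_KL(P || Q) in bits for two discrete distributions over the same features."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Here `p` would come from the reference actives and `q` from the screening database; the divergence is asymmetric, so the argument order matters.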

  • ReverseScreen3D: a structure-based ligand matching method to identify protein targets.
    Kinnings, Sarah L and Jackson, Richard M
    Journal of chemical information and modeling, 2011, 51(3), 624-634
    PMID: 21361385     doi: 10.1021/ci1003174
    Ligand promiscuity, which is now recognized as an extremely common phenomenon, is a major underlying cause of drug toxicity. We have developed a new reverse virtual screening (VS) method called ReverseScreen3D, which can be used to predict the potential protein targets of a query compound of interest. The method uses a 2D fingerprint-based method to select a ligand template from each unique binding site of each protein within a target database. The target database contains only the structurally determined bioactive conformations of known ligands. The 2D comparison is followed by a 3D structural comparison to the selected query ligand using a geometric matching method, in order to prioritize each target binding site in the database. We have evaluated the performance of the ReverseScreen2D and 3D methods using a diverse set of small molecule protein inhibitors known to have multiple targets, and have shown that they are able to provide a highly significant enrichment of true targets in the database. Furthermore, we have shown that the 3D structural comparison improves early enrichment when compared with the 2D method alone, and that the 3D method performs well even in the absence of 2D similarity to the template ligands. By carrying out further experimental screening on the prioritized list of targets, it may be possible to determine the potential targets of a new compound or determine the off-targets of an existing drug. The ReverseScreen3D method has been incorporated into a Web server, which is freely available at .
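
    The 2D template-selection step described above can be sketched as follows: for every binding site, keep the known ligand whose fingerprint is most similar to the query, then rank sites by that best similarity. The set-of-bits fingerprint, Tanimoto as the similarity measure and all identifiers are illustrative assumptions; the subsequent 3D geometric matching step is not shown.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pick_templates(query_fp, site_ligands):
    """For each binding site, choose the known ligand most similar to the query.

    site_ligands: dict mapping binding-site id -> dict(ligand id -> fingerprint set);
    every site is assumed to have at least one ligand.
    Returns (site, ligand, similarity) tuples sorted by decreasing similarity,
    i.e. a 2D-only ranking of candidate targets.
    """
    best = []
    for site, ligands in site_ligands.items():
        lig, score = max(((lig_id, tanimoto(query_fp, fp)) for lig_id, fp in ligands.items()),
                         key=lambda item: item[1])
        best.append((site, lig, score))
    return sorted(best, key=lambda t: t[2], reverse=True)
```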

  • BEAR, a novel virtual screening methodology for drug discovery.
    Degliesposti, Gianluca and Portioli, Corinne and Parenti, Marco Daniele and Rastelli, Giulio
    Journal of biomolecular screening, 2011, 16(1), 129-133
    PMID: 21084717     doi: 10.1177/1087057110388276
    BEAR (binding estimation after refinement) is a new virtual screening technology based on the conformational refinement of docking poses through molecular dynamics and prediction of binding free energies using accurate scoring functions. Here, the authors report the results of an extensive benchmark of BEAR performance in identifying a small subset of known inhibitors seeded in a large (1.5 million) database of compounds. BEAR performance proved strikingly better than that of standard docking screening methods. The validations performed so far showed that BEAR is a reliable tool for drug discovery. It is fast, modular, and automated, and it can be applied to virtual screenings against any biological target with known structure and any database of compounds.

  • Using consensus-shape clustering to identify promiscuous ligands and protein targets and to choose the right query for shape-based virtual screening.
    Pérez-Nueno, Violeta I and Ritchie, David W
    Journal of chemical information and modeling, 2011, 51(6), 1233-1248
    PMID: 21604699     doi: 10.1021/ci100492r
    Ligand-based shape matching approaches have become established as important and popular virtual screening (VS) techniques. However, despite their relative success, many authors have discussed how best to choose the initial query compounds and which of their conformations should be used. Furthermore, it is increasingly the case that pharmaceutical companies have multiple ligands for a given target and these may bind in different ways to the same pocket. Conversely, a given ligand can sometimes bind to multiple targets, and this is clearly of great importance when considering drug side effects. We recently introduced the notion of spherical harmonic-based "consensus shapes" to help deal with these questions. Here, we apply a consensus shape clustering approach to the 40 protein-ligand targets in the DUD data set using PARASURF/PARAFIT. Results from clustering show that in some cases the ligands for a given target are split into two subgroups, which could suggest they bind to different subsites of the same target. In other cases, our clustering approach groups together ligands from different targets, which suggests that those ligands could bind to the same targets. Hence spherical harmonic-based clustering can rapidly give cross-docking information while avoiding the expense of performing all-against-all docking calculations. We also report on the effect of the query conformation on the performance of shape-based screening of the DUD data set and the potential gain in screening performance by using consensus shapes calculated in different ways. We provide details of our analysis of shape-based screening using both PARASURF/PARAFIT and ROCS, and we compare the results obtained with shape-based and conventional docking approaches using MSSH/SHEF and GOLD. The utility of each type of query is analyzed using commonly reported statistics such as enrichment factors (EF) and receiver operating characteristic (ROC) plots as well as other early performance metrics.

  • Pharmacophore-based virtual screening.
    Horvath, Dragos
    Methods in molecular biology (Clifton, N.J.), 2011, 672, 261-298
    PMID: 20838973     doi: 10.1007/978-1-60761-839-3_11
    This chapter is a review of the most recent developments in the field of pharmacophore modeling, covering both methodology and application. Pharmacophore-based virtual screening is nowadays a mature technology, very well accepted in the medicinal chemistry laboratory. Nevertheless, like any empirical approach, it has specific limitations, and efforts to improve the methodology are still ongoing. Fundamentally, the core idea of "stripping" functional groups of their actual chemical nature in order to classify them into very few pharmacophore types, according to their dominant physico-chemical features, is both the main advantage and the main drawback of pharmacophore modeling. The advantage is simplicity: the complex nature of noncovalent ligand binding interactions is rendered intuitive and comprehensible to the human mind. Although computers are much better suited for comparisons of pharmacophore patterns, a chemist's intuition is primarily scaffold-oriented. Its underlying simplifications render pharmacophore modeling unable to provide perfect predictions of ligand binding propensities, not even if all of its remaining technical problems were solved. Each step in pharmacophore modeling and exploitation has specific drawbacks: from insufficient or inaccurate conformational sampling to ambiguities in pharmacophore typing (mainly due to uncertainty regarding the tautomeric/protonation status of compounds), to computer time limitations in complex molecular overlay calculations, and to the choice of inappropriate anchoring points in active sites when ligand cocrystal structures are not available. Yet, imperfections notwithstanding, the approach is accurate enough to be practically useful and is actually the most used virtual screening technique in medicinal chemistry, notably for "scaffold hopping" approaches, allowing the discovery of new chemical classes carrying a desired biological activity.

  • Graph-based similarity concepts in virtual screening.
    Hutter, Michael C
    Future medicinal chemistry, 2011, 3(4), 485-501
    PMID: 21452983     doi: 10.4155/fmc.11.3
    Applying similarity to find promising new compounds is a key issue in drug design. However, quantifying similarity between molecules has remained a difficult task despite the numerous approaches. Here, some general aspects along with recent developments regarding similarity criteria are collected. For the purpose of virtual screening, the compounds have to be encoded into a computer-readable format that permits a comparison according to given similarity criteria, comprising the use of the 3D structure, fingerprints, graph-based and alignment-based approaches. Whereas finding the most common substructures is the most obvious method, more recent approaches take into account chemical modifications that appear throughout existing drugs, from various therapeutic categories and targets.

  • Virtual screening using molecular simulations.
    Yang, Tianyi and Wu, Johnny C and Yan, Chunli and Wang, Yuanfeng and Luo, Ray and Gonzales, Michael B and Dalby, Kevin N and Ren, Pengyu
    Proteins, 2011, 79(6), 1940-1951
    PMID: 21491494     doi: 10.1002/prot.23018
    Effective virtual screening relies on our ability to make accurate predictions of protein-ligand binding, which remains a great challenge. In this work, utilizing the molecular-mechanics Poisson-Boltzmann (or Generalized Born) surface area approach, we have evaluated the binding affinity of a set of 156 ligands to seven families of proteins: trypsin β, thrombin α, cyclin-dependent kinase (CDK), cAMP-dependent kinase (PKA), urokinase-type plasminogen activator, β-glucosidase A, and coagulation factor Xa. The effect of the protein dielectric constant in the implicit-solvent model on the binding free energy calculation is shown to be important. The statistical correlations between the binding energy calculated from the implicit-solvent approach and experimental free energy are in the range of 0.56-0.79 across all the families. This performance is better than that of typical docking programs, especially given that the latter are directly trained using known binding data whereas the molecular mechanics approach is based on general physical parameters. Estimation of the entropic contribution remains the barrier to accurate free energy calculation. We show that the traditional rigid rotor harmonic oscillator approximation is unable to improve the binding free energy prediction. Inclusion of conformational restriction seems to be promising but requires further investigation. On the other hand, our preliminary study suggests that implicit-solvent based alchemical perturbation, which offers explicit sampling of configuration entropy, can be a viable approach to significantly improve the prediction of binding free energy. Overall, the molecular mechanics approach has the potential for medium- to high-throughput computational drug discovery.
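
    For reference, the MM-PB(GB)SA estimate mentioned above is usually written in the generic form below; the abstract does not specify the study's exact protocol (for example single- versus separate-trajectory sampling, or how the entropy term was handled), so this shows the standard decomposition only.

```latex
\Delta G_{\mathrm{bind}}
  = \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{protein}} \rangle
  - \langle G_{\mathrm{ligand}} \rangle,
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{PB/GB}} + G_{\mathrm{SA}} - TS
```

Here E_MM is the gas-phase molecular-mechanics energy (bonded, van der Waals and electrostatic terms), G_PB/GB is the polar solvation free energy from Poisson-Boltzmann or Generalized Born, G_SA is the nonpolar term proportional to the solvent-accessible surface area, and -TS is the entropic contribution that the abstract identifies as the main obstacle. The solute (protein) dielectric constant is a parameter of the polar solvation calculation.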

  • iSMART: an integrated cloud computing web server for traditional Chinese medicine for online virtual screening, de novo evolution and drug design.
    Chang, Kai-Wei and Tsai, Tsung-Ying and Chen, Kuan-Chung and Yang, Shun-Chieh and Huang, Hung-Jin and Chang, Tung-Ti and Sun, Mao-Feng and Chen, Hsin-Yi and Tsai, Fuu-Jen and Chen, Calvin Yu-Chian
    Journal of biomolecular structure & dynamics, 2011, 29(1), 243-250
    PMID: 21696236    

  • TCM Database@Taiwan: the world's largest traditional Chinese medicine database for drug screening in silico.
    Chen, Calvin Yu-Chian
    PloS one, 2011, 6(1), e15939
    PMID: 21253603     doi: 10.1371/journal.pone.0015939
    Rapidly advancing computational technologies have greatly sped up the development of computer-aided drug design (CADD). Recently, pharmaceutical companies have increasingly shifted their attention toward traditional Chinese medicine (TCM) for novel lead compounds. Despite the growing number of studies on TCM, there is no free 3D small molecular structure database of TCM available for virtual screening or molecular simulation. To address this shortcoming, we have constructed TCM Database@Taiwan based on information collected from Chinese medical texts and scientific publications. TCM Database@Taiwan is currently the world's largest non-commercial TCM database. This web-based database contains more than 20,000 pure compounds isolated from 453 TCM ingredients. Both cdx (2D) and Tripos mol2 (3D) formats of each pure compound in the database are available for download and virtual screening. The TCM database includes both simple and advanced web-based query options that can specify search clauses, such as molecular properties, substructures, TCM ingredients, and TCM classification, based on intended drug actions. The TCM database can be easily accessed by all researchers conducting CADD. Over the last eight years, numerous volunteers have devoted their time to analyzing TCM ingredients from Chinese medical texts and to constructing structure files for each isolated compound. We believe that TCM Database@Taiwan will be a milestone on the path towards modernizing traditional Chinese medicine.

  • Improving the accuracy of ultrafast ligand-based screening: incorporating lipophilicity into ElectroShape as an extra dimension.
    Armstrong, M Stuart and Finn, Paul W and Morris, Garrett M and Richards, W Graham
    Journal of computer-aided molecular design, 2011, 25(8), 785-790
    PMID: 21822723     doi: 10.1007/s10822-011-9463-8
    In a previous paper, we presented the ElectroShape method, which we used to achieve successful ligand-based virtual screening. It extended classical shape-based methods by applying them to the four-dimensional shape of the molecule, where partial charge was used as the fourth dimension to capture electrostatic information. This paper extends the approach by using atomic lipophilicity (alogP) as an additional molecular property and validates it using the improved release 2 of the Directory of Useful Decoys (DUD). When alogP replaced partial charge, the enrichment results were slightly below those of ElectroShape, though still far better than those of purely shape-based methods. However, when alogP was added as a complement to partial charge, the resulting five-dimensional method showed a clear improvement in enrichment. This demonstrates the utility of extending the ElectroShape virtual screening method by adding other atom-based descriptors.

  • Docking-based virtual screening for ligands of G protein-coupled receptors: not only crystal structures but also in silico models.
    Vilar, Santiago and Ferino, Giulio and Phatak, Sharangdhar S and Berk, Barkin and Cavasotto, Claudio N and Costanzi, Stefano
    Journal of molecular graphics & modelling, 2011, 29(5), 614-623
    PMID: 21146435     doi: 10.1016/j.jmgm.2010.11.005
    G protein-coupled receptors (GPCRs) regulate a wide range of physiological functions and hold great pharmaceutical interest. Using the β2-adrenergic receptor as a case study, this article explores the applicability of docking-based virtual screening to the discovery of GPCR ligands and defines methods intended to improve the screening performance. Our controlled computational experiments were performed on a compound dataset containing known agonists and blockers of the receptor as well as a large number of decoys. The screening based on the structure of the receptor crystallized in complex with its inverse agonist carazolol yielded excellent results, with a clearly delineated prioritization of ligands over decoys. Blockers generally were preferred over agonists; however, agonists were also well distinguished from decoys. A method was devised to increase the screening yields by generating an ensemble of alternative conformations of the receptor that accounts for its flexibility. Moreover, a method was devised to improve the retrieval of agonists, based on the optimization of the receptor around a known agonist. Finally, the applicability of docking-based virtual screening to homology models endowed with different levels of accuracy was also demonstrated. This last point is of utmost importance, since crystal structures are available only for a limited number of GPCRs, and it extends our conclusions to the entire superfamily. The outcome of this analysis strongly supports the application of computer-aided techniques to the discovery of novel GPCR ligands, especially in light of the fact that, in the near future, experimental structures are expected to be solved and become available for an ever-increasing number of GPCRs.

  • Consensus virtual screening approaches to predict protein ligands.
    Kukol, Andreas
    European journal of medicinal chemistry, 2011, 46(9), 4661-4664
    PMID: 21640444     doi: 10.1016/j.ejmech.2011.05.026
    In order to exploit the advantages of receptor-based virtual screening, namely time/cost saving and specificity, it is important to rely on algorithms that predict a high number of active ligands at the top ranks of a small molecule database. Towards that goal, consensus methods combining the results of several docking algorithms were developed and compared against the individual algorithms. Furthermore, a recently proposed rescoring method based on drug efficiency indices was evaluated. Among AutoDock Vina 1.0, AutoDock 4.2 and GemDock, AutoDock Vina was the best performing single method in predicting high affinity ligands from a database of known ligands and decoys. The rescoring of predicted binding energies with the water/octanol partition coefficient did not lead to an improvement averaged over ten receptor targets. Various consensus algorithms were investigated, and a simple combination of AutoDock and AutoDock Vina results gave the most consistent performance, showing early enrichment of known ligands for all receptor targets investigated. When a number of ligands are already known for a specific target, all the methods proposed in this study should be evaluated.
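One common way to combine two docking programs is rank averaging. The sketch below assumes each program returns a predicted binding energy per compound (more negative is better); the function names and the rank-averaging rule are illustrative assumptions, not necessarily the combination scheme used in the paper above.

```python
from statistics import mean


def ranks(energies: dict[str, float]) -> dict[str, int]:
    """Rank compounds by predicted binding energy (most negative energy = rank 1).
    Tied energies get distinct consecutive ranks, broken by compound id."""
    ordered = sorted(energies, key=lambda cid: (energies[cid], cid))
    return {cid: i + 1 for i, cid in enumerate(ordered)}


def consensus_ranking(*energy_tables: dict[str, float]) -> list[str]:
    """Order compounds by their mean rank across several docking programs."""
    # only compounds scored by every program can be combined
    common = set.intersection(*(set(t) for t in energy_tables))
    per_program = [ranks({cid: t[cid] for cid in common}) for t in energy_tables]
    mean_rank = {cid: mean(r[cid] for r in per_program) for cid in common}
    # ties in mean rank are broken by compound id so the output is deterministic
    return sorted(common, key=lambda cid: (mean_rank[cid], cid))


# hypothetical energies (kcal/mol) from two programs
vina = {"cpd1": -9.0, "cpd2": -7.5, "cpd3": -8.0}
ad4 = {"cpd1": -8.8, "cpd2": -9.5, "cpd3": -7.0}
print(consensus_ranking(vina, ad4))
```

Averaging ranks rather than raw energies sidesteps the fact that the two programs' scoring functions are on different scales; here cpd1 (ranks 1 and 2) comes first.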

  • Computer-aided drug design platform using PyMOL.
    Lill, Markus A and Danielson, Matthew L
    Journal of computer-aided molecular design, 2011, 25(1), 13-19
    PMID: 21053052     doi: 10.1007/s10822-010-9395-8
    The understanding and optimization of protein-ligand interactions are instrumental to medicinal chemists investigating potential drug candidates. Over the past couple of decades, many powerful standalone tools for computer-aided drug discovery have been developed in academia, providing insight into protein-ligand interactions. Because these programs are developed by various research groups, a consistent, user-friendly graphical working environment combining computational techniques such as docking, scoring, molecular dynamics simulations, and free energy calculations is needed. Utilizing PyMOL, we have developed such a graphical user interface incorporating individual academic packages designed for protein preparation (AMBER package and Reduce), molecular mechanics applications (AMBER package), and docking and scoring (AutoDock Vina and SLIDE). In addition to amassing several computational tools under one interface, the computational platform also provides a user-friendly combination of different programs. For example, a molecular dynamics (MD) simulation performed with AMBER can be used as input for ensemble docking with AutoDock Vina. The overarching goal of this work was to provide a computational platform that enables medicinal chemists, many of whom are not experts in computational methodologies, to use several common computational techniques germane to drug discovery. Furthermore, our software is open source and aims to initiate collaborative efforts among computational researchers to combine other open source computational methods under a single, easily understandable graphical user interface.

  • Rapid Shape-Based Ligand Alignment and Virtual Screening Method Based on Atom/Feature-Pair Similarities and Volume Overlap Scoring.
    Sastry, Madhavi and Dixon, Steve and Sherman, Woody
    Journal of chemical information and modeling, 2011, 51(10), 2455-2466
    PMID: 21870862     doi: 10.1021/ci2002704
    Shape-based methods for aligning and scoring ligands have proven to be valuable in the field of computer-aided drug design. Here, we describe a new shape-based flexible ligand superposition and virtual screening method, Phase Shape, which is shown to rapidly produce accurate 3D ligand alignments and efficiently enrich actives in virtual screening. We describe the methodology, which is based on the principle of atom distribution triplets to rapidly define trial alignments, followed by refinement of top alignments to maximize the volume overlap. The method can be run in a shape-only mode or it can include atom types or pharmacophore feature encoding, the latter consistently producing the best results for database screening. We apply Phase Shape to flexibly align molecules that bind to the same target and show that the method consistently produces correct alignments when compared with crystal structures. We then illustrate the effectiveness of the method for identifying active compounds in virtual screening of eleven diverse targets. Multiple parameters are explored, including atom typing, query structure conformation, and the database conformer generation protocol. We show that Phase Shape performs well in database screening calculations when compared with other shape-based methods using a common set of actives and decoys from the literature.

  • Introduction of the conditional correlated Bernoulli model of similarity value distributions and its application to the prospective prediction of fingerprint search performance.
    Vogt, Martin and Bajorath, Jürgen
    Journal of chemical information and modeling, 2011, 51(10), 2496-2506
    PMID: 21892818     doi: 10.1021/ci2003472
    A statistical approach named the conditional correlated Bernoulli model is introduced for modeling of similarity scores and predicting the potential of fingerprint search calculations to identify active compounds. Fingerprint features are rationalized as dependent Bernoulli variables and conditional distributions of Tanimoto similarity values of database compounds given a reference molecule are assessed. The conditional correlated Bernoulli model is utilized in the context of virtual screening to estimate the position of a compound obtaining a certain similarity value in a database ranking. Through the generation of receiver operating characteristic curves from cumulative distribution functions of conditional similarity values for known active and random database compounds, one can predict how successful a fingerprint search might be. The comparison of curves for different fingerprints makes it possible to identify fingerprints that are most likely to identify new active molecules in a database search given a set of known reference molecules.
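The quantities the model above reasons about, Tanimoto similarity values and ROC curves built from the similarity values of actives versus background compounds, can be computed empirically as in this sketch. It assumes fingerprints are given as sets of "on" bit positions; it is a plain empirical calculation, not the conditional correlated Bernoulli model itself.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention for two empty fingerprints (an assumption)
    return len(fp_a & fp_b) / union


def roc_auc(active_scores, background_scores) -> float:
    """Empirical ROC AUC: probability that a randomly chosen active scores higher
    than a randomly chosen background compound (ties count one half)."""
    wins = 0.0
    for a in active_scores:
        for b in background_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(active_scores) * len(background_scores))


# hypothetical reference fingerprint, known actives and background compounds
reference = {1, 4, 7, 9}
actives = [{1, 4, 7}, {1, 4, 9, 12}]
background = [{2, 3}, {4, 5, 6}]
act_sims = [tanimoto(reference, fp) for fp in actives]      # 0.75 and 0.6
bg_sims = [tanimoto(reference, fp) for fp in background]    # 0.0 and 1/6
print(roc_auc(act_sims, bg_sims))
```

The pairwise loop is quadratic and only meant for illustration; for large databases a sort-based AUC computation is preferable.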

  • Rationalizing the role of SAR tolerance for ligand-based virtual screening.
    Ripphausen, Peter and Nisius, Britta and Wawer, Mathias and Bajorath, Jürgen
    Journal of chemical information and modeling, 2011, 51(4), 837-842
    PMID: 21438544     doi: 10.1021/ci200064c
    It is well appreciated that the results of ligand-based virtual screening (LBVS) are much influenced by methodological details, given the generally strong compound class dependence of LBVS methods. It is less well understood to what extent structure-activity relationship (SAR) characteristics might influence the outcome of LBVS. We have assessed the hypothesis that the success of prospective LBVS depends on the SAR tolerance of screening targets, in addition to methodological aspects. In this context, SAR tolerance is rationalized as the ability of a target protein to specifically interact with series of structurally diverse active compounds. In compound data sets, SAR tolerance articulates itself as SAR continuity, i.e., the presence of structurally diverse compounds having similar potency. In order to analyze the role of SAR tolerance for LBVS, activity landscape representations of compounds active against 16 different target proteins were generated for which successful LBVS applications were reported. In all instances, the activity landscapes of known active compounds contained multiple regions of local SAR continuity. When analyzing the location of newly identified LBVS hits and their SAR environments, we found that these hits almost exclusively mapped to regions of distinct local SAR continuity. Taken together, these findings indicate the presence of a close link between SAR tolerance at the target level, SAR continuity at the ligand level, and the probability of LBVS success.

  • From Virtual Screening to Bioactive Compounds by Visualizing and Clustering of Chemical Space
    Klenner, Alexander and Hähnke, Volker and Geppert, Tim and Schneider, Petra and Zettl, Heiko and Haller, Sarah and Rodrigues, Tiago and Reisen, Felix and Hoy, Benjamin and Schaible, Anja Maria and Werz, Oliver and Wessler, Silja and Schneider, Gisbert
    Molecular Informatics, 2011, 31(1), 21-26
    doi: 10.1002/minf.201100147

  • REPROVIS-DB: a benchmark system for ligand-based virtual screening derived from reproducible prospective applications.
    Ripphausen, Peter and Wassermann, Anne Mai and Bajorath, Jürgen
    Journal of chemical information and modeling, 2011, 51(10), 2467-2473
    PMID: 21902278     doi: 10.1021/ci200309j
    Benchmark calculations are essential for the evaluation of virtual screening (VS) methods. Typically, classes of known active compounds taken from the medicinal chemistry literature are divided into reference molecules (search templates) and potential hits that are added to background databases assumed to consist of compounds not sharing this activity. Then VS calculations are carried out, and the recall of known active compounds is determined. However, conventional benchmarking is affected by a number of problems that reduce its value for method evaluation. In addition to often insufficient statistical validation and the lack of generally accepted evaluation standards, the artificial nature of typical benchmark settings is often criticized. Retrospective benchmark calculations generally overestimate the potential of VS methods and do not scale with their performance in prospective applications. In order to provide additional opportunities for benchmarking that more closely resemble practical VS conditions, we have designed a publicly available compound database (DB) of reproducible virtual screens (REPROVIS-DB) that organizes information from successful ligand-based VS applications, including reference compounds, screening databases, compound selection criteria, and experimentally confirmed hits. Using the currently available 25 hand-selected compound data sets, one can attempt to reproduce successful virtual screens with methods other than those originally applied and assess their potential for practical applications.

  • Computational screening for active compounds targeting protein sequences: methodology and experimental validation.
    Wang, Fei and Liu, Dongxiang and Wang, Heyao and Luo, Cheng and Zheng, Mingyue and Liu, Hong and Zhu, Weiliang and Luo, Xiaomin and Zhang, Jian and Jiang, Hualiang
    Journal of chemical information and modeling, 2011, 51(11), 2821-2828
    PMID: 21955088     doi: 10.1021/ci200264h
    Because the three-dimensional (3D) structures of most protein targets have not yet been determined, and many of these targets do not even have a known ligand, a truly general method to predict ligand-protein interactions in the absence of three-dimensional information would be of great potential value in drug discovery. Using the support vector machine (SVM) approach, we constructed a model for predicting ligand-protein interaction based only on the primary sequence of proteins and the structural features of small molecules. The model, trained by using 15,000 ligand-protein interactions between 626 proteins and over 10,000 active compounds, was successfully used in discovering nine novel active compounds for four pharmacologically important targets (i.e., GPR40, SIRT1, p38, and GSK-3β). To our knowledge, this is the first example of a successful sequence-based virtual screening campaign, demonstrating that our approach has the potential to discover, with a single model, active ligands for any protein.

  • iScreen: world's first cloud-computing web server for virtual screening and de novo drug design based on TCM database@Taiwan
    Tsai, Tsung-Ying and Chang, Kai-Wei and Chen, Calvin Yu-Chian
    Journal of computer-aided molecular design, 2011, 25(6), 525-531
    PMID: 21647737     doi: 10.1007/s10822-011-9438-9
    Rapidly advancing research on traditional Chinese medicine (TCM) has greatly intrigued pharmaceutical industries worldwide. To take the initiative in the next generation of drug development, we constructed a cloud-computing system for TCM intelligent screening (iScreen) based on TCM Database@Taiwan. iScreen is a compact web server for TCM docking followed by customized de novo drug design. We further implemented a protein preparation tool that both extracts the protein of interest from a raw input file and estimates the size of the ligand-binding site. In addition, iScreen has a user-friendly graphical interface for users who have less experience with command-line systems. For customized docking, multiple docking services, including standard, in-water, pH environment, and flexible docking modes, are implemented. Users can download the top 200 TCM compounds from the docking results. For TCM de novo drug design, iScreen provides multiple molecular descriptors of interest to the user. iScreen is the world's first web server that employs the world's largest TCM database for virtual screening and de novo drug design. We believe our web server can lead TCM research to a new era of drug development. The TCM docking and screening server is available at

  • Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision.
    Holliday, John D and Kanoulas, Evangelos and Malim, Nurul and Willett, Peter
    Journal of cheminformatics, 2011, 3(1), 29
    PMID: 21824430     doi: 10.1186/1758-2946-3-29

  • Exhaustive search and solvated interaction energy (SIE) for virtual screening and affinity prediction.
    Sulea, Traian and Hogues, Hervé and Purisima, Enrico O
    Journal of computer-aided molecular design, 2011, 26(5), 617-633
    PMID: 22198519     doi: 10.1007/s10822-011-9529-7
    We carried out a prospective evaluation of the utility of the SIE (solvated interaction energy) scoring function for virtual screening and binding affinity prediction. Since experimental structures of the complexes were not provided, this was also an exercise in virtual docking. We used our exhaustive docking program, Wilma, to provide high-quality poses that were rescored using SIE to provide binding affinity predictions. We also tested the combination of SIE with our latest solvation model, first shell of hydration (FiSH), which captures some of the discrete properties of water within a continuum model. We achieved good enrichment in virtual screening of fragments against trypsin, with an area under the receiver operating characteristic (ROC) curve of about 0.7. Moreover, the early enrichment performance was quite good, with 50% of true actives recovered at a 15% false positive rate in a prospective calculation and at a 3% false positive rate in a retrospective application of SIE with FiSH. Binding affinity predictions for both trypsin and host-guest complexes were generally within 2 kcal/mol of the experimental values. However, the rank ordering of affinities differing by 2 kcal/mol or less was not well predicted. On the other hand, it was encouraging that the incorporation of a more sophisticated solvation model into SIE resulted in better discrimination of true binders from nonbinders. This suggests that the inclusion of proper physics in our models is a fruitful strategy for improving the reliability of our binding affinity predictions.
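Early-enrichment figures such as "50% of true actives recovered at a 15% false positive rate" can be read off a ROC curve as the best true positive rate reachable at or below a given false positive rate. A minimal sketch, assuming higher scores mean "more likely active" (energies such as SIE, where more negative is better, would be negated first); the function name is hypothetical.

```python
def tpr_at_fpr(active_scores, decoy_scores, max_fpr):
    """Highest true positive rate reachable with a score threshold whose false
    positive rate does not exceed max_fpr (higher score = predicted active)."""
    best = 0.0
    # every observed score is a candidate threshold: "predict active if score >= t"
    for t in sorted(set(active_scores) | set(decoy_scores)):
        fpr = sum(d >= t for d in decoy_scores) / len(decoy_scores)
        if fpr <= max_fpr:
            tpr = sum(a >= t for a in active_scores) / len(active_scores)
            best = max(best, tpr)
    return best


# hypothetical scores; for binding energies pass [-e for e in energies]
actives = [0.9, 0.8, 0.3, 0.2]
decoys = [0.85, 0.5, 0.4, 0.1]
print(tpr_at_fpr(actives, decoys, 0.25))
```

With one of four decoys allowed above the threshold, two of the four actives are recovered, i.e. a true positive rate of 0.5.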

  • sc-PDB: a database for identifying variations and multiplicity of 'druggable' binding sites in proteins
    Meslamani, Jamel and Rognan, Didier and Kellenberger, Esther
    Bioinformatics (Oxford, England), 2011, 27(9), 1324-1326
    doi: 10.1093/bioinformatics/btr120
    Background: The sc-PDB database is an annotated archive of druggable binding sites extracted from the Protein Data Bank. It contains all-atom coordinates for 8166 protein-ligand complexes, chosen for their geometrical and physico-chemical properties. The sc-PDB provides a functional annotation for proteins, a chemical description for ligands and the detailed intermolecular interactions for complexes. The sc-PDB now includes a hierarchical classification of all the binding sites within a functional class. Method: The sc-PDB entries were first clustered according to protein name, regardless of species. For each cluster, we identified dissimilar sites (e.g., catalytic and allosteric sites of an enzyme). Scope and applications: The classification of sc-PDB targets by binding site diversity was intended to facilitate chemogenomics approaches to drug design. In ligand-based approaches, it avoids comparing ligands that do not share the same binding site. In structure-based approaches, it permits quantitative evaluation of the diversity of the binding site definition (variations in size, sequence and/or structure).

  • G-protein coupled receptors virtual screening using genetic algorithm focused chemical space.
    Sage, Carleton and Wang, Runtong and Jones, Gareth
    Journal of chemical information and modeling, 2011, 51(8), 1754-1761
    PMID: 21761904     doi: 10.1021/ci200043z
    Exploiting the ever-growing set of activity data for compounds against biological targets represents both a challenge and an opportunity for ligand-based virtual screening (LBVS). Because G-protein coupled receptors (GPCRs) represent a rich set of potential drug targets, we sought to develop an appropriate method to examine large sets of GPCR ligand information for both screening collection enhancement and hit expansion. To this end, we have implemented a modified version of BDACCS that removes highly correlated descriptors (rBDACCS). To test the hypothesis that a smaller, focused descriptor set would improve performance, we have extended rBDACCS by using a genetic algorithm (GA) to choose target-specific descriptors appropriate for selecting the set of 100 compounds most likely to be active from a decoy database. We have called this method GA-focused descriptor active space (GAFDAS). We compared the performance of rBDACCS and GAFDAS using a collection of activity data for 252 GPCR/ligand sets versus two decoy databases. While both methods appear effective in LBVS, overall GAFDAS performs better than rBDACCS in the early selection of compounds against both decoy databases.


  • Homology modeling and metabolism prediction of human carboxylesterase-2 using docking analyses by GriDock: a parallelized tool based on AutoDock 4.0.
    Vistoli, Giulio and Pedretti, Alessandro and Mazzolari, Angelica and Testa, Bernard
    Journal of computer-aided molecular design, 2010, 24(9), 771-787
    PMID: 20623318     doi: 10.1007/s10822-010-9373-1
    Metabolic problems lead to numerous failures during clinical trials, and much effort is now devoted to developing in silico models predicting metabolic stability and metabolites. Such models are well known for cytochromes P450 and some transferases, whereas less has been done to predict the activity of human hydrolases. The present study was undertaken to develop a computational approach able to predict the hydrolysis of novel esters by human carboxylesterase hCES2. The study first involved homology modeling of the hCES2 protein based on the model of hCES1, since the two proteins share a high degree of homology (approximately 73%). A set of 40 known substrates of hCES2 was taken from the literature; the ligands were docked in both their neutral and ionized forms using GriDock, a parallel tool based on the AutoDock 4.0 engine which can perform efficient and easy virtual screening analyses of large molecular databases exploiting multi-core architectures. Useful statistical models (e.g., r²

  • Rapid flexible docking using a stochastic rotamer library of ligands.
    Ding, Feng and Yin, Shuangye and Dokholyan, Nikolay V
    Journal of chemical information and modeling, 2010, 50(9), 1623-1632
    PMID: 20712341     doi: 10.1021/ci100218t
    Existing flexible docking approaches model the ligand and receptor flexibility either separately or in a loosely coupled manner, which captures the conformational changes inefficiently. Here, we propose a flexible docking approach, MedusaDock, which models both ligand and receptor flexibility simultaneously with sets of discrete rotamers. We developed an algorithm to build the ligand rotamer library "on-the-fly" during docking simulations. MedusaDock benchmarks demonstrate a rapid sampling efficiency and high prediction accuracy in both self-docking (to the cocrystallized state) and cross-docking (to a state cocrystallized with a different ligand), the latter of which mimics the virtual screening procedure in computational drug discovery. We also perform a virtual screening test on four flexible targets: cyclin-dependent kinase 2, vascular endothelial growth factor receptor 2, HIV reverse transcriptase, and HIV protease. We find significant improvements in virtual screening enrichment compared with rigid-receptor methods. The predictive power of MedusaDock in cross-docking and preliminary virtual screening benchmarks highlights the importance of modeling both ligand and receptor flexibility simultaneously in computational docking.

  • ElectroShape: fast molecular similarity calculations incorporating shape, chirality and electrostatics.
    Armstrong, M Stuart and Morris, Garrett M and Finn, Paul W and Sharma, Raman and Moretti, Loris and Cooper, Richard I and Richards, W Graham
    Journal of computer-aided molecular design, 2010, 24(9), 789-801
    PMID: 20614163     doi: 10.1007/s10822-010-9374-0
    We present ElectroShape, a novel ligand-based virtual screening method that combines shape and electrostatic information into a single, unified framework. Building on the ultra-fast shape recognition (USR) approach for fast non-superpositional shape-based virtual screening, it extends the method by representing partial charge information as a fourth dimension. It also incorporates the chiral shape recognition (CSR) method, which distinguishes enantiomers. It has been validated using release 2 of the Directory of Useful Decoys (DUD), and shows a near doubling in enrichment ratio at 1% over USR and CSR, as well as improvements as measured by receiver operating characteristic curves. These improvements persisted even after taking into account the chemotype redundancy in the sets of active ligands in DUD. During the course of its development, ElectroShape revealed a difference in the charge allocation of the DUD ligand and decoy sets, leading to several new versions of DUD being generated as a result. ElectroShape provides a significant addition to the family of ultra-fast ligand-based virtual screening methods, and its higher-dimensional shape recognition approach has great potential for extension and generalisation.
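The idea of treating an atomic property as an extra "spatial" dimension can be sketched with USR-style moment descriptors. This minimal sketch reuses the four original USR reference points (centroid, atom closest to it, atom farthest from it, atom farthest from that one) and the USR similarity 1/(1 + mean absolute difference). It does not reproduce ElectroShape's published reference-point scheme or its chirality-aware (CSR) construction, and the `scale` weight for the property column is a hypothetical parameter.

```python
import numpy as np


def _moments(d: np.ndarray) -> list[float]:
    """Mean, standard deviation and signed cube root of the third central moment."""
    mu = d.mean()
    return [float(mu), float(d.std()), float(np.cbrt(np.mean((d - mu) ** 3)))]


def usr_descriptor(points: np.ndarray) -> np.ndarray:
    """USR-style descriptor for an (n_atoms, k) array of points. k = 3 is pure shape;
    extra columns (e.g. a scaled partial charge) are treated as further dimensions."""
    ctd = points.mean(axis=0)                                       # centroid
    d_ctd = np.linalg.norm(points - ctd, axis=1)
    cst = points[np.argmin(d_ctd)]                                  # closest to centroid
    fct = points[np.argmax(d_ctd)]                                  # farthest from centroid
    ftf = points[np.argmax(np.linalg.norm(points - fct, axis=1))]   # farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments(np.linalg.norm(points - ref, axis=1)))
    return np.array(desc)  # 4 reference points x 3 moments = 12 values


def add_property_dimension(coords, prop, scale=1.0) -> np.ndarray:
    """Append a scaled per-atom property (e.g. partial charge) as an extra column."""
    return np.column_stack([coords, scale * np.asarray(prop, dtype=float)])


def usr_similarity(desc_a, desc_b) -> float:
    """USR similarity: 1 / (1 + mean absolute descriptor difference), in (0, 1]."""
    return 1.0 / (1.0 + float(np.mean(np.abs(desc_a - desc_b))))


# hypothetical molecule: random coordinates and partial charges
rng = np.random.default_rng(0)
coords = rng.normal(size=(20, 3))
charges = rng.normal(scale=0.3, size=20)
shape_only = usr_descriptor(coords)
with_charge = usr_descriptor(add_property_dimension(coords, charges, scale=2.0))
```

Because every descriptor is built from distances to reference points, it is unchanged by translating or rotating the molecule, which is what makes this family of methods alignment-free and fast.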

  • Chemical space sampling by different scoring functions and crystal structures.
    Brooijmans, Natasja and Humblet, Christine
    Journal of computer-aided molecular design, 2010, 24(5), 433-447
    PMID: 20401681     doi: 10.1007/s10822-010-9356-2
    Virtual screening has become a popular tool to identify novel leads in the early phases of drug discovery. A variety of docking and scoring methods used in virtual screening have been the subject of active research in an effort to gauge limitations and articulate best practices. However, how to best utilize different scoring functions and various crystal structures, when available, is not yet well understood. In this work we use multiple crystal structures of PI3K-γ in both prospective and retrospective virtual screening experiments. Both Glide SP scoring and Prime MM-GBSA rescoring are utilized in the prospective and retrospective virtual screens, and consensus scoring is investigated in the retrospective virtual screening experiments. The results show that each of the crystal structures used samples a different chemical space, i.e., different chemotypes are prioritized by each structure. In addition, the different (re)scoring functions prioritize different chemotypes as well. Somewhat surprisingly, the Prime MM-GBSA scoring function generally gives lower enrichments than Glide SP. Finally, we investigate the impact of different ligand preparation protocols on virtual screening enrichment factors. In summary, different crystal structures and different scoring functions are complementary to each other and allow for a wider variety of chemotypes to be considered for experimental follow-up.

  • Comprehensive comparison of ligand-based virtual screening tools against the DUD data set reveals limitations of current 3D methods.
    Venkatraman, Vishwesh and Pérez-Nueno, Violeta I and Mavridis, Lazaros and Ritchie, David W
    Journal of chemical information and modeling, 2010, 50(12), 2079-2093
    PMID: 21090728     doi: 10.1021/ci100263p
    In recent years, many virtual screening (VS) tools have been developed that employ different molecular representations and have different speed and accuracy characteristics. In this paper, we compare ten popular ligand-based VS tools using the publicly available Directory of Useful Decoys (DUD) data set comprising over 100 000 compounds distributed across 40 protein targets. The DUD was developed initially to evaluate docking algorithms, but our results from an operational correlation analysis show that it is also well suited for comparing ligand-based VS tools. Although it is conventional wisdom that 3D molecular shape is an important determinant of biological activity, our results based on permutational significance tests of several commonly used VS metrics show that the 2D fingerprint-based methods generally give better VS performance than the 3D shape-based approaches for surprisingly many of the DUD targets. To help understand this finding, we have analyzed the nature of the scoring functions used and the composition of the DUD data set itself. We propose that to improve the VS performance of current 3D methods, it will be necessary to devise screening queries that can represent multiple possible conformations and which can exploit knowledge of known actives that span multiple scaffold families.

  • MOLA: a bootable, self-configuring system for virtual screening using AutoDock4/Vina on computer clusters.
    Abreu, Rui Mv and Froufe, Hugo Jc and Queiroz, Maria João Rp and Ferreira, Isabel Cfr
    Journal of cheminformatics, 2010, 2(1), 10
    PMID: 21029419     doi: 10.1186/1758-2946-2-10
    BACKGROUND: Virtual screening of small molecules using molecular docking has become an important tool in drug discovery. However, large-scale virtual screening is time-demanding and usually requires dedicated computer clusters. There are a number of software tools that perform virtual screening using AutoDock4, but they require access to dedicated Linux computer clusters. Also, no software is available for performing virtual screening with Vina using computer clusters. In this paper we present MOLA, an easy-to-use graphical user interface tool that automates parallel virtual screening using AutoDock4 and/or Vina in bootable non-dedicated computer clusters.

  • Improving performance of docking-based virtual screening by structural filtration.
    Novikov, Fedor N and Stroylov, Viktor S and Stroganov, Oleg V and Chilov, Ghermes G
    Journal of Molecular Modeling, 2010, 16(7), 1223-1230
    PMID: 20041273     doi: 10.1007/s00894-009-0633-8
    In the current study, an innovative method of structural filtration of docked ligand poses is introduced and applied to improve virtual screening results. The structural filter is defined by a protein-specific set of interactions that are (a) structurally conserved in available structures of a particular protein with its bound ligands, and (b) can be viewed as playing a crucial role in protein-ligand binding. The concept was evaluated on a set of 10 diverse proteins, for which the corresponding structural filters were developed and applied to the results of virtual screening obtained with the Lead Finder software. The application of structural filtration resulted in a considerable improvement of the enrichment factor, ranging from several-fold to hundreds-fold depending on the protein target. Structural filtration appeared to effectively repair the deficiencies of scoring functions that tended to overestimate decoy binding, resulting in a considerably lower false positive rate. In addition, the structural filters were also effective in dealing with some deficiencies of the protein structure models that would otherwise lead to false negative predictions. The ability of structural filtration to recover relatively small but specifically bound molecules holds promise for the application of this technology in fragment-based drug discovery.
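The filtering step described above amounts to keeping only docked poses that make every interaction in a protein-specific set, then ranking the survivors by score. A minimal sketch, assuming interactions have already been detected and encoded as (residue, interaction type) pairs; the `DockedPose` class and the residue labels are hypothetical and not part of Lead Finder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockedPose:
    ligand_id: str
    score: float              # docking score; lower = better predicted binding
    interactions: frozenset   # e.g. frozenset({("RES_A", "hbond")}), hypothetical labels


def structural_filter(poses, required_interactions):
    """Keep only poses that make every interaction in the protein-specific filter,
    then rank the survivors by docking score (best first)."""
    required = frozenset(required_interactions)
    kept = [p for p in poses if required <= p.interactions]
    return sorted(kept, key=lambda p: p.score)


poses = [
    DockedPose("decoy1", -11.0, frozenset({("RES_A", "hydrophobic")})),
    DockedPose("lig1", -9.0, frozenset({("RES_A", "hbond"), ("RES_B", "hbond")})),
    DockedPose("lig2", -10.0, frozenset({("RES_A", "hbond"), ("RES_B", "hbond"),
                                         ("RES_C", "pi_stack")})),
]
required = {("RES_A", "hbond"), ("RES_B", "hbond")}
print([p.ligand_id for p in structural_filter(poses, required)])
```

In this toy example the best-scoring pose is a decoy that misses the conserved hydrogen bonds, so filtration removes it before ranking; this is the kind of scoring-function false positive the abstract describes.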

  • Biased retrieval of chemical series in receptor-based virtual screening.
    Brooijmans, Natasja and Cross, Jason B and Humblet, Christine
    Journal of computer-aided molecular design, 2010, 24(12), 1053-1062
    PMID: 21053053     doi: 10.1007/s10822-010-9394-9
    Using the kinases in the DUD dataset and an in-house HTS dataset from PI3K-γ, receptor-based virtual screening experiments were performed using Glide SP docking. While significant enrichments were observed for eight of the nine targets in the set, more detailed analyses highlighted that much of the early enrichment (10-80%) is the result of retrieval of a single cluster of active compounds. This biased retrieval was not necessarily due to early enrichment of the cluster containing the co-crystallized ligand. Virtual screening validation studies could thus benefit from including cluster-based analyses to assess enrichment of diverse chemotypes.
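    A cluster-based check of the kind recommended above can be sketched by asking how many distinct active clusters appear near the top of a ranked list, and how much of that top is taken up by the single most frequent cluster. A minimal Python sketch, assuming the actives have already been clustered; the `cluster_recovery` name and both output metrics are our illustration, not the authors' protocol.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cluster_recovery(ranked_ids: List[str], active_cluster: Dict[str, str],
                     top_n: int) -> Tuple[float, float]:
    """Return (share of active clusters seen in the top_n,
    share of top_n actives that come from the most frequent cluster)."""
    # Cluster labels of the actives that made it into the top_n; decoys are skipped.
    top_clusters = [active_cluster[cid] for cid in ranked_ids[:top_n] if cid in active_cluster]
    n_clusters = len(set(active_cluster.values()))
    recovered = len(set(top_clusters)) / n_clusters if n_clusters else 0.0
    if not top_clusters:
        return recovered, 0.0
    dominant = max(Counter(top_clusters).values()) / len(top_clusters)
    return recovered, dominant


ranked = ["a1", "d1", "a2", "a3", "d2", "b1", "d3", "c1"]  # d* are decoys
clusters = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "c1": "C"}
print(cluster_recovery(ranked, clusters, top_n=5))  # (0.3333333333333333, 1.0)
```

    In this toy ranking every active in the top five comes from cluster A, so compound-level enrichment looks good even though only one of three chemotypes has been found.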

  • Comparative evaluation of 3D virtual ligand screening methods: impact of the molecular alignment on enrichment.
    Giganti, David and Guillemain, Hélène and Spadoni, Jean-Louis and Nilges, Michael and Zagury, Jean-François and Montes, Matthieu
    Journal of chemical information and modeling, 2010, 50(6), 992-1004
    PMID: 20527883     doi: 10.1021/ci900507g
    In the early stage of drug discovery programs, when the structure of a complex involving a target and a small molecule is available, structure-based virtual ligand screening methods are generally preferred. However, ligand-based strategies like shape-similarity search methods can also be applied. Shape-similarity search methods consist of exploring a pseudo-binding site derived from the known small molecule used as a reference. Several of these methods use conformational sampling algorithms that are also shared by corresponding docking methods, for example Surflex-dock/Surflex-sim, FlexX/FlexS, ICM, and OMEGA-FRED/OMEGA-ROCS. Using 11 systems taken from the challenging "own" subsets of the Directory of Useful Decoys (DUD-own), we evaluated and compared the performance of the above-cited programs in terms of molecular alignment accuracy, enrichment in active compounds, and enrichment in different chemotypes (scaffold-hopping). Since molecular alignment is a crucial aspect of performance for the different methods, we have assessed its impact on enrichment. We have also illustrated the paradox of retrieving active compounds with good scores even if they are inaccurately positioned. Finally, we have highlighted possible positive aspects of using shape-based approaches in drug-discovery protocols when the structure of the target in complex with a small molecule is known.

  • Modeling approaches for ligand-based 3D similarity.
    Tresadern, Gary and Bemporad, Daniele
    Future medicinal chemistry, 2010, 2(10), 1547-1561
    PMID: 21426148     doi: 10.4155/fmc.10.244
    3D ligand-based similarity approaches are widely used in the early phases of drug discovery for tasks such as hit finding by virtual screening or compound design with quantitative structure-activity relationships. Herein we review widely used software for performing such tasks. Some techniques, shape-based similarity for instance, are based on relatively mature technology. Traditionally, these methods remained in the realm of the expert user, the experienced modeler. However, advances in implementation and speed have improved usability and allow these methods to be applied to databases comprising millions of compounds. There are now many reports of such methods impacting drug-discovery projects. As such, the medicinal chemistry community has become the intended market for some of these new tools, yet medicinal chemists may find the wide array of available approaches somewhat disconcerting. Each method has subtle differences and is better suited to certain tasks than others. In this article we review some of the widely used computational methods through their applications, provide straightforward background on the underlying theory and provide examples for the interested reader to pursue in more detail. In the new era of preclinical drug discovery there will be ever more pressure to move faster and more efficiently, and computational approaches based on 3D ligand similarity will play an increasing role in this process.

  • Molecular shape technologies in drug discovery: methods and applications.
    Ebalunode, Jerry O and Zheng, Weifan
    Current topics in medicinal chemistry, 2010, 10(6), 669-679
    PMID: 20337591    
    Shape complementarity is a critically important factor in molecular recognition between drugs and their biological receptors. The notion that molecules with similar 3D shapes tend to have similar biological activity has been recognized and implemented in computational drug discovery tools for decades. But the low computational efficiency and the lack of widely accessible software tools limited the use of early shape-matching algorithms. Recent development of fast and accurate shape comparison tools has, however, changed the landscape and facilitated the widespread use of both ligand-based and receptor-based shape-matching technologies in drug discovery. In this article, we summarize some of the well-known shape algorithms. We first describe the computational principles for both the superposition-based and the superposition-free shape-matching methods. These include ROCS (Rapid Overlay of Compound Structures), SQ, and the CatShape method in the former category, and the shape signatures algorithm and USR (Ultrafast Shape Recognition) in the latter category. We then highlight some recent validation studies and practical applications of various shape technologies. Because of the rapid development of modern shape-matching algorithms, and the increasingly affordable computational resources and software tools, we anticipate much broader use of molecular shape technologies in future drug discovery. They will be especially useful in chemogenomics research, where large-scale associations between small molecules and protein targets are studied. Thus, molecular shape technologies, together with well-defined pharmacophore constraints, can afford both efficient and effective means for drug discovery and chemical genomics research.
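    USR, one of the superposition-free methods named above, summarizes a conformer by the distributions of atomic distances to four reference points: the centroid, the atom closest to the centroid, the atom farthest from the centroid, and the atom farthest from that one. A minimal Python sketch, assuming atom coordinates are already available as (x, y, z) tuples. The reference points and the 1 / (1 + mean absolute difference) score follow the usual USR description, while the exact second and third moments vary between implementations; here they are assumed to be the standard deviation and a signed cube root of the third central moment.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _moments(ref: Point, coords: Sequence[Point]) -> List[float]:
    """Mean, spread and skew of the distances from `ref` to every atom."""
    d = [math.dist(ref, p) for p in coords]
    n = len(d)
    mean = sum(d) / n
    m2 = sum((x - mean) ** 2 for x in d) / n
    m3 = sum((x - mean) ** 3 for x in d) / n
    # Assumed moment choice: std deviation and signed cube root of the 3rd moment.
    return [mean, math.sqrt(m2), math.copysign(abs(m3) ** (1.0 / 3.0), m3)]


def usr_descriptor(coords: Sequence[Point]) -> List[float]:
    """12-number USR-style descriptor (3 moments for each of 4 reference points)."""
    n = len(coords)
    ctd = tuple(sum(p[k] for p in coords) / n for k in range(3))   # centroid
    cst = min(coords, key=lambda p: math.dist(p, ctd))              # closest to centroid
    fct = max(coords, key=lambda p: math.dist(p, ctd))              # farthest from centroid
    ftf = max(coords, key=lambda p: math.dist(p, fct))              # farthest from fct
    desc: List[float] = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments(ref, coords))
    return desc


def usr_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Score in (0, 1]: 1 / (1 + mean absolute descriptor difference)."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0), (3.0, 1.2, 0.5), (0.2, -1.0, 0.8)]
desc = usr_descriptor(coords)
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in coords]
print(usr_similarity(desc, usr_descriptor(shifted)))  # ~1.0: translation does not matter
```

    Because the descriptor depends only on interatomic distances, no superposition step is needed, which is what makes this family of methods fast enough for very large libraries.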

  • Library screening by fragment-based docking.
    Huang, Danzhi and Caflisch, Amedeo
    Journal of molecular recognition : JMR, 2010, 23(2), 183-193
    PMID: 19718684     doi: 10.1002/jmr.981
    We review our computational tools for high-throughput screening by fragment-based docking of large collections of small molecules. Applications to six different enzymes (four proteases and two protein kinases) are presented. Remarkably, several low-micromolar inhibitors were discovered in each of the high-throughput docking campaigns. Probable reasons for the lack of submicromolar inhibitors are the tiny fraction of chemical space covered by the libraries of available compounds, the approximations in the scoring methods employed, and the use of a rigid conformation of the target protein.

  • Virtual Screening with AutoDock: Theory and Practice.
    Cosconati, Sandro and Forli, Stefano and Perryman, Alex L and Harris, Rodney and Goodsell, David S and Olson, Arthur J
    Expert opinion on drug discovery, 2010, 5(6), 597-607
    PMID: 21532931     doi: 10.1517/17460441.2010.484460
    IMPORTANCE TO THE FIELD: Virtual screening is a computer-based technique for identifying promising compounds to bind to a target molecule of known structure. Given the rapidly increasing number of protein and nucleic acid structures, virtual screening continues to grow as an effective method for the discovery of new inhibitors and drug molecules. AREAS COVERED IN THIS REVIEW: We describe virtual screening methods that are available in the AutoDock suite of programs, and several of our successes in using AutoDock virtual screening in pharmaceutical lead discovery. WHAT THE READER WILL GAIN: A general overview of the challenges of virtual screening is presented, along with the tools available in the AutoDock suite of programs for addressing these challenges. TAKE HOME MESSAGE: Virtual screening is an effective tool for the discovery of compounds for use as leads in drug discovery, and the free, open source program AutoDock is an effective tool for virtual screening.

  • Advances in 2D fingerprint similarity searching.
    Geppert, Hanna and Bajorath, Jürgen
    Expert opinion on drug discovery, 2010, 5(6), 529-542
    PMID: 22823165     doi: 10.1517/17460441.2010.486830
    Importance to the field: Similarity searching is one of the premier approaches for computational hit identification. Fingerprints are bit-string representations of molecular structure and properties and are rather simplistic search tools. Nevertheless, they are widely used and often surprisingly successful in drug discovery applications. Areas covered in this review: Herein we discuss recent research efforts that have helped to better understand fingerprint search performance, design new fingerprints and search strategies, or modify standard fingerprints for specific applications. Key publications of the past ∼20 years are covered, and major emphasis is put on reviewing fingerprint studies published during the past 5 years. What the reader will gain: The reader is provided with an overview of state-of-the-art fingerprint design and the search strategies developed. It will be possible to rationalize the opportunities and limitations of 2D fingerprint similarity searching. Take home messages: Fingerprint search calculations are more complex than they might appear at first glance and are susceptible to complications that are often overlooked in practical applications. Fingerprint search performance typically depends on only relatively small subsets of bit positions. Recently, different fingerprint engineering strategies have been applied to 'tune' existing fingerprints in a compound class-directed manner. Fingerprints have substantial scaffold hopping potential, despite the simplicity of their design.
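    The core operation of 2D fingerprint similarity searching is a bitwise comparison of a query fingerprint with each database fingerprint, most often scored with the Tanimoto coefficient |A ∩ B| / |A ∪ B|. A minimal Python sketch, assuming fingerprints are already computed and stored as Python integers used as bit strings; how the bits are generated (substructure keys, hashed paths, and so on) is outside the sketch, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer fingerprint."""
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    union = popcount(fp_a | fp_b)
    # Convention (assumption): two all-zero fingerprints get similarity 0.0.
    return popcount(fp_a & fp_b) / union if union else 0.0


def similarity_search(query: int, database: Dict[str, int],
                      top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank database entries by Tanimoto similarity to the query, best first."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


db = {"cpd1": 0b10110010, "cpd2": 0b10110011, "cpd3": 0b01001100}
print(similarity_search(0b10110011, db))  # [('cpd2', 1.0), ('cpd1', 0.8), ('cpd3', 0.0)]
```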

  • Computational methodologies for compound database searching that utilize experimental protein-ligand interaction information.
    Tan, Lu and Batista, José and Bajorath, Jürgen
    Chemical biology & drug design, 2010, 76(3), 191-200
    PMID: 20636330     doi: 10.1111/j.1747-0285.2010.01007.x
    Ligand- and target structure-based methods are widely used in virtual screening, but there is currently no methodology available that fully integrates these different approaches. Herein, we provide an overview of various attempts that have been made to combine ligand- and structure-based computational screening methods. We then review different types of approaches that utilize protein-ligand interaction information for database screening and filtering. Interaction-based approaches make use of a variety of methodological concepts including pharmacophore modeling and direct or indirect encoding of protein-ligand interactions in fingerprint formats. These interaction-based methods have been successfully applied to tackle different tasks related to virtual screening including postprocessing of docking poses, prioritization of binding modes, selectivity analysis, or similarity searching. Furthermore, we discuss the recently developed interacting fragment approach that indirectly incorporates 3D interaction information into 2D similarity searching and bridges ligand- and structure-based methods.

  • Quo vadis, virtual screening? A comprehensive survey of prospective applications.
    Ripphausen, Peter and Nisius, Britta and Peltason, Lisa and Bajorath, Jürgen
    Journal of medicinal chemistry, 2010, 53(24), 8461-8467
    PMID: 20929257     doi: 10.1021/jm101020z

  • Selection of in silico drug screening results by using universal active probes (UAPs).
    Fukunishi, Yoshifumi and Ohno, Kazuki and Orita, Masaya and Nakamura, Haruki
    Journal of chemical information and modeling, 2010, 50(7), 1233-1240
    PMID: 20578712     doi: 10.1021/ci100108p
    We developed a new method that uses a set of drug-like compounds to select reliable in silico drug screening results. If some active compounds are known, the screening results that rank these active compounds at the top should be reliable. If no active compound is known, however, it is unclear how to select among the results. We propose the concept of "universal active probes" (UAPs), a set of small active compounds that bind to different kinds of proteins. We found that the hit ratio of the true active compounds in in silico screening correlates positively with that of the UAPs, probably because UAPs form a set of drug-like compounds. Thus, if the UAPs are added to the compound library, the screening result that shows a high hit ratio of the UAPs could give reliable actual hit compounds for the target protein. We examined this method for several targets and found this idea useful.

  • Comparison of three preprocessing filters efficiency in virtual screening: identification of new putative LXRbeta regulators as a test case.
    Ghemtio, Léo and Devignes, Marie-Dominique and Smaïl-Tabbone, Malika and Souchet, Michel and Leroux, Vincent and Maigret, Bernard
    Journal of chemical information and modeling, 2010, 50(5), 701-715
    PMID: 20420434     doi: 10.1021/ci900356m
    In silico screening methodologies are widely recognized as efficient approaches in the early steps of drug discovery. However, in the virtual high-throughput screening (VHTS) context, where hit compounds are searched for among millions of candidates, three-dimensional comparison techniques and knowledge discovery from databases should find novel drug leads more efficiently than computationally expensive molecular docking. Therefore, the present study aims at developing a filtering methodology to efficiently eliminate unsuitable compounds in the VHTS process. Several filters are evaluated in this paper. The first two are structure-based and rely on either geometrical docking or pharmacophore depiction. The third filter is ligand-based and uses knowledge-based and fingerprint similarity techniques. These filtering methods were tested with the Liver X Receptor (LXR) as a target of therapeutic interest, as LXR is a key regulator in maintaining cholesterol homeostasis. The results show that the three filters are complementary, so their combination should generate consistent lists of potential hit compounds.

  • Pharmacophore screening of the protein data bank for specific binding site chemistry.
    Campagna-Slater, Valérie and Arrowsmith, Andrew G and Zhao, Yong and Schapira, Matthieu
    Journal of chemical information and modeling, 2010, 50(3), 358-367
    PMID: 20112952     doi: 10.1021/ci900427b
    A simple computational approach was developed to screen the Protein Data Bank (PDB) for putative pockets possessing a specific binding site chemistry and geometry. The method employs two commonly used 3D screening technologies, namely identification of cavities in protein structures and pharmacophore screening of chemical libraries. For each protein structure, a pocket finding algorithm is used to extract potential binding sites containing the correct types of residues, which are then stored in a large SDF-formatted virtual library; pharmacophore filters describing the desired binding site chemistry and geometry are then applied to screen this virtual library and identify pockets matching the specified structural chemistry. As an example, this approach was used to screen all human protein structures in the PDB and identify sites having chemistry similar to that of known methyl-lysine binding domains that recognize chromatin methylation marks. The selected genes include known readers of the histone code as well as novel binding pockets that may be involved in epigenetic signaling. Putative allosteric sites were identified on the structures of TP53BP1, L3MBTL3, CHEK1, KDM4A, and CREBBP.

  • VSDocker: a tool for parallel high-throughput virtual screening using AutoDock on Windows-based computer clusters.
    Prakhov, Nikita D. and Chernorudskiy, Alexander L. and Gainullin, Murat R.
    Bioinformatics (Oxford, England), 2010, 26(10), 1374-1375
    PMID: 20378556     doi: 10.1093/bioinformatics/btq149
    VSDocker is an original program that allows AutoDock4 to be used for optimized virtual ligand screening on computer clusters or multiprocessor workstations. This tool is the first implementation of parallel high-performance virtual screening of ligands for MS Windows-based computer systems.

  • Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation.
    Geppert, Hanna and Vogt, Martin and Bajorath, Jürgen
    Journal of chemical information and modeling, 2010, 50(2), 205-216
    PMID: 20088575     doi: 10.1021/ci900419k

  • Comparison of Structure- and Ligand-Based Virtual Screening Protocols Considering Hit List Complementarity and Enrichment Factors.
    Krueger, Dennis M. and Evers, Andreas
    ChemMedChem, 2010, 5(1), 148-158
    PMID: 19908272     doi: 10.1002/cmdc.200900314
    Structure- and ligand-based virtual-screening methods (docking, 2D- and 3D-similarity searching) were analysed for their effectiveness in virtual screening against four different targets: angiotensin-converting enzyme (ACE), cyclooxygenase 2 (COX-2), thrombin and human immunodeficiency virus type 1 (HIV-1) protease. The relative performance of the tools was compared by examining their ability to recognise known active compounds from a set of actives and nonactives. Furthermore, we investigated whether the application of different virtual-screening methods in parallel provides complementary or redundant hit lists. Docking was performed with GOLD, Glide, FlexX and Surflex. The obtained docking poses were rescored by using nine different scoring functions in addition to the scoring functions implemented as objective functions in the docking algorithms. Ligand-based virtual screening was done with ROCS (3D-similarity searching), Feature Trees and SciTegic Functional Fingerprints (2D-similarity searching). The results show that structure- and ligand-based virtual-screening methods provide comparable enrichments in detecting active compounds. Interestingly, the hit lists that are obtained from different virtual-screening methods are generally highly complementary. These results suggest that a parallel application of different structure- and ligand-based virtual-screening methods increases the chance of identifying more (and more diverse) active compounds from a virtual-screening campaign.

  • Virtual screening: an endless staircase?
    Schneider, Gisbert
    Nature reviews. Drug discovery, 2010, 9(4), 273-276
    PMID: 20357802     doi: 10.1038/nrd3139
    Computational chemistry - in particular, virtual screening - can provide valuable contributions in hit- and lead-compound discovery. Numerous software tools have been developed for this purpose. However, despite the applicability of virtual screening technology being well established, it seems that there are relatively few examples of drug discovery projects in which virtual screening has been the key contributor. Has virtual screening reached its peak? If not, what aspects are limiting its potential at present, and how can significant progress be made in the future?

  • FLAP: GRID molecular interaction fields in virtual screening. Validation using the DUD data set.
    Cross, Simon and Baroni, Massimo and Carosati, Emanuele and Benedetti, Paolo and Clementi, Sergio
    Journal of chemical information and modeling, 2010, 50(8), 1442-1450
    PMID: 20690627     doi: 10.1021/ci100221g
    The performance of FLAP (Fingerprints for Ligands and Proteins) in virtual screening is assessed using a subset of the DUD (Directory of Useful Decoys) benchmarking data set containing 13 targets, each with more than 15 different chemotype classes. A variety of ligand- and receptor-based virtual screening approaches are examined, using combinations of individual templates: 2D structures of known actives, a cocrystallized ligand, a receptor structure, or a cocrystallized ligand-biased receptor structure. We examine several data fusion approaches to combine the results of the individual virtual screens. In doing so, we show that excellent chemotype enrichment, approximately 17-fold over random on average at a false positive rate of 1%, is achieved with both single-target ligand-based and receptor-based approaches. We also show that using as much starting knowledge as possible improves chemotype enrichment, and that data fusion using Pareto ranking is an effective way to do this, giving up to 50% improvement in enrichment over the single methods. Finally, we show that if inactive or decoy data are incorporated, automatically training the scoring function in FLAP improves recovery still further, with almost 2-fold improvement over the enrichments shown by the single methods. The results clearly demonstrate the utility of FLAP for virtual screening when either a limited or wide range of prior knowledge is available.
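    Pareto ranking, mentioned above as a data fusion method, places a compound in front 1 if no other compound scores at least as well in every screen and strictly better in at least one; removing that front and repeating gives fronts 2, 3, and so on. A minimal Python sketch of this non-dominated sorting, assuming every method's score has been oriented so that higher is better; it illustrates the general technique, not FLAP's internal implementation, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` in every method and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(scores: Dict[str, Sequence[float]]) -> Dict[str, int]:
    """Assign each compound its Pareto front (1 = non-dominated across all methods)."""
    remaining = dict(scores)
    ranks: Dict[str, int] = {}
    front = 1
    while remaining:
        # Compounds not dominated by any other compound still in play.
        current = [cid for cid, s in remaining.items()
                   if not any(dominates(other, s)
                              for oid, other in remaining.items() if oid != cid)]
        for cid in current:
            ranks[cid] = front
            del remaining[cid]
        front += 1
    return ranks


# Scores from two hypothetical screens (e.g. ligand-based, receptor-based).
fused = pareto_ranks({"a": (0.9, 0.8), "b": (0.5, 0.95), "c": (0.4, 0.4),
                      "d": (0.95, 0.1), "e": (0.3, 0.3)})
print(fused)  # {'a': 1, 'b': 1, 'd': 1, 'c': 2, 'e': 3}
```

    Unlike summing or averaging scores, this ranking needs no weighting or normalization between methods: a compound that is best in any single screen stays in the first front.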

  • Relating the shape of protein binding sites to binding affinity profiles: is there an association?
    Simon, Zoltán and Vigh-Smeller, Margit and Peragovics, Agnes and Csukly, Gábor and Zahoránszky-Kohalmi, Gergely and Rauscher, Anna A and Jelinek, Balázs and Hári, Péter and Bitter, István and Málnási-Csizmadia, András and Czobor, Pál
    BMC structural biology, 2010, 10, 32
    PMID: 20923553     doi: 10.1186/1472-6807-10-32
    BACKGROUND: Various pattern-based methods exist that use in vitro or in silico affinity profiles for the classification and functional examination of proteins. Nevertheless, the connection between protein affinity profiles and the structural characteristics of the binding sites is still unclear. Our aim was to investigate the association between virtual drug screening results (calculated binding free energy values) and the geometry of protein binding sites. Molecular Affinity Fingerprints (MAFs) were determined for 154 proteins based on their molecular docking energy results for 1,255 FDA-approved drugs. Protein binding site geometries were characterized by 420 PocketPicker descriptors. The basic underlying component structure of the MAFs and of the binding site geometries was examined by principal component analysis; the association between principal components extracted from these two sets of variables was then investigated by canonical correlation and redundancy analyses.

  • Efficient virtual screening using multiple protein conformations described as negative images of the ligand-binding site.
    Virtanen, Salla I and Pentikäinen, Olli T
    Journal of chemical information and modeling, 2010, 50(6), 1005-1011
    PMID: 20504004     doi: 10.1021/ci100121c
    Protein structure-based virtual screening is typically accomplished using a molecular docking procedure. However, docking is a fairly slow process and is limited by the available scoring functions, which cannot reliably distinguish between active and inactive ligands. In contrast, ligand-based screening methods based on shape similarity identify active ligands with high accuracy. Here, we show that using negative images of the ligand-binding site, together with shape comparison tools of the kind typically used in ligand-based virtual screening, improves the discrimination of active molecules from inactives. In contrast to ligand-based shape comparison, the negative image of the binding site allows identification of compounds whose shape complements the shape of the ligand-binding cavity as closely as possible. Furthermore, the use of several target protein conformations allows the identification of active ligands whose shape is not optimal for the crystallized protein conformation. Accordingly, the presented virtual screening method improves the identification of novel lead molecules by concentrating on molecules optimally shaped for the flexible ligand-binding site.

  • Improved docking, screening and selectivity prediction for small molecule nuclear receptor modulators using conformational ensembles.
    Park, So-Jung and Kufareva, Irina and Abagyan, Ruben
    Journal of computer-aided molecular design, 2010, 24(5), 459-471
    PMID: 20455005     doi: 10.1007/s10822-010-9362-4
    Nuclear receptors (NRs) are ligand-dependent transcription factors that play a key role in reproduction, development, and homeostasis of the organism. NRs are potential targets for the treatment of cancer and other diseases such as inflammatory diseases and diabetes. In this study, we present a comprehensive library of pocket conformational ensembles of thirteen human NRs, and test the ability of these ensembles to recognize their ligands in virtual screening, as well as to predict their binding geometry, functional type, and relative binding affinity. 157 known NR modulators and 66 structures were used as a benchmark. Our pocket ensemble library correctly predicted the ligand binding poses in 94% of the cases. The models were also highly selective for the active ligands in virtual screening, with areas under the ROC curves ranging from 82% to a remarkable 99%. Using computationally determined receptor-specific binding energy offsets, we showed that the ensembles can be used to predict selectivity profiles of NR ligands. Our results evaluate and demonstrate the advantages of using receptor ensembles for compound docking, screening, and profiling.
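    The area under the ROC curve quoted above equals the probability that a randomly chosen active is scored above a randomly chosen inactive. A minimal Python sketch using that pairwise (Mann-Whitney) form, assuming higher scores mean "more likely active" (negate docking energies first) and counting ties as one half; it is quadratic in the number of compounds, and the function name is hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """AUC as P(score of a random active > score of a random inactive); ties count 0.5."""
    actives = [s for s, a in zip(scores, is_active) if a]
    inactives = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not inactives:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = 0.0
    for pa in actives:
        for pi in inactives:
            if pa > pi:
                wins += 1.0
            elif pa == pi:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


print(roc_auc([0.9, 0.8, 0.7, 0.6], [True, False, True, False]))  # 0.75
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 82% to 99% range reads as 0.82 to 0.99 on this scale.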


  • Docking-based virtual screening: recent developments.
    Tuccinardi, Tiziano
    Combinatorial chemistry & high throughput screening, 2009, 12(3), 303-314
    PMID: 19275536    
    Virtual (database) screening (VS) of molecules promises to accelerate the discovery of new drugs and reduce costs by identifying molecules with high probabilities of binding to a target receptor. The large number of available protein X-ray crystal structures, together with the development of more effective homology modelling techniques, has recently led to a steep increase in docking-based VS studies. This approach requires computational fitting of molecules into a receptor active site using advanced algorithms, followed by scoring and ranking of these molecules to identify potential leads. In this review, the main docking-based VS studies published over the last eight years are examined, and details are provided about the software used, the results achieved and the novel methods employed.

  • Robust optimization of scoring functions for a target class.
    Seifert, Markus H J
    Journal of computer-aided molecular design, 2009, 23(9), 633-644
    PMID: 19471858     doi: 10.1007/s10822-009-9276-1
    Target-specific optimization of scoring functions for protein-ligand docking is an effective method for significantly improving the discrimination of active and inactive molecules in virtual screening applications. Its applicability, however, is limited due to the narrow focus on, e.g., single protein structures. Using an ensemble of protein kinase structures, the publicly available Directory of Useful Decoys (DUD) ligand dataset, and a novel multi-factorial optimization procedure, it is shown here that scoring functions can be tuned to multiple targets of a target class simultaneously. This leads to improved robustness of the resulting scoring function parameters. Extensive validation experiments clearly demonstrate that (1) virtual screening performance for kinases improves significantly; (2) variations in database content affect this kind of machine-learning strategy to a lesser extent than binary QSAR models; and (3) the reweighting of interaction types is of particular importance for improved screening performance.

  • Docking Screens: Right for the Right Reasons?
    Kolb, Peter and Irwin, John J
    Current topics in medicinal chemistry, 2009, 9(9), 755-770
    Whereas docking screens have emerged as the most practical way to use protein structure for ligand discovery, an inconsistent track record raises questions about how well docking actually works. In its favor, a growing number of publications report the successful discovery of new ligands, often supported by experimental affinity data and controls for artifacts. Few reports, however, actually test the underlying structural hypotheses that docking makes. To be successful and not just lucky, prospective docking must not only rank a true ligand among the top scoring compounds, it must also correctly orient the ligand so the score it receives is biophysically sound. If the correct binding pose is not predicted, a skeptic might well infer that the discovery was serendipitous. Surveying over 15 years of the docking literature, we were surprised to discover how rarely sufficient evidence is presented to establish whether docking actually worked for the right reasons. The paucity of experimental tests of theoretically predicted poses undermines confidence in a technique that has otherwise become widely accepted. Of course, solving a crystal structure is not always possible, and even when it is, it can be a lot of work, and is not readily accessible to all groups. Even when a structure can be determined, investigators may prefer to gloss over an erroneous structural prediction to better focus on their discovery. Still, the absence of a direct test of theory by experiment is a loss for method developers seeking to understand and improve docking methods. We hope this review will motivate investigators to solve structures and compare them with their predictions whenever possible, to advance the field.

  • Docking and chemoinformatic screens for new ligands and targets.
    Kolb, Peter and Ferreira, Rafaela S and Irwin, John J and Shoichet, Brian K
    Current Opinion in Biotechnology, 2009, 20(4), 429-436
    doi: 10.1016/j.copbio.2009.08.003
    ... rate of 24% [19 * ] (Figure 3). Intriguingly, five of these were inverse agonists, as was the ligand bound in the X-ray structure, carazolol, against which the screen occurred. ... This is borne out in a community-wide, blind assessment (GPCR Dock 2008 [41]) of the prediction of the ...

  • Reverse fingerprinting and mutual information-based activity labeling and scoring (MIBALS).
    Williams, Chris and Schreyer, Suzanne K
    Combinatorial chemistry & high throughput screening, 2009, 12(4), 424-439
    PMID: 19442069    
    A mutual information based activity labeling and scoring (MIBALS) approach to reverse fingerprint analysis is presented. Whole molecule scores produced by the method are shown to be capable of ranking compounds in virtual high-throughput screening (vHTS) experiments, while fragment scores produced by the method are able to identify pharmacophore moieties important for biological activity. The performance of MIBALS in vHTS experiments is assessed using reference ligands active against 40 different biological targets, and MIBALS retrieval rates are compared with those obtained using more traditional group fusion similarity search methods. The use of MIBALS to identify important pharmacophore fragments is demonstrated by comparing ligand fragment scores with known pharmacophores and known ligand/protein contacts. The ability of MIBALS to highlight beneficial and detrimental groups in a congeneric series is examined by comparing MIBALS fragment scores with features in known structure-activity relationships.
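    Group fusion, the baseline MIBALS is compared against above, searches with several reference actives at once and scores each database compound by combining its similarities to all references; taking the maximum similarity is a common choice. A minimal Python sketch, assuming integer bit-string fingerprints and Tanimoto similarity; the MAX fusion rule, the similarity measure and the helper names are our assumptions, not details from the paper.

```python
from typing import Dict, List, Sequence


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two integer bit-string fingerprints (0.0 if both are empty)."""
    union = bin(fp_a | fp_b).count("1")
    return bin(fp_a & fp_b).count("1") / union if union else 0.0


def group_fusion_max(references: Sequence[int], database: Dict[str, int]) -> List[str]:
    """Rank database ids by their highest Tanimoto similarity to any reference active."""
    if not references:
        raise ValueError("at least one reference fingerprint is required")
    fused = {cid: max(tanimoto(ref, fp) for ref in references)
             for cid, fp in database.items()}
    return sorted(fused, key=lambda cid: fused[cid], reverse=True)


refs = [0b1100, 0b0011]                                # two reference actives
db = {"x": 0b1100, "y": 0b0111, "z": 0b10000000}
print(group_fusion_max(refs, db))  # ['x', 'y', 'z']
```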

  • MMsINC: a large-scale chemoinformatics database.
    Masciocchi, Joel and Frau, Gianfranco and Fanton, Marco and Sturlese, Mattia and Floris, Matteo and Pireddu, Luca and Palla, Piergiorgio and Cedrati, Fabian and Rodriguez-Tomé, Patricia and Moro, Stefano
    Nucleic acids research, 2009, 37(Database issue), D284-90
    PMID: 18931373     doi: 10.1093/nar/gkn727
    MMsINC is a database of non-redundant, richly annotated and biomedically relevant chemical structures. A primary goal of MMsINC is to guarantee the highest quality and the uniqueness of each entry. MMsINC then adds value to these entries by including the analysis of crucial chemical properties, such as ionization and tautomerization processes, and the in silico prediction of 24 important molecular properties in the biochemical profile of each structure. MMsINC is consequently a natural input for different chemoinformatics and virtual screening applications. In addition, MMsINC supports various types of queries, including substructure queries and the novel 'molecular scissoring' query. MMsINC is interfaced with other primary data collectors, such as PubChem, Protein Data Bank (PDB), the Food and Drug Administration database of approved drugs and ZINC.

  • Ligand prediction from protein sequence and small molecule information using support vector machines and fingerprint descriptors.
    Geppert, Hanna and Humrich, Jens and Stumpfe, Dagmar and Gärtner, Thomas and Bajorath, Jürgen
    Journal of chemical information and modeling, 2009, 49(4), 767-779
    PMID: 19309114     doi: 10.1021/ci900004a
    Support vector machine (SVM) database search strategies are presented that aim at the identification of small molecule ligands for targets for which no ligand information is currently available. In pharmaceutical research and chemical biology, this situation is faced, for example, when studying orphan targets or newly identified members of protein families. To investigate methods for de novo ligand identification in the absence of known three-dimensional target structures or active molecules, we have focused on combining sequence and ligand information for closely and distantly related proteins. To provide a basis for these investigations, a set of 11 protease targets from different families was assembled together with more than 2000 inhibitors directed against individual proteases. We have compared SVM approaches that combine protein sequence and ligand information in different ways and utilize 2D fingerprints as ligand descriptors. These methodologies were applied to search for inhibitors of individual proteases not taken into account during learning. A target sequence-ligand kernel and, in particular, a linear combination of multiple target-directed SVMs consistently identified inhibitors with high accuracy including test cases where homology-based similarity searching using data fusion and conventional SVM ranking nearly or completely failed. The SVM linear combination and target-ligand kernel methods described herein are intuitive and straightforward to adopt for ligand prediction against other targets.

  • Predicting multiple ligand binding modes using self-consistent pharmacophore hypotheses.
    Wallach, Izhar and Lilien, Ryan
    Journal of chemical information and modeling, 2009, 49(9), 2116-2128
    PMID: 19711952     doi: 10.1021/ci900199e
    The ability to predict ligand binding modes without the aid of wet-lab experiments may accelerate and reduce the cost of drug discovery research. Despite significant recent progress, virtual screening has not yet eliminated the need for wet-lab experiments. For example, after a lead compound has been identified, the precise binding mode is still typically determined by experimental structural biology. This structural knowledge is then employed to guide lead optimization. We present a step toward improving protein-ligand binding mode prediction for a set of ligands known to interact with a common protein. There is thus an important distinction between this work and traditional virtual screening algorithms. Whereas traditional approaches attempt to identify binding ligands from a large database of available compounds, our approach aims to more accurately predict the binding mode for a set of ligands which are already known to bind the target protein. The approach is based on the hypothesis that each active site contains a set of interaction points which binding ligands tend to exploit. In a more traditional context, these interaction points make up a pharmacophoric map. Our algorithm first performs traditional protein-ligand docking for each known binder. The ranked lists of candidate binding modes are then evaluated to identify a set of poses maximally self-consistent with respect to a pharmacophoric map generated from the same poses. We have extensively demonstrated the application of the algorithm to four protein systems (thrombin, cyclin-dependent kinase 2, dihydrofolate reductase, and HIV-1 protease) and attained predictions with an average RMSD < 2.5 Å for all tested systems. This represents a typical improvement of 0.5-1.0 Å (up to 25%) RMSD over the naive virtual docking predictions. Our algorithm is independent of the docking method and may significantly improve binding mode prediction of virtual docking experiments.

  • Improving virtual screening performance against conformational variations of receptors by shape matching with ligand binding pocket.
    Lee, Hui Sun and Lee, Cheol Soon and Kim, Jeong Sook and Kim, Dong Hou and Choe, Han
    Journal of chemical information and modeling, 2009, 49(11), 2419-2428
    PMID: 19852439     doi: 10.1021/ci9002365
    In this report, we present a novel virtual high-throughput screening methodology to assist in computer-aided drug discovery. Our method, designated as SLIM, involves ligand-free shape and chemical feature matching. The procedure takes advantage of a negative image of a binding pocket in a target receptor. The negative image is a set of virtual atoms representing the inner shape and chemical features of the binding pocket. Using this image, SLIM implements a shape-based similarity search based on molecular volume superposition for the ensemble of conformers of each molecule. The superposed structures, prioritized by shape similarity, are subjected to comparison of chemical feature similarities. To validate the merits of the SLIM method, we compared its performance with those of three distinct, widely used tools: ROCS, GLIDE, and GOLD. ROCS was selected as a representative of the ligand-centric methods, and docking programs GLIDE and GOLD as representatives of the receptor-centric methods. Our data suggest that SLIM has overall hit ranking ability that is comparable to that of the docking method, retaining the high computational speed of the ligand-centric method. It is notable that the SLIM method offers consistently reliable screening quality against conformational variations of receptors, whereas the docking methods have limited screening performance.

  • Scoring ligand similarity in structure-based virtual screening.
    Zavodszky, Maria I and Rohatgi, Anjali and Van Voorst, Jeffrey R and Yan, Honggao and Kuhn, Leslie A
    Journal of molecular recognition : JMR, 2009, 22(4), 280-292
    PMID: 19235177     doi: 10.1002/jmr.942
    Scoring to identify high-affinity compounds remains a challenge in virtual screening. On one hand, protein-ligand scoring focuses on weighting favorable and unfavorable interactions between the two molecules. Ligand-based scoring, on the other hand, focuses on how well the shape and chemistry of each ligand candidate overlay on a three-dimensional reference ligand. Our hypothesis is that a hybrid approach, using ligand-based scoring to rank dockings selected by protein-ligand scoring, can ensure that high-ranking molecules mimic the shape and chemistry of a known ligand while also complementing the binding site. Results from applying this approach to screen nearly 70 000 National Cancer Institute (NCI) compounds for thrombin inhibitors tend to support the hypothesis. EON ligand-based ranking of docked molecules yielded the majority (4/5) of newly discovered, low to mid-micromolar inhibitors from a panel of 27 assayed compounds, whereas ranking docked compounds by protein-ligand scoring alone resulted in one new inhibitor. Since the results depend on the choice of scoring function, an analysis of properties was performed on the top-scoring docked compounds according to five different protein-ligand scoring functions, plus EON scoring using three different reference compounds. The results indicate that the choice of scoring function, even among scoring functions measuring the same types of interactions, can have an unexpectedly large effect on which compounds are chosen from screening. Furthermore, there was almost no overlap between the top-scoring compounds from protein-ligand versus ligand-based scoring, indicating the two approaches provide complementary information. Matchprint analysis, a new addition to the SLIDE (Screening Ligands by Induced-fit Docking, Efficiently) screening toolset, facilitated comparison of docked molecules' interactions with those of known inhibitors. The majority of interactions conserved among top-scoring compounds for a given scoring function, and from the different scoring functions, proved to be conserved interactions in known inhibitors. This was particularly true in the S1 pocket, which was occupied by all the docked compounds.

  • Ligand scaffold hopping combining 3D maximal substructure search and molecular similarity.
    Quintus, Flavien and Sperandio, Olivier and Grynberg, Julien and Petitjean, Michel and Tuffery, Pierre
    BMC Bioinformatics, 2009, 10(1), 245
    PMID: 19671127     doi: 10.1186/1471-2105-10-245
    BACKGROUND: Virtual screening methods are now well established as effective to identify hit and lead candidates and are fully integrated in most drug discovery programs. Ligand-based approaches make use of physico-chemical, structural and energetic properties of known active compounds to search large chemical libraries for related and novel chemotypes. While 2D-similarity search tools are known to be fast and efficient, the use of 3D-similarity search methods can be very valuable to many research projects as integration of "3D knowledge" can facilitate the identification of not only related molecules but also of chemicals possessing distant scaffolds as compared to the query and therefore be more inclined to scaffold hopping. To date, very few methods performing this task are easily available to the scientific community.

  • Comparison of ligand- and structure-based virtual screening on the DUD data set.
    von Korff, Modest and Freyss, Joel and Sander, Thomas
    Journal of chemical information and modeling, 2009, 49(2), 209-231
    PMID: 19434824     doi: 10.1021/ci800303k
    Several in-house developed descriptors and our in-house docking tool ActDock were compared with virtual screening on the Directory of Useful Decoys (DUD). The results were compared with the chemical fingerprint descriptor from ChemAxon and with the docking results of the original DUD publication. The DUD is the first published data set providing active molecules, decoys, and references for crystal structures of ligand-target complexes. The DUD was designed for the purpose of evaluating docking programs. It contains 2950 active compounds against a total of 40 target proteins. Furthermore, for every ligand the data set contains 36 structurally dissimilar decoy compounds with similar physicochemical properties. We extracted the ligands from the target proteins to extend the applicability of the data set to include ligand-based virtual screening. From the 40 target proteins, 37 contained ligands that we used as query molecules for virtual screening evaluation. With this data set a large comparison was done between four different chemical fingerprints, a topological pharmacophore descriptor, the Flexophore descriptor, and ActDock. The Actelion docking tool relies on an MM2 force field and a pharmacophore point interaction statistic for scoring; the details are described in this publication. In terms of enrichment rates, the chemical fingerprint descriptors performed better than the Flexophore and the docking tool. After removing molecules chemically similar to the query molecules, the Flexophore descriptor outperformed the chemical descriptors and the topological pharmacophore descriptors. With the similarity matrix calculations used in this study it was shown that the Flexophore is well suited to find new chemical entities via "scaffold hopping". The Flexophore descriptor can be explored with a Java applet at in the submenu Tools->Flexophore. Its usage is free of charge and does not require registration.

  • Pharmacophore-based virtual screening versus docking-based virtual screening: a benchmark comparison against eight targets.
    Chen, Zhi and Li, Hong-lin and Zhang, Qi-jun and Bao, Xiao-guang and Yu, Kun-qian and Luo, Xiao-min and Zhu, Wei-liang and Jiang, Hua-liang
    Acta pharmacologica Sinica, 2009, 30(12), 1694-1708
    PMID: 19935678     doi: 10.1038/aps.2009.159
    AIM: This study was conducted to compare the efficiencies of two virtual screening approaches, pharmacophore-based virtual screening (PBVS) and docking-based virtual screening (DBVS) methods.

  • Structure-based drug screening and ligand-based drug screening with machine learning.
    Fukunishi, Yoshifumi
    Combinatorial chemistry & high throughput screening, 2009, 12(4), 397-408
    PMID: 19442067    
    The initial stage of drug development is the hit (active) compound search from a pool of millions of compounds; for this process, in silico (virtual) screening has been successfully applied. One of the problems of in silico screening, however, is the low hit ratio in relation to the high computational cost and the long CPU time. This problem becomes serious in structure-based in silico screening. The major reason is the low accuracy of the estimation of protein-compound binding free energy. The problem of ligand-based in silico screening is that the conventional quantitative structure-activity relationship (QSAR) approach is not effective at predicting new hit compounds with new scaffolds. Recently, machine-learning approaches have been applied to in silico drug screening to overcome the above problems. We review here machine-learning approaches for both structure-based and ligand-based drug screening. Machine learning is used to improve database enrichment in two ways, namely by improving the docking score calculated by the protein-compound docking program and by calculating the optimal distance between the feature vectors of active and inactive compounds. Both approaches require compounds that are known to be active with respect to the target protein. In structure-based screening, the former approach is mainly used with a protein-compound affinity matrix. In ligand-based screening, both the former and latter approaches are used, and the latter approach can be applied to various kinds of descriptors, such as 1D/2D descriptors/fingerprints and the affinity fingerprint given by the protein-compound affinity matrix.

  • Performance of machine learning methods for ligand-based virtual screening.
    Plewczynski, Dariusz and Spieser, Stéphane A H and Koch, Uwe
    Combinatorial chemistry & high throughput screening, 2009, 12(4), 358-368
    PMID: 19442065    
    Computational screening of compound databases has become increasingly popular in pharmaceutical research. This review focuses on the evaluation of ligand-based virtual screening using active compounds as templates in the context of drug discovery. Ligand-based screening techniques are based on comparative molecular similarity analysis of compounds with known and unknown activity. We provide an overview of publications that have evaluated different machine learning methods, such as support vector machines, decision trees, ensemble methods such as boosting, bagging and random forests, clustering methods, neural networks, naïve Bayes, data fusion methods and others.

  • APIF: a new interaction fingerprint based on atom pairs and its application to virtual screening.
    Pérez-Nueno, Violeta I and Rabal, Obdulia and Borrell, José I and Teixidó, Jordi
    Journal of chemical information and modeling, 2009, 49(5), 1245-1260
    PMID: 19364101     doi: 10.1021/ci900043r
    A new interaction fingerprint (IF) called APIF (atom-pairs-based interaction fingerprint) has been developed for postprocessing protein-ligand docking results. Unlike other existing fingerprints which employ absolute locations of individual interactions, APIF considers the relative positions of pairs of interacting atoms. Docking-based virtual screening was performed with GOLD using the crystal structures of trypsin, rhinovirus, HIV protease, carboxypeptidase, and estrogen receptor-alpha as targets. A score derived from the similarity of the bit strings for each docking solution to that of a known reference binding mode was obtained. Comparisons between APIF, GoldScore function, and standard interaction fingerprint (CHIF) scores were performed using enrichment plots. Superior recovery rates were observed in the IF score cases. Comparable results were achieved by using either of the two interaction fingerprints, substantially improving GoldScore function enrichment factors. Binding mode analyses were also carried out in order to study the best method for selecting conformations with a binding mode similar to that of the reference crystallized complex. These showed that the first conformations retrieved by interaction fingerprint scores had a more similar binding mode to the reference complex than those retrieved by the GoldScore function.
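
The key idea of APIF, encoding the relative positions of pairs of interacting atoms rather than absolute interaction locations, can be illustrated with a toy fingerprint. This is a minimal sketch and not the published APIF encoding. The interaction type labels, the 1 Å distance bins and the hashing of each (type pair, distance bin) feature into a 1024-bit string are all assumptions, and the function names are hypothetical. As in the abstract, a docking solution is then scored by the similarity of its bit string to that of a reference binding mode, here using the Tanimoto coefficient.

```python
import math
import zlib
from itertools import combinations
from typing import List, Set, Tuple

# Hypothetical input: one (interaction type, ligand atom coordinates) record per
# protein-ligand contact detected in a docked pose.
Interaction = Tuple[str, Tuple[float, float, float]]


def atom_pair_interaction_fp(interactions: List[Interaction],
                             n_bits: int = 1024,
                             bin_width: float = 1.0) -> Set[int]:
    """Set one bit per pair of interacting atoms, keyed by the two interaction
    types and their binned inter-atomic distance (relative, not absolute, positions)."""
    bits: Set[int] = set()
    for (type_a, pos_a), (type_b, pos_b) in combinations(interactions, 2):
        t1, t2 = sorted((type_a, type_b))  # make the pair order-independent
        dist_bin = int(math.dist(pos_a, pos_b) // bin_width)
        key = f"{t1}|{t2}|{dist_bin}".encode()
        # crc32 is deterministic across runs, unlike Python's salted str hash().
        bits.add(zlib.crc32(key) % n_bits)
    return bits


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two bit strings stored as sets of on-bits."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)
```

Because only inter-atomic distances enter the key, a rigidly translated pose yields the same fingerprint. A pose whose interactions are spread differently does not.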

  • Beyond the virtual screening paradigm: structure-based searching for new lead compounds.
    Schlosser, Jochen and Rarey, Matthias
    Journal of chemical information and modeling, 2009, 49(4), 800-809
    PMID: 19354328     doi: 10.1021/ci9000212
    The standard approach to structure-based high-throughput virtual screening is a sequential procedure: Each molecule of a given library is screened against the target protein, eventually generating a ranked list of molecules. In this paper a new paradigm avoiding the sequential screening pipeline is presented. Based on a novel descriptor, compounds can be directly accessed by their chemical and shape complementarity to a given protein active site. The docking calculation is performed inherently during the search process since each search result automatically implies a ligand pose in the active site. The new method named TrixX BMI is ideally suited for application scenarios in which medicinal chemists request a certain pharmacophore interaction pattern to a protein. By using an innovative indexing technology, sublinear runtimes in the number of ligands can be achieved. Redocking experiments show that TrixX BMI correctly predicts the pose of the bioactive conformation within an RMSD of less than 2.0 Å of the cocrystallized ligand in 80% of 85 protein-ligand complexes of the Astex Diverse Set. In addition, several comparative enrichment experiments show that TrixX BMI is competitive with established virtual screening technology, while the observed runtimes are clearly below one second per compound.

  • Novel approach for efficient pharmacophore-based virtual screening: method and applications.
    Dror, Oranit and Schneidman-Duhovny, Dina and Inbar, Yuval and Nussinov, Ruth and Wolfson, Haim J
    Journal of chemical information and modeling, 2009, 49(10), 2333-2343
    PMID: 19803502     doi: 10.1021/ci900263d
    Virtual screening is emerging as a productive and cost-effective technology in rational drug design for the identification of novel lead compounds. An important model for virtual screening is the pharmacophore. A pharmacophore is the spatial configuration of essential features that enable a ligand molecule to interact with a specific target receptor. In the absence of a known receptor structure, a pharmacophore can be identified from a set of ligands that have been observed to interact with the target receptor. Here, we present a novel computational method for pharmacophore detection and virtual screening. The pharmacophore detection module is able to (i) align multiple flexible ligands in a deterministic manner without exhaustive enumeration of the conformational space, (ii) detect subsets of input ligands that may bind to different binding sites or have different binding modes, (iii) address cases where the input ligands have different affinities by defining weighted pharmacophores based on the number of ligands that share them, and (iv) automatically select the most appropriate pharmacophore candidates for virtual screening. The algorithm is highly efficient, allowing a fast exploration of the chemical space by virtual screening of huge compound databases. The performance of PharmaGist was successfully evaluated on a commonly used data set of the G-protein-coupled receptor alpha1A. Additionally, a large-scale evaluation using the DUD (directory of useful decoys) data set was performed. DUD contains 2950 active ligands for 40 different receptors, with 36 decoy compounds for each active ligand. PharmaGist enrichment rates are comparable with other state-of-the-art tools for virtual screening.

  • Validation of molecular docking programs for virtual screening against dihydropteroate synthase.
    Hevener, Kirk E and Zhao, Wei and Ball, David M and Babaoglu, Kerim and Qi, Jianjun and White, Stephen W and Lee, Richard E
    Journal of chemical information and modeling, 2009, 49(2), 444-460
    PMID: 19434845     doi: 10.1021/ci800293n
    Dihydropteroate synthase (DHPS) is the target of the sulfonamide class of antibiotics and has been a validated antibacterial drug target for nearly 70 years. The sulfonamides target the p-aminobenzoic acid (pABA) binding site of DHPS, interfere with folate biosynthesis and ultimately prevent bacterial replication. However, widespread bacterial resistance to these drugs has severely limited their effectiveness. This study explores the second and more highly conserved pterin binding site of DHPS as an alternative approach to developing novel antibiotics that avoid resistance. In this study, five commonly used docking programs, FlexX, Surflex, Glide, GOLD, and DOCK, and nine scoring functions were evaluated for their ability to rank-order potential lead compounds for an extensive virtual screening study of the pterin binding site of B. anthracis DHPS. Their performance in ligand docking and scoring was judged by their ability to reproduce a known inhibitor conformation and to efficiently detect known active compounds seeded into three separate decoy sets. Two other metrics were used to assess performance: enrichment at 1% and 2%, and receiver operating characteristic (ROC) curves. The effectiveness of postdocking relaxation prior to rescoring and of consensus scoring was also evaluated. Finally, we have developed a straightforward statistical method of including the inhibition constants of the known active compounds when analyzing enrichment results to more accurately assess scoring performance, which we call the 'sum of the sum of log rank' or SSLR. Of the docking and scoring functions evaluated, Surflex with Surflex-Score and Glide with GlideScore were the best overall performers for use in virtual screening against the DHPS target, with neither combination showing statistically significant superiority over the other in enrichment studies or pose selection. Postdocking ligand relaxation and consensus scoring did not improve overall enrichment.
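
Enrichment at 1% and 2% is commonly reported as an enrichment factor (EF): the fraction of actives in the top x% of the ranked list divided by the fraction of actives in the whole list. The abstract does not say whether the study used exactly this definition. The sketch below is minimal and assumes two things: a list of active/decoy labels already sorted by docking score (best first), and a top slice rounded to the nearest whole compound. Rounding conventions vary between studies.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], fraction: float) -> float:
    """EF at `fraction` (e.g. 0.01 for 1%) of a list sorted best score first."""
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_actives == 0:
        raise ValueError("the ranked list must contain at least one active")
    # Size of the top slice, rounded to the nearest compound (at least one).
    n_top = max(1, round(fraction * n_total))
    hits_top = sum(ranked_is_active[:n_top])
    # Active rate in the top slice divided by the active rate overall.
    return (hits_top / n_top) / (n_actives / n_total)
```

For example, with 10 actives among 1000 ranked compounds and 4 of them in the top 10, EF at 1% is (4/10) / (10/1000) = 40. An EF of 1 corresponds to random ranking.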

  • Comparison of several molecular docking programs: pose prediction and virtual screening accuracy.
    Cross, Jason B and Thompson, David C and Rai, Brajesh K and Baber, J Christian and Fan, Kristi Yi and Hu, Yongbo and Humblet, Christine
    Journal of chemical information and modeling, 2009, 49(6), 1455-1474
    PMID: 19476350     doi: 10.1021/ci900056c
    Molecular docking programs are widely used modeling tools for predicting ligand binding modes and for structure-based virtual screening. In this study, six molecular docking programs (DOCK, FlexX, GLIDE, ICM, PhDOCK, and Surflex) were evaluated using metrics intended to assess docking pose and virtual screening accuracy. Cognate ligand docking to 68 diverse, high-resolution X-ray complexes revealed that ICM, GLIDE, and Surflex generated ligand poses close to the X-ray conformation more often than the other docking programs. GLIDE and Surflex also outperformed the other docking programs when used for virtual screening, based on mean ROC AUC and ROC enrichment values obtained for the 40 protein targets in the Directory of Useful Decoys (DUD). Further analysis uncovered general trends in accuracy that are specific for particular protein families. Modifying basic parameters in the software was shown to have a significant effect on docking and virtual screening results, suggesting that expert knowledge is critical for optimizing the accuracy of these methods.
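
The ROC AUC used to compare programs here can be computed without drawing the curve. It equals the probability that a randomly chosen active is scored better than a randomly chosen decoy, with ties counted as one half. The sketch below assumes that higher scores are better; for docking energies, where lower is better, negate the scores first. The pairwise loop is O(n·m) and is meant for clarity. A rank-sum (Mann-Whitney) formulation gives the same value faster on large decoy sets.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (active, decoy) pairs ranked in the right order."""
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(active_scores) * len(decoy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every active being scored above every decoy.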

  • Ultrafast shape recognition: evaluating a new ligand-based virtual screening technology.
    Ballester, Pedro J and Finn, Paul W and Richards, W Graham
    Journal of molecular graphics & modelling, 2009, 27(7), 836-845
    PMID: 19188082     doi: 10.1016/j.jmgm.2009.01.001
    Large scale database searching to identify molecules that share a common biological activity for a target of interest is widely used in drug discovery. Such an endeavour requires the availability of a method encoding molecular properties that are indicative of biological activity and at least one active molecule to be used as a template. Molecular shape has been shown to be an important indicator of biological activity; however, currently used methods are relatively slow, so faster and more reliable methods are highly desirable. Recently, a new non-superposition based method for molecular shape comparison, called Ultrafast Shape Recognition (USR), has been devised with computational performance at least three orders of magnitude faster than previously existing methods. In this study, we investigate the performance of USR in retrieving biologically active compounds through retrospective Virtual Screening experiments. Results show that USR performs better on average than a commercially available shape similarity method, while screening conformers at a rate that is more than 2500 times faster. This outstanding computational performance is particularly useful for searching much larger portions of chemical space than previously possible, which makes USR a very valuable new tool in the search for new lead molecules for drug discovery programs.
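
USR is fast because it describes each conformer with 12 numbers instead of superposing molecules. Distances from every atom to four reference points are summarized by three moments each. The reference points are the molecular centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that farthest atom. Two conformers are compared with an inverse scaled Manhattan distance, S = 1 / (1 + (1/12) Σ|Mq_i − Mj_i|). The sketch below is minimal and assumes heavy-atom coordinates given as plain tuples (Python 3.8+ for math.dist). Its moment definitions (mean, standard deviation, signed cube root of the third central moment) follow one common implementation variant. They are an assumption; the original work may normalize them differently.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _moments(dists: List[float]) -> List[float]:
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    # Signed cube root (math.cbrt only exists from Python 3.11).
    skew_root = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), skew_root]


def usr_descriptor(coords: Sequence[Point]) -> List[float]:
    """12-number shape descriptor from distances to four reference points."""
    if len(coords) < 2:
        raise ValueError("need at least two atoms")
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    d_ctd = [math.dist(ctd, c) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(fct, c) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor: List[float] = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(ref, c) for c in coords]))
    return descriptor


def usr_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """USR score in (0, 1]; 1 means identical descriptors."""
    manhattan = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + manhattan / 12.0)
```

Only interatomic distances enter the descriptor, so it is invariant to rotation and translation, which is why no superposition step is needed.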

  • Automated Docking Screens: A Feasibility Study
    Irwin, John J and Shoichet, Brian K and Mysinger, Michael M. and Huang, Niti and Colizzi, Francesco and Wassam, Pascal and Cao, Yiqun
    Journal of medicinal chemistry, 2009, 52(18), 5712-5720
    PMID: 19719084     doi: 10.1021/jm9006966
    Molecular docking is the most practical approach to leverage protein structure for ligand discovery, but the technique retains important liabilities that make it challenging to deploy on a large scale. We have therefore created an expert system, DOCK Blaster, to investigate the feasibility of full automation. The method requires a PDB code, sometimes with a ligand structure, and from that alone can launch a full screen of large libraries. A critical feature is self-assessment, which estimates the anticipated reliability of the automated screening results using pose fidelity and enrichment. Against common benchmarks, DOCK Blaster recapitulates the crystal ligand pose within 2 angstrom rmsd 50-60% of the time; inferior to an expert, but respectable. Half the time the ligand also ranked among the top 5% of 100 physically matched decoys chosen on the fly. Further tests were undertaken culminating in a study of 7755 eligible PDB structures. In 1398 cases, the redocked ligand ranked in the top 5% of 100 property-matched decoys while also posing within 2 angstrom rmsd, suggesting that unsupervised prospective docking is viable. DOCK Blaster is available at

  • Systematic extraction of structure-activity relationship information from biological screening data.
    Wawer, Mathias and Bajorath, Jürgen
    Chemmedchem, 2009, 4(9), 1431-1438
    PMID: 19621333     doi: 10.1002/cmdc.200900222
    A data mining approach is introduced that automatically extracts SAR information from high-throughput screening data sets and that helps to select active compounds for chemical exploration and hit-to-lead projects. SAR pathways are systematically identified consisting of sequences of similar active compounds with gradual increases in potency. Fully enumerated SAR pathway sets are subjected to pathway scoring, filtering, and mining, and pathways with the most significant SAR information content are prioritized. High-scoring SAR pathways often reveal activity cliffs contained in screening data. Subsets of SAR pathways are analyzed in SAR trees that make it possible to identify microenvironments of significant SAR discontinuity from which hits are preferentially selected. SAR trees of alternative pathways leading to activity cliffs identify key compounds and help to develop chemically intuitive SAR hypotheses.
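
The core notion of an SAR pathway, a sequence of similar actives with gradually increasing potency, can be illustrated by a toy enumeration over a similarity graph. The sketch below is minimal and rests on several assumptions. The similarity graph is precomputed and symmetric, with an edge wherever two actives pass some similarity threshold. Potencies are on a log scale (for example pIC50). The minimum path length of 3 is arbitrary. The function name and data are hypothetical. The paper's pathway scoring, filtering and SAR-tree analysis are not reproduced.

```python
from typing import Dict, List, Set


def sar_pathways(potency: Dict[str, float],
                 neighbors: Dict[str, Set[str]],
                 min_length: int = 3) -> List[List[str]]:
    """Enumerate maximal paths through a symmetric similarity graph along which
    potency strictly increases at every step."""
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        last = path[-1]
        more_potent = [n for n in sorted(neighbors.get(last, set()))
                       if potency[n] > potency[last]]
        if not more_potent:
            if len(path) >= min_length:
                paths.append(path)
            return
        for nxt in more_potent:
            extend(path + [nxt])

    # Start only from compounds with no less potent neighbor, so every reported
    # pathway cannot be extended at either end.
    starts = [c for c in sorted(potency)
              if all(potency[n] >= potency[c] for n in neighbors.get(c, set()))]
    for start in starts:
        extend([start])
    return paths
```

Because potency must strictly increase at each step, the search cannot revisit a compound and always terminates. The number of pathways can still grow combinatorially on dense graphs.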

  • 3-D clustering: a tool for high throughput docking
    Priestle, John P.
    Journal of Molecular Modeling, 2009, 15(5), 551-560
    PMID: 19085027     doi: 10.1007/s00894-008-0360-6
    This report describes a computer program for clustering docking poses based on their 3-dimensional (3D) coordinates as well as on their chemical structures. This is chiefly intended for reducing a set of hits coming from high throughput docking, since the capacity to prepare and biologically test such molecules is generally far more limited than the capacity to generate such hits. The advantage of clustering molecules based on 3D, rather than 2D, criteria is that small variations on a scaffold may bring about different binding modes for molecules that would not be predicted by 2D similarity alone. The program does a pose-by-pose/atom-by-atom comparison of a set of docking hits (poses), scoring both spatial and chemical similarity. Using these pair-wise similarities, the whole set is clustered based on a user-supplied similarity threshold. An output coordinate file is created that mirrors the input coordinate file, but contains two new properties: a cluster number and similarity to the cluster center. Poses in this output file can easily be sorted by cluster and displayed together for visual inspection with any standard molecular viewing program, and decisions made about which molecule should be selected for biological testing as the best representative of this group of similar molecules with similar binding modes.
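
The output described here gives each pose a cluster number and its similarity to the cluster center, based on a user-supplied threshold. The sketch below is minimal and does not reproduce the program's atom-by-atom spatial and chemical similarity. Instead, the similarity function is passed in, and the example uses a hypothetical 1 / (1 + RMSD) score that assumes poses share the same atom order. A greedy leader algorithm is substituted for clustering, which may differ from the method the program actually uses. Poses are processed in input order, for example best docking score first.

```python
import math
from typing import Callable, List, Sequence, Tuple

Pose = Sequence[Tuple[float, float, float]]


def leader_cluster(poses: List[Pose],
                   similarity: Callable[[Pose, Pose], float],
                   threshold: float) -> List[Tuple[int, float]]:
    """Return (cluster number, similarity to cluster center) for each pose."""
    centers: List[int] = []  # indices of poses acting as cluster centers
    assignments: List[Tuple[int, float]] = []
    for i, pose in enumerate(poses):
        for cluster_id, center in enumerate(centers, start=1):
            s = similarity(pose, poses[center])
            if s >= threshold:
                assignments.append((cluster_id, s))
                break
        else:
            # Not similar enough to any existing center: found a new cluster.
            centers.append(i)
            assignments.append((len(centers), 1.0))
    return assignments


def rmsd_similarity(a: Pose, b: Pose) -> float:
    """Hypothetical similarity in (0, 1]: 1 / (1 + RMSD), same atom order assumed."""
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return 1.0 / (1.0 + math.sqrt(sq / len(a)))
```

Sorting the returned assignments by cluster number groups similar poses together, so they can be inspected side by side, mirroring the sorted output file the abstract describes.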

  • Novel Method for Generating Structure-Based Pharmacophores Using Energetic Analysis
    Salam, Noeris K. and Nuti, Roberto and Sherman, Woody
    Journal of chemical information and modeling, 2009, 49(10), 2356-2368
    PMID: 19761201     doi: 10.1021/ci900212v
    We describe a novel method to develop energetically optimized, structure-based pharmacophores for use in rapid in silico screening. The method combines pharmacophore perception and database screening with protein-ligand energetic terms computed by the Glide XP scoring function to rank the importance of pharmacophore features. We derive energy-optimized pharmacophore hypotheses for 30 pharmaceutically relevant crystal structures and screen a database to assess the enrichment of active compounds. The method is compared to three other approaches: (1) pharmacophore hypotheses derived from a systematic assessment of receptor-ligand contacts, (2) Glide SP docking, and (3) 2D ligand fingerprint similarity. The method developed here shows better enrichments than the other three methods and yields a greater diversity of actives than the contact-based pharmacophores or the 2D ligand similarity. Docking produces the most cases (28/30) with enrichments greater than 10.0 in the top 1% of the database and on average produces the greatest diversity of active molecules. The combination of energy terms from a structure-based analysis with the speed of a ligand-based pharmacophore search results in a method that leverages the strengths of both approaches to produce high enrichments with a good diversity of active molecules.

  • Structure-Based Virtual Ligand Screening: Recent Success Stories
    Villoutreix, Bruno O. and Eudes, Richard and Miteva, Maria A.
    Combinatorial chemistry & high throughput screening, 2009, 12(10), 1000-1016
    doi: 10.2174/138620709789824682
    Today, computational methods are commonly used in all areas of health science research. Among these methods, virtual ligand screening has become an established technique for hit discovery and optimization. In this review, we first introduce structure-based virtual ligand screening and briefly comment on compound collections and target preparations. We also provide the readers with a list of resources, from chemoinformatics packages to compound collections, which could be helpful to implement a structure-based virtual screening platform. Then we discuss seventeen recent success stories obtained with various receptor-based in silico methods, performed on experimental structures (X-ray crystallography, 12 cases) or homology models (5 cases) and concerning different target classes, from the design of catalytic site inhibitors to drug-like compounds impeding macromolecular interactions. In light of these results, some suggestions are made about areas that present opportunities for improvements.

  • Critical comparison of virtual screening methods against the MUV data set.
    Tiikkainen, Pekka and Markt, Patrick and Wolber, Gerhard and Kirchmair, Johannes and Distinto, Simona and Poso, Antti and Kallioniemi, Olli
    Journal of chemical information and modeling, 2009, 49(10), 2168-2178
    PMID: 19799417     doi: 10.1021/ci900249b
    In the current work, we measure the performance of seven ligand-based virtual screening tools (five similarity search methods and two pharmacophore elucidators) against the MUV data set. For the similarity search tools, single active molecules as well as active compound sets clustered in terms of their chemical diversity were used as templates. Their scores were calculated against all inactive and active compounds in their target class. Subsequently, the scores were used to calculate different performance metrics, including enrichment factors and AUC values. We also studied the effect of data fusion on the results. To measure the performance of the pharmacophore tools, a set of active molecules was picked from each target class, either randomly or on the basis of chemical diversity, to build a pharmacophore model, which was then used to screen the remaining compounds in the set. Our results indicate that template sets selected by their chemical diversity are the best choice for similarity search tools, whereas the optimal training sets for pharmacophore elucidators are based on random selection, underscoring that pharmacophore modeling cannot be easily automated. We also suggest a number of improvements for future benchmark sets and discuss activity cliffs as a potential problem in ligand-based virtual screening.
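    The abstract does not specify which data fusion rule was used. As an assumption, the sketch below represents fingerprints as sets of "on" bit indices, scores similarity with the Tanimoto coefficient, and fuses each database compound's similarities to several templates by either the maximum or the mean, two commonly used group-fusion rules. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fused_scores(
    templates: List[FrozenSet[int]],
    database: Dict[str, FrozenSet[int]],
    rule: str = "max",
) -> Dict[str, float]:
    """Fuse each compound's similarities to several active templates into one score."""
    if not templates:
        raise ValueError("at least one template is required")
    fused: Dict[str, float] = {}
    for cid, fp in database.items():
        sims = [tanimoto(fp, t) for t in templates]
        if rule == "max":
            fused[cid] = max(sims)  # similarity to the nearest template
        elif rule == "mean":
            fused[cid] = sum(sims) / len(sims)  # average similarity to all templates
        else:
            raise ValueError(f"unknown fusion rule: {rule}")
    return fused
```

The MAX rule rewards a compound that closely resembles any single template, while the mean rule favors compounds that resemble all templates moderately; which works better depends on how diverse the template set is.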

  • Docking, virtual high throughput screening and in silico fragment-based drug design.
    Zoete, Vincent and Grosdidier, Aurélien and Michielin, Olivier
    Journal of cellular and molecular medicine, 2009, 13(2), 238-248
    PMID: 19183238     doi: 10.1111/j.1582-4934.2008.00665.x
    The drug discovery process has recently been profoundly changed by the adoption of computational methods that help design new drug candidates more rapidly and at lower cost. In silico drug design consists of a collection of tools that help make rational decisions at the different steps of the drug discovery process, such as the identification of a biomolecular target of therapeutic interest, the selection or design of new lead compounds, and their modification to obtain better affinities as well as better pharmacokinetic and pharmacodynamic properties. Among the different tools available, this review places particular emphasis on molecular docking, virtual high-throughput screening and fragment-based ligand design.


  • Evaluation of the performance of 3D virtual screening protocols: RMSD comparisons, enrichment assessments, and decoy selection-what can we learn from earlier mistakes?
    Kirchmair, Johannes and Markt, Patrick and Distinto, Simona and Wolber, Gerhard and Langer, Thierry
    Journal of computer-aided molecular design, 2008, 22(3-4), 213-228
    PMID: 18196462     doi: 10.1007/s10822-007-9163-6
    Within the last few years a considerable number of evaluative studies investigating the performance of 3D virtual screening approaches have been published, and assessments of protein-ligand docking in particular have attracted remarkable interest in the scientific community. However, comparing virtual screening approaches is a non-trivial task. Several publications, especially in the field of molecular docking, suffer from shortcomings that are likely to affect the significance of the results considerably. These quality issues often arise from poor study design, from bias, from the use of improper or uninformative enrichment descriptors, and from errors in interpreting the data output. In this review we analyze recent literature evaluating 3D virtual screening methods, with a focus on molecular docking. We highlight problematic issues and provide guidelines on how to improve the quality of computational studies. Since 3D virtual screening protocols are generally assessed by their ability to discriminate between active and inactive compounds, we summarize the impact of the composition and preparation of test sets on the outcome of evaluations. Moreover, we investigate the significance of both classic enrichment parameters and advanced descriptors for the performance of 3D virtual screening methods. Furthermore, we review the significance and suitability of RMSD as a measure for the accuracy of protein-ligand docking algorithms and of conformational space subsampling algorithms.
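    For docking accuracy, RMSD is usually computed between a docked pose and the crystallographic ligand in the same receptor frame, without re-superposition. The sketch below assumes both poses list the same atoms in the same order. It ignores symmetry-equivalent atoms, so symmetric ligands can receive inflated values; the 2.0 Å success cutoff is a widely used convention rather than a value taken from this review.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """In-place RMSD (no superposition) between matched atom lists, in Å.

    Assumes atom i of `pose` corresponds to atom i of `reference`; symmetry is ignored.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose, reference))
    return math.sqrt(squared / len(pose))


def is_docking_success(
    pose: Sequence[Coord], reference: Sequence[Coord], cutoff: float = 2.0
) -> bool:
    """Common convention: a pose within `cutoff` Å RMSD of the crystal pose counts as correct."""
    return pose_rmsd(pose, reference) <= cutoff
```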

  • Is it possible to increase hit rates in structure-based virtual screening by pharmacophore filtering? An investigation of the advantages and pitfalls of post-filtering.
    Muthas, Daniel and Sabnis, Yogesh A and Lundborg, Magnus and Karlén, Anders
    Journal of molecular graphics & modelling, 2008, 26(8), 1237-1251
    PMID: 18203638     doi: 10.1016/j.jmgm.2007.11.005
    We have investigated the influence of post-filtering virtual screening results, with pharmacophoric features generated from an X-ray structure, on enrichment rates. This was performed using three docking programs (zdock+, Surflex and FRED) as virtual screening tools and pharmacophores generated in UNITY from co-crystallized complexes. Sets of known actives along with 9997 pharmaceutically relevant decoy compounds were docked against six chemically diverse protein targets, namely CDK2, COX2, ERalpha, fXa, MMP3, and NA. To try to overcome the inherent limitations of the well-known docking problem, we generated multiple poses for each compound. The compounds were first ranked according to their scores alone, and enrichment rates were calculated using only the top-scoring pose of each compound. Subsequently, all poses for each compound were passed through the different pharmacophores generated from co-crystallized complexes, and the enrichment factors were re-calculated based on the top-scoring passing pose of each compound. Post-filtering with a pharmacophore generated from only one X-ray complex was shown to increase enrichment rates in all investigated targets compared to docking alone. This indicates that this is a general method that works for diverse targets and different docking programs.
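    The two ranking schemes described above (top-scoring pose only versus top-scoring pose that passes the pharmacophore) can be sketched as follows. The pose record, the feature labels, and the subset test standing in for a UNITY pharmacophore match are hypothetical, and lower docking scores are assumed to be better.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical pose record: (docking score, pharmacophore features the pose satisfies).
Pose = Tuple[float, FrozenSet[str]]


def top_pose_rank(poses_by_compound: Dict[str, List[Pose]]) -> List[str]:
    """Baseline: rank compounds by their best docking score (lower = better)."""
    best = {cid: min(s for s, _ in poses) for cid, poses in poses_by_compound.items() if poses}
    return sorted(best, key=lambda cid: best[cid])


def post_filter_rank(
    poses_by_compound: Dict[str, List[Pose]],
    passes: Callable[[FrozenSet[str]], bool],
) -> List[str]:
    """Rank compounds by their best-scoring pose that passes the pharmacophore filter.

    Compounds with no passing pose are dropped from the hit list.
    """
    best: Dict[str, float] = {}
    for cid, poses in poses_by_compound.items():
        passing = [score for score, feats in poses if passes(feats)]
        if passing:
            best[cid] = min(passing)  # assumes lower docking scores are better
    return sorted(best, key=lambda cid: best[cid])
```

Keeping every pose matters here: a compound whose top pose fails the filter can still be retained through a lower-ranked pose that satisfies it.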

  • Consensus scoring with feature selection for structure-based virtual screening
    Teramoto, Reiji and Fukunishi, Hiroaki
    Journal of chemical information and modeling, 2008, 48(2), 288-295
    doi: 10.1021/ci700239t
    The evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, and scoring functions play significant roles in it. While consensus scoring (CS) generally improves enrichment by compensating for the deficiencies of each scoring function, selecting the individual scoring functions remains a challenging task when few known active compounds are available. To address this problem, we propose feature selection-based consensus scoring (FSCS), which performs supervised feature selection with docked native ligand conformations to select complementary scoring functions. We evaluated the enrichments of five scoring functions (F-Score, D-Score, PMF, G-Score, and ChemScore), FSCS, and RCS (rank-by-rank consensus scoring) for four different target proteins: acetylcholinesterase (AChE), thrombin, phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPAR gamma). The results indicated that FSCS was able to select the complementary scoring functions and enhance ligand enrichments and that it outperformed RCS and the individual scoring functions for all target proteins. They also indicated that the performance of the single scoring functions was strongly dependent on the target protein. An especially favorable result with implications for practical drug screening is that FSCS performs well even if only one 3D structure of the protein-ligand complex is known. Moreover, we found that one can infer which scoring functions significantly enrich active compounds by using feature selection before actual docking and that the selected scoring functions are complementary.
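    The abstract uses rank-by-rank consensus scoring (RCS) as a baseline. A minimal RCS sketch is shown below, assuming every score has already been oriented so that higher means better and that ties are broken by input order. The optional `selected` argument stands in for the subset of scoring functions that a feature-selection step such as FSCS would choose; the function names and the toy score table are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def to_ranks(scores: Sequence[float]) -> List[int]:
    """Rank 1 = best (highest score); ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_by_rank_consensus(
    score_table: Dict[str, Sequence[float]],
    selected: Optional[List[str]] = None,
) -> List[int]:
    """Order compound indices by their mean rank over the chosen scoring functions."""
    names = selected if selected is not None else list(score_table)
    if not names:
        raise ValueError("no scoring functions selected")
    n = len(score_table[names[0]])
    all_ranks = [to_ranks(score_table[name]) for name in names]
    mean_rank = [sum(r[i] for r in all_ranks) / len(names) for i in range(n)]
    return sorted(range(n), key=lambda i: mean_rank[i])
```

Averaging ranks rather than raw scores sidesteps the different units and ranges of the individual scoring functions.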

  • Structure-based virtual screening with supervised consensus scoring: Evaluation of pose prediction and enrichment factors
    Teramoto, Reiji and Fukunishi, Hiroaki
    Journal of chemical information and modeling, 2008, 48(4), 747-754
    doi: 10.1021/ci700464x
    Since the evaluation of ligand conformations is a crucial aspect of structure-based virtual screening, scoring functions play significant roles in it. However, it is known that a scoring function does not always work well for all target proteins. When one cannot know a priori which scoring function works best against a target protein, there is no standard method for determining this, even if the 3D structure of a target protein-ligand complex is available. Therefore, developing a method that achieves high enrichments from given scoring functions and the 3D structure of a protein-ligand complex is a crucial and challenging task. To address this problem, we applied SCS (supervised consensus scoring), which employs a rough linear correlation between the binding free energy and the root-mean-square deviation (rmsd) of native ligand conformations and incorporates the protein-ligand binding process with docked ligand conformations using supervised learning, to virtual screening. We evaluated both the docking poses and enrichments of SCS and five scoring functions (F-Score, G-Score, D-Score, ChemScore, and PMF) for three different target proteins: thymidine kinase (TK), thrombin, and peroxisome proliferator-activated receptor gamma (PPAR gamma). Our enrichment studies show that SCS is competitive with or superior to the best single scoring function at the top ranks of the screened database. We found that the enrichments of SCS could be limited by the best scoring function, because SCS is obtained on the basis of the five individual scoring functions. From these results, we conclude that SCS works very successfully. Moreover, from docking pose analysis, we revealed the connection between enrichment and the average centroid distance of top-scored docking poses. Since SCS requires only one 3D structure of a protein-ligand complex, SCS will be useful for identifying new ligands.

  • DOVIS: an implementation for high-throughput virtual screening using AutoDock.
    Zhang, Shuxing and Kumar, Kamal and Jiang, Xiaohui and Wallqvist, Anders and Reifman, Jaques
    BMC Bioinformatics, 2008, 9, 126
    PMID: 18304355     doi: 10.1186/1471-2105-9-126
    BACKGROUND: Molecular-docking-based virtual screening is an important tool in drug discovery that is used to significantly reduce the number of possible chemical compounds to be investigated. In addition to the selection of a sound docking strategy with appropriate scoring functions, another technical challenge is to screen millions of compounds in silico in a reasonable time. To meet this challenge, it is necessary to use high performance computing (HPC) platforms and techniques. However, the development of an integrated HPC system that makes efficient use of its elements is not trivial.

  • DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0.
    Jiang, Xiaohui and Kumar, Kamal and Hu, Xin and Wallqvist, Anders and Reifman, Jaques
    Chemistry Central journal, 2008, 2, 18
    PMID: 18778471     doi: 10.1186/1752-153X-2-18
    BACKGROUND: Small-molecule docking is an important tool in studying receptor-ligand interactions and in identifying potential drug candidates. Previously, we developed a software tool (DOVIS) to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using AutoDock 3.05 as the docking engine. DOVIS enables the seamless screening of millions of compounds on high-performance computing platforms. In this paper, we report significant advances in the software implementation of DOVIS 2.0, including enhanced screening capability, improved file system efficiency, and extended usability.

  • Multiple protein structures and multiple ligands: effects on the apparent goodness of virtual screening results.
    Sheridan, Robert P and McGaughey, Georgia B and Cornell, Wendy D
    Journal of computer-aided molecular design, 2008, 22(3-4), 257-265
    PMID: 18273559     doi: 10.1007/s10822-008-9168-9
    As an extension to a previously published study (McGaughey et al., J Chem Inf Model 47:1504-1519, 2007) comparing 2D and 3D similarity methods to docking, we apply a subset of those virtual screening methods (TOPOSIM, SQW, ROCS-color, and Glide) to a set of protein/ligand pairs where the protein is the target for docking and the cocrystallized ligand is the target for the similarity methods. Each protein is represented by a maximum of five crystal structures. We search a diverse subset of the MDDR as well as a diverse small subset of the MCIDB, Merck's proprietary database. It is seen that the relative effectiveness of virtual screening methods, as measured by the enrichment factor, is highly dependent on the particular crystal structure or ligand, and on the database being searched. 2D similarity methods appear very good for the MDDR, but poor for the MCIDB. However, ROCS-color (a 3D similarity method) does well for both databases.

  • AMMOS: Automated Molecular Mechanics Optimization tool for in silico Screening.
    Pencheva, Tania and Lagorce, David and Pajeva, Ilza and Villoutreix, Bruno O. and Miteva, Maria A.
    BMC Bioinformatics, 2008, 9, 438
    PMID: 18925937     doi: 10.1186/1471-2105-9-438
    BACKGROUND: Virtual or in silico ligand screening combined with other computational methods is one of the most promising methods to search for new lead compounds, thereby greatly assisting the drug discovery process. Despite considerable progress in virtual screening methodologies, available computer programs do not easily address problems such as structural optimization of compounds in a screening library, receptor flexibility/induced fit, and accurate prediction of protein-ligand interactions. It has been shown that structural optimization of chemical compounds and post-docking optimization in multi-step structure-based virtual screening approaches help to further improve the overall efficiency of the methods. To address some of these points, we developed the program AMMOS for refining both the 3D structures of the small molecules present in chemical libraries and the predicted receptor-ligand complexes, allowing partial to full atom flexibility through molecular mechanics optimization.

  • Ligand-target interaction-based weighting of substructures for virtual screening.
    Crisman, Thomas J and Sisay, Mihiret T. and Bajorath, Jürgen
    Journal of chemical information and modeling, 2008, 48(10), 1955-1964
    PMID: 18821751     doi: 10.1021/ci800229q
    A methodology is introduced to assign energy-based scores to two-dimensional (2D) structural features based on three-dimensional (3D) ligand-target interaction information and utilize interaction-annotated features in virtual screening. Database molecules containing such fragments are assigned cumulative scores that serve as a measure of similarity to active reference compounds. The Interaction Annotated Structural Features (IASF) method is applied to mine five high-throughput screening (HTS) data sets and often identifies more hits than conventional fragment-based similarity searching or ligand-protein docking.

  • Comparison of ligand-based and receptor-based virtual screening of HIV entry inhibitors for the CXCR4 and CCR5 receptors using 3D ligand shape matching and ligand-receptor docking.
    Pérez-Nueno, Violeta I and Ritchie, David W and Rabal, Obdulia and Pascual, Rosalia and Borrell, José I and Teixidó, Jordi
    Journal of chemical information and modeling, 2008, 48(3), 509-533
    PMID: 18298095     doi: 10.1021/ci700415g
    HIV infection is initiated by fusion of the virus with the target cell through binding of the viral gp120 protein with the CD4 cell surface receptor protein and the CXCR4 or CCR5 co-receptors. There is currently considerable interest in developing novel ligands that can modulate the conformations of these co-receptors and, hence, ultimately block virus-cell fusion. This article describes a detailed comparison of the performance of receptor-based and ligand-based virtual screening approaches to find CXCR4 and CCR5 antagonists that could potentially serve as HIV entry inhibitors. Because no crystal structures for these proteins are available, homology models of CXCR4 and CCR5 have been built, using bovine rhodopsin as the template. For ligand-based virtual screening, several shape-based and property-based molecular comparison approaches have been compared, using high-affinity ligands as query molecules. These methods were compared by virtually screening a library assembled by us, consisting of 602 known CXCR4 and CCR5 inhibitors and some 4700 similar presumed inactive molecules. For each receptor, the library was queried using known binders, and the enrichment factors and diversity of the resulting virtual hit lists were analyzed. Overall, ligand-based shape-matching searches yielded higher enrichments than receptor-based docking, especially for CXCR4. The results obtained for CCR5 suggest the possibility that different active scaffolds bind in different ways within the CCR5 pocket.

  • FieldChopper, a new tool for automatic model generation and virtual screening based on molecular fields.
    Kalliokoski, Tuomo and Ronkko, Toni and Poso, Antti
    Journal of chemical information and modeling, 2008, 48(6), 1131-1137
    PMID: 18489083     doi: 10.1021/ci700216u
    Algorithms were developed for ligand-based virtual screening of molecular databases. FieldChopper (FC) is based on the discretization of the electrostatic and van der Waals field into three classes. A model is built from a set of superimposed active molecules. The similarity of the compounds in the database to the model is then calculated using matrices that define scores for comparing field values of different categories. The method was validated using 12 publicly available data sets by comparing the method to the electrostatic similarity comparison program EON. The results suggest that FC is competitive with more complex descriptors and could be used as a molecular sieve in virtual screening experiments when multiple active ligands are known.

  • PDTD: a web-accessible protein database for drug target identification
    Gao, Zhenting and Li, Honglin and Zhang, Hailei and Liu, Xiaofeng and Kang, Ling and Luo, Xiaomin and Zhu, Weiliang and Chen, Kaixian and Wang, Xicheng and Jiang, Hualiang
    BMC Bioinformatics, 2008, 9, -
    PMID: 18282303     doi: 10.1186/1471-2105-9-104
    Background: Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking), which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation. Description: PDTD is a web-accessible protein database for in silico target identification. It currently contains > 1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literature and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information on > 830 known or potential drug targets, including protein and active site structures in both PDB and mol2 formats, related diseases, biological functions as well as associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword searches by PDB ID, target name, and disease name. Data sets generated by PDTD can be viewed with molecular visualization plug-ins and can also be downloaded freely. Notably, PDTD is specifically designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded in the form of a mol2 file with the binding pose of the probe compound and a list of potential binding targets according to their ranking scores. Conclusion: PDTD serves as a comprehensive and unique repository of drug targets. Integrated with TarFisDock, PDTD is a useful resource to identify binding proteins for active compounds or existing drugs. Its potential applications include in silico drug target identification, virtual screening, and the discovery of the secondary effects of an old drug (i.e. new pharmacological usage) or an existing target (i.e. new pharmacological or toxic relevance), so it may be a valuable platform for pharmaceutical researchers. PDTD is available online at

  • Ligand-based approaches in virtual screening
    Douguet, Dominique
    Current computer-aided drug design, 2008, 4(3), 180-190
    Although there are many more receptor structures than there were in the 1970s and 1980s, drug discovery remains dominated by empirical screening and substrate-based drug design. Computer-aided drug design methods have become value-adding disciplines that now contribute to the early stage of the drug discovery process [1, 2]. Computational methods encompass all aspects of drug discovery from target assessment to lead optimization. The computational strategy varies from case to case and can be influenced by several situational variables: lead hunting or lead optimization, requirement for a novel lead class, type of biological assay, structural information available, known classes of ligands, allocated chemistry resources. Today, drug discovery is still a complex and approximate science. Thus, incorporating knowledge-based approaches like ligand-based screenings may bias the process towards success. This review describes these strategies with practical applications and presents future perspectives of ligand-based screening.

  • Lead finder: an approach to improve accuracy of protein-ligand docking, binding energy estimation, and virtual screening.
    Stroganov, Oleg V and Novikov, Fedor N and Stroylov, Viktor S and Kulkov, Val and Chilov, Ghermes G
    Journal of chemical information and modeling, 2008, 48(12), 2371-2385
    PMID: 19007114     doi: 10.1021/ci800166p
    An innovative molecular docking algorithm and three specialized high-accuracy scoring functions are introduced in the Lead Finder docking software. Lead Finder's algorithm for ligand docking combines the classical genetic algorithm with various local optimization procedures and resourceful exploitation of the knowledge generated during the docking process. Lead Finder's scoring functions are based on a molecular mechanics functional which explicitly accounts for different types of energy contributions scaled with empirical coefficients to produce three scoring functions tailored for (a) accurate binding energy predictions; (b) correct energy-ranking of docked ligand poses; and (c) correct rank-ordering of active and inactive compounds in virtual screening experiments. The predicted values of the free energy of protein-ligand binding were benchmarked against a set of experimentally measured binding energies for 330 diverse protein-ligand complexes, yielding an rmsd of 1.50 kcal/mol. The accuracy of ligand docking was assessed on a set of 407 structures, which included almost all published test sets of the following programs: FlexX, Glide SP, Glide XP, Gold, LigandFit, MolDock, and Surflex. An rmsd of 2 Å or less was observed for 80-96% of the structures in the test sets (80.0% on the Glide XP and FlexX test sets, 96.0% on the Surflex and MolDock test sets). The ability of Lead Finder to distinguish between active and inactive compounds during virtual screening experiments was benchmarked against 34 therapeutically relevant protein targets. Impressive enrichment factors were obtained for almost all of the targets, with an average area under the receiver operating characteristic (ROC) curve of 0.92.
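    The area under the ROC curve quoted above can be read as the probability that a randomly chosen active is ranked above a randomly chosen inactive. The sketch below computes it directly from that pairwise (Mann-Whitney) formulation, assuming higher scores rank first; it is quadratic in library size, so a sort-based method would be preferable for large screens.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """ROC AUC as the probability that a random active outscores a random inactive.

    Ties count as half a win (Mann-Whitney formulation); higher scores rank first.
    """
    actives = [s for s, a in zip(scores, is_active) if a]
    inactives = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for d in inactives:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(inactives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives.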

  • LASSO-ligand activity by surface similarity order: a new tool for ligand based virtual screening
    Reid, Darryl and Sadjad, Bashir S and Zsoldos, Zsolt and Simon, Aniko
    Journal of computer-aided molecular design, 2008, 22, 479-487
    PMID: 18204980     doi: 10.1007/s10822-007-9164-5
    Virtual Ligand Screening (VLS) has become an integral part of the drug discovery process for many pharmaceutical companies. Ligand similarity searches provide a very powerful method of screening large databases of ligands to identify possible hits. If these hits belong to new chemotypes, the method is deemed even more successful. eHiTS LASSO uses a new interacting surface point types (ISPT) molecular descriptor that is generated from the 3D structure of the ligand, but unlike most 3D descriptors it is conformation independent. Combined with a neural network machine learning technique, LASSO screens molecular databases at an ultrafast speed of 1 million structures in under 1 min on a standard PC. The results obtained from eHiTS LASSO trained on relatively small training sets of just 2, 4 or 8 actives are presented using the diverse Directory of Useful Decoys (DUD) data set. It is shown that over a wide range of receptor families, eHiTS LASSO is consistently able to enrich screened databases and provides scaffold hopping ability.

  • Virtual screening for the discovery of bioactive natural products.
    Rollinger, Judith M and Stuppner, Hermann and Langer, Thierry
    Progress in drug research. Fortschritte der Arzneimittelforschung. Progrès des recherches pharmaceutiques, 2008, 65, 211, 213-49
    PMID: 18084917    
    In this survey the impact of the virtual screening concept is discussed in the field of drug discovery from nature. Confronted by a steadily increasing number of secondary metabolites and a growing number of molecular targets relevant in the therapy of human disorders, researchers need to handle a huge amount of information. Virtual screening filtering experiments have already shown great promise for dealing with large libraries of potential bioactive molecules. Virtual screening can be used to browse databases for molecules fitting either an established pharmacophore model or a three-dimensional (3D) structure of a macromolecular target. However, the application of this in silico tool to the discovery of natural lead candidates has so far been almost neglected. There are several reasons for that. One is the scarce availability of natural product (NP) 3D databases in contrast to synthetic libraries; another is the problematic compatibility of NPs with modern robotized high-throughput screening (HTS) technologies. Further arguments concern the unpredictable availability of pure natural compounds and their often overly complex chemistry. Thus research in this field is time-consuming, highly complex, expensive and ineffective. Nevertheless, naturally derived compounds are among the most favorable sources of drug candidates. A more rational and economic search for new lead structures from nature must therefore be a priority in order to overcome these problems. Here we demonstrate some basic principles, requirements and limitations of virtual screening strategies and support their applicability in NP research with studies already performed. A sensible exploitation of the molecular diversity of secondary metabolites, however, requires virtual screening concepts that are interfaced with well-established strategies from classical pharmacognosy in an effort to maximize their efficacy in drug discovery. Such integrated virtual screening workflows are outlined here and should help to motivate NP researchers to take a step towards this powerful in silico tool.

  • Integrating Structure- and Ligand-Based Virtual Screening: Comparison of Individual, Parallel, and Fused Molecular Docking and Similarity Search Calculations on Multiple Targets
    Tan, Lu and Geppert, Hanna and Sisay, Mihiret T. and Guetschow, Michael and Bajorath, Juergen
    Chemmedchem, 2008, 3(10), 1566-1571
    doi: 10.1002/cmdc.200800129
    Similarity searching is often used to preselect compounds for docking, thereby decreasing the size of screening databases. However, integrated structure- and ligand-based screening schemes are rare at present. Docking and similarity search calculations using 2D fingerprints were carried out in a comparative manner on nine target enzymes for which significant numbers of diverse inhibitors could be obtained. In the absence of knowledge-based docking constraints and target-directed parameter optimisation, fingerprint searching performed clearly better than docking calculations. Alternative combinations of docking and similarity search results were investigated and found to further increase the compound recall of individual methods in a number of instances. When the results of similarity searching and docking were combined, parallel selection of candidate compounds from individual rankings was generally superior to rank fusion. We suggest that complementary results from docking and similarity searching can be captured by integrated compound selection schemes.
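    The two combination strategies compared above can be contrasted in a short sketch. The exact selection and fusion rules used in the study are not given in the abstract, so the ones below are assumptions: parallel selection alternates between the similarity and docking rankings while skipping duplicates, and rank fusion sums the two rank positions (treating a missing compound as worst-ranked) with ties broken alphabetically.

```python
from itertools import zip_longest
from typing import Dict, List, Sequence


def parallel_selection(rank_a: Sequence[str], rank_b: Sequence[str], n: int) -> List[str]:
    """Alternate between the two rankings, skipping duplicates, until n compounds are picked."""
    if n <= 0:
        return []
    selected: List[str] = []
    seen = set()
    for pair in zip_longest(rank_a, rank_b):  # pads the shorter ranking with None
        for cid in pair:
            if cid is not None and cid not in seen:
                seen.add(cid)
                selected.append(cid)
                if len(selected) == n:
                    return selected
    return selected


def rank_fusion(rank_a: Sequence[str], rank_b: Sequence[str], n: int) -> List[str]:
    """Sum the two rank positions (missing = worst) and keep the n best fused ranks."""
    pos_a: Dict[str, int] = {c: i for i, c in enumerate(rank_a)}
    pos_b: Dict[str, int] = {c: i for i, c in enumerate(rank_b)}
    worst = max(len(rank_a), len(rank_b))
    pool = set(pos_a) | set(pos_b)
    fused = sorted(pool, key=lambda c: (pos_a.get(c, worst) + pos_b.get(c, worst), c))
    return fused[:n]
```

With a similarity ranking `["a", "b", "c", "d"]` and a docking ranking `["x", "c", "y", "a"]`, picking three compounds gives `["a", "x", "b"]` by parallel selection but `["a", "c", "x"]` by rank fusion: parallel selection preserves the top hits unique to each method, whereas fusion promotes compounds both methods rank moderately well.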

  • Evaluating docking programs: keeping the playing field level.
    Liebeschuetz, John W
    Journal of computer-aided molecular design, 2008, 22(3-4), 229-238
    PMID: 18196461     doi: 10.1007/s10822-008-9169-8
    Over recent years many enrichment studies have been published which purport to rigorously compare the performance of two or more docking protocols. It has become clear, however, that such studies often have flaws within their methodologies, which cast doubt on the rigour of the conclusions. Setting up such comparisons is fraught with difficulties, and no best mode of practice is available to guide the experimenter. Careful choice of structural models and ligands appropriate to those models is important. The protein structure should be representative for the target. In addition, the set of active ligands selected should be appropriate to the structure in cases where different forms of the protein bind different classes of ligand. Binding site definition is also an area in which errors arise. Particular care is needed in deciding which crystallographic waters to retain, and again this may be guided by knowledge of the likely binding modes of the ligands making up the active ligand list. Geometric integrity of the ligand structures used is clearly important, yet it is apparent that published sets of actives and decoys sometimes contain high proportions of incorrect structures. Choice of protocol for docking and analysis needs careful consideration, as many programs can be tweaked for optimum performance. Should studies be run using 'black box' protocols supplied by the software provider? Lastly, the correct method of analysis of enrichment studies is a much discussed topic at the moment. However, currently promoted approaches do not consider a crucial aspect of a successful virtual screen, namely that a good structural diversity of hits be returned. Overall there is much to consider in the experimental design of enrichment studies. Hopefully this study will be of benefit in helping others plan such experiments.

  • FieldScreen: virtual screening using molecular fields. Application to the DUD data set
    Cheeseright, TJ and Mackey, MD
    Journal of chemical information and modeling, 2008, 48(11), 2108-2117
    FieldScreen, a ligand-based Virtual Screening (VS) method, is described. Its use of 3D molecular fields makes it particularly suitable for scaffold hopping, and we have rigorously validated it for this purpose using a clustered version of the Directory of Useful Decoys (DUD). Using thirteen pharmaceutically relevant targets, we demonstrate that FieldScreen produces superior early chemotype enrichments, compared to DOCK. Additionally, hits retrieved by FieldScreen are consistently lower in molecular weight than those retrieved by docking. Where no X-ray protein structures are available, FieldScreen searches are more robust than docking into homology models or apo structures.

  • Synergies of Virtual Screening Approaches
    Muegge, Ingo
    Mini Reviews in Medicinal Chemistry, 2008, 8(9), 927-933
    doi: 10.2174/138955708785132792
    Virtual screening is a knowledge-driven approach. Therefore, synergies between different virtual screening methods using information about the drug target as well as about known ligands in combination promise the best results. Finding novel active scaffolds is often a more important success criterion than hit rates of virtual screens. Novelty should also be considered in balance with often weaker activities of virtual screening hits. Virtual screening is most effective if performed in iterations following up on weak primary hits of interest through testing of structural analogs and additional synthesis of compounds.

  • Essential factors for successful virtual screening
    Seifert, MHJ
    Mini Reviews in Medicinal Chemistry, 2008, 8(1), 63-72
    Virtual high-throughput screening (vHTS) is a powerful technique for identifying hit molecules as starting points for medicinal chemistry. Numerous successful applications of vHTS have been published using a large variety of methodologies. This review attempts to identify the essential factors for successful virtual screening in the hit identification phase.

  • Towards improving compound selection in structure-based virtual screening
    Drug discovery today, 2008, 13(5-6), 219-226
    Structure-based virtual screening is now an established technology for supporting hit finding and lead optimisation in drug discovery. Recent validation studies have highlighted the poor performance of currently used scoring functions in estimating binding affinity and hence in ranking large datasets of docked ligands. Progress in the analysis of large datasets can be made through the use of appropriate data mining techniques and the derivation of a broader range of descriptors relevant to receptor-ligand binding. In addition, simple scoring functions can be supplemented by simulation-based scoring protocols. Developments in workflow design allow the automation of repetitive tasks, and also encourage the routine use of simulation-based methods and the rapid prototyping of novel modelling and analysis procedures.

  • MedusaScore: an accurate force field-based scoring function for virtual drug screening.
    Yin, Shuangye and Biedermannova, Lada and Vondrasek, Jiri and Dokholyan, Nikolay V
    Journal of chemical information and modeling, 2008, 48(8), 1656-1662
    PMID: 18672869     doi: 10.1021/ci8001167
    Virtual screening is becoming an important tool for drug discovery. However, the application of virtual screening has been limited by the lack of accurate scoring functions. Here, we present a novel scoring function, MedusaScore, for evaluating protein-ligand binding. MedusaScore is based on models of physical interactions that include van der Waals, solvation, and hydrogen bonding energies. To ensure the best transferability of the scoring function, we do not use any protein-ligand experimental data for parameter training. We then test the MedusaScore for docking decoy recognition and binding affinity prediction and find superior performance compared to other widely used scoring functions. Statistical analysis indicates that one source of inaccuracy of MedusaScore may arise from the unaccounted entropic loss upon ligand binding, which suggests avenues of approach for further MedusaScore improvement.


  • Supervised consensus scoring for docking and virtual screening
    Teramoto, Reiji and Fukunishi, Hiroaki
    Journal of chemical information and modeling, 2007, 47(2), 526-534
    doi: 10.1021/ci6004993
    Docking programs are widely used to discover novel ligands efficiently and can predict protein-ligand complex structures with reasonable accuracy and speed. However, there is an emerging demand for better performance from the scoring methods. Consensus scoring (CS) methods improve the performance by compensating for the deficiencies of each scoring function. However, conventional CS and existing scoring functions have the same problems, such as a lack of protein flexibility, inadequate treatment of solvation, and the simplistic nature of the energy function used. Although there are many problems in current scoring functions, we focus our attention on the incorporation of unbound ligand conformations. To address this problem, we propose supervised consensus scoring (SCS), which takes into account the protein-ligand binding process using unbound ligand conformations with supervised learning. An evaluation of docking accuracy for 100 diverse protein-ligand complexes shows that SCS outperforms both CS and 11 scoring functions (PLP, F-Score, LigScore, DrugScore, LUDI, X-Score, AutoDock, PMF, G-Score, ChemScore, and D-score). The success rates of SCS range from 89% to 91% for RMSD < 2 Å, while those of CS range from 80% to 85%, and those of the scoring functions range from 26% to 76%. Moreover, we also introduce a method for judging whether a compound is active or inactive with the appropriate criterion for virtual screening. SCS performs quite well in docking accuracy and is presumably useful for screening large-scale compound databases before predicting binding affinity.

  • WinDock: structure-based drug discovery on Windows-based PCs.
    Hu, Zengjian and Southerland, William
    Journal of computational chemistry, 2007, 28(14), 2347-2351
    PMID: 17476686     doi: 10.1002/jcc.20756
    In recent years, virtual database screening using high-throughput docking (HTD) has emerged as a very important tool and a well-established method for finding new lead compounds in the drug discovery process. With the advent of powerful personal computers (PCs), it is now plausible to perform HTD investigations on these inexpensive PCs. To make HTD more accessible to a broad community, we present here WinDock, an integrated application designed to help researchers perform structure-based drug discovery tasks under a uniform, user-friendly graphical interface for Windows-based PCs. WinDock combines existing small molecule searchable three-dimensional (3D) libraries, homology modeling tools, and ligand-protein docking programs in a semi-automatic, interactive manner, which guides the user through the use of each integrated software component. WinDock is coded in C++.

  • Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches.
    Eckert, Hanna and Bajorath, Jürgen
    Drug discovery today, 2007, 12(5-6), 225-233
    PMID: 17331887     doi: 10.1016/j.drudis.2007.01.011
    The success of ligand-based virtual-screening calculations is influenced highly by the nature of target-specific structure-activity relationships. This might pose severe constraints on the ability to recognize diverse structures with similar activity. Accordingly, the performance of similarity-based methods strongly depends on the class of compound that is studied, and approaches of different design and complexity often produce, overall, equally good (or bad) results. However, it is also found that there is often little overlap in the similarity relationships detected by different approaches, which rationalizes the need to develop alternative similarity methods. Among others, these include novel algorithms to navigate high-dimensional chemical spaces, train similarity calculations on specific compound classes, and detect remote similarity relationships.

  • MED-SuMoLig: a new ligand-based screening tool for efficient scaffold hopping.
    Sperandio, Olivier and Andrieu, Olivier and Miteva, Maria A. and Vo, Minh-Quang and Souaille, Marc and Delfaud, Francois and Villoutreix, Bruno O.
    Journal of chemical information and modeling, 2007, 47(3), 1097-1110
    PMID: 17477521     doi: 10.1021/ci700031v
    The identification of small molecules with selective bioactivity, whether intended as potential therapeutics or as tools for experimental research, is central to progress in medicine and in the life sciences. To facilitate such study, we have developed a ligand-based program well-suited for effective screening of large compound collections. This package, MED-SuMoLig, combines a SMARTS-driven substructure search aiming at 3D pharmacophore profiling and computation of the local atomic density of the compared molecules. The screening utility was then investigated using 52 diverse active molecules (against CDK2, Factor Xa, HIV-1 protease, neuraminidase, ribonuclease A, and thymidine kinase) merged into a library of about 40,000 putative inactive (druglike) compounds. In all cases, the program recovered more than half of the actives in the top 3% of the screened library. We also compared the performance of MED-SuMoLig with that of ChemMine or of ROCS and found that MED-SuMoLig outperformed both methods for CDK2 and Factor Xa in terms of enrichment rates or performed equally well for the other targets.

  • QUASI: a novel method for simultaneous superposition of multiple flexible ligands and virtual screening using partial similarity.
    Todorov, Nikolay P and Alberts, Ian L and de Esch, Iwan J P and Dean, Philip M
    Journal of chemical information and modeling, 2007, 47(3), 1007-1020
    PMID: 17497844     doi: 10.1021/ci6003338
    The structure of many receptors is unknown, and only information about diverse ligands binding to them is available. A new method is presented for the superposition of such ligands, derivation of putative receptor site models and utilization of the models for screening of compound databases. In order to generate a receptor model, the similarity of all ligands is optimized simultaneously taking into account conformational flexibility and also the possibility that the ligands can bind to different regions of the site and only partially overlap. Ligand similarity is defined with respect to a receptor site model serving as a common reference frame. The receptor model is dynamic and coevolves with the ligand alignment until an optimal self-consistent superposition is achieved. When ligand conformational flexibility is permitted, different superposition models are possible and consistent with the data. Clustering of the superposition solutions is used to obtain diverse models. When the models are used to screen a database of compounds, high enrichments are obtained, comparable to those obtained in docking studies.

  • Processing of small molecule databases for automated docking.
    Cummings, Maxwell D and Gibbs, Alan C and Desjarlais, Renee L
    Medicinal chemistry (Shariqah (United Arab Emirates)), 2007, 3(1), 107-113
    PMID: 17266630    
    Virtual screening involves the mining of small molecule databases from various sources. The small molecule databases used in virtual screening are typically processed, from simple 2D representations, to maximize their information content and to optimize them for input to the particular virtual screening technology being used. Processing interprets or adds molecular information related to connectivity, stereochemistry, protonation, tautomers and conformation. For virtual screening with an automated docking protocol, a technique that relies on specific intermolecular atom-atom contacts for ranking molecules, it is expected that the pre-processing protocol can affect the results of the docking experiment. The possible effects of processing on docking results have not been extensively studied, and this topic has only recently emerged as a significant aspect of the docking-based virtual screening process. One recent report highlights significant effects of different processing procedures on docking enrichment, while another outlines a general ligand preparation strategy. Here we survey and comment on recent practice in the field.

  • Shapes of things: computer modeling of molecular shape in drug discovery.
    Putta, Santosh and Beroza, Paul
    Current topics in medicinal chemistry, 2007, 7(15), 1514-1524
    PMID: 17897038    
    We review recent advances in computer modeling of molecular shape in drug discovery. We summarize the ways of representing shape computationally, discuss the various means of aligning molecules and shapes, consider the various ways of scoring similarity of shapes, and describe the ways in which these shapes can be used to construct molecular descriptors. Finally, we evaluate the success of these methods to date, suggest when they are best applied, and provide our recommendations for the direction of future work.

  • Ligand docking and structure-based virtual screening in drug discovery.
    Cavasotto, Claudio N and Orry, Andrew J W
    Current topics in medicinal chemistry, 2007, 7(10), 1006-1014
    PMID: 17508934    
    Ligand-docking-based methods are starting to play a critical role in lead discovery and optimization, thus resulting in new 'drug-candidates'. They offer the possibility to go beyond the pool of existing active compounds, and thus find novel chemotypes. A brief tutorial on ligand docking and structure-based virtual screening is presented highlighting current problems and limitations, together with the most recent methodological and algorithmic developments in the field. Recent successful applications of docking-based tools for hit discovery, lead optimization and target-biased library design are also presented. Special consideration is devoted to ongoing efforts to account for protein flexibility in structure-based virtual screening.

  • Evaluation of docking programs for predicting binding of Golgi alpha-mannosidase II inhibitors: a comparison with crystallography.
    Englebienne, Pablo and Fiaux, Hélène and Kuntz, Douglas A and Corbeil, Christopher R and Gerber-Lemaire, Sandrine and Rose, David R and Moitessier, Nicolas
    Proteins, 2007, 69(1), 160-176
    PMID: 17557336     doi: 10.1002/prot.21479
    Golgi alpha-mannosidase II (GMII), a zinc-dependent glycosyl hydrolase, is a promising target for drug development in anti-tumor therapies. Using X-ray crystallography, we have determined the structure of Drosophila melanogaster GMII (dGMII) complexed with three different inhibitors exhibiting IC50 values ranging from 80 to 1000 µM. These structures, along with those of seven other available dGMII/inhibitor complexes, were then used as a basis for the evaluation of seven docking programs (GOLD, Glide, FlexX, AutoDock, eHiTS, LigandFit, and FITTED). We found that small inhibitors could be accurately docked by most of the software, while docking of larger compounds (i.e., those with extended aromatic cycles or long aliphatic chains) was more problematic. Overall, Glide provided the best docking results, with the most accurately predicted binding around the active site zinc atom. Further evaluation of Glide's performance revealed its ability to extract active compounds from a benchmark library of decoys.

  • Comparative performance of several flexible docking programs and scoring functions: enrichment studies for a diverse set of pharmaceutically relevant targets.
    Zhou, Zhiyong and Felts, Anthony K and Friesner, Richard A and Levy, Ronald M
    Journal of chemical information and modeling, 2007, 47(4), 1599-1608
    PMID: 17585856     doi: 10.1021/ci7000346
    Virtual screening by molecular docking has become a widely used approach to lead discovery in the pharmaceutical industry when a high-resolution structure of the biological target of interest is available. The performance of three widely used docking programs (Glide, GOLD, and DOCK) for virtual database screening is studied when they are applied to the same protein target and ligand set. Comparisons of the docking programs and scoring functions using a large and diverse data set of pharmaceutically interesting targets and active compounds are carried out. We focus on the problem of docking and scoring flexible compounds which are sterically capable of docking into a rigid conformation of the receptor. The Glide XP methodology is shown to consistently yield enrichments superior to the two alternative methods, while GOLD outperforms DOCK on average. The study also shows that docking into multiple receptor structures can decrease the docking error in screening a diverse set of active compounds.

  • Comments on the article "On evaluating molecular-docking methods for pose prediction and enrichment factors".
    Perola, Emanuele and Walters, W Patrick and Charifson, Paul
    Journal of chemical information and modeling, 2007, 47(2), 251-253
    PMID: 17260981     doi: 10.1021/ci600460h
    The recent article "On Evaluating Molecular-Docking Methods for Pose Prediction and Enrichment Factors" (Chen H. et al. J. Chem. Inf. Model. 2006, 46, 401-415) contains a series of comments on a similar study we published in Proteins in 2004 (Perola et al. Proteins 2004, 56, 235-249). We believe that some of these comments are misleading, and we feel that an adequate response is in order.

  • Structure-based virtual ligand screening with LigandFit: Pose prediction and enrichment of compound collections
    Montes, Matthieu and Miteva, Maria A. and Villoutreix, Bruno O.
    Proteins, 2007, 68(3), 712-725
    PMID: 17510958     doi: 10.1002/prot.21405
    Virtual ligand screening methods based on the structure of the receptor are extensively used to facilitate the discovery of lead compounds. In the present study, we investigated the LigandFit package on four different proteins (coagulation factor VIIa, estrogen receptor, thymidine kinase, and neuraminidase), a relatively large compound collection of 65,560 unique "drug-like" molecules and four focused libraries (1950 molecules each). We performed virtual screening experiments with the large database and evaluated six scoring functions available in the package (DockScore, LigScore1, LigScore2, PLP1, PLP2, and PMF). The results showed that LigandFit is an efficient program, especially when used with LigScore1. Similar computations were carried out using focused libraries. In this situation the LigScore1 scoring function outperformed the other ones on three out of the four proteins tested. Even for the difficult neuraminidase case, the LigandFit/LigScore1 combination was still reasonably successful. Assessment of docking accuracy was also performed and again, we found that LigandFit (with DockScore and the CFF parameters) performed well. On the basis of these results and observed increased enrichments after LigandFit/LigScore1 screening on focused libraries, we suggest that using this program as a final step of a hierarchical protocol can be very beneficial to assist lead finding.

  • Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem.
    Truchon, Jean-Francois and Bayly, Christopher I
    Journal of chemical information and modeling, 2007, 47(2), 488-508
    PMID: 17288412     doi: 10.1021/ci600426e
    Many metrics are currently used to evaluate the performance of ranking methods in virtual screening (VS), for instance, the area under the receiver operating characteristic curve (ROC), the area under the accumulation curve (AUAC), the average rank of actives, the enrichment factor (EF), and the robust initial enhancement (RIE) proposed by Sheridan et al. In this work, we show that the ROC, the AUAC, and the average rank metrics have the same inappropriate behaviors that make them poor metrics for comparing VS methods whose purpose is to rank actives early in an ordered list (the "early recognition problem"). In doing so, we derive mathematical formulas that relate those metrics together. Moreover, we show that the EF metric is not sensitive to ranking performance before and after the cutoff. Instead, we formally generalize the ROC metric to the early recognition problem which leads us to propose a novel metric called the Boltzmann-enhanced discrimination of receiver operating characteristic that turns out to contain the discrimination power of the RIE metric but incorporates the statistical significance from ROC and its well-behaved boundaries. Finally, two major sources of errors, namely, the statistical error and the "saturation effects", are examined. This leads to practical recommendations for the number of actives, the number of inactives, and the "early recognition" importance parameter that one should use when comparing ranking methods. Although this work is applied specifically to VS, it is general and can be used to analyze any method that needs to segregate actives toward the front of a rank-ordered list.

  • Comparison of topological, shape, and docking methods in virtual screening.
    McGaughey, Georgia B and Sheridan, Robert P and Bayly, Christopher I and Culberson, J Chris and Kreatsoulas, Constantine and Lindsley, Stacey and Maiorov, Vladimir and Truchon, Jean-Francois and Cornell, Wendy D
    Journal of chemical information and modeling, 2007, 47(4), 1504-1519
    PMID: 17591764     doi: 10.1021/ci700052x
    Virtual screening benchmarking studies were carried out on 11 targets to evaluate the performance of three commonly used approaches: 2D ligand similarity (Daylight, TOPOSIM), 3D ligand similarity (SQW, ROCS), and protein structure-based docking (FLOG, FRED, Glide). Active and decoy compound sets were assembled from both the MDDR and the Merck compound databases. Averaged over multiple targets, ligand-based methods outperformed docking algorithms. This was true for 3D ligand-based methods only when chemical typing was included. Using mean enrichment factor as a performance metric, Glide appears to be the best docking method among the three with FRED a close second. Results for all virtual screening methods are database dependent and can vary greatly for particular targets.

  • Supervised scoring models with docked ligand conformations for structure-based virtual screening.
    Teramoto, Reiji and Fukunishi, Hiroaki
    Journal of chemical information and modeling, 2007, 47(5), 1858-1867
    PMID: 17685604     doi: 10.1021/ci700116z
    Protein-ligand docking programs have been used to efficiently discover novel ligands for target proteins from large-scale compound databases. However, better scoring methods are needed. Generally, scoring functions are optimized by means of various techniques that affect their fitness for reproducing X-ray structures and protein-ligand binding affinities. However, these scoring functions do not always work well for all target proteins. A scoring function should be optimized for a target protein to enhance enrichment for structure-based virtual screening. To address this problem, we propose the supervised scoring model (SSM), which takes into account the protein-ligand binding process using docked ligand conformations with supervised learning for optimizing scoring functions against a target protein. SSM employs a rough linear correlation between binding free energy and the root mean square deviation of a native ligand for predicting binding energy. We applied SSM to the FlexX scoring function, that is, F-Score, with five different target proteins: thymidine kinase (TK), estrogen receptor (ER), acetylcholine esterase (AChE), phosphodiesterase 5 (PDE5), and peroxisome proliferator-activated receptor gamma (PPARgamma). For these five proteins, SSM always enhanced enrichment better than F-Score, exhibiting superior performance that was particularly remarkable for TK, AChE, and PPARgamma. We also demonstrated that SSM is especially good at enhancing enrichments of the top ranks of screened compounds, which is useful in practical drug screening.

  • Molecular field technology applied to virtual screening and finding the bioactive conformation
    Cheeseright, Tim and Mackey, Mark and Rose, Sally and Vinter, Andy
    2007, 2(1), 131-144
    Virtual screening is being applied to reduce the high-throughput screening bottleneck in many pharmaceutical companies and to reduce compound wastage. Cresset's ligand-based virtual screening technology using molecular fields can facilitate rapid identification of novel chemotypes from biologically testing only 200 - 1000 compounds. Four molecular fields calculated using the interaction of different probe atoms with the ligand are sufficient to describe how a ligand binds to its protein. Compounds with similar fields to known active ligands are predicted to have a high probability of showing similar activity. As binding is related to field similarity, this property has been exploited further to predict the bioactive conformation of small sets of structurally diverse active ligands starting from the two-dimensional structures alone without knowledge of the target site structure.

  • Virtual screening in drug discovery - a computational perspective.
    Reddy, A Srinivas and Pati, S Priyadarshini and Kumar, P Praveen and Pradeep, H N and Sastry, G Narahari
    Current Protein & Peptide Science, 2007, 8(4), 329-351
    PMID: 17696867     doi: 10.2174/138920307781369427
    Virtual screening emerged as an important tool in our quest to access novel drug-like compounds. There is a wide range of comparable and contrasting methodological protocols available in screening databases for the lead compounds. The number of methods and software packages which employ target- and ligand-based virtual screening is increasing at a rapid pace. However, the general understanding of the applicability and limitations of these methodologies is not emerging as fast as the developments of various methods. Therefore, it is extremely important to compare and contrast various protocols with practical examples to gauge the strength and applicability of various methods. The review provides a comprehensive appraisal of several of the available virtual screening methods to date. Recent developments of the docking and similarity-based methods have been discussed besides the descriptor selection and pharmacophore-based searching. The review touches upon the application of statistical methods, graph-theory-based methods and machine learning tools in virtual screening and combinatorial library design. Finally, several case studies in which virtual screening technology has been applied successfully are examined. A critical analysis of these case studies provides a good platform to estimate the applicability of various virtual screening methods in new lead identification and optimization.

  • Virtual screening strategies in drug discovery
    McInnes, C
    Current opinion in chemical biology, 2007, 11, 494-502
    The identification of novel therapeutic targets and characterization of their 3D structures is increasing at a dramatic rate. Computational screening methods continue to be developed and improved as credible and complementary alternatives to high-throughput biochemical compound screening (HTS). While the majority of drug candidates currently being developed have been found using HTS methods, high-throughput docking and pharmacophore-based searching algorithms are gaining acceptance and becoming a major source of lead molecules in drug discovery. Refinements and optimization of high-throughput docking methods have led to improvements in reproducing experimental data and in hit rates obtained, validating their use in hit identification. In parallel with virtual screening methods, concomitant developments in cheminformatics including identification, design and manipulation of drug-like small molecule libraries have been achieved. Herein, currently used in silico screening techniques and their utility on a comparative and target-dependent basis are discussed.

  • Comparison of Shape-Matching and Docking as Virtual Screening Tools
    Hawkins, Paul C D and Skillman, A Geoffrey and Nicholls, Anthony
    Journal of medicinal chemistry, 2007, 50(1), 74-82
    doi: 10.1021/jm0603365


  • Virtual ligand screening: strategies, perspectives and limitations
    Klebe, Gerhard
    Drug discovery today, 2006, 11(13-14), 580-594
    doi: 10.1016/j.drudis.2006.05.012
    ... The expression 'virtual screening' (VS) was coined in the late 1990s; however, the techniques involved are ... In an effort to show that searching for lead candidates using a computer is a ... their binding to a macromolecular target using computer programs (in drug discovery, the term ...

  • Benchmarking Sets for Molecular Docking
    Huang, Niu and Shoichet, Brian K and Irwin, John J
    Journal of medicinal chemistry, 2006, 49(23), 6789-6801
    doi: 10.1021/jm0608356
    Ligand enrichment among top-ranking hits is a key metric of molecular docking. To avoid bias, decoys should resemble ligands physically, so that enrichment is not simply a separation of gross features, yet be chemically distinct from them, so that they are unlikely ...

  • Scaffold hopping through virtual screening using 2D and 3D similarity descriptors: ranking, voting, and consensus scoring.
    Zhang, Qiang and Muegge, Ingo
    Journal of medicinal chemistry, 2006, 49(5), 1536-1548
    PMID: 16509572     doi: 10.1021/jm050468i
    The ability to find novel bioactive scaffolds in compound similarity-based virtual screening experiments has been studied comparing Tanimoto-based, ranking-based, voting, and consensus scoring protocols. Ligand sets for seven well-known drug targets (CDK2, COX2, estrogen receptor, neuraminidase, HIV-1 protease, p38 MAP kinase, thrombin) have been assembled such that each ligand represents its own unique chemotype, thus ensuring that each similarity recognition event between ligands constitutes a scaffold hopping event. In a series of virtual screening studies involving 9969 MDDR compounds as negative controls it has been found that atom pair descriptors and 3D pharmacophore fingerprints combined with ranking, voting, and consensus scoring strategies perform well in finding novel bioactive scaffolds. In addition, often superior performance has been observed for similarity-based virtual screening compared to structure-based methods. This finding suggests that information about a target obtained from known bioactive ligands is as valuable as knowledge of the target structures for identifying novel bioactive scaffolds through virtual screening.

  • Scaffold-hopping potential of ligand-based similarity concepts.
    Renner, Steffen and Schneider, Gisbert
    ChemMedChem, 2006, 1(2), 181-185
    PMID: 16892349     doi: 10.1002/cmdc.200500005

  • Scoring functions for protein-ligand docking.
    Jain, Ajay N
    Current Protein & Peptide Science, 2006, 7(5), 407-420
    PMID: 17073693    
    Virtual screening by molecular docking has become established as a method for drug lead discovery and optimization. All docking algorithms make use of a scoring function in combination with a method of search. Two theoretical aspects of scoring function performance dominate operational performance. The first is the degree to which a scoring function has a global extremum within the ligand pose landscape at the proper location. The second is the degree to which the magnitude of the function at the extremum is accurate. Presuming adequate search strategies, a scoring function's location performance will dominate behavior with respect to docking accuracy: the degree to which a predicted pose of a ligand matches experimental observation. A scoring function's magnitude performance will dominate behavior with respect to screening utility: enrichment of true ligands over non-ligands. Magnitude estimation also controls pure scoring accuracy: the degree to which bona fide ligands of a particular protein may be correctly ranked. Approaches to the development of scoring functions have varied widely, with a number of functions yielding similarly high levels of performance relating to the location issue. However, even among functions performing equally well on location, widely varying performance is observed on the question of magnitude. In many cases, performance is good enough to yield high enrichments of true ligands versus non-ligands in screening across a wide variety of protein types. Generally, performance is not good enough to correctly rank among true ligands. Strategies for improvement are discussed.

  • GFscore: a general nonlinear consensus scoring function for high-throughput docking.
    Betzi, Stéphane and Suhre, Karsten and Chétrit, Bernard and Guerlesquin, Françoise and Morelli, Xavier
    Journal of chemical information and modeling, 2006, 46(4), 1704-1712
    PMID: 16859302     doi: 10.1021/ci0600758
    Most recently published work in the field of docking and scoring protein/ligand complexes has focused on ranking true positives resulting from a Virtual Library Screening (VLS) through the use of a specified or consensus linear scoring function. In this work, we present a methodology to speed up the High Throughput Screening (HTS) process, either by allowing focused screens or by triaging hitlists when a prohibitively large number of hits is identified in the primary screen; to this end, we have extended the principle of consensus scoring in a nonlinear, neural network manner. This led us to introduce a nonlinear Generalist scoring Function, GFscore, which was trained to discriminate true positives from false positives in a data set of diverse chemical compounds. This new Generalist scoring Function combines the five scoring functions found in the CScore package from Tripos Inc. GFscore eliminates up to 75% of molecules, with a confidence rate of 90%. The final result is a hit enrichment in the list of molecules to investigate during a research campaign for biologically active compounds, where the remaining 25% of molecules would be sent to in vitro screening experiments. GFscore is therefore a powerful tool for the biologist, saving both time and money.

  • Screening drug-like compounds by docking to homology models: a systematic study.
    Kairys, Visvaldas and Fernandes, Miguel X and Gilson, Michael K
    Journal of chemical information and modeling, 2006, 46(1), 365-379
    PMID: 16426071     doi: 10.1021/ci050238c
    In the absence of an experimentally solved structure, a homology model of a protein target can be used instead for virtual screening of drug candidates by docking and scoring. This approach poses a number of questions regarding the choice of the template to use in constructing the model, the accuracy of the screening results, and the importance of allowing for protein flexibility. The present study addresses such questions with compound screening calculations for multiple homology models of five drug targets. A central result is that docking to homology models frequently yields enrichments of known ligands as good as that obtained by docking to a crystal structure of the actual target protein. Interestingly, however, standard measures of the similarity of the template used to build the homology model to the targeted protein show little correlation with the effectiveness of the screening calculations, and docking to the template itself often is as successful as docking to the corresponding homology model. Treating key side chains as mobile produces a modest improvement in the results. The reasons for these sometimes unexpected results, and their implications for future methodologic development, are discussed.

  • Molecular descriptors and methods for ligand based virtual high throughput screening in drug discovery.
    Pozzan, Alfonso
    Current pharmaceutical design, 2006, 12(17), 2099-2110
    PMID: 16796558    
    The aim of virtual high throughput screening is the identification of biologically relevant molecules amongst either tangible or virtual (large) collections of compounds. Amongst the various virtual screening approaches, ligand-based ones are becoming very popular because they make it possible to screen millions of molecules in a timely way. Descriptors and methods are briefly introduced and reviewed, with more emphasis on approaches that are based on fingerprint descriptors and that seem to be more widely used during the drug discovery process.

  • Novel 2D fingerprints for ligand-based virtual screening.
    Ewing, Todd and Baber, J Christian and Feher, Miklos
    Journal of chemical information and modeling, 2006, 46(6), 2423-2431
    PMID: 17125184     doi: 10.1021/ci060155b
    This paper describes the development of a set of new 2D fingerprints for the purposes of virtual screening in a pharmaceutical environment. The new fingerprints are based on established ones: the changes in their design included the introduction of overlapping pharmacophore feature types, feature counts for pharmacophore and structural fingerprints, as well as changes in the resolution in property description for property fingerprints. The effects of each of these changes on virtual screening performance were monitored using two types of training sets, emulating different stages in the drug discovery process. The results demonstrate that these changes all lead to an improvement in virtual screening performance.

  • TarFisDock: a web server for identifying drug targets with docking approach
    Li, Honglin and Gao, Zhenting and Kang, Ling and Zhang, Hailei and Yang, Kun and Yu, Kunqian and Luo, Xiaomin and Zhu, Weiliang and Chen, Kaixian and Shen, Jianhua and Wang, Xicheng and Jiang, Hualiang
    Nucleic acids research, 2006, 34(Web Server issue), W219-W224
    PMID: 16844997     doi: 10.1093/nar/gkl114
    TarFisDock is a web-based tool for automating the procedure of searching for small molecule-protein interactions over a large repertoire of protein structures. It offers PDTD (potential drug target database), a target database containing 698 protein structures covering 15 therapeutic areas, together with a reverse ligand-protein docking program. In contrast to conventional ligand-protein docking, reverse ligand-protein docking aims to find potential protein targets by screening an appropriate protein database. The input file of this web server is the small molecule to be tested, in standard mol2 format; TarFisDock then searches for possible binding proteins for the given small molecule by use of a docking approach. The ligand-protein interaction energy terms of the program DOCK are adopted for ranking the proteins. To test the reliability of the TarFisDock server, we searched the PDTD for putative binding proteins for vitamin E and 4H-tamoxifen. The top 2% and 10% of vitamin E binding-protein candidates identified by TarFisDock cover 30% and 50%, respectively, of the reported targets verified or implicated by experiments; and 30% and 50% of the experimentally confirmed targets for 4H-tamoxifen appear among the top 2% and 5% of the TarFisDock predicted candidates, respectively. Therefore, TarFisDock may be a useful tool for target identification, mechanism study of old drugs and probes discovered from natural products. TarFisDock and PDTD are available at

  • Multiple target screening method for robust and accurate in silico ligand screening.
    Fukunishi, Yoshifumi and Mikami, Yoshiaki and Kubota, Satoru and Nakamura, Haruki
    Journal of molecular graphics & modelling, 2006, 25(1), 61-70
    PMID: 16376595     doi: 10.1016/j.jmgm.2005.11.006
    We developed a new in silico multiple target screening (MTS) method, based on multi-receptor versus multi-ligand docking affinity matrices, and examined its robustness against changes in the scoring system. In this method, compounds in a database are docked to multiple proteins. Among these compounds, those that are likely to bind to the target protein are selected as members of the candidate-hit compound group. The compounds in the group are then sorted in descending order using the docking score: the first (n-th) compound is expected to be the most (n-th) probable hit compound. This method was applied to the analysis of a set of 142 receptors and 142 compounds using a receptor-ligand docking program, Sievgene [Y. Fukunishi, Y. Mikami, H. Nakamura, Similarities among receptor pockets and among compounds: analysis and application to in silico ligand screening, J. Mol. Graphics Modelling, 24 (2005) 34-45], and the results demonstrated that this method achieves a high hit ratio compared to uniform sampling. We prepared two new scores: the DeltaG score, designed to reproduce the protein-ligand binding free energy, and the hit-optimized score, designed to maximize the hit ratio of in silico screening. With the Sievgene docking score, the DeltaG score, and the hit-optimized score alike, the MTS method is more robust than the multiple active-site correction scoring method [G.P.A. Vigers, J.P. Rizzi, Multiple active site corrections for docking and virtual screening, J. Med. Chem., 47 (2004) 80-89].

  • Critical assessment of the automated AutoDock as a new docking tool for virtual screening.
    Park, Hwangseo and Lee, Jinuk and Lee, Sangyoub
    Proteins, 2006, 65(3), 549-554
    PMID: 16988956     doi: 10.1002/prot.21183
    A major problem in virtual screening concerns the accuracy of the binding free energy between a target protein and a putative ligand. Here we report an example supporting the superior performance of the AutoDock scoring function in virtual screening compared with other popular docking programs. The original AutoDock program is by itself inefficient for virtual screening because the grids of interaction energy have to be calculated for each putative ligand in the chemical database. However, automating the AutoDock program with potential grids defined in common for all putative ligands leads to a more than twofold increase in the speed of virtual database screening. The utility of the automated AutoDock in virtual screening is further demonstrated by identifying the actual inhibitors of various target enzymes in chemical databases with higher accuracy than other docking tools, including DOCK and FlexX. These results exemplify the usefulness of the automated AutoDock as a promising new tool in structure-based virtual screening.

  • Similarity Based Virtual Screening: A Tool for Targeted Library Design
    Alvesalo, Joni K O and Siiskonen, Antti and Vainio, Mikko J and Tammela, Päivi S M and Vuorela, Pia M
    Journal of medicinal chemistry, 2006, 49(7), 2353-2356
    doi: 10.1021/jm051209w
    ... to create a comparative model for the C. pneumoniae target protein, since we wanted to test whether high level of similarity in the ... active yet structurally different molecules as we did, unless the ligands act on the same target protein in the cell-based screening assay. ... Docking. ...

  • Similarity-based virtual screening using 2D fingerprints
    Willett, Peter
    Drug discovery today, 2006, 11(23-24), 1046-1053
    PMID: 17129822     doi: 10.1016/j.drudis.2006.10.005
    ... screening system: the popular structure-based approaches, such as docking and de ... Examples of ligand-based approaches include: pharmacophore methods, which involve the identification ... containing known active and known inactive molecules; and the similarity methods that ...

  • Virtual Screening: Are We There Yet?
    Jalaie, M
    Mini Reviews in Medicinal Chemistry, 2006, 6(10), 1159-1167
    The cost of pharmaceutical development has increased dramatically in recent years, and many assorted approaches have been developed to decrease both the time and costs associated with bringing a drug to the market. Among these methods is the use of in silico screening of compound databases for potential new lead compounds, commonly referred to as virtual screening (VS). Virtual screening has become an integral part of the early discovery process in pharmaceutical development, readily observed by the large number of methodologies that have been published to date. Other reviews have been published detailing the various types of virtual screening methods in use. This work will review some of the virtual screening approaches and strategies that have been attempted to identify compounds to launch medicinal chemistry campaigns. Understanding trends and drivers in VS should help to set expectations about how and when VS could be used and what it can and cannot deliver and how it can be integrated in a successful screening campaign and used in a complementary fashion to HTS.

  • sc-PDB: an annotated database of druggable binding sites from the Protein Data Bank.
    Kellenberger, Esther and Muller, Pascal and Schalon, Claire and Bret, Guillaume and Foata, Nicolas and Rognan, Didier
    Journal of chemical information and modeling, 2006, 46(2), 717-727
    PMID: 16563002     doi: 10.1021/ci050372x
    The sc-PDB is a collection of 6,415 three-dimensional structures of binding sites found in the Protein Data Bank (PDB). Binding sites were extracted from all high-resolution crystal structures in which a complex between a protein cavity and a small-molecular-weight ligand could be identified. Importantly, ligands are considered from a pharmacological and not a structural point of view. Therefore, solvents, detergents, and most metal ions are not stored in the sc-PDB. Ligands are classified into four main categories: nucleotides (< 4-mer), peptides (< 9-mer), cofactors, and organic compounds. The corresponding binding site is formed by all protein residues (including amino acids, cofactors, and important metal ions) with at least one atom within 6.5 angstroms of any ligand atom. The database was carefully annotated by browsing several protein databases (PDB, UniProt, and GO) and storing, for every sc-PDB entry, the following features: protein name, function, source, domain and mutations, ligand name, and structure. The repository of ligands has also been archived by diversity analysis of molecular scaffolds, and several chemoinformatics descriptors were computed to better understand the chemical space covered by stored ligands. The sc-PDB may be used for several purposes: (i) screening a collection of binding sites for predicting the most likely target(s) of any ligand, (ii) analyzing the molecular similarity between different cavities, and (iii) deriving rules that describe the relationship between ligand pharmacophoric points and active-site properties. The database is periodically updated and accessible on the web at
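    The binding-site definition above (every residue with at least one atom within 6.5 Å of any ligand atom) reduces to a distance cutoff. The following is a minimal Python sketch of that rule, assuming a hypothetical input of (residue ID, coordinate) pairs for the protein and plain coordinates for the ligand; it is not the sc-PDB extraction code, and parsing PDB files, excluding solvents, and handling cofactors or metal ions are left out.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


def binding_site_residues(
    protein_atoms: Iterable[Tuple[str, Point]],
    ligand_atoms: Iterable[Point],
    cutoff: float = 6.5,
) -> List[str]:
    """Return IDs of residues with at least one atom within `cutoff` angstroms of any ligand atom.

    protein_atoms: hypothetical (residue_id, (x, y, z)) pairs, one per protein atom.
    ligand_atoms: (x, y, z) coordinates of the ligand atoms.
    """
    ligand = list(ligand_atoms)
    site = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in site:
            continue  # residue already selected; its other atoms need not be checked
        if any(math.dist(xyz, atom) <= cutoff for atom in ligand):
            site.add(residue_id)
    return sorted(site)


# Toy example with made-up coordinates (angstroms).
protein = [
    ("ALA1", (0.0, 0.0, 0.0)),
    ("GLY2", (10.0, 0.0, 0.0)),
    ("SER3", (5.0, 3.0, 0.0)),
]
ligand = [(1.0, 1.0, 0.0)]
print(binding_site_residues(protein, ligand))  # ['ALA1', 'SER3']
```

    The all-pairs distance check keeps the sketch short; for processing many PDB entries, a spatial index such as a k-d tree would likely be much faster.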


  • Validation and use of the MM-PBSA approach for drug discovery.
    Kuhn, Bernd and Gerber, Paul and Schulz-Gasch, Tanja and Stahl, Martin
    Journal of medicinal chemistry, 2005, 48(12), 4040-4048
    PMID: 15943477     doi: 10.1021/jm049081q
    The MM-PBSA approach has become a popular method for calculating binding affinities of biomolecular complexes. Published application examples focus on small test sets and few proteins and, hence, are of limited relevance in assessing the general validity of this method. To further characterize MM-PBSA, we report on a more extensive study involving a large number of ligands and eight different proteins. Our results show that applying the MM-PBSA energy function to a single, relaxed complex structure is an adequate and sometimes more accurate approach than the standard free energy averaging over molecular dynamics snapshots. The use of MM-PBSA on a single structure is shown to be valuable (a) as a postdocking filter in further enriching virtual screening results, (b) as a helpful tool to prioritize de novo design solutions, and (c) for distinguishing between good and weak binders (ΔpIC50 > or ...

  • Comparison of automated docking programs as virtual screening tools.
    Cummings, Maxwell D and Desjarlais, Renee L and Gibbs, Alan C and Mohan, Venkatraman and Jaeger, Edward P
    Journal of medicinal chemistry, 2005, 48(4), 962-976
    PMID: 15715466     doi: 10.1021/jm049798d
    The performance of several commercially available docking programs is compared in the context of virtual screening. Five different protein targets are used, each with several known ligands. The simulated screening deck comprised 1000 molecules from a cleansed version of the MDL drug data report and 49 known ligands. For many of the known ligands, crystal structures of the relevant protein-ligand complexes were available. We attempted to run experiments with each docking method that were as similar as possible. For a given docking method, hit rates were improved versus what would be expected for random selection for most protein targets. However, the ability to prioritize known ligands on the basis of docking poses that resemble known crystal structures is both method- and target-dependent.

  • Evaluation of library ranking efficacy in virtual screening
    Kontoyianni, M and Sokol, GS and McClellan, LM
    Journal of computational chemistry, 2005, 26(1), 11-22
    PMID: 15526325     doi: 10.1002/jcc.20141
    We present the results of a comprehensive study in which we explored how the docking procedure affects the performance of a virtual screening approach. We used four docking engines and applied 10 scoring functions to the top-ranked docking solutions of seeded databases against six target proteins. The scores of the experimental poses were placed within the total set to assess whether the scoring function required an accurate pose to provide the appropriate rank for the seeded compounds. This method allows a direct comparison of library ranking efficacy. Our results indicate that the LigandFit/Ligscore1 and LigandFit/GOLD docking/scoring combinations, and to a lesser degree FlexX/FlexX, Glide/Ligscore1, DOCK/PMF (Tripos implementation), LigandFit1/Ligscore2 and LigandFit/PMF (Tripos implementation), were able to retrieve the highest number of actives at a 10% fraction of the database when all targets were considered collectively. We also show that the scoring functions rank the observed binding modes higher than the inaccurate poses, provided that the experimental poses are available. This finding stresses the discriminatory ability of the scoring algorithms when better poses are available, and suggests that the number of false positives can be lowered with conformers closer to bioactive ones.

  • ZINC-a free database of commercially available compounds for virtual screening.
    Irwin, John J and Shoichet, Brian K
    Journal of chemical information and modeling, 2005, 45(1), 177-182
    PMID: 15667143     doi: 10.1021/ci049714+
    A critical barrier to entry into structure-based virtual screening is the lack of a suitable, easy to access database of purchasable compounds. We have therefore prepared a library of 727,842 molecules, each with 3D structure, using catalogs of compounds from vendors (the size of this library continues to grow). The molecules have been assigned biologically relevant protonation states and are annotated with properties such as molecular weight, calculated LogP, and number of rotatable bonds. Each molecule in the library contains vendor and purchasing information and is ready for docking using a number of popular docking programs. Within certain limits, the molecules are prepared in multiple protonation states and multiple tautomeric forms. In one format, multiple conformations are available for the molecules. This database is available for free download ( in several common file formats including SMILES, mol2, 3D SDF, and DOCK flexibase format. A Web-based query tool incorporating a molecular drawing interface enables the database to be searched and browsed and subsets to be created. Users can process their own molecules by uploading them to a server. Our hope is that this database will bring virtual screening libraries to a wide community of structural biologists and medicinal chemists.


  • Docking and scoring in virtual screening for drug discovery: methods and applications.
    Kitchen, Douglas B and Decornez, Hélène and Furr, John R and Bajorath, Jürgen
    Nature reviews. Drug discovery, 2004, 3(11), 935-949
    PMID: 15520816     doi: 10.1038/nrd1549
    Computational approaches that 'dock' small molecules into the structures of macromolecular targets and 'score' their potential complementarity to binding sites are widely used in hit identification and lead optimization. Indeed, there are now a number of drugs whose development was heavily influenced by or based on structure-based design and screening strategies, such as HIV protease inhibitors. Nevertheless, there remain significant challenges in the application of these approaches, in particular in relation to current scoring schemes. Here, we review key concepts and specific features of small-molecule-protein docking methods, highlight selected applications and discuss recent advances that aim to address the acknowledged limitations of established approaches.

  • Virtual screening of chemical libraries
    Shoichet, Brian K
    Nature, 2004, 432(7019), 862-865
    PMID: 15602552     doi: 10.1038/nature03197
    Virtual screening uses computer-based methods to discover new ligands on the basis of biological structures. Although widely heralded in the 1970s and 1980s, the technique has since struggled to meet its initial promise, and drug discovery remains dominated by ...

  • Recovering the true targets of specific ligands by virtual screening of the protein data bank.
    Paul, Nicodéme and Kellenberger, Esther and Bret, Guillaume and Muller, Pascal and Rognan, Didier
    Proteins, 2004, 54(4), 671-680
    PMID: 14997563     doi: 10.1002/prot.10625
    The Protein Data Bank (PDB) has been processed to extract a screening protein library (sc-PDB) of 2148 entries. A knowledge-based detection algorithm has been applied to 18,000 PDB files to find regular expressions corresponding to protein, ion, cofactor, solvent, or ligand atoms. The sc-PDB database comprises high-resolution X-ray structures of proteins for which (i) a well-defined active site exists and (ii) the bound ligand is a small-molecular-weight molecule. The database has been screened by an inverse docking tool derived from the GOLD program to recover the known targets of four unrelated ligands. Both the database and the inverse screening procedure are accurate enough to rank the true target of each of the four investigated ligands among the top 1% of scorers, with 70- to 100-fold enrichment with respect to random screening. Applying the proposed screening procedure to a small, generic ligand was much less accurate, suggesting that inverse screening should be reserved for fairly selective compounds.

  • OptiDock: virtual HTS of combinatorial libraries by efficient sampling of binding modes in product space.
    Sprous, Dennis G and Lowis, David R and Leonard, Joseph M and Heritage, Trevor and Burkett, Steven N and Baker, David S and Clark, Robert D
    Journal of combinatorial chemistry, 2004, 6(4), 530-539
    PMID: 15244414     doi: 10.1021/cc034068x
    Products from combinatorial libraries generally share a common core structure that can be exploited to improve the efficiency of virtual high-throughput screening (vHTS). In general, it is more efficient to find a method that scales with the total number of reagents (Sigma growth) rather than with the number of products (Pi growth). The OptiDock methodology described herein entails selecting a diverse but representative subset of compounds that span the structural space encompassed by the full library. These compounds are docked individually using the FlexX program (Rarey, M.; Kramer, B.; Lengauer, T.; Klebe, G. J. Mol. Biol. 1995, 251, 470-489) to define distinct docking modes in terms of reference placements for combinatorial core atoms. Thereafter, substituents in R-cores (consisting of the core structure substituted at a single variation site) are docked, keeping the core atoms fixed at the coordinates dictated by each reference placement. Interaction energies are calculated for each docked R-core with respect to the target protein, and energies for whole compounds are calculated by finding the reference core placement for which the sum of corresponding R-core energies is most negative. The use of diverse whole compounds to define binding modes is a key advantage of the protocol over other combinatorial docking programs. As a result, OptiDock returns better-scoring conformers than does serially applied FlexX. OptiDock is also better able to find a viable docked pose for each library member than are other combinatorial approaches.
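    The Sigma versus Pi contrast can be made concrete with a little arithmetic. The sketch below, for a hypothetical library with 100, 200 and 50 reagents at three variation sites, compares the number of enumerated products with the number of R-cores that would be docked per reference core placement if each site is handled independently; the cost of docking the representative whole compounds is not counted, and the function names are illustrative only.

```python
from math import prod
from typing import Sequence


def enumerated_products(reagents_per_site: Sequence[int]) -> int:
    """Full-library size: one product per reagent combination (Pi growth)."""
    return prod(reagents_per_site)


def r_cores_to_dock(reagents_per_site: Sequence[int]) -> int:
    """R-cores docked per reference placement when sites are treated independently (Sigma growth)."""
    return sum(reagents_per_site)


# Hypothetical library with three variation sites.
sites = [100, 200, 50]
print(enumerated_products(sites))  # 1000000
print(r_cores_to_dock(sites))      # 350
```

    Whole-compound energies are then assembled by summing R-core energies, so adding a reagent at one site adds one docking run per reference placement instead of multiplying the library size.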

  • HierVLS hierarchical docking protocol for virtual ligand screening of large-molecule databases
    Floriano, WB and Vaidehi, N and Zamanakos, G and Goddard, WA
    Journal of medicinal chemistry, 2004, 47(1), 56-71
    PMID: 14695820     doi: 10.1021/jm030271v
    To provide practical means for rapidly scanning the extensive experimental combinatorial chemistry libraries now available for high-throughput screening (HTS), it is essential to establish computational virtual ligand screening (VLS) techniques that rapidly identify, out of a large library, all active compounds against a particular protein target. Toward this goal we developed HierVLS, a fast hierarchical docking approach that starts with a coarse-grain conformational search over a large number of configurations filtered with a fast but crude energy function, followed by a succession of finer-grain levels that use successively more accurate but more expensive descriptions of the ligand-protein-solvent interactions to filter successively fewer cases. The final step of this procedure optimizes one configuration of the ligand in the protein site using our most accurate energy expression and description of the solvent, which would be impractical for all conformations and sites sampled at the coarse level. HierVLS is based on the HierDock approach, but rather than allowing an hour or more to determine the best binding site and energy for each ligand (as in HierDock), we have adapted our procedure so that it leads to reliable results using only 4 min per ligand (866 MHz Pentium III processor). To validate the accuracy of HierVLS in predicting the experimentally observed binding conformation, we considered 37 cocrystal structures comprising 11 target proteins. We find that HierVLS identifies the correct binding mode for all 37 cocrystals. In addition, the calculated binding energies correlate well with available experimental binding constants. To validate how well HierVLS can identify the correct ligand in an extensive library of decoys, we considered a library of over 10,000 molecules. HierVLS ranks the correct ligand in the top 2% by binding affinity among the 10,037 molecules in 26 of the 37 cases. The failures result from either metal-containing sites on the protein or water-mediated ligand-protein interactions, which we anticipate can be solved within the constraints of practical VLS. We then applied HierVLS to screen a 55,000-compound virtual library against the target protein-tyrosine phosphatase 1B (PTP1B). The top 250 compounds by binding affinity included all six PTP1B cocrystal ligands added to the library plus three other experimentally confirmed binders. The best (top 1) binder is an experimentally confirmed positive. We conclude that HierVLS is useful for selecting leads for a particular target out of large combinatorial databases.
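    The coarse-to-fine idea behind HierVLS can be sketched as a generic screening funnel. In this Python sketch the stages and their scoring functions are hypothetical placeholders (cheap functions first, expensive ones last); HierVLS's actual conformational search, energy expressions and solvent models are not reproduced. Each stage scores only the survivors of the previous one, so the expensive functions are evaluated on far fewer candidates.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def hierarchical_filter(
    candidates: Iterable[T],
    stages: Sequence[Tuple[Callable[[T], float], int]],
) -> List[T]:
    """Funnel candidates through increasingly expensive scoring stages.

    stages: (score_fn, keep) pairs, cheapest first. At each stage every surviving
    candidate is scored with score_fn (lower = better) and only the `keep` best
    are passed on to the next stage.
    """
    survivors = list(candidates)
    for score_fn, keep in stages:
        survivors = sorted(survivors, key=score_fn)[:keep]
    return survivors


# Hypothetical placeholder "energies" that count how often they are called.
calls = {"coarse": 0, "fine": 0}


def coarse_energy(x: int) -> float:
    calls["coarse"] += 1
    return x % 5


def fine_energy(x: int) -> float:
    calls["fine"] += 1
    return -x


best = hierarchical_filter(range(10), [(coarse_energy, 4), (fine_energy, 1)])
print(best, calls)  # [6] {'coarse': 10, 'fine': 4}
```

    The call counts show the design choice: the cheap function sees all 10 candidates, while the expensive one is evaluated on only the 4 that survive the first stage.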

  • Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures.
    Hert, Jérôme and Willett, Peter and Wilton, David J and Acklin, Pierre and Azzaoui, Kamal and Jacoby, Edgar and Schuffenhauer, Ansgar
    Organic & biomolecular chemistry, 2004, 2(22), 3256-3266
    PMID: 15534703     doi: 10.1039/B409865J
    This paper reports a detailed comparison of a range of different types of 2D fingerprints when used for similarity-based virtual screening with multiple reference structures. Experiments with the MDL Drug Data Report database demonstrate the effectiveness of fingerprints that encode circular substructure descriptors generated using the Morgan algorithm. These fingerprints are notably more effective than fingerprints based on a fragment dictionary, on hashing and on topological pharmacophores. The combination of these fingerprints with data fusion based on similarity scores provides both an effective and an efficient approach to virtual screening in lead-discovery programmes.
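    Similarity searching with multiple reference structures combines a per-pair similarity coefficient with a fusion rule. A minimal Python sketch follows, assuming fingerprints are already available as sets of on-bit indices (generating Morgan-type circular fingerprints is not shown) and using the Tanimoto coefficient with two common fusion rules, MAX and SUM; the abstract does not say which fusion rule the paper preferred, so both are offered as options, and the fingerprints below are made up.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def fused_similarity(candidate: Set[int], references: List[Set[int]], rule: str = "max") -> float:
    """Combine a candidate's similarities to several reference actives into one score."""
    scores = [tanimoto(candidate, ref) for ref in references]
    if rule == "max":
        return max(scores)  # similarity to the nearest reference
    if rule == "sum":
        return sum(scores)  # evidence accumulated over all references
    raise ValueError(f"unknown fusion rule: {rule!r}")


def rank_database(database: Dict[str, Set[int]], references: List[Set[int]], rule: str = "max") -> List[str]:
    """Database IDs ordered from most to least similar under the chosen fusion rule."""
    return sorted(
        database,
        key=lambda cid: fused_similarity(database[cid], references, rule),
        reverse=True,
    )


references = [{1, 2, 3, 4}, {5, 6, 7, 8}]
database = {"a": {1, 2, 3}, "b": {1, 5}, "c": {9, 10}}
print(rank_database(database, references))  # ['a', 'b', 'c']
```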

  • Fuzzy pharmacophore models from molecular alignments for correlation-vector-based virtual screening.
    Renner, Steffen and Schneider, Gisbert
    Journal of medicinal chemistry, 2004, 47(19), 4653-4664
    PMID: 15341481     doi: 10.1021/jm031139y
    A pharmacophore-based approach for compiling focused screening libraries is presented. It integrates information from three-dimensional molecular alignments into correlation vector-based database screening. The pharmacophore model is represented by a number of spheres of Gaussian-distributed feature densities. Different degrees of "fuzziness" can be introduced to influence the model's resolution. Transformation of this pharmacophore representation into a correlation vector results in a vector of feature probabilities which can be utilized for rapid virtual screening of compound databases or virtual libraries. The approach was validated by retrospective screening for cyclooxygenase 2 (COX-2) and thrombin ligands. A variety of models with different degrees of fuzziness were calculated and tested for both classes of molecules. Best performance was obtained with pharmacophore models reflecting an intermediate degree of fuzziness, yielding an enrichment factor of up to 39 for the first 1% of the ranked database. Appropriately weighted fuzzy pharmacophore models performed better in retrospective screening than similarity searching using only a single query molecule. The new pharmacophore method was shown to complement existing approaches.
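    The enrichment factor quoted above (up to 39 for the first 1% of the ranked database) compares the active rate in the top slice of a ranking with the active rate in the whole database. The Python sketch below uses that common definition; how the paper handled ties or rounded the slice size is not given in the abstract, so those details are assumptions.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], fraction: float = 0.01) -> float:
    """Enrichment factor at the top `fraction` of a ranked database.

    ranked_is_active: one flag per compound, best-scored compound first.
    EF = (actives in top slice / compounds in top slice) / (all actives / all compounds)
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty ranking that contains at least one active")
    # round() avoids float artifacts (e.g. 0.07 * 100 is slightly above 7)
    n_top = max(1, round(fraction * n_total))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# 1,000 compounds with 10 actives, 4 of which land in the top 10 (the first 1%).
ranking = [True] * 4 + [False] * 6 + [True] * 6 + [False] * 984
print(enrichment_factor(ranking, 0.01))  # 40.0
```

    In this example the ceiling at 1% is 100, reached only if all 10 compounds in the top slice are active.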

  • Evaluation and application of multiple scoring functions for a virtual screening experiment.
    Xing, Li and Hodgkin, Edward and Liu, Qian and Sedlock, David
    Journal of computer-aided molecular design, 2004, 18(5), 333-344
    PMID: 15595460    
    In order to identify novel chemical classes of factor Xa inhibitors, five scoring functions (FlexX, DOCK, GOLD, ChemScore and PMF) were used to evaluate the multiple docking poses generated by FlexX. The compound collection was composed of confirmed potent factor Xa inhibitors and a subset of the LeadQuest screening compound library. Except for PMF, the other four scoring functions succeeded in reproducing the crystal complex (PDB code: 1FAX). During virtual screening the highest hit rate (80%) was demonstrated by FlexX at an energy cutoff of -40 kJ/mol, which is about 40-fold over random screening (2.06%). Limited results suggest that presenting more poses of a single molecule to the scoring functions could degrade their enrichment factors. A series of promising scaffolds with favorable binding scores was retrieved from LeadQuest. Consensus scoring by pair-wise intersection failed to enrich the hit rate yielded by single scorings (i.e. FlexX). We note that reported successes of consensus scoring in hit rate enrichment could be artificial because their comparisons were based on a selected subset of single scoring and a markedly reduced subset of double or triple scoring. The findings presented in this report are based upon a single biological system and warrant further studies.

  • A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance.
    Perola, Emanuele and Walters, W Patrick and Charifson, Paul S
    Proteins, 2004, 56(2), 235-249
    PMID: 15211508     doi: 10.1002/prot.20088
    A thorough evaluation of some of the most advanced docking and scoring methods currently available is described, and guidelines for the choice of an appropriate protocol for docking and virtual screening are defined. The generation of a large and highly curated test set of pharmaceutically relevant protein-ligand complexes with known binding affinities is described, and three highly regarded docking programs (Glide, GOLD, and ICM) are evaluated on the same set with respect to their ability to reproduce crystallographic binding orientations. Glide correctly identified the crystallographic pose within 2.0 Å in 61% of the cases, versus 48% for GOLD and 45% for ICM. In general, Glide appears to perform most consistently with respect to diversity of binding sites and ligand flexibility, while the performance of ICM and GOLD is more binding site-dependent and is significantly poorer when binding is predominantly driven by hydrophobic interactions. The results also show that energy minimization and reranking of the top N poses can be an effective means to overcome some of the limitations of a given docking function. The same docking programs are evaluated in conjunction with three different scoring functions for their ability to discriminate actives from inactives in virtual screening. The evaluation, performed on three different systems (HIV-1 protease, IMPDH, and p38 MAP kinase), confirms that the relative performance of different docking and scoring methods is to some extent binding site-dependent. GlideScore appears to be an effective scoring function for database screening, with consistent performance across several types of binding sites, while ChemScore appears to be most useful in sterically demanding sites since it is more forgiving of repulsive interactions. Energy minimization of docked poses can significantly improve the enrichments in systems with sterically demanding binding sites. Overall, Glide appears to be a safe general choice for docking, while the choice of the best scoring tool remains to a larger extent system-dependent and should be evaluated on a case-by-case basis.

  • Multiple active site corrections for docking and virtual screening.
    Vigers, Guy P A and Rizzi, James P
    Journal of medicinal chemistry, 2004, 47(1), 80-89
    PMID: 14695822     doi: 10.1021/jm030161o
    Several docking programs are now available that can reproduce the bound conformation of a ligand in an active site, for a wide variety of experimentally determined complexes. However, these programs generally perform less well at ranking multiple possible ligands in one site. Since accurate identification of potential ligands is a prerequisite for many aspects of structure-based drug design, this is a serious limitation. We have tested the ability of two docking programs, FlexX and Gold, to match ligands and active sites for multiple complexes. We show that none of the docking scores from either program is able to consistently match ligands and active sites in our tests. We propose a simple statistical correction, the multiple active site correction (MASC), which greatly ameliorates this problem. We have also tested the correction method against an extended set of 63 cocrystals and in a virtual screening experiment. In all cases, MASC significantly improves the results of the docking experiments.

  • FlexX-Scan: Fast, structure-based virtual screening
    Schellhammer, I and Rarey, M
    Proteins, 2004, 57(3), 504-517
    PMID: 15382244     doi: 10.1002/prot.20217
    We present a new software module, FlexX-Scan, for high-throughput, structure-based virtual screening. FlexX-Scan was developed with the aim of further speeding up the virtual screening process. Based on the incremental construction docking tool FlexX (Rarey et al., J Mol Biol 1996;261: 470-489), a compact descriptor for representing favorable protein interaction spots within the protein binding site has been developed. The descriptor is calculated using special-purpose clustering techniques applied to the usual interaction points created by FlexX. The algorithm automatically detects a small set of interaction spots in the binding site for positioning ligand functional groups. The parametrizations of the base placement and incremental construction algorithms have been adapted to the new interaction model. We tested the software tool on a diverse set of 200 protein-ligand complexes from the Protein Data Bank (PDB) (Kramer et al., Proteins 1999;37:228-241). On average, the algorithm proposes about 90 interaction spots per binding site compared to about 1000 interaction dots in FlexX. We observe that the docking solutions of FlexX-Scan have a root-mean-square deviation from the crystal structure similar to the deviation of docking solutions of standard FlexX. For further validation we also performed virtual screening experiments for cyclin-dependent kinase 2, thrombin, angiotensin-converting enzyme, and dihydrofolate reductase. In these experiments, we screened a set of 34,000 random compounds and a number of known actives for each target. With FlexX-Scan, we achieved comparable enrichments to standard FlexX, with an average computing time of 5-10 s per compound, depending on parametrization.

  • Glide: A new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening
    Halgren, TA and Murphy, RB and Friesner, RA and Beard, HS and Frye, LL and Pollard, WT and Banks, JL
    Journal of medicinal chemistry, 2004, 47(7), 1750-1759
    PMID: 15027866     doi: 10.1021/jm030644s
    Glide's ability to identify active compounds in a database screen is characterized by applying Glide to a diverse set of nine protein receptors. In many cases, two, or even three, protein sites are employed to probe the sensitivity of the results to the site geometry. To make the database screens as realistic as possible, the screens use sets of "druglike" decoy ligands that have been selected to be representative of what we believe is likely to be found in the compound collection of a pharmaceutical or biotechnology company. Results are presented for releases 1.8, 2.0, and 2.5 of Glide. The comparisons show that average measures for both "early" and "global" enrichment for Glide 2.5 are 3 times higher than for Glide 1.8 and more than 2 times higher than for Glide 2.0 because of better results for the least well-handled screens. This improvement in enrichment stems largely from the better balance of the more widely parametrized GlideScore 2.5 function and the inclusion of terms that penalize ligand-protein interactions that violate established principles of physical chemistry, particularly as it concerns the exposure to solvent of charged protein and ligand groups. Comparisons to results for the thymidine kinase and estrogen receptors published by Rognan and co-workers (J. Med. Chem. 2000, 43, 4759-4767) show that Glide 2.5 performs better than GOLD 1.1, FlexX 1.8, or DOCK 4.01.

  • Virtual screening using protein-ligand docking: Avoiding artificial enrichment
    Verdonk, ML and Berdini, V and Hartshorn, MJ and Mooij, WTM and Murray, CW and Taylor, RD and Watson, P
    Journal of Chemical Information and Computer Sciences, 2004, 44(3), 793-806
    PMID: 15154744     doi: 10.1021/ci034289q
    This study addresses a number of topical issues around the use of protein-ligand docking in virtual screening. We show that, for the validation of such methods, it is key to use focused libraries (containing compounds with one-dimensional properties, similar to the actives), rather than "random" or "drug-like" libraries to test the actives against. We also show that, to obtain good enrichments, the docking program needs to produce reliable binding modes. We demonstrate how pharmacophores can be used to guide the dockings and improve enrichments, and we compare the performance of three consensus-ranking protocols against ranking based on individual scoring functions. Finally, we show that protein-ligand docking can be an effective aid in the screening for weak, fragment-like binders, which has rapidly become a popular strategy for hit identification. All results presented are based on carefully constructed virtual screening experiments against four targets, using the protein-ligand docking program GOLD.
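Several of the 2004 entries report screening performance as an enrichment factor (EF): an EF of up to 39 for the first 1% of the ranked database, or an 80% hit rate against 2.06% for random screening (80 / 2.06 ≈ 38.8, which the abstract rounds to about 40-fold). The sketch below computes EF under the usual definition, the hit rate in the top fraction of the ranked list divided by the hit rate in the whole database. The function name and parameters are hypothetical, and individual papers may size the top slice or break ties differently.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float = 0.01,
                      higher_is_better: bool = True) -> float:
    """Enrichment factor in the top `fraction` of a ranked database.

    EF = (actives in top slice / size of top slice) / (all actives / all compounds)
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and at least one active")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    # Rank compound indices best-first; ties keep input order (sorted is stable).
    order = sorted(range(n_total), key=lambda i: scores[i],
                   reverse=higher_is_better)

    # Size of the top slice, rounded and never smaller than one compound.
    n_top = max(1, round(fraction * n_total))
    hits_top = sum(1 for i in order[:n_top] if is_active[i])

    return (hits_top / n_top) / (n_actives / n_total)
```

For energy-like docking scores, where more negative is better, pass `higher_is_better=False`. Evaluating the whole list (`fraction=1.0`) always gives an EF of 1, which is a quick sanity check.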


  • Hit and lead generation: beyond high-throughput screening.
    Bleicher, Konrad H and Böhm, Hans-Joachim and Müller, Klaus and Alanine, Alexander I
    Nature reviews. Drug discovery, 2003, 2(5), 369-378
    PMID: 12750740     doi: 10.1038/nrd1086
    The identification of small-molecule modulators of protein function, and the process of transforming these into high-content lead series, are key activities in modern drug discovery. The decisions taken during this process have far-reaching consequences for success later in lead optimization and even more crucially in clinical development. Recently, there has been an increased focus on these activities due to escalating downstream costs resulting from high clinical failure rates. In addition, the vast emerging opportunities from efforts in functional genomics and proteomics demand a departure from the linear process of identification, evaluation and refinement activities towards a more integrated parallel process. This calls for flexible, fast and cost-effective strategies to meet the demands of producing high-content lead series with improved prospects for clinical success.

  • Shape signatures: a new approach to computer-aided ligand- and receptor-based drug design.
    Zauhar, Randy J and Moyna, Guillermo and Tian, LiFeng and Li, ZhiJian and Welsh, William J
    Journal of medicinal chemistry, 2003, 46(26), 5674-5690
    PMID: 14667221     doi: 10.1021/jm030242k
    A unifying principle of rational drug design is the use of either shape similarity or complementarity to identify compounds expected to be active against a given target. Shape similarity is the underlying foundation of ligand-based methods, which seek compounds with structure similar to known actives, while shape complementarity is the basis of most receptor-based design, where the goal is to identify compounds complementary in shape to a given receptor. These approaches can be extended to include molecular descriptors in addition to shape, such as lipophilicity or electrostatic potential. Here we introduce a new technique, which we call shape signatures, for describing the shape of ligand molecules and of receptor sites. The method uses a technique akin to ray-tracing to explore the volume enclosed by a ligand molecule, or the volume exterior to the active site of a protein. Probability distributions are derived from the ray-trace, and can be based solely on the geometry of the reflecting ray, or may include joint dependence on properties, such as the molecular electrostatic potential, computed over the surface. Our shape signatures are just these probability distributions, stored as histograms. They converge rapidly with the length of the ray-trace, are independent of molecular orientation, and can be compared quickly using simple metrics. Shape signatures can be used to test for both shape similarity between compounds and for shape complementarity between compounds and receptors and thus can be applied to problems in both ligand- and receptor-based molecular design. We present results for comparisons between small molecules of biological interest and the NCI Database using shape signatures under two different metrics. Our results show that the method can reliably extract compounds of shape (and polarity) similar to the query molecules. We also present initial results for a receptor-based strategy using shape signatures, with application to the design of new inhibitors predicted to be active against HIV protease.

  • LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites
    Venkatachalam, CM and Jiang, X and Oldfield, T and Waldman, M
    Journal of molecular graphics & modelling, 2003, 21(4), 289-307
    PMID: 12479928    
    We present a new shape-based method, LigandFit, for accurately docking ligands into protein active sites. The method employs a cavity detection algorithm for detecting invaginations in the protein as candidate active site regions. A shape comparison filter is combined with a Monte Carlo conformational search for generating ligand poses consistent with the active site shape. Candidate poses are minimized in the context of the active site using a grid-based method for evaluating protein-ligand interaction energies. Errors arising from grid interpolation are dramatically reduced using a new non-linear interpolation scheme. Results are presented for 19 diverse protein-ligand complexes. The method appears quite promising, reproducing the X-ray structure ligand pose within an RMS of 2 Å in 14 out of the 19 complexes. A high-throughput screening study applied to the thymidine kinase receptor is also presented in which LigandFit, when combined with LigScore, an internally developed scoring function [1], yields very good hit rates for a ligand pool seeded with known actives.

  • Automated generation of MCSS-derived pharmacophoric DOCK site points for searching multiconformation databases.
    Joseph-McCarthy, Diane and Alvarez, Juan C
    Proteins, 2003, 51(2), 189-202
    PMID: 12660988     doi: 10.1002/prot.10296
    All docking methods employ some sort of heuristic to orient the ligand molecules into the binding site of the target structure. An automated method, MCSS2SPTS, for generating chemically labeled site points for docking is presented. MCSS2SPTS employs the program Multiple Copy Simultaneous Search (MCSS) to determine target-based theoretical pharmacophores. More specifically, chemically labeled site points are automatically extracted from selected low-energy functional-group minima and clustered together. These pharmacophoric site points can then be directly matched to the pharmacophoric features of database molecules with the use of either DOCK or PhDOCK to place the small molecules into the binding site. Several examples of the ability of MCSS2SPTS to reproduce the three-dimensional pharmacophoric features of ligands from known ligand-protein complex structures are discussed. In addition, a site-point set calculated for one human immunodeficiency virus 1 (HIV1) protease structure is used with PhDOCK to dock a set of HIV1 protease ligands; the docked poses are compared to the corresponding complex structures of the ligands. Finally, the use of an MCSS2SPTS-derived site-point set for acyl carrier protein synthase is compared to the use of atomic positions from a bound ligand as site points for a large-scale DOCK search. In general, MCSS2SPTS-generated site points focus the search on the more relevant areas and thereby allow for more effective sampling of the target site.
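The shape-signatures entry describes storing ray-trace probability distributions as histograms and comparing them "using simple metrics". A minimal sketch of that comparison step follows, assuming the ray-trace has already produced a list of segment lengths (the ray-tracing itself is not reproduced here). The bin count, maximum length and the L1 metric are illustrative choices, not necessarily the parameters or either of the two metrics used by the authors; the function names are hypothetical.

```python
from typing import List, Sequence


def shape_signature(segment_lengths: Sequence[float],
                    n_bins: int = 20,
                    max_length: float = 20.0) -> List[float]:
    """Normalized histogram of ray-trace segment lengths (a 1D signature).

    Lengths >= max_length are clamped into the last bin.
    """
    if n_bins <= 0 or max_length <= 0:
        raise ValueError("n_bins and max_length must be positive")
    counts = [0] * n_bins
    width = max_length / n_bins
    for length in segment_lengths:
        if length < 0:
            raise ValueError("segment lengths must be non-negative")
        counts[min(int(length / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one segment length")
    # Dividing by the total turns counts into a probability distribution.
    return [c / total for c in counts]


def l1_distance(sig_a: Sequence[float], sig_b: Sequence[float]) -> float:
    """Sum of absolute bin differences: 0 = identical, 2 = no overlap."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))
```

Because the signature is built only from segment lengths, it does not depend on how the molecule is oriented, which mirrors the orientation independence claimed in the abstract.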


  • Virtual screening and fast automated docking methods
    Schneider, Gisbert and Böhm, Hans-Joachim
    Drug discovery today, 2002, 7(1), 64-70
    doi: 10.1016/S1359-6446(01)02091-8
    ... molecules which were identified, optimized or designed using virtual screening methods a. Molecular structure, Activity, Method, Refs. Ca²⁺ antagonist (T-channel blocker), Pharmacophore similarity searching, [51]. K⁺ channel (Kv1.5) blocker, Fragment based evolutionary de novo ...

  • A structure-based design approach for the identification of novel inhibitors: application to an alanine racemase.
    Mustata, Gabriela Iurcu and Briggs, James M
    Journal of computer-aided molecular design, 2002, 16(12), 935-953
    PMID: 12825624    
    We report a new structure-based strategy for the identification of novel inhibitors. This approach has been applied to Bacillus stearothermophilus alanine racemase (AlaR), an enzyme implicated in the biosynthesis of the bacterial cell wall. The enzyme catalyzes the racemization of L- and D-alanine using pyridoxal 5'-phosphate (PLP) as a cofactor. The restriction of AlaR to bacteria and some fungi and the absolute requirement for D-alanine in peptidoglycan biosynthesis make alanine racemase a suitable target for drug design. Unfortunately, known inhibitors of alanine racemase are not specific and inhibit the activity of other PLP-dependent enzymes, leading to neurological and other side effects. This article describes the development of a receptor-based pharmacophore model for AlaR, taking into account receptor flexibility (i.e. a 'dynamic' pharmacophore model). In order to accomplish this, molecular dynamics (MD) simulations were performed on the full AlaR dimer from Bacillus stearothermophilus (PDB entry 1SFT) with a D-alanine molecule in one active site and the non-covalent inhibitor, propionate, in the second active site of this homodimer. The basic strategy followed in this study was to utilize conformations of the protein obtained during MD simulations to generate a dynamic pharmacophore model using the property mapping capability of the LigBuilder program. Compounds from the Available Chemicals Directory that fit the pharmacophore model were identified and have been submitted for experimental testing. The approach described here can be used as a valuable tool for the design of novel inhibitors of other biomolecular targets.

  • Structure-based virtual screening: an overview
    Lyne, Paul D
    Drug discovery today, 2002, 7(20), 1047-1055
    doi: 10.1016/S1359-6446(02)02483-2
    ... Typically for docking, the physical-based scoring functions (e.g. Dock [29] and QXP [23]) employ force-fields in a minimalistic manner on a grid with no ... Empirical-based scoring functions based on physicochemical properties such as hydrogen-bond counts (e.g. ...

  • Do Structurally Similar Molecules Have Similar Biological Activity?
    Martin, Yvonne C and Kofron, James L and Traphagen, Linda M
    Journal of medicinal chemistry, 2002, 45(19), 4350-4358
    PMID: 12213076     doi: 10.1021/jm020155c
    To design diverse combinatorial libraries or to select diverse compounds to augment a screening collection, computational chemists frequently reject compounds that are > or

  • Protein flexibility and drug design: how to hit a moving target
    Carlson, HA
    Current opinion in chemical biology, 2002, 6, 447-452
    The most advanced methods for computer-aided drug design and database mining incorporate protein flexibility. Such techniques are not only needed to obtain proper results; they are also critical for dealing with the growing body of information from structural genomics.
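The Martin et al. entry opens with the common practice of rejecting compounds above a similarity cutoff when designing diverse libraries or augmenting a screening collection. A minimal sketch of that practice follows, assuming fingerprints are given as sets of "on" bit indices and compared with the Tanimoto coefficient. The function names are hypothetical and the cutoff is left as a free parameter; the values used in any example are purely illustrative and are not the cutoff studied in the paper.

```python
from typing import Dict, FrozenSet, Iterable, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient on sets of 'on' bits: shared bits / union of bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_diverse(candidates: Dict[str, FrozenSet[int]],
                   collection: Iterable[FrozenSet[int]],
                   threshold: float) -> List[str]:
    """Greedily keep candidates whose similarity to every compound already in
    the collection (or already selected) stays below `threshold`."""
    kept = list(collection)
    selected: List[str] = []
    for name, fp in candidates.items():  # dicts preserve insertion order
        if all(tanimoto(fp, other) < threshold for other in kept):
            selected.append(name)
            kept.append(fp)
    return selected
```

The greedy pass is order-dependent: a candidate processed early can block a near neighbour that appears later, so the order of `candidates` affects which member of a similar pair is kept.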


  • Detailed analysis of scoring functions for virtual screening.
    Stahl, M and Rarey, M
    Journal of medicinal chemistry, 2001, 44(7), 1035-1042
    PMID: 11297450    
    We present a comprehensive study of the performance of fast scoring functions for library docking using the program FlexX as the docking engine. Four scoring functions, among them two recently developed knowledge-based potentials, are evaluated on seven target proteins whose binding sites represent a wide range of size, form, and polarity. The results of these calculations give valuable insight into strengths and weaknesses of current scoring functions. Furthermore, it is shown that a well-chosen combination of two of the tested scoring functions leads to a new, robust scoring scheme with superior performance in virtual screening.
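Stahl and Rarey report that a well-chosen combination of two scoring functions gives a more robust ranking, but the abstract does not say how the two were combined. As an illustration only, the sketch below uses a generic rank-sum consensus, a swapped-in technique rather than the authors' scheme: each compound is ranked separately under each function, and the library is re-sorted by the sum of its two ranks. The function names are hypothetical.

```python
from typing import List, Sequence


def ranks(scores: Sequence[float], higher_is_better: bool) -> List[int]:
    """Rank 1 = best compound under one scoring function (ties: input order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def rank_sum_consensus(scores_a: Sequence[float], a_higher_is_better: bool,
                       scores_b: Sequence[float], b_higher_is_better: bool) -> List[int]:
    """Return compound indices ordered best-first by the sum of their two ranks."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both scoring functions must score the same compounds")
    ra = ranks(scores_a, a_higher_is_better)
    rb = ranks(scores_b, b_higher_is_better)
    combined = [x + y for x, y in zip(ra, rb)]
    # A lower rank sum means both functions placed the compound near the top.
    return sorted(range(len(combined)), key=lambda i: combined[i])
```

Working on ranks rather than raw scores sidesteps the fact that different scoring functions use different units and signs (for example, energies where lower is better).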


  • Virtual screening-an overview
    Walters, WP and Stahl, MT
    Drug discovery today, 1998, 3(4), 160-178
    Recent advances in combinatorial chemistry and high-throughput screening have made it possible for chemists to synthesize large numbers of compounds. However, this is still a small percentage of the total number that could be synthesized. Virtual screening encompasses a variety of computational techniques that allow chemists to reduce a huge virtual library to a more manageable size. This review presents the current state of the art in virtual screening and discusses approaches that will allow the evaluation of larger numbers of compounds.
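The Walters and Stahl review frames virtual screening as reducing a huge virtual library to a manageable size. One cheap and common first stage is a property filter. The sketch below uses Lipinski's rule-of-five thresholds (molecular weight 500, cLogP 5, 5 H-bond donors, 10 H-bond acceptors) as an example; this is a well-known published heuristic, not something prescribed by this review. Descriptors are assumed to be precomputed, the `Compound` record and function names are hypothetical, and allowing at most one violation by default is a common but not universal convention.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Compound:
    """Hypothetical record holding precomputed descriptors for one compound."""
    name: str
    mol_weight: float
    clogp: float
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(c: Compound) -> int:
    """Count rule-of-five violations (True/False sum as 1/0)."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def property_filter(library: Iterable[Compound],
                    max_violations: int = 1) -> List[Compound]:
    """Cheap first stage: drop compounds with too many violations."""
    return [c for c in library if rule_of_five_violations(c) <= max_violations]
```

In a screening funnel, a filter like this would typically run before more expensive steps such as similarity searching or docking, so that those steps only see the reduced library.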