# Bibliography of Computer-Aided Drug Design

Updated on 7/18/2014. Currently 2130 references

## Screening / Methodology / Ligand-based

2014 / 2013 / 2012 / 2011 / 2010 / 2009 / 2008 / 2007 / 2006 / 2005 / 2004 / 2003 / 2002 / 2001

## 2014

• The optimization of running time for a maximum common substructure-based algorithm and its application in drug design.
Chen, Jian and Sheng, Jia and Lv, Dijing and Zhong, Yang and Zhang, Guoqing and Nan, Peng
Computational biology and chemistry, 2014, 48, 14-20
PMID: 24291488     doi: 10.1016/j.compbiolchem.2013.10.003

In the field of drug discovery, it is particularly important to discover bioactive compounds through high-throughput virtual screening. The maximum common substructure-based (MCS) algorithm is a promising method for the virtual screening of drug candidates. However, in practical applications, there is always a trade-off between efficiency and accuracy. In this paper, we optimized this method by running time evaluation using essential drugs defined by WHO and FDA-approved small-molecule drugs. The amount of running time allocated to the MCS-based virtual screening was varied, and statistical analysis was conducted to study the impact of computation running time on the screening results. It was determined that the running time efficiency can be improved without compromising accuracy by setting proper running time thresholds. In addition, the similarity of compound structures and its relevance to biological activity are analyzed quantitatively, which highlights the applicability of the MCS-based methods in predicting functions of small molecules. A range of 15-30 s was established as reasonable for selecting a candidate running time threshold. The effect of CPU speed is considered and the conclusion is generalized. The potential biological activity of small molecules with unknown functions can be predicted by the MCS-based methods.

• DiSCuS: an open platform for (not only) virtual screening results management.
Wójcikowski, Maciej and Zielenkiewicz, Piotr and Siedlecki, Pawel
Journal of chemical information and modeling, 2014, 54(1), 347-354
PMID: 24364790     doi: 10.1021/ci400587f

DiSCuS, a "Database System for Compound Selection", has been developed. The primary goal of DiSCuS is to aid researchers in the steps subsequent to generating high-throughput virtual screening (HTVS) results, such as selection of compounds for further study, purchase, or synthesis. To do so, DiSCuS provides (1) a storage facility for ligand-receptor complexes (generated with external programs), (2) a number of tools for validating these complexes, such as scoring functions, potential energy contributions, and med-chem features with ligand similarity estimates, and (3) powerful searching and filtering options with logical operators. DiSCuS supports multiple receptor targets for a single ligand, so it can be used either to evaluate different variants of an active site or for selectivity studies. DiSCuS documentation, installation instructions, and source code can be found at http://discus.ibb.waw.pl .

• SABRE: ligand/structure-based virtual screening approach using consensus molecular-shape pattern recognition.
Journal of chemical information and modeling, 2014, 54(1), 338-346
PMID: 24328054     doi: 10.1021/ci4005496

We present an efficient and rational ligand/structure shape-based virtual screening approach combining our previous ligand shape-based similarity SABRE (shape-approach-based routines enhanced) and the 3D shape of the receptor binding site. Our approach exploits the pharmacological preferences of a number of known active ligands to take advantage of the structural diversities and chemical similarities, using a linear combination of weighted molecular shape density. Furthermore, the algorithm generates a consensus molecular-shape pattern recognition that is used to filter and place the candidate structure into the binding pocket. The descriptor pool used to construct the consensus molecular-shape pattern consists of four-dimensional (4D) fingerprints generated from the distribution of conformer states available to a molecule and the 3D shapes of a set of active ligands computed using SABRE software. The virtual screening efficiency of SABRE was validated using the Database of Useful Decoys (DUD) and the filtered version (WOMBAT) of 10 DUD targets. The ligand/structure shape-based similarity SABRE algorithm outperforms several other widely used virtual screening methods that use the data fusion of multiscreening tools (2D and 3D fingerprints) and demonstrates a superior early retrieval rate of active compounds (EF(0.1%)

## 2013

• SMIfp (SMILES fingerprint) Chemical Space for Virtual Screening and Visualization of Large Databases of Organic Molecules
Schwartz, Julian and Awale, Mahendra and Reymond, Jean-Louis
Journal of chemical information and modeling, 2013, 53(8), 1979-1989
PMID: 23845040     doi: 10.1021/ci400206h

SMIfp (SMILES fingerprint) is defined here as a scalar fingerprint describing organic molecules by counting the occurrences of 34 different symbols in their SMILES strings, which creates a 34-dimensional chemical space. Ligand-based virtual screening using the city-block distance CBDSMIfp as similarity measure provides good AUC values and enrichment factors for recovering series of actives from the directory of useful decoys (DUD-E) and from ZINC. DrugBank, ChEMBL, ZINC, PubChem, GDB-11, GDB-13, and GDB-17 can be searched by CBDSMIfp using an online SMIfp-browser at www.gdb.unibe.ch . Visualization of the SMIfp chemical space was performed by principal component analysis and color-coded maps of the (PC1, PC2)-planes, with interactive access to the molecules enabled by the Java application SMIfp-MAPPLET available from www.gdb.unibe.ch . These maps spread molecules according to their fraction of aromatic atoms, size and polarity. SMIfp provides a new and relevant entry to explore the small molecule chemical space.
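The counting idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the 34 symbols, so the `SYMBOLS` list below is a hypothetical subset chosen for illustration, and the function names `smifp` and `city_block` are likewise assumptions. Counting is done per character, which is a simplification (a two-letter atom such as `Cl` would also increment the `C` count).

```python
from collections import Counter

# Hypothetical symbol subset for illustration only; the published SMIfp
# uses 34 symbols, which the abstract does not enumerate.
SYMBOLS = ["C", "c", "N", "n", "O", "o", "S", "=", "#", "(", "[", "+", "-", "1", "2"]


def smifp(smiles: str) -> list[int]:
    """Count how often each symbol occurs in a SMILES string (per character)."""
    counts = Counter(smiles)
    return [counts[symbol] for symbol in SYMBOLS]


def city_block(a: list[int], b: list[int]) -> int:
    """City-block (Manhattan) distance between two count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_distance(query: str, library: list[str]) -> list[str]:
    """Sort library SMILES by increasing city-block distance to the query."""
    query_fp = smifp(query)
    return sorted(library, key=lambda s: city_block(query_fp, smifp(s)))


if __name__ == "__main__":
    # Ethanol as query against a tiny toy library.
    print(rank_by_distance("CCO", ["c1ccccc1", "CC=O", "CCN"]))
```

Smaller distances mean more similar symbol counts, so a ligand-based screen in this style would keep the top of the ranked list.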

• Broad Coverage of Commercially Available Lead-like Screening Space with Fewer than 350,000 Compounds
Baell, Jonathan B
Journal of chemical information and modeling, 2013, 53(1), 39-55
PMID: 23198812

In establishing what we propose is the globe's highest quality collection of available screening compounds, it is convincingly shown that the globe's pool of such compounds is extremely shallow and can be represented by fewer than 350,000 compounds. To support our argument, we discuss and fully disclose our extensive battery of functional group filters. We discuss the use of PAINS filters and also show the effect of similarity exclusion on structure-activity relationships. We show why limited analogue representation requires screening at higher concentrations to capture hit classes for difficult targets that otherwise may be prosecuted unsuccessfully. We construct our arguments in a structurally focused manner to be most useful to medicinal chemists, the key players in drug discovery.

• HitPick: a web server for hit identification and target prediction of chemical screenings
Liu, X and Vogt, I and Haque, T and Campillos, M
Bioinformatics (Oxford, England), 2013, 29(15), 1910-1912
PMID: 23716196     doi: 10.1093/bioinformatics/btt303

MOTIVATION: High-throughput phenotypic assays reveal information about the molecules that modulate biological processes, such as a disease phenotype and a signaling pathway. In these assays, the identification of hits along with their molecular targets is critical to understand the chemical activities modulating the biological system. Here, we present HitPick, a web server for identification of hits in high-throughput chemical screenings and prediction of their molecular targets. HitPick applies the B-score method for hit identification and a newly developed approach combining 1-nearest-neighbor (1NN) similarity searching and Laplacian-modified naïve Bayesian target models to predict targets of identified hits. The performance of the HitPick web server is presented and discussed. AVAILABILITY: The server can be accessed at http://mips.helmholtz-muenchen.de/proj/hitpick. CONTACT: monica.campillos@helmholtz-muenchen.de.

• Scaffold-Focused Virtual Screening: Prospective Application to the Discovery of TTK Inhibitors.
Langdon, Sarah R and Westwood, Isaac M and van Montfort, Rob L M and Brown, Nathan and Blagg, Julian
Journal of chemical information and modeling, 2013, 53(5), 1100-1112
PMID: 23672464     doi: 10.1021/ci400100c

We describe and apply a scaffold-focused virtual screen based upon scaffold trees to the mitotic kinase TTK (MPS1). Using level 1 of the scaffold tree, we perform both 2D and 3D similarity searches between a query scaffold and a level 1 scaffold library derived from a 2 million compound library; 98 compounds from 27 unique top-ranked level 1 scaffolds are selected for biochemical screening. We show that this scaffold-focused virtual screen prospectively identifies eight confirmed active compounds that are structurally differentiated from the query compound. In comparison, 100 compounds were selected for biochemical screening using a virtual screen based upon whole molecule similarity resulting in 12 confirmed active compounds that are structurally similar to the query compound. We elucidated the binding mode for four of the eight confirmed scaffold hops to TTK by determining their protein-ligand crystal structures; each represents a ligand-efficient scaffold for inhibitor design.

• Multiple structures for virtual ligand screening: defining binding site properties-based criteria to optimize the selection of the query.
Ben Nasr, Nesrine and Guillemain, Hélène and Lagarde, Nathalie and Zagury, Jean-François and Montes, Matthieu
Journal of chemical information and modeling, 2013, 53(2), 293-311
PMID: 23312043

Virtual ligand screening is an integral part of the modern drug discovery process. Traditional ligand-based, virtual screening approaches are fast but require a set of structurally diverse ligands known to bind to the target. Traditional structure-based approaches require high-resolution target protein structures and are computationally demanding. In contrast, the recently developed threading/structure-based FINDSITE-based approaches have the advantage that they are as fast as traditional ligand-based approaches and yet overcome the limitations of traditional ligand- or structure-based approaches. These new methods can use predicted low-resolution structures and infer the likelihood of a ligand binding to a target by utilizing ligand information excised from the target's remote or close homologous proteins and/or libraries of ligand binding databases. Here, we develop an improved version of FINDSITE, FINDSITEfilt, that filters out false positive ligands in threading identified templates by a better binding site detection procedure that includes information about the binding site amino acid similarity. We then combine FINDSITEfilt with FINDSITEX that uses publicly available binding databases ChEMBL and DrugBank for virtual ligand screening. The combined approach, FINDSITEcomb, is compared to two traditional docking methods, AUTODOCK Vina and DOCK 6, on the DUD benchmark set. It is shown to be significantly better in terms of enrichment factor, dependence on target structure quality, and speed. FINDSITEcomb is then tested for virtual ligand screening on a large set of 3576 generic targets from the DrugBank database as well as a set of 168 Human GPCRs. Excluding close homologues, FINDSITEcomb gives an average enrichment factor of 52.1 for generic targets and 22.3 for GPCRs within the top 1% of the screened compound library. Around 65% of the targets have better than random enrichment factors. 
The performance is insensitive to target structure quality, as long as it has a TM-score ≥ 0.4 to native. Thus, FINDSITEcomb makes the screening of millions of compounds across entire proteomes feasible. The FINDSITEcomb web service is freely available for academic users at http://cssb.biology.gatech.edu/skolnick/webservice/FINDSITE-COMB/index.html

• Fragment-based Shape Signatures: a new tool for virtual screening and drug discovery.
Zauhar, Randy J and Gianti, Eleonora and Welsh, William J
Journal of computer-aided molecular design, 2013, 27(12), 1009-1036
PMID: 24366428     doi: 10.1007/s10822-013-9698-7

Since its introduction in 2003, the Shape Signatures method has been successfully applied in a number of drug design projects. Because it uses a ray-tracing approach to directly measure molecular shape and properties (as opposed to relying on chemical structure), it excels at scaffold hopping, and is extraordinarily easy to use. Despite its advantages, a significant drawback of the method has hampered its application to certain classes of problems; namely, when the chemical structures considered are large and contain heterogeneous ring-systems, the method produces descriptors that tend to merely measure the overall size of the molecule, and begin to lose selective power. To remedy this, the approach has been reformulated to automatically decompose compounds into fragments using ring systems as anchors, and to likewise partition the ray-trace in accordance with the fragment assignments. Subsequently, descriptors are generated that are fragment-based, and query and target molecules are compared by mapping query fragments onto target fragments in all ways consistent with the underlying chemical connectivity. This has proven to greatly extend the selective power of the method, while maintaining the ease of use and scaffold-hopping capabilities that characterized the original implementation. In this work, we provide a full conceptual description of the next generation Shape Signatures, and we underline the advantages of the method by discussing its practical applications to ligand-based virtual screening. The new approach can also be applied in receptor-based mode, where protein-binding sites (partitioned into subsites) can be matched against the new fragment-based Shape Signatures descriptors of library compounds.

• A novel and efficient ligand-based virtual screening approach using the HWZ scoring function and an enhanced shape-density model.
Hamza, Adel and Wei, Ning-Ning and Hao, Ce and Xiu, Zhilong and Zhan, Chang-Guo
Journal of biomolecular structure & dynamics, 2013, 31(11), 1236-1250
PMID: 23140256     doi: 10.1080/07391102.2012.732341

In this work, we extend our previous ligand shape-based virtual screening approach by using the scoring function Hamza-Wei-Zhan (HWZ) score and an enhanced molecular shape-density model for the ligands. The performance of the method has been tested against the 40 targets in the Database of Useful Decoys and compared with the performance of our previous HWZ score method. The virtual screening results using the novel ligand shape-based approach demonstrated a favorable improvement (area under the receiver operator characteristics curve AUC

• Similarity searching for potent compounds using feature selection.
Vogt, Martin and Bajorath, Jürgen
Journal of chemical information and modeling, 2013, 53(7), 1613-1619
PMID: 23808911     doi: 10.1021/ci4003206

In similarity searching, compound potency is usually not taken into account. Given a set of active reference compounds, similarity to database molecules is calculated using different metrics without considering compound potency as a search parameter. Herein, we introduce a feature selection method for fingerprint similarity searching to maximize compound recall and preferentially detect potent compounds. On the basis of training examples, fingerprint features are selected that identify potent compounds and produce high recall. Using the reduced fingerprint representations, potent hits are preferentially detected, even if reference compounds have only moderate or low potency. Small sets of simple chemical features are found to yield high search performance.

## 2012

• Directory of Useful Decoys, Enhanced (DUD-E) - Better Ligands and Decoys for Better Benchmarking.
Mysinger, Michael and Carchia, Michael and Irwin, John J and Shoichet, Brian K
Journal of medicinal chemistry, 2012, 55(14), 6582-6594
PMID: 22716043     doi: 10.1021/jm300687e

A key metric to assess molecular docking remains ligand enrichment against challenging decoys. Whereas the directory of useful decoys (DUD) has been widely used, clear areas for optimization have emerged. Here we describe an improved benchmarking set that includes more diverse targets such as GPCRs and ion channels, totaling 102 proteins with 22,886 clustered ligands drawn from ChEMBL, each with 50 property-matched decoys drawn from ZINC. To ensure chemotype diversity we cluster each target's ligands by their Bemis-Murcko atomic frameworks. We add net charge to the matched physico-chemical properties, and include only the most dissimilar decoys, by topology, from the ligands. An online automated tool (http://decoys.docking.org) generates these improved matched decoys for user-supplied ligands. We test this dataset by docking all 102 targets, using the results to improve the balance between ligand desolvation and electrostatics in DOCK 3.6. The complete DUD-E benchmarking set is freely available at http://dude.docking.org.

• Automated recycling of chemistry for virtual screening and library design.
Vainio, Mikko and Kogej, Thierry and Raubacher, Florian
Journal of chemical information and modeling, 2012, 52(7), 1777-1786
PMID: 22657574     doi: 10.1021/ci300157m

An early stage drug discovery project needs to identify a number of chemically diverse and attractive compounds. These hit compounds are typically found through high-throughput screening campaigns. The diversity of the chemical libraries used in screening is therefore important. In this study, we describe a virtual high-throughput screening system called Virtual Library. The system automatically "recycles" validated synthetic protocols and available starting materials to generate a large number of virtual compound libraries, and allows for fast searches in the generated libraries using a 2D fingerprint based screening method. Virtual Library links the returned virtual hit compounds back to experimental protocols to quickly assess the synthetic accessibility of the hits. The system can be used as an idea generator for library design to enrich the screening collection, and to explore the structure-activity landscape around a specific active compound.

• Virtual fragment screening: Discovery of histamine H(3) receptor ligands using ligand-based and protein-based molecular fingerprints.
Sirci, Francesco and Istyastono, Enade P and Vischer, Henry F and Kooistra, Albert J and Nijmeijer, Saskia and Kuijer, Martien and Wijtmans, Maikel and Mannhold, Raimund and Leurs, Rob and de Esch, Iwan J P and de Graaf, Chris
Journal of chemical information and modeling, 2012, 52(12), 3308-3324
PMID: 23140085     doi: 10.1021/ci3004094

Virtual Fragment Screening (VFS) is a promising new method that uses computer models to identify small, fragment-like biologically active molecules as useful starting points for Fragment-Based Drug Discovery (FBDD). Training sets of true active and inactive fragment-like molecules to construct and validate target customized VFS methods are however lacking. We have for the first time explored the possibilities and challenges of VFS using molecular fingerprints derived from a unique set of fragment affinity data for the histamine H(3) receptor (H(3)R), a pharmaceutically relevant G Protein-coupled Receptor (GPCR). Optimized FLAP (Fingerprint of Ligands And Proteins) models containing essential molecular interaction fields that discriminate known H(3)R binders from inactive molecules were successfully used for the identification of new H(3)R ligands. Prospective virtual screening of 156,090 molecules yielded a high hit rate of 62% (18 of the 29 tested) experimentally confirmed novel fragment-like H(3)R ligands that offer new potential starting points for the design of H(3)R targeting drugs. The first construction and application of customized FLAP models for the discovery of fragment-like biologically active molecules demonstrates that VFS is an efficient way to explore protein-fragment interaction space in silico.

• Shaping a Screening File for Maximal Lead Discovery Efficiency and Effectiveness: Elimination of Molecular Redundancy.
Bakken, Gregory A and Boehm, Markus and Bell, Andrew S and Everett, Jeremy R and Gonzales, Rosalia and Hepworth, David and Klug-McLeod, Jacquelyn L and Lanfear, Jeremy and Loesel, Jens and Mathias, John and Wood, Terence P
Journal of chemical information and modeling, 2012, 52(11), 2937-2949
PMID: 23062111     doi: 10.1021/ci300372a

High Throughput Screening (HTS) is a successful strategy for finding hits and leads that have the opportunity to be converted into drugs. In this paper we highlight novel computational methods used to select compounds to build a new screening file at Pfizer and the analytical methods we used to assess their quality. We also introduce the novel concept of molecular redundancy to help decide on the density of compounds required in any region of chemical space in order to be confident of running successful HTS campaigns.

• QSAR Classification Model for Antibacterial Compounds and Its Use in Virtual Screening.
Singh, Narender and Chaudhury, Sidhartha and Liu, Ruifeng and Abdulhameed, Mohamed Diwan M and Tawa, Gregory and Wallqvist, Anders
Journal of chemical information and modeling, 2012, 52(10), 2559-2569
PMID: 23013546     doi: 10.1021/ci300336v

As novel and drug-resistant bacterial strains continue to present an emerging health threat, the development of new antibacterial agents is critical. This includes making improvements to existing antibacterial scaffolds as well as identifying novel ones. The aim of this study is to apply a Bayesian classification QSAR approach to rapidly screen chemical libraries for compounds predicted to have antibacterial activity. Toward this end we assembled a data set of 317 known antibacterial compounds as well as a second data set of diverse, well-validated, non-antibacterial compounds from 215 PubChem Bioassays against various bacterial species. We constructed a Bayesian classification model using structural fingerprints and physicochemical property descriptors and achieved an accuracy of 84% and precision of 86% on an independent test set in identifying antibacterial compounds. To demonstrate the practical applicability of the model in virtual screening, we screened an independent data set of ∼200k compounds. The results show that the model can screen top hits of PubChem Bioassay actives with accuracy up to ∼76%, representing a 1.5-2-fold enrichment. The top screened hits represented a mixture of both known antibacterial scaffolds as well as novel scaffolds. Our study suggests that a well-validated Bayesian classification QSAR approach could complement other screening approaches in identifying novel and promising hits. The data sets used in constructing and validating this model have been made publicly available.

• DecoyFinder: an easy-to-use python GUI application for building target-specific decoy sets.
Cereto-Massagué, Adrià and Guasch, Laura and Valls, Cristina and Mulero, Miquel and Pujadas, Gerard and Garcia-Vallvé, Santiago
Bioinformatics (Oxford, England), 2012, 28(12), 1661-1662
PMID: 22539671     doi: 10.1093/bioinformatics/bts249

Decoys are molecules that are presumed to be inactive against a target (i.e. will not likely bind to the target) and are used to validate the performance of molecular docking or a virtual screening workflow. The Directory of Useful Decoys database (http://dud.docking.org/) provides a free directory of decoys for use in virtual screening, though it only contains a limited set of decoys for 40 targets. To overcome this limitation, we have developed an application called DecoyFinder that selects, for a given collection of active ligands of a target, a set of decoys from a database of compounds. Decoys are selected if they are similar to active ligands according to five physical descriptors (molecular weight, number of rotational bonds, total hydrogen bond donors, total hydrogen bond acceptors and the octanol-water partition coefficient) without being chemically similar to any of the active ligands used as an input (according to the Tanimoto coefficient between MACCS fingerprints). To the best of our knowledge, DecoyFinder is the first application designed to build target-specific decoy sets. AVAILABILITY: A complete description of the software is included on the application home page. A validation of DecoyFinder on 10 DUD targets is provided as Supplementary Table S1. DecoyFinder is freely available at http://URVnutrigenomica-CTNS.github.com/DecoyFinder.
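The selection rule stated in this abstract (property-matched but chemically dissimilar) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DecoyFinder's code: the tolerance values in `TOLERANCES`, the `MAX_TANIMOTO` cutoff, the dictionary layout, and the function names are all hypothetical, and fingerprints are represented as plain sets of on-bit indices rather than real MACCS keys, which would normally come from a cheminformatics toolkit.

```python
# Hypothetical per-descriptor tolerances and similarity cutoff; the
# abstract does not state the values DecoyFinder uses.
TOLERANCES = {"mw": 25.0, "rot_bonds": 1, "hbd": 1, "hba": 2, "logp": 1.0}
MAX_TANIMOTO = 0.75


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def property_matched(candidate: dict, active: dict) -> bool:
    """True if all five physical descriptors lie within tolerance."""
    return all(abs(candidate[k] - active[k]) <= tol for k, tol in TOLERANCES.items())


def is_decoy(candidate: dict, actives: list[dict]) -> bool:
    """Physically similar to some active, chemically dissimilar to all of them."""
    similar_props = any(property_matched(candidate, a) for a in actives)
    dissimilar = all(tanimoto(candidate["fp"], a["fp"]) < MAX_TANIMOTO for a in actives)
    return similar_props and dissimilar


if __name__ == "__main__":
    active = {"mw": 300.0, "rot_bonds": 4, "hbd": 2, "hba": 5, "logp": 2.5,
              "fp": frozenset({1, 2, 3, 4})}
    candidate = {"mw": 310.0, "rot_bonds": 5, "hbd": 2, "hba": 4, "logp": 3.0,
                 "fp": frozenset({4, 7, 8, 9})}
    print(is_decoy(candidate, [active]))
```

Requiring dissimilarity to every active, rather than to just one, follows the abstract's wording ("without being chemically similar to any of the active ligands").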

• Integrating Ligand-Based and Protein-Centric Virtual Screening of Kinase Inhibitors Using Ensembles of Multiple Protein Kinase Genes and Conformations.
Dixit, Anshuman and Verkhivker, Gennady M
Journal of chemical information and modeling, 2012, 52(10), 2501-2515
PMID: 22992037     doi: 10.1021/ci3002638

The rapidly growing wealth of structural and functional information about kinase genes and kinase inhibitors that is fueled by a significant therapeutic role of this protein family provides a significant impetus for development of targeted computational screening approaches. In this work, we explore an ensemble-based, protein-centric approach that allows for simultaneous virtual ligand screening against multiple kinase genes and multiple kinase receptor conformations. We systematically analyze and compare the results of ligand-based and protein-centric screening approaches using both single-receptor and ensemble-based docking protocols. A panel of protein kinase targets that includes ABL, EGFR, P38, CDK2, TK, and VEGFR2 kinases is used in this comparative analysis. By applying various performance metrics we have shown that ligand-centric shape matching can provide an effective enrichment of active compounds outperforming single-receptor docking screening. However, ligand-based approaches can be highly sensitive to the choice of inhibitor queries. Employment of multiple inhibitor queries combined with parallel selection ranking criteria can improve the performance and efficiency of ligand-based virtual screening. We also demonstrated that replica-exchange Monte Carlo docking with kinome-based ensembles of multiple crystal structures can provide a superior early enrichment on the kinase targets. The central finding of this study is that incorporation of the template-based structural information about kinase inhibitors and protein kinase structures in diverse functional states can significantly enhance the overall performance and robustness of both ligand and protein-centric screening strategies. The results of this study may be useful in virtual screening of kinase inhibitors potentially offering a beneficial spectrum of therapeutic activities across multiple disease states.

• Ligand-Based Virtual Screening Approach Using a New Scoring Function.
Hamza, Adel and Wei, Ning-Ning and Zhan, Chang-Guo
Journal of chemical information and modeling, 2012, 52(4), 963-974
PMID: 22486340     doi: 10.1021/ci200617d

In this study, we aimed to develop a new ligand-based virtual screening approach using an effective shape-overlapping procedure and a more robust scoring function (denoted by the HWZ score for convenience). The HWZ score-based virtual screening approach was tested against the compounds for 40 protein targets available in the Database of Useful Decoys (DUD; dud.docking.org/jahn/ ), and the virtual screening performance was evaluated in terms of the area under the receiver operator characteristic (ROC) curve (AUC), enrichment factor (EF), and hit rate (HR), demonstrating an improved overall performance compared to other popularly used approaches examined. In particular, the HWZ score-based virtual screening led to an average AUC value of 0.84 ± 0.02 (95% confidence interval) for the 40 targets. The average HR values at the top 1% and 10% of the active compounds for the 40 targets were 46.3% ± 6.7% and 59.2% ± 4.7%, respectively. In addition, the performance of the HWZ score-based virtual screening approach is less sensitive to the choice of the target.
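Two of the metrics named in this abstract, ROC AUC and enrichment factor, recur throughout the entries on this page. The following Python sketch shows the commonly used textbook definitions; it is an assumption that these match the exact formulas used in the paper, and the function names are hypothetical. Labels are 1 for actives and 0 for decoys, and higher scores are taken to mean "more likely active".

```python
def enrichment_factor(scores: list[float], labels: list[int], fraction: float) -> float:
    """Active rate in the top-scoring fraction divided by the overall active rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    hits_top = sum(label for _, label in ranked[:n_top])
    overall_rate = sum(labels) / len(labels)
    return (hits_top / n_top) / overall_rate


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random active outscores a random decoy (ties count 0.5)."""
    actives = [s for s, label in zip(scores, labels) if label]
    decoys = [s for s, label in zip(scores, labels) if not label]
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


if __name__ == "__main__":
    # Toy screen: 10 compounds, 2 actives ranked first and third.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(scores, labels, 0.2), roc_auc(scores, labels))
```

The pairwise AUC is quadratic in the number of compounds, which is fine for a toy example; a rank-based formulation would be the usual choice for benchmark sets the size of DUD.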

• Assessment of a Rule-Based Virtual Screening Technology (INDDEx) on a Benchmark Data Set.
Reynolds, Christopher R and Amini, Ata C and Muggleton, Stephen H and Sternberg, Michael J E
The journal of physical chemistry. B, 2012, 116(23), 6732-6739
PMID: 22380596     doi: 10.1021/jp212084f

The Investigational Novel Drug Discovery by Example (INDDEx) package has been developed to find active compounds by linking activity to chemical substructure and to guide the process of further drug development. INDDEx is a machine-learning technique, based on forming qualitative logical rules about substructural features of active molecules, weighting the rules to form a quantitative model, and then using the model to screen a molecular database. INDDEx is shown to be able to learn from multiple active compounds and to be useful for scaffold-hopping when performing virtual screening, giving high retrieval rates even when learning from a small number of compounds. Across the data sets tested, at 1% of the data, INDDEx was found to have average enrichment factors of 69.2, 82.7, and 90.4 when learning from 2, 4, and 8 active ligands, respectively. At 0.1% of the data, INDDEx had average enrichment factors of 492, 631, and 707 when learning from 2, 4, and 8 active ligands, respectively. Excluding all ligands with more than 0.5 Tanimoto Maximum Common Substructure, INDDEx had average enrichment factors at 1% of 52.3, 63.6, and 66.9 when learning from 2, 4, and 8 active ligands, respectively. The performance of INDDEx is compared with that of eHiTS LASSO, PharmaGist, and DOCK.

• Application of Support Vector Machine to Three-Dimensional Shape-Based Virtual Screening Using Comprehensive Three-Dimensional Molecular Shape Overlay with Known Inhibitors.
Sato, Tomohiro and Yuki, Hitomi and Takaya, Daisuke and Sasaki, Shunta and Tanaka, Akiko and Honma, Teruki
Journal of chemical information and modeling, 2012, 52(4), 1015-1026
PMID: 22424085     doi: 10.1021/ci200562p

In this study, machine learning using support vector machine was combined with three-dimensional (3D) molecular shape overlay, to improve the screening efficiency. Since the 3D molecular shape overlay does not use fingerprints or descriptors to compare two compounds, unlike 2D similarity methods, the application of machine learning to a 3D shape-based method has not been extensively investigated. The 3D similarity profile of a compound is defined as the array of 3D shape similarities with multiple known active compounds of the target protein and is used as the explanatory variable of support vector machine. As the measures of 3D shape similarity for our new prediction models, the prediction performances of the 3D shape similarity metrics implemented in ROCS, such as ShapeTanimoto and ScaledColor, were validated, using the known inhibitors of 15 target proteins derived from the ChEMBL database. The learning models based on the 3D similarity profiles stably outperformed the original ROCS when more than 10 known inhibitors were available as the queries. The results demonstrated the advantages of combining machine learning with the 3D similarity profile to process the 3D shape information of plural active compounds.

• Improving Classical Substructure-Based Virtual Screening to Handle Extrapolation Challenges.
Biniashvili, Tammy and Schreiber, Ehud and Kliger, Yossef
Journal of chemical information and modeling, 2012, 52(3), 678-685
PMID: 22360790     doi: 10.1021/ci200472s

Target-oriented substructure-based virtual screening (sSBVS) of molecules is a promising approach in drug discovery. Yet, there are doubts whether sSBVS is suitable also for extrapolation, that is, for detecting molecules that are very different from those used for training. Herein, we evaluate the predictive power of classic virtual screening methods, namely, similarity searching using Tanimoto coefficient (MTC) and Naive Bayes (NB). As could be expected, these classic methods perform better in interpolation than in extrapolation tasks. Consequently, to enhance the predictive ability for extrapolation tasks, we introduce the Shadow approach, in which inclusion relations between substructures are considered, as opposed to the classic sSBVS methods that assume independence between substructures. Specifically, we discard contributions from substructures included in ("shaded" by) others which are, in turn, included in the molecule of interest. Indeed, the Shadow classifier significantly outperforms both MTC (pValue

• A reverse combination of structure-based and ligand-based strategies for virtual screening.
Cortés-Cabrera, Alvaro and Gago, Federico and Morreale, Antonio
Journal of computer-aided molecular design, 2012, 26(3), 319-327
PMID: 22395903     doi: 10.1007/s10822-012-9558-x

A new approach is presented that combines structure- and ligand-based virtual screening in a reverse way. Opposite to the majority of the methods, a docking protocol is first employed to prioritize small ligands ("fragments") that are subsequently used as queries to search for similar larger ligands in a database. For a given chemical library, a three-step strategy is followed consisting of (1) contraction into a representative, non-redundant, set of fragments, (2) selection of the three best-scoring fragments docking into a given macromolecular target site, and (3) expansion of the fragments' structures back into ligands by using them as queries to search the library by means of fingerprint descriptions and similarity criteria. We tested the performance of this approach on a collection of fragments and ligands found in the ZINC database and the directory of useful decoys, and compared the results with those obtained using a standard docking protocol. The new method provided better overall results and was several times faster. We also studied the chemical diversity that both methods cover using an in-house compound library and concluded that the novel approach performs similarly but at a much smaller computational cost.

• Systematic assessment of scaffold distances in ChEMBL: prioritization of compound data sets for scaffold hopping analysis in virtual screening.
Li, Ruifang and Bajorath, Jürgen
Journal of computer-aided molecular design, 2012, 26(10), 1101-1109
PMID: 22972561     doi: 10.1007/s10822-012-9603-9

The evaluation of the scaffold hopping potential of computational methods is of high relevance for virtual screening. For benchmark calculations, classes of known active compounds are utilized. Ideally, such classes should have a well-defined content of structurally diverse scaffolds. However, in reported benchmark investigations, the choice of activity classes is often difficult to rationalize. To provide a compendium of well-characterized test cases for the assessment of scaffold hopping potential, structural distances between scaffolds were systematically calculated for compound classes available in the ChEMBL database. Nearly seven million scaffold pairs were evaluated. On the basis of the global scaffold distance distribution, a threshold value for large scaffold distances was determined. Compound data sets were ranked based on the proportion of scaffold pairs with large distances they contained, taking additional criteria into account that are relevant for virtual screening. A set of 50 activity classes is provided that represent attractive test cases for scaffold hopping analysis and benchmark calculations.

• Virtual screening data fusion using both structure- and ligand-based methods.
Svensson, Fredrik and Karlén, Anders and Sköld, Christian
Journal of chemical information and modeling, 2012, 52(1), 225-232
PMID: 22148635     doi: 10.1021/ci2004835

Virtual screening is widely applied in drug discovery, and significant effort has been put into improving current methods. In this study, we have evaluated the performance of compound ranking in virtual screening using five different data fusion algorithms on a total of 16 data sets. The data were generated by docking, pharmacophore search, shape similarity, and electrostatic similarity, spanning both structure- and ligand-based methods. The algorithms used for data fusion were sum rank, rank vote, sum score, Pareto ranking, and parallel selection. None of the fusion methods require any prior knowledge or input other than the results from the single methods and, thus, are readily applicable. The results show that compound ranking using data fusion improves the performance and consistency of virtual screening compared to the single methods alone. The best performing data fusion algorithm was parallel selection, but both rank voting and Pareto ranking also have good performance.
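As a rough illustration of the rank-based fusion named in this abstract, the sketch below implements a simple "sum rank" scheme. The function names, the toy scores, and the convention that higher scores are better for every method are assumptions made for the example, not details taken from the paper.

```python
def ranks_from_scores(scores):
    """Map compound id -> rank (1 = best), assuming higher scores are better."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {cid: pos for pos, cid in enumerate(ordered, start=1)}


def sum_rank_fusion(score_tables):
    """Fuse several screening methods by summing each compound's per-method ranks.

    Compounds with the lowest rank sum come first; ties are broken by id.
    Only compounds scored by every method are kept.
    """
    rank_tables = [ranks_from_scores(table) for table in score_tables]
    compounds = set.intersection(*(set(table) for table in rank_tables))
    totals = {cid: sum(table[cid] for table in rank_tables) for cid in compounds}
    return sorted(totals, key=lambda cid: (totals[cid], cid))


# Hypothetical scores from two methods (already oriented so higher = better).
docking = {"c1": 7.2, "c2": 5.1, "c3": 6.0}
shape = {"c1": 0.40, "c2": 0.35, "c3": 0.90}
fused = sum_rank_fusion([docking, shape])
```

The other schemes listed in the abstract (rank vote, sum score, Pareto ranking, parallel selection) would replace only the aggregation step; the per-method ranking stays the same.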

• Kinase-Kernel Models: Accurate In silico Screening of 4 Million Compounds Across the Entire Human Kinome.
Martin, Eric and Mukherjee, Prasenjit
Journal of chemical information and modeling, 2012, 52(1), 156-170
PMID: 22133092     doi: 10.1021/ci200314j

Reliable in silico prediction methods promise many advantages over experimental high-throughput screening (HTS): vastly lower time and cost, affinity magnitude estimates, no requirement for a physical sample, and a knowledge-driven exploration of chemical space. For the specific case of kinases, given several hundred experimental IC(50) training measurements, the empirically parametrized profile-quantitative structure-activity relationship (profile-QSAR) and surrogate AutoShim methods developed at Novartis can predict IC(50) with a reliability approaching experimental HTS. However, in the absence of training data, prediction is much harder. The most common a priori prediction method is docking, which suffers from many limitations: It requires a protein structure, is slow, and cannot predict affinity. Highly accurate profile-QSAR models have now been built for roughly 100 kinases covering most of the kinome. Analyzing correlations among neighboring kinases shows that near neighbors share a high degree of SAR similarity. The novel chemogenomic kinase-kernel method reported here predicts activity for new kinases as a weighted average of predicted activities from profile-QSAR models for nearby neighbor kinases. Three different factors for weighting the neighbors were evaluated: binding site sequence identity to the kinase neighbors, similarity of the training set for each neighbor model to the compound being predicted, and accuracy of each neighbor model. Binding site sequence identity was by far most important, followed by chemical similarity. Model quality had almost no relevance. The median R(2)

• Ligand expansion in ligand-based virtual screening using relevance feedback.
Abdo, Ammar and Saeed, Faisal and Hamza, Hentabli and Ahmed, Ali and Salim, Naomie
Journal of computer-aided molecular design, 2012, 26(3), 279-287
PMID: 22249773     doi: 10.1007/s10822-012-9543-4

Query expansion is the process of reformulating an original query to improve retrieval performance in information retrieval systems. Relevance feedback is one of the most useful query modification techniques in information retrieval systems. In this paper, we introduce query expansion into ligand-based virtual screening (LBVS) using the relevance feedback technique. In this approach, a few high-ranking molecules of unknown activity are filtered from the outputs of a Bayesian inference network based on a single ligand molecule to form a set of ligand molecules. This set of ligand molecules is used to form a new ligand molecule. Simulated virtual screening experiments with the MDL Drug Data Report and maximum unbiased validation data sets show that the use of ligand expansion provides a very simple way of improving the LBVS, especially when the active molecules being sought have a high degree of structural heterogeneity. However, the effectiveness of the ligand expansion is slightly less when structurally-homogeneous sets of actives are being sought.

• COPICAT: A software system for predicting interactions between proteins and chemical compounds.
Sakakibara, Yasubumi and Hachiya, Tsuyoshi and Uchida, Miho and Nagamine, Nobuyoshi and Sugawara, Yohei and Yokota, Masahiro and Nakamura, Masaomi and Popendorf, Kris and Komori, Takashi and Sato, Kengo
Bioinformatics (Oxford, England), 2012, 28(5), 745-746
PMID: 22257668     doi: 10.1093/bioinformatics/bts031

SUMMARY: Since tens of millions of chemical compounds have been accumulated in public chemical databases, fast comprehensive computational methods to predict interactions between chemical compounds and proteins are needed for virtual screening of lead compounds. Previously, we proposed a novel method for predicting protein-chemical interactions using two-layer Support Vector Machine classifiers that require only readily available biochemical data, i.e., amino acid sequences of proteins and structure formulas of chemical compounds. In this paper, the method has been implemented as the COPICAT web service, with an easy-to-use front-end interface. Users can simply submit a protein-chemical interaction prediction job using a pre-trained classifier, or can even train their own classification model by uploading training data. COPICAT's fast and accurate computational prediction has enhanced lead compound discovery against a database of tens of millions of chemical compounds, implying that the search space for drug discovery is extended by more than 1,000 times compared with currently well-used high-throughput screening methodologies. AVAILABILITY: The COPICAT server is available at http://copicat.dna.bio.keio.ac.jp. All functions, including the prediction function are freely available via anonymous login without registration. Registered users, however, can use the system more intensively. CONTACT: yasu@bio.keio.ac.jp.

• Enrichment of virtual hits by progressive shape-matching and docking.
Choi, Jiwon and He, Ningning and Kim, Nayoung and Yoon, Sukjoon
Journal of molecular graphics & modelling, 2012, 32, 82-88
PMID: 22088763     doi: 10.1016/j.jmgm.2011.10.002

The main applications of virtual chemical screening include the selection of a minimal receptor-relevant subset of a chemical library with a maximal chemical diversity. We have previously reported that the combination of ligand-centric and receptor-centric virtual screening methods may provide a compromise between computational time and accuracy during the hit enrichment process. In the present work, we propose a "progressive distributed docking" method that improves the virtual screening process using an iterative combination of shape-matching and docking steps. Known ligands with low docking scores were used as initial 3D templates for the shape comparisons with the chemical library. Next, new compounds with good template shape matches and low receptor docking scores were selected for the next round of shape searching and docking. The present iterative virtual screening process was tested for enriching peroxisome proliferator-activated receptor and phosphoinositide 3-kinase relevant compounds from a selected subset of the chemical libraries. It was demonstrated that the iterative combination improved the lead-hopping practice by improving the chemical diversity in the selected list of virtual hits.

## 2011

• Potency-directed similarity searching using support vector machines.
Wassermann, Anne M and Heikamp, Kathrin and Bajorath, Jürgen
Chemical biology & drug design, 2011, 77(1), 30-38
PMID: 21114788     doi: 10.1111/j.1747-0285.2010.01059.x

Support vector machine modeling has become increasingly popular in chemoinformatics. Recently, several advanced support vector machine applications have been reported including, among others, multitask learning for ligand-target prediction. Here, we introduce another support vector machine approach to add compound potency information to similarity searching and enrich database selection sets with potent hits. For this purpose, we introduce a structure-activity kernel function and a potency-oriented support vector machine linear combination approach. Using fingerprint descriptors, potency-directed support vector machine searching has been successfully applied to four high-throughput screening data sets, and different support vector machine strategies have been compared. For potency-balanced compound reference sets, potency-directed support vector machine searching meets or exceeds recall rates of standard support vector machine calculations but detects many more potent hits.

• How do 2D fingerprints detect structurally diverse active compounds? Revealing compound subset-specific fingerprint features through systematic selection.
Heikamp, Kathrin and Bajorath, Jürgen
Journal of chemical information and modeling, 2011, 51(9), 2254-2265
PMID: 21793563     doi: 10.1021/ci200275m

In independent studies it has previously been demonstrated that two-dimensional (2D) fingerprints have scaffold hopping ability in virtual screening, although these descriptors primarily emphasize structural and/or topological resemblance of reference and database compounds. However, the mechanism by which such fingerprints enrich structurally diverse molecules in database selection sets is currently little understood. In order to address this question, similarity search calculations on 120 compound activity classes of varying structural diversity were carried out using atom environment fingerprints. Two feature selection methods, Kullback-Leibler divergence and gain ratio analysis, were applied to systematically reduce these fingerprints and generate alternative versions for searching. Gain ratio is a feature selection method from information theory that has thus far not been considered in fingerprint analysis. However, it is shown here to be an effective fingerprint feature selection approach. Following comparative feature selection and similarity searching, the compound recall characteristics of original and reduced fingerprint versions were analyzed in detail. Small sets of fingerprint features were found to distinguish subsets of active compounds from other database molecules. The compound recall of fingerprint similarity searching often resulted from a cumulative detection of distinct compound subsets by different fingerprint features, which provided a rationale for the scaffold hopping potential of these 2D fingerprints.

• SHAFTS: A Hybrid Approach for 3D Molecular Similarity Calculation. 1. Method and Assessment of Virtual Screening.
Liu, Xiaofeng and Jiang, Hualiang and Li, Honglin
Journal of chemical information and modeling, 2011, 51(9), 2372-2385
PMID: 21819157     doi: 10.1021/ci200060s

We developed a novel approach called SHAFTS (SHApe-FeaTure Similarity) for 3D molecular similarity calculation and ligand-based virtual screening. SHAFTS adopts a hybrid similarity metric combined with molecular shape and colored (labeled) chemistry groups annotated by pharmacophore features for 3D similarity calculation and ranking, which is designed to integrate the strength of pharmacophore matching and volumetric overlay approaches. A feature triplet hashing method is used for fast molecular alignment poses enumeration, and the optimal superposition between the target and the query molecules can be prioritized by calculating corresponding "hybrid similarities". SHAFTS is suitable for large-scale virtual screening with single or multiple bioactive compounds as the query "templates" regardless of whether corresponding experimentally determined conformations are available. Two public test sets (DUD and Jain's sets) including active and decoy molecules from a panel of useful drug targets were adopted to evaluate the virtual screening performance. SHAFTS outperformed several other widely used virtual screening methods in terms of enrichment of known active compounds as well as novel chemotypes, thereby indicating its robustness in hit compounds identification and potential of scaffold hopping in virtual screening.

• Predicting the performance of fingerprint similarity searching.
Vogt, Martin and Bajorath, Jürgen
Methods in molecular biology (Clifton, N.J.), 2011, 672, 159-173
PMID: 20838968     doi: 10.1007/978-1-60761-839-3_6

Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are utilized in virtual screening and their search performance essentially depends on three parameters: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. It is of considerable interest and practical relevance to predict the performance of fingerprint similarity searching. A quantitative assessment of the potential that a fingerprint search might successfully retrieve active compounds, if available in the screening database, would substantially help to select the type of fingerprint most suitable for a given search problem. The method presented herein utilizes concepts from information theory to relate the fingerprint feature distributions of reference compounds to screening libraries. If these feature distributions do not sufficiently differ, active database compounds that are similar to reference molecules cannot be retrieved because they disappear in the "background." By quantifying the difference in feature distribution using the Kullback-Leibler divergence and relating the divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be quantitatively predicted.
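To make the Kullback-Leibler comparison of feature distributions concrete, here is a minimal sketch. It assumes fingerprints are given as sets of on-bit indices, treats every bit as an independent Bernoulli variable, and smooths frequencies with a pseudocount. Those modeling choices and all names are illustrative assumptions, not the authors' exact formulation.

```python
import math


def bit_frequencies(fingerprints, n_bits, pseudocount=1.0):
    """Smoothed probability that each bit position is set in a compound set."""
    n = len(fingerprints)
    return [
        (sum(1 for fp in fingerprints if bit in fp) + pseudocount)
        / (n + 2 * pseudocount)
        for bit in range(n_bits)
    ]


def kl_divergence(p, q):
    """Sum of per-bit KL divergences between Bernoulli(p_i) and Bernoulli(q_i)."""
    total = 0.0
    for pi, qi in zip(p, q):
        total += pi * math.log(pi / qi)
        total += (1 - pi) * math.log((1 - pi) / (1 - qi))
    return total


# Toy data: reference actives versus a screening database (4-bit fingerprints).
reference = [{0, 1}, {0, 2}]
database = [{1}, {2}, {3}, {1, 3}]
p_ref = bit_frequencies(reference, n_bits=4)
q_db = bit_frequencies(database, n_bits=4)
divergence = kl_divergence(p_ref, q_db)
```

A divergence near zero corresponds to the situation described above in which reference-like actives would disappear in the database "background".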

• Using consensus-shape clustering to identify promiscuous ligands and protein targets and to choose the right query for shape-based virtual screening.
Pérez-Nueno, Violeta I and Ritchie, David W
Journal of chemical information and modeling, 2011, 51(6), 1233-1248
PMID: 21604699     doi: 10.1021/ci100492r

Ligand-based shape matching approaches have become established as important and popular virtual screening (VS) techniques. However, despite their relative success, many authors have discussed how best to choose the initial query compounds and which of their conformations should be used. Furthermore, it is increasingly the case that pharmaceutical companies have multiple ligands for a given target and these may bind in different ways to the same pocket. Conversely, a given ligand can sometimes bind to multiple targets, and this is clearly of great importance when considering drug side-effects. We recently introduced the notion of spherical harmonic-based "consensus shapes" to help deal with these questions. Here, we apply a consensus shape clustering approach to the 40 protein-ligand targets in the DUD data set using PARASURF/PARAFIT. Results from clustering show that in some cases the ligands for a given target are split into two subgroups which could suggest they bind to different subsites of the same target. In other cases, our clustering approach sometimes groups together ligands from different targets, and this suggests that those ligands could bind to the same targets. Hence spherical harmonic-based clustering can rapidly give cross-docking information while avoiding the expense of performing all-against-all docking calculations. We also report on the effect of the query conformation on the performance of shape-based screening of the DUD data set and the potential gain in screening performance by using consensus shapes calculated in different ways. We provide details of our analysis of shape-based screening using both PARASURF/PARAFIT and ROCS, and we compare the results obtained with shape-based and conventional docking approaches using MSSH/SHEF and GOLD. The utility of each type of query is analyzed using commonly reported statistics such as enrichment factors (EF) and receiver-operator-characteristic (ROC) plots as well as other early performance metrics.
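The enrichment factor (EF) mentioned at the end of this abstract is commonly computed as the hit rate in the top fraction of a ranked list divided by the hit rate in the whole list. The sketch below follows that common definition; the function name, the rounding of the cutoff, and the toy ranking are assumptions for illustration.

```python
def enrichment_factor(ranked_ids, active_ids, fraction):
    """EF at the given fraction: hit rate in the top slice over the overall hit rate."""
    n_total = len(ranked_ids)
    n_top = max(1, round(n_total * fraction))  # size of the top slice
    actives = set(active_ids)
    hits_top = sum(1 for cid in ranked_ids[:n_top] if cid in actives)
    hits_all = sum(1 for cid in ranked_ids if cid in actives)
    if hits_all == 0:
        raise ValueError("ranked list contains no known actives")
    return (hits_top / n_top) / (hits_all / n_total)


# Toy ranking, best first: two actives (a1, a2) among eight decoys.
ranking = ["a1", "d1", "a2", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
ef_20 = enrichment_factor(ranking, {"a1", "a2"}, fraction=0.2)
```

Here the top 20% holds one of the two actives, so the hit rate in the slice (0.5) is 2.5 times the overall hit rate (0.2).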

• Improving the accuracy of ultrafast ligand-based screening: incorporating lipophilicity into ElectroShape as an extra dimension.
Armstrong, M Stuart and Finn, Paul W and Morris, Garrett M and Richards, W Graham
Journal of computer-aided molecular design, 2011, 25(8), 785-790
PMID: 21822723     doi: 10.1007/s10822-011-9463-8

In a previous paper, we presented the ElectroShape method, which we used to achieve successful ligand-based virtual screening. It extended classical shape-based methods by applying them to the four-dimensional shape of the molecule where partial charge was used as the fourth dimension to capture electrostatic information. This paper extends the approach by using atomic lipophilicity (alogP) as an additional molecular property and validates it using the improved release 2 of the Directory of Useful Decoys (DUD). When alogP replaced partial charge, the enrichment results were slightly below those of ElectroShape, though still far better than purely shape-based methods. However, when alogP was added as a complement to partial charge, the resulting five-dimensional enrichments show a clear improvement in performance. This demonstrates the utility of extending the ElectroShape virtual screening method by adding other atom-based descriptors.

• Rapid Shape-Based Ligand Alignment and Virtual Screening Method Based on Atom/Feature-Pair Similarities and Volume Overlap Scoring.
Sastry, Madhavi and Dixon, Steve and Sherman, Woody
Journal of chemical information and modeling, 2011, 51(10), 2455-2466
PMID: 21870862     doi: 10.1021/ci2002704

Shape-based methods for aligning and scoring ligands have proven to be valuable in the field of computer-aided drug design. Here, we describe a new shape-based flexible ligand superposition and virtual screening method, Phase Shape, which is shown to rapidly produce accurate 3D ligand alignments and efficiently enrich actives in virtual screening. We describe the methodology, which is based on the principle of atom distribution triplets to rapidly define trial alignments, followed by refinement of top alignments to maximize the volume overlap. The method can be run in a shape-only mode or it can include atom types or pharmacophore feature encoding, the latter consistently producing the best results for database screening. We apply Phase Shape to flexibly align molecules that bind to the same target and show that the method consistently produces correct alignments when compared with crystal structures. We then illustrate the effectiveness of the method for identifying active compounds in virtual screening of eleven diverse targets. Multiple parameters are explored, including atom typing, query structure conformation, and the database conformer generation protocol. We show that Phase Shape performs well in database screening calculations when compared with other shape-based methods using a common set of actives and decoys from the literature.

• Introduction of the conditional correlated Bernoulli model of similarity value distributions and its application to the prospective prediction of fingerprint search performance.
Vogt, Martin and Bajorath, Jürgen
Journal of chemical information and modeling, 2011, 51(10), 2496-2506
PMID: 21892818     doi: 10.1021/ci2003472

A statistical approach named the conditional correlated Bernoulli model is introduced for modeling of similarity scores and predicting the potential of fingerprint search calculations to identify active compounds. Fingerprint features are rationalized as dependent Bernoulli variables and conditional distributions of Tanimoto similarity values of database compounds given a reference molecule are assessed. The conditional correlated Bernoulli model is utilized in the context of virtual screening to estimate the position of a compound obtaining a certain similarity value in a database ranking. Through the generation of receiver operating characteristic curves from cumulative distribution functions of conditional similarity values for known active and random database compounds, one can predict how successful a fingerprint search might be. The comparison of curves for different fingerprints makes it possible to identify fingerprints that are most likely to identify new active molecules in a database search given a set of known reference molecules.
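The Tanimoto similarity values that this model describes can be computed as in the short sketch below, which also ranks a database against one reference molecule. Representing fingerprints as sets of on-bit indices, and all names and data, are assumptions for the example; the statistical model itself is not reproduced here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits present in either."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_database(reference, database):
    """Return (similarity, compound id) pairs, most similar first."""
    scored = [(tanimoto(reference, fp), cid) for cid, fp in database.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Toy reference and database fingerprints.
reference_fp = {1, 2, 3}
database_fps = {"m1": {2, 3, 4}, "m2": {1, 2, 3}, "m3": {7, 8}}
ranking = rank_database(reference_fp, database_fps)
```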

• Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision.
Holliday, John D and Kanoulas, Evangelos and Malim, Nurul and Willett, Peter
Journal of cheminformatics, 2011, 3(1), 29
PMID: 21824430     doi: 10.1186/1758-2946-3-29

• G-protein coupled receptors virtual screening using genetic algorithm focused chemical space.
Sage, Carleton and Wang, Runtong and Jones, Gareth
Journal of chemical information and modeling, 2011, 51(8), 1754-1761
PMID: 21761904     doi: 10.1021/ci200043z

Exploiting the ever growing set of activity data for compounds against biological targets represents both a challenge and an opportunity for ligand-based virtual screening (LBVS). Because G-protein coupled receptors (GPCRs) represent a rich set of potential drug targets, we sought to develop an appropriate method to examine large sets of GPCR ligand information for both screening collection enhancement and hit expansion. To this end, we have implemented a modified version of BDACCS that removes highly correlated descriptors (rBDACCS). To test the hypothesis that a smaller, focused descriptor set would improve performance, we have extended rBDACCS by using a genetic algorithm (GA) to choose target-specific descriptors appropriate for selecting the set of 100 compounds most likely to be active from a decoy database. We have called this method GA-focused descriptor active space (GAFDAS). We compared the performance of rBDACCS and GAFDAS using a collection of activity data for 252 GPCR/ligand sets versus two decoy databases. While both methods appear effective in LBVS, overall GAFDAS performs better than rBDACCS in the early selection of compounds against both decoy databases.

## 2010

• ElectroShape: fast molecular similarity calculations incorporating shape, chirality and electrostatics.
Armstrong, M Stuart and Morris, Garrett M and Finn, Paul W and Sharma, Raman and Moretti, Loris and Cooper, Richard I and Richards, W Graham
Journal of computer-aided molecular design, 2010, 24(9), 789-801
PMID: 20614163     doi: 10.1007/s10822-010-9374-0

We present ElectroShape, a novel ligand-based virtual screening method, that combines shape and electrostatic information into a single, unified framework. Building on the ultra-fast shape recognition (USR) approach for fast non-superpositional shape-based virtual screening, it extends the method by representing partial charge information as a fourth dimension. It also incorporates the chiral shape recognition (CSR) method, which distinguishes enantiomers. It has been validated using release 2 of the Directory of Useful Decoys (DUD), and shows a near doubling in enrichment ratio at 1% over USR and CSR, and improvements as measured by Receiver Operating Characteristic curves. These improvements persisted even after taking into account the chemotype redundancy in the sets of active ligands in DUD. During the course of its development, ElectroShape revealed a difference in the charge allocation of the DUD ligand and decoy sets, leading to several new versions of DUD being generated as a result. ElectroShape provides a significant addition to the family of ultra-fast ligand-based virtual screening methods, and its higher-dimensional shape recognition approach has great potential for extension and generalisation.
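For orientation, the sketch below shows the plain three-dimensional USR idea that ElectroShape builds on: distances from every atom to four reference points are summarized by three moments each, giving a 12-number descriptor that can be compared without superposition. It does not implement the charge or chirality extensions. The exact choice of moments (mean, standard deviation, cube root of the third central moment), the helper names, and the coordinates are assumptions, since implementations differ in these details.

```python
import math


def _moments(values):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1 / 3), third)]


def usr_descriptor(coords):
    """12-element shape descriptor from a list of (x, y, z) atom coordinates."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor += _moments([math.dist(c, ref) for c in coords])
    return descriptor


def usr_similarity(desc_a, desc_b):
    """Inverse of one plus the mean absolute descriptor difference (1.0 = identical)."""
    diff = sum(abs(x - y) for x, y in zip(desc_a, desc_b)) / len(desc_a)
    return 1.0 / (1.0 + diff)


# Hypothetical coordinates; the shifted copy should score as identical in shape.
mol = [(0.0, 0.0, 0.0), (1.4, 0.1, 0.0), (2.1, 1.3, 0.2), (3.6, 1.1, -0.4), (0.2, -1.2, 0.9)]
shifted = [(x + 3.0, y - 2.0, z + 1.0) for x, y, z in mol]
score = usr_similarity(usr_descriptor(mol), usr_descriptor(shifted))
```

Because only interatomic and atom-to-reference distances enter the descriptor, translating or rotating a molecule leaves it unchanged, which is what removes the need for alignment.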

• Selection of in silico drug screening results by using universal active probes (UAPs).
Fukunishi, Yoshifumi and Ohno, Kazuki and Orita, Masaya and Nakamura, Haruki
Journal of chemical information and modeling, 2010, 50(7), 1233-1240
PMID: 20578712     doi: 10.1021/ci100108p

We developed a new method that uses a set of drug-like compounds to select reliable in silico drug screening results. If some active compounds are known, the screening results that rank these active compounds at the top should be reliable. If no active compound is known, how to select the result is in question. We propose a concept of a set of "universal active probes" (UAPs), which is a set of small active compounds that bind to different kinds of proteins. We found that the hit ratio of the true active compounds in in silico screening shows positive correlation to that of the UAPs, probably because UAPs form a set of drug-like compounds. Thus, if the UAPs were added to the compound library, the screening result that shows a high hit ratio of the UAPs could give reliable actual hit compounds for the target protein. We examined this method for several targets and found this idea useful.

• Comparison of three preprocessing filters efficiency in virtual screening: identification of new putative LXRbeta regulators as a test case.
Ghemtio, Léo and Devignes, Marie-Dominique and Smaïl-Tabbone, Malika and Souchet, Michel and Leroux, Vincent and Maigret, Bernard
Journal of chemical information and modeling, 2010, 50(5), 701-715
PMID: 20420434     doi: 10.1021/ci900356m

In silico screening methodologies are widely recognized as efficient approaches in early steps of drug discovery. However, in the virtual high-throughput screening (VHTS) context, where hit compounds are searched among millions of candidates, three-dimensional comparison techniques and knowledge discovery from databases should offer a better efficiency to finding novel drug leads than those of computationally expensive molecular dockings. Therefore, the present study aims at developing a filtering methodology to efficiently eliminate unsuitable compounds in VHTS process. Several filters are evaluated in this paper. The first two are structure-based and rely on either geometrical docking or pharmacophore depiction. The third filter is ligand-based and uses knowledge-based and fingerprint similarity techniques. These filtering methods were tested with the Liver X Receptor (LXR) as a target of therapeutic interest, as LXR is a key regulator in maintaining cholesterol homeostasis. The results show that the three considered filters are complementary so that their combination should generate consistent compound lists of potential hits.

• FLAP: GRID molecular interaction fields in virtual screening. validation using the DUD data set.
Cross, Simon and Baroni, Massimo and Carosati, Emanuele and Benedetti, Paolo and Clementi, Sergio
Journal of chemical information and modeling, 2010, 50(8), 1442-1450
PMID: 20690627     doi: 10.1021/ci100221g

The performance of FLAP (Fingerprints for Ligands and Proteins) in virtual screening is assessed using a subset of the DUD (Directory of Useful Decoys) benchmarking data set containing 13 targets each with more than 15 different chemotype classes. A variety of ligand and receptor-based virtual screening approaches are examined, using combinations of individual templates: 2D structures of known actives, a cocrystallized ligand, a receptor structure, or a cocrystallized ligand-biased receptor structure. We examine several data fusion approaches to combine the results of the individual virtual screens. In doing so, we show that excellent chemotype enrichment is achieved in both single target ligand-based and receptor-based approaches, of approximately 17-fold over random on average at a false positive rate of 1%. We also show that using as much starting knowledge as possible improves chemotype enrichment, and that data fusion using Pareto ranking is an effective method to do this giving up to 50% improvement in enrichment over the single methods. Finally we show that if inactivity or decoy data is incorporated, automatically training the scoring function in FLAP improves recovery still further, with almost 2-fold improvement over the enrichments shown by the single methods. The results clearly demonstrate the utility of FLAP for virtual screening when either a limited or wide range of prior knowledge is available.

• Efficient virtual screening using multiple protein conformations described as negative images of the ligand-binding site.
Virtanen, Salla I and Pentikäinen, Olli T
Journal of chemical information and modeling, 2010, 50(6), 1005-1011
PMID: 20504004     doi: 10.1021/ci100121c

The protein structure-based virtual screening is typically accomplished using a molecular docking procedure. However, docking is a fairly slow process that is limited by the available scoring functions that cannot reliably distinguish between active and inactive ligands. In contrast, the ligand-based screening methods that are based on shape similarity identify the active ligands with high accuracy. Here, we show that the usage of negative images of the ligand-binding site, together with shape comparison tools, which are typically used in ligand-based virtual screening, improve the discrimination of active molecules from inactives. In contrast to ligand-based shape comparison, the negative image of the binding site allows identification of compounds whose shape complements the shape of the ligand-binding cavity as closely as possible. Furthermore, the use of several target protein conformations allows the identification of active ligands whose shape is not optimal for crystallized protein conformation. Accordingly, the presented virtual screening method improves the identification of novel lead molecules by concentrating on the optimally shaped molecules for the flexible ligand binding site.

## 2009

• Reverse fingerprinting and mutual information-based activity labeling and scoring (MIBALS).
Williams, Chris and Schreyer, Suzanne K
Combinatorial chemistry & high throughput screening, 2009, 12(4), 424-439
PMID: 19442069

A mutual information based activity labeling and scoring (MIBALS) approach to reverse fingerprint analysis is presented. Whole molecule scores produced by the method are shown to be capable of ranking compounds in virtual high-throughput screening (vHTS) experiments, while fragment scores produced by the method are able to identify pharmacophore moieties important for biological activity. The performance of MIBALS in vHTS experiments is assessed using reference ligands active against 40 different biological targets, and MIBALS retrieval rates are compared with those obtained using more traditional group fusion similarity search methods. The use of MIBALS to identify important pharmacophore fragments is demonstrated by comparing ligand fragment scores with known pharmacophores and known ligand/protein contacts. The ability of MIBALS to highlight beneficial and detrimental groups in a congeneric series is examined by comparing MIBALS fragment scores with features in known structure-activity relationships.

• MMsINC: a large-scale chemoinformatics database.
Masciocchi, Joel and Frau, Gianfranco and Fanton, Marco and Sturlese, Mattia and Floris, Matteo and Pireddu, Luca and Palla, Piergiorgio and Cedrati, Fabian and Rodriguez-Tomé, Patricia and Moro, Stefano
Nucleic acids research, 2009, 37(Database issue), D284-90
PMID: 18931373     doi: 10.1093/nar/gkn727

MMsINC (http://mms.dsfarm.unipd.it/MMsINC/search) is a database of non-redundant, richly annotated and biomedically relevant chemical structures. A primary goal of MMsINC is to guarantee the highest quality and the uniqueness of each entry. MMsINC then adds value to these entries by including the analysis of crucial chemical properties, such as ionization and tautomerization processes, and the in silico prediction of 24 important molecular properties in the biochemical profile of each structure. MMsINC is consequently a natural input for different chemoinformatics and virtual screening applications. In addition, MMsINC supports various types of queries, including substructure queries and the novel 'molecular scissoring' query. MMsINC is interfaced with other primary data collectors, such as PubChem, Protein Data Bank (PDB), the Food and Drug Administration database of approved drugs and ZINC.

• Ligand prediction from protein sequence and small molecule information using support vector machines and fingerprint descriptors.
Geppert, Hanna and Humrich, Jens and Stumpfe, Dagmar and Gärtner, Thomas and Bajorath, Jürgen
Journal of chemical information and modeling, 2009, 49(4), 767-779
PMID: 19309114     doi: 10.1021/ci900004a

Support vector machine (SVM) database search strategies are presented that aim at the identification of small molecule ligands for targets for which no ligand information is currently available. In pharmaceutical research and chemical biology, this situation is faced, for example, when studying orphan targets or newly identified members of protein families. To investigate methods for de novo ligand identification in the absence of known three-dimensional target structures or active molecules, we have focused on combining sequence and ligand information for closely and distantly related proteins. To provide a basis for these investigations, a set of 11 protease targets from different families was assembled together with more than 2000 inhibitors directed against individual proteases. We have compared SVM approaches that combine protein sequence and ligand information in different ways and utilize 2D fingerprints as ligand descriptors. These methodologies were applied to search for inhibitors of individual proteases not taken into account during learning. A target sequence-ligand kernel and, in particular, a linear combination of multiple target-directed SVMs consistently identified inhibitors with high accuracy including test cases where homology-based similarity searching using data fusion and conventional SVM ranking nearly or completely failed. The SVM linear combination and target-ligand kernel methods described herein are intuitive and straightforward to adopt for ligand prediction against other targets.

• Scoring ligand similarity in structure-based virtual screening.
Zavodszky, Maria I and Rohatgi, Anjali and Van Voorst, Jeffrey R and Yan, Honggao and Kuhn, Leslie A
Journal of molecular recognition : JMR, 2009, 22(4), 280-292
PMID: 19235177     doi: 10.1002/jmr.942

Scoring to identify high-affinity compounds remains a challenge in virtual screening. On one hand, protein-ligand scoring focuses on weighting favorable and unfavorable interactions between the two molecules. Ligand-based scoring, on the other hand, focuses on how well the shape and chemistry of each ligand candidate overlay on a three-dimensional reference ligand. Our hypothesis is that a hybrid approach, using ligand-based scoring to rank dockings selected by protein-ligand scoring, can ensure that high-ranking molecules mimic the shape and chemistry of a known ligand while also complementing the binding site. Results from applying this approach to screen nearly 70 000 National Cancer Institute (NCI) compounds for thrombin inhibitors tend to support the hypothesis. EON ligand-based ranking of docked molecules yielded the majority (4/5) of newly discovered, low to mid-micromolar inhibitors from a panel of 27 assayed compounds, whereas ranking docked compounds by protein-ligand scoring alone resulted in one new inhibitor. Since the results depend on the choice of scoring function, an analysis of properties was performed on the top-scoring docked compounds according to five different protein-ligand scoring functions, plus EON scoring using three different reference compounds. The results indicate that the choice of scoring function, even among scoring functions measuring the same types of interactions, can have an unexpectedly large effect on which compounds are chosen from screening. Furthermore, there was almost no overlap between the top-scoring compounds from protein-ligand versus ligand-based scoring, indicating the two approaches provide complementary information. Matchprint analysis, a new addition to the SLIDE (Screening Ligands by Induced-fit Docking, Efficiently) screening toolset, facilitated comparison of docked molecules' interactions with those of known inhibitors. The majority of interactions conserved among top-scoring compounds for a given scoring function, and from the different scoring functions, proved to be conserved interactions in known inhibitors. This was particularly true in the S1 pocket, which was occupied by all the docked compounds.

• Ligand scaffold hopping combining 3D maximal substructure search and molecular similarity.
Quintus, Flavien and Sperandio, Olivier and Grynberg, Julien and Petitjean, Michel and Tuffery, Pierre
BMC Bioinformatics, 2009, 10(1), 245
PMID: 19671127     doi: 10.1186/1471-2105-10-245

BACKGROUND: Virtual screening methods are now well established as effective to identify hit and lead candidates and are fully integrated in most drug discovery programs. Ligand-based approaches make use of physico-chemical, structural and energetics properties of known active compounds to search large chemical libraries for related and novel chemotypes. While 2D-similarity search tools are known to be fast and efficient, the use of 3D-similarity search methods can be very valuable to many research projects as integration of "3D knowledge" can facilitate the identification of not only related molecules but also of chemicals possessing distant scaffolds as compared to the query and therefore be more inclined to scaffold hopping. To date, very few methods performing this task are easily available to the scientific community.

• Novel approach for efficient pharmacophore-based virtual screening: method and applications.
Dror, Oranit and Schneidman-Duhovny, Dina and Inbar, Yuval and Nussinov, Ruth and Wolfson, Haim J
Journal of chemical information and modeling, 2009, 49(10), 2333-2343
PMID: 19803502     doi: 10.1021/ci900263d

Virtual screening is emerging as a productive and cost-effective technology in rational drug design for the identification of novel lead compounds. An important model for virtual screening is the pharmacophore. A pharmacophore is the spatial configuration of essential features that enable a ligand molecule to interact with a specific target receptor. In the absence of a known receptor structure, a pharmacophore can be identified from a set of ligands that have been observed to interact with the target receptor. Here, we present a novel computational method for pharmacophore detection and virtual screening. The pharmacophore detection module is able to (i) align multiple flexible ligands in a deterministic manner without exhaustive enumeration of the conformational space, (ii) detect subsets of input ligands that may bind to different binding sites or have different binding modes, (iii) address cases where the input ligands have different affinities by defining weighted pharmacophores based on the number of ligands that share them, and (iv) automatically select the most appropriate pharmacophore candidates for virtual screening. The algorithm is highly efficient, allowing a fast exploration of the chemical space by virtual screening of huge compound databases. The performance of PharmaGist was successfully evaluated on a commonly used data set of G-Protein Coupled Receptor alpha1A. Additionally, a large-scale evaluation using the DUD (directory of useful decoys) data set was performed. DUD contains 2950 active ligands for 40 different receptors, with 36 decoy compounds for each active ligand. PharmaGist enrichment rates are comparable with other state-of-the-art tools for virtual screening.

• Ultrafast shape recognition: evaluating a new ligand-based virtual screening technology.
Ballester, Pedro J and Finn, Paul W and Richards, W Graham
Journal of molecular graphics & modelling, 2009, 27(7), 836-845
PMID: 19188082     doi: 10.1016/j.jmgm.2009.01.001

Large scale database searching to identify molecules that share a common biological activity for a target of interest is widely used in drug discovery. Such an endeavour requires the availability of a method encoding molecular properties that are indicative of biological activity and at least one active molecule to be used as a template. Molecular shape has been shown to be an important indicator of biological activity; however, currently used methods are relatively slow, so faster and more reliable methods are highly desirable. Recently, a new non-superposition based method for molecular shape comparison, called Ultrafast Shape Recognition (USR), has been devised with computational performance at least three orders of magnitude faster than previously existing methods. In this study, we investigate the performance of USR in retrieving biologically active compounds through retrospective Virtual Screening experiments. Results show that USR performs better on average than a commercially available shape similarity method, while screening conformers at a rate that is more than 2500 times faster. This outstanding computational performance is particularly useful for searching much larger portions of chemical space than previously possible, which makes USR a very valuable new tool in the search for new lead molecules for drug discovery programs.
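
The abstract above does not spell out the descriptor, so the following is only a minimal sketch of a USR-style comparison under stated assumptions: each molecule is summarized by 12 numbers (three moments of the atomic distance distributions measured from four reference points: the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one), and similarity is the inverse of one plus the mean absolute difference of the descriptors. The exact moment definitions used here (mean, standard deviation, signed cube root of the third central moment) and the function names `usr_descriptor` and `usr_similarity` are illustrative choices, not taken from this entry.

```python
import math


def _moments(distances):
    """Return three summary moments of a list of distances (illustrative choice)."""
    n = len(distances)
    mean = sum(distances) / n
    variance = sum((d - mean) ** 2 for d in distances) / n
    third = sum((d - mean) ** 3 for d in distances) / n
    # Signed cube root keeps the third moment in distance units.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(variance), skew]


def usr_descriptor(coords):
    """Build a 12-number shape descriptor from a list of (x, y, z) atom positions."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    d_centroid = [math.dist(c, centroid) for c in coords]
    closest = coords[d_centroid.index(min(d_centroid))]    # atom closest to centroid
    farthest = coords[d_centroid.index(max(d_centroid))]   # atom farthest from centroid
    d_farthest = [math.dist(c, farthest) for c in coords]
    far_from_far = coords[d_farthest.index(max(d_farthest))]  # atom farthest from that atom

    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor += _moments([math.dist(c, ref) for c in coords])
    return descriptor


def usr_similarity(desc_a, desc_b):
    """Similarity in (0, 1]; identical descriptors give exactly 1.0."""
    mean_abs_diff = sum(abs(a - b) for a, b in zip(desc_a, desc_b)) / len(desc_a)
    return 1.0 / (1.0 + mean_abs_diff)


# Hypothetical coordinates, for illustration only.
query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.2, 0.0), (3.7, 1.3, 0.4)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in query]
print(usr_similarity(usr_descriptor(query), usr_descriptor(shifted)))
```

Because the descriptor is built only from interatomic distances, no superposition of the two molecules is needed, which is consistent with the "non-superposition based" description in the abstract.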

## 2008

• Ligand-target interaction-based weighting of substructures for virtual screening.
Crisman, Thomas J and Sisay, Mihiret T. and Bajorath, Jürgen
Journal of chemical information and modeling, 2008, 48(10), 1955-1964
PMID: 18821751     doi: 10.1021/ci800229q

A methodology is introduced to assign energy-based scores to two-dimensional (2D) structural features based on three-dimensional (3D) ligand-target interaction information and utilize interaction-annotated features in virtual screening. Database molecules containing such fragments are assigned cumulative scores that serve as a measure of similarity to active reference compounds. The Interaction Annotated Structural Features (IASF) method is applied to mine five high-throughput screening (HTS) data sets and often identifies more hits than conventional fragment-based similarity searching or ligand-protein docking.

• FieldChopper, a new tool for automatic model generation and virtual screening based on molecular fields.
Kalliokoski, Tuomo and Ronkko, Toni and Poso, Antti
Journal of chemical information and modeling, 2008, 48(6), 1131-1137
PMID: 18489083     doi: 10.1021/ci700216u

Algorithms were developed for ligand-based virtual screening of molecular databases. FieldChopper (FC) is based on the discretization of the electrostatic and van der Waals field into three classes. A model is built from a set of superimposed active molecules. The similarity of the compounds in the database to the model is then calculated using matrices that define scores for comparing field values of different categories. The method was validated using 12 publicly available data sets by comparing the method to the electrostatic similarity comparison program EON. The results suggest that FC is competitive with more complex descriptors and could be used as a molecular sieve in virtual screening experiments when multiple active ligands are known.

• LASSO-ligand activity by surface similarity order: a new tool for ligand based virtual screening
Reid, Darryl and Sadjad, Bashir S and Zsoldos, Zsolt and Simon, Aniko
Journal of computer-aided molecular design, 2008, 22, 479-487
PMID: 18204980     doi: 10.1007/s10822-007-9164-5

Virtual Ligand Screening (VLS) has become an integral part of the drug discovery process for many pharmaceutical companies. Ligand similarity searches provide a very powerful method of screening large databases of ligands to identify possible hits. If these hits belong to new chemotypes the method is deemed even more successful. eHiTS LASSO uses a new interacting surface point types (ISPT) molecular descriptor that is generated from the 3D structure of the ligand, but unlike most 3D descriptors it is conformation independent. Combined with a neural network machine learning technique, LASSO screens molecular databases at an ultra fast speed of 1 million structures in under 1 min on a standard PC. The results obtained from eHiTS LASSO trained on relatively small training sets of just 2, 4 or 8 actives are presented using the diverse directory of useful decoys (DUD) dataset. It is shown that over a wide range of receptor families, eHiTS LASSO is consistently able to enrich screened databases and provides scaffold hopping ability.

• FieldScreen: virtual screening using molecular fields. Application to the DUD data set
Cheeseright, TJ and Mackey, MD
Journal of chemical ..., 2008, 48(11), 2108-2117

FieldScreen, a ligand-based Virtual Screening (VS) method, is described. Its use of 3D molecular fields makes it particularly suitable for scaffold hopping, and we have rigorously validated it for this purpose using a clustered version of the Directory of Useful Decoys (DUD). Using thirteen pharmaceutically relevant targets, we demonstrate that FieldScreen produces superior early chemotype enrichments, compared to DOCK. Additionally, hits retrieved by FieldScreen are consistently lower in molecular weight than those retrieved by docking. Where no X-ray protein structures are available, FieldScreen searches are more robust than docking into homology models or apo structures.

## 2007

• MED-SuMoLig: a new ligand-based screening tool for efficient scaffold hopping.
Sperandio, Olivier and Andrieu, Olivier and Miteva, Maria A. and Vo, Minh-Quang and Souaille, Marc and Delfaud, Francois and Villoutreix, Bruno O.
Journal of chemical information and modeling, 2007, 47(3), 1097-1110
PMID: 17477521     doi: 10.1021/ci700031v

The identification of small molecules with selective bioactivity, whether intended as potential therapeutics or as tools for experimental research, is central to progress in medicine and in the life sciences. To facilitate such study, we have developed a ligand-based program well-suited for effective screening of large compound collections. This package, MED-SuMoLig, combines a SMARTS-driven substructure search aiming at 3D pharmacophore profiling and computation of the local atomic density of the compared molecules. The screening utility was then investigated using 52 diverse active molecules (against CDK2, Factor Xa, HIV-1 protease, neuraminidase, ribonuclease A, and thymidine kinase) merged to a library of about 40,000 putative inactive (druglike) compounds. In all cases, the program recovered more than half of the actives in the top 3% of the screened library. We also compared the performance of MED-SuMoLig with that of ChemMine or of ROCS and found that MED-SuMoLig outperformed both methods for CDK2 and Factor Xa in terms of enrichment rates or performed equally well for the other targets.

• QUASI: a novel method for simultaneous superposition of multiple flexible ligands and virtual screening using partial similarity.
Todorov, Nikolay P and Alberts, Ian L and de Esch, Iwan J P and Dean, Philip M
Journal of chemical information and modeling, 2007, 47(3), 1007-1020
PMID: 17497844     doi: 10.1021/ci6003338

The structure of many receptors is unknown, and only information about diverse ligands binding to them is available. A new method is presented for the superposition of such ligands, derivation of putative receptor site models and utilization of the models for screening of compound databases. In order to generate a receptor model, the similarity of all ligands is optimized simultaneously taking into account conformational flexibility and also the possibility that the ligands can bind to different regions of the site and only partially overlap. Ligand similarity is defined with respect to a receptor site model serving as a common reference frame. The receptor model is dynamic and coevolves with the ligand alignment until an optimal self-consistent superposition is achieved. When ligand conformational flexibility is permitted, different superposition models are possible and consistent with the data. Clustering of the superposition solutions is used to obtain diverse models. When the models are used to screen a database of compounds, high enrichments are obtained, comparable to those obtained in docking studies.

• Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem.
Truchon, Jean-Francois and Bayly, Christopher I
Journal of chemical information and modeling, 2007, 47(2), 488-508
PMID: 17288412     doi: 10.1021/ci600426e

Many metrics are currently used to evaluate the performance of ranking methods in virtual screening (VS), for instance, the area under the receiver operating characteristic curve (ROC), the area under the accumulation curve (AUAC), the average rank of actives, the enrichment factor (EF), and the robust initial enhancement (RIE) proposed by Sheridan et al. In this work, we show that the ROC, the AUAC, and the average rank metrics have the same inappropriate behaviors that make them poor metrics for comparing VS methods whose purpose is to rank actives early in an ordered list (the "early recognition problem"). In doing so, we derive mathematical formulas that relate those metrics together. Moreover, we show that the EF metric is not sensitive to ranking performance before and after the cutoff. Instead, we formally generalize the ROC metric to the early recognition problem which leads us to propose a novel metric called the Boltzmann-enhanced discrimination of receiver operating characteristic that turns out to contain the discrimination power of the RIE metric but incorporates the statistical significance from ROC and its well-behaved boundaries. Finally, two major sources of errors, namely, the statistical error and the "saturation effects", are examined. This leads to practical recommendations for the number of actives, the number of inactives, and the "early recognition" importance parameter that one should use when comparing ranking methods. Although this work is applied specifically to VS, it is general and can be used to analyze any method that needs to segregate actives toward the front of a rank-ordered list.
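
To make the metrics named in this abstract concrete, here is a minimal sketch, assuming the screening result is given as a list of 1/0 labels (active/inactive) in ranked order, best first. The enrichment factor and ROC area follow their usual definitions; RIE is written as the mean exponential weight of the actives divided by the mean weight over all ranks, and BEDROC is obtained by rescaling RIE between its values for the worst and best possible orderings. The function names and the value `alpha=20.0` are illustrative assumptions, not recommendations taken from this entry.

```python
import math


def enrichment_factor(labels, fraction):
    """Hit rate in the top `fraction` of the list divided by the overall hit rate."""
    n_total = len(labels)
    n_top = max(1, int(round(fraction * n_total)))
    return (sum(labels[:n_top]) / n_top) / (sum(labels) / n_total)


def roc_auc(labels):
    """Fraction of (active, inactive) pairs in which the active is ranked first."""
    n_actives = sum(labels)
    n_inactives = len(labels) - n_actives
    inactives_below = 0
    ordered_pairs = 0
    for label in reversed(labels):      # walk from the bottom of the list upward
        if label:
            ordered_pairs += inactives_below
        else:
            inactives_below += 1
    return ordered_pairs / (n_actives * n_inactives)


def rie(labels, alpha):
    """Mean exponential weight of the actives relative to the mean over all ranks."""
    n_total = len(labels)
    active_ranks = [i + 1 for i, label in enumerate(labels) if label]
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / len(active_ranks)
    expected = sum(math.exp(-alpha * k / n_total) for k in range(1, n_total + 1)) / n_total
    return observed / expected


def bedroc(labels, alpha=20.0):
    """RIE rescaled to [0, 1] using the worst and best possible orderings."""
    n_actives = sum(labels)
    n_inactives = len(labels) - n_actives
    rie_max = rie([1] * n_actives + [0] * n_inactives, alpha)
    rie_min = rie([0] * n_inactives + [1] * n_actives, alpha)
    return (rie(labels, alpha) - rie_min) / (rie_max - rie_min)


# Hypothetical ranked list: actives at ranks 1 and 3 out of 10 compounds.
ranked = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(ranked, 0.2), roc_auc(ranked), bedroc(ranked))
```

Larger values of `alpha` concentrate the exponential weight on the top of the list, which is how RIE and BEDROC emphasize early recognition while the ROC area weighs all ranks equally.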

• Molecular field technology applied to virtual screening and finding the bioactive conformation
Cheeseright, Tim and Mackey, Mark and Rose, Sally and Vinter, Andy
www.expertopin.com/edc, 2007, 2(1), 131-144

Virtual screening is being applied to reduce the high-throughput screening bottleneck in many pharmaceutical companies and to reduce compound wastage. Cresset's ligand-based virtual screening technology using molecular fields can facilitate rapid identification of novel chemotypes from biologically testing only 200 - 1000 compounds. Four molecular fields calculated using the interaction of different probe atoms with the ligand are sufficient to describe how a ligand binds to its protein. Compounds with similar fields to known active ligands are predicted to have a high probability of showing similar activity. As binding is related to field similarity, this property has been exploited further to predict the bioactive conformation of small sets of structurally diverse active ligands starting from the two-dimensional structures alone without knowledge of the target site structure.

## 2006

• Scaffold hopping through virtual screening using 2D and 3D similarity descriptors: ranking, voting, and consensus scoring.
Zhang, Qiang and Muegge, Ingo
Journal of medicinal chemistry, 2006, 49(5), 1536-1548
PMID: 16509572     doi: 10.1021/jm050468i

The ability to find novel bioactive scaffolds in compound similarity-based virtual screening experiments has been studied comparing Tanimoto-based, ranking-based, voting, and consensus scoring protocols. Ligand sets for seven well-known drug targets (CDK2, COX2, estrogen receptor, neuraminidase, HIV-1 protease, p38 MAP kinase, thrombin) have been assembled such that each ligand represents its own unique chemotype, thus ensuring that each similarity recognition event between ligands constitutes a scaffold hopping event. In a series of virtual screening studies involving 9969 MDDR compounds as negative controls it has been found that atom pair descriptors and 3D pharmacophore fingerprints combined with ranking, voting, and consensus scoring strategies perform well in finding novel bioactive scaffolds. In addition, often superior performance has been observed for similarity-based virtual screening compared to structure-based methods. This finding suggests that information about a target obtained from known bioactive ligands is as valuable as knowledge of the target structures for identifying novel bioactive scaffolds through virtual screening.

• Scaffold-hopping potential of ligand-based similarity concepts.
Renner, Steffen and Schneider, Gisbert
ChemMedChem, 2006, 1(2), 181-185
PMID: 16892349     doi: 10.1002/cmdc.200500005

• Novel 2D fingerprints for ligand-based virtual screening.
Ewing, Todd and Baber, J Christian and Feher, Miklos
Journal of chemical information and modeling, 2006, 46(6), 2423-2431
PMID: 17125184     doi: 10.1021/ci060155b

This paper describes the development of a set of new 2D fingerprints for the purposes of virtual screening in a pharmaceutical environment. The new fingerprints are based on established ones: the changes in their design included the introduction of overlapping pharmacophore feature types, feature counts for pharmacophore and structural fingerprints, as well as changes in the resolution in property description for property fingerprints. The effects of each of these changes on virtual screening performance were monitored using two types of training sets, emulating different stages in the drug discovery process. The results demonstrate that these changes all lead to an improvement in virtual screening performance.

• Similarity Based Virtual Screening: A Tool for Targeted Library Design
Alvesalo, Joni K O and Siiskonen, Antti and Vainio, Mikko J and Tammela, Päivi S M and Vuorela, Pia M
Journal of medicinal chemistry, 2006, 49(7), 2353-2356
doi: 10.1021/jm051209w

... to create a comparative model for the C. pneumoniae target protein, since we wanted to test whether high level of similarity in the ... active yet structurally different molecules as we did, unless the ligands act on the same target protein in the cell-based screening assay. ... Docking ...

## 2004

• Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures.
Hert, Jérôme and Willett, Peter and Wilton, David J and Acklin, Pierre and Azzaoui, Kamal and Jacoby, Edgar and Schuffenhauer, Ansgar
Organic & biomolecular chemistry, 2004, 2(22), 3256-3266
PMID: 15534703     doi: 10.1039/B409865J

This paper reports a detailed comparison of a range of different types of 2D fingerprints when used for similarity-based virtual screening with multiple reference structures. Experiments with the MDL Drug Data Report database demonstrate the effectiveness of fingerprints that encode circular substructure descriptors generated using the Morgan algorithm. These fingerprints are notably more effective than fingerprints based on a fragment dictionary, on hashing and on topological pharmacophores. The combination of these fingerprints with data fusion based on similarity scores provides both an effective and an efficient approach to virtual screening in lead-discovery programmes.
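
As a rough illustration of similarity searching with multiple reference structures, the sketch below scores each database compound against every reference with the Tanimoto coefficient and fuses the scores by taking the maximum. The MAX rule is one common score-fusion choice and is an assumption here, since the abstract only states that fusion is based on similarity scores. Fingerprints are represented as plain sets of on-bit indices, and the compound names and bit values are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def max_fusion_rank(references, database):
    """Rank database compounds by their best similarity to any reference.

    references: list of fingerprints (sets of ints)
    database:   dict mapping compound name -> fingerprint (set of ints)
    Returns a list of (name, fused_score) sorted from most to least similar.
    """
    fused = {
        name: max(tanimoto(fp, ref) for ref in references)
        for name, fp in database.items()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints, for illustration only.
references = [{1, 2, 3, 4}, {10, 11, 12}]
database = {
    "cmpd_a": {1, 2, 3, 5},
    "cmpd_b": {10, 11, 12},
    "cmpd_c": {20, 21},
}
for name, score in max_fusion_rank(references, database):
    print(name, round(score, 2))
```

With MAX fusion a compound needs to resemble only one of the references to rank highly, which suits reference sets that span several chemotypes.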

• Fuzzy pharmacophore models from molecular alignments for correlation-vector-based virtual screening.
Renner, Steffen and Schneider, Gisbert
Journal of medicinal chemistry, 2004, 47(19), 4653-4664
PMID: 15341481     doi: 10.1021/jm031139y

A pharmacophore-based approach for compiling focused screening libraries is presented. It integrates information from three-dimensional molecular alignments into correlation vector-based database screening. The pharmacophore model is represented by a number of spheres of Gaussian-distributed feature densities. Different degrees of "fuzziness" can be introduced to influence the model's resolution. Transformation of this pharmacophore representation into a correlation vector results in a vector of feature probabilities which can be utilized for rapid virtual screening of compound databases or virtual libraries. The approach was validated by retrospective screening for cyclooxygenase 2 (COX-2) and thrombin ligands. A variety of models with different degrees of fuzziness were calculated and tested for both classes of molecules. Best performance was obtained with pharmacophore models reflecting an intermediate degree of fuzziness, yielding an enrichment factor of up to 39 for the first 1% of the ranked database. Appropriately weighted fuzzy pharmacophore models performed better in retrospective screening than similarity searching using only a single query molecule. The new pharmacophore method was shown to complement existing approaches.

## 2003

• Automated generation of MCSS-derived pharmacophoric DOCK site points for searching multiconformation databases.
Joseph-McCarthy, Diane and Alvarez, Juan C
Proteins, 2003, 51(2), 189-202
PMID: 12660988     doi: 10.1002/prot.10296

All docking methods employ some sort of heuristic to orient the ligand molecules into the binding site of the target structure. An automated method, MCSS2SPTS, for generating chemically labeled site points for docking is presented. MCSS2SPTS employs the program Multiple Copy Simultaneous Search (MCSS) to determine target-based theoretical pharmacophores. More specifically, chemically labeled site points are automatically extracted from selected low-energy functional-group minima and clustered together. These pharmacophoric site points can then be directly matched to the pharmacophoric features of database molecules with the use of either DOCK or PhDOCK to place the small molecules into the binding site. Several examples of the ability of MCSS2SPTS to reproduce the three-dimensional pharmacophoric features of ligands from known ligand-protein complex structures are discussed. In addition, a site-point set calculated for one human immunodeficiency virus 1 (HIV1) protease structure is used with PhDOCK to dock a set of HIV1 protease ligands; the docked poses are compared to the corresponding complex structures of the ligands. Finally, the use of an MCSS2SPTS-derived site-point set for acyl carrier protein synthase is compared to the use of atomic positions from a bound ligand as site points for a large-scale DOCK search. In general, MCSS2SPTS-generated site points focus the search on the more relevant areas and thereby allow for more effective sampling of the target site.