# Bibliography of Computer-Aided Drug Design

Updated on 7/18/2014. Currently 2130 references.

## Ligand design / Methodology

2014 / 2013 / 2012 / 2011 / 2010 / 2009 / 2008 / 2007 / 2006 / 2005 / 2004 / 2003 / 2002 / 2001 / 2000 / 1997 / 1996 / 1995 / 1994 / 1992

## 2014

• Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus.
Reker, Daniel and Rodrigues, Tiago and Schneider, Petra and Schneider, Gisbert
PNAS, 2014, 111(11), 4067-4072
PMID: 24591595     doi: 10.1073/pnas.1320001111

De novo molecular design and in silico prediction of polypharmacological profiles are emerging research topics that will profoundly affect the future of drug discovery and chemical biology. The goal is to identify the macromolecular targets of new chemical agents. Although several computational tools for predicting such targets are publicly available, none of these methods was explicitly designed to predict target engagement by de novo-designed molecules. Here we present the development and practical application of a unique technique, self-organizing map-based prediction of drug equivalence relationships (SPiDER), that merges the concepts of self-organizing maps, consensus scoring, and statistical analysis to successfully identify targets for both known drugs and computer-generated molecular scaffolds. We discovered a potential off-target liability of fenofibrate-related compounds, and in a comprehensive prospective application, we identified a multitarget-modulating profile of de novo designed molecules. These results demonstrate that SPiDER may be used to identify innovative compounds in chemical biology and in the early stages of drug discovery, and help investigate the potential side effects of drugs and their repurposing options.
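
The core idea of mapping query molecules and annotated reference drugs onto a self-organizing map, then reading off the targets of references that share the query's neuron, can be sketched as below. This is a minimal single-map illustration under stated assumptions: the descriptor vectors, grid size, learning-rate and neighbourhood schedules, and the names `SelfOrganizingMap` and `predict_targets` are hypothetical, and SPiDER's consensus over multiple maps and its statistical analysis are not reproduced.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
Node = Tuple[int, int]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SelfOrganizingMap:
    """Minimal rectangular SOM with a Gaussian neighbourhood (illustrative only)."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # One randomly initialised weight vector per grid node.
        self.weights: Dict[Node, List[float]] = {
            (r, c): [rng.random() for _ in range(dim)]
            for r in range(rows)
            for c in range(cols)
        }

    def best_matching_unit(self, x: Vector) -> Node:
        """Grid node whose weight vector is closest to x."""
        return min(self.weights, key=lambda node: sq_dist(self.weights[node], x))

    def train(self, data: Sequence[Vector], epochs: int = 50, lr0: float = 0.5) -> None:
        sigma0 = max(self.rows, self.cols) / 2.0
        for epoch in range(epochs):
            frac = epoch / epochs
            lr = lr0 * (1.0 - frac)  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            for x in data:
                br, bc = self.best_matching_unit(x)
                for (r, c), w in self.weights.items():
                    # Pull each node towards x, weighted by grid distance to the BMU.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma**2))
                    for i in range(len(w)):
                        w[i] += lr * h * (x[i] - w[i])


def predict_targets(
    som: SelfOrganizingMap,
    references: Sequence[Tuple[Vector, str]],
    query: Vector,
) -> Set[str]:
    """Targets annotated to reference compounds that share the query's neuron."""
    node = som.best_matching_unit(query)
    return {target for vec, target in references if som.best_matching_unit(vec) == node}
```

For example, training a 3x3 map on a few reference vectors labelled with hypothetical targets and then calling `predict_targets(som, refs, query)` returns the target labels co-located with the query; a real application would use molecular descriptors and aggregate evidence over many maps.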

## 2013

• Protein pocket and ligand shape comparison and its application in virtual screening
Wirth, Matthias and Volkamer, Andrea and Zoete, Vincent and Rippmann, Friedrich and Michielin, Olivier and Rarey, Matthias and Sauer, Wolfgang H B
Journal of computer-aided molecular design, 2013, 27(6), 511-524
PMID: 23807262     doi: 10.1007/s10822-013-9659-1

Understanding molecular recognition is one major requirement for drug discovery and design. Physicochemical and shape complementarity between two binding partners is the driving force during complex formation. In this study, the impact of shape within this process is analyzed. Protein binding pockets and co-crystallized ligands are represented by normalized principal moments of inertia ratios (NPRs). The corresponding descriptor space is triangular, with its corners occupied by spherical, discoid, and elongated shapes. An analysis of a selected set of sc-PDB complexes suggests that pockets and bound ligands avoid spherical shapes, which are, however, prevalent in small unoccupied pockets. Furthermore, a direct shape comparison confirms previous findings that, on average, only one third of a pocket is filled by its bound ligand, supplemented by a 50 % subpocket coverage. In this study, we found that shape complementarity is expressed by low pairwise shape distances in NPR space, short distances between the centers of mass, and small deviations in the angle between the first principal ellipsoid axes. Furthermore, we assess how different binding pocket parameters are related to the bioactivity and binding efficiency of the co-crystallized ligand. In addition, the performance of different shape and size parameters of pockets and ligands is evaluated in a virtual screening scenario performed on four representative targets.
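
A minimal sketch of the NPR descriptor described above, assuming unit masses for every point (ligand atoms or pocket points) and a plain Euclidean distance in the NPR plane as the pairwise "shape distance"; the function names are hypothetical and the authors' pocket representation and weighting are not reproduced. With principal moments I1 <= I2 <= I3, the descriptor is (I1/I3, I2/I3), so rod-like shapes sit near (0, 1), discs near (0.5, 0.5), and spheres near (1, 1).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def inertia_tensor(points: Sequence[Point]) -> List[List[float]]:
    """Inertia tensor of unit-mass points about their centroid."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for px, py, pz in points:
        x, y, z = px - cx, py - cy, pz - cz
        ixx += y * y + z * z
        iyy += x * x + z * z
        izz += x * x + y * y
        ixy -= x * y
        ixz -= x * z
        iyz -= y * z
    return [[ixx, ixy, ixz], [ixy, iyy, iyz], [ixz, iyz, izz]]


def sym3_eigenvalues(a: List[List[float]]) -> List[float]:
    """Eigenvalues of a real symmetric 3x3 matrix (trigonometric closed form), ascending."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return sorted([a[0][0], a[1][1], a[2][2]])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1) / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (
        b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
        - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
        + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0])
    )
    # Clamp against rounding before acos.
    phi = math.acos(max(-1.0, min(1.0, det_b / 2.0))) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return sorted([smallest, 3.0 * q - largest - smallest, largest])


def npr(points: Sequence[Point]) -> Tuple[float, float]:
    """(NPR1, NPR2) = (I1/I3, I2/I3) with principal moments I1 <= I2 <= I3."""
    i1, i2, i3 = sym3_eigenvalues(inertia_tensor(points))
    return i1 / i3, i2 / i3


def npr_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Euclidean distance between two shapes in the NPR triangle (assumed metric)."""
    (a1, a2), (b1, b2) = npr(a), npr(b)
    return math.hypot(a1 - b1, a2 - b2)
```

The closed-form eigenvalue routine keeps the sketch free of third-party dependencies; in practice a linear-algebra library or a cheminformatics toolkit's built-in NPR descriptors would be the usual choice.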

• In Silico Fragment-Based Drug Discovery: Setup and Validation of a Fragment-to-Lead Computational Protocol Using S4MPLE
Hoffer, Laurent and Renaud, Jean-Paul and Horvath, Dragos
Journal of chemical information and modeling, 2013, 53(4), 836-851
PMID: 23537132

• Water PMF for predicting the properties of water molecules in protein binding site
Zheng, Mingyue and Li, Yanlian and Xiong, Bing and Jiang, Hualiang and Shen, Jingkang
Journal of computational chemistry, 2013, 34(7), 583-592
PMID: 23114863     doi: 10.1002/jcc.23170

Water is an important component in living systems and deserves better understanding in chemistry and biology. However, because the functions of water in protein structures are difficult to investigate, water is usually ignored in computational modeling, especially in the field of computer-aided drug design. Here, using the potential of mean force (PMF) approach, we constructed a water PMF (wPMF) based on 3946 non-redundant high-resolution crystal structures. The extracted wPMF potential was first used to investigate the structural patterns of water and to analyze residue hydrophilicity. Then, the relationship between the wPMF score and the B-factor values of crystal waters was studied. It was found that wPMF agrees well with several previously reported experimental observations. In addition, the wPMF score was tested in parallel with 3D-RISM for its ability to retrieve experimentally observed waters, and showed comparable performance at much lower computational cost. Finally, we proposed a grid-based clustering scheme together with a distance-weighted wPMF score to extend wPMF to the prediction of potential hydration sites in protein structures. In tests, this approach predicted hydration sites with an accuracy of about 80% when the calculated score was lower than -4.0. It also allows assessment of whether a given water molecule should be targeted for displacement in ligand design. Overall, the wPMF presented here provides an alternative solution to many water-related computational modeling problems, some of which can be highly valuable as part of a rational drug design strategy.
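
Knowledge-based PMFs of this kind are commonly derived by inverse Boltzmann conversion of observed versus reference distance distributions. The sketch below shows that generic step for a single atom-type pair, assuming kT of about 0.596 kcal/mol (roughly 300 K), a pseudocount of 1 per bin, and fixed-width distance bins; the names `inverse_boltzmann_pmf` and `score_water` are hypothetical, and the reference state and atom typing used for the published wPMF are not reproduced.

```python
import math
from typing import List, Sequence

KT_KCAL_PER_MOL = 0.596  # kT at roughly 300 K (assumed temperature)


def inverse_boltzmann_pmf(
    observed: Sequence[float],
    reference: Sequence[float],
    pseudocount: float = 1.0,
    kt: float = KT_KCAL_PER_MOL,
) -> List[float]:
    """Per-bin PMF, -kT * ln(p_obs / p_ref), from two distance histograms."""
    if len(observed) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    n_bins = len(observed)
    obs_total = sum(observed) + pseudocount * n_bins
    ref_total = sum(reference) + pseudocount * n_bins
    pmf = []
    for obs, ref in zip(observed, reference):
        # Pseudocounts keep empty bins from producing log(0).
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        pmf.append(-kt * math.log(p_obs / p_ref))
    return pmf


def score_water(distances: Sequence[float], bin_width: float, pmf: Sequence[float]) -> float:
    """Sum PMF values over water-atom distances; distances beyond the last bin are ignored."""
    total = 0.0
    for d in distances:
        idx = int(d // bin_width)
        if 0 <= idx < len(pmf):
            total += pmf[idx]
    return total
```

Bins where contacts are observed more often than in the reference state receive negative (favourable) values, so lower total scores indicate more likely hydration sites, in the same direction as the score threshold quoted in the abstract.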

• Automated Ligand- and Structure-Based Protocol for in Silico Prediction of Human Serum Albumin Binding
Hall, Michelle Lynn and Jorgensen, William L and Whitehead, Lewis
Journal of chemical information and modeling, 2013, 53(4), 907-922
PMID: 23472823

Plasma protein binding has a profound impact on the pharmacokinetic and pharmacodynamic properties of many drug candidates and is thus an integral component of drug discovery. Nevertheless, extant methods to examine small-molecule interactions with plasma protein have various limitations, thus creating a need for alternative methods. Herein we present a comprehensive and cross-validated in silico workflow for the prediction of small-molecule binding to Human Serum Albumin (HSA), the most ubiquitous plasma protein. This protocol reliably predicts small-molecule interactions with HSA, including a binding affinity calculation using multiple linear regression methods, binding site prediction using a naive-Bayes classifier, and a three-dimensional binding pose using induced fit docking. Furthermore, this workflow is implemented in a portable and automated format that can be downloaded and used by other end users, either as is or with customization.

• Chemoisosterism in the Proteome
Jalencas, Xavier and Mestres, Jordi
Journal of chemical information and modeling, 2013, 53(2), 279-292
PMID: 23312010

The concept of chemoisosterism of protein environments is introduced as the complementary property to bioisosterism of chemical fragments. In the same way that two chemical fragments are considered bioisosteric if they can bind to the same protein environment, two protein environments will be considered chemoisosteric if they can interact with the same chemical fragment. The basis for identifying chemoisosteric relationships among protein environments is the growing number of crystal structures of protein-ligand complexes currently available. It is shown that one can recover the right location and orientation of chemical fragments constituting the native ligand in a nuclear receptor structure by using only chemoisosteric environments present in enzyme structures. Examples of the potential applicability of chemoisosterism in fragment-based drug discovery are provided.

• Systematic Identification of Scaffolds Representing Compounds Active against Individual Targets and Single or Multiple Target Families
Hu, Ye and Bajorath, Jürgen
Journal of chemical information and modeling, 2013, 53(2), 312-326
PMID: 23339619

Given the enormous growth of compound activity data we currently observe, we have revisited the previously introduced concepts of privileged substructures and community-selective scaffolds and systematically searched for molecular scaffolds representing compounds active against single targets, multiple targets belonging to the same target family, or targets belonging to different families. The influence of different types of activity measurements on scaffold assignments has been determined. Furthermore, scaffold assignments have also been carried out after applying a potency threshold to exclude weakly active compounds from the comparison and address the issue of molecular selectivity. In both instances, the results were very similar, indicating that single-target and single-family scaffolds display target- and family-selective tendencies, respectively. Unexpectedly large numbers of scaffolds representing relatively large numbers of compounds were identified in public domain compound data: 630 unique single-target, 489 single-family, and 336 multi-family scaffolds. Other important findings are that most of the growth in high-confidence compound activity data has been due to the evaluation of new compounds, rather than additional measurements for previously tested compounds or analog series for previously explored scaffolds, and that the majority of scaffolds have remained in the same category over time. Activity measurement type-dependent sets of single-target, single-family, and multi-family scaffolds are also provided as an up-to-date scaffold knowledge base.

• Predicting Potent Compounds via Model-Based Global Optimization
Ahmadi, Mohsen and Vogt, Martin and Iyer, Preeti and Bajorath, Jürgen and Fröhlich, Holger
Journal of chemical information and modeling, 2013, 53(3), 553-559
PMID: 23363236

Finding potent compounds for a given target in silico can be viewed as a constrained global optimization problem. This requires an optimization function whose evaluations might be costly. The major task is to maximize the function while minimizing the number of evaluation steps. To solve this problem, we propose a machine learning algorithm that first builds a statistical QSAR model of the SAR landscape and then uses the model to identify regions in compound space with a high probability of containing a highly potent compound. For this purpose, we devise the so-called expected potency improvement (EI) criterion to rank candidate compounds by their likelihood of exhibiting higher potency than the most active compound in the training data. This approach therefore differs significantly from a purely prediction-oriented classical QSAR model. The method is superior to a nearest neighbor approach, as significantly fewer evaluation steps are needed to identify the most potent compound for the given target.
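
If the QSAR model returns a Gaussian predictive distribution (mean and standard deviation) for each candidate's potency, the textbook closed form of expected improvement over the best training potency is EI = (mu - best) * Phi(z) + sigma * phi(z), with z = (mu - best) / sigma. The sketch below assumes exactly that setting; whether the published EI criterion uses this closed form or another predictive model is not stated in the abstract, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """E[max(Y - best, 0)] for Y ~ N(mu, sigma^2); higher potency is better."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)  # no predictive uncertainty left
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def rank_candidates(
    predictions: Sequence[Tuple[str, float, float]], best: float
) -> List[str]:
    """Order candidate IDs by EI, given (id, predicted mean, predicted std) tuples."""
    ranked = sorted(
        predictions,
        key=lambda p: expected_improvement(p[1], p[2], best),
        reverse=True,
    )
    return [cid for cid, _, _ in ranked]
```

Unlike ranking by predicted potency alone, EI rewards uncertainty: of two candidates with the same predicted mean below the current best, the one with the larger predictive spread ranks higher because it has a better chance of exceeding the best compound.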

• Scaffold hopping by fragment replacement.
Vainio, Mikko J and Kogej, Thierry and Raubacher, Florian and Sadowski, Jens
Journal of chemical information and modeling, 2013, 53(7), 1825-1835
PMID: 23826858     doi: 10.1021/ci4001019

This work describes a data-driven method for scaffold hopping by fragment replacement. A search database of scaffolds is created by cutting bonds of existing compounds in a combinatorial fashion. Three-dimensional structures of the scaffolds are then generated and made searchable based on the relative orientation of the broken bonds using an auxiliary index file. The retrieved scaffolds are ranked using volume overlap and electrostatic similarity scores. A similar approach has been used before in CAVEAT and other programs. The present work introduces a novel indexing scheme for the attachment vector geometry, which allows for fast searching. A scaffold shape descriptor is defined, which allows for queries with a single attachment vector (R-groups) and improves the shape similarity between the query and the suggested replacement fragments. The program, called Scaffold Hopping, is shown to retrieve relevant bioisosteric replacement scaffolds for a set of example queries in a reasonable time frame, making the program suitable to be used in drug design work.

• SwissBioisostere: a database of molecular replacements for ligand design.
Wirth, Matthias and Zoete, Vincent and Michielin, Olivier and Sauer, Wolfgang H B
Nucleic acids research, 2013, 41(D1), D1137-D1143
PMID: 23161688     doi: 10.1093/nar/gks1059

The SwissBioisostere database (http://www.swissbioisostere.ch) contains information on molecular replacements and their performance in biochemical assays. It is meant to provide researchers in drug discovery projects with ideas for bioisosteric modifications of their current lead molecule, as well as to give interested scientists access to the details on particular molecular replacements. As of August 2012, the database contains 21 293 355 datapoints corresponding to 5 586 462 unique replacements that have been measured in 35 039 assays against 1948 molecular targets representing 30 target classes. The accessible data were created through detection of matched molecular pairs and mining bioactivity data in the ChEMBL database. The SwissBioisostere database is hosted by the Swiss Institute of Bioinformatics and available via a web-based interface.

## 2012

• Application of Drug-perturbed Essential Dynamics/Molecular Dynamics (ED/MD) to Virtual Screening and Rational Drug Design.
Chaudhuri, Rima and Carrillo, Oliver and Laughton, Charles Anthony and Orozco, Modesto
Journal of chemical theory and computation, 2012, 8(7), 2204-2214
doi: 10.1021/ct300223c

We present here the first application of a new algorithm, essential dynamics/molecular dynamics (ED/MD), to the field of small molecule docking. The method uses a previously existing molecular dynamics (MD) ensemble of a protein or protein-drug complex to generate, at very small computational cost, perturbed ensembles that represent ligand-induced binding site flexibility more accurately than the original trajectory. Using these perturbed ensembles in a standard docking program yields better performance than the same docking procedure using the crystal structure, or ensembles obtained from conventional MD simulations, as templates. The simplicity and accuracy of the method open up the possibility of introducing protein flexibility in high-throughput docking experiments.

• R-group template CoMFA combines benefits of "ad hoc" and topomer alignments using 3D-QSAR for lead optimization.
Cramer, Richard D
Journal of computer-aided molecular design, 2012, 26(7), 805-819
PMID: 22661224     doi: 10.1007/s10822-012-9583-9

Template CoMFA methodologies extend topomer CoMFA by allowing user-designated templates, for example the experimental receptor-bound conformation of a prototypical ligand, to help determine the alignment of training and test set structures for 3D-QSAR. The algorithms that generate its new structural modality, template-constrained topomers, are described. Template CoMFA's resolution of certain topomer CoMFA concerns, by providing user control of topological consistency and structural acceptability, is demonstrated for sixteen 3D-QSAR training sets, in particular the Selwood dataset.

• Introducing Drugster: a comprehensive and fully integrated drug design, lead and structure optimization toolkit
Vlachakis, D and Tsagrasoulis, D and Megalooikonomou, V and Kossida, S
Bioinformatics (Oxford, England), 2012, 29(1), 126-128
PMID: 23104887     doi: 10.1093/bioinformatics/bts637

SUMMARY: Drugster is a fully interactive pipeline designed to break the command line barrier and introduce a new user-friendly environment to perform drug design, lead and structure optimization experiments through an efficient combination of the PDB2PQR, Ligbuilder, Gromacs and Dock suites. Our platform features a novel workflow that guides the user through each logical step of the iterative 3D structural optimization setup and drug design process, by providing a seamless interface to all incorporated packages. AVAILABILITY: Drugster can be freely downloaded via our dedicated server system at http://www.bioacademy.gr/bioinformatics/drugster/. CONTACT: For support, comments and bug reports please contact: dvlachakis@bioacademy.gr.

• Automated design of ligands to polypharmacological profiles.
Besnard, Jérémy and Ruda, Gian Filippo and Setola, Vincent and Abecassis, Keren and Rodriguiz, Ramona M and Huang, Xi-Ping and Norval, Suzanne and Sassano, Maria F and Shin, Antony I and Webster, Lauren A and Simeons, Frederick R C and Stojanovski, Laste and Prat, Annik and Seidah, Nabil G and Constam, Daniel B and Bickerton, G Richard and Read, Kevin D and Wetsel, William C and Gilbert, Ian H and Roth, Bryan L and Hopkins, Andrew L
Nature, 2012, 492(7428), 215-220
PMID: 23235874     doi: 10.1038/nature11691

The clinical efficacy and safety of a drug are determined by its activity profile across many proteins in the proteome. However, designing drugs with a specific multi-target profile is both complex and difficult. Therefore, methods to rationally design drugs a priori against profiles of several proteins would have immense value in drug discovery. Here we describe a new approach for the automated design of ligands against profiles of multiple drug targets. The method is demonstrated by the evolution of an approved acetylcholinesterase inhibitor drug into brain-penetrable ligands with either specific polypharmacology or exquisite selectivity profiles for G-protein-coupled receptors. Overall, 800 ligand-target predictions of prospectively designed ligands were tested experimentally, of which 75% were confirmed to be correct. We also demonstrate target engagement in vivo. The approach can be a useful source of drug leads when multi-target profiles are required to achieve either selectivity over other drug targets or a desired polypharmacology.

• AMMOS software: method and application.
Pencheva, T and Lagorce, D and Pajeva, I and Villoutreix, B O and Miteva, M A
Methods in molecular biology (Clifton, N.J.), 2012, 819, 127-141
PMID: 22183534     doi: 10.1007/978-1-61779-465-0_9

Recent advances in computational sciences have enabled extensive use of in silico methods in projects at the interface between chemistry and biology. Among them, virtual ligand screening, a modern set of approaches, facilitates hit identification and lead optimization in drug discovery programs. Most of these approaches require the preparation of libraries of small organic molecules to be screened, or a refinement of the virtual screening results. Here we present an overview of the open source AMMOS software, a platform that performs an automatic procedure for the structural generation and optimization of drug-like molecules in compound collections, as well as the structural refinement of protein-ligand complexes, to assist in silico screening exercises.

• LigMerge: A Fast Algorithm to Generate Models of Novel Potential Ligands from Sets of Known Binders.
Lindert, Steffen and Durrant, Jacob D and McCammon, J Andrew
Chemical biology & drug design, 2012, 80(3), 358-365
PMID: 22594624     doi: 10.1111/j.1747-0285.2012.01414.x

One common practice in drug discovery is to optimize known or suspected ligands in order to improve binding affinity. In performing these optimizations, it is useful to look at as many known inhibitors as possible for guidance. Medicinal chemists often seek to improve potency by altering certain chemical moieties of known/endogenous ligands while retaining those critical for binding. To our knowledge, no automated, ligand-based algorithm exists for systematically "swapping" the chemical moieties of known ligands in order to generate novel ligands with potentially improved potency. To address this need, we have created a novel algorithm called "LigMerge". LigMerge identifies the maximum (largest) common substructure of two three-dimensional ligand models, superimposes these two substructures, and then systematically mixes and matches the distinct fragments attached to the common substructure at each common atom, thereby generating multiple compound models related to the known inhibitors that can be evaluated using computer docking prior to synthesis and experimental testing. To demonstrate the utility of LigMerge, we identify compounds predicted to inhibit peroxisome proliferator-activated receptor gamma, HIV reverse transcriptase, and dihydrofolate reductase with affinities higher than those of known ligands. We are hopeful that LigMerge will be a helpful tool for the drug-design community.

• IADE: a system for intelligent automatic design of bioisosteric analogs.
Ertl, Peter and Lewis, Richard
Journal of computer-aided molecular design, 2012, 26(11), 1207-1215
PMID: 23053736     doi: 10.1007/s10822-012-9609-3

IADE, a software system supporting molecular modellers through the automatic design of non-classical bioisosteric analogs, scaffold hopping and fragment growing, is presented. The program combines sophisticated cheminformatics functionalities for constructing novel analogs and filtering them based on their drug-likeness and synthetic accessibility with automatic structure-based design capabilities: the best candidates are selected according to their similarity to the template ligand and their interactions with the protein binding site. IADE works in an iterative manner, improving the fitness of designed molecules in every generation until structures with optimal properties are identified. The program frees molecular modellers from routine, repetitive tasks, allowing them to focus on analysis and evaluation of the automatically designed analogs, considerably enhancing their work efficiency as well as the area of chemical space that can be covered. The performance of IADE is illustrated through a case study of the design of a nonclassical bioisosteric analog of a farnesyltransferase inhibitor, an analog that won a recent "Design a Molecule" competition.

• Searching for substructures in fragment spaces.
Ehrlich, Hans-Christian and Volkamer, Andrea and Rarey, Matthias
Journal of chemical information and modeling, 2012, 52(12), 3181-3189
PMID: 23205736     doi: 10.1021/ci300283a

A common task in drug development is the selection of compounds fulfilling specific structural features from a large data pool. While several methods that iteratively search through such data sets exist, their application is limited compared to the infinite character of molecular space. The introduction of the concept of fragment spaces (FSs), which are composed of molecular fragments and their connection rules, made the representation of large combinatorial data sets feasible. At the same time, search algorithms face the problem of structural features spanning over multiple fragments. Due to the combinatorial nature of FSs, an enumeration of all products is impossible. In order to overcome these time and storage issues, we present a method that is able to find substructures in FSs without explicit product enumeration. This is accomplished by splitting substructures into subsubstructures and mapping them onto fragments with respect to fragment connectivity rules. The method has been evaluated on three different drug discovery scenarios considering the exploration of a molecule class, the elaboration of decoration patterns for a molecular core, and the exhaustive query for peptides in FSs. FSs can be searched in seconds, and found products contain novel compounds not present in the PubChem database which may serve as hints for new lead structures.

• DOGS: reaction-driven de novo design of bioactive compounds.
Hartenfeller, Markus and Zettl, Heiko and Walter, Miriam and Rupp, Matthias and Reisen, Felix and Proschak, Ewgenij and Weggen, Sascha and Stark, Holger and Schneider, Gisbert
PLoS computational biology, 2012, 8(2), e1002380
PMID: 22359493     doi: 10.1371/journal.pcbi.1002380

We present a computational method for the reaction-based de novo design of drug-like molecules. The software DOGS (Design of Genuine Structures) features a ligand-based strategy for automated 'in silico' assembly of potentially novel bioactive compounds. The quality of the designed compounds is assessed by a graph kernel method measuring their similarity to known bioactive reference ligands in terms of structural and pharmacophoric features. We implemented a deterministic compound construction procedure that explicitly considers compound synthesizability, based on a compilation of 25,144 readily available synthetic building blocks and 58 established reaction principles. This enables the software to suggest a synthesis route for each designed compound. Two prospective case studies are presented together with details on the algorithm and its implementation. De novo designed ligand candidates for the human histamine H₄ receptor and γ-secretase were synthesized as suggested by the software. The computational approach proved to be suitable for scaffold-hopping from known ligands to novel chemotypes, and for generating bioactive molecules with drug-like properties.

## 2011

• DockoMatic: automated peptide analog creation for high throughput virtual screening.
Jacob, Reed B. and Bullock, Casey W and Andersen, Tim and McDougal, Owen M.
Journal of computational chemistry, 2011, 32(13), 2936-2941
PMID: 21717479     doi: 10.1002/jcc.21864

The purpose of this manuscript is threefold: (1) to describe an update to DockoMatic that allows the user to generate cyclic peptide analog structure files based on Protein Data Bank (PDB) files, (2) to test the accuracy of the peptide analog structure generation utility, and (3) to evaluate the high-throughput capacity of DockoMatic. The DockoMatic graphical user interface communicates with the software program Treepack to create user-defined peptide analogs. To validate this approach, cyclic peptide analogs produced by DockoMatic were tested for three-dimensional structure consistency and binding affinity against four experimentally determined peptide structure files available in the Research Collaboratory for Structural Bioinformatics database. The peptides used to evaluate this new functionality were alpha-conotoxins ImI, PnIA, and their published analogs. Peptide analogs were generated by DockoMatic and tested for their ability to bind to X-ray crystal structure models of the acetylcholine binding protein originating from Aplysia californica. The results, consisting of more than 300 simulations, demonstrate that DockoMatic predicts the binding energy of peptide structures to within 3.5 kcal mol⁻¹, and that the orientation of the bound ligand agrees with experimental data to within 1.8 Å root mean square deviation. Evaluation of high-throughput virtual screening capacity demonstrated that DockoMatic can collect, evaluate, and summarize the output of 10,000 AutoDock jobs in less than 2 hours of computational time, while 100,000 jobs require approximately 15 hours and 1,000,000 jobs are estimated to take up to a week.

• Fragment-Based Drug Design and Drug Repositioning Using Multiple Ligand Simultaneous Docking (MLSD): Identifying Celecoxib and Template Compounds as Novel Inhibitors of Signal Transducer and Activator of Transcription 3 (STAT3).
Li, Huameng and Liu, Aiguo and Zhao, Zhenjiang and Xu, Yufang and Lin, Jiayuh and Jou, David and Li, Chenglong
Journal of medicinal chemistry, 2011, 54(15), 5592-5596
PMID: 21678971     doi: 10.1021/jm101330h

We describe a novel method of drug discovery using MLSD and drug repositioning, with cancer target STAT3 being used as a test case. Multiple drug scaffolds were simultaneously docked into hot spots of STAT3 by MLSD, followed by tethering to generate virtual template compounds. A similarity search of the virtual hits against a drug database identified celecoxib as a novel inhibitor of STAT3. Furthermore, we designed two novel lead inhibitors based on one of the lead templates and celecoxib.

• A collection of robust organic synthesis reactions for in silico molecule design.
Hartenfeller, Markus and Eberle, Martin and Meier, Peter and Nieto-Oberhuber, Cristina and Altmann, Karl-Heinz and Schneider, Gisbert and Jacoby, Edgar and Renner, Steffen
Journal of chemical information and modeling, 2011, 51(12), 3093-3098
PMID: 22077721     doi: 10.1021/ci200379p

A focused collection of organic synthesis reactions for computer-based molecule construction is presented. It is inspired by real-world chemistry and has been compiled in close collaboration with medicinal chemists to achieve high practical relevance. Virtual molecules assembled from existing starting materials connected by these reactions are expected to have a better chance of being amenable to real chemical synthesis. About 50% of the reactions in the dataset are ring-forming reactions, which fosters the assembly of novel ring systems and innovative chemotypes. A comparison with a recent survey of the reactions used in early drug discovery revealed considerable overlap with the collection presented here. The dataset is available as computer-readable Reaction SMARTS expressions in the Supporting Information of this paper.

• Assessing the lipophilicity of fragments and early hits.
Mortenson, Paul N and Murray, Christopher W
Journal of computer-aided molecular design, 2011, 25(7), 663-667
PMID: 21614595     doi: 10.1007/s10822-011-9435-z

A key challenge in many drug discovery programs is to accurately assess the potential value of screening hits. This is particularly true in fragment-based drug design (FBDD), where the hits often bind relatively weakly, but are correspondingly small. Ligand efficiency (LE) considers both the potency and the size of the molecule, and enables us to estimate whether or not an initial hit is likely to be optimisable to a potent, druglike lead. While size is a key property that needs to be controlled in a small molecule drug, there are a number of additional properties that should also be considered. Lipophilicity is amongst the most important of these additional properties, and here we present a new efficiency index (LLE(AT)) that combines lipophilicity, size and potency. The index is intuitively defined, and has been designed to have the same target value and dynamic range as LE, making it easily interpretable by medicinal chemists. Monitoring both LE and LLE(AT) should help both in the selection of more promising fragment hits, and controlling molecular weight and lipophilicity during optimisation.
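
The two efficiency indices can be written down directly. LE below uses the common approximation -ΔG ≈ 1.37 × pActivity kcal/mol (2.303RT near 300 K) divided by the heavy-atom count, for which about 0.3 kcal/mol per heavy atom is a frequently quoted guideline. The LLE_AT form, 0.111 + 1.37 × (pActivity - cLogP) / HA, is our recollection of the published definition and should be checked against the paper before use; the function names are hypothetical.

```python
def ligand_efficiency(p_activity: float, heavy_atoms: int) -> float:
    """LE ~ 1.37 * pActivity / HA, in kcal/mol per heavy atom (about 300 K)."""
    if heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return 1.37 * p_activity / heavy_atoms


def lle_at(p_activity: float, clogp: float, heavy_atoms: int) -> float:
    """LLE_AT ~ 0.111 + 1.37 * (pActivity - cLogP) / HA (recalled form; verify)."""
    if heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    lle = p_activity - clogp  # lipophilic ligand efficiency, pActivity - cLogP
    return 0.111 + 1.37 * lle / heavy_atoms
```

For a hypothetical fragment with pIC50 = 5.0, 15 heavy atoms and cLogP = 2.0, these give LE ≈ 0.457 and LLE_AT = 0.385; both sit on a similar scale, which is what allows the two indices to be monitored side by side during optimisation.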

• CrystalDock: A Novel Approach to Fragment-Based Drug Design.
Durrant, Jacob D and Friedman, Aaron J and McCammon, J Andrew
Journal of chemical information and modeling, 2011, 51(10), 2573-2580
PMID: 21910501     doi: 10.1021/ci200357y

We present a novel algorithm called CrystalDock that analyzes a molecular pocket of interest and identifies potential binding fragments. The program first identifies groups of pocket-lining receptor residues (i.e., microenvironments) and then searches for geometrically similar microenvironments present in publicly available databases of ligand-bound experimental structures. Germane fragments from the crystallographic or NMR ligands are subsequently placed within the novel binding pocket. These positioned fragments can be linked together to produce ligands that are likely to be potent; alternatively, they can be joined to an inhibitor with a known or suspected binding pose to potentially improve binding affinity. To demonstrate the utility of the algorithm, CrystalDock is used to analyze the principal binding pockets of influenza neuraminidase and Trypanosoma brucei RNA-editing ligase 1, validated drug targets in the fight against pandemic influenza and African sleeping sickness, respectively. In both cases, CrystalDock suggests modifications to known inhibitors that may improve binding affinity.

• Computational approach to de novo discovery of fragment binding for novel protein states.
Konteatis, Zenon D. and Klon, Anthony E and Zou, Jinming and Meshkat, Siavash
Methods in enzymology, 2011, 493, 357-380
PMID: 21371598     doi: 10.1016/B978-0-12-381274-2.00014-5

In silico fragment-based drug discovery has become an integral component of the new fragment-based approach that has evolved over the past decade. Protein structure of high quality is essential in carrying out computational designs, and protein flexibility has been shown to impact prospective designs or docking experiments. Here we introduce methodology to calculate protein normal modes and protein molecular dynamics in torsion space which enable the development of multiple protein states to address the natural flexibility of proteins. We also present two fragment-based sampling methods, grand canonical Monte Carlo and systematic sampling, which are used to study protein-fragment interactions by generating fragment ensembles and we discuss the process by which these ensembles are linked to design ligands.

• Validation of the SPROUT de novo design program
Law, JMS and Fung, DYK and Zsoldos, Z and Simon, A and Szabo, Z and Csizmadia, IG and Johnson, AP
Journal of Molecular Structure: THEOCHEM, 2003, 666-667, 651-657
PMID: 21735261     doi: 10.1016/j.theochem.2003.08.104

The validation of SPROUT was carried out on four receptor-ligand complexes: thrombin-NAPAP, calmodulin (CAM)-AAA, Ras P-21-GDP and dihydrofolate reductase (DHFR)-methotrexate (MTX). These complexes were downloaded from the Brookhaven Protein Data Bank (PDB). For the thrombin-NAPAP complex, two structures very similar to NAPAP were generated. These two structures were similar in 3D structure to NAPAP but contained an extra hexane ring. For CAM-AAA and Ras P-21-GDP, the ligands generated were essentially identical to their original ligands. For DHFR, two ligands, one most similar in 2D structure and one most similar in 3D conformation, were found. The successful regeneration of the ligands for each case proves the ability and applicability of SPROUT for designing strongly binding, successful drug candidates. When the program is executed with less restrictive constraints, it generates a large number of novel structures that are structurally diverse, making it an ideal tool for de novo design.

• Structure-guided fragment-based in silico drug design of dengue protease inhibitors.
Knehans, Tim and Schüller, Andreas and Doan, Danny N and Nacro, Kassoum and Hill, Jeffrey and Güntert, Peter and Madhusudhan, M S and Weil, Tanja and Vasudevan, Subhash G
Journal of computer-aided molecular design, 2011, 25(3), 263-274
PMID: 21344277     doi: 10.1007/s10822-011-9418-0

An in silico fragment-based drug design approach was devised and applied towards the identification of small molecule inhibitors of the dengue virus (DENV) NS2B-NS3 protease. Currently, no DENV protease co-crystal structure with bound inhibitor and fully formed substrate binding site is available. Therefore a homology model of DENV NS2B-NS3 protease was generated employing a multiple template spatial restraints method and used for structure-based design. A library of molecular fragments was derived from the ZINC screening database with the help of the retrosynthetic combinatorial analysis procedure (RECAP). 150,000 molecular fragments were docked to the DENV protease homology model and the docking poses were rescored using a target-specific scoring function. High scoring fragments were assembled into small-molecule candidates by an implicit linking cascade. The cascade included substructure searching and structural filters focusing on interactions with the S1 and S2 pockets of the protease. The chemical space adjacent to the promising candidates was further explored by neighborhood searching. A total of 23 compounds were tested experimentally and two compounds were discovered to inhibit dengue protease (IC(50)

• LigBuilder 2: A Practical de Novo Drug Design Approach.
Yuan, Yaxia and Pei, Jianfeng and Lai, Luhua
Journal of chemical information and modeling, 2011, 51(5), 1083-1091
PMID: 21513346     doi: 10.1021/ci100350u

We have developed a new version (2.0) of the de novo drug design program LigBuilder. With LigBuilder 2.0, the synthetic accessibility of designed compounds can be analyzed, and a cavity detection procedure is implemented to detect the positions and shapes of the binding sites on the surface of a given protein structure and to quantitatively estimate druggability. Ligands are designed to best fit the detected cavities using a set of rules for evaluation. Drug-like and privileged fragments are used to construct the ligands with the aid of internal and external absorption, distribution, metabolism, excretion, and toxicity (ADME/T) and drug-like filters.

• De novo design by pharmacophore-based searches in fragment spaces.
Lippert, Tobias and Schulz-Gasch, Tanja and Roche, Olivier and Guba, Wolfgang and Rarey, Matthias
Journal of computer-aided molecular design, 2011, 25(10), 931-945
PMID: 21922280     doi: 10.1007/s10822-011-9473-6

De novo ligand design supports the search for novel molecular scaffolds in medicinal chemistry projects. This search can either be based on structural information of the targeted active site (structure-based approach) or on similarity to known binders (ligand-based approach). In the absence of structural information on the target, pharmacophores provide a way to find topologically novel scaffolds. Fragment spaces have proven to be a valuable source for molecular structures in de novo design that are both diverse and synthetically accessible. They also offer a simple way to formulate custom chemical spaces. We have implemented a new method which stochastically constructs new molecules from fragment spaces subject to a three-dimensional pharmacophore. The program has been tested on several published pharmacophores and is shown to be able to reproduce scaffold hops from the literature, which resulted in new chemical entities.

## 2010

• DockoMatic - automated ligand creation and docking.
Bullock, Casey W and Jacob, Reed B. and McDougal, Owen M. and Hampikian, Greg and Andersen, Tim
BMC research notes, 2010, 3, 289
PMID: 21059259     doi: 10.1186/1756-0500-3-289

BACKGROUND: The application of computational modeling to rationally design drugs and characterize macro biomolecular receptors has proven increasingly useful due to the accessibility of computing clusters and clouds. AutoDock is a well-known and powerful software program used to model ligand to receptor binding interactions. In its current version, AutoDock requires significant amounts of user time to set up and run jobs, and collect results. This paper presents DockoMatic, a user-friendly Graphical User Interface (GUI) application that eases and automates the creation and management of AutoDock jobs for high throughput screening of ligand to receptor interactions.

• LoFT: similarity-driven multiobjective focused library design.
Fischer, J Robert and Lessel, Uta and Rarey, Matthias
Journal of chemical information and modeling, 2010, 50(1), 1-21
PMID: 20020715     doi: 10.1021/ci900287p

We present LoFT, a tool for focused combinatorial library design. LoFT provides a set of algorithms that construct a focused library from a chemical fragment space while optimizing multiple design criteria. A weighted multiobjective scoring function based on physicochemical descriptors is employed for traversing the chemical search space. The new aspect of LoFT is that a similarity-driven product-based library design approach is provided at the fragment level. For this reason, the feature tree descriptor is incorporated for similarity comparison of library compounds to given bioactive molecules as well as for diversifying the resulting libraries. The feature tree descriptor abstracts the molecular graph to a tree structure where the nodes are labeled with physicochemical properties. For comparison, the nodes of two trees are mapped onto each other. This strictly hierarchical mechanism is suitable for the efficient comparison of chemical fragments, allowing the evaluation of the resulting products at the fragment level without explicitly enumerating them. LoFT was validated on three different data sets. Starting with a random reagent selection, we optimized the libraries using maximum similarity to known bioactive molecules and iteratively added further criteria. Moreover, we compared these results with data we obtained with FTrees-FS.

• Structure-based design, synthesis and biological evaluation of new N-carboxyphenylpyrrole derivatives as HIV fusion inhibitors targeting gp41
Wang, Y and Lu, H and Zhu, Q and Jiang, S
Bioorganic & Medicinal Chemistry, 2010, 1(1), 189-192

A new series of N-carboxyphenylpyrrole ligands were designed using GeometryFit based on an X-ray crystal structure of gp41. The synthesized ligands showed significant inhibitory activities against HIV gp41 6-helix bundle formation, HIV-1 mediated cell-cell fusion and HIV-1 replication.

• Scaffold hopping using two-dimensional fingerprints: true potential, black magic, or a hopeless endeavor? Guidelines for virtual screening.
Vogt, Martin and Stumpfe, Dagmar and Geppert, Hanna and Bajorath, Jürgen
Journal of medicinal chemistry, 2010, 53(15), 5707-5715
PMID: 20684607     doi: 10.1021/jm100492z

The scaffold hopping potential of popular 2D fingerprints has been thoroughly investigated. We have found that these types of fingerprints have at least limited scaffold hopping ability including early enrichment of small numbers of active scaffolds at high database ranks. However, it has not been possible to derive Tanimoto coefficient value ranges for individual fingerprints that are generally preferred for scaffold hopping. For selected fingerprints, similarity threshold values have been identified that yield small database selection sets having a high probability to contain a few active scaffolds. Furthermore, essentially all tested fingerprints have shown the ability to enrich scaffold hops in approximately 1% of a screening database. For the test cases reported herein, selecting 0.5-1% of the screening database yields approximately 25% of the available scaffolds. On the basis of our findings, practical guidelines for virtual screening using different types of 2D fingerprints have been formulated.
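The selection strategy evaluated here can be sketched in a few lines: rank every database compound by Tanimoto similarity to a reference active and keep only a small top fraction. This minimal sketch assumes fingerprints are represented as sets of on-bit indices; the function names, the toy data and the default 1% cut-off are illustrative choices, not the authors' code.

```python
from math import ceil


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_top_fraction(reference: set, database: dict, fraction: float = 0.01):
    """Return (compound_id, similarity) pairs for the top `fraction` of the
    database, ranked by similarity to `reference` (highest first)."""
    scored = sorted(
        ((cid, tanimoto(reference, fp)) for cid, fp in database.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    n_keep = max(1, ceil(fraction * len(scored)))  # always keep at least one
    return scored[:n_keep]


if __name__ == "__main__":
    # Hypothetical reference active and three-compound "database"
    ref = {1, 2, 3, 4}
    db = {"a": {1, 2, 3, 4}, "b": {1, 2}, "c": {5, 6}}
    print(select_top_fraction(ref, db, fraction=0.5))
```

Selecting by rank fraction rather than by a fixed similarity threshold sidesteps the finding that no generally preferred Tanimoto value range exists across fingerprints.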

• MORPH: a new tool for ligand design.
Beno, Brett R and Langley, David R
Journal of chemical information and modeling, 2010, 50(6), 1159-1164
PMID: 20481489     doi: 10.1021/ci9004964

A frequently employed strategy in drug discovery efforts is to replace aromatic rings in known active compounds with alternative aromatic moieties to create novel compounds with improved potency and/or absorption, distribution, metabolism, excretion, and toxicity properties. Here we introduce MORPH, which is a simple software tool for systematically modifying aromatic rings in three-dimensional models of molecules without altering the coordinates of the nonhydrogen atoms in the rings. MORPH works on individual rings as well as fused ring systems and additionally provides the ability to filter out modified compounds which do not contain hydrogen-bond donors or acceptors at specific positions on the rings or contain more or less than the desired number of heteroatoms. The MORPH program and its application to two ligands extracted from cocrystal structures with cyclin-dependent kinase 2 (CDK2)/cyclin A and CDK2 are discussed below.

## 2009

• FOG: Fragment Optimized Growth algorithm for the de novo generation of molecules occupying druglike chemical space.
Kutchukian, Peter S. and Lou, David and Shakhnovich, Eugene I
Journal of chemical information and modeling, 2009, 49(7), 1630-1642
PMID: 19527020     doi: 10.1021/ci9000458

An essential feature of all practical de novo molecule generating programs is the ability to focus the potential combinatorial explosion of grown molecules on a desired chemical space. It is a daunting task to balance the generation of new molecules with limitations on growth that produce desired features such as stability in water, synthetic accessibility, or drug-likeness. We have developed an algorithm, Fragment Optimized Growth (FOG), which statistically biases the growth of molecules toward desired features. At the heart of the algorithm is a Markov chain which adds fragments to the nascent molecule in a biased manner, depending on the frequency of specific fragment-fragment connections in the database of chemicals it was trained on. We show that in addition to generating synthetically feasible molecules, it can be trained to grow new molecules that resemble desired classes of molecules such as drugs, natural products, and diversity-oriented synthetic products. In order to classify our grown molecules, we developed the Topology Classifier (TopClass) algorithm that is capable of classifying compounds, for example as drugs or nondrugs. The classification accuracies obtained with TopClass compare favorably with the literature. Furthermore, in contrast to "black-box" approaches such as neural networks, TopClass brings to light characteristics of drugs that distinguish them from nondrugs.
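A minimal sketch of frequency-biased growth with a first-order Markov chain, loosely following the idea described in this abstract: connection counts are learned from a training set, and each new fragment is sampled in proportion to how often it follows the current one. As a simplifying assumption, molecules are reduced to linear sequences of fragment labels; the fragment names, training set and function names are hypothetical and not taken from FOG itself.

```python
import random
from collections import Counter, defaultdict


def train_transitions(training_sequences):
    """Count how often fragment b is attached after fragment a in the training set."""
    counts = defaultdict(Counter)
    for seq in training_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def grow(counts, start, max_len, rng):
    """Grow a molecule fragment by fragment, sampling each next fragment in
    proportion to its observed connection frequency (first-order Markov chain)."""
    molecule = [start]
    while len(molecule) < max_len:
        options = counts.get(molecule[-1])
        if not options:  # no observed continuation: stop growing
            break
        frags = list(options)
        weights = [options[f] for f in frags]
        molecule.append(rng.choices(frags, weights=weights, k=1)[0])
    return molecule


if __name__ == "__main__":
    # Hypothetical training "molecules" as fragment sequences
    training = [
        ["phenyl", "amide", "piperidine"],
        ["phenyl", "amide", "pyridine"],
        ["pyridine", "ether", "phenyl"],
    ]
    model = train_transitions(training)
    print(grow(model, "phenyl", max_len=5, rng=random.Random(0)))
```

Because only connections seen in training can be sampled, the generated sequences stay inside the chemical space represented by the training set, which is the kind of focusing the abstract describes.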

• Fragment shuffling: an automated workflow for three-dimensional fragment-based ligand design.
Nisius, Britta and Rester, Ulrich
Journal of chemical information and modeling, 2009, 49(5), 1211-1222
PMID: 19413347     doi: 10.1021/ci8004572

Fragment-based approaches represent a promising alternative in lead discovery. Herein, we present the automated fragment shuffling workflow for the identification of novel lead compounds combining central elements from fragment-based lead identification and structure-based de novo design. Our method is based on sets of aligned 3D ligand structures binding to the same target or target family. The implementation comprises three different ligand fragmentation methods, a scoring scheme assigning individual scores to each fragment, and the incremental construction of novel ligands based on a greedy search algorithm guided by the calculated fragment scores. The validation of our 3D ligand design workflow is presented on the basis of two pharmaceutically relevant drug targets. A retrospective study based on a selected protein kinase data set revealed that the fragment shuffling approach realizes extended results compared to the well-known BREED technique. Furthermore, we applied our approach in a prospective study for the design of novel non-peptidic thrombin inhibitors. The designed ligand structures in both studies demonstrate the potential of the fragment shuffling workflow.

• De Novo Drug Design Using Multiobjective Evolutionary Graphs
Nicolaou, Christos A and Apostolakis, Joannis and Pattichis, Costas S
Journal of chemical information and modeling, 2009, 49(2), 295-307
PMID: 19434831     doi: 10.1021/ci800308h

• AutoGrow: a novel algorithm for protein inhibitor design.
Durrant, Jacob D and Amaro, Rommie E and McCammon, J Andrew
Chemical biology & drug design, 2009, 73(2), 168-178
PMID: 19207419     doi: 10.1111/j.1747-0285.2008.00761.x

Due in part to the increasing availability of crystallographic protein structures as well as rapid improvements in computing power, the past few decades have seen an explosion in the field of computer-based rational drug design. Several algorithms have been developed to identify or generate potential ligands in silico by optimizing the ligand-receptor hydrogen bond, electrostatic, and hydrophobic interactions. We here present AutoGrow, a novel computer-aided drug design algorithm that combines the strengths of both fragment-based growing and docking algorithms. To validate AutoGrow, we recreate three crystallographically resolved ligands from their constituent fragments.

• Discovering potent small molecule inhibitors of cyclophilin A using de novo drug design approach.
Ni, Shuaishuai and Yuan, Yaxia and Huang, Jin and Mao, Xiaona and Lv, Maosheng and Zhu, Jin and Shen, Xu and Pei, Jianfeng and Lai, Luhua and Jiang, Hualiang and Li, Jian
Journal of medicinal chemistry, 2009, 52(17), 5295-5298
PMID: 19691347     doi: 10.1021/jm9008295

This work describes an integrated approach of de novo drug design, chemical synthesis, and bioassay for quick identification of a series of novel small molecule cyclophilin A (CypA) inhibitors (1-3). The activities of the two most potent CypA inhibitors (3h and 3i) are 2.59 and 1.52 nM, respectively, which are about 16 and 27 times more potent than that of cyclosporin A. This study clearly demonstrates the power of our de novo drug design strategy and the related program LigBuilder 2.0 in drug discovery.

• Second-generation de novo design: a view from a medicinal chemist perspective.
Zaliani, Andrea and Boda, Krisztina and Seidel, Thomas and Herwig, Achim and Schwab, Christof H and Gasteiger, Johann and Claußen, Holger and Lemmen, Christian and Degen, Jörg and Pärn, Juri and Rarey, Matthias
Journal of computer-aided molecular design, 2009, 23(8), 593-602
PMID: 19562260     doi: 10.1007/s10822-009-9291-2

For computational de novo design, general retrospective validation is a very challenging task. Here we propose a comprehensive workflow for de novo design driven by the needs of computational and medicinal chemists and, at the same time, a general validation scheme for this technique. The study was conducted combining a suite of already published programs developed within the framework of the NovoBench project, which involved three different pharmaceutical companies and four groups of developers. Based on 188 PDB protein-ligand complexes with diverse functions, the study involved the ligand reconstruction by means of a fragment-based de novo design approach. The structure-based de novo search engine FlexNovo showed in five out of eight total cases the ability to reconstruct native ligands and to rank them in four cases out of five within the first five candidates. The generated structures were ranked according to their synthetic accessibilities evaluated by the program SYLVIA. This investigation showed that the final candidate molecules have about the same synthetic complexity as the respective reference ligands. Furthermore, the plausibility of being true actives was assessed through literature searches.

• Evaluation of an inverse molecular design algorithm in a model binding site
Huggins, David J. and Altman, Michael D. and Tidor, Bruce
Proteins, 2009, 75(1), 168-186
doi: 10.1002/prot.22226

Computational molecular design is a useful tool in modern drug discovery. Virtual screening is an approach that docks and then scores individual members of compound libraries. In contrast to this forward approach, inverse approaches construct compounds from fragments, such that the computed affinity, or a combination of relevant properties, is optimized. We have recently developed a new inverse approach to drug design based on the dead-end elimination and A* algorithms employing a physical potential function. This approach has been applied to combinatorially constructed libraries of small-molecule ligands to design high-affinity HIV-1 protease inhibitors (Altman et al., J Am Chem Soc 2008;130:6099-6113). Here we have evaluated the new method using the well-studied W191G mutant of cytochrome c peroxidase. This mutant possesses a charged binding pocket and has been used to evaluate other design approaches. The results show that overall the new inverse approach does an excellent job of separating binders from non-binders. For a few individual cases, scoring inaccuracies led to false positives. The majority of these involve erroneous solvation energy estimation for charged amines, anilinium ions, and phenols, which has been observed previously for a variety of scoring algorithms. Interestingly, although inverse approaches are generally expected to identify some but not all binders in a library, due to limited conformational searching, these results show excellent coverage of the known binders while still showing strong discrimination of the nonbinders.
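The dead-end elimination (DEE) step mentioned in this abstract can be illustrated with the Goldstein criterion, a standard DEE variant: a choice r at position i is discarded if some alternative t at the same position always gives a lower total energy, whatever the other positions choose. This sketch assumes a pairwise-decomposable energy and uses hypothetical positions, choices and energies; the abstract does not state which DEE variant the authors used, and the A* enumeration stage is omitted.

```python
def pair_energy(pair_e, i, r, j, s):
    """Pairwise term for choice r at position i and choice s at position j,
    stored under either ordering of the positions (missing terms count as 0)."""
    if (i, j) in pair_e:
        return pair_e[(i, j)].get((r, s), 0.0)
    if (j, i) in pair_e:
        return pair_e[(j, i)].get((s, r), 0.0)
    return 0.0


def goldstein_dee(self_e, pair_e):
    """Iteratively remove choice r at position i whenever, for some alternative t,
    E_i(r) - E_i(t) + sum_j min_s [E_ij(r, s) - E_ij(t, s)] > 0."""
    alive = {i: set(choices) for i, choices in self_e.items()}
    changed = True
    while changed:
        changed = False
        for i in alive:
            for r in list(alive[i]):
                for t in alive[i] - {r}:
                    bound = self_e[i][r] - self_e[i][t]
                    for j in alive:
                        if j != i:
                            bound += min(
                                pair_energy(pair_e, i, r, j, s)
                                - pair_energy(pair_e, i, t, j, s)
                                for s in alive[j]
                            )
                    if bound > 0:  # r can never be part of the global minimum
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


if __name__ == "__main__":
    # Hypothetical two-position library: self energies and one pair table
    self_e = {"A": {"a1": 0.0, "a2": 5.0}, "B": {"b1": 1.0, "b2": 0.0}}
    pair_e = {("A", "B"): {("a1", "b1"): 0.0, ("a1", "b2"): 0.5,
                           ("a2", "b1"): 0.0, ("a2", "b2"): 0.0}}
    print(goldstein_dee(self_e, pair_e))
```

A choice is only removed while another choice at the same position survives, so each position always keeps at least one candidate; the surviving combinations would then be enumerated, for example by A* search.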

• SHOP: a method for structure-based fragment and scaffold hopping.
Fontaine, Fabien and Cross, Simon and Plasencia, Guillem and Pastor, Manuel and Zamora, Ismael
Chemmedchem, 2009, 4(3), 427-439
PMID: 19152365     doi: 10.1002/cmdc.200800355

A new method for fragment and scaffold replacement is presented that generates new families of compounds with biological activity, using GRID molecular interaction fields (MIFs) and the crystal structure of the targets. In contrast to virtual screening strategies, this methodology aims only to replace a fragment of the original molecule, maintaining the other structural elements that are known or suspected to have a critical role in ligand binding. First, we report a validation of the method, recovering up to 95% of the original fragments searched among the top-five proposed solutions, using 164 fragment queries from 11 diverse targets. Second, six key customizable parameters are investigated, concluding that filtering the receptor MIF using the co-crystallized ligand atom type has the greatest impact on the ranking of the proposed solutions. Finally, 11 examples using more realistic scenarios have been performed; diverse chemotypes are returned, including some that are similar to compounds that are known to bind to similar targets.

• In silico fragment screening by replica generation (FSRG) method for fragment-based drug design.
Fukunishi, Yoshifumi and Mashimo, Tadaaki and Orita, Masaya and Ohno, Kazuki and Nakamura, Haruki
Journal of chemical information and modeling, 2009, 49(4), 925-933
PMID: 19354203     doi: 10.1021/ci800435x

We developed a new in silico screening method, which is a structure-based virtual fragment screening with protein-compound docking. The structure-based in silico screening of small fragments is known to be difficult due to poor surface complementarity between protein surfaces and small compound (fragment) surfaces. In our method, several side chains were attached to the fragment in question to generate a set of replica molecules of different sizes. This chemical modification enabled us to select potentially active fragments more easily than basing the selection on the original form of the fragment. In addition, the Coulombic and hydrogen bonding interactions were ignored in the docking simulation to reduce the variety of chemical modifications. Namely, we focused on the sizes and the shapes of the side chains and could ignore the atomic charges and types of elements. This procedure was validated in the screenings of inhibitors of six target proteins using known active compounds, and the results revealed that our procedure was effective.

• Quantum Isostere Database: A Web-Based Tool Using Quantum Chemical Topology To Predict Bioisosteric Replacements for Drug Design
Devereux, Mike and Popelier, Paul L A and McLay, Iain M
Journal of chemical information and modeling, 2009, 49(6), 1497-1513
doi: 10.1021/ci900085d

This paper introduces the 'Quantum Isostere Database' (QID), a Web-based tool designed to find bioisosteric fragment replacements for lead optimization using stored ab initio data. A wide range of original geometric, electronic, and calculated physical properties are stored ...

## 2008

• Is it possible to increase hit rates in structure-based virtual screening by pharmacophore filtering? An investigation of the advantages and pitfalls of post-filtering.
Muthas, Daniel and Sabnis, Yogesh A and Lundborg, Magnus and Karlén, Anders
Journal of molecular graphics & modelling, 2008, 26(8), 1237-1251
PMID: 18203638     doi: 10.1016/j.jmgm.2007.11.005

We have investigated the influence of post-filtering virtual screening results, with pharmacophoric features generated from an X-ray structure, on enrichment rates. This was performed using three docking programs, zdock+, Surflex and FRED, as virtual screening tools and pharmacophores generated in UNITY from co-crystallized complexes. Sets of known actives along with 9997 pharmaceutically relevant decoy compounds were docked against six chemically diverse protein targets, namely CDK2, COX2, ERalpha, fXa, MMP3, and NA. To try to overcome the inherent limitations of the well-known docking problem, we generated multiple poses for each compound. The compounds were first ranked according to their scores alone and enrichment rates were calculated using only the top scoring pose of each compound. Subsequently, all poses for each compound were passed through the different pharmacophores generated from co-crystallized complexes and the enrichment factors were re-calculated based on the top-scoring passing pose of each compound. Post-filtering with a pharmacophore generated from only one X-ray complex was shown to increase enrichment rates in all investigated targets compared to docking alone. This indicates that this is a general method, which works for diverse targets and different docking programs.
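The post-filtering protocol can be sketched as follows: rank compounds by their best pose (docking alone) or by their best pose that passes the pharmacophore (post-filtering), then compare enrichment factors. This sketch assumes lower docking scores are better and uses the usual definition EF = (actives in selection / selection size) / (total actives / database size); the data layout and function names are hypothetical.

```python
from math import ceil


def rank_by_docking(poses):
    """Rank compounds by their single best-scoring pose (docking alone)."""
    best = {cid: min(score for score, _ in plist) for cid, plist in poses.items()}
    return sorted(best, key=best.get)


def rank_by_postfilter(poses):
    """Rank compounds by their best pose that passes the pharmacophore;
    compounds without any passing pose are dropped."""
    best = {}
    for cid, plist in poses.items():
        passing = [score for score, ok in plist if ok]
        if passing:
            best[cid] = min(passing)
    return sorted(best, key=best.get)


def enrichment_factor(ranked, actives, n_total, fraction):
    """EF = (actives in top fraction / selection size) / (total actives / n_total)."""
    n_sel = max(1, ceil(fraction * n_total))
    hits = sum(1 for cid in ranked[:n_sel] if cid in actives)
    return (hits / n_sel) / (len(actives) / n_total)


if __name__ == "__main__":
    # Hypothetical poses: compound -> [(docking score, passes pharmacophore?)]
    poses = {
        "act1": [(-9.0, False), (-7.0, True)],
        "act2": [(-6.0, True)],
        "dec1": [(-10.0, False)],
        "dec2": [(-8.0, False), (-5.0, True)],
    }
    actives = {"act1", "act2"}
    for ranking in (rank_by_docking(poses), rank_by_postfilter(poses)):
        print(ranking, enrichment_factor(ranking, actives, len(poses), 0.25))
```

In this toy data a decoy has the best raw score but no pose that satisfies the pharmacophore, so post-filtering moves an active to the top of the list and raises the EF at the top 25%.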

• CONFIRM: connecting fragments found in receptor molecules
Thompson, David C and Aldrin Denny, R and Nilakantan, Ramaswamy and Humblet, Christine and Joseph-McCarthy, Diane and Feyfant, Eric
Journal of computer-aided molecular design, 2008, 22(10), 761-772
doi: 10.1007/s10822-008-9221-8

A novel algorithm for the connecting of fragment molecules is presented and validated for a number of test systems. Within the CONFIRM (Connecting Fragments Found in Receptor Molecules) approach a pre-prepared library of bridges is searched to extract those which match a search criterion derived from known experimental or computational binding information about fragment molecules within a target binding site. The resulting bridge 'hits' are then connected, in an automated fashion, to the fragments and docked into the target receptor. Docking poses are assessed in terms of root-mean-square deviation from the known positions of the fragment molecules, as well as docking score should known inhibitors be available. The creation of the bridge library, the full details and novelty of the CONFIRM algorithm, and the general applicability of this approach within the field of fragment-based de novo drug design are discussed.

• A Drug Candidate Design Environment Using Evolutionary Computation
Ecemis, M. Ihsan and Wikel, James and Bingham, Christopher and Bonabeau, Eric
IEEE Transactions on Evolutionary Computation, 2008, 12(5), 591-603
doi: 10.1109/TEVC.2007.913131

This paper describes the Candidate Design Environment we developed for efficient identification of promising drug candidates. Developing effective drugs from active molecules is a challenging problem which requires the simultaneous satisfaction of many factors. Traditionally, the drug discovery process is conducted by medicinal chemists whose vital expertise is not readily quantifiable. Recently, in silico modeling and virtual screening have been emerging as valuable tools despite their mixed results early on. Our approach combines the capabilities of computational models with human knowledge using a genetic algorithm and interactive evolutionary computation. We enable the chemist's expertise to play a key role in every stage of the discovery process. Our evolved structures are guaranteed to be within the chemistry space specified by the medicinal chemist, thereby making the results plausible. In this paper, we describe our approach, introduce a case study to test our methodology, and present our results.

• De novo ligand design to partially flexible active sites: application of the ReFlex algorithm to carboxypeptidase A, acetylcholinesterase, and the estrogen receptor.
Firth-Clark, Stuart and Kirton, Stewart B and Willems, Henriëtte M G and Williams, Anthony
Journal of chemical information and modeling, 2008, 48(2), 296-305
PMID: 18232679     doi: 10.1021/ci700282u

ReFlex is a recent algorithm in the de novo ligand design software SkelGen that allows the flexibility of amino acid side chains in a protein to be taken into account during the drug-design process. In this paper the impact of flexibility on the solutions generated by the de novo design algorithm, when applied to carboxypeptidase A, acetylcholinesterase, and the estrogen receptor (ER), is investigated. The results for each of the targets indicate that when allowing side-chain movement in the active site, solutions are generated that were not accessible from the multiple static protein conformations available for these targets. Furthermore, an analysis of structures generated in a flexible versus a static ER active site suggests that these additional solutions are not merely noise but contain many interesting chemotypes.

• Concept of combinatorial de novo design of drug-like molecules by particle swarm optimization.
Hartenfeller, Markus and Proschak, Ewgenij and Schüller, Andreas and Schneider, Gisbert
Chemical biology & drug design, 2008, 72(1), 16-26
PMID: 18564216     doi: 10.1111/j.1747-0285.2008.00672.x

We present a fast stochastic optimization algorithm for fragment-based molecular de novo design (COLIBREE, Combinatorial Library Breeding). The search strategy is based on a discrete version of particle swarm optimization. Molecules are represented by a scaffold, which remains constant during optimization, and variable linkers and side chains. Different linkers represent virtual chemical reactions. Side-chain building blocks were obtained from pseudo-retrosynthetic dissection of large compound databases. Here, ligand-based design was performed using chemically advanced template search (CATS) topological pharmacophore similarity to reference ligands as fitness function. A weighting scheme was included for particle swarm optimization-based molecular design, which permits the use of many reference ligands and allows for positive and negative design to be performed simultaneously. In a case study, the approach was applied to the de novo design of potential peroxisome proliferator-activated receptor subtype-selective agonists. The results demonstrate the ability of the technique to cope with large combinatorial chemistry spaces and its applicability to focused library design. The technique was able to perform exploitation of a known scheme and at the same time explorative search for novel ligands within the framework of a given molecular core structure. It thereby represents a practical solution for compound screening in the early hit and lead finding phase of a drug discovery project.

• Fragment-based de novo ligand design by multiobjective evolutionary optimization.
Dey, Fabian and Caflisch, Amedeo
Journal of chemical information and modeling, 2008, 48(3), 679-690
PMID: 18307332     doi: 10.1021/ci700424b

GANDI (Genetic Algorithm-based de Novo Design of Inhibitors) is a computational tool for automatic fragment-based design of molecules within a protein binding site of known structure. A genetic algorithm and a tabu search act in concert to join predocked fragments with a user-supplied list of fragments. A novel feature of GANDI is the simultaneous optimization of force field energy and a term enforcing 2D-similarity to known inhibitor(s) or 3D-overlap to known binding mode(s). Scaffold hopping can be promoted by tuning the relative weights of these terms. The performance of GANDI is tested on cyclin-dependent kinase 2 (CDK2) using a library of about 14 000 fragments and the binding mode of a known oxindole inhibitor to bias the design. Top ranking GANDI molecules are involved in one to three hydrogen bonds with the backbone polar groups in the hinge region of CDK2, an interaction pattern observed in potent kinase inhibitors. Notably, a GANDI molecule with very favorable predicted binding affinity shares a 2-N-phenyl-1,3-thiazole-2,4-diamine moiety with a known nanomolar inhibitor of CDK2. Importantly, molecules with a favorable GANDI score are synthetically accessible. In fact, eight of the 1809 molecules designed by GANDI for CDK2 are found in the ZINC database of commercially available compounds, which also contains about 600 compounds with the same scaffolds as those in the top ranking GANDI molecules.

## 2007

• Tagged fragment method for evolutionary structure-based de novo lead generation and optimization.
Liu, Qian and Masek, Brian and Smith, Karl and Smith, Julian
Journal of medicinal chemistry, 2007, 50(22), 5392-5402
PMID: 17918924     doi: 10.1021/jm070750k

Here we describe a computer-assisted de novo drug design method, EAISFD, which combines the de novo design engine EA-Inventor with a scoring function featuring the molecular docking program Surflex-Dock. This method employs tagged fragments, which are preserved substructures in EA-Inventor used for base fragment matching in Surflex-Dock, for constructing ligand structures under specific binding motifs. In addition, a target score mechanism is adopted that allows EAISFD to deliver a diverse set of desired structures. This method can be used to design novel ligand scaffolds (lead generation) or to optimize attachments on a fixed scaffold (lead optimization). EAISFD has successfully suggested many known inhibitor scaffolds as well as a number of new scaffold types when applied to p38 MAP kinase.

• Flux (2): Comparison of Molecular Mutation and Crossover Operators for Ligand-Based de Novo Design
Fechner, Uli and Schneider, Gisbert
Journal of chemical information and modeling, 2007, 47(2), 656-667
doi: 10.1021/ci6005307

We implemented a fragment-based de novo design algorithm for a population-based optimization of molecular structures. The concept is grounded on an evolution strategy with mutation and crossover operators for structure breeding. Molecular building blocks were obtained from the pseudo-retrosynthesis of a collection of pharmacologically active compounds following the RECAP principle. The influence of mutation and crossover on the course of optimization was assessed in redesign studies using known drugs as template structures. A topological atom-pair descriptor grounded on potential pharmacophore points was used as a molecular descriptor, and the Manhattan distance between the template and candidate molecules served as a fitness function. Exclusive use of the crossover operator yielded few unique compounds and often resulted in premature convergence of the optimization process, whereas exclusive use of the mutation operator resulted in diverse high-quality structures. Combinations of crossover and mutation yielded the overall best results. The majority of the designed structures exhibit a chemically reasonable architecture; chiral centers are rare, and unfavorable connections of building blocks are infrequent. We conclude that this fragment-based design principle is suited as an idea generator for the automated design of novel leadlike molecules.
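
The two variation operators compared in this study can be sketched in Python. For brevity the sketch assumes a molecule is a linear sequence of building-block labels; this is a deliberate simplification, since RECAP fragments are actually joined at typed attachment points. The functions `mutate` and `crossover` are hypothetical names, not Flux's code.

```python
import random


def mutate(parent, building_blocks, rng: random.Random):
    """Mutation: replace one randomly chosen building block with a different one."""
    child = list(parent)
    pos = rng.randrange(len(child))
    alternatives = [b for b in building_blocks if b != child[pos]]
    child[pos] = rng.choice(alternatives)
    return child


def crossover(parent_a, parent_b, rng: random.Random):
    """One-point crossover: head of one parent joined to the tail of the other.

    Both parents need at least two building blocks.
    """
    cut = rng.randrange(1, min(len(parent_a), len(parent_b)))
    return list(parent_a[:cut]) + list(parent_b[cut:])
```

Crossover only recombines blocks already present in the population, which is consistent with the reported loss of diversity and premature convergence when it is used alone; mutation keeps introducing new blocks.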

• Designing active template molecules by combining computational de novo design and human chemist's expertise.
Lameijer, Eric-Wubbo and Tromp, Reynier A and Spanjersberg, Ronald F and Brussee, Johannes and Ijzerman, Adriaan P
Journal of medicinal chemistry, 2007, 50(8), 1925-1932
PMID: 17367122     doi: 10.1021/jm061356

We used a new software tool for de novo design, the "Molecule Evoluator", to generate a number of small molecules. Explicit constraints were a relatively low molecular weight and otherwise limited functionality, for example, low numbers of hydrogen bond donors and acceptors, one or two aromatic rings, and a small number of rotatable bonds. In this way, we obtained a collection of scaffold- or templatelike molecules rather than fully "decorated" ones. We asked medicinal chemists to evaluate the suggested molecules for ease of synthesis and overall appeal, allowing them to make structural changes to the molecules for these reasons. On the basis of their recommendations, we synthesized eight molecules with an unprecedented (not patented) yet simple structure, which were subsequently tested in a screen of 83 drug targets, mostly G protein-coupled receptors. Four compounds showed affinity for biogenic amine targets (receptor, ion channel, and transport protein), reflecting the training of the medicinal chemists involved. Apparently the generation of leadlike solutions helped the medicinal chemists to select good starting points for future lead optimization, away from existing compound libraries.

• A database of historically-observed chemical replacements.
Haubertin, David Y and Bruneau, Pierre
Journal of chemical information and modeling, 2007, 47(4), 1294-1302
PMID: 17539596     doi: 10.1021/ci600395u

A systematic analysis of one-to-one chemical replacements occurring in a set of 50,000 druglike molecules was performed. The frequency of occurrence, as well as the average change in measured and calculated properties, was computed for each observed substitution. The experimental properties considered were solubility, protein binding, and logD. The calculated properties were logP, molecular weight, number of hydrogen bond donors and acceptors, and polar surface area. During this analysis, in which 9000 different functional groups were considered, 0.7 million substitutions were identified and stored in a database. As an application, we present a web interface from which users can identify historically observed replacements of any functional group on their query molecule. The server returns a list of side-chains, as well as the historically observed shift in experimental properties.
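
The per-substitution statistics (frequency of occurrence and average property change) amount to a group-by over matched pairs. A minimal Python sketch follows, assuming the replacements have already been extracted as `(from_group, to_group, delta)` tuples. The function name and input format are hypothetical, and the example values in the usage note are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize_replacements(pairs):
    """Aggregate observed one-to-one replacements.

    pairs: iterable of (from_group, to_group, delta) tuples, where delta is
    the property change (e.g. in logD) measured for one matched pair.
    Returns {(from_group, to_group): (frequency, mean_delta)}.
    """
    deltas = defaultdict(list)
    for src, dst, delta in pairs:
        deltas[(src, dst)].append(delta)
    return {key: (len(values), mean(values)) for key, values in deltas.items()}
```

For instance, two hypothetical Cl to F pairs with logD shifts of -0.3 and -0.5 summarize to a frequency of 2 and a mean shift of -0.4.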

## 2006

• Flux (1): A Virtual Synthesis Scheme for Fragment-Based de Novo Design
Fechner, Uli and Schneider, Gisbert
Journal of chemical information and modeling, 2006, 46(2), 699-707
doi: 10.1021/ci0503560

It is demonstrated that the fragmentation of druglike molecules by applying simplistic pseudo-retrosynthesis results in a stock of chemically meaningful building blocks for de novo molecule generation. A stochastic search algorithm in conjunction with ligand-based similarity scoring (Flux: fragment-based ligand builder reaxions) facilitated the generation of new molecules using a single known reference compound as a template. This molecule assembly method is applicable in the absence of receptor-structure information. In a case study, we used imatinib (Gleevec) and a Factor Xa inhibitor as the reference structures. The algorithm succeeded in redesigning the templates from scratch and suggested several alternative molecular structures. The resulting designed molecules were chemically reasonable and contained essential substructure motifs. A comparison of molecular descriptors suggests that holographic descriptors might be advantageous over binary fingerprints for ligand-based de novo design.

• The molecule evoluator. An interactive evolutionary algorithm for the design of drug-like molecules.
Lameijer, Eric-Wubbo and Kok, Joost N and Bäck, Thomas and Ijzerman, Ad P
Journal of chemical information and modeling, 2006, 46(2), 545-552
PMID: 16562982     doi: 10.1021/ci050369d

We developed a software tool to design drug-like molecules, the "Molecule Evoluator", which we introduce and describe here. An atom-based evolutionary approach was used allowing both several types of mutation and crossover to occur. The novelty, we claim, is the unprecedented interactive evolution, in which the user acts as a fitness function. This brings a human being's creativity, implicit knowledge, and imagination into the design process, next to the more standard chemical rules. Proof-of-concept was demonstrated in a number of ways, both computationally and in the lab. Thus, we synthesized a number of compounds designed with the aid of the Molecule Evoluator. One of these is described here, a new chemical entity with activity on alpha-adrenergic receptors.

• Computer-aided design of non-nucleoside inhibitors of HIV-1 reverse transcriptase
Jorgensen, WL and Ruiz-Caro, J and Tirado-Rives, J
Bioorganic & medicinal…, 2006, 16(3), 663-667

Design principles are delineated for non-nucleoside inhibitors of HIV-1 reverse transcriptase (NNRTIs). Simultaneous optimization of binding affinity for wild-type RT, tolerance for viral mutations, and physical properties is pursued. Automated lead generation with the growing program BOMB, Monte Carlo simulations with free-energy perturbation theory for lead optimization, and property analysis with QikProp are featured. An initial 30 μM lead has been optimized rapidly to the 10 nM level.

• Grand canonical Monte Carlo simulation of ligand-protein binding
Clark, M and Guarnieri, F and Shkurko, I
Journal of chemical…, 2006, 46(1), 231-242

A new application of the grand canonical thermodynamic ensemble to compute ligand-protein binding is described. The method is sufficiently rapid that it is practical to compute ligand-protein binding free energies for a large number of poses over the entire protein surface, thus identifying multiple putative ligand binding sites. The method is demonstrated by the simulation of two protein-ligand systems, thermolysin and T4 lysozyme, for which there are extensive thermodynamic and crystallographic data for the binding of small, rigid ligands. These low-molecular-weight ligands correspond to the molecular fragments used in computational fragment-based drug design. The simulations correctly identified the experimental binding poses and rank-ordered the affinities of ligands in each of these systems.
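
In a grand canonical Monte Carlo run, ligand copies are inserted and deleted as well as moved. The sketch below shows the standard textbook (Adams-style) acceptance probabilities for those two moves, written with the excess-chemical-potential parameter `B`, the energy change `dU` of the move, and `kT` in the same energy units. The abstract does not state which formulation Clark et al. used, so treat this as background on the ensemble rather than their implementation.

```python
import math


def gcmc_insert_acceptance(B: float, n: int, dU: float, kT: float) -> float:
    """Acceptance probability for inserting one molecule into a system that
    currently holds n molecules; dU is the energy change of the insertion."""
    return min(1.0, math.exp(B - dU / kT) / (n + 1))


def gcmc_delete_acceptance(B: float, n: int, dU: float, kT: float) -> float:
    """Acceptance probability for deleting one of n molecules; dU is the
    energy change of the deletion."""
    return min(1.0, n * math.exp(-B - dU / kT))
```

The two rules are paired so that an insertion from n to n + 1 molecules and the reverse deletion satisfy detailed balance. Raising `B` (a higher chemical potential) favors insertion, which is how ligand concentration is controlled.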

• SkelGen: a general tool for structure-based de novo ligand design
Dean, Philip M and Firth-Clark, Stuart and Harris, William and Kirton, Stewart B and Todorov, Nikolay P
Expert opinion on drug discovery, 2006, 1(2), 179-189
doi: 10.1517/17460441.1.2.179

The recent lapse in productivity in the pharmaceutical industry has facilitated the emergence of experimental and in silico structure-based design methodologies, based on identification of biologically active low molecular weight fragments that can be exploited to produce potential drug candidates with diverse chemistries. SkelGen, an in silico example of this methodology, is reviewed. The ability of this algorithm to identify chemically diverse low molecular weight fragments that would potentially bind to DNA gyrase is recounted, as is the first purely de novo structure-based design of five compounds that show at least micromolar activity against the estrogen receptor. The ability of the algorithm to incorporate partial protein flexibility during its design of compounds to the estrogen receptor is discussed, and an opinion as to the near and long-term futures for de novo design algorithms is expressed.

• FlexNovo: structure-based searching in large fragment spaces.
Degen, Jörg and Rarey, Matthias
ChemMedChem, 2006, 1(8), 854-868
PMID: 16902939     doi: 10.1002/cmdc.200500102

We present a new molecular design program, FlexNovo, for structure-based searching within large fragment spaces following a sequential growth strategy. The fragment spaces consist of several thousands of chemical fragments and a corresponding set of rules that specify how the fragments can be connected. FlexNovo is based on the FlexX molecular docking software and makes use of its incremental construction algorithm and the underlying chemical models. Interaction energies are calculated by using standard scoring functions. Several placement geometry, physicochemical property (drug-likeness), and diversity filter criteria are directly integrated into the "build-up" process. FlexNovo has been used to design potential inhibitors for four targets of pharmaceutical interest (dihydrofolate reductase, cyclin-dependent kinase 2, cyclooxygenase-2, and the estrogen receptor). We have carried out calculations using different diversity parameters for each of these targets and generated solution sets containing up to 50 molecules. The compounds obtained show that FlexNovo is able to generate a diverse set of reasonable molecules with drug-like properties. The results, including an automated similarity analysis with the Feature Tree program, indicate that FlexNovo often reproduces structural motifs as well as the corresponding binding modes seen in known active structures.

• De novo ligand design to an ensemble of protein structures.
Todorov, N P and Buenemann, C L and Alberts, I L
Proteins, 2006, 64(1), 43-59
PMID: 16555306     doi: 10.1002/prot.20928

We describe a combinatorial method for de novo ligand design to an ensemble of receptor structures. Receptor conformations, protonation states, and structural water molecules are considered consistently within the framework of de novo ligand design. The method relies on Monte Carlo optimization to search the space of ligand structures, conformations, and rigid-body movements as well as receptor models. The method is applied to an ensemble of HIV protease and human collagenase receptor models. Ligand structures generated de novo exhibit the correct hydrogen-bonding pattern in the core of the active site, with hydrophobic groups extending into the receptor S1 and S1' pocket space. Furthermore, it is shown that known ligands are recovered in the correct binding mode and in the native, most tightly binding receptor model.

• sc-PDB: an annotated database of druggable binding sites from the Protein Data Bank.
Kellenberger, Esther and Muller, Pascal and Schalon, Claire and Bret, Guillaume and Foata, Nicolas and Rognan, Didier
Journal of chemical information and modeling, 2006, 46(2), 717-727
PMID: 16563002     doi: 10.1021/ci050372x

The sc-PDB is a collection of 6 415 three-dimensional structures of binding sites found in the Protein Data Bank (PDB). Binding sites were extracted from all high-resolution crystal structures in which a complex between a protein cavity and a small-molecular-weight ligand could be identified. Importantly, ligands are considered from a pharmacological and not a structural point of view. Therefore, solvents, detergents, and most metal ions are not stored in the sc-PDB. Ligands are classified into four main categories: nucleotides (< 4-mer), peptides (< 9-mer), cofactors, and organic compounds. The corresponding binding site is formed by all protein residues (including amino acids, cofactors, and important metal ions) with at least one atom within 6.5 angstroms of any ligand atom. The database was carefully annotated by browsing several protein databases (PDB, UniProt, and GO) and storing, for every sc-PDB entry, the following features: protein name, function, source, domain and mutations, ligand name, and structure. The repository of ligands has also been archived by diversity analysis of molecular scaffolds, and several chemoinformatics descriptors were computed to better understand the chemical space covered by stored ligands. The sc-PDB may be used for several purposes: (i) screening a collection of binding sites for predicting the most likely target(s) of any ligand, (ii) analyzing the molecular similarity between different cavities, and (iii) deriving rules that describe the relationship between ligand pharmacophoric points and active-site properties. The database is periodically updated and accessible on the web at http://bioinfo-pharma.u-strasbg.fr/scPDB/.
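
The binding-site definition used here (every residue with at least one atom within 6.5 Å of any ligand atom) translates directly into a distance filter. A minimal Python sketch follows, assuming residues and ligand atoms are given as plain coordinate tuples. Parsing of PDB files is out of scope, and the function name and input format are hypothetical.

```python
import math


def site_residues(residues, ligand_atoms, cutoff=6.5):
    """Return residue ids with at least one atom within `cutoff` angstroms
    of any ligand atom.

    residues: dict residue_id -> list of (x, y, z) atom coordinates.
    ligand_atoms: list of (x, y, z) ligand atom coordinates.
    """
    selected = []
    for res_id, atoms in residues.items():
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            selected.append(res_id)
    return selected
```

The brute-force double loop is fine for a single site; a spatial index such as a k-d tree would be the usual choice when processing thousands of PDB entries.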

## 2005

• LEA3D: a computer-aided ligand design for structure-based drug design.
Douguet, Dominique and Munier-Lehmann, Hélène and Labesse, Gilles and Pochet, Sylvie
Journal of medicinal chemistry, 2005, 48(7), 2457-2468
PMID: 15801836     doi: 10.1021/jm0492296

We present an improved version of the program LEA developed to design organic molecules. Rational drug design involves finding solutions to large combinatorial problems for which an exhaustive search is impractical. Genetic algorithms provide a tool for the investigation of such problems. New software, called LEA3D, is now able to conceive organic molecules by combining 3D fragments. Fragments were extracted from both biological compounds and known drugs. A fitness function guides the search process in optimizing the molecules toward an optimal value of the properties. The fitness function is built up by combining several independent property evaluations, including the score provided by the FlexX docking program. One application in de novo drug design is described. The example makes use of the structure of Mycobacterium tuberculosis thymidine monophosphate kinase to generate analogues of one of its natural substrates. Among 22 tested compounds, 17 show inhibitory activity in the micromolar range.

• Combinatorial ligand design targeted at protein families.
Todorov, Nikolay P and Buenemann, Christoph L and Alberts, Ian L
Journal of chemical information and modeling, 2005, 45(2), 314-320
PMID: 15807493     doi: 10.1021/ci049692r

We describe a method to create ligands specific for a given protein family. The method is applied to generate ligand candidates for the cyclin-dependent kinase (CDK) family. The CDK family of proteins is involved in regulating the cell cycle by alternately activating and deactivating the cell's progression through the cycle. CDKs are activated by association with cyclin and are inhibited by complexation with small molecules. X-ray crystal structures are available for three of the thirteen known CDK family members: CDK2, CDK5 and CDK6. In this work, we use novel computational approaches to design ligand candidates that are potentially inhibitory across the three CDK family members as well as more specific molecules which can potentially inhibit one or any combination of two of the three CDK family members. We define a new scoring term, SpecScore, to quantify the potential inhibitory power of the generated structures. According to a search of the World Drug Alerts, the highest scoring SpecScore molecule that is specific for the three CDK family members shows very similar chemical characteristics and functional groups to numerous molecules known to deactivate several members of the CDK family.

• Bioisosterism: a useful strategy for molecular modification and drug design.
Lima, Lídia Moreira and Barreiro, Eliezer J
Current medicinal chemistry, 2005, 12(1), 23-49
PMID: 15638729

## 2004

• De novo generation of molecular structures using optimization to select graphs on a given lattice.
Bywater, Robert P and Poulsen, Thomas A and Røgen, Peter and Hjorth, Poul G
Journal of Chemical Information and Computer Sciences, 2004, 44(3), 856-861
PMID: 15154750     doi: 10.1021/ci0342369

A recurrent problem in organic chemistry is the generation of new molecular structures that conform to some predetermined set of structural constraints that are imposed in an endeavor to build certain required properties into the newly generated structure. An example of this is the pharmacophore model, used in medicinal chemistry to guide de novo design or selection of suitable structures from compound databases. We propose here a method that efficiently links up a selected number of required atom positions while at the same time directing the emergent molecular skeleton to avoid forbidden positions. The linkage process takes place on a lattice whose unit step length and overall geometry are designed to match typical architectures of organic molecules. We use an optimization method to select from the many different graphs possible. The approach is demonstrated in an example where crystal structures of the same (in this case rigid) ligand complexed with different proteins are available.

• Native atom types for knowledge-based potentials: application to binding energy prediction.
Dominy, Brian N and Shakhnovich, Eugene I
Journal of medicinal chemistry, 2004, 47(18), 4538-4558
PMID: 15317465     doi: 10.1021/jm0498046

Knowledge-based potentials have been found useful in a variety of biophysical studies of macromolecules. Recently, it has also been shown in self-consistent studies that it is possible to extract quantities consistent with pair potentials from model structural databases. In this study, we attempt to extend the results obtained from these self-consistent studies toward the extraction of realistic pair potentials from the Protein Data Bank (PDB). The new method utilizes a clustering approach to define atom types within the PDB consistent with the optimal effective pairwise potential. The method has been integrated into the SMoG drug design package, resulting in an improved approach for the rapid and accurate estimation of binding affinities from structural information. Using this approach, it is possible to generate simple knowledge-based potentials that correlate (R

• BREED: Generating novel inhibitors through hybridization of known ligands. Application to CDK2, p38, and HIV protease.
Pierce, Albert C and Rao, Govinda and Bemis, Guy W
Journal of medicinal chemistry, 2004, 47(11), 2768-2775
PMID: 15139755     doi: 10.1021/jm030543u

In this work we describe BREED, a method for the generation of novel inhibitors from structures of known ligands bound to a common target. The method is essentially an automation of the common medicinal chemistry practice of joining fragments of two known ligands to generate a new inhibitor. The ligand-bound target structures are overlaid, all overlapping bonds in all pairs of ligands are found, and the fragments on each side of each matching bond are swapped to generate the new molecules. Since the method is automated, it can be applied recursively to generate all possible combinations of known ligands. In an application of this method to HIV protease inhibitors and protein kinase inhibitors, hundreds of new molecular structures were generated. These included known inhibitor scaffolds not included in the initial set, entirely novel scaffolds, and novel substituents on known scaffolds. The method is fast, and since all of the ligand functional groups are known to bind the target in the precise position and orientation present in the novel ligand, the success rate of this method should be superior to more traditional de novo design techniques. In an era of increasingly high-throughput structural biology, such methods for high-throughput utilization of structural information will become increasingly valuable.
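
The core fragment-swapping step can be sketched in Python for the simplest case. The sketch assumes each overlaid ligand is a linear chain of `(atom_label, (x, y, z))` tuples in bond order and only matches bonds pointing in the same direction; real BREED works on full molecular graphs, so the chain representation, the `tol` value and the name `breed` are illustrative assumptions.

```python
import math


def breed(lig_a, lig_b, tol=0.5):
    """Hypothetical BREED-style hybridization of two overlaid linear ligands.

    For every pair of bonds whose two endpoints coincide within `tol`
    angstroms, swap the fragments on either side of the matching bond
    to create two hybrid molecules.
    """
    hybrids = []
    for i in range(len(lig_a) - 1):
        for j in range(len(lig_b) - 1):
            same_start = math.dist(lig_a[i][1], lig_b[j][1]) <= tol
            same_end = math.dist(lig_a[i + 1][1], lig_b[j + 1][1]) <= tol
            if same_start and same_end:
                # head of A + tail of B, and head of B + tail of A
                hybrids.append(lig_a[: i + 1] + lig_b[j + 1:])
                hybrids.append(lig_b[: j + 1] + lig_a[i + 1:])
    return hybrids
```

Because every fragment keeps the position it had in its parent's crystal pose, the hybrids inherit known binding geometry, which is the rationale given above for BREED's expected success rate. Applying `breed` again to its own output mirrors the recursive use described in the abstract.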

## 2003

• SYNOPSIS: SYNthesize and OPtimize System in Silico.
Vinkers, H Maarten and de Jonge, Marc R and Daeyaert, Frederik F D and Heeres, Jan and Koymans, Lucien M H and van Lenthe, Joop H and Lewi, Paul J and Timmerman, Henk and Van Aken, Koen and Janssen, Paul A J
Journal of medicinal chemistry, 2003, 46(13), 2765-2773
PMID: 12801239     doi: 10.1021/jm030809x

We present a de novo design program called SYNOPSIS that includes a synthesis route for each generated molecule. SYNOPSIS designs novel molecules by starting from a database of available molecules and simulating organic synthesis steps. This way of generating molecules imposes synthetic accessibility on the molecules. In addition to a starting database, a fitness function is needed that calculates the value of a desired property for an arbitrary molecule. The values obtained from this function guide the design process in optimizing the molecules toward an optimal value of the calculated property. Two applications are described. The first uses an electric dipole moment calculation to generate molecules possessing a strong dipole moment. The second makes use of the three-dimensional structure of a viral enzyme in order to generate high-affinity ligands. Of twenty-eight compounds designed with the program, 18 were synthesized and tested, and 10 of these showed HIV inhibitory activity in vitro.

• Validation of the SPROUT de novo design program
Law, JMS and Fung, DYK and Zsoldos, Z and Simon, A and Szabo, Z and Csizmadia, IG and Johnson, AP
Journal of computer-aided molecular design, 2003, 25(8), 651-657
PMID: 21735261     doi: 10.1016/j.theochem.2003.08.104

The validation of SPROUT was carried out on four receptor-ligand complexes: thrombin-NAPAP, calmodulin (CAM)-AAA, Ras P-21-GDP and dihydrofolate reductase (DHFR)-methotrexate (MTX). These complexes were downloaded from the Brookhaven Protein Data Bank (PDB). For the thrombin-NAPAP complex, two structures very similar to NAPAP were generated. These two structures were similar in 3D structure to NAPAP but contained an extra hexane ring. For CAM-AAA and Ras P-21-GDP, the ligands generated were essentially identical to their original ligands. For DHFR, two ligands, one most similar in 2D structure and one most similar in 3D conformation, were found. The successful regeneration of the ligands for each case proves the ability and applicability of SPROUT for designing strongly binding, successful drug candidates. When the program is executed with less restrictive constraints, it generates a large number of novel structures that are structurally diverse, making it an ideal tool for de novo design.

## 2002

• A validation study on the practical use of automated de novo design
Stahl, M and Todorov, NP and James, T and Mauser, H
Journal of computer-…, 2002, 16(7), 459-478
PMID: 12510880

The de novo design program Skelgen has been used to design inhibitor structures for four targets of pharmaceutical interest. The designed structures are compared to modeled binding modes of known inhibitors (i) visually and (ii) by means of a novel similarity measure considering the size and spatial proximity of the maximum common substructure of two small molecules. It is shown that the Skelgen algorithm generates representatives of many inhibitor classes within a very short time and that the new similarity measure is useful for comparing and clustering designed structures. The results demonstrate the necessity of properly defining search constraints in practical applications of de novo design.

• SMall Molecule Growth 2001 (SMoG2001): An Improved Knowledge-Based Scoring Function for Protein−Ligand Interactions
Ishchenko, Alexey V and Shakhnovich, Eugene I
Journal of medicinal chemistry, 2002, 45(13), 2770-2780
doi: 10.1021/jm0105833

## 2001

• Similarity searching in large combinatorial chemistry spaces.
Rarey, M and Stahl, M
Journal of computer-aided molecular design, 2001, 15(6), 497-520
PMID: 11495223

We present a novel algorithm, called Ftrees-FS, for similarity searching in large chemistry spaces based on dynamic programming. Given a query compound, the algorithm generates sets of compounds from a given chemistry space that are similar to the query. The similarity search is based on the feature tree similarity measure representing molecules by tree structures. This descriptor allows handling combinatorial chemistry spaces as a whole instead of looking at subsets of enumerated compounds. Within a few minutes of computing time, the algorithm is able to find the most similar compound in very large spaces as well as sets of compounds at an arbitrary similarity level. In addition, the diversity among the generated compounds can be controlled. A set of 17,000 fragments of known drugs, generated by the RECAP procedure from the World Drug Index, was used as the search chemistry space. These fragments can be combined to more than 10^18 compounds of reasonable size. For validation, known antagonists/inhibitors of several targets including dopamine D4, histamine H1, and COX2 are used as queries. Comparison of the compounds created by Ftrees-FS to other known actives demonstrates the ability of the method to jump between structurally unrelated molecule classes.

## 2000

• De novo design of molecular architectures by evolutionary assembly of drug-derived building blocks.
Schneider, G and Lee, M L and Stahl, M and Schneider, P
Journal of computer-aided molecular design, 2000, 14(5), 487-494
PMID: 10896320

An evolutionary algorithm was developed for fragment-based de novo design of molecules (TOPAS, TOPology-Assigning System). This stochastic method aims at generating a novel molecular structure mimicking a template structure. A set of approximately 25,000 fragment structures serves as the building block supply; these were obtained by a straightforward fragmentation procedure applied to 36,000 known drugs. Eleven reaction schemes were implemented for both fragmentation and building block assembly. This combination of drug-derived building blocks and a restricted set of reaction schemes proved to be key to the automatic development of novel, synthetically tractable structures. In a cyclic optimization process, molecular architectures were generated from a parent structure by virtual synthesis, and the best structure of a generation was selected as the parent for the subsequent TOPAS cycle. Similarity measures were used to define 'fitness', based on 2D-structural similarity or topological pharmacophore distance between the template molecule and the variants. The concept of varying library 'diversity' during a design process was consequently implemented by using adaptive variant distributions. The efficiency of the design algorithm was demonstrated for the de novo construction of potential thrombin inhibitors mimicking peptide and non-peptide template structures.

• LigBuilder: A multi-purpose program for structure-based drug design
Wang, RX and Gao, Y and Lai, LH
Journal of Molecular Modeling, 2000, 6, 498-516

We have developed a new multi-purpose program, LigBuilder, for structure-based drug design. Within the structural constraints of the target protein, LigBuilder builds up ligands step by step using a library of organic fragments. Various operations, such as growing, linking, and mutation, have been implemented to manipulate molecular structures. The user can choose either growing or linking strategies for ligand construction and a genetic algorithm is adopted to control the whole construction process. Binding affinities of the ligands are estimated by an empirical scoring function and the bioavailabilities are evaluated by a set of chemical rules. Using thrombin and dihydrofolate reductase as examples, we have demonstrated that LigBuilder is able to generate chemical structures similar to the known ligands.

## 1997

• Evaluation of a method for controlling molecular scaffold diversity in de novo ligand design
Todorov, NP
Journal of computer-aided molecular design, 1997, 11(2), 175-192
PMID: 9089435

We describe an algorithm for the automated generation of molecular structures subject to geometric and connectivity constraints. The method relies on simulated annealing and simplex optimization of a penalty function that contains a variety of conditions and can be useful in structure-based drug design projects. The procedure controls the diversity and complexity of the generated molecules. Structure selection filters are an integral part and drive the algorithm. Several procedures have been developed to achieve reliable control. A number of template sets can be defined and combined to control the range of molecules which are searched. Ring systems are predefined. Normally, the ring-system complexity is one of the most elusive and difficult factors to control when fusion-, bridge- and spiro-structures are built by joining templates. Here this is not an issue; the decision about which systems are acceptable, and which are not, is made before the run is initiated. Queries for inclusion and exclusion spheres are incorporated into the objective function, and, by using a flexible notation, the structure generation can be directed and more focused. Simulated annealing is a reliable optimizer and converges asymptotically to the global minimum. The objective functions used here are degenerate, so it is likely that each run will produce a different set of good solutions.
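
The simulated-annealing component can be illustrated with a generic Python sketch: Metropolis acceptance of uphill moves combined with a geometric cooling schedule. Todorov's penalty function, structure filters and simplex step are not reproduced, and the parameter values are arbitrary assumptions. Note that the asymptotic convergence guarantee mentioned in the abstract holds only for very slow (logarithmic) cooling; the geometric schedule used here is a common practical compromise without that guarantee.

```python
import math
import random


def simulated_annealing(cost, x0, neighbour, t0=1.0, t_min=1e-3,
                        alpha=0.95, steps_per_t=50, seed=0):
    """Generic simulated annealing (sketch, not the published method).

    cost: function state -> float to be minimized.
    neighbour: function (state, rng) -> new candidate state.
    Returns the best state seen and its cost.
    """
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            y = neighbour(x, rng)
            fy = cost(y)
            # Metropolis rule: always accept downhill moves,
            # accept uphill moves with probability exp(-delta / T)
            if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = x, fx
        t *= alpha  # geometric cooling
    return best, fbest
```

Because the final state depends on the random seed and the objective is degenerate, different runs can return different equally good solutions, as the abstract notes.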

• SMoG: de Novo Design Method Based on Simple, Fast, and Accurate Free Energy Estimates. 2. Case Studies in Molecular Design
DeWitte, Robert S and Ishchenko, Alexey V and Shakhnovich, Eugene I
Journal of the American Chemical Society, 1997, 119(20), 4608-4617

In this paper, we summarize three ligand design studies performed using the program SMoG, which was developed in our lab. The aim of this presentation is to communicate through examples the potential of this method:  the richness of the molecules that can be developed and the ease with which they are found. In particular, we present suggestions for ligands to Src SH3 domain (specificity pocket and LP site) and CD4.

## 1996

• SMoG: de Novo Design Method Based on Simple, Fast, and Accurate Free Energy Estimates. 1. Methodology and Supporting Evidence
DeWitte, Robert S and Shakhnovich, Eugene I
Journal of the American Chemical Society, 1996, 118(47), 11733-11744
doi: 10.1021/ja960751u

In this paper, we present SMoG (Small Molecule Growth), a novel, straightforward method for de novo lead design and the evidence for its effectiveness. It is based on a simple model for ligand-protein interactions and a scoring that is directly related to the free energy through a knowledge-based potential. A large number of structures are examined by an efficient Metropolis Monte Carlo molecular growth algorithm that generates molecules through the adjoining of functional groups directly in the binding region. Thus SMoG is a method that is able to rank a large number of potential compounds according to binding free energy in a short time. In this sense, SMoG represents a step toward an ideal computational tool for ligand design.
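
A toy version of Metropolis Monte Carlo growth may help make the idea concrete. This is not SMoG's algorithm or potential: the per-site energy table `site_scores`, the group names, and the `grow` function are hypothetical, and each site is filled independently by a short Metropolis chain over candidate groups.

```python
import math
import random


def grow(site_scores, groups, kT=0.6, attempts=50, seed=0):
    """Toy Metropolis growth: fill binding-region sites one at a time.

    site_scores[i][g] is a hypothetical knowledge-based energy for placing
    functional group g at site i (missing entries score 0.0).
    """
    rng = random.Random(seed)
    molecule, energy = [], 0.0
    for scores in site_scores:
        chosen, chosen_e = None, 0.0
        for _ in range(attempts):
            g = rng.choice(groups)
            e = scores.get(g, 0.0)
            # Metropolis step between the current choice and the proposed group.
            if chosen is None or e <= chosen_e or rng.random() < math.exp(-(e - chosen_e) / kT):
                chosen, chosen_e = g, e
        molecule.append(chosen)
        energy += chosen_e
    return molecule, energy


site_scores = [{"OH": -2.0, "CH3": 0.5}, {"NH2": -1.5, "COOH": -3.0}]
groups = ["OH", "CH3", "NH2", "COOH"]
molecule, energy = grow(site_scores, groups, kT=0.01, attempts=200)
```

Lowering `kT` makes each chain settle on the lowest-energy group at its site; raising it lets higher-energy groups survive more often, which yields more diverse molecules at the cost of poorer scores.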

## 1995

• PRO_LIGAND: An Approach to De Novo Molecular Design. 4. Application to the Design of Peptides
Frenkel, D and Clark, DE and Li, J and Murray, CW and Robson, B and Waszkowycz, B and Westhead, DR
Journal of Computer-Aided Molecular Design, 1995, 9(3), 213-225

In some instances, peptides can play an important role in the discovery of lead compounds. This paper describes the peptide design facility of the de novo drug design package, PRO_LIGAND. The package provides a unified framework for the design of peptides that are similar or complementary to a specified target. The approach uses single amino acid residues, selected from preconstructed libraries of different residues and conformations, and places them on top of predefined target interaction sites. This approach is a well-tested methodology for the design of organics but has not been used for peptides before. Peptides represent a difficulty because of their great conformational flexibility and a study of the advantages and disadvantages of this simple approach is an important step in the development of design tools. After a description of our general approach, a more detailed discussion of its adaptation to peptides is given. The method is then applied to the design of peptide-based inhibitors to HIV-1 protease and the design of structural mimics of the surface region of lysozyme. The results are encouraging and point the way towards further development of interaction site-based approaches for peptide design.

## 1994

• Multiple Highly Diverse Structures Complementary to Enzyme Binding Sites: Results of Extensive Application of a de Novo Design Method Incorporating Combinatorial Growth
Bohacek, Regine S and McMartin, Colin
Journal of the American Chemical Society, 1994, 116(13), 5560-5571
doi: 10.1021/ja00092a006

A computer program for de novo molecular design was used to explore the diversity of molecules complementary to the binding sites of enzymes. The program, GrowMol (preliminary results presented at the XIIth International Symposium on Medicinal ...

• SPROUT: recent developments in the de novo design of molecules.
Gillet, V J and Newell, W and Mata, P and Myatt, G and Sike, S and Zsoldos, Z and Johnson, A P
Journal of Chemical Information and Computer Sciences, 1994, 34(1), 207-217
PMID: 8144711

SPROUT is a computer program for constrained structure generation. It is designed to generate molecules for a range of applications in molecular recognition. The program uses a number of approximations that enable a wide variety of diverse structures to be generated. Practical use of the program is demonstrated in two examples. The first demonstrates the ability of the program to generate candidate inhibitors for a receptor site of known 3D structure, specifically the GDP binding site of p21. In the second example, structures are generated to fit a pharmacophore hypothesis that models morphine agonists.
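
The pharmacophore example can be illustrated by a simple geometric filter of the kind a constrained structure generator might apply to candidates. This is a hypothetical sketch, not SPROUT's method: `match_pharmacophore`, the feature labels, and the 0.5 Å tolerance are assumptions. It compares inter-feature distances, which do not change under rotation and translation, so no superposition is needed.

```python
import math
from itertools import permutations


def match_pharmacophore(candidate, hypothesis, tol=0.5):
    """Return indices mapping each hypothesis point onto a candidate feature, or None.

    Both arguments are lists of (feature_type, (x, y, z)). A mapping is accepted
    when feature types agree and every pairwise distance matches within tol.
    """
    n = len(hypothesis)
    for combo in permutations(range(len(candidate)), n):
        # Feature types must agree point by point.
        if any(candidate[c][0] != hypothesis[k][0] for k, c in enumerate(combo)):
            continue
        # All pairwise distances must agree within the tolerance.
        if all(
            abs(math.dist(candidate[combo[i]][1], candidate[combo[j]][1])
                - math.dist(hypothesis[i][1], hypothesis[j][1])) <= tol
            for i in range(n) for j in range(i + 1, n)
        ):
            return combo
    return None


hypothesis = [("cation", (0.0, 0.0, 0.0)), ("aromatic", (5.0, 0.0, 0.0)), ("donor", (0.0, 4.0, 0.0))]
candidate = [("aromatic", (10.0, 1.0, 0.0)), ("cation", (5.0, 1.0, 0.0)),
             ("donor", (5.0, 5.0, 0.0)), ("donor", (20.0, 0.0, 0.0))]
mapping = match_pharmacophore(candidate, hypothesis)
```

Distance matching cannot tell a pharmacophore from its mirror image, and the brute-force search over permutations grows quickly with the number of features; a practical tool would need to address both.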

## 1992

• The computer program LUDI: A new method for the de novo design of enzyme inhibitors
Böhm, Hans-Joachim
Journal of Computer-Aided Molecular Design, 1992, 6(1), 61-78
doi: 10.1007/BF00124387

A new computer program is described, which positions small molecules into clefts of protein structures (e.g. an active site of an enzyme) in such a way that hydrogen bonds can be formed with the enzyme and hydrophobic pockets are filled with hydrophobic groups. The program works in three steps. First it calculates interaction sites, which are discrete positions in space suitable to form hydrogen bonds or to fill a hydrophobic pocket. The interaction sites are derived from distributions of nonbonded contacts generated by a search through the Cambridge Structural Database. An alternative route to generate the interaction sites is the use of rules. The second step is the fit of molecular fragments onto the interaction sites. Currently we use a library of 600 fragments for the fitting. The final step in the present program is the connection of some or all of the fitted fragments to a single molecule. This is done by bridge fragments. Applications are presented for the crystal packing of benzoic acid and the enzymes dihydrofolate reductase and trypsin.
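
The final step, joining fitted fragments with bridge fragments, can be sketched as a geometric lookup. The bridge names and span lengths below are illustrative assumptions (approximate 1,3 and 1,4 atom separations), not LUDI's fragment library, and `find_bridges` is a hypothetical helper that ignores bond directions and steric clashes.

```python
import math
from itertools import combinations

# Hypothetical bridge library: bridge name -> approximate distance (Å) it can span
# between the two attachment atoms it joins. Values are illustrative only.
BRIDGES = {"-CH2-": 2.5, "-O-": 2.4, "-CH2-CH2-": 3.9}


def find_bridges(anchors, bridges=BRIDGES, tol=0.3):
    """List (fragment_a, fragment_b, bridge) triples whose geometry is compatible.

    anchors maps each fitted fragment to the 3D position of its free attachment atom.
    """
    links = []
    for (fa, pa), (fb, pb) in combinations(anchors.items(), 2):
        gap = math.dist(pa, pb)
        for name, span in bridges.items():
            if abs(gap - span) <= tol:
                links.append((fa, fb, name))
    return links


anchors = {"A": (0.0, 0.0, 0.0), "B": (2.5, 0.0, 0.0), "C": (0.0, 10.0, 0.0)}
links = find_bridges(anchors)
```

Fragments A and B can be joined by either one-atom bridge, while C is too far from both to be linked by anything in this toy library.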

• LUDI: Rule-Based Automatic Design of New Substituents for Enzyme Inhibitor Leads
Böhm, H J
Journal of Computer-Aided Molecular Design, 1992, 6(6), 593-606
PMID: 1291628

Recent advances in a new method for the de novo design of enzyme inhibitors are reported. A new set of rules to define the possible nonbonded contacts between protein and ligand is presented. This method was derived from published statistical analyses of nonbonded contacts in crystal packings of organic molecules and has been implemented in the recently described computer program LUDI. Moreover, LUDI can now append a new substituent onto an already existing ligand. Applications are reported for the design of inhibitors of HIV protease and dihydrofolate reductase. The results demonstrate that LUDI is indeed capable of designing new ligands with improved binding when compared to the reference compound.
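
A rule of the kind described here can be expressed as a distance window plus an angular cone around a preferred contact direction. The rule table, its numeric values, and `contact_allowed` are placeholders for illustration; LUDI's actual rules were derived from statistical analyses of crystal packings and are not reproduced here.

```python
import math

# Hypothetical rule table (values are illustrative, not LUDI's parameters):
# protein atom type -> (complementary ligand group, distance range in Å, max angle in degrees)
RULES = {
    "donor": ("acceptor", (2.6, 3.2), 30.0),
    "acceptor": ("donor", (2.6, 3.2), 40.0),
}


def contact_allowed(ptype, patom, direction, ltype, latom, rules=RULES):
    """Check whether a ligand atom makes an allowed contact with a protein atom.

    direction is the preferred contact vector at the protein atom (e.g. along N-H).
    """
    partner, (dmin, dmax), max_angle = rules[ptype]
    if ltype != partner:
        return False
    v = [l - p for p, l in zip(patom, latom)]
    d = math.sqrt(sum(c * c for c in v))
    if not dmin <= d <= dmax:
        return False
    # Angle between the preferred direction and the actual contact vector.
    norm = math.sqrt(sum(c * c for c in direction))
    cos_a = sum(a * b for a, b in zip(v, direction)) / (d * norm)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    return angle <= max_angle


ok = contact_allowed("donor", (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), "acceptor", (0.0, 0.0, 2.9))
```

The same table could drive interaction-site generation by placing a site along `direction` at a distance inside the window, which is the rule-based alternative to contact distributions mentioned in the first LUDI entry.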