Bibliography of Computer-Aided Drug Design

Updated on 7/18/2014. Currently 2130 references

Binding site prediction / Web services

2013 / 2012 / 2011 / 2010 / 2009 / 2008 / 2006 / 2003 / 1996 /


  • webPDBinder: a server for the identification of ligand binding sites on protein structures.
    Bianchi, Valerio and Mangone, Iolanda and Ferré, Fabrizio and Helmer-Citterich, Manuela and Ausiello, Gabriele
    Nucleic acids research, 2013, 41(Web Server issue), W308-13
    PMID: 23737450     doi: 10.1093/nar/gkt457
    webPDBinder is a web server for the identification of small ligand-binding sites in a protein structure. webPDBinder searches a protein structure against a library of known binding sites and a collection of control non-binding pockets. The number of similarities identified with the residues in the two sets is then used to derive a propensity value for each residue of the query protein associated to the likelihood that the residue is part of a ligand binding site. The predicted binding residues can be further refined using conservation scores derived from the multiple alignment of the PFAM protein family. webPDBinder correctly identifies residues belonging to the binding site in 77% of the cases and is able to identify binding pockets starting from holo or apo structures with comparable performances. This is important for all the real world cases where the query protein has been crystallized without a ligand and it is also difficult to obtain clear similarities with bound pockets from holo pocket libraries. The input is either a PDB code or a user-submitted structure. The output is a list of predicted binding pocket residues with propensity and conservation values both in text and graphical format.

  • LISE: a server using ligand-interacting and site-enriched protein triangles for prediction of ligand-binding sites.
    Xie, Zhong-Ru and Liu, Chuan-Kun and Hsiao, Fang-Chih and Yao, Adam and Hwang, Ming-Jing
    Nucleic acids research, 2013, 41(Web Server issue), W292-6
    PMID: 23609546     doi: 10.1093/nar/gkt300
    LISE is a web server for a novel method for predicting small molecule binding sites on proteins. It differs from a number of servers currently available for such predictions in two aspects. First, rather than relying on knowledge of similar protein structures, identification of surface cavities or estimation of binding energy, LISE computes a score by counting geometric motifs extracted from sub-structures of interaction networks connecting protein and ligand atoms. These network motifs take into account spatial and physicochemical properties of ligand-interacting protein surface atoms. Second, LISE has now been more thoroughly tested, as, in addition to the evaluation we previously reported using two commonly used small benchmark test sets and targets of two community-based experiments on ligand-binding site predictions, we now report an evaluation using a large non-redundant data set containing >2000 protein-ligand complexes. This unprecedented test, the largest ever reported to our knowledge, demonstrates LISE's overall accuracy and robustness. Furthermore, we have identified some hard to predict protein classes and provided an estimate of the performance that can be expected from a state-of-the-art binding site prediction server, such as LISE, on a proteome scale. The server is freely available.

  • Nucleos: a web server for the identification of nucleotide-binding sites in protein structures.
    Parca, Luca and Ferré, Fabrizio and Ausiello, Gabriele and Helmer-Citterich, Manuela
    Nucleic acids research, 2013, 41(Web Server issue), W281-5
    PMID: 23703207     doi: 10.1093/nar/gkt390
    Nucleos is a web server for the identification of nucleotide-binding sites in protein structures. Nucleos compares the structure of a query protein against a set of known template 3D binding sites representing nucleotide modules, namely the nucleobase, carbohydrate and phosphate. Structural features, clustering and conservation are used to filter and score the predictions. The predicted nucleotide modules are then joined to build whole nucleotide-binding sites, which are ranked by their score. The server takes as input either the PDB code of the query protein structure or a user-submitted structure in PDB format. The output of Nucleos is composed of ranked lists of predicted nucleotide-binding sites divided by nucleotide type (e.g. ATP-like). For each ranked prediction, Nucleos provides detailed information about the score, the template structure and the structural match for each nucleotide module composing the nucleotide-binding site. The predictions on the query structure and the template-binding sites can be viewed directly on the web through a graphical applet. In 98% of the cases, the modules composing correct predictions belong to proteins with no homology relationship between each other, meaning that the identification of brand-new nucleotide-binding sites is possible using information from non-homologous proteins. Nucleos is available online.


  • ProBiS-2012: web server and web services for detection of structurally similar binding sites in proteins.
    Konc, Janez and Janezic, Dusanka
    Nucleic acids research, 2012, 40(W1), W214-W221
    PMID: 22600737     doi: 10.1093/nar/gks435
    The ProBiS web server is a web server for detection of structurally similar binding sites in the PDB and for local pairwise alignment of protein structures. In this article, we present a new version of the ProBiS web server that is 10 times faster than earlier versions, due to the efficient parallelization of the ProBiS algorithm, which now allows significantly faster comparison of a protein query against the PDB and reduces the calculation time for scanning the entire PDB from hours to minutes. It also features new web services, and an improved user interface. In addition, the new web server is united with the ProBiS-Database and thus provides instant access to pre-calculated protein similarity profiles for over 29 000 non-redundant protein structures. The ProBiS web server is particularly adept at detection of secondary binding sites in proteins. It is freely available.

  • Pocketome: an encyclopedia of small-molecule binding sites in 4D.
    Kufareva, Irina and Ilatovskiy, Andrey V and Abagyan, Ruben
    Nucleic acids research, 2012, 40(1), D535-40
    PMID: 22080553     doi: 10.1093/nar/gkr825
    The importance of binding site plasticity in protein-ligand interactions is well-recognized, and so are the difficulties in predicting the nature and the degree of this plasticity by computational means. To assist in understanding the flexible protein-ligand interactions, we constructed the Pocketome, an encyclopedia of about one thousand experimentally solved conformational ensembles of druggable binding sites in proteins, grouped by location and consistent chain/cofactor composition. The multiplicity of pockets within the ensembles adds an extra, fourth dimension to the Pocketome entry data. Within each ensemble, the pockets were carefully classified by the degree of their pairwise similarity and compatibility with different ligands. The core of the Pocketome is derived regularly and automatically from the current releases of the Protein Data Bank and the Uniprot Knowledgebase; this core is complemented by entries built from manually provided seed ligand locations. The Pocketome website allows searching for the sites of interest, analysis of conformational clusters, important residues, binding compatibility matrices and interactive visualization of the ensembles using the ActiveICM web browser plugin. The Pocketome collection can be used to build multi-conformational docking and 3D activity models as well as to design cross-docking and virtual ligand screening benchmarks.

  • PocketAnnotate: towards site-based function annotation.
    Anand, Praveen and Yeturu, Kalidas and Chandra, Nagasuma
    Nucleic acids research, 2012, 40(W1), W400-W408
    PMID: 22618878     doi: 10.1093/nar/gks421
    A computational pipeline, PocketAnnotate, for functional annotation of proteins at the level of binding sites has been proposed in this study. The pipeline integrates three in-house algorithms for site-based function annotation: PocketDepth, for prediction of binding sites in protein structures; PocketMatch, for rapid comparison of binding sites; and PocketAlign, to obtain detailed alignment between pairs of binding sites. A novel scheme has been developed to rapidly generate a database of non-redundant binding sites. For a given input protein structure, putative ligand-binding sites are identified, matched in real time against the database and the query substructure aligned with the promising hits, to obtain a set of possible ligands that the given protein could bind to. The input can be either whole protein structures or merely the substructures corresponding to possible binding sites. Structure-based function annotation at the level of binding sites thus achieved could prove very useful for cases where no obvious functional inference can be obtained based purely on sequence or fold-level analyses. An attempt has also been made to analyse proteins of no known function from Protein Data Bank. PocketAnnotate would be a valuable tool for the scientific community and contribute towards structure-based functional inference. The web server can be freely accessed.

  • PepSite: prediction of peptide-binding sites from protein surfaces.
    Trabuco, Leonardo G and Lise, Stefano and Petsalaki, Evangelia and Russell, Robert B
    Nucleic acids research, 2012, 40(Web Server issue), W423-7
    PMID: 22600738     doi: 10.1093/nar/gks398
    Complex biological functions emerge through intricate protein-protein interaction networks. An important class of protein-protein interaction corresponds to peptide-mediated interactions, in which a short peptide stretch from one partner interacts with a large protein surface from the other partner. Protein-peptide interactions are typically of low affinity and involved in regulatory mechanisms, dynamically reshaping protein interaction networks. Due to the relatively small interaction surface, modulation of protein-peptide interactions is feasible and highly attractive for therapeutic purposes. Unfortunately, the number of available 3D structures of protein-peptide interfaces is very limited. For typical cases where a protein-peptide structure of interest is not available, the PepSite web server can be used to predict peptide-binding spots from protein surfaces alone. The PepSite method relies on preferred peptide-binding environments calculated from a set of known protein-peptide 3D structures, combined with distance constraints derived from known peptides. We present an updated version of the web server that is orders of magnitude faster than the original implementation, returning results in seconds instead of minutes or hours. The PepSite web server is available online.

  • SiteComp: a server for ligand binding site analysis in protein structures.
    Lin, Yingjie and Yoo, Seungyeul and Sanchez, Roberto
    Bioinformatics (Oxford, England), 2012, 28(8), 1172-1173
    PMID: 22368247     doi: 10.1093/bioinformatics/bts095
    MOTIVATION: Computational characterization of ligand binding sites in proteins provides preliminary information for functional annotation, protein design and ligand optimization. SiteComp implements binding site analysis for comparison of binding sites, evaluation of residue contribution to binding sites, and identification of sub-sites with distinct molecular interaction properties. AVAILABILITY: The SiteComp server and tutorials are freely available. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • DoGSiteScorer: a web server for automatic binding site prediction, analysis and druggability assessment.
    Volkamer, Andrea and Kuhn, Daniel and Rippmann, Friedrich and Rarey, Matthias
    Bioinformatics (Oxford, England), 2012, 28(15), 2074-2075
    PMID: 22628523     doi: 10.1093/bioinformatics/bts310
    MOTIVATION: Many drug discovery projects fail because the underlying target is finally found to be undruggable. Progress in structure elucidation of proteins now opens up a route to automatic structure-based target assessment. DoGSiteScorer is a newly developed automatic tool combining pocket prediction, characterization and druggability estimation and is now available through a web server. AVAILABILITY: The DoGSiteScorer web server is freely available for academic use.


  • DEPTH: a web server to compute depth and predict small-molecule binding cavities in proteins.
    Tan, Kuan Pern and Varadarajan, Raghavan and Madhusudhan, M S
    Nucleic acids research, 2011, 39(Web Server issue), W242-8
    PMID: 21576233     doi: 10.1093/nar/gkr356
    Depth measures the extent of atom/residue burial within a protein. It correlates with properties such as protein stability, hydrogen exchange rate, protein-protein interaction hot spots, post-translational modification sites and sequence variability. Our server, DEPTH, accurately computes depth and solvent-accessible surface area (SASA) values. We show that depth can be used to predict small molecule ligand binding cavities in proteins. Often, some of the residues lining a ligand binding cavity are both deep and solvent exposed. Using the depth-SASA pair values for a residue, its likelihood to form part of a small molecule binding cavity is estimated. The parameters of the method were calibrated over a training set of 900 high-resolution X-ray crystal structures of single-domain proteins bound to small molecules (molecular weight <1.5 kDa). The prediction accuracy of DEPTH is comparable to that of other geometry-based prediction methods including LIGSITE, SURFNET and Pocket-Finder (all with Matthews correlation coefficient of ∼0.4) over a testing set of 225 single and multi-chain protein structures. Users have the option of tuning several parameters to detect cavities of different sizes, for example, geometrically flat binding sites. The input to the server is a protein 3D structure in PDB format. The users have the option of tuning the values of four parameters associated with the computation of residue depth and the prediction of binding cavities. The computed depths, SASA and binding cavity predictions are displayed in 2D plots and mapped onto 3D representations of the protein structure using Jmol. Links are provided to download the outputs. Our server is useful for all structural analysis based on residue depth and SASA, such as guiding site-directed mutagenesis experiments and small molecule docking exercises, in the context of protein functional annotation and drug discovery.

  • Spatial clustering of protein binding sites for template based protein docking.
    Ghoorah, Anisah W and Devignes, Marie-Dominique and Smaïl-Tabbone, Malika and Ritchie, David W
    Bioinformatics (Oxford, England), 2011, 27(20), 2820-2827
    PMID: 21873637     doi: 10.1093/bioinformatics/btr493
    MOTIVATION: In recent years, much structural information on protein domains and their pair-wise interactions has been made available in public databases. However, it is not yet clear how best to use this information to discover general rules or interaction patterns about structural protein-protein interactions. Improving our ability to detect and exploit structural interaction patterns will help to provide a better 3D picture of the known protein interactome, and will help to guide docking-based predictions of the 3D structures of unsolved protein complexes.

  • AADS - An Automated Active Site Identification, Docking, and Scoring Protocol for Protein Targets Based on Physicochemical Descriptors.
    Singh, Tanya and Biswas, D and Jayaram, B.
    Journal of chemical information and modeling, 2011, 51(10), 2515-2527
    PMID: 21877713     doi: 10.1021/ci200193z
    We report here a robust automated active site detection, docking, and scoring (AADS) protocol for proteins with known structures. The active site finder identifies all cavities in a protein and scores them based on the physicochemical properties of functional groups lining the cavities in the protein. The accuracy realized on 620 proteins with sizes ranging from 100 to 600 amino acids with known drug active sites is 100% when the top ten cavity points are considered. These top ten cavity points identified are then submitted for an automated docking of an input ligand/candidate molecule. The docking protocol uses an all atom energy based Monte Carlo method. Eight low energy docked structures corresponding to different locations and orientations of the candidate molecule are stored at each cavity point giving 80 docked structures overall which are then ranked using an effective free energy function and top five structures are selected. The predicted structure and energetics of the complexes agree quite well with experiment when tested on a data set of 170 protein-ligand complexes with known structures and binding affinities. The AADS methodology is implemented on an 80 processor cluster and presented as a freely accessible, easy to use tool.

  • FunFOLD: an improved automated method for the prediction of ligand binding residues using 3D models of proteins
    Roche, Daniel B. and Tetchner, Stuart J. and McGuffin, Liam J.
    BMC Bioinformatics, 2011, 12, 160
    PMID: 21575183     doi: 10.1186/1471-2105-12-160
    Background: The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those which used structural superpositions of 3D models and related templates with bound ligands in order to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustments of parameters, such as the distance thresholds used for each target, have often been required to prevent over prediction. Here we describe a novel method FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software provided can easily be integrated into existing fold recognition servers, requiring only a 3D model and list of templates as inputs. A simple web interface is also provided allowing access to non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9. Results: The FunFOLD method shows a significant improvement over the best available servers and is shown to be competitive with the top manual prediction groups that were tested at CASP8. The FunFOLD method is also competitive with both the top server and manual methods tested at CASP9. When tested using common subsets of targets, the predictions from FunFOLD are shown to achieve significantly higher mean Matthews Correlation Coefficient (MCC) scores and Binding-site Distance Test (BDT) scores than all server methods that were tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested. Conclusions: The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new fully automated approach for structure based function prediction using 3D models of proteins.

  • Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction
    Zhang, Zengming and Li, Yu and Lin, Biaoyang and Schroeder, Michael and Huang, Bingding
    Bioinformatics (Oxford, England), 2011, 27(15), 2083-2088
    PMID: 21636590     doi: 10.1093/bioinformatics/btr331
    Motivation: Protein-ligand binding sites are the active sites on protein surface that perform protein functions. Thus, the identification of those binding sites is often the first step to study protein functions and structure-based drug design. There are many computational algorithms and tools developed in recent decades, such as LIGSITE(cs/c), PASS, Q-SiteFinder, SURFNET, and so on. In our previous work, MetaPocket, we have proved that it is possible to combine the results of many methods together to improve the prediction result. Results: Here, we continue our previous work by adding four more methods Fpocket, GHECOM, ConCavity and POCASA to further improve the prediction success rate. The new method MetaPocket 2.0 and the individual approaches are all tested on two datasets of 48 unbound/bound and 210 bound structures as used before. The results show that the average success rate has been raised 5% at the top 1 prediction compared with previous work. Moreover, we construct a non-redundant dataset of drug-target complexes with known structure from DrugBank, DrugPort and PDB database and apply MetaPocket 2.0 to this dataset to predict drug binding sites. As a result, >74% drug binding sites on protein target are correctly identified at the top 3 prediction, and it is 12% better than the best individual approach.

  • Predicting Protein-Ligand Binding Sites Based on an Improved Geometric Algorithm.
    He, Jing and Wei, Dong-Qing and Wang, Jing-Fang and Chou, Kuo-Chen
    Protein and peptide letters, 2011, 18(10), 997-1001
    PMID: 21592081    
    Knowledge of protein-ligand binding sites is very important for structure-based drug designs. To get information on the binding site of a targeted protein with its ligand in a timely way, many scientists tried to resort to computational methods. Although several methods have been released in the past few years, their accuracy needs to be improved. In this study, based on the combination of incremental convex hull, traditional geometric algorithm, and solvent accessible surface of proteins, we developed a novel approach for predicting the protein-ligand binding sites. Using PDBbind database as a benchmark dataset and comparing the new approach with the existing methods such as POCKET, Q-SiteFinder, MOE-SiteFinder, and PASS, we found that the new method has the highest accuracy for the Top 2 and Top 3 predictions. Furthermore, our approach can not only successfully predict the protein-ligand binding sites but also provide more detailed information for the interactions between proteins and ligands. It is anticipated that the new method may become a useful tool for drug development, or at least play a complementary role to the other existing methods in this area.


  • fpocket: online tools for protein ensemble pocket detection and tracking.
    Schmidtke, Peter and Le Guilloux, Vincent and Maupetit, Julien and Tuffery, Pierre
    Nucleic acids research, 2010, 38(Web Server issue), W582-9
    PMID: 20478829     doi: 10.1093/nar/gkq383
    Computational small-molecule binding site detection has several important applications in the biomedical field. Notable interests are the identification of cavities for structure-based drug discovery or functional annotation of structures. fpocket is a small-molecule pocket detection program, relying on the geometric alpha-sphere theory. The fpocket web server allows: (i) candidate pocket detection (fpocket); (ii) pocket tracking during molecular dynamics, in order to provide insights into pocket dynamics (mdpocket); and (iii) a transposition of mdpocket to the combined analysis of homologous structures (hpocket). These complementary online tools make it possible to tackle various questions related to the identification and annotation of functional and allosteric sites, transient pockets and pocket preservation within evolution of structural families. The server and documentation are freely available.

  • SMAP-WS: a parallel web service for structural proteome-wide ligand-binding site comparison.
    Ren, Jingyuan and Xie, Lei and Li, Wilfred W and Bourne, Philip E
    Nucleic acids research, 2010, 38(Web Server issue), W441-4
    PMID: 20484373     doi: 10.1093/nar/gkq400
    The proteome-wide characterization and analysis of protein ligand-binding sites and their interactions with ligands can provide pivotal information in understanding the structure, function and evolution of proteins and for designing safe and efficient therapeutics. The SMAP web service (SMAP-WS) meets this need through parallel computations designed for 3D ligand-binding site comparison and similarity searching on a structural proteome scale. SMAP-WS implements a shape descriptor (the Geometric Potential) that characterizes both local and global topological properties of the protein structure and which can be used to predict the likely ligand-binding pocket [Xie, L. and Bourne, P.E. (2007) A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand-binding sites. BMC Bioinformatics, 8 (Suppl. 4), S9.]. Subsequently a sequence order independent profile-profile alignment (SOIPPA) algorithm is used to detect and align similar pockets thereby finding protein functional and evolutionary relationships across fold space [Xie, L. and Bourne, P.E. (2008) Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proc. Natl Acad. Sci. USA, 105, 5441-5446]. An extreme value distribution model estimates the statistical significance of the match [Xie, L., Xie, L. and Bourne, P.E. (2009) A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics, 25, i305-i312.]. These algorithms have been extensively benchmarked and shown to outperform most existing algorithms. Moreover, several predictions resulting from SMAP-WS have been validated experimentally. Thus far SMAP-WS has been applied to predict drug side effects, and to repurpose existing drugs for new indications. SMAP-WS provides both a user-friendly web interface and programming API for scientists to address a wide range of compute intense questions in biology and drug discovery.

  • 3DLigandSite: predicting ligand-binding sites using similar structures
    Wass, Mark N. and Kelley, Lawrence A. and Sternberg, Michael J. E.
    Nucleic acids research, 2010, 38(Web Server issue), W469-W473
    PMID: 20513649     doi: 10.1093/nar/gkq406
    3DLigandSite is a web server for the prediction of ligand-binding sites. It is based upon successful manual methods used in the eighth round of the Critical Assessment of techniques for protein Structure Prediction (CASP8). 3DLigandSite utilizes protein-structure prediction to provide structural models for proteins that have not been solved. Ligands bound to structures similar to the query are superimposed onto the model and used to predict the binding site. In benchmarking against the CASP8 targets 3DLigandSite obtains a Matthews correlation coefficient (MCC) of 0.64, and coverage and accuracy of 71 and 60%, respectively, similar results to our manual performance in CASP8. In further benchmarking using a large set of protein structures, 3DLigandSite obtains an MCC of 0.68. The web server enables users to submit either a query sequence or structure. Predictions are visually displayed via an interactive Jmol applet. 3DLigandSite is available for use online.
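
Several entries in this list report a Matthews correlation coefficient (MCC) for binding-residue predictions. As a rough illustration of what that number measures, the sketch below computes MCC from sets of predicted and observed binding residues. The function name, the set-based representation and the example residue numbers are assumptions made for illustration; they are not taken from 3DLigandSite or any other cited server, whose evaluation protocols may differ in detail.

```python
import math


def matthews_cc(predicted, observed, all_residues):
    """Matthews correlation coefficient for a binding-residue prediction.

    predicted, observed: iterables of residue identifiers labelled as binding.
    all_residues: every residue identifier in the protein.
    Returns 0.0 when the coefficient is undefined (zero denominator),
    which is a common convention rather than part of the definition.
    """
    predicted, observed, all_residues = set(predicted), set(observed), set(all_residues)
    tp = len(predicted & observed)                    # correctly predicted binders
    fp = len(predicted - observed)                    # predicted, but not binding
    fn = len(observed - predicted)                    # binding, but missed
    tn = len(all_residues - predicted - observed)     # correctly left out
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical 10-residue protein: residues 1-4 bind, the predictor reports 1, 2, 3 and 5.
example_mcc = matthews_cc({1, 2, 3, 5}, {1, 2, 3, 4}, range(1, 11))
```

Unlike plain accuracy, MCC stays informative when binding residues are a small minority of the protein, which is the usual situation in these benchmarks.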

  • Detection of multiscale pockets on protein surfaces using mathematical morphology.
    Kawabata, Takeshi
    Proteins, 2010, 78(5), 1195-1211
    PMID: 19938154     doi: 10.1002/prot.22639
    Detection of pockets on protein surfaces is an important step toward finding the binding sites of small molecules. In a previous study, we defined a pocket as a space into which a small spherical probe can enter, but a large probe cannot. The radius of the large probes corresponds to the shallowness of pockets. We showed that each type of binding molecule has a characteristic shallowness distribution. In this study, we introduced fundamental changes to our previous algorithm by using a 3D grid representation of proteins and probes, and the theory of mathematical morphology. We invented an efficient algorithm for calculating deep and shallow pockets (multiscale pockets) simultaneously, using several different sizes of spherical probes (multiscale probes). We implemented our algorithm as a new program, ghecom (grid-based HECOMi finder). The statistics of calculated pockets for the structural dataset showed that our program had a higher performance in detecting binding pockets than four other popular pocket-finding programs proposed previously. The ghecom also calculates the shallowness of binding ligands, R(inaccess) (minimum radius of inaccessible spherical probes) that can be obtained from the multiscale molecular volume. We showed that each part of the binding molecule had a bias toward a specific range of shallowness. These findings will be useful for predicting the types of molecules that will be most likely to bind putative binding pockets, as well as the configurations of binding molecules. The program ghecom is available through a Web server.

  • Roll: a new algorithm for the detection of protein pockets and cavities with a rolling probe sphere.
    Yu, Jian and Zhou, Yong and Tanaka, Isao and Yao, Min
    Bioinformatics (Oxford, England), 2010, 26(1), 46-52
    PMID: 19846440     doi: 10.1093/bioinformatics/btp599
    MOTIVATION: Prediction of ligand binding sites of proteins is significant as it can provide insight into biological functions and reaction mechanisms of proteins. It is also a prerequisite for protein-ligand docking and an important step in structure-based drug design.

  • Knowledge-based annotation of small molecule binding sites in proteins.
    Thangudu, Ratna R and Tyagi, Manoj and Shoemaker, Benjamin A and Bryant, Stephen H and Panchenko, Anna R and Madej, Thomas
    BMC Bioinformatics, 2010, 11, 365
    PMID: 20594344     doi: 10.1186/1471-2105-11-365
    BACKGROUND: The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity.


  • FINDSITE: a combined evolution/structure-based approach to protein function prediction.
    Skolnick, Jeffrey and Brylinski, Michal
    Briefings in bioinformatics, 2009, 10(4), 378-391
    PMID: 19324930     doi: 10.1093/bib/bbp017
    A key challenge of the post-genomic era is the identification of the function(s) of all the molecules in a given organism. Here, we review the status of sequence and structure-based approaches to protein function inference and ligand screening that can provide functional insights for a significant fraction of the approximately 50% of ORFs of unassigned function in an average proteome. We then describe FINDSITE, a recently developed algorithm for ligand binding site prediction, ligand screening and molecular function prediction, which is based on binding site conservation across evolutionary distant proteins identified by threading. Importantly, FINDSITE gives comparable results when high-resolution experimental structures as well as predicted protein models are used.

  • Accurate prediction of peptide binding sites on protein surfaces.
    Petsalaki, Evangelia and Stark, Alexander and García-Urdiales, Eduardo and Russell, Robert B
    PLoS computational biology, 2009, 5(3), e1000335
    PMID: 19325869     doi: 10.1371/journal.pcbi.1000335
    Many important protein-protein interactions are mediated by the binding of a short peptide stretch in one protein to a large globular segment in another. Recent efforts have provided hundreds of examples of new peptides binding to proteins for which a three-dimensional structure is available (either known experimentally or readily modeled) but where no structure of the protein-peptide complex is known. To address this gap, we present an approach that can accurately predict peptide binding sites on protein surfaces. For peptides known to bind a particular protein, the method predicts binding sites with great accuracy, and the specificity of the approach means that it can also be used to predict whether or not a putative or predicted peptide partner will bind. We used known protein-peptide complexes to derive preferences, in the form of spatial position specific scoring matrices, which describe the binding-site environment in globular proteins for each type of amino acid in bound peptides. We then scan the surface of a putative binding protein for sites for each of the amino acids present in a peptide partner and search for combinations of high-scoring amino acid sites that satisfy constraints deduced from the peptide sequence. The method performed well in a benchmark and largely agreed with experimental data mapping binding sites for several recently discovered interactions mediated by peptides, including RG-rich proteins with SMN domains, Epstein-Barr virus LMP1 with TRADD domains, DBC1 with Sir2, and the Ago hook with Argonaute PIWI domain. The method, and associated statistics, is an excellent tool for predicting and studying binding sites for newly discovered peptides mediating critical events in biology.

  • MetaPocket: A Meta Approach to Improve Protein Ligand Binding Site Prediction
    Huang, Bingding
    Omics-a Journal of Integrative Biology, 2009, 13(4), 325-330
    PMID: 19645590     doi: 10.1089/omi.2009.0045
    The identification of ligand-binding sites is often the starting point for protein function annotation and structure-based drug design. Many computational methods for the prediction of ligand-binding sites have been developed in recent decades. Here we present a consensus method, metaPocket, in which the predicted sites from four methods (LIGSITEcs, PASS, Q-SiteFinder, and SURFNET) are combined to improve the prediction success rate. All these methods are evaluated on two datasets of 48 unbound/bound structures and 210 bound structures. The comparison results show that metaPocket improves the success rate from ~70% to 75% at the top 1 prediction. MetaPocket is available online.
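
The consensus idea in this abstract can be illustrated with a minimal sketch. The abstract only states that the sites predicted by the four methods are combined; the greedy distance-based clustering, the 8 Å `radius`, the ranking by number of supporting methods, and the function name `consensus_sites` are all assumptions made for illustration and are not taken from the paper.

```python
from math import dist

def consensus_sites(predictions, radius=8.0):
    """Merge pocket centers from several predictors into consensus sites.

    predictions: {method_name: [(x, y, z), ...]} pocket centers per method.
    radius: hypothetical distance (in angstroms) under which two centers
            are treated as the same site.
    Returns clusters sorted by how many distinct methods support them.
    """
    clusters = []
    for method, centers in predictions.items():
        for center in centers:
            for cluster in clusters:
                if dist(center, cluster["centroid"]) <= radius:
                    cluster["centers"].append(center)
                    cluster["methods"].add(method)
                    # Recompute the centroid from all member centers.
                    pts = cluster["centers"]
                    cluster["centroid"] = tuple(sum(a) / len(pts) for a in zip(*pts))
                    break
            else:
                # No existing cluster is close enough: start a new one.
                clusters.append({
                    "centroid": tuple(center),
                    "centers": [center],
                    "methods": {method},
                })
    clusters.sort(key=lambda c: len(c["methods"]), reverse=True)
    return clusters

example = {
    "LIGSITEcs": [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0)],
    "PASS": [(1.0, 0.0, 0.0)],
    "Q-SiteFinder": [(0.0, 2.0, 0.0)],
    "SURFNET": [(50.0, 50.0, 50.0)],
}
ranked = consensus_sites(example)
```

In this toy input three of the four methods place a center near the origin, so that cluster is ranked first; the coordinates are invented and carry no biological meaning.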

  • SplitPocket: identification of protein functional surfaces and characterization of their spatial patterns.
    Tseng, Yan Yuan and Dupree, Craig and Chen, Z Jeffrey and Li, Wen-Hsiung
    Nucleic acids research, 2009, 37(Web Server issue), W384-9
    PMID: 19406922     doi: 10.1093/nar/gkp308
    SplitPocket is a web server that identifies functional surfaces of proteins from structure coordinates. Using the Alpha Shape Theory, we previously developed an analytical approach to identify protein functional surfaces by the geometric concept of a split pocket, which is a pocket split by a binding ligand. Our geometric approach extracts site-specific spatial information from coordinates of structures. To reduce the search space, probe radii are designed according to the physicochemical textures of molecules. The method uses the weighted Delaunay triangulation and the discrete flow algorithm to obtain geometric measurements and spatial patterns for each predicted pocket. It can also measure the hydrophobicity on a surface patch. Furthermore, we quantify the evolutionary conservation of surface patches by an index derived from the entropy scores in HSSP (homology-derived secondary structure of proteins). We have used the method to examine approximately 1.16 million potential pockets and identified the split pockets in >26,000 structures in the Protein Data Bank. This integrated web server of functional surfaces provides a source of spatial patterns to serve as templates for predicting the functional surfaces of unbound structures involved in binding activities. These spatial patterns should also be useful for protein functional inference, structural evolution and drug design.

  • EasyMIFS and SiteHound: a toolkit for the identification of ligand-binding sites in protein structures.
    Ghersi, Dario and Sanchez, Roberto
    Bioinformatics (Oxford, England), 2009, 25(23), 3185-3186
    PMID: 19789268     doi: 10.1093/bioinformatics/btp562
    SiteHound uses Molecular Interaction Fields (MIFs) produced by EasyMIFs to identify protein structure regions that show a high propensity for interaction with ligands. The type of binding site identified depends on the probe atom used in the MIF calculation. The input to EasyMIFs is a PDB file of a protein structure; the output MIF serves as input to SiteHound, which in turn produces a list of putative binding sites. Extensive testing of SiteHound for the detection of binding sites for drug-like molecules and phosphorylated ligands has been carried out. AVAILABILITY: EasyMIFs and SiteHound executables for Linux, Mac OS X, and MS Windows operating systems are freely available for download. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
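
The step from a MIF to a ranked list of putative sites can be sketched as follows. This is an illustration only, not the SiteHound implementation: the energy cutoff, the linking distance, the single-linkage grouping, and the ranking by summed energy are assumptions, and `putative_sites` is a hypothetical name.

```python
from math import dist

def putative_sites(mif_points, energy_cutoff=-1.0, link_distance=1.5):
    """Group favorable MIF grid points into candidate binding sites.

    mif_points: list of ((x, y, z), energy); more negative = more favorable.
    energy_cutoff, link_distance: arbitrary illustrative values.
    Returns sites sorted from most to least favorable total energy.
    """
    # Keep only grid points whose probe interaction energy is favorable.
    favorable = [(p, e) for p, e in mif_points if e <= energy_cutoff]
    unvisited = set(range(len(favorable)))
    sites = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        # Single-linkage growth: absorb any point close to a current member.
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if dist(favorable[i][0], favorable[j][0]) <= link_distance]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        sites.append({
            "points": [favorable[k][0] for k in members],
            "total_energy": sum(favorable[k][1] for k in members),
        })
    sites.sort(key=lambda s: s["total_energy"])
    return sites

grid = [
    ((0.0, 0.0, 0.0), -2.0),
    ((1.0, 0.0, 0.0), -3.0),
    ((10.0, 0.0, 0.0), -1.5),
    ((5.0, 5.0, 5.0), -0.2),  # discarded: weaker than the cutoff
]
sites = putative_sites(grid)
```

Changing the probe atom would change the energies in `mif_points`, which is how a sketch like this would reflect the abstract's point that the type of site found depends on the probe.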

  • SITEHOUND-web: a server for ligand binding site identification in protein structures.
    Hernandez, Marylens and Ghersi, Dario and Sanchez, Roberto
    Nucleic acids research, 2009, 37(Web Server issue), W413-6
    PMID: 19398430     doi: 10.1093/nar/gkp281
    SITEHOUND-web is a binding-site identification server powered by the SITEHOUND program. Given a protein structure in PDB format, SITEHOUND-web will identify regions of the protein characterized by favorable interactions with a probe molecule. These regions correspond to putative ligand binding sites. Depending on the probe used in the calculation, sites with preference for different ligands will be identified. Currently, a carbon probe for identification of binding sites for drug-like molecules, and a phosphate probe for phosphorylated ligands (ATP, phosphopeptides, etc.) have been implemented. SITEHOUND-web will display the results in HTML pages including an interactive 3D representation of the protein structure and the putative sites using the Jmol Java applet. Various downloadable data files are also provided for offline data analysis.

  • Fragment-based identification of druggable 'hot spots' of proteins using Fourier domain correlation techniques.
    Brenke, Ryan and Kozakov, Dima and Chuang, Gwo-Yu and Beglov, Dmitri and Hall, David and Landon, Melissa R and Mattos, Carla and Vajda, Sandor
    Bioinformatics (Oxford, England), 2009, 25(5), 621-627
    PMID: 19176554     doi: 10.1093/bioinformatics/btp036
    MOTIVATION: The binding sites of proteins generally contain smaller regions that provide major contributions to the binding free energy and hence are the prime targets in drug design. Screening libraries of fragment-sized compounds by NMR or X-ray crystallography demonstrates that such 'hot spot' regions bind a large variety of small organic molecules, and that a relatively high 'hit rate' is predictive of target sites that are likely to bind drug-like ligands with high affinity. Our goal is to determine the 'hot spots' computationally rather than experimentally.


  • PocketDepth: A new depth based algorithm for identification of ligand binding sites in proteins
    Kalidas, Yeturu and Chandra, Nagasuma
    Journal of Structural Biology, 2008, 161(1), 31-42
    PMID: 17949996     doi: 10.1016/j.jsb.2007.09.005
    Predicting functional sites in proteins is important in structural biology for understanding the function and also for structure-based drug design. Here we report a new binding site prediction method, PocketDepth, which is geometry-based and uses depth-based clustering. Depth is an important parameter considered during protein structure visualisation and analysis but has been used more often intuitively than systematically. Our current implementation of depth reflects how central a given subspace is to a putative pocket. We have tested the algorithm against PDBbind, a large curated set of 1091 proteins. A prediction was considered a true-positive if the predicted pocket had at least 10% overlap with the actual ligand. Two different parameter sets, 'deeper' and 'surface', were used for wider coverage of different types of binding sites in proteins. With the 'deeper' parameters, true-positives were observed for 841 proteins, resulting in a prediction accuracy of 77%, for any ranked prediction. Of these, 55.2% were first ranked predictions, whereas 91.2% and 97.4% were covered in the first 5 and 10 ranks, respectively. With the 'surface' parameters, a prediction rate of 95.8% was observed, albeit with much poorer ranks. The 'deeper' set identified pocket boundaries more precisely and yielded better ranks, while the latter missed fewer predictions and hence had better coverage. The two parameter sets were therefore algorithmically combined, resulting in a prediction accuracy of 96.5% for any ranked prediction. About 41.8% of these were in the first rank, and 82% and 94% were in the top 5 and 10 ranks, respectively. The algorithm is available online.
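
The evaluation described in this abstract (a 10% overlap criterion, then success rates at rank cutoffs) can be sketched as below. The abstract does not say how overlap is measured, so counting the fraction of ligand atoms that fall inside the predicted pocket is an assumption, as are the function names and the toy inputs. Only the figures 841 and 1091 come from the abstract.

```python
def is_true_positive(ligand_atoms_in_pocket, ligand_atoms, min_overlap=0.10):
    """True if at least `min_overlap` of the ligand atoms lie in the pocket.

    Both arguments are collections of atom identifiers; measuring overlap
    as a fraction of ligand atoms is an assumption of this sketch.
    """
    ligand = set(ligand_atoms)
    if not ligand:
        return False
    return len(ligand & set(ligand_atoms_in_pocket)) / len(ligand) >= min_overlap

def success_rate_at(best_ranks, n):
    """Fraction of proteins whose first true-positive pocket has rank <= n.

    best_ranks: one entry per protein, the 1-based rank of its best
    true-positive prediction, or None if no prediction was a true positive.
    """
    hits = sum(1 for r in best_ranks if r is not None and r <= n)
    return hits / len(best_ranks)

# The 'deeper' parameter set: 841 true positives out of 1091 proteins.
deeper_accuracy = 841 / 1091  # about 0.77, matching the reported 77%

# Toy ranks for five hypothetical proteins.
toy_ranks = [1, 3, None, 7, 1]
top1 = success_rate_at(toy_ranks, 1)
top5 = success_rate_at(toy_ranks, 5)
```

Note that the abstract reports the rank percentages (55.2%, 91.2%, 97.4%) relative to the true positives ("of these"), not relative to all 1091 proteins, so `success_rate_at` would need `best_ranks` restricted to proteins with a hit to reproduce that convention.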


  • CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues.
    Dundas, Joe and Ouyang, Zheng and Tseng, Jeffery and Binkowski, Andrew and Turpaz, Yaron and Liang, Jie
    Nucleic acids research, 2006, 34(Web Server issue), W116-8
    PMID: 16844972     doi: 10.1093/nar/gkl282
    Cavities on a protein's surface as well as specific amino acid positioning within it create the physicochemical properties needed for a protein to perform its function. CASTp is an online tool that locates and measures pockets and voids on 3D protein structures. This new version of CASTp includes annotated functional information of specific residues on the protein structure. The annotations are derived from the Protein Data Bank (PDB), Swiss-Prot, as well as Online Mendelian Inheritance in Man (OMIM), the latter of which contains information on the variant single nucleotide polymorphisms (SNPs) that are known to cause disease. These annotated residues are mapped to surface pockets, interior voids or other regions of the PDB structures. We use a semi-global pair-wise sequence alignment method to obtain sequence mapping between entries in Swiss-Prot, OMIM and entries in PDB. The updated CASTp web server can be used to study surface features, functional regions and specific roles of key residues of proteins.

  • LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation.
    Huang, Bingding and Schroeder, Michael
    BMC structural biology, 2006, 6, 19
    PMID: 16995956     doi: 10.1186/1472-6807-6-19
    BACKGROUND: Identifying pockets on protein surfaces is of great importance for many structure-based drug design applications and protein-ligand docking algorithms. Over the last ten years, many geometric methods for the prediction of ligand-binding sites have been developed.

  • CAVER: a new tool to explore routes from protein clefts, pockets and cavities.
    Petrek, Martin and Otyepka, Michal and Banás, Pavel and Kosinová, Pavlína and Koca, Jaroslav and Damborský, Jirí
    Bmc Bioinformatics, 2006, 7, 316
    PMID: 16792811     doi: 10.1186/1471-2105-7-316
    BACKGROUND: The main aim of this study was to develop and implement an algorithm for the rapid, accurate and automated identification of paths leading from buried protein clefts, pockets and cavities in dynamic and static protein structures to the outside solvent.


  • A new bioinformatic approach to detect common 3D sites in protein structures.
    Jambon, Martin and Imberty, Anne and Deléage, Gilbert and Geourjon, Christophe
    Proteins, 2003, 52(2), 137-145
    PMID: 12833538     doi: 10.1002/prot.10339
    An innovative bioinformatic method has been designed and implemented to detect similar three-dimensional (3D) sites in proteins. This approach allows the comparison of protein structures or substructures and detects local spatial similarities: this method is completely independent from the amino acid sequence and from the backbone structure. In contrast to already existing tools, the basis for this method is a representation of the protein structure by a set of stereochemical groups that are defined independently from the notion of amino acid. An efficient heuristic for finding similarities that uses graphs of triangles of chemical groups to represent the protein structures has been developed. The implementation of this heuristic constitutes a software named SuMo (Surfing the Molecules), which allows the dynamic definition of chemical groups, the selection of sites in the proteins, and the management and screening of databases. To show the relevance of this approach, we focused on two extreme examples illustrating convergent and divergent evolution. In two unrelated serine proteases, SuMo detects one common site, which corresponds to the catalytic triad. In the legume lectins family composed of >100 structures that share similar sequences and folds but may have lost their ability to bind a carbohydrate molecule, SuMo discriminates between functional and non-functional lectins with a selectivity of 96%. The time needed for searching a given site in a protein structure is typically 0.1 s on a PIII 800MHz/Linux computer; thus, in further studies, SuMo will be used to screen the PDB.


  • An evolutionary trace method defines binding surfaces common to protein families.
    Lichtarge, O and Bourne, H R and Cohen, F E
    Journal of molecular biology, 1996, 257(2), 342-358
    PMID: 8609628     doi: 10.1006/jmbi.1996.0167
    X-ray or NMR structures of proteins are often derived without their ligands, and even when the structure of a full complex is available, the area of contact that is functionally and energetically significant may be a specialized subset of the geometric interface deduced from the spatial proximity between ligands. Thus, even after a structure is solved, it remains a major theoretical and experimental goal to localize protein functional interfaces and understand the role of their constituent residues. The evolutionary trace method is a systematic, transparent and novel predictive technique that identifies active sites and functional interfaces in proteins with known structure. It is based on the extraction of functionally important residues from sequence conservation patterns in homologous proteins, and on their mapping onto the protein surface to generate clusters identifying functional interfaces. The SH2 and SH3 modular signaling domains and the DNA binding domain of the nuclear hormone receptors provide tests for the accuracy and validity of our method. In each case, the evolutionary trace delineates the functional epitope and identifies residues critical to binding specificity. Based on mutational evolutionary analysis and on the structural homology of protein families, this simple and versatile approach should help focus site-directed mutagenesis studies of structure-function relationships in macromolecules, as well as studies of specificity in molecular recognition. More generally, it provides an evolutionary perspective for judging the functional or structural role of each residue in protein structure.
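
The conservation-pattern step of this abstract can be sketched in a few lines. The sketch assumes the homologous sequences are already aligned and already split into subfamilies (the abstract does not describe how that is done), and it omits the mapping of the selected positions onto the protein surface. The labels and the name `trace_positions` are chosen for illustration.

```python
def trace_positions(groups):
    """Label each alignment column from its conservation pattern.

    groups: list of subfamilies, each a list of aligned, equal-length
            sequences (the grouping itself is assumed to be given).
    Returns one label per column:
      'conserved'      - same residue in every sequence of every group
      'class-specific' - invariant within each group, differs between groups
      'neutral'        - varies within at least one group
    """
    length = len(groups[0][0])
    labels = []
    for col in range(length):
        consensus = []
        for seqs in groups:
            residues = {s[col] for s in seqs}
            # A group has a consensus only if all its members agree.
            consensus.append(residues.pop() if len(residues) == 1 else None)
        if None in consensus:
            labels.append("neutral")
        elif len(set(consensus)) == 1:
            labels.append("conserved")
        else:
            labels.append("class-specific")
    return labels

# Toy alignment with two hypothetical subfamilies.
toy_groups = [
    ["ACDK", "ACEK"],
    ["ACDR", "ACQR"],
]
toy_labels = trace_positions(toy_groups)
```

In the toy alignment the last column is K in one subfamily and R in the other, which is the kind of position the abstract associates with binding specificity; the sequences are invented.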