Classification and Similarity Analysis of Binding-Database: A Survey on Application of Multi-Class Classifiers for Deriving General Rules from Large Compound Databases

Document Type: Original Article


1 MSc Student, Department of Biomedical Engineering AND Student Research Committee, School of Advanced Medical Technology, Isfahan University of Medical Sciences, Isfahan, Iran

2 Assistant Professor, Department of Chemistry, School of Basic Sciences, Tarbiat Modares University, Tehran, Iran


Background: In this research, we extracted and refined the features of active ligands for specific biological targets by combining data mining and classification methods, with the aim of aiding medicinal chemists in their drug discovery projects. Obtaining inactive ligands is the major obstacle in developing multi-class classifiers. Our models were therefore built only from active ligands found in the Binding-database (Binding-DB), with no need to prepare inactive molecules.

Methods: Our database consisted of 160372 ligands in 45 classes of common proteins, and 1497 different features (topological, chemical, physical, etc.) were calculated for each molecule. The features specific to the active ligands of each target were then extracted using a combination of linear discriminant analysis and the Apriori algorithm.

Findings: Receiver operating characteristic (ROC) analysis was a useful tool for assessing the accuracy and sensitivity of the classification models in retrieving molecules from the ZINC and Binding-DB databases. The area under the ROC curve (AUC) was evaluated for each target in ZINC and Binding-DB, and the results were 0.8341 ± 0.1495 and 0.8615 ± 0.1502, respectively.

Conclusion: Specific features of active ligands can be found using the methodology described in this work, and with these features each database can be sorted according to the corresponding target. The AUC values show that the present method is useful for virtual screening of large databases without requiring any survey of inactive ligands.
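As an illustration of the AUC figures reported above, the following minimal sketch shows how an AUC can be computed for one target once a database has been sorted by a score. It is not the authors' implementation: the function name `roc_auc`, the toy scores, and the labeling convention (1 for ligands annotated as active for the target, 0 for every other molecule in the database) are assumptions made for this example.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve computed from pairwise comparisons.

    Equivalent to the probability that a randomly chosen positive
    (label 1) receives a higher score than a randomly chosen
    negative (label 0); tied scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for six molecules ranked against one target.
# Label 1 = annotated active for this target, 0 = rest of the database
# (an assumed convention, since no confirmed inactives are available).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.4f}")
```

Repeating this calculation for each of the 45 target classes and summarizing the per-target values by their mean and standard deviation would give a figure in the same form as the ones reported in the Findings.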

