Dimensionality Reduction on Topological Features of the Gene Network Constructed from Microarray Data for Prediction of Breast Cancer Recurrence

Document Type: Original Article

Authors

1 Associate Professor, Department of Biomedical Engineering (Bioelectronics), School of Advanced Medical Technology, Isfahan University of Medical Sciences, Isfahan, Iran

2 MSc Student, Department of Biomedical Engineering (Bioelectronics), School of Advanced Medical Technology AND Student Research Committee, Isfahan University of Medical Sciences, Isfahan, Iran

3 Assistant Professor, Department of Biomedical Engineering, School of Advanced Medical Technology, Isfahan University of Medical Sciences, Isfahan, Iran

Abstract

Background: Features extracted from gene expression profiles of DNA microarrays are traditional tools in cancer classification. Using the topological properties of genes obtained through gene network reconstruction can provide more reliable findings. The main goal of this article was to predict breast cancer recurrence using topological features of the relevance network reconstructed from gene expression profiles.

Methods: We utilized seven gene expression microarray datasets, including 1271 samples from seven studies on breast cancer. The relevance gene network was reconstructed, and the Fisher discriminant analysis (FDA) method was applied for gene selection based on the characteristics of the network topology. Constructing the gene network requires an expression profile for each gene, which cannot be obtained from a single sample. Therefore, to classify a test sample, the sample was added to the training data and new gene networks were reconstructed for the two groups of high- and low-risk samples. The correlation coefficient between the topological quantity vectors of the networks before and after adding the test sample was calculated. The test sample was assigned to the group with the higher correlation between the newly reconstructed network and the primary labeled network.

Findings: The classification accuracy was calculated using 5-fold cross-validation for the correlation threshold approach, the k-nearest neighbor (kNN) classifier, and the non-linear support vector machine (SVM) classifier, each applied to the topological properties of the reconstructed gene networks. The results confirmed the advantage of applying topological features to the kNN and non-linear SVM classifiers. The highest prediction accuracy with the kNN classifier was obtained using the degree centrality property, reaching 98.5% on average across various numbers of genes.

Conclusion: Topological features of gene networks reconstructed from gene expression profiles provided more stable and accurate results in the prediction of breast cancer recurrence.
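
The correlation-based classification step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the relevance network links two genes when the absolute Pearson correlation of their expression profiles reaches a cutoff (the value 0.7 is hypothetical), it uses node degree as the only topological quantity, and it omits the FDA gene selection step. The function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (a convention assumed here).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def degree_vector(samples, cutoff=0.7):
    """Degree of each gene in a correlation-thresholded relevance network.

    samples: list of samples, each a list of expression values (one per gene).
    cutoff: hypothetical |r| threshold for linking two genes.
    """
    n_genes = len(samples[0])
    # One expression profile per gene, taken across all samples.
    profiles = [[s[g] for s in samples] for g in range(n_genes)]
    degree = [0] * n_genes
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if abs(pearson(profiles[i], profiles[j])) >= cutoff:
                degree[i] += 1
                degree[j] += 1
    return degree


def classify(test_sample, high_risk, low_risk, cutoff=0.7):
    """Assign the test sample to the group whose network changes least."""
    scores = {}
    for label, group in (("high", high_risk), ("low", low_risk)):
        before = degree_vector(group, cutoff)
        after = degree_vector(group + [test_sample], cutoff)
        # Correlation between topological vectors before and after adding
        # the test sample; a higher value means the network is less disturbed.
        scores[label] = pearson(before, after)
    return max(scores, key=scores.get), scores


# Toy data: 4 samples x 3 genes per group (values are made up).
high_risk = [[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 2]]
low_risk = [[1, 5, 2], [2, 1, 4], [3, 4, 6], [4, 2, 8]]
label, scores = classify([5, 10, 3], high_risk, low_risk)
print(label, scores)
```

In this toy example the test sample preserves the gene 0 to gene 1 link of the high-risk network but breaks the only link of the low-risk network, so it is assigned to the high-risk group. Other topological quantities could replace the degree vector without changing the comparison step.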
