Evaluation of Different Classification Models to Extract Gene Signatures for Breast Cancer Recurrence Using Microarray Data

Document Type: Original Article


1 Assistant Professor, Department of Bioelectric and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran

2 Department of Electrical Engineering, Sepahan Institute of Higher Education, Isfahan, Iran


Background: In this study, we aimed to improve the reliability and biological interpretability of gene signatures selected from microarray data through the efficient use of computational models and mathematical algorithms.

Methods: First, a model with high accuracy was chosen to predict cancer recurrence from microarray gene expression data of breast tumors. To this end, gene expression data of breast tumors from 1271 patients (379 with recurrence and 892 without recurrence) were used to construct a predictive model for recurrence by comparing the performance of multiple classifiers. In the pre-processing stage, several methods, including correlation-based feature selection (CFS), principal component analysis (PCA), independent component analysis (ICA), and a genetic algorithm, as well as a random selection method, were used to reduce the dimensionality and choose the most appropriate genes (features).

Findings: Five signature genes were selected by combining the genetic algorithm, top scoring set (TSS), and random selection methods, which showed the best results in most classification models. The final indicator genes were TRIP13, KIF20A, NEK2, RACGAP1, and TYMS, which contribute significantly to the structure of microtubules and the spindle and also regulate the attachment of spindle microtubules to the kinetochore.

Conclusion: By using hybrid models, we can avoid overfitting in training and achieve acceptable accuracy with biologically interpretable genes.
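The random selection step mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, a nearest-centroid classifier stands in for the classifiers compared in the study, and all function names (`make_synthetic`, `centroid_predict`, `cv_accuracy`, `random_subset_search`) and parameter values are hypothetical choices made for illustration only.

```python
import random


def make_synthetic(n_samples=120, n_genes=50, n_informative=5, seed=0):
    """Build a synthetic expression matrix (assumption: NOT real microarray data).

    The first n_informative genes are shifted upward in class 1 samples.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate classes: 0 = no recurrence, 1 = recurrence
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if label == 1:
            for g in range(n_informative):
                row[g] += 1.5
        X.append(row)
        y.append(label)
    return X, y


def centroid_predict(X_train, y_train, X_test, genes):
    """Nearest-centroid classifier restricted to the given gene indices."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    preds = []
    for x in X_test:
        dist = {
            label: sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
            for label, centroid in centroids.items()
        }
        preds.append(min(dist, key=dist.get))
    return preds


def cv_accuracy(X, y, genes, k=5):
    """k-fold cross-validated accuracy for one gene subset."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        preds = centroid_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
            genes,
        )
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / n


def random_subset_search(X, y, subset_size=5, n_trials=200, seed=1):
    """Draw random gene subsets and keep the one with the best CV accuracy."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    best_genes, best_acc = None, -1.0
    for _ in range(n_trials):
        genes = sorted(rng.sample(range(n_genes), subset_size))
        acc = cv_accuracy(X, y, genes)
        if acc > best_acc:
            best_genes, best_acc = genes, acc
    return best_genes, best_acc


X, y = make_synthetic()
best_genes, best_acc = random_subset_search(X, y)
print("best subset:", best_genes, "CV accuracy:", round(best_acc, 3))
```

Because the subset is chosen by the same cross-validation score that is then reported, the best accuracy found this way is optimistic. An independent held-out set would be needed for an unbiased estimate, which is the overfitting concern raised in the Conclusion.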

