Document Type: Original Article


1 Department of Biostatistics, School of Public Health, Iran University of Medical Sciences, Tehran, Iran

2 Department of Medical Biotechnology, Faculty of Allied Medical Sciences, Iran University of Medical Sciences, Tehran, Iran


Background: Breast cancer is one of the most prevalent cancers in Iranian women and the second leading cause of death in women worldwide. Gene mutations are key determinants of the disease; therefore, its genetic study is of paramount importance. One method for the genetic evaluation of this disease is microarray technology, which allows the expression of thousands of genes to be examined simultaneously. Clustering is a method for analyzing high-dimensional data; in the present research, we used it to group similar genes into separate clusters.
Method: A descriptive and inferential statistical analysis was carried out to evaluate unsupervised learning models for gene expression analysis, and five bi-clustering methods were compared: PLAID (PL), Fabia, Bimax, Cheng & Church (CC), and Xmotif. For this purpose, we obtained microarray gene expression data for lapatinib-resistant breast cancer cell lines from previously published research. The enrichment of the clusters was evaluated with gene ontology, and the results of the five models were compared using the Jaccard index, variance stability, least-square error, and goodness-of-fit indices. Furthermore, the results of the best model were used to build a gene set network with Bayesian networks.
Results: After preprocessing, clustering was performed on a gene expression data matrix of dimension 4710 × 18. All models except CC successfully found bi-clusters in the data set. The evaluation showed that the models produced almost the same results, but the PL model performed better than the others, finding 11 bi-clusters; this model was therefore used to build the network of gene sets.
Conclusion: According to the results, the PL method was suitable for clustering these data and could therefore be recommended for their analysis. In addition, the gene set network built from the gene expression data was inadequate.
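
The abstract names the Jaccard index as one of the criteria for comparing bi-clustering results but does not state how it was computed. The following is a minimal sketch under one common assumption: each bi-cluster is treated as the set of (gene, sample) cells it covers, and the index is the size of the intersection divided by the size of the union. The function names and the example bi-clusters are hypothetical and are not taken from the article.

```python
from itertools import product


def bicluster_cells(genes, samples):
    """Return the set of (gene, sample) cells covered by one bi-cluster."""
    return set(product(genes, samples))


def jaccard_index(bic_a, bic_b):
    """Jaccard index |A & B| / |A | B| of two bi-clusters.

    Each bi-cluster is a (genes, samples) pair. This cell-based
    definition is an assumption; the article may use another variant.
    """
    cells_a = bicluster_cells(*bic_a)
    cells_b = bicluster_cells(*bic_b)
    union = cells_a | cells_b
    if not union:
        # Two empty bi-clusters: define the index as 0 to avoid 0/0.
        return 0.0
    return len(cells_a & cells_b) / len(union)


# Hypothetical bi-clusters, for illustration only.
bic_1 = (["g1", "g2", "g3"], ["s1", "s2"])
bic_2 = (["g2", "g3", "g4"], ["s2", "s3"])

score = jaccard_index(bic_1, bic_2)
print(f"Jaccard index: {score:.2f}")
```

A value near 1 indicates that two methods recovered nearly the same bi-cluster, while a value near 0 indicates little overlap. Comparing every bi-cluster of one method with every bi-cluster of another yields a similarity matrix that can be summarized per method.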


How to cite this article:

Sohrabi A, Saraygord-Afshari N, Roudbari M. The application of bi-clustering and Bayesian network for gene sets network construction in breast cancer microarray data. Middle East J Cancer. 2022;13(4):624-40. doi: 10.30476/mejc.2022.89998.1557.
