Article
Journal :
IEEE Access
ISSN : 2169-3536
Publisher :
Information
Period : January 2023
Volume : 11 Number : 11
Pages : 2674-2699
Details
Efficient Bioinspired Feature Selection and Machine Learning Based Framework Using Omics Data and Biological Knowledge Data Bases in Cancer Clinical Endpoint Prediction
Imene Zenbout, Abdelkrim Bouramoul, Souham Meshoul, Mounira Amrane
Cancer research has advanced over the past few years. High-throughput technologies and advances in artificial intelligence now make it possible to improve cancer diagnosis and targeted therapy by integrating the analysis of clinical and omics profiles. The high dimensionality and class imbalance of most available data sets pose a serious challenge to the development of computational methods and tools for cancer diagnosis and biomarker discovery, and taking multi-omics data into account complicates the task further. In this paper, we describe a five-step integrative architecture that addresses these three problems by incorporating proteomics data, protein-protein interaction networks, and signaling pathways in order to identify protein biomarkers directly associated with cancer patients’ overall survival (OS) and progression-free interval (PFI). The core parts of this architecture are a cluster-based grey wolf optimization algorithm (CB-GWO) for feature selection and a deep stacked canonical correlation autoencoder (DSCC-AE) for clinical endpoint prediction. A thorough experimental study was carried out on breast, lung, colon, and rectum cancers to evaluate the proposed optimization algorithm for feature selection, as well as the deep learning model in terms of the Matthews correlation coefficient (MCC) and the area under the curve (AUC). Compared with other methods in the literature, the proposed framework outperforms the competing algorithms and models in terms of AUC (0.91) and MCC (0.64). In addition, hub marker genes with potential alterations in colorectal cancer, breast cancer, and lung cancer have been identified.
Key words :
Proteins, Cancer, Biological system modeling, Feature extraction, Proteomics, Biology, Data models, Research and development
Ref. laboratory citation :
misc-lab-423
DOI :
10.1109/ACCESS.2023.3234294
Link :
Full text
ACM :
I. Zenbout, A. Bouramoul, S. Meshoul and M. Amrane. 2023. Efficient Bioinspired Feature Selection and Machine Learning Based Framework Using Omics Data and Biological Knowledge Data Bases in Cancer Clinical Endpoint Prediction. IEEE Access, 11, 11 (January 2023), 2674-2699. DOI: https://doi.org/10.1109/ACCESS.2023.3234294.
APA :
Zenbout, I., Bouramoul, A., Meshoul, S. & Amrane, M. (2023, January). Efficient Bioinspired Feature Selection and Machine Learning Based Framework Using Omics Data and Biological Knowledge Data Bases in Cancer Clinical Endpoint Prediction. IEEE Access, 11(11), 2674-2699. DOI: https://doi.org/10.1109/ACCESS.2023.3234294
IEEE :
I. Zenbout, A. Bouramoul, S. Meshoul and M. Amrane, "Efficient Bioinspired Feature Selection and Machine Learning Based Framework Using Omics Data and Biological Knowledge Data Bases in Cancer Clinical Endpoint Prediction," IEEE Access, vol. 11, no. 11, pp. 2674-2699, January 2023. DOI: https://doi.org/10.1109/ACCESS.2023.3234294.
BibTeX :
@article{misc-lab-423,
author = {Zenbout, Imene and Bouramoul, Abdelkrim and Meshoul, Souham and Amrane, Mounira},
title = {Efficient Bioinspired Feature Selection and Machine Learning Based Framework Using Omics Data and Biological Knowledge Data Bases in Cancer Clinical Endpoint Prediction},
journal = {IEEE Access},
volume = {11},
number = {11},
issn = {2169-3536},
pages = {2674--2699},
publisher = {IEEE},
year = {2023},
month = {January},
doi = {10.1109/ACCESS.2023.3234294},
url = {https://doi.org/10.1109/ACCESS.2023.3234294},
keywords = {Proteins, Cancer, Biological system modeling, Feature extraction, Proteomics, Biology, Data models, Research and development}
}
RIS :
TY  - JOUR
TI  - Efficient Bioinspired Feature Selection and Machine Learning Based Framework Using Omics Data and Biological Knowledge Data Bases in Cancer Clinical Endpoint Prediction
AU  - Zenbout, Imene
AU  - Bouramoul, Abdelkrim
AU  - Meshoul, Souham
AU  - Amrane, Mounira
PY  - 2023
SN  - 2169-3536
JO  - IEEE Access
VL  - 11
IS  - 11
SP  - 2674
EP  - 2699
PB  - IEEE
AB  - Cancer research has advanced over the past few years. High-throughput technologies and advances in artificial intelligence now make it possible to improve cancer diagnosis and targeted therapy by integrating the analysis of clinical and omics profiles. The high dimensionality and class imbalance of most available data sets pose a serious challenge to the development of computational methods and tools for cancer diagnosis and biomarker discovery, and taking multi-omics data into account complicates the task further. In this paper, we describe a five-step integrative architecture that addresses these three problems by incorporating proteomics data, protein-protein interaction networks, and signaling pathways in order to identify protein biomarkers directly associated with cancer patients’ overall survival (OS) and progression-free interval (PFI). The core parts of this architecture are a cluster-based grey wolf optimization algorithm (CB-GWO) for feature selection and a deep stacked canonical correlation autoencoder (DSCC-AE) for clinical endpoint prediction. A thorough experimental study was carried out on breast, lung, colon, and rectum cancers to evaluate the proposed optimization algorithm for feature selection, as well as the deep learning model in terms of the Matthews correlation coefficient (MCC) and the area under the curve (AUC). Compared with other methods in the literature, the proposed framework outperforms the competing algorithms and models in terms of AUC (0.91) and MCC (0.64). In addition, hub marker genes with potential alterations in colorectal cancer, breast cancer, and lung cancer have been identified.
KW  - Proteins
KW  - Cancer
KW  - Biological system modeling
KW  - Feature extraction
KW  - Proteomics
KW  - Biology
KW  - Data models
KW  - Research and development
DO  - 10.1109/ACCESS.2023.3234294
UR  - https://doi.org/10.1109/ACCESS.2023.3234294
ID  - misc-lab-423
ER  -