A classification framework applied to cancer gene expression profiles

Hussein Hijazi, Christina Chan

Research output: Contribution to journal › Article

  • 14 Citations

Abstract

Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes or between normal and cancer samples is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or between normal and cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) to five cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally achieve better classification performance than the other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression) increases the prediction accuracy compared with using gene expression alone.
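The abstract describes the two-step feature selection only at a high level. The following is a minimal Python sketch of the general filter-then-wrapper idea, not the authors' implementation. It assumes numeric expression values, ranks genes with a standard ReliefF weighting, and then runs a simple genetic algorithm over the top-ranked genes. The GA fitness (leave-one-out 1-nearest-neighbour accuracy minus a small per-gene penalty) is an illustrative assumption, as are all function names (`relieff`, `ga_select`, `two_step_select`, `make_toy_data`), the parameter values (`k`, `top_n`, `pop`, `gens`, `p_mut`, `penalty`), and the synthetic data.

```python
# Minimal sketch; names, parameters, fitness, and data are illustrative assumptions, not from the paper.
import random


def make_toy_data(n_per_class=20, n_noise=6, seed=1):
    """Synthetic two-class data: features 0 and 1 track the class, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [3.0 * label + rng.gauss(0, 0.5), -3.0 * label + rng.gauss(0, 0.5)]
            row += [rng.gauss(0, 1) for _ in range(n_noise)]
            X.append(row)
            y.append(label)
    return X, y


def relieff(X, y, k=3, m=None, seed=0):
    """ReliefF weights for numeric features; a higher weight means better class separation."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    span = [(max(col) - min(col)) or 1.0 for col in zip(*X)]  # per-feature range
    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    sample = rng.sample(range(n), min(m or n, n))  # instances used to update weights
    w = [0.0] * d

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / span[a]

    def nearest(i, pool):
        # k nearest neighbours of instance i by Manhattan distance on range-scaled features
        return sorted(pool, key=lambda j: sum(diff(a, i, j) for a in range(d)))[:k]

    for i in sample:
        hits = nearest(i, [j for j in range(n) if j != i and y[j] == y[i]])
        for a in range(d):
            if hits:  # near hits that differ on feature a lower its weight
                w[a] -= sum(diff(a, i, h) for h in hits) / (len(sample) * len(hits))
        for c in classes:
            if c == y[i]:
                continue
            misses = nearest(i, [j for j in range(n) if y[j] == c])
            coef = prior[c] / (1.0 - prior[y[i]])  # weight each other class by its prior
            for a in range(d):  # near misses that differ on feature a raise its weight
                w[a] += coef * sum(diff(a, i, j) for j in misses) / (len(sample) * len(misses))
    return w


def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to feats."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best = min((j for j in range(len(X)) if j != i),
                   key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in feats))
        correct += y[best] == y[i]
    return correct / len(X)


def ga_select(X, y, candidates, pop=16, gens=20, p_mut=0.05, penalty=0.01, seed=0):
    """Genetic algorithm over bit masks of candidate genes (hypothetical fitness)."""
    rng = random.Random(seed)
    L = len(candidates)

    def fitness(bits):
        feats = [candidates[b] for b in range(L) if bits[b]]
        return loo_1nn_accuracy(X, y, feats) - penalty * len(feats)

    population = [[rng.random() < 0.5 for _ in range(L)] for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(((fitness(b), b) for b in population), key=lambda t: t[0], reverse=True)
        nxt = [scored[0][1], scored[1][1]]  # elitism: carry over the two best masks

        def tournament():
            return max(rng.sample(scored, 3), key=lambda t: t[0])[1]

        while len(nxt) < pop:
            a, b = tournament(), tournament()
            cut = rng.randrange(1, L) if L > 1 else 0  # one-point crossover
            child = [(not g) if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            nxt.append(child)  # bit-flip mutation applied above
        population = nxt
    best = max(population, key=fitness)
    return [candidates[b] for b in range(L) if best[b]]


def two_step_select(X, y, top_n=10, **ga_kwargs):
    """Step 1: keep the top_n ReliefF-ranked genes. Step 2: GA search among them."""
    w = relieff(X, y)
    ranked = sorted(range(len(w)), key=lambda a: w[a], reverse=True)[:top_n]
    return ga_select(X, y, ranked, **ga_kwargs)


if __name__ == "__main__":
    X, y = make_toy_data()
    print(two_step_select(X, y, top_n=4))
```

In this sketch, the cheap ReliefF filter runs first so that the more expensive genetic-algorithm wrapper only explores subsets of `top_n` genes rather than the full expression profile. Any of the classifiers named in the abstract could stand in for the 1-nearest-neighbour fitness.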

Language: English (US)
Pages: 255-283
Number of pages: 29
Journal: Journal of Healthcare Engineering
Volume: 4
Issue number: 2
DOIs: 10.1260/2040-2295.4.2.255
State: Published - Jun 2013

Profile

Neoplasm Genes
Transcriptome
Gene expression
Support vector machines
Neoplasms
Proteins
Gene Expression
Supervised learning
Decision trees
Learning systems
Feature extraction
Classifiers
Genes
Genetic algorithms
Decision Trees
Learning

Keywords

  • Cancer
  • Classification
  • Feature selection
  • Gene expression
  • Machine learning
  • Supervised learning

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Biotechnology
  • Surgery

Cite this

A classification framework applied to cancer gene expression profiles. / Hijazi, Hussein; Chan, Christina.

In: Journal of Healthcare Engineering, Vol. 4, No. 2, 06.2013, p. 255-283.


@article{ed789751dcf3474f9e3ff06218f122dc,
title = "A classification framework applied to cancer gene expression profiles",
abstract = "Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes or between normal and cancer samples is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or between normal and cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) to five cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally achieve better classification performance than the other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression) increases the prediction accuracy compared with using gene expression alone.",
keywords = "Cancer, Classification, Feature selection, Gene expression, Machine learning, Supervised learning",
author = "Hussein Hijazi and Christina Chan",
year = "2013",
month = "6",
doi = "10.1260/2040-2295.4.2.255",
language = "English (US)",
volume = "4",
pages = "255--283",
journal = "Journal of Healthcare Engineering",
issn = "2040-2295",
publisher = "Multi Science Publishing",
number = "2",

}

TY - JOUR

T1 - A classification framework applied to cancer gene expression profiles

AU - Hijazi,Hussein

AU - Chan,Christina

PY - 2013/6

Y1 - 2013/6

N2 - Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes or between normal and cancer samples is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or between normal and cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) to five cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally achieve better classification performance than the other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression) increases the prediction accuracy compared with using gene expression alone.

AB - Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes or between normal and cancer samples is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or between normal and cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) to five cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally achieve better classification performance than the other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression) increases the prediction accuracy compared with using gene expression alone.

KW - Cancer

KW - Classification

KW - Feature selection

KW - Gene expression

KW - Machine learning

KW - Supervised learning

UR - http://www.scopus.com/inward/record.url?scp=84881234696&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84881234696&partnerID=8YFLogxK

U2 - 10.1260/2040-2295.4.2.255

DO - 10.1260/2040-2295.4.2.255

M3 - Article

VL - 4

SP - 255

EP - 283

JO - Journal of Healthcare Engineering

T2 - Journal of Healthcare Engineering

JF - Journal of Healthcare Engineering

SN - 2040-2295

IS - 2

ER -