A classification framework applied to cancer gene expression profiles

Hussein Hijazi, Christina Chan

    Research output: Article, peer-reviewed

    • 12 Citations

    Abstract

    Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes, or between normal and cancer samples, is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or between normal and cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) to five cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally achieve better classification performance than the other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data together with gene expression) increases the prediction accuracy compared with using gene expression alone.
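The two-step feature selection idea in the abstract (a filter ranking followed by a genetic-algorithm search) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: it uses basic two-class Relief rather than ReliefF, a very simple genetic algorithm, leave-one-out 1-nearest-neighbor accuracy as the fitness function, and synthetic data produced by a hypothetical `make_toy_data` helper. All function names and parameter values are invented here.

```python
import random

def make_toy_data(n=40, d=30, informative=(0, 1, 2), seed=0):
    """Synthetic two-class 'expression' matrix (hypothetical stand-in for real data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(d)]
        for j in informative:          # shift informative genes for class 1
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def relief_scores(X, y, n_iter=50, seed=0):
    """Step 1 (filter): basic Relief weights; higher means more discriminative."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences (1.0 if a feature is constant).
    span = [max(r[j] for r in X) - min(r[j] for r in X) or 1.0 for j in range(d)]
    dist = lambda a, b: sum(abs(a[j] - b[j]) / span[j] for j in range(d))
    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        # Nearest neighbor of the same class (hit) and of the other class (miss).
        hit = min((k for k in range(n) if k != i and y[k] == y[i]),
                  key=lambda k: dist(X[i], X[k]))
        miss = min((k for k in range(n) if y[k] != y[i]),
                   key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            diff_miss = abs(X[i][j] - X[miss][j]) / span[j]
            diff_hit = abs(X[i][j] - X[hit][j]) / span[j]
            w[j] += (diff_miss - diff_hit) / n_iter
    return w

def loo_knn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on `feats`."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((X[i][j] - X[k][j]) ** 2 for j in feats))
        correct += y[nearest] == y[i]
    return correct / len(X)

def ga_select(X, y, candidates, pop_size=20, generations=15, seed=0):
    """Step 2 (wrapper): simple GA over bit masks of the candidate genes."""
    rng = random.Random(seed)

    def fitness(mask):
        feats = [c for c, bit in zip(candidates, mask) if bit]
        # Small per-gene penalty (an assumption) favors compact subsets.
        return loo_knn_accuracy(X, y, feats) - 0.01 * len(feats)

    pop = [[rng.random() < 0.5 for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))   # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with probability 0.05.
            children.append([bit != (rng.random() < 0.05) for bit in child])
        pop = parents + children
    best = max(pop, key=fitness)
    return [c for c, bit in zip(candidates, best) if bit]

X, y = make_toy_data()
scores = relief_scores(X, y)
# Keep the 8 top-ranked genes as candidates for the GA search.
top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:8]
selected = ga_select(X, y, top)
accuracy = loo_knn_accuracy(X, y, selected)
print("candidates:", top)
print("selected:", selected, "LOO 1-NN accuracy:", accuracy)
```

The filter step cheaply shrinks the search space so that the more expensive wrapper search only has to explore subsets of a few candidate genes. A real study would evaluate fitness on held-out data so that the selected genes are not tuned to the samples used for scoring.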

    Language: English (US)
    Pages: 255-283
    Number of pages: 29
    Journal: Journal of Healthcare Engineering
    Volume: 4
    Issue number: 2
    DOI: 10.1260/2040-2295.4.2.255
    State: Published - Jun 2013

    Profile

    Neoplasm Genes
    Transcriptome
    Neoplasms
    Gene expression
    Support vector machines
    Proteins
    Supervised learning
    Decision trees
    Learning systems
    Feature extraction
    Classifiers
    Genes
    Genetic algorithms
    Learning
    Datasets
    Machine Learning
    Forests

    Keywords

    • Cancer
    • Classification
    • Feature selection
    • Gene expression
    • Machine learning
    • Supervised learning

    ASJC Scopus subject areas

    • Biomedical Engineering
    • Health Informatics
    • Biotechnology
    • Surgery

    Cite this

    A classification framework applied to cancer gene expression profiles. / Hijazi, Hussein; Chan, Christina.

    In: Journal of Healthcare Engineering, Vol. 4, No. 2, 06.2013, p. 255-283.

    @article{ed789751dcf3474f9e3ff06218f122dc,
    title = "A classification framework applied to cancer gene expression profiles",
    keywords = "Cancer, Classification, Feature selection, Gene expression, Machine learning, Supervised learning",
    author = "Hussein Hijazi and Christina Chan",
    year = "2013",
    month = "6",
    doi = "10.1260/2040-2295.4.2.255",
    volume = "4",
    pages = "255--283",
    journal = "Journal of Healthcare Engineering",
    issn = "2040-2295",
    publisher = "Multi Science Publishing",
    number = "2",

    }

    TY - JOUR

    T1 - A classification framework applied to cancer gene expression profiles

    AU - Hijazi,Hussein

    AU - Chan,Christina

    PY - 2013/6

    Y1 - 2013/6

    KW - Cancer

    KW - Classification

    KW - Feature selection

    KW - Gene expression

    KW - Machine learning

    KW - Supervised learning

    UR - http://www.scopus.com/inward/record.url?scp=84881234696&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84881234696&partnerID=8YFLogxK

    U2 - 10.1260/2040-2295.4.2.255

    DO - 10.1260/2040-2295.4.2.255

    M3 - Article

    VL - 4

    SP - 255

    EP - 283

    JO - Journal of Healthcare Engineering

    T2 - Journal of Healthcare Engineering

    JF - Journal of Healthcare Engineering

    SN - 2040-2295

    IS - 2

    ER -