A novel method incorporating gene ontology information for unsupervised clustering and feature selection

Shireesh Srivastava, Linxia Zhang, Rong Jin, Christina Chan

    Research output: Peer-reviewed research article

    • 10 Citations

    Abstract

    Background: A primary goal of microarray analysis is to identify genes that can distinguish between different phenotypes (feature selection). Previous studies indicate that incorporating prior information about gene function can help identify physiologically relevant features. However, current methods that incorporate prior functional information do not provide a relative estimate of the effect of different genes on the biological processes of interest.

    Results: Here, we present a method that integrates gene ontology (GO) information and expression data using Bayesian regression mixture models to perform unsupervised clustering of the samples and identify physiologically relevant discriminating features. As a model application, the method was applied to identify the genes that play a role in the cytotoxic response of the human hepatoblastoma cell line HepG2 to saturated free fatty acids (FFAs) and tumor necrosis factor (TNF)-α, as compared to the non-toxic response to unsaturated FFAs and TNF-α. Incorporating prior knowledge led to better discrimination of the toxic phenotypes from the others. The model identified roles of lysosomal ATPases and adenylate cyclase (AC9) in the toxicity of palmitate. To validate the role of adenylate cyclase in palmitate-treated cells, we measured the intracellular levels of cyclic AMP (cAMP). cAMP levels were significantly reduced by palmitate treatment but not by the other FFAs, consistent with the model's selection of AC9.

    Conclusions: We present a framework that incorporates prior ontology information, which helped to (a) perform unsupervised clustering of the phenotypes, and (b) identify the genes relevant to each cluster of phenotypes. We demonstrate the framework by applying it to identify physiologically relevant feature genes that conferred differential toxicity to saturated vs. unsaturated FFAs. The framework can be applied to other problems to efficiently integrate ontology information and expression data in order to identify feature genes.

    Language: English (US)
    Article number: e3860
    Journal: PLoS One
    Volume: 3
    Issue number: 12
    DOI: 10.1371/journal.pone.0003860
    State: Published - Dec 4 2008

    Profile

    Gene Ontology
    Cluster Analysis
    Genes
    Ontology
    Feature extraction
    methodology
    Phenotype
    Palmitates
    Cyclic AMP
    Tumor Necrosis Factor-alpha
    Toxicity
    Cells
    Hepatoblastoma
    Biological Phenomena

    ASJC Scopus subject areas

    • Agricultural and Biological Sciences(all)
    • Biochemistry, Genetics and Molecular Biology(all)
    • Medicine(all)

    Cite this

    A novel method incorporating gene ontology information for unsupervised clustering and feature selection. / Srivastava, Shireesh; Zhang, Linxia; Jin, Rong; Chan, Christina.

    In: PLoS One, Vol. 3, No. 12, e3860, 04.12.2008.

    @article{5cec726a1abe43ef9d4783c703bced9e,
    title = "A novel method incorporating gene ontology information for unsupervised clustering and feature selection",
    author = "Shireesh Srivastava and Linxia Zhang and Rong Jin and Christina Chan",
    year = "2008",
    month = "12",
    doi = "10.1371/journal.pone.0003860",
    volume = "3",
    journal = "PLoS One",
    issn = "1932-6203",
    publisher = "Public Library of Science",
    number = "12",

    }

    TY - JOUR

    T1 - A novel method incorporating gene ontology information for unsupervised clustering and feature selection

    AU - Srivastava,Shireesh

    AU - Zhang,Linxia

    AU - Jin,Rong

    AU - Chan,Christina

    PY - 2008/12/4

    Y1 - 2008/12/4

    UR - http://www.scopus.com/inward/record.url?scp=57349137697&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=57349137697&partnerID=8YFLogxK

    U2 - 10.1371/journal.pone.0003860

    DO - 10.1371/journal.pone.0003860

    M3 - Article

    VL - 3

    JO - PLoS One

    T2 - PLoS One

    JF - PLoS One

    SN - 1932-6203

    IS - 12

    M1 - e3860

    ER -