A novel method incorporating gene ontology information for unsupervised clustering and feature selection

Shireesh Srivastava, Linxia Zhang, Rong Jin, Christina Chan

Research output: Contribution to journal › Article

  • 10 Citations

Abstract

Background: Among the primary goals of microarray analysis is the identification of genes that could distinguish between different phenotypes (feature selection). Previous studies indicate that incorporating prior information of the genes' function could help identify physiologically relevant features. However, current methods that incorporate prior functional information do not provide a relative estimate of the effect of different genes on the biological processes of interest.

Results: Here, we present a method that integrates gene ontology (GO) information and expression data using Bayesian regression mixture models to perform unsupervised clustering of the samples and identify physiologically relevant discriminating features. As a model application, the method was applied to identify the genes that play a role in the cytotoxic responses of human hepatoblastoma cell line (HepG2) to saturated fatty acid (SFA) and tumor necrosis factor (TNF)-α, as compared to the non-toxic response to the unsaturated FFAs (UFA) and TNF-α. Incorporation of prior knowledge led to a better discrimination of the toxic phenotypes from the others. The model identified roles of lysosomal ATPases and adenylate cyclase (AC9) in the toxicity of palmitate. To validate the role of AC in palmitate-treated cells, we measured the intracellular levels of cyclic AMP (cAMP). The cAMP levels were found to be significantly reduced by palmitate treatment and not by the other FFAs, in accordance with the model selection of AC9.

Conclusions: A framework is presented that incorporates prior ontology information, which helped to (a) perform unsupervised clustering of the phenotypes, and (b) identify the genes relevant to each cluster of phenotypes. We demonstrate the proposed framework by applying it to identify physiologically-relevant feature genes that conferred differential toxicity to saturated vs. unsaturated FFAs. The framework can be applied to other problems to efficiently integrate ontology information and expression data in order to identify feature genes.

Language: English (US)
Article number: e3860
Journal: PLoS One
Volume: 3
Issue number: 12
DOI: 10.1371/journal.pone.0003860
State: Published - Dec 4 2008

Profile

Gene Ontology
Cluster Analysis
Ontology
Feature extraction
Genes
Palmitates
Phenotype
Cyclic AMP
methodology
Tumor Necrosis Factor-alpha
Toxicity
Hepatoblastoma
Biological Phenomena
Cells

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

A novel method incorporating gene ontology information for unsupervised clustering and feature selection. / Srivastava, Shireesh; Zhang, Linxia; Jin, Rong; Chan, Christina.

In: PLoS One, Vol. 3, No. 12, e3860, 04.12.2008.

@article{5cec726a1abe43ef9d4783c703bced9e,
title = "A novel method incorporating gene ontology information for unsupervised clustering and feature selection",
abstract = "Background: Among the primary goals of microarray analysis is the identification of genes that could distinguish between different phenotypes (feature selection). Previous studies indicate that incorporating prior information of the genes' function could help identify physiologically relevant features. However, current methods that incorporate prior functional information do not provide a relative estimate of the effect of different genes on the biological processes of interest. Results: Here, we present a method that integrates gene ontology (GO) information and expression data using Bayesian regression mixture models to perform unsupervised clustering of the samples and identify physiologically relevant discriminating features. As a model application, the method was applied to identify the genes that play a role in the cytotoxic responses of human hepatoblastoma cell line (HepG2) to saturated fatty acid (SFA) and tumor necrosis factor (TNF)-α, as compared to the non-toxic response to the unsaturated FFAs (UFA) and TNF-α. Incorporation of prior knowledge led to a better discrimination of the toxic phenotypes from the others. The model identified roles of lysosomal ATPases and adenylate cyclase (AC9) in the toxicity of palmitate. To validate the role of AC in palmitate-treated cells, we measured the intracellular levels of cyclic AMP (cAMP). The cAMP levels were found to be significantly reduced by palmitate treatment and not by the other FFAs, in accordance with the model selection of AC9. Conclusions: A framework is presented that incorporates prior ontology information, which helped to (a) perform unsupervised clustering of the phenotypes, and (b) identify the genes relevant to each cluster of phenotypes. We demonstrate the proposed framework by applying it to identify physiologically-relevant feature genes that conferred differential toxicity to saturated vs. unsaturated FFAs. 
The framework can be applied to other problems to efficiently integrate ontology information and expression data in order to identify feature genes.",
author = "Shireesh Srivastava and Linxia Zhang and Rong Jin and Christina Chan",
year = "2008",
month = "12",
day = "4",
doi = "10.1371/journal.pone.0003860",
language = "English (US)",
volume = "3",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "12",
pages = "e3860",
}

TY - JOUR

T1 - A novel method incorporating gene ontology information for unsupervised clustering and feature selection

AU - Srivastava, Shireesh

AU - Zhang, Linxia

AU - Jin, Rong

AU - Chan, Christina

PY - 2008/12/4

Y1 - 2008/12/4

AB - Background: Among the primary goals of microarray analysis is the identification of genes that could distinguish between different phenotypes (feature selection). Previous studies indicate that incorporating prior information of the genes' function could help identify physiologically relevant features. However, current methods that incorporate prior functional information do not provide a relative estimate of the effect of different genes on the biological processes of interest. Results: Here, we present a method that integrates gene ontology (GO) information and expression data using Bayesian regression mixture models to perform unsupervised clustering of the samples and identify physiologically relevant discriminating features. As a model application, the method was applied to identify the genes that play a role in the cytotoxic responses of human hepatoblastoma cell line (HepG2) to saturated fatty acid (SFA) and tumor necrosis factor (TNF)-α, as compared to the non-toxic response to the unsaturated FFAs (UFA) and TNF-α. Incorporation of prior knowledge led to a better discrimination of the toxic phenotypes from the others. The model identified roles of lysosomal ATPases and adenylate cyclase (AC9) in the toxicity of palmitate. To validate the role of AC in palmitate-treated cells, we measured the intracellular levels of cyclic AMP (cAMP). The cAMP levels were found to be significantly reduced by palmitate treatment and not by the other FFAs, in accordance with the model selection of AC9. Conclusions: A framework is presented that incorporates prior ontology information, which helped to (a) perform unsupervised clustering of the phenotypes, and (b) identify the genes relevant to each cluster of phenotypes. We demonstrate the proposed framework by applying it to identify physiologically-relevant feature genes that conferred differential toxicity to saturated vs. unsaturated FFAs. The framework can be applied to other problems to efficiently integrate ontology information and expression data in order to identify feature genes.

UR - http://www.scopus.com/inward/record.url?scp=57349137697&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=57349137697&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0003860

DO - 10.1371/journal.pone.0003860

M3 - Article

VL - 3

JO - PLoS One

T2 - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 12

M1 - e3860

ER -