A knowledge driven regression model for gene expression and microarray analysis.

Rong Jin, Luo Si, Shireesh Srivastava, Zheng Li, Christina Chan

Research output: Contribution to journal, Article

Abstract

Linear regression is widely used in the analysis of gene expression and microarray data to identify a subset of genes that are important to a given metabolic function. A key challenge in applying it to gene expression data is the sparse data problem, in which the number of genes is much larger than the number of conditions. To address this problem, we present a knowledge-driven regression model that incorporates knowledge about genes from the Gene Ontology (GO) database into the linear regression model. The model assumes that two genes are likely to be assigned similar weights when they share similar sets of GO codes. Empirical studies show that the proposed model reduces regression errors and helps identify genes that are relevant to a given metabolite.
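The abstract does not give the model's exact objective, so the following is only a minimal sketch of one way the stated assumption could be realized, not the authors' formulation. It assumes a Jaccard similarity between GO-code sets and a graph-Laplacian penalty that pulls the weights of similar genes together; the function names, the `lam` and `ridge` parameters, and the toy GO codes are all hypothetical.

```python
import numpy as np

def go_similarity(go_sets):
    """Jaccard similarity between the GO-code sets of every pair of genes.

    Assumption: Jaccard overlap stands in for "similar sets of GO codes";
    the abstract does not say which similarity measure is used.
    """
    n = len(go_sets)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            union = go_sets[i] | go_sets[j]
            if union:
                S[i, j] = S[j, i] = len(go_sets[i] & go_sets[j]) / len(union)
    return S

def fit_knowledge_driven(X, y, S, lam=1.0, ridge=1e-3):
    """Minimise ||y - Xw||^2 + lam * sum_{i<j} S_ij (w_i - w_j)^2 + ridge * ||w||^2.

    X is (conditions x genes), y is the measured metabolite per condition.
    The small ridge term keeps the system solvable when genes outnumber
    conditions and some genes have no GO neighbours.
    """
    # Graph Laplacian of S: w^T L w equals sum_{i<j} S_ij (w_i - w_j)^2.
    L = np.diag(S.sum(axis=1)) - S
    A = X.T @ X + lam * L + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

if __name__ == "__main__":
    # Toy data: 4 conditions, 6 genes (more genes than conditions).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 6))
    y = rng.normal(size=4)
    # Hypothetical GO annotations; genes 0 and 1 share the same codes.
    go_sets = [{"GO:A", "GO:B"}, {"GO:A", "GO:B"}, {"GO:C"},
               {"GO:C", "GO:D"}, {"GO:E"}, set()]
    w = fit_knowledge_driven(X, y, go_similarity(go_sets), lam=10.0)
    print(w)
```

With `lam = 0` this reduces to ordinary ridge regression; increasing `lam` forces genes with overlapping GO annotations toward shared weights, which is how prior knowledge can compensate for having few conditions.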

Original language: English (US)
Pages (from-to): 5326-5329
Number of pages: 4
Journal: Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference
State: Published - 2006
Externally published: Yes

Profile

Microarray Analysis
Linear Models
Gene Expression
Genes
Linear regression
Gene expression
Gene Ontology
Microarrays
Ontology
Metabolites
Data reduction

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Biomedical Engineering
  • Health Informatics

Cite this

@article{09d2bdf419654b759ca2138c7750c507,
title = "A knowledge driven regression model for gene expression and microarray analysis.",
author = "Rong Jin and Luo Si and Shireesh Srivastava and Zheng Li and Christina Chan",
year = "2006",
pages = "5326--5329",
journal = "Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference",
issn = "1557-170X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR

T1 - A knowledge driven regression model for gene expression and microarray analysis.

AU - Jin,Rong

AU - Si,Luo

AU - Srivastava,Shireesh

AU - Li,Zheng

AU - Chan,Christina

PY - 2006

Y1 - 2006

UR - http://www.scopus.com/inward/record.url?scp=84903876739&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84903876739&partnerID=8YFLogxK

M3 - Article

SP - 5326

EP - 5329

JO - Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference

T2 - Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference

JF - Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference

SN - 1557-170X

ER -