A Bayesian framework for knowledge driven regression model in micro-array data analysis

Rong Jin, Luo Si, Christina Chan

Research output: Contribution to journal › Article

  • 4 Citations

Abstract

This paper addresses the sparse data problem in linear regression, in which the number of variables is significantly larger than the number of data points. We assume that, in addition to the measured data points, prior knowledge about the input variables may be provided in the form of pairwise similarities. We present a full Bayesian framework that effectively exploits this similarity information for linear regression. Empirical studies with gene expression data show that regression errors can be reduced significantly by incorporating similarity information derived from the Gene Ontology.
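To make the idea concrete, here is a minimal sketch in Python. It is not the authors' full Bayesian procedure, which the abstract does not spell out. It assumes a Gaussian prior on the weights whose precision is built from the graph Laplacian of the pairwise similarity matrix (suggested only by the "Graph Laplacian" keyword), unit noise variance, and it computes a point (MAP) estimate rather than a full posterior. The names `graph_laplacian`, `solve`, `map_weights`, `lam` and `eps` are hypothetical.

```python
def graph_laplacian(S):
    """Return L = D - S for a symmetric similarity matrix S (list of lists)."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def map_weights(X, y, S, lam=1.0, eps=1e-2):
    """MAP weights under the assumed prior w ~ N(0, (lam*L + eps*I)^-1).

    Solves (X^T X + lam*L + eps*I) w = X^T y. The eps term keeps the
    system invertible even when there are more variables than data points.
    """
    n, p = len(X), len(X[0])
    L = graph_laplacian(S)
    A = [[sum(X[k][i] * X[k][j] for k in range(n))
          + lam * L[i][j] + (eps if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    return solve(A, b)


# Toy data: 2 samples, 3 variables; variable 1 is never observed (all zeros)
# but is declared similar to variable 0.
X = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
y = [2.0, -1.0]
S = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(map_weights(X, y, S, lam=10.0))  # similarity prior active
print(map_weights(X, y, S, lam=0.0))   # plain ridge, similarity ignored
```

In the toy data the unobserved variable borrows its weight from its similar neighbour when `lam` is large, and falls back to zero when the similarity term is switched off, which is the intuition behind using prior similarity when variables outnumber data points.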

Language: English (US)
Pages: 250-267
Number of pages: 18
Journal: International Journal of Data Mining and Bioinformatics
Volume: 2
Issue number: 3
DOI: 10.1504/IJDMB.2008.020525
State: Published - 2008

Profile

Linear regression
Linear Models
data analysis
regression
Gene expression
Ontology
Gene Ontology
Genes
knowledge

Keywords

  • Bayesian analysis
  • Bioinformatics
  • Data mining
  • Data regression
  • Gene expression analysis
  • Graph Laplacian
  • Knowledge driven data regression

ASJC Scopus subject areas

  • Library and Information Sciences
  • Information Systems
  • Biochemistry, Genetics and Molecular Biology(all)

Cite this

A Bayesian framework for knowledge driven regression model in micro-array data analysis. / Jin, Rong; Si, Luo; Chan, Christina.

In: International Journal of Data Mining and Bioinformatics, Vol. 2, No. 3, 2008, p. 250-267.


@article{d786703448424113a00cfffd9fb2f3fb,
title = "A {B}ayesian framework for knowledge driven regression model in micro-array data analysis",
abstract = "This paper addresses the sparse data problem in linear regression, in which the number of variables is significantly larger than the number of data points. We assume that, in addition to the measured data points, prior knowledge about the input variables may be provided in the form of pairwise similarities. We present a full Bayesian framework that effectively exploits this similarity information for linear regression. Empirical studies with gene expression data show that regression errors can be reduced significantly by incorporating similarity information derived from the Gene Ontology.",
keywords = "Bayesian analysis, Bioinformatics, Data mining, Data regression, Gene expression analysis, Graph Laplacian, Knowledge driven data regression",
author = "Rong Jin and Luo Si and Christina Chan",
year = "2008",
doi = "10.1504/IJDMB.2008.020525",
language = "English (US)",
volume = "2",
pages = "250--267",
journal = "International Journal of Data Mining and Bioinformatics",
issn = "1748-5673",
publisher = "Inderscience Enterprises Ltd",
number = "3",

}

TY  - JOUR
T1  - A Bayesian framework for knowledge driven regression model in micro-array data analysis
AU  - Jin, Rong
AU  - Si, Luo
AU  - Chan, Christina
PY  - 2008
Y1  - 2008
N2  - This paper addresses the sparse data problem in linear regression, in which the number of variables is significantly larger than the number of data points. We assume that, in addition to the measured data points, prior knowledge about the input variables may be provided in the form of pairwise similarities. We present a full Bayesian framework that effectively exploits this similarity information for linear regression. Empirical studies with gene expression data show that regression errors can be reduced significantly by incorporating similarity information derived from the Gene Ontology.
AB  - This paper addresses the sparse data problem in linear regression, in which the number of variables is significantly larger than the number of data points. We assume that, in addition to the measured data points, prior knowledge about the input variables may be provided in the form of pairwise similarities. We present a full Bayesian framework that effectively exploits this similarity information for linear regression. Empirical studies with gene expression data show that regression errors can be reduced significantly by incorporating similarity information derived from the Gene Ontology.
KW  - Bayesian analysis
KW  - Bioinformatics
KW  - Data mining
KW  - Data regression
KW  - Gene expression analysis
KW  - Graph Laplacian
KW  - Knowledge driven data regression
UR  - http://www.scopus.com/inward/record.url?scp=53349127892&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=53349127892&partnerID=8YFLogxK
U2  - 10.1504/IJDMB.2008.020525
DO  - 10.1504/IJDMB.2008.020525
M3  - Article
VL  - 2
SP  - 250
EP  - 267
JO  - International Journal of Data Mining and Bioinformatics
T2  - International Journal of Data Mining and Bioinformatics
JF  - International Journal of Data Mining and Bioinformatics
SN  - 1748-5673
IS  - 3
ER  -