Doron Lipson1,4, Amir Ben-Dor2, Elinor Dehan3 and Zohar Yakhini1,2
Proceedings of Algorithms in Bioinformatics: 4th
International Workshop, WABI 2004, Bergen, Norway, September 17-21, 2004
Lecture Notes in Computer Science (LNCS), Vol. 3240/2004, p.135, Springer 2004
1CS Dept, Technion, Israel, 2Agilent Labs, 3NYU, 4corresponding author
Genomic instabilities, amplifications, deletions and translocations are often observed in tumor cells. In the process of cancer pathogenesis, cells acquire multiple genomic alterations, some of which drive the process by triggering overexpression of oncogenes and by silencing tumor suppressors and DNA repair genes. We present data analysis methods designed to study the overall transcriptional effects of DNA copy number alterations. Alterations can be measured using several techniques, including microarray-based hybridization assays. The data have unique properties due to the strong dependence between measurement values in close genomic loci. To account for this dependence in studying the correlation of DNA copy number to expression levels, we develop versions of standard correlation methods that apply to genomic regions, and methods for assessing the statistical significance of the observed results. In joint DNA copy number and expression data we define significantly altered submatrices as submatrices where a statistically significant correlation of DNA copy number to expression is observed. We develop heuristic approaches to identify these structures in data matrices. We apply all methods to several datasets, highlighting results that cannot be obtained by direct approaches or without using the regional view.
In the paper [1] we define two scores for identifying significant correlations in joint DNA copy number and gene expression data: the Regional Correlation score and significantly altered Genomic Continuous Submatrices (GCSMs). Here we present updated forms of these scores.

Regional Correlation

Given a gene
Let
A high regional correlation score may arise
due to significant correlation between the expression vector
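
As a rough illustration of the regional idea only, the sketch below scans genomic intervals around a gene and keeps the interval whose copy number profile correlates best with the gene's expression vector. It is not the score defined in [1]: plain Pearson correlation against the window-mean copy number is an assumed stand-in, and the names `pearson`, `best_window_correlation` and `max_half_width` are hypothetical. The thresholds quoted for R(i,*) elsewhere on this page exceed 1, so the actual score is evidently on a different scale.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_window_correlation(
    expr_i: Sequence[float],
    copy_rows: List[Sequence[float]],
    i: int,
    max_half_width: int,
) -> Tuple[float, Tuple[int, int]]:
    """Scan intervals [lo, hi] that contain gene i and return the best one.

    expr_i    : expression of gene i across samples
    copy_rows : copy number vectors of all genes, in genomic order
    Assumption: an interval is summarized by its per-sample mean copy number.
    """
    n_genes = len(copy_rows)
    best_r, best_interval = -2.0, (i, i)
    for lo in range(max(0, i - max_half_width), i + 1):
        for hi in range(i, min(n_genes - 1, i + max_half_width) + 1):
            window = copy_rows[lo:hi + 1]
            mean_copy = [sum(col) / len(window) for col in zip(*window)]
            r = pearson(expr_i, mean_copy)
            if r > best_r:
                best_r, best_interval = r, (lo, hi)
    return best_r, best_interval
```

Taking the maximum over many intervals inflates the score by construction, which is why the significance of such a maximum has to be assessed separately rather than read off a standard correlation table.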
Genomic Continuous Submatrices (GCSMs)

A GCSM is defined by a continuous genomic segment G
and a subset of the samples S. These determine a submatrix of the DNA
copy number measurement matrix C, and a submatrix of the gene
expression matrix E. We denote these matrices
We would like to score the degree to which the DNA copy
numbers and expression levels of the genes
Given k – the
number of positive entries in
Similarly, we would like
to score the overabundance of measurements in E that suggest that M
is indeed aberrant. Unlike the DNA copy number values, we do not expect the
expression measurements of all genes
A total score for an amplification in M is then defined as:
Although
Algorithmic methods for locating high-scoring GCSMs are described in [1].
We demonstrate the use of Regional Correlation and significantly altered GCSMs on the breast cancer data of Pollack et al. [3]. The dataset contains parallel DNA copy number and gene expression measurements of 6,095 genes on 41 breast tumor and cell-line samples. The following figure depicts the genomic locations of significant aberrations located in the breast cancer dataset. Pink marks depict significantly altered GCSMs (S(M;C,E) > 40), where the embedded yellow marks denote the positions of resident genes that are significantly differentially expressed (TNoM < 7) with relation to the respective partition. Genes with a significant regional correlation score (R(i,*) > 1.3, pval < 10^-3) are depicted above by yellow marks, where the surrounding light blue marks depict the genomic intervals against which the maximum regional correlation was attained.

A table that summarizes the information for significantly altered GCSMs
A table that summarizes the information for genes with significant Regional Correlation
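
The TNoM (threshold number of misclassifications) value used above to flag differentially expressed genes can be sketched as follows, assuming its usual definition: the smallest number of samples misclassified by any single-threshold rule on a gene's expression values, relative to a two-way partition of the samples. The function name `tnom` and its argument layout are hypothetical, and the page does not state how ties or missing values were handled in the actual analysis.

```python
from typing import Sequence


def tnom(values: Sequence[float], labels: Sequence[bool]) -> int:
    """Minimum number of misclassified samples over all single-threshold rules.

    values : expression values of one gene, one per sample
    labels : True for samples inside the partition of interest, False otherwise
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(1 for _, lab in pairs if lab)
    total_neg = n - total_pos
    # Threshold outside the data range: every sample gets the same label.
    best = min(total_pos, total_neg)
    pos_low = neg_low = 0
    for idx, (value, lab) in enumerate(pairs):
        if lab:
            pos_low += 1
        else:
            neg_low += 1
        # A threshold can only be placed between two distinct values.
        if idx + 1 < n and pairs[idx + 1][0] == value:
            continue
        # Rule "high side is positive": errors are positives below the cut
        # plus negatives above it. The opposite rule swaps the roles.
        errors_high_pos = pos_low + (total_neg - neg_low)
        errors_high_neg = neg_low + (total_pos - pos_low)
        best = min(best, errors_high_pos, errors_high_neg)
    return best
```

A value of 0 means the gene's expression separates the two sample groups perfectly, so a cutoff such as TNoM < 7 on 41 samples keeps genes whose best threshold misclassifies at most six samples.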
Page created by Doron Lipson