TNoM = Threshold Number of Misclassifications

DNA arrays provide the expression level of every tested gene in different tissues; together these measurements form the data matrix. From the matrix we take one vector, x[g], holding the expression level of a specific gene g in all the samples. A second vector, l, holds the labels of the samples (e.g. healthy or malignant), defining which of the two populations each sample belongs to. The label vector does not depend on a specific gene and is common to all the genes.

For a given pair consisting of a classification direction and t (the expression level threshold), we can compute the number of errors they induce:

The TNoM score is defined as:
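Assuming the usual threshold-rule formulation of TNoM, the two quantities can be written as follows. The symbol d (a direction in {+1, -1}), the coding of the labels as +1/-1, and the indexing x_i for the i-th entry of x[g] are assumptions made here for illustration, not notation taken from the text.

```latex
\mathrm{Err}(d, t \mid x[g], l) \;=\; \sum_{i=1}^{n} \mathbf{1}\bigl\{\, l_i \neq \operatorname{sign}\bigl(d \cdot (x_i - t)\bigr) \bigr\}

\mathrm{TNoM}(g) \;=\; \min_{d,\, t} \; \mathrm{Err}(d, t \mid x[g], l)
```

Under this reading, the score is the smallest number of misclassified samples achievable by any single threshold on the expression level of gene g.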

This score represents how "good" a gene is as a separator between the two given populations: the fewer misclassifications, the better the separation.
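The computation described so far can be sketched in a few lines of Python. This is only an illustration under assumptions: labels are coded as +1/-1, the rule predicts d for samples above the threshold and -d otherwise, and the names err, candidate_thresholds and tnom are hypothetical. Only thresholds that fall between distinct expression values (plus one below and one above all of them) need to be tried, because the error count cannot change anywhere else.

```python
def err(d, t, x, labels):
    """Number of samples whose label disagrees with the threshold rule.

    Assumed rule: predict d when x_i > t, and -d otherwise.
    """
    count = 0
    for xi, li in zip(x, labels):
        predicted = d if xi > t else -d
        if predicted != li:
            count += 1
    return count


def candidate_thresholds(x):
    """One threshold below all values, one between each pair of distinct
    neighbouring values, and one above all values (x must be non-empty)."""
    xs = sorted(set(x))
    mids = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return [xs[0] - 1.0] + mids + [xs[-1] + 1.0]


def tnom(x, labels):
    """Minimum error over both directions and all candidate thresholds."""
    return min(
        err(d, t, x, labels)
        for d in (+1, -1)
        for t in candidate_thresholds(x)
    )


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(tnom(x, [-1, -1, +1, +1]))  # perfectly separable: 0
    print(tnom(x, [-1, +1, -1, +1]))  # interleaved labels: 1
```

The brute-force search costs O(n^2) per gene, which keeps the sketch short; a sorted sweep would reduce this to O(n log n).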

In addition to the data referred to so far, every entry in the matrix also carries information about its reliability. The goal of our project is to use this information (the weights) when computing the TNoM score, and thereby make the score more reliable (less sensitive to noise). After developing an algorithm for computing the weighted TNoM score, we shall show how to calculate the corresponding p-value.

We now present the adjustment of the TNoM algorithm to the weighted version:

Given 3 vectors x[g], w[g] and l (all of size n) such that:

x[g] is a vector of the expression levels of the gene,

w[g] is a vector of the weights corresponding to x[g],

and l is a vector of the labels of the samples (e.g. healthy or malignant).

We assume that the weights are discrete and between 0 and 1. A weight of 0 represents total unreliability while a weight of 1 represents total reliability.

We will simply adjust the calculation of the number of errors to be:
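A weighted error count consistent with the intuition given in this section (a weight-0 sample is ignored, a weight-1 sample counts fully, a weight-0.5 sample counts as half an error) can be written as follows. The direction d in {+1, -1}, the +1/-1 label coding, and the indices x_i, w_i for the i-th entries of x[g] and w[g] are assumptions made for illustration.

```latex
\mathrm{Err}_w(d, t \mid x[g], w[g], l) \;=\; \sum_{i=1}^{n} w_i \cdot \mathbf{1}\bigl\{\, l_i \neq \operatorname{sign}\bigl(d \cdot (x_i - t)\bigr) \bigr\}
```

Setting every w_i to 1 recovers the plain count of misclassified samples.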

The calculation of the TNoM score remains unchanged.

This change can be explained intuitively in the following way:

If wi=0, i.e. the sample is totally unreliable, it should not and will not be taken into account in the Err calculation.

If wi=1, i.e. the sample is totally reliable, it will be fully taken into account.

If wi=0.5, i.e. we are not sure whether the sample is reliable or not, a misclassification of it contributes only "half an error" to the Err calculation.
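The three cases above can be checked with a small Python sketch of the weighted score. It is one possible implementation, not necessarily the one used in the project: labels are assumed to be coded as +1/-1, the rule is assumed to predict d above the threshold and -d otherwise, and the names weighted_err and weighted_tnom are hypothetical. The score is computed with a single sweep over the samples sorted by expression level, updating the weighted error as the threshold passes each sample.

```python
def weighted_err(d, t, x, w, labels):
    """Sum of the weights of the samples misclassified by the rule
    (assumed rule: predict d when x_i > t, and -d otherwise)."""
    return sum(
        wi
        for xi, wi, li in zip(x, w, labels)
        if (d if xi > t else -d) != li
    )


def weighted_tnom(x, w, labels):
    """Minimum weighted error over both directions and all thresholds."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    total = sum(w)
    # Threshold below every sample with d = +1: everything is predicted +1,
    # so exactly the samples labelled -1 are errors.
    err_pos = sum(wi for wi, li in zip(w, labels) if li == -1)
    best = min(err_pos, total - err_pos)
    for k, i in enumerate(order):
        # The threshold moves past sample i, which is now predicted -1.
        err_pos += w[i] if labels[i] == +1 else -w[i]
        # Only evaluate between distinct expression values (or after the last).
        at_boundary = k + 1 == len(order) or x[order[k + 1]] != x[i]
        if at_boundary:
            # The error for d = -1 is the complement of the error for d = +1.
            best = min(best, err_pos, total - err_pos)
    return best


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    labels = [-1, +1, -1, +1]
    print(weighted_tnom(x, [1, 1, 1, 1], labels))    # all reliable: 1
    print(weighted_tnom(x, [1, 0.5, 1, 1], labels))  # "half error": 0.5
    print(weighted_tnom(x, [1, 0, 1, 1], labels))    # sample ignored: 0
```

With all weights equal to 1 the result coincides with the unweighted number of misclassifications, which is why the definition of the score itself does not need to change.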