__TNoM = Threshold Number of Misclassifications__

DNA arrays provide the expression level of every tested gene in each tissue
sample; together these values form the data matrix. From the matrix we take one
vector, x[g], holding the expression level of a specific gene g in all n
samples. A second vector, l, holds the labels of the samples,
$l \in \{-1,+1\}^n$ (e.g. healthy or malignant), and indicates which of the two
populations each sample belongs to. This vector does not depend on a specific
gene and is common to all the genes.

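For a given pair $d \in \{-1,+1\}$ (the direction of the rule) and t (the
expression level threshold), we can compute the number of errors they induce:

$$\mathrm{Err}(d, t \mid g, l) = \sum_{i=1}^{n} \mathbf{1}\{\, l_i \neq \mathrm{sign}(d \cdot (x_i - t)) \,\}$$

where $x_i$ and $l_i$ are the i-th entries of x[g] and l.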

The TNoM score is defined as:

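$$\mathrm{TNoM}(g, l) = \min_{d,\,t} \mathrm{Err}(d, t \mid g, l)$$

where the minimum is taken over both directions d and all thresholds t.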

This score measures how well a gene separates the two given populations: the fewer misclassifications its best threshold makes, the better the gene is as a separator.
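
As a rough illustration of the definition above, the following Python sketch computes the score by brute force. It rests on several assumptions that are not stated in the text: labels are coded as -1/+1, the rule predicts d for samples strictly above the threshold and -d otherwise, and it is enough to try thresholds at midpoints between distinct expression values. The function names are illustrative only.

```python
from typing import List, Sequence


def err(d: int, t: float, x: Sequence[float], labels: Sequence[int]) -> int:
    """Count samples whose label disagrees with the threshold rule (d, t)."""
    errors = 0
    for xi, li in zip(x, labels):
        # Assumption: a sample exactly at the threshold falls on the -d side.
        predicted = d if xi > t else -d
        if predicted != li:
            errors += 1
    return errors


def candidate_thresholds(x: Sequence[float]) -> List[float]:
    """One threshold below the minimum plus midpoints between distinct values.

    Assumes x is non-empty. A threshold above the maximum is not needed:
    it gives the same predictions as the below-minimum one with d flipped.
    """
    values = sorted(set(x))
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return [values[0] - 1.0] + midpoints


def tnom(x: Sequence[float], labels: Sequence[int]) -> int:
    """Minimum number of errors over both directions and all thresholds."""
    return min(
        err(d, t, x, labels)
        for d in (+1, -1)
        for t in candidate_thresholds(x)
    )


# Hypothetical expression levels for one gene in six samples.
x_g = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
l = [-1, -1, +1, -1, +1, +1]
score = tnom(x_g, l)  # no single threshold separates these labels perfectly
```

The brute-force search is quadratic in the number of samples; sorting the samples once and sweeping the threshold would be faster, but the simple form stays closer to the definition.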

In addition to the data referred to so far, every entry in the matrix also carries information about its reliability. The goal of our project is to use this data (the weights) when computing the TNoM score, and in this way to make the score more reliable (less sensitive to noise). After developing an algorithm for computing the weighted TNoM score, we shall show how to calculate the corresponding p-value.


Now, we will present the adjustment of the TNoM algorithm to the weighted version:

Given 3 vectors x[g], w[g] and l (all of size n) so that:

x[g] is a vector of the expression levels of the gene

w[g] is a vector of the weights corresponding to x[g]

and l is a vector of the labels of the samples, $l \in \{-1,+1\}^n$ (e.g.
healthy or malignant).

We assume that the weights are discrete and between 0 and 1. A weight of 0 represents total unreliability while a weight of 1 represents total reliability.

We will simply adjust the calculation of the number of errors to be:

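$$\mathrm{Err}(d, t \mid g, l) = \sum_{i=1}^{n} w_i \cdot \mathbf{1}\{\, l_i \neq \mathrm{sign}(d \cdot (x_i - t)) \,\}$$

where $x_i$, $w_i$ and $l_i$ are the i-th entries of x[g], w[g] and l, d is the direction of the rule and t is the threshold.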

The calculation of the TNoM score remains unchanged.
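
A minimal sketch of the weighted computation in Python is given below. It makes the same assumptions as any simple threshold search and they are not taken from the text: labels coded as -1/+1, a rule that predicts d strictly above the threshold and -d otherwise, and candidate thresholds at midpoints between distinct expression values. The names `weighted_err` and `weighted_tnom` are hypothetical.

```python
from typing import Sequence


def weighted_err(d: int, t: float, x: Sequence[float],
                 w: Sequence[float], labels: Sequence[int]) -> float:
    """Sum the weights of the samples misclassified by the rule (d, t)."""
    total = 0.0
    for xi, wi, li in zip(x, w, labels):
        # Assumption: a sample exactly at the threshold falls on the -d side.
        predicted = d if xi > t else -d
        if predicted != li:
            total += wi
    return total


def weighted_tnom(x: Sequence[float], w: Sequence[float],
                  labels: Sequence[int]) -> float:
    """Minimise the weighted error over both directions and all thresholds."""
    values = sorted(set(x))  # assumes x is non-empty
    thresholds = [values[0] - 1.0] + [
        (a + b) / 2 for a, b in zip(values, values[1:])
    ]
    return min(
        weighted_err(d, t, x, w, labels)
        for d in (+1, -1)
        for t in thresholds
    )


# Hypothetical data: six samples, the fourth one breaks a perfect separation.
x_g = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
l = [-1, -1, +1, -1, +1, +1]

all_reliable = weighted_tnom(x_g, [1.0] * 6, l)
outlier_ignored = weighted_tnom(x_g, [1.0, 1.0, 1.0, 0.0, 1.0, 1.0], l)
third_doubtful = weighted_tnom(x_g, [1.0, 1.0, 0.5, 1.0, 1.0, 1.0], l)
```

With all weights equal to 1 the result coincides with the unweighted score, which is a quick sanity check that only the error count changed and not the minimisation.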

This change could be explained intuitively in the following way:

If $w_i = 0$, i.e. the sample is totally unreliable, it should not and will
not be taken into account in the Err calculation.

If $w_i = 1$, i.e. the sample is totally reliable, it will be fully taken
into account.

If $w_i = 0.5$, i.e. we are not sure whether the sample is reliable or not,
a misclassification of it contributes only "half an error" to the Err
calculation.
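
The three cases can be checked numerically for a single fixed rule. The sketch below uses made-up data in which exactly one sample is misclassified, and varies only that sample's weight. The -1/+1 label coding, the tie handling and the function name are assumptions made for illustration.

```python
def weighted_err(d, t, x, w, labels):
    """Sum the weights of the samples misclassified by the rule (d, t)."""
    total = 0.0
    for xi, wi, li in zip(x, w, labels):
        predicted = d if xi > t else -d  # assumption: ties go to the -d side
        if predicted != li:
            total += wi
    return total


# Hypothetical data: with d = +1 and t = 2.5 only the second sample is wrong.
x_g = [1.0, 2.0, 3.0]
l = [-1, +1, +1]
d, t = +1, 2.5

# Map the weight of the misclassified sample to the resulting Err value.
contributions = {
    w2: weighted_err(d, t, x_g, [1.0, w2, 1.0], l)
    for w2 in (0.0, 0.5, 1.0)
}
```

Here the Err value equals the weight of the one misclassified sample, so it moves from no error, through half an error, to a full error as that weight goes from 0 to 1.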