Quality Analysis
To assess the quality of our software, we ran several tests on it.
We ran 3 types of tests:
We compared the tests from the two latter types to the tests from the first type, which served as a baseline.
We wanted to measure the percentage of genes that remain good classifiers after noise is added, i.e. genes that are not damaged by it.
How did we evaluate the quality of the results?
We wanted to examine the best X scoring genes.
We defined a quality measure M to be the sum of positions of the first X genes.
In test type 1 we obtained a basic ordering of the best scoring genes (those with a low TNoM score and p-value). This is our baseline order.
M_{b} (the baseline M, equal to M_{1}) is the sum of all integers from 1 to X, i.e. X(X+1)/2.
In test types 2 and 3 we set M_{i} (i = 2, 3) to be the sum of the new positions that the genes received in the results, taken over the first X genes in the original order determined in test type 1.
For example, let X = 2. Suppose that in test type 1 (the non-weighted, non-noised version) the best scoring gene was gene number 49 (original position 1) and the second best scoring gene was gene number 7 (original position 2), and that in test type 2 gene number 49 appeared only as the third best scoring gene and gene number 7 as the 10th best scoring gene. Then M_{1} = 1 + 2 = 3 and M_{2} = 3 + 10 = 13.
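The computation of M can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the tests: the function name quality_measure and the gene IDs other than 49 and 7 are made up, and rankings are assumed to be lists of gene IDs ordered from best to worst.

```python
def quality_measure(baseline_order, new_order, x):
    """Sum the 1-based positions in new_order of the top-x baseline genes."""
    # Map each gene to its 1-based position in the new ranking.
    new_position = {gene: pos for pos, gene in enumerate(new_order, start=1)}
    return sum(new_position[gene] for gene in baseline_order[:x])


# Hypothetical rankings reproducing the X = 2 example: gene 49 drops to
# position 3 and gene 7 drops to position 10. All other IDs are invented.
baseline = [49, 7, 12, 5, 31, 18, 2, 40, 23, 11]
noised = [12, 5, 49, 31, 18, 2, 40, 23, 11, 7]

m1 = quality_measure(baseline, baseline, 2)  # 1 + 2 = 3
m2 = quality_measure(baseline, noised, 2)    # 3 + 10 = 13
print(m1, m2)
```

Comparing a ranking with itself always yields the baseline value X(X+1)/2, so a larger M means the top X baseline genes were pushed further down the list.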
After sorting the data we used in our testing by TNoM score, we set X to 19, the number of genes whose TNoM score is lower than or equal to 5.
We expected the weighted version to be more stable to noise, and the percentage of genes not damaged as good classifiers to be high in that version's results; i.e. we expected M_{3} to be closer to M_{1} than M_{2} is.
How did we produce "noised" data?
Noise can originate from several sources:
We assumed that there is an inverse correlation between the noise and the weights, i.e. the higher the noise, the lower the weight, and vice versa.
We assumed that most errors are of type 1: probe quality.
Producing Weights:
We set a parameter Q that indicates how "good" the genes are (good = not noised).
0 ≤ Q ≤ 10. Q = 0 means that all genes are very noisy; Q = 10 means that the noise in all genes is 0.
After determining Q, we chose the weights at random. The probability of choosing a given weight depends on Q: the higher Q is, the lower the probability of choosing low weights. The exact probability table is:

Q     P(0)    P(0.25)   P(0.5)   P(0.75)   P(1)
0     80%     10%       10%      0%        0%
1     35%     30%       30%      5%        0%
2     15%     25%       35%      20%       5%
3     15%     20%       35%      20%       10%
4     15%     20%       30%      20%       15%
5     10%     15%       25%      25%       25%
6     10%     10%       20%      20%       40%
7     5%      10%       10%      20%       55%
8     0%      5%        10%      15%       70%
9     0%      0%        5%       10%       85%
10    0%      0%        0%       0%        100%
Table 1: Weight probabilities according to gene quality (Q).
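As an illustration of how weights could be drawn from Table 1, here is a minimal Python sketch. The names WEIGHT_PROBS and sample_weights are hypothetical, and the shape of the weight matrix (one row per gene, one column per sample) is an assumption; the text above only states that the weights are chosen at random with these probabilities.

```python
import random

WEIGHTS = [0.0, 0.25, 0.5, 0.75, 1.0]

# Rows of Table 1: for each Q, the percentages
# P(0), P(0.25), P(0.5), P(0.75), P(1).
WEIGHT_PROBS = {
    0: [80, 10, 10, 0, 0],
    1: [35, 30, 30, 5, 0],
    2: [15, 25, 35, 20, 5],
    3: [15, 20, 35, 20, 10],
    4: [15, 20, 30, 20, 15],
    5: [10, 15, 25, 25, 25],
    6: [10, 10, 20, 20, 40],
    7: [5, 10, 10, 20, 55],
    8: [0, 5, 10, 15, 70],
    9: [0, 0, 5, 10, 85],
    10: [0, 0, 0, 0, 100],
}


def sample_weights(q, n_genes, n_samples, rng=None):
    """Draw an n_genes x n_samples weight matrix for quality level q."""
    if rng is None:
        rng = random.Random()
    probs = WEIGHT_PROBS[q]
    # random.choices draws with replacement using the relative weights;
    # entries with probability 0 are never selected.
    return [rng.choices(WEIGHTS, weights=probs, k=n_samples)
            for _ in range(n_genes)]


weights_matrix = sample_weights(q=7, n_genes=4, n_samples=6)
```

Passing a seeded random.Random instance as rng makes a run reproducible.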
After producing the weight matrix, we produced the noise matrix. We used two methods of producing noise.
A naïve method of creating noise is to define, for each discrete weight, its corresponding noise. After looking at the gene expression data, we noticed that the majority of gene expression levels are in the range (–0.3, 0.3). For that specific range, we produced the following table, assigning to each weight a noise value with an SNR (signal-to-noise ratio) lower than 1.
Weight   Noise
1        0
0.75     0.1
0.5      0.2
0.25     0.3
0        1
Table 2: Discrete noise as a function of discrete weights.
After creating the noise matrix, we produced the "noised data" as follows: we randomly chose between adding the noise to the raw data and subtracting it from the raw data. This way, the noise matrix cannot be recovered from the noised data.
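The discrete method can be sketched as follows. This is an illustrative Python version, not the original implementation: the names NOISE_FOR_WEIGHT and add_discrete_noise are hypothetical, and the sign is assumed to be chosen independently for every entry, which the description above does not state explicitly.

```python
import random

# Table 2: noise magnitude assigned to each discrete weight.
NOISE_FOR_WEIGHT = {1.0: 0.0, 0.75: 0.1, 0.5: 0.2, 0.25: 0.3, 0.0: 1.0}


def add_discrete_noise(data, weights, rng=None):
    """Return data with the Table 2 noise added or subtracted at random."""
    if rng is None:
        rng = random.Random()
    noised = []
    for data_row, weight_row in zip(data, weights):
        noised_row = []
        for value, weight in zip(data_row, weight_row):
            noise = NOISE_FOR_WEIGHT[weight]
            # Assumption: the sign is drawn separately for each entry.
            sign = rng.choice((-1, 1))
            noised_row.append(value + sign * noise)
        noised.append(noised_row)
    return noised


raw = [[0.1, -0.2, 0.05]]
w = [[1.0, 0.5, 0.0]]
noised = add_discrete_noise(raw, w)
```

An entry with weight 1 is left unchanged, while an entry with weight 0 is shifted by 1 in either direction, which is large compared with the typical (–0.3, 0.3) range of the data.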
The second method is statistical. We assumed that noise is normally distributed around 0, with a different standard deviation for each weight.
For each weight, the noise is distributed ~N(0, (1 − weight)), where (1 − weight) is the standard deviation. The rationale for choosing this standard deviation is to allow greater randomization in the noise (a larger standard deviation) when the weight is low and, conversely, to keep the noise closer to 0 (less randomization) when the weight is high. In this method, in contrast to the previous one, the noise can still be 0 even for weight 0. For this reason, this method is closer to reality: in reality, a weight of 0 implies that we have no confidence in the sample and therefore have no information about the amount of noise.
In addition, this method is statistical and randomized, so even for two identical weights the corresponding noise can be different.
After creating the noise matrix, the noised data is simply created by adding the noise to the original data. Since noise is normally distributed around 0, it has an equal probability of being positive or negative and therefore there is no need to further choose the sign of the noise.
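A minimal Python sketch of the statistical method is shown below. The function name add_gaussian_noise is hypothetical, and the noise is assumed to be drawn independently for each entry with standard deviation (1 − weight), as described above.

```python
import random


def add_gaussian_noise(data, weights, rng=None):
    """Add N(0, 1 - weight) noise to every entry of data."""
    if rng is None:
        rng = random.Random()
    return [
        # The second argument of gauss is the standard deviation, so a
        # weight of 1 gives a deviation of 0 and leaves the value unchanged.
        [value + rng.gauss(0.0, 1.0 - weight)
         for value, weight in zip(data_row, weight_row)]
        for data_row, weight_row in zip(data, weights)
    ]


raw = [[0.1, -0.2], [0.05, 0.3]]
w = [[1.0, 1.0], [0.5, 0.0]]
noised = add_gaussian_noise(raw, w, random.Random(42))
```

No explicit sign choice is needed here, because the draws are already symmetric around 0.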
To see the correlation between weights and noise in this method, click here.