Table 1: Experimental Results
Num People | Num Loci | Num Alleles | With Loops | Total Run Time (sec) | Sum Table Sizes | Improvement | Details |
Greedy | Optimized | Greedy | Optimized | In total run time | In sum table sizes |
20 | 75 | 5-10 | yes | 241.579 | 6.079 | 1.0E+9 | 2.8E+6 | 39 | 357 | View Details |
20 | 80 | 4-5 | no | doesn't run | 236,321.203 | 5.06E+13 | 3.50E+11 | N/A | 144 | View Details |
300 | 30 | 4-5 | yes | 32.953 | 32.953 | 1.2E+7 | 1.2E+7 | 1 | 1 | View Details |
40 | 71 | 4-5 | no | doesn't run | doesn't run | 4.80E+46 | 1.13E+40 | N/A | 4.2E+6 | View Details |
500 | 30 | 4-5 | yes | doesn't run | doesn't run | 3.87E+31 | 5.87E+22 | N/A | 6.5E+8 | View Details |
150 | 30 | 5-10 | no | 202.641 | 258.047 | 8.4E+9 | 8.4E+9 | 1.27
slower | 1 | View Details |
70 | 40 | 4-5 | no | 2,594.031 | 79.547 | 8.93E+11 | 1.7E+8 | 32 | 5252 | View Details |
70 | 50 | 4-5 | no | 10,191.891 | 1,624.562 | 3.02E+12 | 6.9E+9 | 6 | 437 | View Details |
80 | 21 | 5-10 | yes | 551.016 | 175.375 | 4.4E+9 | 4.8E+8 | 3 | 9 | View Details |
80 | 23 | 5-10 | yes | 1,024.047 | 75.078 | 9.5E+9 | 1.1E+8 | 13 | 86 | View Details |
90 | 56 | 4-5 | yes | 5,450.39 | 561.984 | 4.5E+10 | 3.2E+8 | 9 | 140 | View Details |
57 | 37 | 4-5 | no | doesn't run | 1,200.172 | 3.09E+13 | 8.2E+9 | N/A | 3768 | View Details |
100 | 12 | 5-10 | yes | 99.454 | 34.688 | 5.3E+9 | 7.8E+7 | 2 | 67 | View Details |
100 | 14 | 5-10 | yes | 99.5 | 6.703 | 5.3E+9 | 1.7E+7 | 14 | 311 | View Details |
100 | 15 | 5-10 | yes | 1,568.203 | 118.297 | 5.2E+10 | 2.0E+8 | 13 | 260 | View Details |
100 | 16 | 5-10 | yes | 628.25 | 46.094 | 8.4E+9 | 5.4E+7 | 13 | 155 | View Details |
100 | 17 | 5-10 | yes | 117.61 | 27.219 | 2.3E+9 | 4.8E+7 | 4 | 47 | View
Details |
Clarifications
to the Experiment Results and Tables :
- Num People -
the number of
people in the pedigree input file.
- Num Loci -
the number of
loci being analyzed in the test data files.
- Num Alleles - the number of
alleles in each of the loci being analyzed.
- With Loops - indicates if
there are inbreeding loops in the input pedigree.
- Total Run Time -
running time (in seconds) of the program on the environment specified
above.
- Sum Table Sizes - Sum of the sizes of all the
probability tables that are created during the variable elimination
according to the found elimination sequence
- Improvement
- In total run time - the speed-up achieved
by the optimized program compared to the original Superlink. I.e. , if
the speed-up is 3 , the optimized program runs 3-times faster than the
original Superlink
- In sum table sizes - the decrease in the sum of table sizes achieved
by the optimized program compared to the original Superlink. I.e. , if
the decrease is 100 , the optimized table sum is 100 times smaller than
that of the original Superlink.
- Complexity Coefficient - this coefficient
approximates the complexity of an elimination sequence. It is calculated by
deriving LOG10 of the sequence's Sum Table Sizes. This
coefficient determines the optimization policy that will be applied to the
problem, using the Optimization Policy
Definition table below.
- Number of optimization iterations
- number of times the probabilistic elimination sequence search is performed
for each of the specified heuristics.
- Best heuristics - the
heuristics that produced the best elimination sequence for the
problem.
- Approximated Optimization Time
- the approximation of the time that the full optimization process will
take. This value is calculated by multiplying the time that takes one
optimization iteration (measured by running the gready elimination
algorithm) by the number of the heuristics that will be applied and by
the number of iterations performed with each heuristics.
- Real Optimization time
- the time that was actually spent for the optimization process. The
difference between this value and the Approximated
Optimization Time is caused by the cut-off
policy applied during the optimization process in order to reduce the
optimization overhead. Detailed information on the cut-off algorithm may be
found in the External Code
Documentation
Table 2 : Current
Optimization Policy Definition
Complexity Coefficient |
Number of iterations |
Maximal optimization time (sec) |
Sizes of variables pools |
0-7 |
Too simple , no need
to optimize, use greedy sequence |
8 |
25 |
30 |
3 |
9 |
50 |
60 |
3 |
10-11 |
100 |
600 |
3 |
12 |
500 |
800 |
3,9 |
13-14 |
1000 |
1500 |
3,9,15 |
15 and higher |
1000 |
5000 |
3,9,15 |
Clarifications to Table 2:
Sizes of variables
pools : This field defines which
sizes of the variables'
pools will be applied during the optimization process.
Number
of iterations and Maximal
optimization time are used to determine the number of iterations that
will be applied for each combination of weight function (LOG,SQRT,EQ) and specified variables' pool size. The number of iterations that is
eventually applied is the minimum between Number
of iterations and the Maximal
optimization time/one_iteration_time. The one_iteration_time is the time
that takes one search for the elimination sequence and it is approximated by the
time that takes to find the greedy elimination sequence. This is useful for the
files where each optimization iteration takes a lot of time (usually the files
with great number of people).
Back to Top
Back to Top
Input files | Complexity coefficient | Number of optimization iterations | Best Heuristics | Approximated Optimization Time | Real Optimization Time |
datafile20-80-4-5.dat / pedfile20-80-4-5.dat | 13 | 1000 | SQRT | 1498.77 | 287.234 |
Back to Top
Back to Top
Input files | Complexity coefficient | Number of optimization iterations | Best Heuristics | Approximated Optimization Time | Real Optimization Time |
datafile40-71-4-5.dat / pedfile40-71-4-5.dat | 46 | 1000 | EQ | 4996.3635 | 1902.297 |
Back to Top
Back to Top
Input files | Complexity coefficient | Number of optimization iterations | Best Heuristics | Approximated Optimization Time | Real Optimization Time |
datafile70-40-4-5.dat / pedfile70-40-4-5.dat | 11 | 100 | SQRT | 112.5 | 38.797 |
Back to Top
Input files | Complexity coefficient | Number of optimization iterations | Best Heuristics | Approximated Optimization Time | Real Optimization Time |
datafile70-50-4-5.dat / pedfile70-50-4-5.dat | 12 | 500 | SQRT | 796.572 | 699.219 |
Back to Top
Back to Top
Back to Top
Back to Top
Input files | Complexity coefficient | Number of optimization iterations | Best Heuristics | Approximated Optimization Time | Real Optimization Time |
datafileEA9.dat / pedfileEA9.dat | 13 | 1000 | EQ | 1498.77 | 195.781 |
Back to Top
Input files | Complexity coefficient | Number of optimization iterations | Best Heuristics | Approximated Optimization Time | Real Optimization Time |
datafileEB3.dat / pedfileEB3.dat | 9 | 50 | LOG | 17.55 | 4.875 |
Back to Top
Input files | Complexity coefficient | Number of optimization iterations | Best Heuristics | Approximated Optimization Time | Real Optimization Time |
datafileEB5.dat / pedfileEB5.dat | 9 | 50 | SQRT | 21.15 | 1.391 |
Back to Top
Input files | Complexity coefficient | Number of optimization iterations | Best Heuristics | Approximated Optimization Time | Real Optimization Time |
datafileEB6.dat / pedfileEB6.dat | 10 | 100 | LOG | 42.3 | 17.828 |
Back to Top
Input files | Complexity coefficient | Number of optimization iterations | Best Heuristics | Approximated Optimization Time | Real Optimization Time |
datafileEB7.dat / pedfileEB7.dat | 9 | 50 | LOG | 24.525 | 3.859 |
Back to Top
Input files | Complexity coefficient | Number of optimization iterations | Best Heuristics | Approximated Optimization Time | Real Optimization Time |
datafileEB8.dat / pedfileEB8.dat | 9 | 50 | LOG | 24.525 | 11.234 |
Back to Top