Elimination Order Implementation - Full Report

        For estimating the time to run the elimination while calculating the elimination order, we used the sum of tables created during the elimination process. Our observation revealed that while the run time is not linear with this factor, they are definitely related. The actual time is also related to other parameters such as number alleles, number of loci, number of people as more, however, changing those factors is beyond our power.

        The basic idea of the algorithm is to run random iterations and choose the best elimination order the we encountered. In each random iteration there are some parameters that effect the outcome:

1) Number of variables from which we choose. while traversing the net we keep the N best variables in the net, from those variables we choose randomly the variable we now eliminate. The number N (sample size) is crucial for the range of elimination orders and consequently to the result.

2) Type of random choice. After traversing the entire net and finding the N best variables, we choose a variable randomly. however, the distribution of the sample space will result different probability to reach different orders.
in this work we considered 3 type of random choosing: uniform distribution - which proved to be inefficient; 1/x: each variable was chosen at the probability of (1/x) / (sum of all 1/x) where x is the table size (now hence will be referred as type 1); and 1/log(x) (now hence will be referred as type 2).

3) An interesting property we have discovered by accident is that a simple definition of "better" can change the outcome. When reaching to a variable with the same table size as the best so far, which one will we take? the original program used the first such variable. Surprisingly, it can have significant effect on the outcome.

4) Number of iterations. The number of times that we run those random iterations is obviously another factor.

        The results table shows the results of some possible parameter decisions:
1) Original algorithm's results - no parameters chosen.
2) Parameters (sample size, iteration num, random distribution type)chosen manually.
3) Parameters chosen automatically, random distribution is half 2 and half 2, "better" is greater.
4) Same as 3, when the tables sizes are greater than a certain constant, random type is 2 (even when in the half of "type 1 run"), "better" is half greater and half greater-equal.
5) Same as 4, when the original sum tables is greater than a certain constant the number of iterations is doubled (idea: when the proposed run time is long, we allow ourselves to take more time in order to improve that run time).
6) Same as 5 (double run times too), "better" definition is chosen randomly each iteration.
in each of these options, if after a certain, sample size dependent, number of run times we stop the loop.
 

Results Analysis:

        Our first conclusion from the manual runs is that the uniform distribution selection is no good, which is quite reasonable, although did have the potential of surprising.
Second, we can see that choosing the parameters at run time has improved even the best results that could be achieved by manual runs. probably because of the increasing number of possible elimination orders and for being able to decide per iteration and per traversal (specifically for the type of random choosing).

        For some reasons, using the last (no. 5) run type - choosing the "better" type randomly has increased dramatically the flexibility of the results. meaning, while in previous run types we could expect getting the same results for each set of parameters, in the run type, the results varied in each execution. In the table there are two pairs of results - a median (estimated) and the best results achieved after several executions.

        We propose our two last run types (5 & 6) as recommended. As we can see from the results table, neither is superior to the other. As we can learn from the table, and as we stated before, while executing method no. 5 will guarantee us the same results that we might not have reached by method no. 6, however using method no.6 might also reach better results than no. 5.
Back