LAB IN BIOINFORMATICS

Optimizing variable elimination order in Bayesian networks using the Superlink program.

Winter 2002, under guidance of Prof. Dan Geiger and Ma'ayan Fishelzon

by Natalia Graiz and Marina Shteinberg    

Problem

Optimize the existing greedy variable elimination heuristics for the Superlink program

Implementation

The Superlink program uses a greedy variable elimination algorithm with product breaking as described below.Our algorithm implements its randomized version suggested by Prof. Dan Geiger.

The greedy algorithm finds the variable elimination order by eliminating the variable having the smallest elimination cost in each step of the algorithm. This cost is determined by the size of the probability table induced by eliminating the chosen variable. If this size is greater than some predefined threshold , the product breaking is performed upon some variable chosen according to the same greedy heuristics.

In our implementation, we find the greedy elimination sequence first. We analyze its overall run time complexity (denoted by the complexity coefficient), which is determined by the variables elimination cost and product breaking cost. The number of optimization iterations and the heuristics that will be applied to a given problem depend on the found complexity coefficient.

The optimization iteration is defined as a single search for a complete elimination sequence.In each step of this search we find several (3, 9 or 15 according to the complexity coefficient)   variables having the smallest elimination cost. Among them we choose the one to be eliminated next by tossing a weighted coin which weights are function (log, square root or constant) of the variables' elimination cost. If the probability table induced by eliminating the chosen variable is greater than some pre-defined threshold , the product breaking is performed in the same manner as in the original greedy heuristics. After the complete elimination sequence is found, we compare its overall elimination cost with that of the best previously found sequence.  If it is smaller , we discard the previous record and remember the new one. 

The pre-determined number of optimization iterations is updated during the optimization process as the complexity coefficient of the newly found sequences reduces. 

Our algorithm has two modes : the Strict Mode which strives to reduce the maximal table size rather than the overall elimination cost as long as this size is greater than a threshold induced by machine memory constraints. In Loose mode the main target for optimization is the sum of the table sizes. Experience shows that Loose mode produces better results , so we applied it in the experiments and the executables below.

 

 

Experimental Results


Full documentation

Download Executable for Windows

Download Executable for Windows (which reports the progress of the optimization process)