LAB IN BIOINFORMATICS

Comparison of Stochastic Greedy Algorithms

under the guidance of  Dan Geiger

by Nickolay Dovgolevsky


Purpose

Comparing various algorithms for minimizing of the sum of table sizes produced during elimination.

Description of the problem

We experiment with various greedy algorithms that strive to minimize the sum of table sizes produced during elimination.
The following stochastic algorithms were implemented:

1. Stochastic - Greedy (SG)
2. Weighted Stochastic - Greedy (WSG)
3. Maximal Cardinality Search (MCS), [1]
4. Weighted Maximal Cardinality (WMCS)
5. Lexicographic Breadth First Search, variant perfect (LEX_P), [1]
6. Lexicographic Breadth First Search, variant minimal (LEX_M), [1]
7. Minimal Fill-in (MIN-FILL)
8. Weighted Minimal Fill-in (WMIN-FILL)

Experimental Results

SG finds the lowest cost in 2%, WSG in 1.5%, MCS in 3.5%, WMCS in 5%, LEX_P in 2%, LEX_M in 1%, MIN-FILL in 17%, WMIN-FILL in 68%.

 

Detailed Results

Each entry in the table below contains the cost found, namely, log10 of the sum of tables produced during elimination. In addition, the lowest cost is underlined. If more than one algorithm find the lowest cost, all of them are underlined.

Weighted graphs pedfile datafile Number of loci Number of people % of typed persons SG WSG MCS WMCS LEX_P LEX_M MIN-FILL WMIN-FILL
graph00 pedfile00 datafile00 12 25 80 4.02 3.97 5.15 5.36 5.18 5.15 3.94 3.93
graph01 pedfile01 datafile01 14 27 63 17.42 17.00 15.54 22.27 16.44 16.55 14.17 14.22
graph02 pedfile02 datafile02 16 22 89 5.30 5.30 8.08 8.61 7.96 8.13 5.22 5.22
graph03 pedfile03 datafile03 20 31 71 11.59 10.99 12.62 15.67 11.72 12.11 10.38 10.03
graph04 pedfile04 datafile04 17 29 56 12.43 12.77 11.41 20.15 10.91 11.83 11.14 10.56
graph05 pedfile05 datafile05 19 33 95 6.18 6.19 7.03 7.55 6.92 6.83 5.70 5.91
graph06 pedfile06 datafile06 18 31 47 9.32 8.36 10.27 13.01 9.47 9.49 8.23 7.90
graph07 pedfile07 datafile07 13 30 30 2.80 2.79 2.81 2.81 2.81 2.81 2.80 2.80
graph08 pedfile08 datafile08 22 38 86 7.08 6.72 9.41 13.85 9.24 9.45 6.46 6.37
graph09 pedfile09 datafile09 20 48 91 9.35 9.28 14.73 13.45 13.70 14.11 7.84 7.61
graph10 pedfile10 datafile10 25 38 84 16.22 14.41 16.46 24.11 15.99 16.03 12.45 11.74
graph11 pedfile11 datafile11 23 45 78 13.23 13.15 19.31 19.45 19.31 19.62 13.05 12.55
graph12 pedfile12 datafile12 19 47 74 6.85 6.54 10.28 9.75 9.89 9.85 6.14 6.12
graph13 pedfile13 datafile13 21 44 80 12.56 12.56 19.52 18.71 18.90 19.63 11.53 11.36
graph14 pedfile14 datafile14 27 42 90 17.86 17.70 16.22 15.74 15.76 17.16 17.21 16.92
graph15 pedfile15 datafile15 25 42 70 14.93 15.75 15.63 24.10 15.85 15.88 14.78 13.58
graph16 pedfile16 datafile16 21 37 87 8.24 7.14 9.69 9.72 9.04 9.89 7.28 7.04
graph17 pedfile17 datafile17 24 43 93 6.59 6.83 8.95 15.45 8.26 8.18 6.16 5.98
graph18 pedfile18 datafile18 30 40 80 12.01 11.78 17.06 23.13 16.09 16.77 10.43 9.77
graph19 pedfile19 datafile19 34 38 100 7.60 7.44 11.43 18.51 12.17 11.93 6.35 6.22
graph20 pedfile20 datafile20 32 40 85 19.72 18.57 15.98 26.37 16.31 16.91 14.71 14.46
graph21 pedfile21 datafile21 31 56 77 15.04 14.40 20.10 41.57 18.84 18.79 13.75 13.35
graph22 pedfile22 datafile22 24 38 69 13.65 12.99 15.26 24.39 13.96 13.85 12.19 12.31
graph23 pedfile23 datafile23 29 41 74 7.57 7.53 10.83 11.23 10.84 11.40 6.93 6.91
graph24 pedfile24 datafile24 26 34 88 11.03 10.63 13.38 17.89 12.96 13.18 9.24 8.95
graph25 pedfile25 datafile25 28 37 64 9.66 9.41 11.61 10.50 12.19 12.19 8.42 8.52
graph26 pedfile26 datafile26 20 50 80 16.61 17.52 14.21 22.30 13.85 14.59 13.60 12.85
graph27 pedfile27 datafile27 18 51 97 11.12 10.02 9.47 10.98 9.97 9.69 8.39 8.01
graph28 pedfile28 datafile28 37 32 83 10.16 10.55 14.24 18.00 13.92 14.31 9.26 8.85
graph29 pedfile29 datafile29 20 36 57 12.83 13.93 14.93 32.55 13.74 14.77 11.83 12.38
graph30 pedfile30 datafile30 14 49 67 17.35 15.24 17.86 24.93 18.37 18.49 16.03 15.06
graph31 pedfile31 datafile31 17 26 73 4.43 4.46 7.05 9.76 6.25 6.20 4.29 4.28
graph32 pedfile32 datafile32 15 24 36 3.09 3.10 3.24 3.49 3.10 3.10 3.08 3.09
graph33 pedfile33 datafile33 18 25 19, 36, 62, 84 12.23 11.94 13.45 18.63 12.05 12.28 11.38 10.49
graph34 pedfile34 datafile34 19 33 14, 31, 48, 66, 85 21.65 17.97 18.64 39.02 20.45 19.98 18.49 16.11
graph35 pedfile35 datafile35 15 38 22, 44, 66, 88 18.58 17.62 19.94 18.04 21.38 21.49 16.86 16.07
graph36 pedfile36 datafile36 16 23 13, 29, 40, 55, 74, 87 12.64 11.14 12.99 15.16 12.07 12.50 11.20 9.07
graph37 pedfile37 datafile37 24 30 15, 29, 49, 66, 79 29.02 24.55 16.44 37.81 15.75 16.06 22.95 16.63
graph38 pedfile38 datafile38 21 35 22, 37, 62, 83 30.31 26.44 21.88 44.34 21.25 21.67 23.27 17.93
graph39 pedfile39 datafile39 27 28 15, 30, 49, 61, 80 15.34 14.04 14.94 13.08 15.11 14.77 13.85 13.20
graph40 pedfile40 datafile40 17 22 12, 26, 44, 56, 69, 82 20.00 16.17 16.89 28.55 15.39 15.49 18.18 13.80
graph41 pedfile41 datafile41 25 32 18, 34, 45, 67, 83 20.54 19.22 16.54 36.08 17.48 17.71 17.96 17.53
graph42 pedfile42 datafile42 20 27 18, 39, 62, 81 22.79 19.68 16.88 15.43 17.28 17.51 18.35 15.97
graph43 pedfile43 datafile43 25 22 17, 32, 49, 60, 85 13.23 12.30 12.06 24.26 11.84 12.13 11.78 11.80
graph44 pedfile44 datafile44 23 19 21, 37, 61, 78 14.99 12.30 16.48 22.57 15.17 15.28 13.64 11.10
graph45 pedfile45 datafile45 30 18 15, 32, 48, 62, 79 14.39 13.50 13.54 12.37 12.05 12.27 12.24 11.97
graph46 pedfile46 datafile46 26 22 23, 54, 78 17.48 14.54 12.21 17.05 10.73 11.09 12.24 10.58
graph47 pedfile47 datafile47 21 33 10, 23, 36, 47, 63, 72, 88 5.19 5.22 8.69 13.06 7.31 7.52 4.89 4.79
graph48 pedfile48 datafile48 34 20 10, 29, 54, 69, 90 8.37 7.54 11.05 21.67 9.44 9.40 7.76 7.15
graph49 pedfile49 datafile49 19 26 18, 35, 65, 87 24.57 21.99 19.88 45.12 20.52 21.01 20.09 17.01
graph50 pedfile50 datafile50 24 24 19, 25, 54, 70, 91 16.99 16.57 16.32 15.11 16.79 16.43 15.87 15.13
graph51 pedfile51 datafile51 25 28 15, 50, 86 21.65 21.53 15.56 14.31 14.37 14.49 17.73 15.85
graph52 pedfile52 datafile52 15 18 9, 20, 33, 44, 53, 78, 98 9.72 7.98 10.63 12.30 8.79 8.34 8.66 6.83
graph53 pedfile53 datafile53 23 33 10, 20, 40, 80 17.81 15.74 18.12 44.70 18.09 18.58 17.08 14.18
graph54 pedfile54 datafile54 19 40 89 6.62 10.54 7.56 12.07 8.79 8.69 6.39 6.62
graph55 pedfile55 datafile55 24 21 74 10.81 12.71 10.58 11.87 11.72 12.12 9.23 9.61
graph56 pedfile56 datafile56 24 40 50 18.25 17.87 16.32 26.12 19.38 19.34 12.15 11.35
graph57 pedfile57 datafile57 25 20 27 4.95 6.90 5.02 5.93 6.58 6.29 4.77 4.76
graph58 pedfile58 datafile58 24 33 64 15.49 16.17 15.56 16.40 16.06 17.27 13.52 13.43
graph59 pedfile59 datafile59 22 22 14 3.36 3.36 3.36 3.37 3.36 3.36 3.35 3.36
graph60 pedfile60 datafile60 14 20 6 3.19 3.20 3.19 3.20 3.19 3.19 3.19 3.19
graph61 pedfile61 datafile61 27 16 3 3.72 4.75 3.71 4.57 4.36 4.32 3.69 3.69
graph62 pedfile62 datafile62 27 22 84 14.62 13.65 12.92 12.67 14.36 14.28 13.51 12.55
graph63 pedfile63 datafile63 24 31 7 3.63 3.63 3.63 3.63 3.63 3.63 3.63 3.63
graph64 pedfile64 datafile64 28 25 51 11.56 11.81 10.73 20.06 12.23 11.80 10.67 9.74
graph65 pedfile65 datafile65 25 31 41 13.54 15.25 13.71 14.73 14.88 14.58 12.12 11.70
graph66 pedfile66 datafile66 26 45 75 8.99 18.44 9.63 16.26 17.12 17.48 8.39 8.22
graph67 pedfile67 datafile67 23 36 73 15.33 19.30 14.65 28.10 19.74 19.61 11.53 11.72
graph68 pedfile68 datafile68 15 26 77 4.24 5.87 4.36 6.55 6.02 6.13 4.15 4.10
graph69 pedfile69 datafile69 22 42 10 3.30 3.45 3.31 3.99 3.32 3.33 3.30 3.30
graph70 pedfile70 datafile70 20 21 96 3.97 5.09 4.00 4.93 5.00 5.02 3.90 3.89
graph71 pedfile71 datafile71 27 15 79 6.79 9.82 6.57 9.65 8.75 8.71 6.45 6.37
graph72 pedfile72 datafile72 29 41 44 10.45 12.27 11.30 12.23 11.67 12.03 9.48 9.60
graph73 pedfile73 datafile73 21 24 10 3.24 3.24 3.24 3.24 3.24 3.24 3.24 3.24
graph74 pedfile74 datafile74 21 19 52 3.35 3.36 3.35 3.36 3.35 3.35 3.35 3.35
graph75 pedfile75 datafile75 26 38 58 7.71 14.50 8.61 28.71 14.93 14.96 7.03 6.97
graph76 pedfile76 datafile76 17 41 98 15.00 19.43 14.07 26.18 19.66 19.50 12.54 12.34
graph77 pedfile77 datafile77 24 35 50 16.96 16.98 16.44 33.95 18.09 18.22 15.11 14.71
graph78 pedfile78 datafile78 28 43 41 12.95 15.48 13.17 27.51 13.11 13.03 12.21 10.95
graph79 pedfile79 datafile79 28 26 37 7.86 9.82 8.26 9.25 8.92 8.70 7.58 7.53
graph80 pedfile80 datafile80 24 27 52 3.66 3.71 3.66 3.68 3.68 3.67 3.66 3.66
graph81 pedfile81 datafile81 11 22 68 5.95 8.53 6.24 9.04 7.41 7.35 5.98 5.91
graph82 pedfile82 datafile82 21 27 95 6.38 8.90 6.06 11.10 9.02 8.49 5.56 5.44
graph83 pedfile83 datafile83 14 47 79 5.18 7.30 5.04 12.00 7.17 7.54 4.82 4.77
graph84 pedfile84 datafile84 15 22 93 7.75 10.46 7.46 10.98 9.41 9.40 6.85 6.53
graph85 pedfile85 datafile85 19 15 82 7.07 8.97 6.83 10.73 8.24 8.10 6.47 6.37
graph86 pedfile86 datafile86 28 38 90 4.22 6.72 4.17 6.47 6.86 6.93 4.14 4.14
graph87 pedfile87 datafile87 10 20 45 5.70 7.38 5.67 6.77 7.08 6.87 5.79 5.74
graph88 pedfile88 datafile88 15 45 22 2.09 2.09 2.18 2.12 2.09 2.09 2.09 2.09
graph89 pedfile89 datafile89 25 32 31 16.74 15.48 16.38 30.73 15.01 15.65 15.00 14.61
graph90 pedfile90 datafile90 12 34 76 5.27 6.00 4.74 5.63 5.72 5.73 4.52 4.49
graph91 pedfile91 datafile91 12 15 80 2.85 2.85 2.85 2.85 2.85 2.85 2.84 2.85
graph92 pedfile92 datafile92 25 49 3 3.88 3.89 3.88 3.89 3.89 3.89 3.89 3.89
graph93 pedfile93 datafile93 21 24 11 3.50 3.51 3.50 3.51 3.50 3.50 3.50 3.50
graph94 pedfile94 datafile94 21 42 12 7.25 8.78 7.29 8.51 8.01 8.40 6.83 6.78
graph95 pedfile95 datafile95 13 28 16 4.44 5.86 4.53 5.88 5.40 5.39 4.27 4.26
graph96 pedfile96 datafile96 13 42 33 8.38 11.66 8.91 10.60 10.64 10.45 7.70 7.74
graph97 pedfile97 datafile97 23 33 43 17.15 16.94 15.64 28.35 17.17 17.45 15.39 14.48
graph98 pedfile98 datafile98 27 22 44 6.51 8.89 6.52 8.21 7.99 7.97 6.14 6.12
graph99 pedfile99 datafile99 27 31 8 3.80 4.79 3.81 5.30 4.49 4.60 3.73 3.72

 

 

 

   

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Conclusion

The main purpose of this project was to find the optimal time division between various stochastic algorithms that were implemented. This problem seems to be complex and needs further investigation. But the most simple thing can be done: to use each stochastic algorithm in proportion to the probability that it is the best algorithm. Another purpose was to find the best algorithms. As we can see Minimal Fill-in and weighted Minimal Fill-in algorithms are clearly advantageous over the other algorithms. They find the lower cost than the other algorithms in 85% of cases. So, we can just use a combination of both of them only.

Acknowledgements:

I thank Dan Geiger for his help and guidance. I also thank Ma'ayan Fishelson for providing the program for generation of probabilistic networks.

 

Some of the heuristic algorithms are defined in the following references:

References

1. Koster, Arie M. C. A. / Bodlaender, Hans L. / Hoesel, Stan P. M. van Treewidth : computational experiments (Konrad-Zuse-Zentrum für Informationstechnik Berlin [ZIB] ; R 2001/38)