Experimental Results - Maximum Likelihood Haplotyping
(Superlink v1.4)

(updated November 24th 2004)


Experiment A: (Simulation Study)

This experiment tested our haplotyping algorithm on a complex pedigree of moderate size (Figure 2 in Lin (1996)). Until now, only an approximate haplotype analysis had been possible for this pedigree. We simulated a random haplotype configuration for the pedigree using the simulation guidelines described by Lin and Speed (1997). The pedigree consists of 27 individuals and is highly inbred. All individuals, except those in the first two generations, are typed at 10 polymorphic markers, each with 5 alleles of equal frequency. The recombination fraction between each pair of consecutive markers was set to 0.5.
 

Files (Superlink)     pedfile        Graphical        #Markers  #People  Loops  Run-Time    Output
                      (Genehunter)   Representation                             Superlink   Superlink
                                     (bmp)
data_expA / ped_expA  ped_expA       ped              10        27       yes    208.71s     haplo
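
The simulation set-up described above can be illustrated with a minimal gene-dropping sketch. It assumes that founders receive random haplotypes and that each child receives one gamete per parent, with a strand switch between consecutive markers occurring with probability equal to the recombination fraction. The function names are hypothetical, and neither the pedigree structure nor the actual guidelines of Lin and Speed (1997) are reproduced here.

```python
import random

N_MARKERS = 10   # markers typed in Experiment A
N_ALLELES = 5    # alleles per marker, equal frequencies
THETA = 0.5      # recombination fraction between consecutive markers, as stated above


def random_haplotype(rng):
    """Draw one haplotype with equi-frequent alleles labelled 1..N_ALLELES."""
    return [rng.randrange(1, N_ALLELES + 1) for _ in range(N_MARKERS)]


def founder(rng):
    """A founder carries two independently drawn haplotypes."""
    return (random_haplotype(rng), random_haplotype(rng))


def gamete(parent, rng, theta=THETA):
    """Pass one haplotype to a child, switching strands with probability theta."""
    strand = rng.randrange(2)
    out = []
    for m in range(N_MARKERS):
        if m > 0 and rng.random() < theta:
            strand = 1 - strand
        out.append(parent[strand][m])
    return out


def child(father, mother, rng, theta=THETA):
    """A child is the pair (paternal gamete, maternal gamete)."""
    return (gamete(father, rng, theta), gamete(mother, rng, theta))
```

Calling `child(founder(rng), founder(rng), rng)` with `rng = random.Random(0)` yields one simulated offspring; walking a pedigree from the founders down in this way gives a full haplotype configuration.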

 

Experiment C: (Testing Accuracy)

This experiment tested the accuracy of SimWalk2 (Sobel and Lange, 1996), a state-of-the-art program that uses MCMC. We tested 75 data sets consisting of 15 to 50 individuals and 1 to 13 markers. As the table below shows, SimWalk2 found a maximum likelihood assignment in 45 of the 75 data sets; these data sets are marked "=" in the last column. The average difference reported at the end of the table is taken over the remaining 30 data sets.
 
 

Files (Superlink)    Graphical       #Markers  #People  Loops  Log-Likelihood  Log-Likelihood  |a-b|/a * 100 (%)
                     Representation                            Superlink (a)   SimWalk2 (b)
                     (bmp)
data2_2/ped2_2       ped2_2          2         20       no     35.487592       35.4928         0.0147
data2_3/ped2_3       ped2_3          3         20       no     52.77289        52.77289        =
data2_4/ped2_4       ped2_4          4         20       no     65.852238       66.02245        0.2581625
data2_5/ped2_5       ped2_5          5         20       no     79.34017        79.34017        =
data2_6/ped2_6       ped2_6          6         20       no     89.648388       89.65359        0.0058
data2_7/ped2_7       ped2_7          7         20       no     107.591076      107.59628       0.0048
data2_8/ped2_8       ped2_8          8         20       no     123.733         123.7382        0.0042
data2_9/ped2_9       ped2_9          9         20       no     142.956431      142.96163       0.0036
data2_10/ped2_10     ped2_10         10        20       no     155.570819      155.57602       0.0033
data3_2/ped3_2       ped3_2          2         30       no     42.699327       42.69933        =
data3_3/ped3_3       ped3_3          3         30       no     62.367547       62.36755        =
data3_4/ped3_4       ped3_4          4         30       no     80.402762       80.40276        =
data3_5/ped3_5       ped3_5          5         30       no     103.071503      103.09499       0.0228
data3_6/ped3_6       ped3_6          6         30       no     121.41774       122.04171       0.5139
data4_2/ped4_2       ped4_2          2         40       no     55.922544       56.26392        0.6104
data4_3/ped4_3       ped4_3          3         40       no     78.703608       78.70361        =
data4_4/ped4_4       ped4_4          4         40       no     93.228489       93.22849        =
data4_5/ped4_5       ped4_5          5         40       no     113.861693      113.86169       =
data4_6/ped4_6       ped4_6          6         40       no     151.522777      151.52278       =
data4_7/ped4_7       ped4_7          7         40       no     187.075565      187.69689       0.3321
data4_8/ped4_8       ped4_8          8         40       no     231.639865      232.28922       0.2803
data4_9/ped4_9       ped4_9          9         40       no     259.37          259.86194       0.1897
data5_2/ped5_2       ped5_2          2         50       no     67.816695       67.81669        =
data5_3/ped5_3       ped5_3          3         50       no     100.956668      100.95667       =
data5_4/ped5_4       ped5_4          4         50       no     143.954956      143.95496       =
data5_5/ped5_5       ped5_5          5         50       no     186.75          186.75279       =
data5_6/ped5_6       ped5_6          6         50       no     223.18          223.17942       =
data5_7/ped5_7       ped5_7          7         50       no     251.2           251.19986       =
data11_1/ped11_1     ped11_1         1         15       yes    11.186272       11.18627        =
data11_2/ped11_2     ped11_2         2         15       yes    19.66975        20.58198        4.6377
data11_3/ped11_3     ped11_3         3         15       yes    30.753976       30.75398        =
data11_4/ped11_4     ped11_4         4         15       yes    38.872989       38.87299        =
data11_5/ped11_5     ped11_5         5         15       yes    50.452158       50.45216        =
data11_6/ped11_6     ped11_6         6         15       yes    59.731785       60.27968        0.9173
data11_7/ped11_7     ped11_7         7         15       yes    67.51           67.87459        0.5401
data11_8/ped11_8     ped11_8         8         15       yes    78.71787        78.71787        =
data11_9/ped11_9     ped11_9         9         15       yes    88.40           88.76346        0.4112
data12_1/ped12_1     ped12_1         1         25       yes    19.230261       19.23026        =
data12_2/ped12_2     ped12_2         2         25       yes    31.380968       31.38097        =
data12_3/ped12_3     ped12_3         3         25       yes    47.361408       47.36141        =
data12_4/ped12_4     ped12_4         4         25       yes    61.682286       61.68229        =
data12_5/ped12_5     ped12_5         5         25       yes    72.659994       72.90819        0.3416
data12_6/ped12_6     ped12_6         6         25       yes    86.298164       86.29816        =
data12_7/ped12_7     ped12_7         7         25       yes    95.170176       95.17018        =
data12_8/ped12_8     ped12_8         8         25       yes    114.56          114.92013       0.3144
data13_1/ped13_1     ped13_1         1         25       yes    19.390972       20.3182         4.7818
data13_2/ped13_2     ped13_2         2         25       yes    33.917682       33.91768        =
data13_3/ped13_3     ped13_3         3         25       yes    49.08427        49.1747         0.1842
data13_4/ped13_4     ped13_4         4         25       yes    62.884407       62.88441        =
data13_5/ped13_5     ped13_5         5         25       yes    80.090413       80.09041        =
data13_6/ped13_6     ped13_6         6         25       yes    96.764923       96.76492        =
data13_7/ped13_7     ped13_7         7         25       yes    116.221252      116.22125       =
data13_8/ped13_8     ped13_8         8         25       yes    133.519345      133.62428       0.0786
data13_9/ped13_9     ped13_9         9         25       yes    149.350456      149.4554        0.0703
data14_1/ped14_1     ped14_1         1         35       yes    32.740483       35.04256        7.0313
data14_2/ped14_2     ped14_2         2         35       yes    46.84895        47.00533        0.3338
data14_3/ped14_3     ped14_3         3         35       yes    62.404775       62.40477        =
data14_4/ped14_4     ped14_4         4         35       yes    80.322863       80.52677        0.2539
data14_5/ped14_5     ped14_5         5         35       yes    95.73           96.06817        0.3533
data15_1/ped15_1     ped15_1         1         45       yes    42.36677        45.33775        7.0125
data15_2/ped15_2     ped15_2         2         45       yes    64.497267       64.69015        0.2991
data15_3/ped15_3     ped15_3         3         45       yes    88.049595       88.40202        0.4003
data16_1/ped16_1     ped16_1         1         27       yes    20.24           20.24           =
data16_2/ped16_2     ped16_2         2         27       yes    28.2            28.2            =
data16_3/ped16_3     ped16_3         3         27       yes    36.17           36.17           =
data16_4/ped16_4     ped16_4         4         27       yes    45.42           45.42           =
data16_5/ped16_5     ped16_5         5         27       yes    55.96           55.96           =
data16_6/ped16_6     ped16_6         6         27       yes    66.48           66.48           =
data16_7/ped16_7     ped16_7         7         27       yes    74.45           74.45           =
data16_8/ped16_8     ped16_8         8         27       yes    90.09           90.09           =
data16_9/ped16_9     ped16_9         9         27       yes    101.9           101.9           =
data16_10/ped16_10   ped16_10        10        27       yes    113.7           113.7           =
data16_11/ped16_11   ped16_11        11        27       yes    122.95          122.95          =
data16_12/ped16_12   ped16_12        12        27       yes    133.48          133.48          =
data16_13/ped16_13   ped16_13        13        27       yes    141.45          141.45          =
AVERAGE DIFFERENCE                                                                             1.00683875
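
The last column of the table is the relative difference between the two log-likelihood values, expressed as a percentage of the Superlink value. A small sketch of that arithmetic, using three rows copied from the table (the function name is illustrative, not part of Superlink):

```python
def percent_difference(a, b):
    """Relative difference |a - b| / a, in percent (a: Superlink, b: SimWalk2)."""
    return abs(a - b) / a * 100.0


# (data set, Superlink value a, SimWalk2 value b), taken from the table above
rows = [
    ("data2_2", 35.487592, 35.4928),
    ("data11_2", 19.66975, 20.58198),
    ("data14_1", 32.740483, 35.04256),
]

for name, a, b in rows:
    print(f"{name}: {percent_difference(a, b):.4f} %")
```

For these three rows the computed values round to 0.0147, 4.6377 and 7.0313, which agree with the table entries.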

Experiment D: (Published Disease Data)

We analyzed two published data sets:

Files (Superlink)                 Graphical        #Markers  #People  Loops  Output        Output
                                  Representation                             Superlink     SimWalk2
                                  (bmp)
datafile_krabbe / pedfile_krabbe  ped_krabbe       8         9        no     haplo_krabbe  haplo_krabbe
datafile_EA / pedfile_EA          ped_EA           9         29       yes    haplo_EA      haplo_EA

 

Experiment E: (Stochastic Algorithms)    Detailed Results.

This experiment compared the performance of different stochastic-greedy algorithms. Each stochastic algorithm was run for 1000 iterations after the reduction rules had been applied. The graphs used were created from simulated pedigree data.
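
To make the idea of a stochastic-greedy search for an elimination order concrete, here is a minimal sketch. It assumes a Min-Fill score with uniformly random tie-breaking, and it assumes the elimination cost is the sum, over eliminated vertices, of the product of the state counts of the vertex and its current neighbours. Both choices, and all names in the code, are illustrative assumptions rather than Superlink's implementation.

```python
import random
from itertools import combinations


def fill_in(adj, v):
    """Number of edges that eliminating v would add among its neighbours."""
    return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])


def eliminate(adj, v):
    """Connect v's neighbours pairwise, then remove v from the graph."""
    nbrs = adj.pop(v)
    for a in nbrs:
        adj[a].discard(v)
    for a, b in combinations(nbrs, 2):
        adj[a].add(b)
        adj[b].add(a)


def stochastic_min_fill(graph, states, iterations=1000, seed=0):
    """Repeat a randomised greedy pass and keep the cheapest order found."""
    rng = random.Random(seed)
    best_order, best_cost = None, float("inf")
    for _ in range(iterations):
        adj = {v: set(n) for v, n in graph.items()}
        order, cost = [], 0
        while adj:
            scores = {v: fill_in(adj, v) for v in adj}
            low = min(scores.values())
            # Random choice among the vertices that tie for the lowest score.
            v = rng.choice(sorted(u for u in adj if scores[u] == low))
            table = states[v]
            for u in adj[v]:
                table *= states[u]
            cost += table
            order.append(v)
            eliminate(adj, v)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost
```

The randomness only matters when several vertices tie; repeating the pass many times lets the search explore different tie resolutions and keep the best result.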
 

Experiment F: (Total Run Time)           Detailed Results.

This experiment compared the run time of likelihood computation using the new optimization algorithm for determining an elimination order (v1.4) with the run time using the previous optimization algorithm (v1.3). The total run time includes both optimization time and inference time. We tested 50 randomly simulated data sets, chosen so that the run time of the new version is above 10 seconds and below 10 hours.
 
 

Experiment G: (Benchmarks)               Detailed Results.

This experiment tested the performance of the three stochastic-greedy algorithms (Min-Weight, Min-Fill and Weighted Min-Fill) on eight known benchmarks for Bayesian network inference. The stochastic algorithms are run after the reduction rules have been applied. For each benchmark, we compare the elimination cost found by Superlink, the elimination cost found by Hugin6.1 (Andersen et al., 1989; Hugin, 2002) and the best known elimination cost.
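
The three greedy criteria named above differ only in how they score a candidate vertex. The sketch below uses common textbook definitions: the product of the neighbours' state counts for Min-Weight, the number of missing edges among the neighbours for Min-Fill, and the sum of endpoint state-count products over those missing edges for Weighted Min-Fill. Whether Superlink uses exactly these forms is an assumption, and the function names are hypothetical.

```python
from itertools import combinations
from math import prod


def missing_edges(adj, v):
    """Pairs of neighbours of v that are not yet connected to each other."""
    return [(a, b) for a, b in combinations(sorted(adj[v]), 2) if b not in adj[a]]


def min_weight(adj, states, v):
    """Product of the state counts of v's neighbours."""
    return prod(states[u] for u in adj[v])


def min_fill(adj, states, v):
    """Number of fill-in edges created by eliminating v."""
    return len(missing_edges(adj, v))


def weighted_min_fill(adj, states, v):
    """Sum over fill-in edges of the product of their endpoints' state counts."""
    return sum(states[a] * states[b] for a, b in missing_edges(adj, v))
```

In each case the greedy step eliminates a vertex with the lowest score; the stochastic variants randomise that choice.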
 

Experiment H: (Reduction Rules)           Detailed Results.

This experiment tested the gain due to the reduction rules presented in Eijkhof et al. (2002).  The data sets used are 100 graphs created from simulated pedigree data.
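
As an illustration of what a reduction rule does, the sketch below applies the simplicial rule, a well-known preprocessing step for elimination-order problems: a vertex whose neighbours already form a clique can be eliminated without adding any edge. This unweighted version is an assumption made for illustration; the exact rules and weight conditions of the cited paper are not reproduced, and the function names are hypothetical.

```python
from itertools import combinations


def is_simplicial(adj, v):
    """True if every pair of v's neighbours is already connected."""
    return all(b in adj[a] for a, b in combinations(adj[v], 2))


def reduce_simplicial(graph):
    """Strip simplicial vertices until none remain.

    Returns the removed vertices (a prefix of the elimination order)
    and the reduced graph left for the search algorithm.
    """
    adj = {v: set(n) for v, n in graph.items()}
    prefix = []
    changed = True
    while changed:
        changed = False
        for v in sorted(adj):
            if is_simplicial(adj, v):
                for u in adj.pop(v):
                    adj[u].discard(v)
                prefix.append(v)
                changed = True
                break
    return prefix, adj
```

A tree is reduced completely by this rule, whereas a cycle of four vertices is left untouched, so the gain depends on the structure of the input graph.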
 

References:

