Time comparison between the GenHunter algorithm and Sparse tree algorithm

·         Running Environment :    Windows XP pro, P3 1200,  1 GB RAM

 File name Number of loci Number of people Number of Founders Number of Non-founders percentages of genotyped runtime of sparse tree algorithm (STA) in milliseconds 2^f Runtime prediction of STA with founders reduction In milliseconds (STA / 2^f) Runtime GH In milliseconds 10 6 2 4 50 0 4 0 0 10 8 4 4 50 0 16 0 0 10 12 6 6 50 0 64 0 0 10 15 4 11 50 100 16 6.25 20 10 6 2 4 80 0 4 0 0 10 8 3 5 80 10 8 1.25 0 10 12 4 8 80 30 16 1.875 10 10 15 5 10 80 61 32 1.90625 10 50 6 2 4 50 70 4 17.5 20 50 8 4 4 50 20 16 1.25 10 50 12 4 8 50 180 16 11.25 20 50 15 5 10 50 1442 32 45.0625 80 50 6 3 3 80 10 8 1.25 0 50 8 3 5 80 20 8 2.5 10 50 12 4 8 80 80 16 5 20 50 15 4 11 80 1893 16 118.3125 210 100 6 3 3 50 0 8 0 0 100 8 3 5 50 0 8 0 10 100 12 5 7 50 931 32 29.09375 60 100 15 5 10 50 1452 32 45.375 210 100 6 3 3 80 10 8 1.25 10 100 8 3 5 80 30 8 3.75 20 100 12 4 8 80 481 16 30.0625 150 100 15 5 10 80 12368 32 386.5 1842 200 6 2 4 50 20 4 5 0 200 8 3 5 50 190 8 23.75 20 200 12 4 8 50 440 16 27.5 60 200 15 5 10 50 3375 32 105.46875 360 200 6 2 4 80 20 4 5 10 200 8 3 5 80 250 8 31.25 70 200 12 3 9 80 7662 8 957.75 1732 200 15 5 10 80 9124 32 285.125 631 50 15 5 10 According the template 139411 32 4356.59 4827 200 15 5 9 10 180830 32 5650 4206 200 15 5 9 20 88096 32 2753 3615

Conclusions:

Ø      The runtime of the sparse tree algorithm (STA), is longer than the runtime of the GH algorithm, however when predicting the runtime of STA with founders reduction, the runtime of STA is usually much shorter than the runtime of GH.

 The pedigree considered in the following table correspond to the data appears in pedfile32, datafile32 in the previous table (last line) Datafile name Pedfile name Total number of homozygous positions over all loci and persons runtime of sparse tree algorithm (STA) in milliseconds 2^f Runtime prediction of STA with founders reduction In milliseconds (STA / 2^f) Runtime GH In milliseconds Runtime ratio datafile32 0 125840 32 3932.5 16634 4.229879 datafile32 pedfile32_1 100 91591 32 2862.21 12608 4.404988 datafile32 pedfile32_2 200 71273 32 2227.28 11066 4.968392 datafile32 pedfile32_3 300 56492 32 1765.375 7481 4.237627 datafile32 pedfile32_4 400 46627 32 1457.09 7050 4.838411 datafile32 pedfile32_5 500 26578 32 830.56 5208 6.270468 datafile32 pedfile32_6 600 23603 32 737.59 4586 6.217546

 Runtime comparison between GH and Sparse tree algorithms

Conclusions:

Ø      The more homozygous positions over all markers the faster the algorithms run (GH and STA). This occurs due to symmetry nodes
(as explained in section 7-logic-i).

Ø      The ratio between the runtime of STA and GH seems to increase with the increase of the homozygous marker positions, due to the tree sparseness.

 The pedigree considered in the following table correspond 15 people, 5 founders, 100 loci, 80% genotyped. Datafile name Pedfile name Number of typed founders runtime of sparse tree algorithm (STA) in milliseconds 2^f Runtime prediction of STA with founders reduction In milliseconds (STA / 2^f) Runtime GH In milliseconds Runtime ratio Datafile24 0 105232 32 3288.5 5368 1.632355 Datafile24 pedfile24_1 1 51244 32 1601.375 3996 2.495356 Datafile24 pedfile24_2 2 26428 32 825.875 2984 3.613138 Datafile24 pedfile24_3 3 10826 32 338.3125 1682 4.971735 Datafile24 pedfile24_4 4 12308 32 384.625 1802 4.685083 Datafile24 pedfile24_5 5 13289 32 415.2813 1009 2.429679

Conclusions:

Ø      The more genotyped founders the faster the algorithms run (GH and STA).
This occurs due to restrictions imposed on the founder’s alleles assignment, which increases the sparseness of the tree.

 The pedigree considered in the following table correspond 10 people, 4 founders, 200 loci Datafile name Pedfile name Percentages of genotyped people per generation. runtime of sparse tree algorithm (STA) in milliseconds 2^f Runtime prediction of STA with founders reduction In milliseconds (STA / 2^f) Runtime GH In milliseconds Runtime ratio DatafileP0 0 10 16 0.625 10 16 DatafileP10 pedfileP10 10 90 16 5.625 10 1.777778 DatafileP20 pedfileP20 20 20 16 1.25 10 8 DatafileP30 pedfileP30 30 510 16 31.875 40 1.254902 DatafileP40 pedfileP40 40 190 16 11.875 30 2.526316 DatafileP50 50 550 16 34.375 80 2.327273 DatafileP60 60 300 16 18.75 70 3.733333 DatafileP70 pedfileP70 70 811 16 50.6875 160 3.156597 DatafileP80 pedfileP80 80 470 16 29.375 60 2.042553 DatafileP90 pedfileP90 90 280 16 17.5 80 4.571429 DatafileP100 pedfileP100 100 270 16 16.875 70 4.148148

Conclusions:

Ø      The runtime seems to be shorter at the high end and low end of the graph, i.e. when the percentage of typed people is high or low,
this can be explained by the following reasons:

1.      High percentages of genotyped people impose many restrictions on the founder’s alleles assignments, i.e. many impossible gene flows and many symmetry nodes, these shorten the calculations periods. (As explained in section 7-logic-i,ii).

2.      Low percentages genotyped people means low number on calculations, when searching the tree non-genotyped person means no contribution to the marker probability calculations, i.e. skipping to the next node without calculations.

 The pedigree considered in the following table correspond 10 people, 4 founders, 80% genotyped Datafile name Pedfile name Number of loci runtime of sparse tree algorithm (STA) in milliseconds 2^f Runtime prediction of STA with founders reduction In milliseconds (STA / 2^f) Runtime GH In milliseconds Runtime ratio 50 490 16 30.625 50 1.632653 100 641 16 40.0625 100 2.4961 150 911 16 56.9375 140 2.458836 200 961 16 60.0625 180 2.996878 250 1302 16 81.375 230 2.826421 300 1422 16 88.875 290 3.26301 350 1762 16 110.125 320 2.905789 400 1872 16 117 370 3.162393

Conclusions:

Ø      The run time of both algorithms is linear with the number of loci.

1.      For GH the ratio is about: 0.9

2.      For SPA (predicted with founders reduction) the ratio is about: 0.25.

Ø      The runtime ratio between GH and SPA (considering founders reduction) tend to grow with the number of loci growth.