Project on Bioinformatics

Performing approximate inference by using a heuristic
which ignores extreme markers in the likelihood computation.

Winter 2004, under the guidance of Prof. Dan Geiger and Ma'ayan Fishelzon
by Rita Shmukler and Tatyana Polovets



Problem Definition:

Until now, SUPERLINK has performed only exact inference. However well the algorithms are tuned, linkage analysis poses an ever-growing computational challenge, because some disease models depend on multiple loci, markers are highly polymorphic, and many markers are available. The goal of this project is therefore to identify automatically when a model is too large to be handled by exact inference, and to provide a means of performing approximate inference with a heuristic that ignores extreme markers in the computations.



In order to reduce the running time of SUPERLINK, three heuristic methods were tested:

1. Clipping markers which are located far from the iterated locus.

2. Clipping less informative markers.

3. Clipping one of two markers which are located very close to each other.
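
The three heuristics above can be sketched as a single filtering pass over the marker list. This is only an illustrative sketch, not SUPERLINK's actual code: the Marker type, the clip_markers function, the three threshold parameters, and the use of a single "informativeness" score per marker are all hypothetical. In particular, the choice to keep the more informative of two close markers is an assumption; the project description does not say which of the two is clipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical marker record used only for this sketch."""
    name: str
    position: float         # map position (assumed to be in cM)
    informativeness: float  # assumed score in [0, 1], e.g. heterozygosity


def clip_markers(markers, locus_position, max_distance,
                 min_informativeness, min_spacing):
    """Return the markers that survive the three clipping heuristics.

    All thresholds are hypothetical parameters standing in for the
    project's constants.
    """
    # Heuristic 1: clip markers located far from the iterated locus.
    kept = [m for m in markers
            if abs(m.position - locus_position) <= max_distance]

    # Heuristic 2: clip less informative markers.
    kept = [m for m in kept if m.informativeness >= min_informativeness]

    # Heuristic 3: of two markers located very close to each other,
    # clip one (assumption: the less informative of the pair).
    kept.sort(key=lambda m: m.position)
    result = []
    for m in kept:
        if result and m.position - result[-1].position < min_spacing:
            if m.informativeness > result[-1].informativeness:
                result[-1] = m
        else:
            result.append(m)
    return result
```

For example, with a locus at position 11, a marker at position 60 would be clipped by the first heuristic when max_distance is 30, and of two markers at positions 12 and 12.5 only one would be kept when min_spacing is 1.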


Based on all the experiments performed with the heuristic methods above, a Final Algorithm was devised that combines all three methods to reduce the run time of the program. The point in the interval at which the maximum LOD score is obtained does not change when the approximate program is activated. The error rate of the LOD score depends on how well the values of the Const Definitions match the current input file. We assume that the values we defined for the constants are optimal, i.e. that they minimize the LOD error rate, but the user is invited to try different values.
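
One way to check both claims on a given input file is to compare the LOD curve of the exact run with that of the approximate run. The sketch below is an assumption-laden illustration: the function names are hypothetical, the LOD scores are taken to be plain lists evaluated at the same positions, and the "error rate" is defined here as the relative error of the maximum LOD score, which may differ from the definition used in the project.

```python
def peak_position(positions, lod_scores):
    """Return the position at which the LOD score is maximal."""
    best = max(range(len(lod_scores)), key=lambda i: lod_scores[i])
    return positions[best]


def lod_error_rate(exact_scores, approx_scores):
    """Relative error of the approximate maximum LOD score.

    Assumed definition: |max(approx) - max(exact)| / |max(exact)|.
    """
    exact_max = max(exact_scores)
    if exact_max == 0:
        raise ValueError("exact maximum LOD score is zero")
    return abs(max(approx_scores) - exact_max) / abs(exact_max)


def same_peak(positions, exact_scores, approx_scores):
    """True if both runs place the maximum LOD score at the same point."""
    return (peak_position(positions, exact_scores)
            == peak_position(positions, approx_scores))
```

A user trying different values for the constants could rerun the approximate program, confirm that same_peak still holds, and keep the values that give the smallest lod_error_rate.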