Program Options

Superlink is software for performing exact genetic linkage analysis in general pedigrees. Several different programs for running Superlink exist. The desired program is selected by writing the appropriate program code in the first line of the Locus File. Each program also requires specific input lines in the last few lines of the Locus File.

There are currently ten program options available, coded 4, 5, 8, 9, 10, 12, 13, 21, 31, and 40. Program options 10, 12, and 13 were introduced in Version 1.4. Program codes 21, 31, and 40 were introduced in Version 1.5 for use by programmers. Six programs compute lod-scores and likelihoods at various locations. They differ in the exact locations where these evaluations take place, in whether or not the program seeks the maximum likelihood, and in the specification format of the input/output. Program option 10 performs maximum likelihood haplotyping.

• SuperLinkmap program (program code 4):

This program computes likelihood and lod-scores of one iterated locus, usually the disease locus, against a fixed map of other loci. The program moves the iterated locus within its interval on the marker map.  This interval is either an exterior interval on one  side of the map, or an interior interval.

The interval is divided into segments of equal length according to the number of likelihood evaluations requested by the user. The length of the interval is computed via Haldane's mapping function for interior intervals, as explained below, and is set to 0.5 for exterior intervals. Likelihood evaluations are made by moving the iterated locus from left to right in its interval, using an increment equal to the interval's computed length divided by the requested number of likelihood evaluations minus one.

That is, if A and C are the flanking markers of the interior interval and B is the locus that is being moved, then the length theta(AC) is calculated as follows according to Haldane's map function:

theta(AC) = theta(AB) + theta(BC) - 2 * theta(AB) * theta(BC)

In each iteration, theta(AB) is increased by the calculated increment and theta(BC) is computed via the above formula, keeping theta(AC) constant.

To match the specification of Linkage/Fastlink, we have slightly modified the calculations just described. This is best explained via an example. Suppose the specified map distance between loci A and B is 0.1 and between B and C is also 0.1. Then the length of the interval theta(AC) is 0.18. If the user requests 4 likelihood calculations, the increment is set to 0.18/(4-1)=0.06. The starting point, however, is 0.1, namely, the specified distance between loci A and B.

The user must also specify the maximal distance allowed between Locus A and Locus B. If this distance equals or exceeds the interval's length, 0.18 in our example, then the program computes the likelihood at positions 0.1 and 0.16 (the next position, 0.22, would lie beyond the interval). If the distance is smaller than the interval's computed length, then the program computes the likelihood so that the final position does not exceed the specified distance. E.g., if the user specifies 0.15, then the program evaluates the likelihood only at position 0.1, in our example.

To use the program so that all requested likelihood computations are performed, the user should specify a distance of 0 between Locus A and Locus B and a distance of, say, 0.18 between Locus B and Locus C, and the maximal distance should be set to 0.18 (or larger). In this case, likelihood computations are performed for locations 0, 0.06, 0.12, 0.18.
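The arithmetic of the examples above can be reproduced with a short Python sketch. It is not part of Superlink; the function names are hypothetical, and the rounding used to absorb floating-point drift is an assumption of this sketch only.

```python
def haldane_interval(theta_ab, theta_bc):
    """theta(AC) from theta(AB) and theta(BC), using the formula given above."""
    return theta_ab + theta_bc - 2.0 * theta_ab * theta_bc


def linkmap_positions(theta_ab, theta_bc, finishing_value, number_of_evaluations):
    """Return (theta_AB, theta_BC) pairs at which the likelihood would be evaluated.

    Hypothetical helper for an interior interval; it only mirrors the
    description in the text, not Superlink's actual source code.
    """
    if number_of_evaluations < 2:
        raise ValueError("at least two evaluations are needed to define an increment")
    theta_ac = haldane_interval(theta_ab, theta_bc)
    increment = theta_ac / (number_of_evaluations - 1)
    # The scan stays within both the interval length and the user's limit.
    limit = min(theta_ac, finishing_value)
    positions = []
    for k in range(number_of_evaluations):
        ab = round(theta_ab + k * increment, 10)  # rounding absorbs float drift
        if ab > limit + 1e-12:
            break
        # Solve theta_AC = ab + bc - 2*ab*bc for bc, keeping theta_AC constant.
        bc = (theta_ac - ab) / (1.0 - 2.0 * ab)
        positions.append((ab, round(bc, 10)))
    return positions


if __name__ == "__main__":
    for args in [(0.1, 0.1, 0.18, 4), (0.1, 0.1, 0.15, 4), (0.0, 0.18, 0.18, 4)]:
        print(args, [ab for ab, _ in linkmap_positions(*args)])
```

With the three settings discussed in the text, the sketch lists the theta(AB) values 0.1 and 0.16, then 0.1 alone, then 0, 0.06, 0.12 and 0.18.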

The input line is:     locus_varied    finishing_value    number_of_evaluations

The parameter locus_varied is the index, in input order, of the iterated locus. The parameter finishing_value is the maximal distance allowed, measured in recombination fraction (RF), between the iterated locus and the one following it for a left-side interval, and between the iterated locus and the one preceding it for any other interval. The parameter number_of_evaluations is used to compute the number of points for likelihood evaluations in the interval as explained above.

• SuperMlink program (program code 5):

This program computes likelihood and lod-scores of one iterated locus, usually the disease locus, against a fixed map of other loci, under two assumptions: the iterated locus is leftmost on the map and all markers are completely linked (i.e., on an interval of length smaller than 0.5 RF).  The program evaluates the likelihood at the initial recombination values and then moves the iterated locus within its interval on the marker map using an increment given by the user until it reaches or passes the final recombination value specified by the user. At each increment the program evaluates the likelihood.

The input line is:      recombination_varied     increment     finishing_value

The parameter recombination_varied is the index of the recombination fraction to be varied (should be 1). The parameter increment  is the requested increment  (in RF) for the recombination fraction. The parameter finishing_value is the maximal value  (in RF) for the varied recombination fraction.
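As a rough illustration of the sweep described above, the following Python sketch lists the recombination fractions that would be visited. The function name is hypothetical. The text does not say whether a value that passes finishing_value is itself evaluated, so the sketch exposes that choice as an explicit flag (an assumption, not documented Superlink behaviour).

```python
def mlink_thetas(initial_theta, increment, finishing_value, include_overshoot=False):
    """List the recombination fractions visited by a code-5 style sweep.

    Hypothetical helper. include_overshoot controls whether a value that
    passes finishing_value is still reported; the documentation leaves
    this open, so both readings are supported here.
    """
    if increment <= 0:
        raise ValueError("increment must be positive")
    thetas = []
    k = 0
    while True:
        theta = round(initial_theta + k * increment, 10)  # absorb float drift
        if theta > finishing_value and not include_overshoot:
            break
        thetas.append(theta)
        if theta >= finishing_value:
            break  # reached or passed the finishing value
        k += 1
    return thetas


if __name__ == "__main__":
    print(mlink_thetas(0.0, 0.1, 0.4))
    print(mlink_thetas(0.0, 0.15, 0.4))
    print(mlink_thetas(0.0, 0.15, 0.4, include_overshoot=True))
```

For an initial value of 0, an increment of 0.1 and a finishing value of 0.4, the sketch visits 0, 0.1, 0.2, 0.3 and 0.4.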

Remark: SuperLinkmap (program code 4) can be used instead of the more restricted SuperMlink program (code 5). Code 5 is implemented and supported to match the inputs of other linkage programs. From a user perspective, the main difference is that code 4 takes the number of iterations as input, while code 5 takes the size of the increment. Computationally, the main difference is that code 4 changes two values of theta at each iteration within an interior interval, while code 5 changes only one theta (and is therefore appropriate only for exterior intervals).

• SuperGH program (program code 8):

This program computes likelihood and lod-scores of the disease locus/loci against a fixed map of markers. There can be either one disease locus or two. In case of two disease loci, they can be either linked on the same chromosome or unlinked (say, on 2 different chromosomes).

• In case of one disease locus, the program moves it within all intervals included in the specified region of the map. The positions of the disease locus throughout the scan can be specified either by specifying the size of the step between each two successive positions of the disease locus or by specifying the number of equally spaced positions between each two adjacent markers. The program computes and outputs likelihood and lod-scores for each position of the disease locus.
• In case of two disease loci, they can both be iterated or only one of them. If both disease loci are to be iterated, the program moves them to all the possible combinations of a possible position for the first locus and a possible position for the second locus. The requested positions for each iterated locus can be specified in either one of the two ways described above.

The input lines are:

• for one iterated disease locus:
1
<-s   step_iter   or   -n  num_iter>         start_pos        end_pos      <-o    off_map>

• for two iterated disease loci:
2
<-s  step_iter_1   or   -n  num_iter_1>      start_pos_1       end_pos_1    <-o    off_map_1>
<-s  step_iter_2   or   -n  num_iter_2>      start_pos_2       end_pos_2    <-o    off_map_2>

In this program disease loci that are to be iterated must appear first in the input order.

The parameters start_pos and end_pos specify the indices of the markers that delimit the interval on the map in which the iterated disease locus is moved. Each fixed locus is treated as a marker, and therefore the indices of the markers are counted starting from the first fixed locus. The positions of the iterated disease loci can be specified in one of two ways: either by using the parameter step_iter together with the flag -s, or by using the parameter num_iter together with the flag -n. The parameter step_iter specifies the size of the increment between each two successive positions of the disease locus. This increment is read as a recombination fraction (RF) if it is lower than 0.5, or as a distance in centimorgans (cM) if it is greater than 0.5. The parameter num_iter specifies the number of equally spaced positions for evaluations between each two successive markers. The parameter off_map, together with the flag -o, is used to specify a distance which extends the scan interval on both sides of the map. This parameter is optional. If it is not specified, it is assumed to be zero.
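The unit rule for step_iter can be sketched in Python as follows. The function names are hypothetical. The conversion from cM to RF uses Haldane's map function, which this document applies in program code 4; that the same function is used for step_iter here is an assumption of the sketch. The text does not cover a value of exactly 0.5, so the sketch rejects it.

```python
import math


def interpret_step(step_iter):
    """Return (unit, value) for a step_iter value, following the rule in the text."""
    if step_iter <= 0:
        raise ValueError("step_iter must be positive")
    if step_iter < 0.5:
        return ("RF", step_iter)
    if step_iter > 0.5:
        return ("cM", step_iter)
    raise ValueError("a step of exactly 0.5 is not covered by the description")


def haldane_cm_to_rf(distance_cm):
    """Haldane's map function: theta = (1 - exp(-2d)) / 2, with d in Morgans."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


if __name__ == "__main__":
    print(interpret_step(0.05))   # read as a recombination fraction
    print(interpret_step(2.0))    # read as centimorgans
    print(haldane_cm_to_rf(2.0))  # the same step expressed in RF, under Haldane
```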

Remark: In order to specify two unlinked marker maps in the data, simply enter a recombination fraction of 0.5 between the rightmost marker on the 1st chromosome and the leftmost marker on the 2nd chromosome.

• SuperBinaryIlink program (program code 9):

This program computes a maximum likelihood estimation of recombination fractions. The program finds the maximum likelihood position of the iterated locus in the interval of the map in which it is positioned, using binary search.

The input line is:     locus_varied

The parameter locus_varied  is the index, in input order, of the iterated locus.

• SuperHaplo program (program code 10; only from version 1.4):

This program finds a maximum-likelihood haplotype assignment for the individuals in the input pedigrees. No additional input is required.

• SuperOptLink program (program code 12; only from version 1.4):

This program moves the disease locus over the whole map, or a specified part of it, and finds the position which produces the maximum LOD score, using golden section search. This program has two variations:

• Variation #1: Find the position of the disease locus which produces the maximum LOD score, assuming the inheritance model specified in the input.
• Variation #2: Find the position of the disease locus which produces the maximum LOD score, once assuming dominant inheritance with 50% penetrance and once assuming recessive inheritance with 50% penetrance. Then commit to the model which produced the higher maximum LOD score and subtract 0.3 from this score to correct for multiple tests. This option is called MMLS-C (Maximized Maximum LOD score with correction).

The input lines are:
1    <-M>
start_pos        end_pos        <-o    off_map>

In this program, the disease locus that is to be iterated must appear first in the input.

The flag -M specifies that variation #2 (MMLS-C) is to be performed.

The parameters start_pos and end_pos specify the indices of the markers that delimit the interval on the map in which the iterated disease locus is moved. All loci except the specified disease locus are treated as markers with regard to this count. The parameter off_map, together with the flag -o, is used to specify a distance which extends the scan interval on both sides of the map. This parameter is optional. If it is not specified, it is assumed to be zero.
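The following Python sketch illustrates a textbook golden section search for a maximum, followed by the MMLS-C model choice and the 0.3 correction. It is not Superlink's implementation. The function names and the toy LOD curves are hypothetical, and the search assumes the curve has a single peak on the scanned interval, which real LOD curves over a marker map need not satisfy.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618


def golden_section_max(f, lo, hi, tol=1e-6):
    """Return (x, f(x)) near the maximum of a single-peaked f on [lo, hi]."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            # The maximum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # The maximum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    x = (lo + hi) / 2.0
    return x, f(x)


def mmls_c(lod_dominant, lod_recessive, lo, hi, correction=0.3):
    """Pick the model with the higher maximum LOD and subtract the correction."""
    x_dom, max_dom = golden_section_max(lod_dominant, lo, hi)
    x_rec, max_rec = golden_section_max(lod_recessive, lo, hi)
    if max_dom >= max_rec:
        return "dominant", x_dom, max_dom - correction
    return "recessive", x_rec, max_rec - correction


if __name__ == "__main__":
    # Toy single-peaked curves standing in for real LOD computations.
    dominant = lambda x: 3.0 - 50.0 * (x - 0.2) ** 2
    recessive = lambda x: 2.5 - (x - 0.3) ** 2
    print(mmls_c(dominant, recessive, 0.0, 0.5))
```

Golden section search needs only one new function evaluation per iteration, which matters when each evaluation is a full likelihood computation.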

• SuperOptMarginal program (program code 13; only from version 1.4):

This program moves the disease locus over the whole map, or a specified part of it, and finds the position of the disease locus which produces the maximum LOD score, once assuming dominant inheritance and once assuming recessive inheritance. The likelihood at a specific position of the disease locus is computed by averaging the likelihood given the specific inheritance model (dominant or recessive) over all penetrance values. The program commits to the inheritance model which produced the higher maximum LOD score, and 0.3 is subtracted from that score to correct for multiple tests. This option is called MBLOD (Maximized Bayesian LOD score).

The input lines are:
1
start_pos        end_pos        <-o    off_map>

In this program, the disease locus that is to be iterated must appear first in the input.

The parameters start_pos and end_pos specify the indices of the markers that delimit the interval on the map in which the iterated disease locus is moved. All loci except the specified disease locus are treated as markers with regard to this count. The parameter off_map, together with the flag -o, is used to specify a distance which extends the scan interval on both sides of the map. This parameter is optional. If it is not specified, it is assumed to be zero.

• Extract Bayes network in XML format (program code 21; only in version 1.5)

This program prints a Bayesian network in XML format to the file "bayesNetFile.xml". The network represents the input data after the elimination of impossible genotypes according to the pedigree and after the elimination of nodes with a single value left in the Bayesian network. No additional input is required. Check XML format for details.

• Extract Bayes network in Microsoft XBN format (program code 31; only in version 1.5)

This program prints a Bayesian network in Microsoft XBN format to the file "bayesNetFile.xbn". The network represents the input data after the elimination of impossible genotypes according to the pedigree and after the elimination of nodes with a single value left in the Bayesian network. No additional input is required. Check Microsoft XBN format for details.

• Find the elimination order of the variables in the Bayes network (program code 40; only in version 1.5)

This program prints the elimination order of the variables in the Bayesian network created from the input data. The elimination order can differ from one execution to another, because the algorithms used to find the most effective elimination order are stochastic.