IBDmap

We implemented the factorial-HMM-based algorithm in a Java program called IBDmap. The implementation supports several models, including the standard model and the 4-track model, which uses a first-order Markov model for the LD process in a subset of the founders.

Execution

The program is distributed as a ".jar" file and is run as follows:

java -cp IBDmap.jar Experiments.ExperimentWithConfiguration example.parameters

where example.parameters is the input parameters file for the run. The parameters file contains <key>:<value> pairs, as detailed in the following section. A working example is included in the package.
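
The <key>:<value> layout described above can be illustrated with a small reader. This is only a sketch, not IBDmap's own parser: the class name ParameterFileSketch, the one-pair-per-line layout, the skipping of blank lines, and the choice to split at the first ':' are all assumptions, and the file names in the example are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of IBDmap: reads "<key>:<value>" lines into a map.
final class ParameterFileSketch {

    // Assumption: one pair per line. Each non-blank line is split at the FIRST ':'
    // so that a value may itself contain ':' (for example, a drive letter in a path).
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> parameters = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines are skipped (an assumption)
            }
            int separator = trimmed.indexOf(':');
            if (separator < 0) {
                throw new IllegalArgumentException("Expected <key>:<value>, got: " + line);
            }
            String key = trimmed.substring(0, separator).trim();
            String value = trimmed.substring(separator + 1).trim();
            parameters.put(key, value);
        }
        return parameters;
    }

    public static void main(String[] args) {
        // Keys come from the Parameters section; values and file names are placeholders.
        Map<String, String> parameters = parse(List.of(
                "method:0",
                "InferenceGenerations:4",
                "seed:12345",
                "GenotypesFN:genotypes.txt",
                "OutputPosterior:true"));
        System.out.println(parameters);
    }
}
```

The working example shipped with the package remains the authoritative reference for the exact syntax.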

To enable plotting, add jfreechart-1.0.13.jar and jcommon-1.0.16.jar to the classpath.

Parameters

Parameter             Values      Description
method                Integer     Method of inference:
                                    -1  Simulation only
                                     0  Standard model
                                     1  2-track model
                                     2  4-track model
Generations           Integer     Number of generations used in the simulation process (must be ≥ 2).
InferenceGenerations  Integer     Number of generations assumed during inference (must be ≥ 2).
seed                  Integer     Number used to initialize the random seed. Set it to allow exact reproduction of results.
MarkerMapFN           String      Path to marker-map file.
GeneticMapFN          String      Path to genetic map file.
PriorFN               String      Path to marker prior distribution file.
ConditionalFN         String      Path to first-order LD tables file.
HaplotypesFN          String      Path to haplotypes file, used in the simulation process.
OutputPosterior       true/false  Output the posterior probability to the screen.
PlotResults           true/false  Display the posterior probability as a graph.
SimulationBaseFN      String      Simulation output file name, in IBDmap format. Use with method set to simulation mode (-1).
MerlinBaseFN          String      Simulation output file name, in Merlin format. Use with method set to simulation mode (-1).
PlinkBaseFN           String      Simulation output file name, in Plink format. Use with method set to simulation mode (-1).
GenotypesFN           String      Path to genotypes file.
SourcesFN             String      Path to true IBD status file.
InjectionLocation     Real        Inject an IBD segment into the simulated data:
                                    -2  No injection
                                    -1  Random
                                    ≥0  Location in cM
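
The value constraints listed above can be checked before a run. The following is a minimal sketch under stated assumptions: the class ParameterChecksSketch and its method are hypothetical and not part of IBDmap, and it assumes that InjectionLocation accepts only -2, -1, or a non-negative position, which is how the table reads.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of IBDmap: checks the constraints stated in the parameter table.
final class ParameterChecksSketch {

    // Returns one message per violated constraint; an empty list means all checks passed.
    static List<String> validate(int method, int generations,
                                 int inferenceGenerations, double injectionLocation) {
        List<String> problems = new ArrayList<>();
        if (method < -1 || method > 2) {
            problems.add("method must be -1, 0, 1 or 2");
        }
        if (generations < 2) {
            problems.add("Generations must be >= 2");
        }
        if (inferenceGenerations < 2) {
            problems.add("InferenceGenerations must be >= 2");
        }
        // Assumption: -2 (no injection), -1 (random) and any location >= 0 cM are the only valid values.
        boolean validInjection = injectionLocation == -2.0
                || injectionLocation == -1.0
                || injectionLocation >= 0.0;
        if (!validInjection) {
            problems.add("InjectionLocation must be -2, -1 or >= 0");
        }
        return problems;
    }

    public static void main(String[] args) {
        // A standard-model run with 4 generations and no injected segment.
        System.out.println(validate(0, 4, 4, -2.0));
        // An invalid combination, to show the reported messages.
        System.out.println(validate(3, 1, 4, -1.5));
    }
}
```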

File formats

The following subsections describe the file formats. In multi-column formats, the columns are separated by whitespace. Markers are assumed to be diallelic, with one allele denoted '0' and the alternative allele denoted '1'.

Marker map

This file lists the markers to be analyzed, along with their chromosome number and physical position:

Column      Values   Description
ID          String   Marker unique identifier.
Chromosome  Integer  Chromosome number.
Position    Integer  Physical position in base pairs.

Genetic map

This file contains the list of genetic positions for markers, in cM.

Column    Values  Description
ID        String  Marker unique identifier.
Position  Real    Genetic position in cM.

Prior tables

This file contains the prior probability of the '0' allele for each analyzed marker.

Column       Values  Description
ID           String  Marker unique identifier.
Probability  Real    Probability of the '0' allele.

Conditional

This file contains the first-order LD tables for analyzed markers.

Column        Values  Description
ID            String  Marker unique identifier.
Probability1  Real    Probability of '0' allele, given previous '0' allele.
Probability2  Real    Probability of '0' allele, given previous '1' allele.
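
To show how the prior and conditional tables fit together, the sketch below computes the probability of one haplotype under a first-order Markov chain: the first allele is drawn from the prior, and each later allele is drawn from the conditional row of its marker given the allele at the preceding marker. This is a generic illustration, not IBDmap's code. The class and method names are hypothetical, and it assumes that "previous" refers to the preceding marker in file order.

```java
// Hypothetical helper, not part of IBDmap: first-order Markov probability of a haplotype.
final class FirstOrderLdSketch {

    // alleles[k]   : allele (0 or 1) at marker k
    // prior0[k]    : P(allele k = 0), as in the prior file
    // p0Given0[k]  : P(allele k = 0 | allele k-1 = 0), Probability1 in the conditional file
    // p0Given1[k]  : P(allele k = 0 | allele k-1 = 1), Probability2 in the conditional file
    // Entries at index 0 of the conditional arrays are not used.
    static double haplotypeProbability(int[] alleles, double[] prior0,
                                       double[] p0Given0, double[] p0Given1) {
        double probability = alleles[0] == 0 ? prior0[0] : 1.0 - prior0[0];
        for (int k = 1; k < alleles.length; k++) {
            double pZero = alleles[k - 1] == 0 ? p0Given0[k] : p0Given1[k];
            probability *= alleles[k] == 0 ? pZero : 1.0 - pZero;
        }
        return probability;
    }

    public static void main(String[] args) {
        // Made-up numbers for three markers; they do not come from any real data set.
        int[] alleles = {0, 1, 1};
        double[] prior0 = {0.6, 0.5, 0.5};
        double[] p0Given0 = {0.0, 0.7, 0.5};
        double[] p0Given1 = {0.0, 0.2, 0.4};
        // Computes 0.6 * (1 - 0.7) * (1 - 0.4).
        System.out.println(haplotypeProbability(alleles, prior0, p0Given0, p0Given1));
    }
}
```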

Haplotypes

This file contains haplotypes, mainly used in the simulation process.

Column     Values  Description
ID         String  Marker unique identifier.
Allele(i)  0/1     An allele for the ith haplotype.

Genotypes

This file contains genotypes, mainly used in the IBD inference process. The header lists the individuals (String), and each subsequent line contains the genotype information for one marker, in the same order as the header:

Column       Values     Description
ID           String     Marker unique identifier.
Genotype(i)  -1/0/1/2   '1' allele count of the ith individual. An unknown value is marked with -1.
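
As an illustration of the genotype encoding, the sketch below parses one data line and shows how a genotype value relates to a pair of 0/1 haplotype alleles. The class GenotypeRowSketch, its method names, and the marker ID in the example are hypothetical; this is not IBDmap's reader.

```java
import java.util.Arrays;

// Hypothetical helper, not part of IBDmap: reads one genotype line and derives allele counts.
final class GenotypeRowSketch {

    static final int UNKNOWN = -1;

    // Number of '1' alleles carried by an individual whose two haplotype alleles are given (each 0 or 1).
    static int alleleCount(int firstAllele, int secondAllele) {
        boolean firstOk = firstAllele == 0 || firstAllele == 1;
        boolean secondOk = secondAllele == 0 || secondAllele == 1;
        if (!firstOk || !secondOk) {
            throw new IllegalArgumentException("Alleles must be 0 or 1");
        }
        return firstAllele + secondAllele;
    }

    // Parses one data line: a marker ID followed by one whitespace-separated genotype per individual.
    static int[] parseGenotypes(String line) {
        String[] fields = line.trim().split("\\s+");
        int[] genotypes = new int[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            int genotype = Integer.parseInt(fields[i]);
            if (genotype < UNKNOWN || genotype > 2) {
                throw new IllegalArgumentException("Genotype must be -1, 0, 1 or 2: " + fields[i]);
            }
            genotypes[i - 1] = genotype;
        }
        return genotypes;
    }

    public static void main(String[] args) {
        // "rs123" is a placeholder marker ID; the third individual has an unknown genotype.
        System.out.println(Arrays.toString(parseGenotypes("rs123 0 2 -1 1")));
        System.out.println(alleleCount(0, 1));
    }
}
```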

Sources (Genotypes IBD)

This file corresponds to a specific Genotypes file and is produced during simulation. The header lists the individuals (String), and each subsequent line contains the source information for one marker, in the same order as the header:

Column     Values   Description
ID         String   Marker unique identifier.
Source(i)  Integer  Source of the corresponding allele. Values of 0 or 1 correspond to the common ancestor, and any other value denotes an off-chain founder. When individuals have the same source (both 0 or both 1), the corresponding marker location is considered IBD.
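
The IBD rule stated above can be written as a short check. The sketch assumes one source value per individual per marker and compares two individuals marker by marker; the class SourcesSketch and its method names are hypothetical and not part of IBDmap.

```java
import java.util.Arrays;

// Hypothetical helper, not part of IBDmap: applies the IBD rule from the Sources description.
final class SourcesSketch {

    // Two individuals are IBD at a marker when their sources are equal and that
    // source is one of the common ancestor's values (0 or 1).
    static boolean isIbd(int sourceA, int sourceB) {
        return sourceA == sourceB && (sourceA == 0 || sourceA == 1);
    }

    // Applies the rule at every marker; both arrays must cover the same markers in the same order.
    static boolean[] ibdAlongMarkers(int[] sourcesA, int[] sourcesB) {
        if (sourcesA.length != sourcesB.length) {
            throw new IllegalArgumentException("Both individuals must cover the same markers");
        }
        boolean[] ibd = new boolean[sourcesA.length];
        for (int k = 0; k < sourcesA.length; k++) {
            ibd[k] = isIbd(sourcesA[k], sourcesB[k]);
        }
        return ibd;
    }

    public static void main(String[] args) {
        // Made-up sources for two individuals over four markers; 3 stands for an off-chain founder.
        int[] first = {0, 0, 3, 1};
        int[] second = {0, 1, 3, 1};
        System.out.println(Arrays.toString(ibdAlongMarkers(first, second)));
    }
}
```

Equal sources that are neither 0 nor 1 are not counted as IBD here, following the rule as written.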