__NPL Analysis:
finding NPL scores Spairs and Sall.__

__Algorithms and complexity.__

**The main idea** of the implementation is to go over all possible inheritance
vectors in special order and for each vector to find Spairs
and Sall scores. We use “gray code” to define the
order of looping (over inheritance vectors):

Example of “gray code” for vector containing 3 bits: 000, 001, 011, 010,
110, 111, 101, 100. This order is very useful, because only 1 bit in vector is
changed per iteration, and in the case when the inheritance bit is changed in
the person which is not affected or has no affected children, NPL score (Spairs or Sall) doesn’t depend on
this bit, so we can just copy the scores from the previous iteration.

This order is also used in the computation of Sall(v) for some vector v:
we collect alleles from the affected individuals, one allele from each
individual, get some collection and compute it’s score; when we want to get the
next collection, we change 1 bit for some individual (in the order of “gray
code”) and use the saved score of the previous collection to compute in O(1)
score for current collection.

Here is more detailed description of the algorithm.

**The main steps** in the computation of Spairs
and Sall scores:

Let’s assume:

F – number
of founders in the pedigree

N – number
of non-founders in the pedigree

A – number
of affected individuals in the pedigree

SizeV = 2*N-F– the size
of inheritance vector

PossibilitiesV = 2^SizeV – number of inheritance
vectors

**1)** first
of all, we need to initiate all the relevant data structures for the further
computation:

- the array “ped” of the individuals in current pedigree, where indexes
of parents are smaller than indexes of their children (parents appear to the
left of their children in “ped” – topological sorting
of pedigree graph), and for each person is stored the next information: indexes
of parents in “ped” (mother and father),
affected(true/false), allele1 and allele2 (out of 2*F founders alleles), useful
(if the person has no affected children and is not affected himself => not
useful, else useful). In this step, we initiate “ped”
by using arrays PedigreeGraph, Founders and NonFounders (they already exist in the program), and each
person in “ped” gets indexes of parents and affected status : time and memory complexity is O(F+N);

- we
go over “ped” and create the linked list of affected
persons which is then converted to array of pointers to affected persons in “ped”(Affected array): time complexity O(F+N), memory O(A);

- we
go over all founders in “ped” and for each founder
with index i we store alleles (2*i
= allele1, 2*i+1 = allele2): time complexity O(F);

- we allocate array Falleles of size 2*F, which will be used for storing the
number of times each founder allele appears in affected individuals during the
computation of NPL scores, initiated to 0 at each cell: memory and time O(F);

- we
also allocate array Aff of size A (for Sall computation), each cell with index i
in Aff will reflect the status of collecting allele
from appropriate affected individual (with index i in
Affected): 1 if allele1 is collected, 2 if allele2 is collected (all cells are
initiated to 1): memory O(A), time O(A);

- in order to initiate
“useful” field of each person in “ped”, we
iteratively go over “ped” from the last index (F+N-1)
to first and perform for each person: if he is affected => update for him
and recursively for all his ancestors “useful”=true, else go to the next
(previous in order of “ped”) person; time complexity
O(F+N);

- we allocate array Ivector (inheritance vector) of size SizeV,
in each cell store: inheritance status (0 or 1; initiated to 0), index of
person in “ped” for which this cell is responsible;
parent status for this person (mother or father), for which this cell is
responsible; last 2 fields we initiate by going through all nonfounders
in “ped” and only for those nonfounders
that are not first children of their parents, we store their bits in Ivector; all the alleles in nonfounders
bits which are first children of their parents are stored once (in “ped”, allele1 and allele2) and will be never changed
(assuming inheritance pattern 0 at corresponding bits), for the rest of nonfounders – their alleles are stored using the
inheritance pattern defined by Ivector (and will be
changed at each iteration of the main loop, for each inheritance vector): time
complexity O(N), memory O(SizeV);

Overall time and memory complexity of step 1: O(F+N).

Now we are ready to perform the computation of NPL scores, step 2.

**2)** As it was mentioned above, in the main loop of this step we go over all
inheritance vectors in the “gray code” order, in each
iteration we also remember the values of Spairs
and Sall from the previous iteration. Before
computing new scores we check if the current bit of Ivector
to be changed corresponds to useful person or not: if not => we change this
bit and copy the previous values of Sall and Spairs to the current scores, and go to the next vector. If
the person is useful => we change the bit, rearrange alleles in “ped” according to new inheritance vector (this may take
O(F+N) time) and compute Spairs(v) and Sall(v) separately (for the current inheritance vector v):

__Computing Spairs(v)__**: **intiation: Spairs(v) = 0;** **we go through all affected persons
using Affected array and for each allele (1 and 2) of person i we add to Spairs(v) Falleles[allele] and then increase Falleles[allele]++;
this yields exactly the number of pairs of persons that share some founder
alleles (IBD, because each founder allele is designated with unique number).

Time complexity: O(A).

__Computing Sall(v)__**: ** initiation: Sall(v)
= 1; Sum =0;

For each affected person i from Affected we collect allele1 (by setting Aff[i] = 1 and increasing Falleles[i.allele1]++),
and multiply Sall(v) <- Sall(v)*Falleles[i.person];

Time complexity of initiation:
O(A);

This yelds
initial Sall(v) score ( = Pi (Falleles[i]!); Sum = Sall(v);

Now we iterate through all
possible collections exactly the same way as we did with vectors in main loop
using “gray code” order, and for each collection bit i
to be changed we perform (assuming that the score from the previous iteration
is Sall(v)):

If Aff[i] = 1 (then now we need to pick up allele2 of person i instead of allele1) :

Aff[i]
= 2;

We divide Sall(v) by the number of times that allele1 of person i appears in Falleles (by Falleles[Affected[i].allele1]);

Decrease Falleles[Affected[i].allele1]-- (because now we pick up allele2 instead of
allele1);

Increase Falleles[Affected[i].allele2]++;

We multiply Sall(v)
by the number of times that allele2 of person i
appears in Falleles (Falleles[Affected[i].allele2]);

Else we do exactly the same
but allele1 and allele2 are now changed in their places.

We add Sall(v) to Sum;

Each iteration takes O(1) time.

At the end of loop : Sall(v) = Sum/(2^A);

The overall time complexity of
computing Sall(v) is O(A) + (number of iterations)*O(1) = O(A)+O(2^A) =
O(2^A);

The overall time complexity of step 2:

(number of inheritance vectors)*( O(F+N) + O(A) + O(2^A) ) = O(PossibilitiesV*(F+N+A+2^A)).
Memory complexity of this step is equal to the memory complexity of step 1 = O(F+N) , we just use structures which were initialized in
step 1.

So the memory complexity of the whole algorithm is O(F+N) and time
complexity is O(PossibilitiesV*(F+N+A+2^A)) = O(
2^(A+2*N-F) + (F+N+A)*2^(2*N-F) ). By the definition of “O” it can be said that
time complexity is O( 2^(A+2*N-F) ) only, but in practice O( (F+N+A)*2^(2*N-F)
) can be also meaningful.