Project in Bioinformatics

                            

            Visualization of SNP  Output

            Spring 2005, under guidance of

                                                                    Prof. Dan Geiger

                                                                    Anna Tzemach

                    Yulia Stolin

       by        Sabbagh Wissam

 

 

Linkage analysis aims to extract all the available inheritance information from pedigrees and to test for coinheritance of chromosomal regions with a trait. In principle, one can use either parametric methods, which test whether the inheritance pattern fits a specific model for a trait-causing gene, or nonparametric methods, which test whether the inheritance pattern deviates from what is expected under independent assortment.

 

A single nucleotide polymorphism (SNP) is a small change in a DNA sequence. In fact, it involves only a single chemical change. SNPs by definition don't cause health problems for the people who have them, but they can be useful when studying diseases in the general population, as they are natural variations that we all have.

 

Currently available chips carry 100K SNPs. Superlink (a program that performs linkage analysis) can handle such data, but it is becoming impossible for a user to make sense of up to 1M lines of output.

 

Goal

 

Implement a parser for the Superlink output, and implement the visualization itself.

 

TWO PROGRAMS

 

The project includes two programs: the first uses the OpenGL library and the second uses gnuplot.

Two programs were written because OpenGL creates new windows, so a separate program was needed to run in bash mode (without opening windows). The regular-mode program provides interactive visualization features such as zooming, overlays and multiple graphs.

The bash-mode program uses gnuplot: it draws the graphs and exports them to PostScript files. It is Unix/Linux dependent, while the OpenGL program is cross-platform and can run on the major platforms such as Windows, Unix, Linux and Mac.

Despite the graphical differences between the two programs, the algorithm and the data structures are almost the same, with small changes, so the documentation below covers both programs.

 

Data Structures

 

Both programs use the following data structures:

 

MarkerInfo

 

typedef struct MarkerInfo

{

            int size;

            double *loc;

            double *LN_LIKE;

            double *LOD_SCORE;

            char *marker_name;

} MarkerInfo ;

 

This data structure holds the marker information:

loc : array of doubles holding the distances from the first marker.

LN_LIKE : array of doubles holding the LN(LIKE) values.

LOD_SCORE : array of doubles holding the LODSCORE values.

marker_name : the marker's name.

size : the number of elements in each of the arrays loc, LN_LIKE and LOD_SCORE.
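
The three arrays are parallel and share one length, so they are naturally allocated and released together. The following is only a minimal sketch of that idea: the helper names marker_info_init and marker_info_free are hypothetical and do not come from the project sources, and the requirement that size be positive is an assumption.

```c
#include <stdlib.h>
#include <string.h>

typedef struct MarkerInfo {
    int size;
    double *loc;
    double *LN_LIKE;
    double *LOD_SCORE;
    char *marker_name;
} MarkerInfo;

/* Hypothetical helper: release everything owned by m and reset its fields. */
void marker_info_free(MarkerInfo *m)
{
    free(m->loc);
    free(m->LN_LIKE);
    free(m->LOD_SCORE);
    free(m->marker_name);
    m->loc = m->LN_LIKE = m->LOD_SCORE = NULL;
    m->marker_name = NULL;
    m->size = 0;
}

/* Hypothetical helper: allocate the three parallel arrays (zero-filled)
   and copy the marker name. Assumes size > 0.
   Returns 0 on success, -1 on bad size or allocation failure. */
int marker_info_init(MarkerInfo *m, const char *name, int size)
{
    if (size <= 0)
        return -1;
    m->size = size;
    m->loc = calloc((size_t)size, sizeof *m->loc);
    m->LN_LIKE = calloc((size_t)size, sizeof *m->LN_LIKE);
    m->LOD_SCORE = calloc((size_t)size, sizeof *m->LOD_SCORE);
    m->marker_name = malloc(strlen(name) + 1);
    if (!m->loc || !m->LN_LIKE || !m->LOD_SCORE || !m->marker_name) {
        marker_info_free(m);
        return -1;
    }
    strcpy(m->marker_name, name);
    return 0;
}
```

A parser could call marker_info_init once it knows how many data points a marker has, fill the arrays index by index, and call marker_info_free when the graph is closed.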

 

Ranges

typedef struct Ranges

{

            double LeftPoint;

            double RightPoint;

            double LN_LIKE_Bottom;

            double LN_LIKE_Top;

            double LOD_SCORE_Bottom;

            double LOD_SCORE_Top;

} Ranges;

 

Ranges holds the boundary values of loc, LN_LIKE and LODSCORE that are currently visible.

This means that a point is plotted on the screen only if it lies in the area

[LeftPoint, RightPoint]  X [LN_LIKE_Bottom, LN_LIKE_Top]

or  [LeftPoint, RightPoint]  X [LOD_SCORE_Bottom, LOD_SCORE_Top]

depending on which of the two values is being drawn.
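
The visibility test described above can be sketched as a pair of small predicates. This is an illustration only: the function names in_ln_like_range and in_lod_score_range are hypothetical, and treating the interval bounds as inclusive follows the bracket notation used here rather than any confirmed detail of the implementation.

```c
typedef struct Ranges {
    double LeftPoint;
    double RightPoint;
    double LN_LIKE_Bottom;
    double LN_LIKE_Top;
    double LOD_SCORE_Bottom;
    double LOD_SCORE_Top;
} Ranges;

/* Hypothetical helper: 1 if (loc, ln_like) lies inside
   [LeftPoint, RightPoint] x [LN_LIKE_Bottom, LN_LIKE_Top], else 0. */
int in_ln_like_range(const Ranges *r, double loc, double ln_like)
{
    return loc >= r->LeftPoint && loc <= r->RightPoint &&
           ln_like >= r->LN_LIKE_Bottom && ln_like <= r->LN_LIKE_Top;
}

/* Hypothetical helper: same test for the LOD score rectangle
   [LeftPoint, RightPoint] x [LOD_SCORE_Bottom, LOD_SCORE_Top]. */
int in_lod_score_range(const Ranges *r, double loc, double lod)
{
    return loc >= r->LeftPoint && loc <= r->RightPoint &&
           lod >= r->LOD_SCORE_Bottom && lod <= r->LOD_SCORE_Top;
}
```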

 

The function screen maps the marker values into that area.

First, let us see how a point is plotted on the screen.

In my implementation the screen coordinates go from -1 to +1 on both axes:

 

 

(x=-1, y=+1)                    (x=+1, y=+1)

                 (x=0, y=0)

(x=-1, y=-1)                    (x=+1, y=-1)

 

 

LeftPoint is therefore mapped to -1 and RightPoint to +1. We look for a linear function f that satisfies

 x = LeftPoint  =>  f(LeftPoint) = -1

 x = RightPoint  =>  f(RightPoint) = +1

which gives

f(x) = 2*(x - LeftPoint)/(RightPoint - LeftPoint) - 1

and that is what the function screen does.

The same operation is applied to y, using the corresponding Bottom and Top values.
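
A linear map that meets both endpoint conditions is f(x) = 2*(x - LeftPoint)/(RightPoint - LeftPoint) - 1. The sketch below shows it in C. The name screen_map and its three-argument signature are assumptions made for illustration, since the actual signature of the project's screen function is not given here; the code also assumes the two bounds differ.

```c
/* Hypothetical stand-in for the report's `screen` function.
   Maps v linearly from [lo, hi] to [-1, +1]:
   lo -> -1, hi -> +1, midpoint -> 0. Assumes hi != lo. */
double screen_map(double v, double lo, double hi)
{
    return 2.0 * (v - lo) / (hi - lo) - 1.0;
}
```

For the horizontal axis one would call screen_map(loc, LeftPoint, RightPoint); for the vertical axis, screen_map(value, LN_LIKE_Bottom, LN_LIKE_Top) or the LOD_SCORE equivalents. Values outside [lo, hi] map outside [-1, +1], which is one way to see why such points fall off the screen.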

 

Graphs

typedef struct Graphs

{

            long int counter;

            int size;

            char InputFile[50];

            double firstMarker;

}          Graphs;

 

counter : the number of markers in the graph.

size : the size of each array in the markers (see MarkerInfo above).

InputFile : the name of the data file.

firstMarker : the location of the first marker.

 

WindowsInfo

 

typedef struct WindowsInfo

{

            int NumOfGraphs;

            Graphs AddedGraph[10];

            Ranges ranges;

} WindowsInfo;

 

My program includes the option to move from one graph to another, and to add another graph to the current one.

Therefore I used this data structure, which consists of:

NumOfGraphs : the total number of graphs in the window.

AddedGraph : an array of struct Graphs (see Graphs above).

ranges : the visible ranges (see Ranges above).

Two further global variables are current, which identifies the current window, and last, which identifies the last window.
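
Because AddedGraph has room for 10 entries and InputFile for 50 characters, adding a graph to a window needs two bounds checks. The following sketch shows one possible way to do that; window_add_graph is a hypothetical helper, not a function from the project, and leaving counter and size at zero for the parser to fill in later is an assumption.

```c
#include <string.h>

#define MAX_GRAPHS 10   /* matches the declared length of AddedGraph */

typedef struct Graphs {
    long int counter;
    int size;
    char InputFile[50];
    double firstMarker;
} Graphs;

typedef struct Ranges {
    double LeftPoint, RightPoint;
    double LN_LIKE_Bottom, LN_LIKE_Top;
    double LOD_SCORE_Bottom, LOD_SCORE_Top;
} Ranges;

typedef struct WindowsInfo {
    int NumOfGraphs;
    Graphs AddedGraph[MAX_GRAPHS];
    Ranges ranges;
} WindowsInfo;

/* Hypothetical helper: overlay one more graph on window w.
   Returns 0 on success, -1 if the window is full or the
   file name does not fit in InputFile (including the '\0'). */
int window_add_graph(WindowsInfo *w, const char *file, double firstMarker)
{
    Graphs *g;

    if (w->NumOfGraphs >= MAX_GRAPHS)
        return -1;
    if (strlen(file) >= sizeof w->AddedGraph[0].InputFile)
        return -1;

    g = &w->AddedGraph[w->NumOfGraphs];
    g->counter = 0;              /* assumed to be filled in by the parser */
    g->size = 0;                 /* assumed to be filled in by the parser */
    strcpy(g->InputFile, file);  /* safe: length was checked above */
    g->firstMarker = firstMarker;
    w->NumOfGraphs++;
    return 0;
}
```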

 

 

ZOOMING

 

Zooming was one of the important elements of this project.

The implementation maps the coordinates of the points into the window's area, so that only points inside this area are plotted.

The function that does this is linear: zooming in shrinks the area, and zooming out enlarges it.

The area is a rectangle. Let (xb,yb), (xb,ye), (xe,yb), (xe,ye) be its corner points (xb < xe, yb < ye); each plotted point (x,y) must satisfy xb < x < xe and yb < y < ye. There are 4 options of zooming:

 

Note that zooming acts only on these corner points and is not centered on a specific point of the graph. To complete the zoom operation, I therefore added the option of moving the axes horizontally and vertically with the arrow keys, so that the user can bring the region of interest into view; this is also implemented by shifting the intervals [xb,xe] and [yb,ye].

The range information is stored in the Ranges data structure, so when the user zooms in or out, the Ranges structure is updated accordingly.
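
Since both zooming and arrow-key movement come down to updating the stored intervals, they can be sketched as two small operations on Ranges. This is only an illustration for the horizontal axis: the names ranges_zoom_x and ranges_pan_x are hypothetical, and scaling the interval about its midpoint and panning by a fraction of the visible width are assumptions, not the documented behaviour of the program.

```c
typedef struct Ranges {
    double LeftPoint;
    double RightPoint;
    double LN_LIKE_Bottom;
    double LN_LIKE_Top;
    double LOD_SCORE_Bottom;
    double LOD_SCORE_Top;
} Ranges;

/* Hypothetical helper: scale [LeftPoint, RightPoint] about its midpoint.
   factor < 1 shrinks the interval (zoom in),
   factor > 1 enlarges it (zoom out). */
void ranges_zoom_x(Ranges *r, double factor)
{
    double mid  = 0.5 * (r->LeftPoint + r->RightPoint);
    double half = 0.5 * (r->RightPoint - r->LeftPoint) * factor;

    r->LeftPoint  = mid - half;
    r->RightPoint = mid + half;
}

/* Hypothetical helper: shift the interval by a fraction of its width.
   A positive fraction moves the view to the right, a negative one
   to the left; the width itself is unchanged. */
void ranges_pan_x(Ranges *r, double fraction)
{
    double shift = (r->RightPoint - r->LeftPoint) * fraction;

    r->LeftPoint  += shift;
    r->RightPoint += shift;
}
```

The vertical direction would work the same way on the Bottom and Top fields. After either call, points are mapped to the screen again with the new bounds, so no per-point state has to change.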

 

Arguments of the program:

 

The program can be run from the command line with the following arguments:

<program name> -f  <file1>  <start1>  ...  <fileN>  <startN>  [-o output file]

 

<file1> .. <fileN> are the input files.

<start1> .. <startN> are the locations of first markers.

output file can be a pdf or ps file. It is optional; if it is not supplied, the output file name is out.pdf.
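
The argument layout above can be parsed with a short loop over argv. The sketch below is one possible approach, not the program's actual parser: the Options struct, the parse_args function and the limit of 10 input files (chosen to mirror AddedGraph[10]) are all assumptions introduced for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_INPUTS 10   /* assumption: mirrors AddedGraph[10] */

/* Hypothetical container for the parsed command line. */
typedef struct Options {
    int n;                          /* number of <file> <start> pairs */
    const char *files[MAX_INPUTS];  /* input file names               */
    double starts[MAX_INPUTS];      /* location of each first marker  */
    const char *output;             /* output file, default out.pdf   */
} Options;

/* Hypothetical parser for:
     <program name> -f <file1> <start1> ... <fileN> <startN> [-o output file]
   Returns 0 on success, -1 on a malformed command line. */
int parse_args(int argc, char **argv, Options *opt)
{
    int i = 1;

    opt->n = 0;
    opt->output = "out.pdf";

    if (argc < 4 || strcmp(argv[i], "-f") != 0)
        return -1;
    i++;

    /* Consume <file> <start> pairs until "-o" or the end of argv. */
    while (i < argc && strcmp(argv[i], "-o") != 0) {
        char *end;

        if (i + 1 >= argc || opt->n >= MAX_INPUTS)
            return -1;
        opt->files[opt->n] = argv[i];
        opt->starts[opt->n] = strtod(argv[i + 1], &end);
        if (end == argv[i + 1] || *end != '\0')
            return -1;              /* <start> is not a number */
        opt->n++;
        i += 2;
    }

    if (i < argc) {                 /* argv[i] is "-o" */
        if (i + 2 != argc)
            return -1;              /* exactly one name must follow */
        opt->output = argv[i + 1];
    }
    return opt->n > 0 ? 0 : -1;
}
```

With this sketch, a command line such as prog -f a.txt 0.5 b.txt 12 -o res.ps yields two inputs and the output name res.ps, while omitting -o leaves the default out.pdf in place.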