Project in Bioinformatics
Visualization of SNP Output
Spring 2005, under guidance of
Prof. Dan Geiger
Linkage analysis aims to extract all the available inheritance information from pedigrees and to test for coinheritance of chromosomal regions with a trait. In principle, one can use either parametric methods, which involve testing whether the inheritance pattern fits a specific model for a trait-causing gene, or nonparametric methods, which involve testing whether the inheritance pattern deviates from expectation under independent assortment.
A single nucleotide polymorphism (SNP) is a small change in a DNA sequence: it involves only a single nucleotide. Most SNPs do not cause health problems for the people who carry them, but they are useful when studying diseases in the general population, as they are natural variations that we all have.
Currently available chips contain about 100K SNPs. Superlink (a program that performs linkage analysis) can handle such data, but it is becoming impossible for a user to make sense of up to 1M lines of output.
Goal
The goal is to implement a parser for the Superlink output and a visualization of that output.
TWO PROGRAMS
The project includes two programs: the first uses the OpenGL library and the
second uses gnuplot.
Two programs were written so that the visualization can also run in batch mode,
where OpenGL is unsuitable because it creates new windows. In regular (interactive)
mode the aim was to support graph features such as zooming, overlays and multiple graphs.
The batch-mode program uses gnuplot: it draws the graphs and exports them to
PostScript files. It is Unix/Linux dependent, while the OpenGL program is cross-platform
and runs on the major platforms such as Windows, Unix, Linux and Mac.
Despite the graphical differences between the two programs, the algorithm and the
data structures are almost the same, with small changes, so the documentation below covers both programs.
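The gnuplot-based export can be illustrated with a minimal sketch that writes a gnuplot script producing a PostScript file. The helper name write_gnuplot_script, the axis label and the assumption that the data file holds the location in column 1 and the LOD score in column 2 are all hypothetical; only the gnuplot commands themselves (set terminal postscript, set output, plot ... using ... with lines) are standard.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a gnuplot script to `out` that plots column 2
   of `data_file` against column 1 and saves the result to `ps_file`.
   The two-column layout of the data file is an assumption.
   Returns 0 on success, -1 on a bad file name or a write error. */
int write_gnuplot_script(FILE *out, const char *data_file, const char *ps_file)
{
    assert(out != NULL && data_file != NULL && ps_file != NULL);

    /* A double quote inside a name would break the quoted strings below. */
    if (strchr(data_file, '"') != NULL || strchr(ps_file, '"') != NULL)
        return -1;

    int n = fprintf(out,
                    "set terminal postscript\n"
                    "set output \"%s\"\n"
                    "set xlabel \"location\"\n"
                    "plot \"%s\" using 1:2 with lines title \"LOD score\"\n",
                    ps_file, data_file);
    return n < 0 ? -1 : 0;
}
```

The generated script can then be passed to gnuplot, for example by saving it to a file and running gnuplot on that file.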
Data Structures
The two programs include the following data structures:
MarkerInfo
typedef struct MarkerInfo
{
int size;
double *loc;
double *LN_LIKE;
double *LOD_SCORE;
char *marker_name;
} MarkerInfo ;
This data structure holds the information for one marker.
loc : array of doubles containing the distances from the first marker.
LN_LIKE : array of doubles containing the values of LN(LIKE).
LOD_SCORE : array of doubles containing the values of LODSCORE.
marker_name : the marker's name.
size : the length of the arrays loc, LN_LIKE and LOD_SCORE.
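Since the three arrays share one length, they are naturally allocated and released together. The following is a minimal sketch of that idea; the helpers marker_init and marker_free are hypothetical names and are not taken from the project's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct MarkerInfo
{
    int size;
    double *loc;
    double *LN_LIKE;
    double *LOD_SCORE;
    char *marker_name;
} MarkerInfo;

/* Hypothetical helper: release everything owned by m and reset its fields. */
void marker_free(MarkerInfo *m)
{
    assert(m != NULL);
    free(m->loc);
    free(m->LN_LIKE);
    free(m->LOD_SCORE);
    free(m->marker_name);
    m->loc = NULL;
    m->LN_LIKE = NULL;
    m->LOD_SCORE = NULL;
    m->marker_name = NULL;
    m->size = 0;
}

/* Hypothetical helper: allocate three zero-filled arrays of `size` doubles
   and store a private copy of `name`.
   Returns 0 on success, -1 if any allocation fails. */
int marker_init(MarkerInfo *m, const char *name, int size)
{
    assert(m != NULL && name != NULL && size > 0);
    m->size = size;
    m->loc = calloc((size_t)size, sizeof *m->loc);
    m->LN_LIKE = calloc((size_t)size, sizeof *m->LN_LIKE);
    m->LOD_SCORE = calloc((size_t)size, sizeof *m->LOD_SCORE);
    m->marker_name = malloc(strlen(name) + 1);
    if (!m->loc || !m->LN_LIKE || !m->LOD_SCORE || !m->marker_name) {
        marker_free(m);
        return -1;
    }
    strcpy(m->marker_name, name);
    return 0;
}
```

Every successful marker_init should be paired with one marker_free once the marker is no longer plotted.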
Ranges
typedef struct Ranges
{
double LeftPoint;
double RightPoint;
double LN_LIKE_Bottom;
double LN_LIKE_Top;
double LOD_SCORE_Bottom;
double LOD_SCORE_Top;
} Ranges;
Ranges holds the boundary values of loc, LN_LIKE and LODSCORE.
A point is plotted on the screen only if it lies in the area
[LeftPoint, RightPoint] X [LN_LIKE_Bottom, LN_LIKE_Top]
or [LeftPoint, RightPoint] X [LOD_SCORE_Bottom, LOD_SCORE_Top].
Points outside these areas are not plotted.
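The visibility condition can be sketched as two small predicates over Ranges. The function names are hypothetical, and treating the interval bounds as inclusive is an assumption based on the bracket notation used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Ranges
{
    double LeftPoint;
    double RightPoint;
    double LN_LIKE_Bottom;
    double LN_LIKE_Top;
    double LOD_SCORE_Bottom;
    double LOD_SCORE_Top;
} Ranges;

/* Hypothetical helper: is the point (x, y) of the LN(LIKE) graph inside
   [LeftPoint, RightPoint] X [LN_LIKE_Bottom, LN_LIKE_Top]?
   Inclusive bounds are an assumption. */
bool in_ln_like_range(const Ranges *r, double x, double y)
{
    assert(r != NULL);
    return x >= r->LeftPoint && x <= r->RightPoint &&
           y >= r->LN_LIKE_Bottom && y <= r->LN_LIKE_Top;
}

/* Hypothetical helper: the same test for the LOD score graph. */
bool in_lod_score_range(const Ranges *r, double x, double y)
{
    assert(r != NULL);
    return x >= r->LeftPoint && x <= r->RightPoint &&
           y >= r->LOD_SCORE_Bottom && y <= r->LOD_SCORE_Top;
}
```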
The function screen maps the marker values in that area to screen coordinates.
First, let us see how a point is placed on the screen.
In my implementation the screen coordinates run from -1 to +1 on both axes:
(x=-1, y=+1)                (x=+1, y=+1)

               (x=0, y=0)

(x=-1, y=-1)                (x=+1, y=-1)
LeftPoint is mapped to -1 and RightPoint is mapped to +1, so we look for a linear function f that satisfies
x = LeftPoint => f(LeftPoint) = -1
x = RightPoint => f(RightPoint) = +1
f(x) = 2 * (x - LeftPoint) / (RightPoint - LeftPoint) - 1
and that is what the function screen does.
The same operation is done on y.
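A linear map that sends the left end of an interval to -1 and the right end to +1 can be sketched as follows. The name to_screen and its signature are hypothetical; the signature of the project's own screen function is not shown in this document.

```c
#include <assert.h>

/* Hypothetical helper: map v from the interval [lo, hi] linearly onto
   [-1, +1], so that lo maps to -1 and hi maps to +1.
   Requires lo != hi. */
double to_screen(double v, double lo, double hi)
{
    assert(hi != lo);
    return 2.0 * (v - lo) / (hi - lo) - 1.0;
}
```

For x the call would use LeftPoint and RightPoint as lo and hi; for y it would use the Bottom and Top values of the graph being drawn.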
Graphs
typedef struct Graphs
{
long int counter;
int size;
char InputFile[50];
double firstMarker;
} Graphs;
counter : the number of markers in the graph.
size : the size of each array in a marker (see MarkerInfo above).
InputFile : the name of the data file.
firstMarker : the location of the first marker.
WindowsInfo
typedef struct WindowsInfo
{
int NumOfGraphs;
Graphs AddedGraph[10];
Ranges ranges;
} WindowsInfo;
My program lets the user move from one graph to another and add another graph to the current one.
For this purpose I used this data structure, which consists of:
NumOfGraphs : the total number of graphs in the window.
AddedGraph : array of struct Graphs (see Graphs above).
ranges : the displayed ranges (see Ranges above).
Two further global variables are current, which identifies the current window, and last, which identifies the last window.
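Adding a graph to a window amounts to filling the next free slot of AddedGraph and incrementing NumOfGraphs. The sketch below assumes a helper named window_add_graph, which is hypothetical; it refuses to add more than the 10 graphs the array can hold, or a file name that does not fit into InputFile.

```c
#include <assert.h>
#include <string.h>

typedef struct Ranges
{
    double LeftPoint, RightPoint;
    double LN_LIKE_Bottom, LN_LIKE_Top;
    double LOD_SCORE_Bottom, LOD_SCORE_Top;
} Ranges;

typedef struct Graphs
{
    long int counter;
    int size;
    char InputFile[50];
    double firstMarker;
} Graphs;

typedef struct WindowsInfo
{
    int NumOfGraphs;
    Graphs AddedGraph[10];
    Ranges ranges;
} WindowsInfo;

/* Hypothetical helper: register one more graph in window w.
   counter and size start at 0 and would be filled in by the parser.
   Returns 0 on success, -1 if the window is full or the name is too long. */
int window_add_graph(WindowsInfo *w, const char *file, double first_marker)
{
    assert(w != NULL && file != NULL);
    int capacity = (int)(sizeof w->AddedGraph / sizeof w->AddedGraph[0]);
    if (w->NumOfGraphs >= capacity)
        return -1;

    Graphs *g = &w->AddedGraph[w->NumOfGraphs];
    if (strlen(file) >= sizeof g->InputFile)
        return -1;

    g->counter = 0;
    g->size = 0;
    strcpy(g->InputFile, file);
    g->firstMarker = first_marker;
    w->NumOfGraphs++;
    return 0;
}
```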
ZOOMING
One of the important elements of this project is zooming.
The implementation maps the coordinates of the points into the window's area, so only points inside this area are plotted.
The mapping function is linear: zooming in shrinks the area and zooming out enlarges it.
The area is a rectangle. Let (xb,yb), (xb,ye), (xe,yb), (xe,ye) be its corner points (xb < xe, yb < ye); each plotted point (x,y) must satisfy xb < x < xe and yb < y < ye. There are 4 options of zooming:
Note, however, that zooming depends only on these corner points and not on any specific point of the graph. To complete the zooming operation I therefore added the option of moving the axes horizontally and vertically with the arrow keys, to give the best view of the graph; this is also implemented by moving the intervals [xb,xe] and [yb,ye].
The range information is stored in the data structure Ranges, so when the user zooms in or out the structure Ranges is updated accordingly.
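Updating Ranges for zooming and for arrow-key movement can be sketched as scaling or shifting an interval. The helper names, the choice of scaling about the interval's center and the use of a multiplicative factor are assumptions made for illustration; the project's own zoom options are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Ranges
{
    double LeftPoint;
    double RightPoint;
    double LN_LIKE_Bottom;
    double LN_LIKE_Top;
    double LOD_SCORE_Bottom;
    double LOD_SCORE_Top;
} Ranges;

/* Scale the interval [*lo, *hi] about its center.
   factor < 1 shrinks it (zoom in), factor > 1 enlarges it (zoom out). */
static void zoom_interval(double *lo, double *hi, double factor)
{
    double center = (*lo + *hi) / 2.0;
    double half = (*hi - *lo) / 2.0 * factor;
    *lo = center - half;
    *hi = center + half;
}

/* Hypothetical helper: zoom horizontally, i.e. on [LeftPoint, RightPoint]. */
void ranges_zoom_x(Ranges *r, double factor)
{
    assert(r != NULL && factor > 0.0);
    zoom_interval(&r->LeftPoint, &r->RightPoint, factor);
}

/* Hypothetical helper: zoom vertically on both value intervals. */
void ranges_zoom_y(Ranges *r, double factor)
{
    assert(r != NULL && factor > 0.0);
    zoom_interval(&r->LN_LIKE_Bottom, &r->LN_LIKE_Top, factor);
    zoom_interval(&r->LOD_SCORE_Bottom, &r->LOD_SCORE_Top, factor);
}

/* Hypothetical helper: shift the horizontal interval by dx, as the
   left/right arrow keys would. */
void ranges_pan_x(Ranges *r, double dx)
{
    assert(r != NULL);
    r->LeftPoint += dx;
    r->RightPoint += dx;
}
```

For example, a factor of 0.5 halves the visible width, and a following ranges_pan_x call moves that narrower view along the x axis without changing its width.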
Arguments of the program:
The program can be run from the command line with the arguments:
<program name> -f <file1> <start1> … <fileN> <startN> [-o <output file>]
<file1> .. <fileN> are the input files.
<start1> .. <startN> are the locations of the first markers.
The output file can be a pdf or ps file. It is optional; if it is not supplied, the output file name is out.pdf.
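The argument format above can be parsed with a short loop over argv. This is only a sketch: the Options structure, the parse_args name and the limit of 10 input files (chosen to match the size of AddedGraph) are assumptions, not the project's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INPUTS 10 /* assumption: matches the size of AddedGraph */

/* Hypothetical container for the parsed command line. */
typedef struct Options
{
    int num_inputs;
    const char *files[MAX_INPUTS];
    double starts[MAX_INPUTS];
    const char *output;
} Options;

/* Parse: <program name> -f <file1> <start1> ... [-o <output file>]
   Returns 0 on success, -1 on a malformed command line. */
int parse_args(int argc, char **argv, Options *opt)
{
    assert(argv != NULL && opt != NULL);
    opt->num_inputs = 0;
    opt->output = "out.pdf"; /* default named in the documentation */

    if (argc < 4 || strcmp(argv[1], "-f") != 0)
        return -1;

    int i = 2;
    while (i < argc && strcmp(argv[i], "-o") != 0) {
        char *end;
        if (i + 1 >= argc || opt->num_inputs >= MAX_INPUTS)
            return -1; /* file without a start value, or too many files */
        opt->files[opt->num_inputs] = argv[i];
        opt->starts[opt->num_inputs] = strtod(argv[i + 1], &end);
        if (end == argv[i + 1] || *end != '\0')
            return -1; /* start value is not a number */
        opt->num_inputs++;
        i += 2;
    }

    if (i < argc) { /* argv[i] is "-o" and must be followed by one name */
        if (i + 2 != argc)
            return -1;
        opt->output = argv[i + 1];
    }
    return opt->num_inputs > 0 ? 0 : -1;
}
```

A main function would call parse_args(argc, argv, &opt) first and print a usage message when it returns -1.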