Finding approximate palindromes in genomic sequences.

byTlalit Freund and  Noga Engel.

Background

In this section we defined different types of palindromes and gave an example for each type.

We also found important biological roles, in which palindromes take place.

palindrome background -

Problem

·        Implementation of an algorithm for finding approximate palindromes in gnomic sequences.

·        Usage of the algorithm for purposes of creating palindrome "fingerprints".

·        Develop methods for testing the significance of specific approximate palindromes.

Implementation

The program goes through the DNA sequence, searching for all palindromes in a specific length, a range of allowed mismatches and a range of allowed gaps.

The Algorithm -

Running example  -

We build a statistical model to calculate our results expectation on random (non gnomic) sequence

By using these statistical models we can evaluate the significance of the palindromes we have found.

Statistical models  -

Results

In order to check if the algorithm we wrote is accurate, and works according to our demands, we compared our formula expectation, with the average result of a large bunch of random sequences.

Testing our statistical models -

We compared our formulas expectation with our program results on different types of gnomic sequences and evaluate these results.

Genome from different Viruses -

Genome from different Bcteriophages -

Different organisms -

Histones H4 family -

Palindrome application - Find pal's in a nucleotide sequence:   last_ver_pal.c

Expectation calculating - formula a:  palindrome.m (matlab file)

Expectation calculating - formula b: pal_acgt.m (matlab file)

Presentation: presentation.ppt

References

• Finding Approximate Palindromes in Strings   Alexandre H.L. Porto, Valmir C. Barbosa.
• Time and Memory Efficient Algorithm for Extracting Palindromic and Repetitive Subsequences in Nucleic Acid Sequences (1999).Tatsuhiko Tsunoda, Masao Fukagawa, Toshihisa TAKAGI.