SimTree: A Tool for Computing Similarity Between RNA Secondary Structures

Eran Eden1,*, Izhar Wallach1,2,*, Zohar Yakhini1,3

 

 

Key words: RNA secondary structure, edit distance, structure similarity, rRNA, RNase

 

RNA secondary structures (RSS) play a vital role in determining RNA tertiary structure and therefore affect RNA function and activity. Being able to accurately assess the similarity between RSS is valuable for a wide range of applications, including taxonomy and classification, as well as for inferring function from structure. We present a novel method, termed SimTree, for computing and analyzing the similarity between two RSS. It receives two RSS as input and attempts to answer the following questions: (i) How similar are the RSS? (ii) Which substructures make them similar? This is done by transforming the RSS into labeled trees and then computing the distance between the two trees, resulting in a similarity score and a simulation-based p-value. The process also yields a mapping between the two structures that can be used to identify common substructures.
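To make the tree transformation and the simulation-based p-value concrete, the following is a minimal sketch and not the SimTree implementation. It assumes the input RSS is given in dot-bracket notation, uses hypothetical node labels ("P" for a base pair, "U" for an unpaired base), and assumes that smaller distances indicate greater similarity. The names dot_bracket_to_tree, to_string and empirical_p_value are illustrative only; the tree distance itself is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """Node of an ordered labeled tree (labels here are an assumption)."""
    label: str  # "root", "P" (base pair) or "U" (unpaired base)
    children: List["Node"] = field(default_factory=list)


def dot_bracket_to_tree(structure: str) -> Node:
    """Convert a dot-bracket string into an ordered labeled tree."""
    root = Node("root")
    stack = [root]  # path from the root to the currently open base pair
    for i, ch in enumerate(structure):
        if ch == "(":
            node = Node("P")
            stack[-1].children.append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' at position {i}")
            stack.pop()
        elif ch == ".":
            stack[-1].children.append(Node("U"))
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return root


def to_string(node: Node) -> str:
    """Render a tree in a compact nested form, children in left-to-right order."""
    if not node.children:
        return node.label
    return node.label + "(" + ",".join(to_string(c) for c in node.children) + ")"


def empirical_p_value(observed: float, null_distances: Sequence[float]) -> float:
    """Fraction of simulated distances at least as small as the observed one.

    The +1 terms keep the estimate away from zero for a finite number of
    simulations. How the null structures are simulated is left open here.
    """
    hits = sum(1 for d in null_distances if d <= observed)
    return (hits + 1) / (len(null_distances) + 1)
```

For example, to_string(dot_bracket_to_tree("((..)).")) gives "root(P(P(U,U)),U)": a stem of two stacked base pairs enclosing two unpaired bases, followed by one unpaired base. Given a tree distance of choice, empirical_p_value can then compare the observed distance against distances obtained for simulated structure pairs.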

We propose an unbiased benchmark for assessing the accuracy of RSS comparison methods, based on reconstructing phylogenies and comparing them to a standard phylogenetic tree. Using rRNA secondary structures and SimTree as the comparison method, we obtained accurate reconstructions of several phylogenies. We also applied SimTree to compute the similarities between a wide range of RNase P RNA secondary structures. It accurately classified organisms into kingdoms, subclasses and even organelles based solely on the morphology and shape of their RNase P RNA, without utilizing any nucleotide-level information. Our findings indicate that the secondary structure of functional RNA is highly conserved. Furthermore, evolutionary changes at the nucleotide level induce secondary structure changes that enable accurate reconstruction of phylogenetic trees. Overall, our findings suggest that RNA comparisons at the secondary structure level are informative and can be used as a complementary approach to classical sequence comparison methods. The SimTree tool, which enables such comparisons, is publicly available at http://bioinfo.cs.technion.ac.il/SimTree.

 

 

 

* The first two authors contributed equally to this work.

1 Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel. E-mail: eraneden@cs.technion.ac.il

2 Keddem Bioscience, Ashkelon, Israel. E-mail: izharw@keddem.com

3 Agilent Technologies, Palo Alto, California, U.S.A.

Abbreviations: RSS, RNA secondary structure; SSU, rRNA small sub-unit; RPR, RNase P RNA; OLT, ordered labeled tree.