Comparison of Metabolic Pathways

by Silberklang Dana and Karp Amit.

10/12/05, under guidance of Rokhlenko Oleg

 

Problem

Our goal is to compare a metabolic pathway of a certain organism against the same metabolic pathways in Each enzyme is characterized by the reaction it catalyzes, or compare a metabolic pathway against other metabolic pathways in the same organism.

We represent a metabolic pathway by a graph whose nodes correspond to enzymes that catalyze the pathway's reactions, and the edges connect two nodes if for the corresponding enzymes the product of one serves as the substrate of the other. Thus, the problem is: Given a pattern tree P and a Text Tree T, find a subtree of T which is isomorphic to P or decide that there is no such tree. 

When looking for the best match, we take into account both label similarity and topology. We are permited to delete only degree 2 vertexes from the text tree. And we  are NOT permited to delete vertexes from the pattern tree.

A previous algorithm to solve the problem ONLY for tree-like graphs already exists today.

Implementation

We used the Modified decision Tree algorithm in order to solve the given problem.

The idea is to perform a decision tree search over the space {u,vuV1; vV2 {f} }, when Each node in the search tree represents a vertex in V1 that has either been matched with a vertex in V2 or with f (deletion). Each pathway from root to leaves in the search tree, represents a match of all vertices in V1 to vertices in V2 (or f).

The search tree advances from root to leaves, trying to find a new G1, G2 match: A vertex (V1,V2) is added to the tree if it V1 can be matched to V2 , when all pathway matches have already been made. In order to allow deletions, we store all simple paths of length up to a predefined length in a preprocessing procedure.

A cost is assigned to the matching between every two vertices and the cost of a partial matching consists of the sum of the costs of the matched vertices minus the deletion penalty. The tree search is expanded only to states which have the potential to overcome the score of the final states already found in the search tree.

The Decision Tree advances in a DFS style, causing it to store only one tree path (only one match between the pattern and tree graphs) at each step. Thus the space complexity of the algorithm is O(m), when 'm' is the number of nodes in a graph.

 Example of a decision tree:

Results

When running the program on a variety of Pattern and Text graphs, which have from 1 to 20 nodes, some containing inner circles and some don't, we found out the actual time results are fast (faster than the existing algorithm) , although the time complexity of the algorithm is high (exponential). The reason for this is that most of the Decision tree paths are not executed at all, causing the actual tree to be sparse.

 

We tested the Decision tree algorithm against the existing algorithm on many tree-like graphs and found the results to be the same (except some minor bugs found in the existing algorithm).

 

The final test was done by taking 80 E.Coli metabolic pathways (some of them include inner circles) and trying to match them against each other. The 6400 results include for each Pattern and Text graph the size of the graphs, the algorithm running time, and the 5 best matches found for them. 

The full results file can be found here.

 

Here is a selection of the results, which have the worst timings:

 

Text File:

Pattern File:

Text Nodes Num

Pattern Nodes Num

Time [sec]

score1

score2

score3

score4

score5

yeast64.grp

yeast57.grp

20

14

0.344

 

 

 

 

 

yeast64.grp

yeast65.grp

20

17

0.344

 

 

 

 

 

yeast64.grp

yeast64.grp

20

20

0.297

0

 

 

 

 

yeast64.grp

yeast17.grp

20

15

0.281

 

 

 

 

 

yeast64.grp

yeast79.grp

20

10

0.25

-4.143

0

-6.917

-6.917

-6.917

yeast65.grp

yeast79.grp

17

10

0.25

-63.017

-61.779

-63.779

-63.017

-65.017

yeast64.grp

yeast77.grp

20

9

0.219

 

 

 

 

 

yeast64.grp

yeast31.grp

20

10

0.203

-74.517

-74.116

-74.517

-74.116

-74.517

yeast64.grp

yeast80.grp

20

7

0.203

 

 

 

 

 

yeast65.grp

yeast31.grp

17

10

0.203

-75.66

-75.66

-75.66

-75.66

-75.66

yeast65.grp

yeast65.grp

17

17

0.187

0

 

 

 

 

yeast57.grp

yeast79.grp

14

10

0.172

 

 

 

 

 

yeast64.grp

yeast9.grp

20

9

0.172

-65.714

-65.714

-65.714

-65.714

-66.952

yeast64.grp

yeast56.grp

20

6

0.172

 

 

 

 

 

yeast65.grp

yeast17.grp

17

15

0.172

 

 

 

 

 

yeast65.grp

yeast64.grp

17

20

0.172

 

 

 

 

 

yeast65.grp

yeast80.grp

17

7

0.172

 

 

 

 

 

yeast65.grp

yeast77.grp

17

9

0.171

 

 

 

 

 

yeast65.grp

yeast2.grp

17

9

0.141

-62.1

-62.1

-62.1

-62.1

-62.1

yeast65.grp

yeast56.grp

17

6

0.141

-46.11

-46.11

-46.11

-46.11

-46.11

yeast17.grp

yeast79.grp

15

10

0.14

 

 

 

 

 

yeast64.grp

yeast1.grp

20

9

0.14

-24.908

-67.613

-30.417

-29.176

-29.176

yeast17.grp

yeast65.grp

15

17

0.125

 

 

 

 

 

yeast42.grp

yeast52.grp

6

3

0.125

-24.652

-24.652

-24.652

-24.652

-24.652

 

User Interface

The User interface is very easy to use. A pattern graph can be run against a single text graph or a whole directory of graphs. The best 5 results for each pattern and text graph are shown. The user can see any match in a visual way as following:

Downloads

Full report (PDF, in Hebrew)

Source code (.zip)