Our goal is to compare a metabolic pathway of a certain organism against the same metabolic pathways in Each enzyme is characterized by the reaction it catalyzes, or compare a metabolic pathway against other metabolic pathways in the same organism.
We represent a metabolic pathway by a graph whose nodes correspond to enzymes that catalyze the pathway's reactions, and the edges connect two nodes if for the corresponding enzymes the product of one serves as the substrate of the other. Thus, the problem is: Given a pattern tree P and a Text Tree T, find a subtree of T which is isomorphic to P or decide that there is no such tree.
When looking for the best match, we take into account both label similarity and topology. We are permited to delete only degree 2 vertexes from the text tree. And we are NOT permited to delete vertexes from the pattern tree.
A previous algorithm to solve the problem ONLY for tree-like graphs already exists today.
We used the Modified decision Tree algorithm in order to solve the given problem.
The idea is to perform a decision tree search over the space {u,v½uÎV1; vÎV2 È {f} }, when Each node in the search tree represents a vertex in V1 that has either been matched with a vertex in V2 or with f (deletion). Each pathway from root to leaves in the search tree, represents a match of all vertices in V1 to vertices in V2 (or f).
The search tree advances from root to leaves, trying to find a new G1, G2 match: A vertex (V1,V2) is added to the tree if it V1 can be matched to V2 , when all pathway matches have already been made. In order to allow deletions, we store all simple paths of length up to a predefined length in a preprocessing procedure.
A cost is assigned to the matching between every two vertices and the cost of a partial matching consists of the sum of the costs of the matched vertices minus the deletion penalty. The tree search is expanded only to states which have the potential to overcome the score of the final states already found in the search tree.
The Decision Tree advances in a DFS style, causing it to store only one tree path (only one match between the pattern and tree graphs) at each step. Thus the space complexity of the algorithm is O(m), when 'm' is the number of nodes in a graph.
Example of a decision tree:
When running the program on a variety of Pattern and Text graphs, which have from 1 to 20 nodes, some containing inner circles and some don't, we found out the actual time results are fast (faster than the existing algorithm) , although the time complexity of the algorithm is high (exponential). The reason for this is that most of the Decision tree paths are not executed at all, causing the actual tree to be sparse.
We tested the Decision tree algorithm against the existing algorithm on many tree-like graphs and found the results to be the same (except some minor bugs found in the existing algorithm).
The final test was done by taking 80 E.Coli metabolic pathways (some of them include inner circles) and trying to match them against each other. The 6400 results include for each Pattern and Text graph the size of the graphs, the algorithm running time, and the 5 best matches found for them.
The full results file can be found here.
Here is a selection of the results, which have the worst timings:
Text File: |
Pattern File: |
Text Nodes Num |
Pattern Nodes Num |
Time [sec] |
score1 |
score2 |
score3 |
score4 |
score5 |
yeast64.grp |
yeast57.grp |
20 |
14 |
0.344 |
|
|
|
|
|
yeast64.grp |
yeast65.grp |
20 |
17 |
0.344 |
|
|
|
|
|
yeast64.grp |
yeast64.grp |
20 |
20 |
0.297 |
0 |
|
|
|
|
yeast64.grp |
yeast17.grp |
20 |
15 |
0.281 |
|
|
|
|
|
yeast64.grp |
yeast79.grp |
20 |
10 |
0.25 |
-4.143 |
0 |
-6.917 |
-6.917 |
-6.917 |
yeast65.grp |
yeast79.grp |
17 |
10 |
0.25 |
-63.017 |
-61.779 |
-63.779 |
-63.017 |
-65.017 |
yeast64.grp |
yeast77.grp |
20 |
9 |
0.219 |
|
|
|
|
|
yeast64.grp |
yeast31.grp |
20 |
10 |
0.203 |
-74.517 |
-74.116 |
-74.517 |
-74.116 |
-74.517 |
yeast64.grp |
yeast80.grp |
20 |
7 |
0.203 |
|
|
|
|
|
yeast65.grp |
yeast31.grp |
17 |
10 |
0.203 |
-75.66 |
-75.66 |
-75.66 |
-75.66 |
-75.66 |
yeast65.grp |
yeast65.grp |
17 |
17 |
0.187 |
0 |
|
|
|
|
yeast57.grp |
yeast79.grp |
14 |
10 |
0.172 |
|
|
|
|
|
yeast64.grp |
yeast9.grp |
20 |
9 |
0.172 |
-65.714 |
-65.714 |
-65.714 |
-65.714 |
-66.952 |
yeast64.grp |
yeast56.grp |
20 |
6 |
0.172 |
|
|
|
|
|
yeast65.grp |
yeast17.grp |
17 |
15 |
0.172 |
|
|
|
|
|
yeast65.grp |
yeast64.grp |
17 |
20 |
0.172 |
|
|
|
|
|
yeast65.grp |
yeast80.grp |
17 |
7 |
0.172 |
|
|
|
|
|
yeast65.grp |
yeast77.grp |
17 |
9 |
0.171 |
|
|
|
|
|
yeast65.grp |
yeast2.grp |
17 |
9 |
0.141 |
-62.1 |
-62.1 |
-62.1 |
-62.1 |
-62.1 |
yeast65.grp |
yeast56.grp |
17 |
6 |
0.141 |
-46.11 |
-46.11 |
-46.11 |
-46.11 |
-46.11 |
yeast17.grp |
yeast79.grp |
15 |
10 |
0.14 |
|
|
|
|
|
yeast64.grp |
yeast1.grp |
20 |
9 |
0.14 |
-24.908 |
-67.613 |
-30.417 |
-29.176 |
-29.176 |
yeast17.grp |
yeast65.grp |
15 |
17 |
0.125 |
|
|
|
|
|
yeast42.grp |
yeast52.grp |
6 |
3 |
0.125 |
-24.652 |
-24.652 |
-24.652 |
-24.652 |
-24.652 |
Full report (PDF, in Hebrew)
Source code (.zip)