Previous Bioinformatics Forum Lectures
Functional representation of enzymes by specific peptides
Prof. David Horn
School of Physics and Astronomy, Tel Aviv University

Thursday, March 22, 2007, 13:30
Department of Computer Science, Technion, Taub Bldg., Room 601


Predicting the function of a protein from its sequence is one of the long-standing goals of bioinformatics research. Emphasis is often placed on sequence motifs as carriers of such information. Applying an unsupervised motif extraction algorithm (MEX) to 50,698 enzyme sequences, and filtering the results by the four-level classification hierarchy of the Enzyme Commission (EC), we select 52,216 exact motifs, named Specific Peptides (SPs), that appear on single branches of the EC hierarchy. This methodology does not require any preprocessing by multiple sequence alignment, nor does it rely on over-representation of motifs. SPs comprise on average 8.4 amino acids, and specify the functions of more than 90% of the enzymes on which they occur. This compares favorably with the 63% coverage provided by PROSITE motifs. SPs can serve as a tool for remote homology detection, outperforming sequence similarity. SPs contain most of the known active-site and binding-site residues. Other SPs are found to occur, in a statistically significant manner, in 3D pockets in proximity to active-site residues. Thus we have established a motif-based system capable of comprehensive and accurate classification of enzyme function. Moreover, in some cases, we have demonstrated the relevance of SPs to the enzymatic function.
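As a toy illustration of the idea (not the authors' MEX pipeline), classification by SPs reduces to exact substring matching against an SP-to-EC lookup table; the peptides and EC branches below are invented for the example:

```python
# Toy sketch (not the authors' MEX code): classify enzymes by exact
# occurrence of Specific Peptides (SPs), each tied to a single EC branch.
# The SP-to-EC table below is invented for illustration.

sp_to_ec = {
    "GDSGGP": "3.4.21",   # hypothetical SP -> EC branch
    "HGTGFA": "1.1.1",    # hypothetical SP -> EC branch
}

def classify(sequence, sp_to_ec):
    """Return the EC branches whose SPs occur exactly in the sequence."""
    return {ec for sp, ec in sp_to_ec.items() if sp in sequence}

print(classify("MKTAGDSGGPLVAAH", sp_to_ec))  # -> {'3.4.21'}
```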

Joint work with Vered Kunik, Yasmine Meroz, Zach Solan, Ben Sandbank and Eytan Ruppin.

Comparative Analysis of Biochemical Pathways and Networks
Prof. Gabriel Valiente
Technical University of Catalonia, Barcelona, Spain

Wednesday, January 10, 2006, 11:30am
Department of Computer Science, Technion, Taub Bldg., Room 601 *** note special time ***


There is currently a tsunami of genome data with over 400 completely sequenced genomes, allowing comparisons of human to mouse to sea urchin to fruit fly to corn to yeast etc. Comparative analysis techniques allow for the generation of hypotheses regarding biological function across species, and these techniques can be applied at the level of the genome, the transcriptome, the proteome, the metabolome, and the interactome. However, similarity at the genomic level does not always correlate with similarity at the transcriptomic, proteomic, metabolomic, interactomic, or even phenotypic level. In this talk, we review recent techniques for the comparison and alignment of metabolic pathways. Comparative analysis of metabolic pathways produces highly accurate phylogenies, and it has been used to generate biologically meaningful novel hypotheses. Further, we explore similar techniques for the comparative analysis of protein interaction networks, transcriptional networks, and signal transduction networks.
Host: Ron Pinter

An Effective Theory for Adaptive Evolution
Dr. Noam Shoresh, Department of Systems Biology, Harvard Medical School

January 11, 13:30
Biology Auditorium, Technion *** note special time and place ***


Adaptation in large populations of asexual organisms is complicated by the simultaneous spread of distinct beneficial mutations, which interfere with one another. I will present an effective theory for the adaptation of such populations. By focusing on the set of successful mutations, those that are not immediately out-competed, the effective theory provides a much simplified description of this complex dynamical system. I will also discuss more recent work regarding the evolutionary stability of multi-species coexistence.
Host: Prof. Erez Braun

Identifying Mechanisms of Transcription Regulation using High-Throughput Data
Prof. Ernest Fraenkel
Biological Engineering, Massachusetts Institute of Technology

Wednesday, January 17, 2006, 11:30am *** note unusual date and time ***
Department of Computer Science, Technion, Taub Bldg., Room 601


I will present computational and experimental analyses of the mechanisms of transcriptional regulation in yeast and mammals. High-throughput technologies, including expression microarrays and genome-wide chromatin immunoprecipitation, are increasingly being used to study biological systems. However, the data provided by these assays are noisy and often do not directly address the questions of greatest biological interest. Using novel algorithms to analyze chromatin immunoprecipitation data, we have created the first genome-wide map of the transcriptional regulatory sites for a eukaryote. I will describe how these data can be analyzed to discover mechanisms of transcriptional regulation in yeast. We are now applying similar approaches to study transcriptional regulation in human and mouse tissues. Analysis of these data provides new insights into how transcriptional patterns are maintained during evolution.
Host: Ron Pinter

Perturbing networks: computational approaches to the evolution and organization of metabolism
Daniel Segrè

Bioinformatics Program, Boston University
Monday, December 25, 2006, 11:00 *** note special time and day ***
Taub 601, Dept. of Computer Science, Technion


We are interested in the evolutionary dynamics of biological networks, in particular in the interplay between response to genetic and environmental perturbations, genome-level functional organization, and optimal adaptation. By implementing steady-state (flux balance) models of whole-cell biochemical networks, and computing epistatic interactions resulting from double gene deletions, one can formulate new hypotheses about the organization of genes and pathways into hierarchical modules. To go beyond single organisms, and learn about metabolism at the ecosystem level, different approaches may be required. By relying on reaction topology and a network expansion algorithm, it is possible to study the effect of key molecules (such as oxygen) on the evolution of life.
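The flux-balance idea can be sketched on a toy pathway; the stoichiometry and bounds below are invented, and genome-scale models would solve a real linear program rather than use the chain shortcut taken here:

```python
# Toy flux-balance sketch (stoichiometry and bounds invented).
# Steady state requires S v = 0; the objective is the biomass flux.
# For this linear chain the LP optimum is simply the tightest bound;
# genome-scale models solve a real linear program instead.

# Reactions: v0 uptake (-> A), v1 (A -> B), v2 biomass drain (B ->)
S = [
    [1, -1, 0],   # mass balance of metabolite A
    [0, 1, -1],   # mass balance of metabolite B
]
upper = [10.0, 8.0, 100.0]  # flux upper bounds (invented)

def steady(S, v, tol=1e-9):
    """Check S v = 0 (no net production of any internal metabolite)."""
    return all(abs(sum(s * x for s, x in zip(row, v))) < tol for row in S)

# Mass balance along a chain forces v0 = v1 = v2, so the maximal
# biomass flux equals the smallest upper bound.
v_opt = [min(upper)] * 3
print(steady(S, v_opt), v_opt[2])  # -> True 8.0
```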

Host: Ron Pinter

On the tree of life and the genome of Eden
Dr. Tal Dagan
Institute of Botany III, Heinrich-Heine University, Dusseldorf, Germany

Thursday, December 28, 2006, 13:30
Room #701, Taub bldg., Dept. of Computer Science, Technion


Few topics in the biology of microbes have generated as much controversy as lateral gene transfer (LGT) has. Views on this issue span from the one extreme that LGT exists, but is insignificant in terms of its overall impact on the evolutionary process, such that a microbial tree of life can be reliably constructed, to the other extreme that LGT occurs in nature to such an extent that a simple bifurcating tree is an inadequate metaphor to represent the process of microbial evolution. Efforts to resolve this debate have focused on attempts to quantify LGT through evolutionary genome comparisons, but are impaired by methodological issues. Using an approach that is independent of gene tree comparisons and nucleotide pattern frequency analysis, we have investigated the distribution of protein-coding genes across 190 prokaryote genomes. By studying the effect of LGT on the size of ancestral genomes in the past we estimated a lower bound for the LGT rate. We find that at least two-thirds of all prokaryotic genes have been affected by LGT at some time in their past, but the average LGT rate across genes is low — 1.1 LGTs per gene family and lifespan — such that the signature of vertical inheritance predominates in gene distribution patterns.
Joint work with William Martin.
Host: Dr. Yael Mandel-Gutfreund

Inferring Regulatory Networks: Genetic Variation, Signaling and Complexity
Dr. Dana Pe'er
Biological Sciences, Computer Science, Columbia University

Sunday, December 31, 2006, 12:30
Room #601, Taub bldg., Dept. of Computer Science, Technion


With the advent of high-throughput technologies, it is now possible to understand the architecture of molecular networks and, from these data, elucidate a global view of how the molecular network computes and executes concerted cellular decisions and responses. The key premise of our approach is that the observed statistical dependencies and correlations between cellular components represent molecular interactions and influences between them. In this talk we will demonstrate successful reconstructions from two different data types that are suited to our approach.

I will demonstrate how we applied Bayesian networks to the automated derivation of causal influences in signaling networks. This relied on state-of-the-art technology that simultaneously measures the levels of multiple signaling components in thousands of individual human cells. Our method automatically discovered, de novo, most of the traditionally established influences between the measured signaling components, as well as novel inter-pathway crosstalk, which we experimentally verified. A key distinction of our approach is the use of single-cell measurements, thus avoiding population averaging, which often masks true activities.

Another powerful data source for understanding the organization and function of the molecular network is “genetical genomics”, the combined genotype and gene-expression data of genetically diverse individuals. In this case, genetic variation generates subtle, natural variation between individuals that allows us to uncover regulatory mechanisms. I will demonstrate a novel probabilistic method, called Geronemo, which aims to understand the mechanisms by which genetic changes perturb gene regulation. We applied Geronemo to a set of yeast recombinants generated by a cross between laboratory (BY) and wild (RM) strains of S. cerevisiae (Brem and Kruglyak, 2005), resulting in multiple novel hypotheses about genetic perturbations in the yeast regulatory network, including in transcriptional regulation, signal transduction, chromatin modification and mRNA degradation.
Joint work with Su-In Lee, Karen Sachs, Aimee Dudley, George Church, Doug Lauffenburger, Garry Nolan and Daphne Koller.
Hosted by Prof. Yonina Eldar

Discovery of Principles of Nature from Mathematical Modeling of DNA Microarray Data
Orly Alter
Department of Biomedical Engineering, Institute for Cellular and Molecular Biology, and Institute for Computational Engineering and Sciences, University of Texas at Austin

Tuesday, January 2, 2007, 11:00 *** note special time and day ***
Taub 601, Dept. of Computer Science, Technion


DNA microarrays make it possible to record the complete genomic signals that guide the progression of cellular processes. Future predictive power, discovery and control in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment and drug development.
I will describe the first data-driven models that were created from these large-scale data through generalizations of matrix and tensor computations that have proven successful in describing the physical world. In these models, the mathematical variables and operations might represent biological reality: The variables, patterns uncovered in the data, might correlate with activities of cellular elements, such as regulators or transcription factors, that drive the measured signals. The operations, such as data classification and reconstruction in subspaces of selected patterns, might simulate experimental observation of the correlations and possibly also causal coordination of these activities [1--3].
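As a minimal sketch of such matrix models (toy numbers, not the paper's data), the dominant pattern of an expression matrix can be found by power iteration, and the data reconstructed in the subspace of that single pattern:

```python
# Rank-1 "pattern subspace" reconstruction of a toy expression matrix
# (pure Python; the values are invented and the matrix is rank 1 by
# construction, so one pattern reconstructs it exactly).

import math

X = [[2.0, 4.0],
     [1.0, 2.0]]   # toy genes x arrays matrix

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Power iteration on X^T X converges to the top right singular vector.
v = [1.0, 1.0]
for _ in range(50):
    w = matvec(transpose(X), matvec(X, v))
    v = [x / norm(w) for x in w]

sigma = norm(matvec(X, v))               # top singular value
u = [x / sigma for x in matvec(X, v)]    # top left singular vector
rank1 = [[sigma * ui * vj for vj in v] for ui in u]
print(round(sigma, 6))  # -> 5.0; rank1 reproduces X up to rounding
```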
I will illustrate these models in comparative and integrative analyses of mRNA expression and proteins' DNA-binding data from yeast and human cell cultures. In these analyses, the ability of the models to predict previously unknown biological and physical principles is demonstrated with a prediction of a novel mechanism of regulation that correlates DNA replication initiation with RNA transcription. The predicted mechanism is in agreement with current biological understanding, and is supported by recent experimental results [4].
I will also illustrate these models in the analysis of yeast genome-scale mRNA length distribution data measured with DNA microarrays. SVD uncovers in these data "asymmetric Hermite functions," a generalization of the eigenfunctions of the quantum harmonic oscillator. These patterns of mRNA abundance levels across gel migration lengths might be explained by a previously undiscovered asymmetry in RNA gel electrophoresis thermal band broadening [5].
These models may become the foundation of a future in which biological systems are modeled and controlled as physical systems are today [6].
1. Alter, Brown, Botstein, PNAS 97, 10101 (2000);
2. Alter, Brown, Botstein, PNAS 100, 3351 (2003);
3. Alter and Golub, PNAS 101, 16577 (2004);
4. Alter and Golub, PNAS 102, 17559 (2005);
5. Alter and Golub, PNAS 103, 11828 (2006);
6. Alter, PNAS 103, 16063 (2006).

Host: Zohar Yakhini

Large scale modeling of metabolism: some recent challenges and results
Prof. Eytan Ruppin
School of Computer Science, Tel-Aviv University

Monday, November 27, 11:00
Room #601, Taub bldg., Computer Science Department, Technion


This talk will provide a friendly introduction to and overview of large scale modeling of metabolic networks. Further, we shall describe recent unpublished results from our lab on high-dimensional annotation of metabolic genes and on studying combined metabolic-regulatory networks.

Inferring Phylogenies via LCA-distances
Ilan Gronau
Department of Computer Science, Technion

Thursday, November 30, 13:30
Room #601, Taub bldg., Computer Science Department, Technion


Inferring the evolutionary history of a given set of species is one of the most fundamental problems in Computational Biology. Informally, the problem requires the reconstruction of an evolutionary tree (phylogeny) from sequences of characters (e.g. DNA), each representing a different species. One major approach to this task is to construct the tree from a dissimilarity (distance) matrix calculated over all sequence pairs. One of the major issues dealt with by distance-based phylogenetic reconstruction is the ability to reconstruct the correct tree given noisy distance estimates.

In the first part of the talk, we present a new approach for reconstructing phylogenetic trees using LCA-distances. This approach leads to a family of very simple and efficient algorithms called Deepest LCA Neighbor-Joining (DLCA for short). We present a technique which leads to an optimal running time of $O(n^2)$. This technique is also used to speed up the running time of well-known clustering algorithms such as UPGMA. In the second part, we discuss the performance of these algorithms on noisy distance estimates. Specifically, we compare these algorithms to Saitou and Nei's well-known neighbor-joining algorithm (NJ), both through their performance guarantees and through experiments on simulated phylogenetic data.
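For readers unfamiliar with the clustering algorithms mentioned, here is a naive UPGMA sketch on invented distances; its repeated pair-selection scan is the cost that fast implementations reduce to an overall $O(n^2)$:

```python
# Naive UPGMA sketch on invented distances (illustration only; the talk's
# technique reduces the overall cost of such algorithms to O(n^2)).

def upgma(labels, D):
    """labels: leaf names; D: dict mapping sorted pairs to distances."""
    sizes = {l: 1 for l in labels}
    nodes = list(labels)
    while len(nodes) > 1:
        # scan all pairs for the closest clusters (the expensive step)
        a, b = min(((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
                   key=lambda p: D[tuple(sorted(p))])
        new = f"({a},{b})"
        for c in nodes:
            if c not in (a, b):
                # size-weighted average distance to the merged cluster
                D[tuple(sorted((new, c)))] = (
                    sizes[a] * D[tuple(sorted((a, c)))] +
                    sizes[b] * D[tuple(sorted((b, c)))]
                ) / (sizes[a] + sizes[b])
        sizes[new] = sizes[a] + sizes[b]
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return nodes[0]

D = {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0}
tree = upgma(["A", "B", "C"], D)
print(tree)  # -> (C,(A,B)): A and B, the closest pair, join first
```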

The talk touches upon core issues in distance-based phylogenetic reconstruction and does not assume any prior biological or mathematical knowledge. It is based on two papers: "Neighbor joining algorithms for inferring phylogenies via LCA-distances" and "Optimal Implementations of UPGMA and Other Common Clustering Algorithms".

This is joint work with Prof. Shlomo Moran.

Computational Design of RNA Switches
Danny Barash
Department of Computer Science, Ben-Gurion University

Thursday, November 2, 13:30
Room #601, Taub bldg., Computer Science Department, Technion
Hosted by: Ron Pinter


Recent discoveries about the various capabilities of small RNAs to affect gene expression have led to biotechnological advances and medical applications. In this talk, I will start with a brief overview of these discoveries, the diverse roles of RNAs, and their potential uses. I will then focus on the phenomenon of RNA conformational switching and how it relates to function, introducing the bioinformatics methods that are involved in RNA folding predictions and the design and search for novel RNA switches. I will concentrate on a matrix representation of the RNA secondary structure, using its properties to identify conformational RNA switching by a single point mutation (RNAMute), as well as by other slight changes in the environment. Finally, I will describe how computer-aided RNA switch design can potentially be performed using computational predictions that are backed by wet-lab experimental verification.
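One simple matrix representation of RNA secondary structure is a base-pair adjacency matrix built from dot-bracket notation, shown here on a toy hairpin (a sketch of the kind of representation the talk analyzes, not the talk's own code):

```python
# Base-pair adjacency matrix of an RNA secondary structure, built from
# dot-bracket notation (toy hairpin; illustrative only).

def pair_matrix(dotbracket):
    n = len(dotbracket)
    M = [[0] * n for _ in range(n)]
    stack = []
    for j, c in enumerate(dotbracket):
        if c == "(":
            stack.append(j)          # opening base: remember its position
        elif c == ")":
            i = stack.pop()          # closing base: pair with last opening
            M[i][j] = M[j][i] = 1
    return M

M = pair_matrix("((..))")
print(M[0][5], M[1][4], M[0][4])  # -> 1 1 0
```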

Decomposition of protein complexes: A graph theoretical method for analyzing static and dynamic protein associations
Elena Zotenko
National Center for Biotechnology Information, NIH, Bethesda, MD

Monday, September 18, 13:30
Room #601, Taub bldg., Computer Science Department, Technion
Hosted by: Ron Pinter


The complexity in biological systems arises not only from various individual protein molecules but also from their organization into systems with numerous interacting partners. In fact, most cellular processes are carried out by multi-protein complexes, groups of proteins that bind together to perform a specific task. Some proteins form stable complexes, while other proteins form transient associations and are part of several complexes at different stages of a cellular process. A better understanding of this higher-order organization of proteins into overlapping complexes is an important step towards unveiling functional and evolutionary mechanisms behind biological networks.

We propose a new method for identifying and representing overlapping protein complexes (or larger units that we call functional groups) within a protein interaction network. We develop a graph-theoretical framework that enables automatic construction of such a representation. The proposed representation helps in understanding the transitions between functional groups and allows for tracking a protein's path through a cascade of functional groups. Therefore, depending on the nature of the network, our representation is capable of elucidating temporal relations between functional groups. We illustrate the effectiveness of our method by applying it to the TNF-alpha/NF-kappaB and pheromone signaling pathways.

Joint work with Katia Guimaraes, Raja Jothi, and Teresa Przytycka from NCBI.

System level analysis identifies timing of force integration during mitosis
Roy Wollman
Lab of Computational Cell Biology, Section of Molecular and Cellular Biology, University of California, Davis

Monday, September 11, 13:30
Room #601, Taub bldg., Computer Science Department, Technion
Hosted by: Ron Pinter


During cell division, the cytoskeleton and molecular motors produce forces to separate the chromosomes into two daughter cells. Although the identity of the force generators is known, how they are integrated is still a mystery. Here we use genetic algorithms to search for differential-equation-based models that reproduce experimental data. We found more than 5000 such models. Cluster analysis identified a small number of strategies for force integration during mitosis. In all cases, the timing of force activity must be fine-tuned, in contrast to the kinetic parameters, which show robustness to change.
Time permitting, I will also briefly describe the work we have done on stochastic simulations that reveal the necessity for bias in chromosomes' random ‘search and capture’ during mitosis.

Fast Neighbor Joining and Accurate Tree Reconstruction
Isaac Elias
Theoretical Computer Science group at Nada and Stockholm Bioinformatics Center, Sweden

Thursday, June 29, 13:30
Room #601, Taub bldg., Computer Science Department, Technion
Hosted by: Ron Pinter


Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving this problem are gauged by two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between $n$ taxa and produces in $\Theta(n^3)$ time a phylogenetic tree, i.e., a tree which aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has optimal reconstruction radius.
In the talk I will present an algorithm called Fast Neighbor Joining (FNJ) with optimal reconstruction radius and optimal run time complexity $O(n^2)$. I will also provide an overview of the accuracy of other tree reconstruction algorithms. Joint work with Jens Lagergren.
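The per-join cost behind the cubic bound can be seen in the NJ selection criterion itself; the sketch below runs one selection step on a classic four-taxon toy matrix (not from the talk):

```python
# One selection step of the Saitou-Nei NJ criterion on a classic toy
# matrix (not from the talk): Q(i, j) = (n - 2) d(i, j) - r(i) - r(j),
# with r(i) = sum_k d(i, k). Scanning all pairs costs O(n^2) per join,
# hence Theta(n^3) overall; FNJ brings the total down to O(n^2).

import itertools

D = {
    ("A", "B"): 5, ("A", "C"): 9, ("A", "D"): 9,
    ("B", "C"): 10, ("B", "D"): 10, ("C", "D"): 8,
}
taxa = ["A", "B", "C", "D"]

def d(x, y):
    return 0 if x == y else D[tuple(sorted((x, y)))]

r = {x: sum(d(x, y) for y in taxa) for x in taxa}
n = len(taxa)
pair = min(itertools.combinations(taxa, 2),
           key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]])
print(pair)  # -> ('A', 'B'); ties with ('C', 'D'), either is a valid join
```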

Non-Coding transcripts also control gene expression
R. J. Feldmann
Global Determinants, Inc., USA

Tuesday, 23/5, 15:30
Department of Biology, in the shelter (second floor, near the auditorium)
Hosted by: Yael Mandel-Gutfreund


The transcripts produced at RIKEN from both Gene-Coding and Non-Coding DNA regions make it possible to determine the complete Connectrome of the Transcriptome of the mouse Genome. The Connectrome is the collection of all the identified Connectrons. A Connectron is a 4-sequence construct composed of two rather short control sequences, produced as RNA by the transcription of either Gene-Coding or Non-Coding DNA, that bind to the other two target sequences. Their effect is to form a loop of DNA that can condense into a tightly coiled chromatin structure, so that the Gene-Coding or Non-Coding DNA becomes unavailable for transcription during the lifetime of the Connectron construct, which is thought to be proportional to the length of the shorter of the triple-stranded helices.
We have analyzed the Connectrome of the mouse Transcriptome (~40% of the mouse Genome) and found that although the Non-Coding DNAs account for only 1/3 of the total transcripts, they produce ~1/2 of the Connectrons. Non-Coding and Gene-Coding transcripts control their own expression and that of Gene-Coding DNA. Hence, there seems to be complete symmetry of expression and expression control. This shows for the first time that Non-Coding DNA plays an important role in cellular expression control.
Connectrons can act either as intra-chromosomal expression control agents or if the source and target chromosomes are different, they can act as inter-chromosomal expression control agents. Because the control source RNA can more easily move along the structure of the originating chromosome, there are many intra-chromosomal Connectrons. An RNA that generates inter-chromosomal expression control must transit from one chromosome to another in the nuclear volume. There are significantly fewer inter-chromosomal Connectrons.
Many properties of Connectrons will be discussed.

Kernel methods for predicting protein-protein interactions
Asa Ben-Hur
Department of Computer Science, Colorado State University

Thursday, 25/5, 13:30
Taub #601, Technion, Haifa
Hosted by: Golan Yona


Most proteins perform their function by interacting with other proteins. Therefore, information about the network of interactions that occur in a cell can greatly increase our understanding of protein function. Experimental assays that probe interaction networks on a large scale are now available; and yet, interaction networks of even well studied organisms are still sketchy at best, and the experimental data is highly noisy. We present a kernel method for predicting protein-protein interactions using a combination of data sources, including protein sequences, annotations of protein function, local properties of the network, and interactions in different species. We propose a pairwise sequence kernel that provides a similarity between pairs of proteins, and illustrate its effectiveness in conjunction with a support vector machine classifier. The performance of the pairwise sequence kernel is enhanced by using a combination of sequence features (k-mer, motif, and domain composition), and by further augmenting the sequence kernel by additional features. In yeast, we obtain a classifier that retrieves close to 80% of a set of trusted interactions at a false positive rate of 1%, demonstrating the ability of our method to make accurate predictions despite the sizable fraction of false positives that are known to exist in interaction databases.
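A minimal sketch of a pairwise k-mer (spectrum) kernel in this spirit, on invented toy sequences; a real predictor would feed the resulting kernel matrix to a support vector machine:

```python
# Sketch of a k-mer (spectrum) kernel lifted to protein pairs (toy
# sequences; a real predictor would feed the resulting kernel matrix
# to a support vector machine).

from collections import Counter

def spectrum(seq, k=3):
    """Count vector of all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def k_seq(s, t, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    a, b = spectrum(s, k), spectrum(t, k)
    return sum(a[m] * b[m] for m in a)

def k_pair(p1, p2, k=3):
    """Symmetrized kernel between candidate pairs p1=(s1,s2), p2=(t1,t2)."""
    (s1, s2), (t1, t2) = p1, p2
    return (k_seq(s1, t1, k) * k_seq(s2, t2, k) +
            k_seq(s1, t2, k) * k_seq(s2, t1, k))

print(k_pair(("MKVLAA", "GGHMKV"), ("MKVLGG", "AAHMKV")))  # -> 5
```

The symmetrization makes the kernel value independent of the order in which the two proteins of each pair are listed.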

Statistical Analysis of Genetic Interactions based on Clinical Data
Hadas Barkay
Faculty of Industrial Engineering and Management

Sunday, 23/4, 13:30
Room 527, Bloomfield bldg., Technion, Haifa


The goal of my research is to develop methods for statistical analysis in association studies of complex traits, where the traits are determined by multiple genes and their interactions, and by environmental effects. The trait under study is the response of people affected by a disease to medical treatment (a pharmacogenetic study). In this talk, the basic concepts of genetic population association studies will be presented. Special attention will be drawn to two of the problems that arise in the Israeli population: stratification and inbreeding. Two of the benchmark statistical methods in genetic association studies of diseases, genomic control (GC) and the transmission/disequilibrium test (TDT), will be described. I will explain the specific challenges that pharmacogenetic studies present and which are the subject of my research. No preliminary knowledge of genetics is required.
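For reference, the TDT mentioned above is simple to state: for heterozygous parents, compare transmissions b and non-transmissions c of a candidate allele to affected offspring; (b - c)^2 / (b + c) is asymptotically chi-square with one degree of freedom. The counts below are invented:

```python
# TDT on invented counts: b transmissions vs c non-transmissions of an
# allele from heterozygous parents to affected offspring.
b, c = 60, 40
tdt = (b - c) ** 2 / (b + c)
print(tdt)  # -> 4.0, above the 3.84 threshold for p < 0.05 (1 df)
```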

Modeling and Comparing Protein Structures
Dr. Rachel Kolodny
Department of Biochemistry and Molecular Biophysics, Columbia University

Thursday, March 23, 2006, 13:30
Room #601, Taub bldg.
Hosted by Prof. Pinter


Proteins are ubiquitous macromolecules involved in all biological processes. The three-dimensional structure of a protein encodes its function. Two fundamental computational challenges in the study of protein structure are modeling and comparing proteins. Efficient models are crucial for structure prediction, in particular for generating decoy sets (ab initio protein folding) and loop conformations (homology modeling). Structural similarities of proteins can hint at distant evolutionary relationships that are hard or impossible to discern from protein sequences alone. Thus, structural comparison, or alignment, is an essential tool when classifying known structures and analyzing their relationships.

The first part of the presentation focuses on structural comparison of protein pairs. We formalize the protein structural alignment problem as an optimization problem of a geometric similarity score over the space of rigid transformations. This formulation leads to an approximate polynomial-time alignment algorithm, thus proving that the problem is not NP-hard as was previously thought. We also present a large-scale experiment in which we compare six popular structural alignment heuristic methods by evaluating the quality of their solutions, using a common geometric measure. Our approach to comparing structural alignment methods contrasts with the traditional way of using ROC curves and relying on a classification gold standard. Finally, we describe a method for measuring structural dissimilarity between proteins of similar sequences.

The second part of the presentation focuses on efficient modeling of approximate protein structure. Our model concatenates elements from libraries of commonly-observed protein backbone fragments into structures. Thus, a string of fragment labels fully defines a three-dimensional structure, and the set of all strings defines a set of structures. By varying the size of the library and the length of its fragments, we generate structure sets of varying sizes, which act as efficient approximating nets, that is, all protein structures have good approximations among them; the larger the set, the better the approximations. We also present how these structure strings can be used for protein loop closure.

Variability and memory of protein levels in human cells
Dr. Ron Milo
Department of Molecular Cell Biology, Weizmann Institute of Science

Thursday, March 9, 2006, 13:30
Room 701, Taub bldg.
Hosted by Prof. Ron Pinter


Protein expression is a stochastic process that leads to cell-cell variation in phenotypes. The cell-cell distribution of protein levels in microorganisms has been well characterized, using snapshots of the variability in cell populations. Little is known about the temporal dynamics of the variability: are fluctuations in protein levels ergodic, where cells higher than average eventually become lower, and if so, on what timescale? Here we studied fluctuations in the levels of 20 endogenous proteins over time in living human cells, tagged by YFP at their chromosomal loci. We found variability with a standard deviation that ranged, for different proteins, between 15% and 40% of the mean. Protein levels were ergodic, but the mixing time was found to be longer than 2 generations (over 40 hours) for many proteins. Such persistent memory in protein fluctuations may underlie individuality in cell behavior and set a timescale needed for signals to fully affect every member of a cell population.
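The notion of a mixing time can be illustrated with a toy autoregressive process (purely illustrative; the talk's analysis concerns measured single-cell trajectories): autocorrelation decays as a^lag, giving a characteristic memory of about -1/ln(a) steps.

```python
# Toy autoregressive process x[t+1] = a*x[t] + noise: its autocorrelation
# decays as a^lag, so fluctuations "mix" on a timescale of ~ -1/ln(a)
# steps. The parameter a below is invented, not a measured value.

import math
import random

random.seed(0)
a = 0.9
x, xs = 0.0, []
for _ in range(10000):
    x = a * x + random.gauss(0, 1)
    xs.append(x)

def autocorr(xs, lag):
    m = sum(xs) / len(xs)
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    den = sum((v - m) ** 2 for v in xs)
    return num / den

print(round(autocorr(xs, 1), 2))   # close to a = 0.9
print(round(-1 / math.log(a), 1))  # characteristic memory ~ 9.5 steps
```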

Data, technology and populations for genomewide association studies
Itsik Pe'er, PhD
The Program for Medical and Population Genetics, Broad Institute of MIT and Harvard; Center for Human Genetic Research, Massachusetts General Hospital


The pervasive effect of genetic variation on medically important phenotypes provides a means for dissecting their underlying mechanisms by identifying variants that are associated with traits of interest. Current trends in human genetics now facilitate, for the first time, the pursuit of this potential through the execution of large-scale studies that scan the entire genome for potentially associated variants. Specifically, the talk will present:
(1) The International HapMap Project, a data resource we participated in developing to enable genomewide association studies, and what our analyses of these data tell us about human variation.
(2) The current generation of SNP array technology, and how computation and statistics improvements allow it to cover the majority of common human variants.
(3) The tale of an isolated population in Micronesia, where we show association scans are more promising than elsewhere, though we expose practical complexities of real data and the computational challenges they present.

Some of the research presented was performed as part of the International HapMap Analysis Team, or in collaboration with Affymetrix Inc. and the Friedman lab at Rockefeller University.

Executable Biology: Towards Computer Programs that Mimic Life
Dr. Jasmin Fisher
School of Computer & Communication Sciences, Swiss Federal Institute of Technology (EPFL), Switzerland

Thursday, January 5th, 12:30
Room 601, Taub bldg.
Hosted by Dr. Yael Mandel-Gutfreund


The new emerging field of Systems Biology aims to gain a system-level understanding of biology. In order to achieve such an understanding we need to establish the methodologies and techniques that will enable us to understand biological systems as systems. One such attempt is to use existing formal methods designed for the construction of computerized systems to model biological systems. We have recently shown that describing mechanistic models in a dynamic and executable language has various advantages. Dynamic models can represent phenomena of importance to biology that static models cannot represent, such as time, concurrency, and simultaneity. In addition, formal verification methods can be used to ensure the consistency of computational models with the biological data on which they are based. In my talk, I will discuss the strength of constructing and analyzing high-level executable models in biology. This pioneering approach, which I call 'Executable Biology', will be illustrated through a model representing the crosstalk between EGFR and LIN-12/Notch signaling during Caenorhabditis elegans vulval development. The construction and analysis of this model has provided new biological insights that highlight important aspects of cell fate specification.

Information theoretic analysis of biological data
Dr. Noam Slonim
Department of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University

In recent years, researchers have been facing a rapid increase in the available biological data. These data come in a variety of forms: complete genome sequences, mRNA transcriptional profiles, protein-protein interactions, and so forth. Automatic data analysis methods are often the only route for extracting meaningful insights from these data. Existing techniques, however, typically employ nontrivial assumptions. These assumptions might be explicit, as in assuming a specific model which reflects one's prior beliefs about the data; or implicit, as in arbitrarily specifying a correlation or "similarity" measure which lies at the core of any further analysis. While it is clear that such assumptions should be avoided, the conventional wisdom is that in practice they are unavoidable. In this talk I will describe an information-theoretic framework that allows one to extract biologically important insights, for a wide variety of problems, without any prior assumptions about the nature of the data. I will briefly discuss several recent applications of this approach, and will present in more detail results for systematic genotype-phenotype association in bacteria and archaea.
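As a small example of the assumption-light quantities such a framework builds on, mutual information measures genotype-phenotype association directly from joint counts (the counts below are invented):

```python
# Mutual information between a binary genotype and phenotype from a
# joint count table (counts invented for illustration).

import math

joint = {("g1", "p1"): 40, ("g1", "p0"): 10,
         ("g0", "p1"): 10, ("g0", "p0"): 40}
n = sum(joint.values())
pg = {g: sum(c for (gg, _), c in joint.items() if gg == g) / n
      for g in ("g0", "g1")}
pp = {p: sum(c for (_, yy), c in joint.items() if yy == p) / n
      for p in ("p0", "p1")}
mi = sum((c / n) * math.log2((c / n) / (pg[g] * pp[p]))
         for (g, p), c in joint.items())
print(round(mi, 3))  # -> 0.278 bits of association
```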
