Torgeir R. Hvidsten - Computational Inference of Regulatory Networks in Trees

Forest trees are a renewable source of raw material not only for paper production, but also for energy. However, to get higher productivity from forests, we need to acquire a basic understanding of important processes, such as development and growth. To this end, experimentalists collect huge amounts of molecular data from aspen (Populus tremula) using transcriptomics, proteomics and metabolomics platforms. The goal of this project is to use computers to explain these experimental data in terms of network models that describe interactions between genes, proteins and metabolites, and the underlying regulatory logic hard-wired in the trees' DNA. These models will become important platforms from which experimentalists can obtain understanding, overviews and hints for the direction of future experiments.

We take a systems biology approach to understanding important molecular processes in trees. Systems biology aims to integrate high-throughput experimental data and existing biological knowledge to model organisms at a system level. Such models can provide a system-wide explanation of the large amount of information produced by experimental platforms, and can also act as prediction devices that propose new hypotheses for further experiments. For example, the models can be used to generate hypotheses about which biological processes particular genes take part in and which proteins (transcription factors) regulate those processes. Massive readouts of cell contents in terms of RNA molecules (transcriptomics), proteins (proteomics) and the products of metabolic processes (metabolomics) can be explained by the information hard-wired in the DNA sequence. The same is true for complex phenotypes, such as growth or seasonal responses, although the rate of transcription and translation, the stability of proteins and the molecular dynamics of proteins and their interactions with each other and other molecules also come into play. A basic model of gene regulation must describe where transcription factors bind to initiate transcription and how they combine and cooperate to facilitate complex expression responses to, for example, changes in the environment. An extraordinary example of models describing the hard-wired logic of development in sea urchin embryos was published by Davidson et al. (Science 295(5560): 1669-1678, 2002). We aim to describe similar logic in aspen. However, we rely on computational inference of regulatory network models from DNA sequences, high-throughput dynamic data and knowledge obtained by studying other plants (comparative genomics).
Figure: A model of the transcriptional network in aspen tree leaves. Regulatory proteins (red triangles) orchestrate the activity of modules of genes (grey circles) by recognizing and binding to certain regulatory switches in the DNA (indicated by red arrows). Regulatory proteins are themselves members of modules as indicated by blue lines. The dynamic behavior of genes during four days in the autumn is illustrated by coloring genes and proteins according to their activity levels in the senescence sub-network (red implies high activity while green implies low activity). Leaf senescence is the process in which trees prepare for winter by storing up nutrients before getting rid of the leaves.
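The network structure described above, with regulatory proteins pointing at gene modules and regulators themselves belonging to modules, can be represented with two small mappings. The following Python sketch is only an illustration: the regulator, gene and module names (`TF1`, `geneA`, `M1` and so on) and the helper functions `targets_of` and `regulators_of` are invented placeholders, not data or code from the project.

```python
# Hypothetical toy network; all names are placeholders for illustration.

# Regulator -> modules it controls (the red arrows in the figure).
regulates = {
    "TF1": ["M1", "M2"],
    "TF2": ["M2"],
}

# Module -> member genes; regulators can be members too (the blue lines).
members = {
    "M1": ["geneA", "geneB", "TF2"],
    "M2": ["geneC", "TF1"],
}

def targets_of(regulator):
    """Return all genes in the modules controlled by a regulator."""
    return sorted({gene
                   for module in regulates.get(regulator, [])
                   for gene in members[module]})

def regulators_of(gene):
    """Return all regulators that control a module containing the gene."""
    return sorted(reg
                  for reg, modules in regulates.items()
                  if any(gene in members[m] for m in modules))

print(targets_of("TF2"))
print(regulators_of("geneA"))
```

Because regulators are stored as ordinary module members, regulatory cascades and feedback (here `TF1` sits in a module that it also controls) fall out of the same two lookups.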
To infer models from data, we use a technique from computer science called machine learning. Machine learning uses observations with known biological properties as examples to learn general models that can explain underlying patterns in data and make predictions for new, uncharacterized observations. There are two conceptually different approaches to the computational step. The most common is the nearest-neighbour approach, where biological knowledge is transferred from the closest example (e.g. sequence) for which such information is available. The second approach is to induce a general model from the available examples and to use this model for prediction. The advantage of the latter approach is that similarities can be found among many otherwise dissimilar examples, and these patterns can be used to predict weak similarities (e.g. distant homologues). Another advantage is that models can be inspected and interpreted, and thereby provide insight into the biological system. A critical breakthrough for systems biology methods like this is to reach a level of quality, in terms of descriptive and predictive power, at which the models can help guide experimentalists in choosing which hypotheses to consider and which experiments to do next. The results of these experiments may then be used to iteratively improve the model.
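The contrast between the two approaches can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the toy expression profiles, the labels, and the helper names `nearest_neighbour_predict`, `learn_threshold_rule` and `rule_predict` are assumptions made for illustration. The one-feature threshold rule is a deliberately simple stand-in for an induced model, not the method used in the project.

```python
from math import dist

# Hypothetical toy data: expression profiles (two time points) with known labels.
examples = [
    ((0.9, 0.1), "senescence"),
    ((0.8, 0.3), "senescence"),
    ((0.2, 0.7), "growth"),
    ((0.1, 0.9), "growth"),
]

def nearest_neighbour_predict(profile, examples):
    """Approach 1: transfer the label of the closest annotated example."""
    _, label = min(examples, key=lambda ex: dist(ex[0], profile))
    return label

def learn_threshold_rule(examples):
    """Approach 2: induce a general model, here a single-feature threshold rule.

    Returns (feature_index, threshold, label_if_above, label_otherwise),
    chosen to classify as many training examples correctly as possible.
    """
    labels = sorted({label for _, label in examples})
    best_score, best_rule = -1, None
    for f in range(len(examples[0][0])):
        values = sorted({x[f] for x, _ in examples})
        # Candidate thresholds are midpoints between consecutive observed values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            for above in labels:
                for below in labels:
                    if above == below:
                        continue
                    score = sum((above if x[f] > t else below) == y
                                for x, y in examples)
                    if score > best_score:
                        best_score, best_rule = score, (f, t, above, below)
    return best_rule

def rule_predict(profile, rule):
    """Apply an induced threshold rule to a new profile."""
    f, t, above, below = rule
    return above if profile[f] > t else below

rule = learn_threshold_rule(examples)
new_profile = (0.7, 0.2)
print(nearest_neighbour_predict(new_profile, examples))
print(rule_predict(new_profile, rule))
print(rule)  # the induced rule itself can be inspected and interpreted
```

The nearest-neighbour function needs the full example set at prediction time, whereas the induced rule is a compact object that can be read on its own, which mirrors the interpretability advantage described above.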
Key publications
Wabnik K, Hvidsten TR, Kedzierska A, Van Leene J, De Jaeger G, Beemster GTS, Komorowski J and Kuiper MTR (2009). Gene expression trends and protein features effectively complement each other in gene function prediction. Bioinformatics 25(3): 322-330.
Strömbergsson H, Daniluk P, Kryshtafovych A, Fidelis K, Wikberg JES, Kleywegt GJ and Hvidsten TR (2008). An interaction model based on local protein substructures generalizes to the entire structural enzyme-ligand space. Journal of Chemical Information and Modeling 48: 2278–2288.
Wilczynski B, Hvidsten TR, Kryshtafovych A, Tiuryn J, Komorowski J, Fidelis K (2006). Using local gene expression similarities to discover regulatory binding site modules. BMC Bioinformatics 7: 505.
Hvidsten TR, Wilczynski B, Kryshtafovych A, Tiuryn J, Komorowski J and Fidelis K (2005). Discovering regulatory binding site modules using rule-based learning. Genome Research 15: 856-866.
Lægreid A, Hvidsten TR, Midelfart H, Komorowski J, and Sandvik AK (2003). Predicting Gene Ontology Biological Process From Temporal Gene Expression Patterns. Genome Research 13: 965-979.
Hvidsten TR, Lægreid A, Kryshtafovych A, Andersson G, Fidelis K and Komorowski J (2009). A comprehensive analysis of the structure-function relationship in proteins based on local structure similarity. PLoS One 4: e6266.
Björkholm P, Daniluk P, Kryshtafovych A, Fidelis K, Andersson R and Hvidsten TR (2009). Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue–residue contacts. Bioinformatics 25(10): 1264-1270.
Hvidsten TR, Kryshtafovych A and Fidelis K (2008). Local descriptors of protein structure: A systematic analysis of the sequence-structure relationship in proteins using short- and long-range interactions. Proteins: Structure, Function, and Bioinformatics 75: 870-884.