[Basel Computational Biology Conference 2005]

  Abstracts
 
 

 

Keynote Lecture: Mass Spectrometry-based Proteomics: Computational Challenges and Partial Solutions
 

Ruedi Aebersold (Institute for Molecular Systems Biology, ETH Zurich, Switzerland, and Institute for Systems Biology, Seattle, USA)


The objective of proteomics is the systematic analysis of the proteins expressed by a cell, tissue or organism. It is expected that such analyses will define comprehensive molecular signatures of tissues, cells and body fluids in health and disease. Such signatures are impacting a wide range of biological and clinical research questions, such as the systematic study of biological processes and the discovery of molecular clinical markers for the detection, diagnosis and assessment of treatment outcome. The application of proteomics technology has proven particularly beneficial in cases in which differences between the proteomes (or fractions thereof) isolated from cells in different states have been analyzed, i.e. in which the analyses included accurate quantification.

Currently, the most successful quantitative proteomic analyses are based on mass spectrometry and tandem mass spectrometry. In such studies, 10^4 to 10^5 tandem mass spectra are generated, each one potentially representing a unique peptide sequence. The computational assignment of these spectra to their corresponding peptide sequences, the statistical validation of these assignments, the extraction of reliable biological information from these datasets and the dissemination of the data represent a series of significant computational challenges that are at present only partially solved.
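One widely used way to statistically validate peptide-spectrum assignments is the target-decoy strategy, sketched below. This is an illustration under our own assumptions, not the tool suite described in the talk: each spectrum is assumed to have been scored against a combined database of real (target) and reversed or shuffled (decoy) sequences, and the number of decoy hits above a score cutoff is taken as an estimate of the number of false target hits above it. The function `fdr_threshold` and the toy scores are hypothetical.

```python
from typing import List, Optional, Tuple

def fdr_threshold(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff whose estimated FDR is at most max_fdr.

    psms: (score, is_decoy) pairs, one best match per spectrum (hypothetical format).
    Estimated FDR at a cutoff = decoy hits at or above it / target hits at or above it.
    Tied scores are not treated specially in this sketch.
    """
    best: Optional[float] = None
    targets = decoys = 0
    # Walk from the best score downwards, accumulating target and decoy counts.
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best

if __name__ == "__main__":
    example = [(9.1, False), (8.7, False), (8.2, False), (7.9, True),
               (7.5, False), (6.0, True), (5.5, False)]
    print(fdr_threshold(example, max_fdr=0.2))  # 8.2
```

Returning the lowest acceptable cutoff, rather than the first one that fails, accepts as many assignments as the FDR target allows.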

In this presentation we will discuss current platforms for the mass spectrometric collection of proteomic data, describe a suite of open-source tools for their computational analysis and discuss the remaining challenges.

Since most biological networks involve proteins, proteomics, the global analysis of the protein complement of a cell or tissue, is a central element of systems biology. In this presentation we will discuss the current status of quantitative proteomics technologies and some of the resources that have emerged from the data they produce. We will also show with selected examples how quantitative proteomics can impact common types of experiments currently carried out in many biological research projects, and discuss the challenges that remain in turning proteomics into a truly genomic science.

References:

  • Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature. 2003;422(6928):198-207.
  • Ranish JA, Yi EC, Leslie DM, Purvine SO, Goodlett DR, Eng J, Aebersold R. The study of macromolecular complexes by quantitative proteomics. Nat Genet. 2003 Mar;33(3):349-55.
  • Ranish JA, Hahn S, Lu Y, Yi EC, Li XJ, Eng J, Aebersold R. Identification of TFB5, a new component of general transcription and DNA repair factor IIH. Nat Genet. 2004 Jul;36(7):707-13.
  • Giglia-Mari G, Coin F, Ranish JA, Hoogstraten D, Theil A, Wijgers N, Jaspers NG, Raams A, Argentini M, van der Spek PJ, Botta E, Stefanini M, Egly JM, Aebersold R, Hoeijmakers JH, Vermeulen W. A new, tenth subunit of TFIIH is responsible for the DNA repair syndrome trichothiodystrophy group A. Nat Genet. 2004 Jul;36(7):714-9.
  • Desiere F, Deutsch EW, Nesvizhskii AI, Mallick P, King NL, Eng JK, Aderem A, Boyle R, Brunner E, Donohoe S, Fausto N, Hafen E, Hood L, Katze MG, Kennedy KA, Kregenow F, Lee H, Lin B, Martin D, Ranish JA, Rawlings DJ, Samelson LE, Shiio Y, Watts JD, Wollscheid B, Wright ME, Yan W, Yang L, Yi EC, Zhang H, Aebersold R. Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol. 2005;6(1):R9.
  • Flory MR, Carson AR, Muller EG, Aebersold R. An SMC-domain protein in fission yeast links telomeres to the meiotic centrosome. Mol Cell. 2004 Nov 19;16(4):619-30.



Current and future challenges in proteome informatics

Ron D. Appel (Swiss Institute of Bioinformatics, University of Geneva, and Geneva Bioinformatics (GeneBio))

 

Proteomics aims at deciphering the proteome, the protein complement of the genome, with the goal of increasing our understanding of biological processes and of improving and speeding up drug development by discovering disease biomarkers and drug targets. The major elements of proteome analysis are powerful protein separation techniques such as liquid chromatography (LC) and two-dimensional electrophoresis (2-DE), combined with enzymatic processing and mass spectrometry (MS). For over two decades, these techniques have required considerable efforts in the development of dedicated bioinformatics, providing researchers with comprehensive, state-of-the-art software tools and databases for proteome analysis. The major areas of interest are protein identification from MS data and the storage and exchange of proteomics data. Current challenges include, in particular, the in-depth characterization of proteins from MS data using new and advanced algorithms; exploiting the available experimental data to the fullest, in particular their redundancy, to extract biological knowledge; and the integration of information available across several databases as a step towards integrated systems biology.
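Protein identification from MS data often starts by comparing observed peptide masses with those predicted from a sequence database. The sketch below shows that step for a single sequence. It assumes the usual simplified trypsin rule (cleave after K or R unless the next residue is P), uses standard monoisotopic residue masses, and ignores missed cleavages and modifications. The function names and the 0.02 Da tolerance are illustrative choices, not part of any tool mentioned in the abstract.

```python
from typing import Iterable, List

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

def tryptic_digest(sequence: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def match_masses(observed: Iterable[float], sequence: str, tol: float = 0.02) -> List[str]:
    """Tryptic peptides of `sequence` whose mass lies within `tol` Da of an observed mass."""
    observed = list(observed)
    return [p for p in tryptic_digest(sequence)
            if any(abs(peptide_mass(p) - m) <= tol for m in observed)]
```

For example, `tryptic_digest("MKWVTFISLLRPAK")` yields `["MK", "WVTFISLLRPAK"]`, because the R is followed by P and is therefore not cleaved.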


  



Computational approaches in microbial strain engineering at DSM Nutritional Products

Sabine Arnold (DSM Nutritional Products, Basel, Switzerland)

The wealth of high-quality functional genomics data sets and the improved accessibility of high-performance computing power have vastly propelled the development of new computational methods capable of integrating these large-scale heterogeneous data sets. Depending on the application, one may select from a variety of methods with different analytical focus (e.g., clustering techniques, statistical replicate analysis, correlation analysis, neural nets). In addition, genome-derived metabolic network models are increasingly being developed, particularly for microbial systems, mostly because of their lower biochemical complexity compared with higher eukaryotic systems. These models, usually stoichiometric, are used to study how genetic modifications and changes in environmental parameters affect the metabolic flux distribution within the cellular context. The ultimate vision of using such models in the biotech industry is to gain a systems-level understanding of cellular physiology and thereby to assist both rational strain engineering and process development.
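Stoichiometric models rest on the steady-state mass balance S·v = 0, where S is the stoichiometric matrix (metabolites × reactions) and v is the flux vector. The toy network below is entirely hypothetical (uptake of metabolite A, which leaves the cell via either B or C). It only illustrates how a flux distribution, and a knockout applied to it, can be checked against that constraint. Real strain-engineering models add flux bounds and an objective and are typically solved by linear programming, which this sketch does not attempt.

```python
from typing import Dict

# Hypothetical toy network; columns of S follow this reaction order.
REACTIONS = ["uptake_A", "A_to_B", "A_to_C", "B_out", "C_out"]
S = [
    [1, -1, -1, 0, 0],   # A: produced by uptake, consumed by both routes
    [0, 1, 0, -1, 0],    # B: produced from A, exported
    [0, 0, 1, 0, -1],    # C: produced from A, exported
]

def is_steady_state(fluxes: Dict[str, float], tol: float = 1e-9) -> bool:
    """True if the net production (row of S·v) of every metabolite is zero within tol."""
    v = [fluxes.get(r, 0.0) for r in REACTIONS]
    return all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)

def knockout(fluxes: Dict[str, float], reaction: str) -> Dict[str, float]:
    """Copy of a flux distribution with one reaction forced to zero."""
    out = dict(fluxes)
    out[reaction] = 0.0
    return out

if __name__ == "__main__":
    wild_type = {"uptake_A": 10, "A_to_B": 6, "A_to_C": 4, "B_out": 6, "C_out": 4}
    print(is_steady_state(wild_type))                      # True
    print(is_steady_state(knockout(wild_type, "A_to_B")))  # False: A and B unbalanced
```

After the `A_to_B` knockout, the network can only return to steady state if all flux is rerouted through C, e.g. `{"uptake_A": 10, "A_to_C": 10, "C_out": 10}`. This is the kind of flux redistribution such models are used to predict.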

 

 


Towards spatial and temporal protein interaction networks

Peer Bork (EMBL, Heidelberg and MDC, Berlin)

As cellular networks become more and more refined, it is becoming feasible to move from 2D representations (nodes and edges) to 4D, i.e. to explore temporal and spatial aspects of interaction networks. I will introduce recent work from our group that reveals temporal changes (ranging from 90 minutes during the yeast cell cycle to more than 2 billion years during species evolution) and will also touch upon a few spatial aspects (protein complexes and cellular compartments).

  

 


Simulating physiological states, regulatory networks and metabolic pathways of bacteria for applications in antibiotic drug discovery

Christoph Freiberg (Bayer HealthCare AG)

As current antibiotic therapy becomes increasingly ineffective, new technologies are required to identify and develop novel classes of antibacterial agents. Our comparative genome analyses enabled the prediction of novel cellular functions and complete pathways in bacteria, in order to characterise novel targets suitable for antibacterial compound screening. However, holistic strategies, as alternatives to the focused target-based approach, are becoming increasingly important in antibiotic drug discovery. Based on a compendium of genome-wide expression profiles reflecting the physiological response of the model bacterium Bacillus subtilis to a hundred different antibiotic agents, we are able to simulate regulatory networks and pathways and to predict their genetic control elements. In this way, we identified novel biomarkers for physiological stress states that are suitable for screening compounds with specific mechanisms of action. Moreover, our more elaborate expression profile analysis, based on classification algorithms as well as regulon- and pathway-specific data evaluation, has become a valuable tool for discovering the mechanism of action of novel antibiotic agents.
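The abstract does not say which classification algorithms were used. One simple possibility is sketched below: the expression profile of a new compound is assigned to the mechanism-of-action class of its most correlated reference profile, i.e. a nearest-neighbour rule with Pearson correlation. The reference profiles, class labels and function names are invented for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def classify(profile: Sequence[float],
             references: List[Tuple[str, Sequence[float]]]) -> Tuple[Optional[str], float]:
    """Return (label, r) of the reference profile most correlated with `profile`."""
    best_label, best_r = None, -2.0  # any real correlation exceeds -2
    for label, ref in references:
        r = pearson(profile, ref)
        if r > best_r:
            best_label, best_r = label, r
    return best_label, best_r

if __name__ == "__main__":
    # Hypothetical log-ratios for four marker genes under two known drug classes.
    refs = [("cell_wall", [2.0, 1.5, -0.5, 0.0]),
            ("protein_synthesis", [-0.5, 0.0, 2.5, 1.8])]
    print(classify([1.8, 1.2, -0.3, 0.1], refs)[0])  # cell_wall
```

A single nearest neighbour is the simplest choice. With many reference compounds per class, averaging profiles into class centroids or using a trained classifier would usually be more robust.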

  

 

 


Reverse engineering of metabolic pathways using sparse GGM  

Wilhelm Gruissem  (Functional Genomics Center and ETH Zürich)

Wilhelm Gruissem [1], Anja Wille, Philip Zimmermann, Eva Vranova, Andreas Fürholz, Oliver Laule, Stefan Bleuler, Lars Hennig, Matthias Hirsch-Hoffmann, Amela Prelic, Lothar Thiele, Eckart Zitzler and Peter Bühlmann, Reverse Engineering Group [2] and Functional Genomics Center Zurich [3], Swiss Federal Institute of Technology (ETH), Zurich.

The analysis of genetic regulatory networks has been greatly advanced by the availability of large data sets from high-throughput technologies such as DNA microarrays. The genome-wide, parallel monitoring of gene activity will increase our understanding of the molecular basis of pathway functions and their cellular network context. In simple eukaryotes or prokaryotes, gene expression data have been combined with two-hybrid data and phenotypic data to successfully predict protein-protein interactions and transcriptional regulation on a large scale. In higher organisms, however, little is known about regulatory control mechanisms and pathway networks on a larger scale. As a first step, we have focused on isoprenoid metabolism, which is universally conserved and essential for cell survival. Arabidopsis has two independent pathways, which function in the cytoplasm and the chloroplast [4]. We developed a novel graphical Gaussian modelling (GGM) approach to elucidate the regulatory network of the two isoprenoid biosynthesis pathways based on large-scale expression data [5]. When applying this approach to infer a gene network, we detect modules of closely connected genes and candidate genes for cross-talk between the isoprenoid pathways. Genes of downstream pathways also fit well into the network. We evaluated our approach in a simulation study and using the yeast galactose utilization network. Connected genes were independently validated using Genevestigator [6], a powerful new software suite for visualizing microarray and other data in their biological context.
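In a graphical Gaussian model, an edge between two genes reflects a correlation that persists after conditioning on other genes, i.e. a non-zero partial correlation. The sketch below illustrates this idea with first-order partial correlations, conditioning on one third gene at a time, and keeps an edge only if it survives every such conditioning. It is a simplified illustration under our own assumptions: it uses a fixed absolute threshold instead of a statistical test and toy data in which gene A drives both B and C. It is not a reimplementation of the method in reference [5], and `first_order_graph` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def first_order_graph(data: Dict[str, List[float]], threshold: float = 0.3) -> Set[Tuple[str, str]]:
    """Edges (a, b) whose marginal and all first-order partial correlations reach `threshold`.

    Partial correlation of a and b given k:
        r_ab.k = (r_ab - r_ak * r_bk) / sqrt((1 - r_ak^2) * (1 - r_bk^2))
    Assumes no pair of genes is perfectly correlated (|r| < 1).
    """
    genes = sorted(data)
    r = {}
    for a, b in combinations(genes, 2):
        r[a, b] = r[b, a] = pearson(data[a], data[b])
    edges = set()
    for a, b in combinations(genes, 2):
        if abs(r[a, b]) < threshold:
            continue
        keep = True
        for k in genes:
            if k in (a, b):
                continue
            partial = (r[a, b] - r[a, k] * r[b, k]) / math.sqrt((1 - r[a, k] ** 2) * (1 - r[b, k] ** 2))
            if abs(partial) < threshold:  # a and b look independent given k
                keep = False
                break
        if keep:
            edges.add((a, b))
    return edges

if __name__ == "__main__":
    toy = {  # hypothetical expression values: B and C each follow A plus independent variation
        "A": [1, 1, 1, 1, -1, -1, -1, -1],
        "B": [1.5, 0.5, 1.5, 0.5, -0.5, -1.5, -0.5, -1.5],
        "C": [1.5, 1.5, 0.5, 0.5, -0.5, -0.5, -1.5, -1.5],
    }
    print(sorted(first_order_graph(toy)))  # [('A', 'B'), ('A', 'C')]
```

B and C are correlated (r = 0.8), but only through A, so their edge disappears once A is conditioned on. This is the behaviour that lets GGM-type methods separate direct from indirect associations.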

References:

  • [1] http://www.pb.ethz.ch
  • [2] http://www.rep.ethz.ch
  • [3] http://www.fgcz.ethz.ch
  • [4] Laule O, Fürholz A, Chang HS, Zhu T, Wang X, Heifetz PB, Gruissem W and Lange M. (2003) Crosstalk between cytosolic and plastidial pathways of isoprenoid biosynthesis in Arabidopsis thaliana, PNAS 100, 6866-6871.
  • [5] Wille A, Zimmermann P, Vranová E, Fürholz A, Laule O, Bleuler S, Hennig L, Prelic A, Rohr P, Thiele L, Zitzler E, Gruissem W, Bühlmann P (2004). Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology 5: R92.
  • [6] http://www.genevestigator.ethz.ch

  

 


Predicting biomolecular systems and tracing their evolution

Martijn Huynen (Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen)

The accumulating wealth of genomes and other types of genomics data gives us the opportunity both to predict the function of proteins and their involvement in pathways and to trace the evolution of such biomolecular systems. However, as genomics data are inherently noisy, we need comparative analyses of multiple data sets to make reliable predictions. We have shown that, while the co-expression of genes or the yeast-2-hybrid interaction of their proteins in one species provides only a weak signal that their proteins functionally interact, it does become a reliable signal when that co-expression or interaction is measured in multiple species (van Noort et al., 2003; Huynen et al., 2004). One of the surprising observations of such “horizontal comparative genomics” between species is the low level of conservation: less than 5% of genes that are co-expressed in are also co-expressed in C. elegans, and less than 25% of the yeast-2-hybrid interacting proteins from S. cerevisiae have been observed to interact in D. melanogaster. The question arises whether such low conservation reflects evolution and the changing relations between proteins or merely the noise level of the datasets. When yeast-2-hybrid data are compared between species, the level of conservation is only slightly lower than when independently generated datasets from a single species are compared, indicating that the low reproducibility of genomics data might indeed be the main cause of the low level of measured conservation between species. To filter out the noise from such analyses, we have constructed a set of reliably co-regulated genes in S. cerevisiae by combining co-expression data with transcription factor binding data from ChIP-on-chip experiments. For those gene pairs for which we have multiple sources of evidence that they are truly co-regulated in S. cerevisiae, the conservation of co-regulation in C. elegans is 78%.
Co-regulation therefore does appear to be well conserved in evolution (Snel et al., 2004). Such analyses, however, apply only to cases where both co-regulated genes are present in the species compared. Moreover, analyses of the phylogenetic distribution of proteins from a single biomolecular system indicate surprisingly little “evolutionary modularity” of functional modules (Snel et al., 2004). By mapping such variation in the makeup of biomolecular systems onto a phylogenetic tree, one can reconstruct the evolution of biomolecular systems; a specific example, a large protein complex in eukaryotes, will be discussed.
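The conservation figures quoted above are fractions of gene pairs: of the pairs that are co-expressed (or co-regulated) in one species, what share map to orthologous pairs that are also co-expressed in the other species? A minimal sketch of that bookkeeping follows. It assumes one-to-one orthology and uses invented gene identifiers. Pairs in which a gene lacks an ortholog are excluded, mirroring the caveat that such analyses apply only when both genes are present in the species compared. `conserved_fraction` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Optional, Set

Pair = FrozenSet[str]

def conserved_fraction(pairs_a: Set[Pair], pairs_b: Set[Pair],
                       orthologs: Dict[str, str]) -> Optional[float]:
    """Share of testable species-A pairs whose ortholog pair is also linked in species B.

    orthologs maps species-A genes to species-B genes (one-to-one, hypothetical).
    Returns None if no pair can be tested.
    """
    testable = conserved = 0
    for pair in pairs_a:
        g1, g2 = tuple(pair)
        if g1 not in orthologs or g2 not in orthologs:
            continue  # not testable: one gene has no ortholog in species B
        testable += 1
        if frozenset((orthologs[g1], orthologs[g2])) in pairs_b:
            conserved += 1
    return conserved / testable if testable else None

if __name__ == "__main__":
    yeast_pairs = {frozenset({"y1", "y2"}), frozenset({"y1", "y3"}), frozenset({"y4", "y5"})}
    worm_pairs = {frozenset({"w1", "w2"})}
    orth = {"y1": "w1", "y2": "w2", "y3": "w3", "y4": "w4"}  # y5 has no ortholog
    print(conserved_fraction(yeast_pairs, worm_pairs, orth))  # 0.5
```

Restricting the input to high-confidence pairs, as in the co-expression plus ChIP-on-chip set described above, changes only `pairs_a`. The counting itself stays the same.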


References:

  • van Noort V, Snel B, Huynen MA (2003) Predicting gene function by conserved co-expression. Trends Genet. 19: 238-242.

  • Huynen MA, Snel B, van Noort V (2004) Comparative genomics for reliable protein-function prediction from genomic data. Trends Genet. 20: 340-344.

  • Snel B, van Noort V, Huynen MA (2004) Gene co-regulation is highly conserved in the evolution of eukaryotes and prokaryotes. Nucleic Acids Res. 32: 4725-4731.

  • Snel B, Huynen MA (2004) Quantifying modularity in the evolution of biomolecular systems. Genome Res. 14: 391-397.


SystemsX - its relevance for science and innovation policy

Olaf Kübler (President of ETH Zurich)

Life sciences play a crucial role in our society. Breakthroughs in fundamental research are exploited at an increasing rate and applied in diagnostics, medical therapy and the agro-food industry. It is imperative that Switzerland avoids de-industrialization, reorganizes itself to meet the challenges of the future and makes significant efforts in the most important technologies and their applications to life sciences, health care and nutrition.

Systems biology is a new discipline with a high potential for scientific discoveries that provide new insights into and understanding of biosystems. Unlike molecular biology, systems biology does not exclusively examine basic components, but rather the complex processes of a complete biological system. For this holistic understanding, systems biology requires the support of various disciplines. It needs, for example, information technology to record, manage and mine the enormous amounts of data. Physics, engineering sciences, mathematics, chemistry and bioinformatics are further important disciplines.

SystemsX's strategic vision of systems biology research is to contribute substantially to the wealth of Switzerland's science, industry and society. Present and future orientations of research will help to create new ventures and industries for (bio)-technology, and future health care and nutrition. Future directions will also open up new fields of research and applications that are not yet foreseeable from today's limited perspective.

SystemsX has been formed as a joint initiative of ETH Zurich and the Universities of Basel and Zurich to establish an internationally leading program in the emerging science of systems biology and to provide the organizational and financial background to practice systems biology at the participating institutions.

Based on close collaboration among the individual disciplines and with industry, the aim is to dissolve the boundaries both between the disciplines, leading to truly interdisciplinary work, and between basic and applied sciences, leading to transdisciplinary research. New approaches will be fundamental to developing a common language and a common research culture.

Another goal of SystemsX is to engage industry in Switzerland in a long-term research effort and in financial support, based on the achievements and significance of the initiative.


Proteomics strategies for pharmaceutical and diagnostic research and for biomarker discovery

Hanno Langen (F. Hoffmann-La Roche AG)

Proteomics is a key technology for the discovery of the biomarkers required for pharmaceutical research and diagnostics. These markers can be found by massively parallel investigation of biological samples, preferably using diseased tissue directly. To obtain sufficient sensitivity, multidimensional protein fractionation schemes have to be employed, while statistical significance is achieved by comparing large numbers of samples.

This strategy imposes limitations on the technologies employed. Thus, neither gel image comparison nor manual curation of mass spectrometric identification results is feasible for large-scale biomarker studies. We will show that meaningful data interpretation is possible only if proteins are identified with high accuracy, so that false-positive identifications do not obscure the true differences. In our group we have developed alternative solutions to the problem of protein quantification that exploit data redundancies built into the experimental design of the biomarker study.

Examples of successful biomarker discovery including the methodology for pre-validation and validation will be shown.


Coarse-grained modeling of cellular and transduction networks

Felix Naef (ISREC & Swiss Institute of Bioinformatics)

In my presentation I will discuss two applications of physical modeling to biological systems. In the first, we model populations of cellular oscillators to interpret recordings of a luciferase reporter in a circadian cell culture assay. Correlation with single-cell data illustrates the complementarity of both techniques. Our analysis uncovered reciprocal interactions between the circadian and cell cycle oscillators, manifest for example as a gating of mitosis time by the clock.
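A bulk luciferase recording sums the output of many cells. The sketch below illustrates why population and single-cell data complement each other: it averages sustained cosine oscillators whose periods differ from cell to cell. The summed signal damps as the cells drift out of phase, even though no single cell damps. The period spread (22 to 26 h), the number of cells and the pure-cosine waveform are our assumptions, not values from the study, and `population_signal` is a hypothetical helper.

```python
import math
from typing import Sequence

def population_signal(t: float, periods: Sequence[float]) -> float:
    """Mean of unit-amplitude cosine oscillators that are all in phase at t = 0 (t in hours)."""
    return sum(math.cos(2 * math.pi * t / p) for p in periods) / len(periods)

# Hypothetical population: 200 cells with periods evenly spread between 22 h and 26 h.
N_CELLS = 200
PERIODS = [22.0 + 4.0 * i / (N_CELLS - 1) for i in range(N_CELLS)]

if __name__ == "__main__":
    for day in (0, 1, 3, 5, 10):
        print(day, round(population_signal(24.0 * day, PERIODS), 3))
```

With these values, the mean signal stays above 0.8 one day after synchronization. From day 10 onward, its magnitude stays well below half of its initial value. This damping comes purely from dephasing, so it cannot be read as evidence of damped oscillators in individual cells.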

In the second part, I will discuss the study of information flow in small transduction networks, based on discrete dynamical network models. The main concepts will be explained, and selected examples such as the yeast cell cycle and the UVB response will be addressed in some detail.
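Discrete dynamical models of this kind are often written as Boolean networks: each node is ON (1) or OFF (0), and an update rule gives its next state from the current states of its regulators. The three-node network below is a made-up negative-feedback loop, not the yeast cell-cycle or UVB models from the talk. It shows synchronous updating and how an attractor is found by iterating until a state repeats. `RULES`, `step` and `find_attractor` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]

# Hypothetical loop: x activates y, y activates z, z represses x.
RULES: List[Callable[[State], int]] = [
    lambda s: int(not s[2]),  # x' = NOT z
    lambda s: s[0],           # y' = x
    lambda s: s[1],           # z' = y
]

def step(state: State) -> State:
    """Synchronous update: every node applies its rule to the same current state."""
    return tuple(rule(state) for rule in RULES)

def find_attractor(initial: State) -> List[State]:
    """Iterate until a state repeats; return the attractor cycle in visiting order."""
    seen: Dict[State, int] = {}
    trajectory: List[State] = []
    state = initial
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[seen[state]:]

if __name__ == "__main__":
    print(find_attractor((1, 0, 0)))  # six-state cycle
    print(find_attractor((1, 0, 1)))  # two-state cycle
```

Even this small circuit has two distinct cyclic attractors under synchronous updating, which is one reason the update scheme is an explicit modelling choice in discrete network studies.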

References:

  • Nagoshi E, Saini C, Bauer C, Laroche T, Naef F, Schibler U., Circadian gene expression in individual fibroblasts: cell-autonomous and self-sustained oscillators pass time to daughter cells, Cell. 2004 Nov 24;119(5):693-705.


Modelling the IGF signalling pathway

Mark Penney (Novartis Pharma)

Brian Stoll, Anna Georgieva, Gabriel Helmlinger, Birgit Schoeberl, Tad Stewart, Ulrik Nielsen and Mark Penney

The IGF network has been implicated in a number of cancers, and IGF1 and IGF1R have therefore become promising targets for therapeutic intervention. In this work, a systems biology model of the IGF signalling pathway was produced in collaboration with Merrimack Pharmaceuticals Inc., with the objectives of identifying and quantifying biochemical biomarkers for the pharmacodynamic efficacy of an IGF1R inhibitor with clinical potential, and of further evaluating biomarkers for patient response.

The IGF pathway topology was described using existing data from the literature and expressed as a mathematical model in Matlab. The model was quantified by training it iteratively against measured in-house data. This process demonstrated the importance of a high-quality data set, in this case one that described the peak activation of ERK and AKT well. The resulting model predicted the downstream activation of ERK and AKT with a good degree of accuracy. The model was then validated by comparing its predicted response to an IGF1R inhibitor with an independently measured experimental data set.

Model simulation and sensitivity analysis were used to determine the biomarkers most sensitive to IGF1R blockade. These showed that the IGF network lacks downstream signal amplification; consequently, the most sensitive biomarker is the phosphorylation of IGF1R itself, in contrast to similar pathways such as the EGF signalling pathway. Model-based analysis also shows that the normal expression level of IGF1R, the levels of free IGF1 and IGF2, and IRS-1 expression are the most important biomarkers for patient response. Furthermore, it is shown that IGFBP-5 may also be important because of its ability to amplify IGF signalling, illustrating that the regulation of free IGF levels plays a key role in IGF signalling prior to interaction with IGF receptors.
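A common local measure for ranking candidate biomarkers in a sensitivity analysis is the normalized sensitivity coefficient S = (∂y/∂p)·(p/y). It gives the relative change in a model output y per relative change in a parameter p. The sketch below estimates S by central finite differences. `toy_output` is a hypothetical saturating receptor-ligand readout that stands in for a model output. It is not the IGF model, and the parameter values are arbitrary.

```python
from typing import Callable, Dict

Params = Dict[str, float]

def normalized_sensitivity(f: Callable[[Params], float], params: Params,
                           name: str, rel_step: float = 1e-4) -> float:
    """(dy/dp) * (p / y) for parameter `name`, by central differences (p must be non-zero)."""
    p = params[name]
    h = p * rel_step
    dydp = (f({**params, name: p + h}) - f({**params, name: p - h})) / (2 * h)
    return dydp * p / f(params)

def toy_output(params: Params) -> float:
    """Hypothetical readout: active complex = receptor * ligand / (kd + ligand)."""
    return params["receptor"] * params["ligand"] / (params["kd"] + params["ligand"])

if __name__ == "__main__":
    base = {"receptor": 2.0, "ligand": 1.0, "kd": 1.0}
    ranking = sorted(base, key=lambda k: -abs(normalized_sensitivity(toy_output, base, k)))
    for k in ranking:
        print(k, round(normalized_sensitivity(toy_output, base, k), 4))
```

In this toy readout, the output is proportional to receptor level (S = 1), while the ligand and Kd coefficients are ±Kd/(Kd + L) = ±0.5 at the chosen values. Ranking by |S| gives an ordering of the kind the abstract describes. With a full pathway model, each candidate species or parameter would be perturbed in the same way.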


A Plausible Model for the Digital Response of p53 to DNA Damage: A Tale of Limiting Resources, Negative Feedback and Time Delays.

John Jeremy Rice (IBM Computational Biology Center, Yorktown Heights, NY)

J. Jeremy Rice, Lan Ma, John Wagner, and Gustavo Stolovitzky. IBM Computational Biology Center, Yorktown Heights, NY.
 

The tumor suppressor protein p53 is critical for ensuring genomic stability when cells are under ionizing radiation (IR) stress. It was recently observed that the single-cell response of p53 to IR is "digital", in that it is the number of p53 oscillations (rather than their amplitude) that depends on the radiation dose. We present a mathematical model of this phenomenon. In our model, double-strand break (DSB) sites induced by IR interact with a limiting pool of DNA repair proteins, forming DSB-protein complexes at DNA damage foci. Both the initial number of DSBs and the DNA repair process are modeled stochastically. The model assumes that the persisting complexes are sensed by ataxia telangiectasia mutated (ATM), a kinase with a positive feedback mechanism of autophosphorylation that sensitively transduces the DNA damage information to downstream processes. The ATM sensing module produces a step-like, ON-to-OFF signal as the input to a downstream oscillator consisting of the p53-Mdm2 autoregulatory feedback loop (Mdm2 is the negative regulator of p53). Our simulation results show that p53 and Mdm2 exhibit coordinated oscillatory dynamics upon IR stimulation, with a stochastic number of oscillations whose mean increases with IR dose, in good agreement with the observed response of p53 to DNA damage in single-cell experiments. We conjecture that the robustness of the oscillatory behavior of p53 is in part induced by the ATM-induced autodegradation of Mdm2, a mechanism recently reported but not yet included in other models.
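The core of the model described above couples a step-like ATM signal to a delayed p53-Mdm2 negative feedback loop. The sketch below writes that coupling as two delay differential equations and integrates them with the explicit Euler method. Everything quantitative is our assumption: the rate constants, the delay, the fixed ATM pulse and the mass-action degradation term are placeholders, and `simulate_p53_mdm2` is a hypothetical name. The stochastic DSB repair and ATM autophosphorylation of the actual model are left out. With these untuned parameters, this deterministic version shows the model structure only and is not expected to reproduce sustained pulses or dose-dependent pulse counts.

```python
from typing import List, Tuple

def simulate_p53_mdm2(t_end: float = 50.0, dt: float = 0.01, tau: float = 1.0,
                      atm_on: float = 5.0, atm_off: float = 30.0,
                      k_p: float = 1.0, d_p: float = 0.1, k_deg: float = 1.0,
                      k_m: float = 1.0, d_m: float = 0.5
                      ) -> Tuple[List[float], List[float], List[float]]:
    """Euler integration of a delayed p53-Mdm2 loop gated by an ATM step (placeholder values).

    dp/dt = k_p * ATM(t) - k_deg * m * p - d_p * p    (Mdm2-dependent p53 degradation)
    dm/dt = k_m * p(t - tau) - d_m * m                (delayed p53-induced Mdm2 synthesis)
    History before t = 0 is taken as zero.
    """
    n_steps = int(round(t_end / dt))
    lag = int(round(tau / dt))
    times, p, m = [0.0], [0.0], [0.0]
    for i in range(n_steps):
        t = i * dt
        atm = 1.0 if atm_on <= t < atm_off else 0.0  # step-like ON-to-OFF input
        p_delayed = p[i - lag] if i >= lag else 0.0
        dp = k_p * atm - k_deg * m[i] * p[i] - d_p * p[i]
        dm = k_m * p_delayed - d_m * m[i]
        p.append(p[i] + dt * dp)
        m.append(m[i] + dt * dm)
        times.append((i + 1) * dt)
    return times, p, m

if __name__ == "__main__":
    times, p, m = simulate_p53_mdm2()
    for i in range(0, len(times), 500):
        print(f"t={times[i]:5.1f}  p53={p[i]:.3f}  Mdm2={m[i]:.3f}")
```

Before the ATM step, both species stay at zero. Once ATM switches on, p53 rises immediately, while Mdm2 starts to rise only after the delay τ. This lag between induction and feedback is the ingredient that delayed negative-feedback models rely on to generate pulses.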

 


Combining Models and Data for Systems Analysis of Cellular Networks

Jörg Stelling (ETH Zürich)

Systems biology aims at understanding complex biological networks through a combination of (comprehensive) experimental analysis and (quantitative) mathematical modeling. At present, however, it is largely unclear which knowledge and data will be required to establish realistic mathematical models. Relatedly, it is equally important to ask to what extent the data already available allow meaningful model development.

In this talk, I will argue that one can extract an unexpectedly high degree of information through appropriate combinations of modeling approaches, biological knowledge and only limited experimental data. The examples presented will include (i) structural analysis of metabolic networks to infer key aspects of functionality and regulation, (ii) comparative computational modeling of the TOR (‘target of rapamycin’) pathway to reveal signaling mechanisms, and (iii) detailed modeling of a complex network in yeast cell cycle regulation. These studies point to the robustness of cellular networks as an ‘enabling’ feature for model development, and they suggest strategies for efficiently linking future experimental and theoretical approaches to cellular networks.

  


Computational Challenges in Integrating Transcriptomics, Proteomics and Metabolomics

Jim Samuelsson (Genedata)

In this presentation we give an overview of some computational challenges in the integrative analysis of transcriptomics, proteomics and metabolomics data in order to obtain a better understanding of cellular processes, a prerequisite for systems biology.

We start by discussing some of the strengths and weaknesses of the different 'omics' levels, which call for the ability to work at more than one level. We continue with a few examples of how we at Genedata tackle some of these challenges, for example how metabolic pathways can be studied in an integrated fashion together with expression data of different types.

We then move on to some issues that we find especially important to address in order to be able to make maximum use of the data, particularly the necessity for quality assessment of the raw data as well as the subsequent statistical analysis and data mining to extract the biological information. Finally, we illustrate the methods by applying them to the problem of biomarker identification.


Qualitative modelling, analysis and simulation of genetic regulatory networks

Denis Thieffry (Université de la Méditerranée-CNRS-INSERM, Marseille)

A proper understanding of the mechanisms controlling gene expression requires the integration of molecular and genetic data into full-fledged mathematical models. An overview of the main dynamical modelling approaches will be provided, before focusing on a multi-level logical approach, which enables flexible qualitative modelling of complex regulatory networks. This approach encompasses the development of a dedicated software suite (GINsim) and will be illustrated by applications to pattern formation and cell differentiation in the fly Drosophila melanogaster.
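In the multi-level logical formalism, each component takes one of a few integer levels, and a logical function gives the level it tends towards in a given state. Under asynchronous updating, as commonly used in this framework, each successor state changes a single component by one level towards its target. The two-component circuit below, with one three-level component and one Boolean component, is hypothetical. The code does not use the software suite mentioned in the abstract, and `target`, `successors` and `reachable` are illustrative names.

```python
from collections import deque
from typing import List, Set, Tuple

State = Tuple[int, int]  # (x, y) with x in {0, 1, 2} and y in {0, 1}

def target(state: State) -> State:
    """Hypothetical logical functions: y represses x; x at its highest level activates y."""
    x, y = state
    return (0 if y == 1 else 2, 1 if x == 2 else 0)

def successors(state: State) -> List[State]:
    """Asynchronous unitary update: move one component one level towards its target."""
    result = []
    for i, (level, aim) in enumerate(zip(state, target(state))):
        if level != aim:
            new = list(state)
            new[i] = level + (1 if aim > level else -1)
            result.append((new[0], new[1]))
    return result

def reachable(start: State) -> Set[State]:
    """All states reachable from `start` in the asynchronous state transition graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

if __name__ == "__main__":
    states = reachable((0, 0))
    print(sorted(states))
    print("stable states:", [s for s in states if not successors(s)])
```

Because x and y form a negative circuit, no reachable state is stable, and the trajectories cycle through all six states. A positive circuit would instead typically yield multiple stable states, the logical counterpart of the differentiation switches discussed in the talk.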



Integration of biological knowledge to deliver new biotherapeutics: From data integration to system modeling


Ioannis Xenarios (Serono Pharmaceutical Research Institute)

The face of biological research in the biotech industry has evolved at an alarming rate. Starting from one-gene/one-protein analyses, it has witnessed the arrival of a multitude of new technologies that allow us to capture and integrate vast amounts of information generated by high-throughput methods such as DNA microarrays, proteomics and bioinformatics. Then came the sequenced human genome, and suddenly we had a complete skeleton upon which to integrate the mass of information generated. The scientific community now has an integrated way of looking at what had previously been isolated snippets of knowledge. We have known for some time the function(s) of many proteins in signaling pathways, developmental regulation, cell cycle progression and so on. However, what is becoming clearer as we gather more information and gaze upon the global picture is that a single protein rarely performs a single function. Rather, the activity that we assign to it is the product of its interactions with other proteins, small molecules or nucleic acids at any given time. Despite advances in high-throughput technologies (or perhaps because of them), we are faced with an avalanche of data but only flakes of knowledge. What is needed is a systems approach that would enable us to integrate all the information generated by these technology platforms and to develop both mathematical and biological methodologies to test it.

 


microRNAs Spread to Viruses

Mihaela Zavolan (Biozentrum Basel & Swiss Institute of Bioinformatics)

MicroRNAs (miRNAs) are a large class of endogenous RNA molecules, approximately 22 nucleotides in length, that regulate the translation of protein-coding genes in plants and animals. Hundreds of miRNA genes have been identified in various eukaryotic organisms, and their number is still growing. Their existence and function in “simpler” life forms such as viruses have not been extensively investigated, although viruses are known to use many elaborate RNA processing functions of the host. Using a combination of computational miRNA gene prediction and small RNA cloning, we discovered miRNAs encoded by herpesviruses. The predicted viral targets and the expression profile of the viral miRNAs suggest that these miRNAs play a role in the viral life cycle and thus in the interaction with the host.
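Target prediction for miRNAs, viral or cellular, often begins with a seed-match search: finding sites in a candidate mRNA (typically its 3' UTR) that are perfectly complementary to nucleotides 2-8 of the miRNA. The sketch below implements only that heuristic, on invented sequences. The abstract does not state which prediction method was used, and practical predictors add conservation, site context and other filters. `seed_site` and `find_seed_matches` are hypothetical names.

```python
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}  # RNA alphabet only

def seed_site(mirna: str) -> str:
    """Target-site 7-mer (5'->3') complementary to miRNA nucleotides 2-8 (1-based)."""
    seed = mirna[1:8]
    # The target pairs antiparallel with the miRNA, so take the reverse complement.
    return "".join(COMPLEMENT[nt] for nt in reversed(seed))

def find_seed_matches(mirna: str, utr: str) -> List[int]:
    """0-based start positions in `utr` where the seed-complementary 7-mer occurs."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]

if __name__ == "__main__":
    mirna = "UACGUACGAAUUCCGGAUCGAU"    # invented 22-nt sequence
    utr = "AAA" + "CGUACGU" + "GGG" + "CGUACGU"  # invented UTR with two sites
    print(seed_site(mirna))               # CGUACGU
    print(find_seed_matches(mirna, utr))  # [3, 13]
```

Seed matching alone yields many false positives, which is why combining predictions with expression evidence, as described in the abstract, is useful.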

 

 
Biozentrum, University of Basel