[BC]2's tutorials and workshops provide an informal setting to learn about the latest bioinformatics methods, discuss technical issues, exchange research ideas, and share practical experiences on focused or emerging topics in bioinformatics.

Tutorials and workshops take place on Monday, 13 September, from 9:00 to 16:00 at the Kollegienhaus of the University of Basel.

Please note that you need to register for your tutorial or workshop of choice via the online registration system (once registration opens); they are not included in the [BC]2 registration fee. Tutorials and workshops can take only a limited number of participants - register on time!

COVID-19 and on-site participation

Tutorials and workshops are planned as on-site events, with a hygiene concept closely aligned with cantonal and national regulations to ensure maximum safety for our participants.

Participation in the tutorials and workshops is on a first-come, first-served basis. Please note that room capacities may be affected by the local COVID-19 regulations in September. If we have to accept fewer people than foreseen, participants who registered late for a tutorial or workshop will not be able to attend; registration fees already paid for that tutorial or workshop will be reimbursed.

In the event of restrictions preventing the tutorials and workshops from taking place in person, participants will be informed about a possible virtual version.


Tutorials aim to provide participants with lectures or practical training covering topics relevant to the bioinformatics field. Please note that some of the tutorial schedules still need to be adapted to align with the start of the [BC]2 Welcome lecture.


beginner level –– high-throughput data analysis –– gene expression –– gene regulatory networks

Do you have transcriptomic or epigenomic data that measures changes in gene expression or chromatin state across a set of conditions, and do you want to know which regulators and regulatory interactions drive these changes? Our laboratory has developed tools that model transcriptomic (e.g. RNA-seq) or epigenomic (e.g. ChIP-seq, DNase-seq or ATAC-seq) data in terms of computationally predicted binding sites of transcription factors and microRNAs, in order to infer the key regulatory interactions driving changes in gene expression and chromatin state.

In this tutorial, we will introduce users to two tools: ISMARA, which analyzes gene expression data, and CREMA, which analyzes chromatin state data to infer activities of cis-regulatory elements (i.e. both promoters and distal enhancers) genome-wide. Both tools are implemented as web servers that take raw sequencing data as input, perform all modelling in a completely automated manner, and provide comprehensive predictions of the regulatory network and regulatory interactions through an interactive online interface. Both ISMARA and CREMA offer users many interactive possibilities to explore predictions and to generate new analyses of the data. The main objective of this tutorial is to give attendees an in-depth understanding of the variety of analyses that these tools provide, and of how to use them optimally for answering specific questions about the regulatory network operating in the user’s system of interest.
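At the heart of both tools is the MARA (Motif Activity Response Analysis) model, which explains the signal at each promoter as a linear combination of its predicted binding-site counts, weighted by unknown motif activities that are fitted to the data. The toy sketch below (plain Python with invented numbers; the real servers add regularization, error estimates and much more) recovers motif activities in one sample by least squares:

```python
# Toy version of the MARA linear model: expression_g ≈ sum_m N[g][m] * A[m]
# N[g][m] = predicted number of binding sites for motif m in promoter g.
# All numbers below are made up for illustration.

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve2x2(M, v):
    # Cramer's rule for a 2x2 linear system M x = v
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    x0 = (v[0] * M[1][1] - M[0][1] * v[1]) / det
    x1 = (M[0][0] * v[1] - v[0] * M[1][0]) / det
    return [x0, x1]

# Site-count matrix for 4 promoters x 2 motifs
N = [[2, 0],
     [1, 1],
     [0, 3],
     [1, 2]]

# Observed (log) expression in one sample, generated from true activities
# [1.5, -0.5] plus a little noise
e = [3.0, 1.1, -1.4, 0.4]

# Least-squares fit: solve the normal equations (N^T N) A = N^T e
Nt = transpose(N)
NtN = matmul(Nt, N)
Nte = [sum(Nt[i][g] * e[g] for g in range(len(e))) for i in range(2)]
A = solve2x2(NtN, Nte)

print([round(a, 2) for a in A])  # inferred activities, close to the true [1.5, -0.5]
```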

Learning objectives
The attendees of the tutorial will learn how to use the ISMARA and CREMA systems, and obtain an in-depth exploration of all the analysis results these systems provide, including:

  • What are the key regulators, and how do their activities change across the samples?
  • Which genes and pathways are targeted by each regulator?
  • What is the core network of interactions between the key regulators?
  • What are the main regulators of a particular gene or enhancer element, and how does each regulator contribute to its expression/activity across the samples?
  • How to use the embedded links to the STRING and SwissRegulon databases.
  • How to average across replicates and how to calculate contrasts between subsets of samples.
  • How to download comprehensive predictions for further downstream processing and analysis.

At the end of the tutorial, attendees should have the expertise to perform sophisticated regulatory network predictions from RNA-seq or ChIP/ATAC/DNase-seq data using the ISMARA and CREMA tools.


Time Activity
09:00 – 10:15 Introduction to Motif Activity Response Analysis: transcription factor binding site predictions, processing of raw RNA-seq and ChIP-seq data, and the MARA model
10:15 – 10:45 Coffee break
10:45 – 12:15 Introduction to CREMA: identification of cis-regulatory elements (CREs) genome-wide and application of the MARA model; overview of the analysis results
12:15 – 13:30 Lunch break
13:30 – 14:30 Using ISMARA and CREMA: data types, data upload, web interface, advanced interactive analysis features and command-line tools
14:30 – 16:00 Questions and answers session, exploring user-provided datasets
17:00 [BC]2 Welcome lecture

Audience and requirements

Maximum number of participants: 30

The tutorial targets a broad audience: computational biologists, bioinformaticians and experimental biologists interested in inferring gene regulation from gene expression and chromatin data. Since no specific bioinformatics skills are required to perform the analysis, purely experimental researchers without a data analysis background are also encouraged to attend.

The participants are expected to be familiar with the molecular biology of gene regulation. No specific bioinformatics or data analysis skills are required.

Users should, whenever possible, bring their own laptop and upload their own data in advance, so that they can explore results on their own data during the tutorial. A wireless connection will be needed to interact with the system.


  • Erik van Nimwegen (Professor and Group Leader; University of Basel & SIB Swiss Institute of Bioinformatics; Switzerland)
  • Mikhail Pachkov (Senior Research Assistant; University of Basel & SIB Swiss Institute of Bioinformatics; Switzerland)


beginner level –– clinical data –– data interoperability –– semantics –– RDF (resource description framework) –– FAIR

Healthcare information is collected in very diverse systems; often, ad hoc databases or data models are created for a specific medical use case. When dealing with heterogeneous and sensitive health-related data in a research setting, one of the biggest challenges is to bring the data together and achieve interoperability. The SPHN - Swiss Personalized Health Network1 provides a semantic framework to foster interoperability of health-related data across the fragmented Swiss healthcare system. During the past two years, a Resource Description Framework (RDF)-based FAIR (Findable, Accessible, Interoperable and Reusable) data framework has been developed to define, represent and store clinical data using common semantics. Biomedical data from any institution, implementation or platform can be expressed in this framework and semantically annotated. The encoded data are represented as standard URIs that allow direct linking to common ontologies such as SNOMED CT. The flexibility of RDF also enables the use of other standard terminologies of interest, such as ICD-10 or LOINC, and is extensible to unforeseen needs. This allows researchers to use the knowledge encoded in the terminology together with their data.

Learning objectives
Participants will learn how to access, understand and use the SPHN RDF schema. After the course, they will know about the infrastructure/tool stack provided by SPHN and BioMedIT to handle the SPHN RDF data. The tutorial's main objectives are to allow participants to:

  • Learn about the SPHN RDF schema and the use of the SPHN infrastructure
  • Learn how to visualize and browse graph-based data  
  • Learn how to query data with SPARQL
  • Learn how to do reasoning using external terminologies such as SNOMED CT 
  • Learn about common packages for Python and R to be used with RDF data
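To give a flavour of the triple model behind these objectives: RDF data is simply a set of subject-predicate-object statements, and a SPARQL query is a pattern match over them. The sketch below imitates this with plain Python tuples; the namespace, resource names and SNOMED CT codes are invented placeholders, and real SPHN data would live in a proper triple store queried with actual SPARQL:

```python
# A miniature "graph" of subject-predicate-object triples, in the spirit of
# an SPHN-style RDF representation. All URIs and codes here are invented
# placeholders, not real SPHN schema terms.
SPHN = "https://example.org/sphn#"   # hypothetical namespace
SNOMED = "http://snomed.info/id/"

triples = {
    (SPHN + "patient1", SPHN + "hasDiagnosis", SPHN + "diag1"),
    (SPHN + "diag1",    SPHN + "hasCode",      SNOMED + "73211009"),
    (SPHN + "patient2", SPHN + "hasDiagnosis", SPHN + "diag2"),
    (SPHN + "diag2",    SPHN + "hasCode",      SNOMED + "38341003"),
}

def query(pattern):
    """Match an (s, p, o) pattern; None behaves like a SPARQL variable."""
    return [t for t in triples
            if all(q is None or q == v for q, v in zip(pattern, t))]

# "Which diagnoses does patient1 have?" -- analogous to
# SELECT ?d WHERE { sphn:patient1 sphn:hasDiagnosis ?d }
hits = query((SPHN + "patient1", SPHN + "hasDiagnosis", None))
diagnoses = [o for (_, _, o) in hits]
print(diagnoses)
```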


Time Activity
09:00 – 09:10 Welcome and Introduction
09:10 – 09:30 The Swiss Personalized Health Network infrastructure: Introduction and aim
09:30 – 10:00 RDF as exchange format for clinical data in SPHN: Why and how?
10:00 – 10:15 SNOMED CT in SPHN
10:15 – 10:45 Coffee break
10:45 – 11:15 How to expand the SPHN RDF schema for your project
11:15 – 12:15 How to visualize your own schema and data
12:15 – 13:30 Lunch break
13:30 – 14:30 How to query your data with SPARQL
14:30 – 15:00 How to integrate terminologies such as SNOMED CT to do reasoning
15:00 – 16:00 How to use Python and R with RDF data
17:00 [BC]2 Welcome lecture

Audience and requirements

Maximum number of participants: 15

This tutorial is for (clinical) scientists working with RDF data or interested in applying RDF to their clinical data. Attendees should have very basic bioinformatics and data analysis skills. Basic knowledge of RDF and graph-based technologies is helpful but not required.


  • Sabine Österle (Team Lead Data Interoperability; SIB Swiss Institute of Bioinformatics; Switzerland)
  • Vasundra Touré (Scientific Coordinator; SIB Swiss Institute of Bioinformatics; Switzerland)
  • Kristin Gnodtke (Senior Clinical Data Specialist; SIB Swiss Institute of Bioinformatics; Switzerland)

1 SPHN is a national initiative with the goal of developing, implementing and validating a coordinated data infrastructure, in order to make health-relevant data interoperable and shareable for research in Switzerland. An integral part of SPHN is the BioMedIT project: a national IT infrastructure backbone that enables nationwide health-data exchange for research. BioMedIT provides researchers all over Switzerland with access to a secure and protected computing environment for analysis of sensitive data without compromising data privacy.


beginner level –– protein 3D structure and activity –– glycosylation –– personalized medicine and oncology

Experimental or modelled 3D structures are widely used as a main source of information in studies of the structure-activity relationships of proteins. However, post-translational modifications (PTMs), including glycosylation, are often neglected even though they are known to play a major role in protein structure stability, solubility, protein-protein recognition and resistance to proteolytic attack.

This tutorial will demonstrate why and how combining information on protein structures, sequences and glycosylation can lead to a better understanding of protein structure and activity. It will also show the utility of such analyses in the field of personalized medicine.

Learning objectives
At the end of this tutorial, attendees should be able to interpret 3D structures of proteins in terms of molecular interactions, understand the effect of mutations on protein structure and activity, appreciate the role of structural bioinformatics in precision oncology, and be able to use the related web tool Swiss-PO.ch. Examples will be taken from real mutations occurring in the cancer cells of patients, as discussed during the weekly Molecular Tumor Board of the “Réseau Romand d’Oncologie”. Attendees will also learn to consider the effect of gaining or losing a glycosylation site on protein structure and function, following the examples provided with Swiss-PO.ch. The main aspects covered are:

  • Introduction to molecular interactions
  • Introduction to protein structure and activity
  • Introduction to post-translational modifications with a focus on glycosylation
  • Introduction to GlyConnect: browse and search site-specific glycosylation data
  • Introduction to GLYCAM-Web: how to model glycoproteins 
  • Introduction to the role of structural bioinformatics in precision oncology
  • Introduction to Swiss-PO.ch: objectives, content and how-to


Time Activity
09:00 – 09:45 Introduction to molecular interactions, and protein structure and activity
09:45 – 10:15 Introduction to glycosylation as a distinct post-translational modification: 2D and 3D representations
10:15 – 10:45 Coffee break
11:45 – 12:15 Tutorial & exercises: using GlyConnect and GLYCAM-Web
12:15 – 13:30 Lunch break
13:30 – 14:00 Introduction to the role of structural bioinformatics in precision oncology
14:00 – 15:15 Tutorial & exercises: using Swiss-PO.ch
15:15 – 16:00 Open discussion and closing remarks
17:00 [BC]2 Welcome lecture

Audience and requirements

Maximum number of participants: 25

This tutorial targets a wide audience, ranging from Master’s students to senior researchers with an interest in proteomics, structural bioinformatics, and protein structure and activity.

Attendees will need a laptop equipped with a recent browser (Google Chrome or Firefox).


  • Vincent Zoete (Assistant Professor and Group Leader; UNIL-CHUV, Ludwig Institute for Cancer Research, University of Lausanne & SIB Swiss Institute of Bioinformatics; Switzerland)
  • Fanny Krebs (Postdoctoral Researcher; UNIL-CHUV, Ludwig Institute for Cancer Research, University of Lausanne & SIB Swiss Institute of Bioinformatics; Switzerland)
  • Oliver Grant (Research Scientist; University of Georgia; USA)
  • Frédérique Lisacek (Group Leader; SIB Swiss Institute of Bioinformatics & University of Geneva; Switzerland)


intermediate level –– reproducibility –– data management –– container

Going beyond the usual 20-minute Docker tutorial: many tutorials on the net leave the user at the point where they can run a simple application out of a single container. But how do you get from there to your own full-blown web service in the cloud?

Running your web service in the cloud requires multiple containers working together under container orchestration. In a real-world setup, the containers must be provided via a container registry. Finally - since no one likes to rebuild containers manually for every code change - the whole deployment should be automated by continuous integration (CI).

Participants of this tutorial will learn how to take a (Django) web app, distribute it into multiple containers, and run it via Docker Compose as an entry-level orchestration tool. The pipeline will be stored in a Git repository, allowing participants to learn about GitLab's CI/CD functionality and to integrate it into their own projects. Because confidential information like passwords should not be stored inside the Git repository, GitLab CI/CD secrets will be a topic too.
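As a taste of the kind of configuration covered, a minimal docker-compose.yml splitting a Django app into a web container and a database container could look as follows (service names, image tags and variables are illustrative, not the tutorial's exact setup; the password is injected from the environment rather than committed to Git):

```yaml
version: "3.8"
services:
  web:
    build: .                      # Dockerfile for the Django app
    command: gunicorn myproject.wsgi:application --bind 0.0.0.0:8000
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgres://django:${DB_PASSWORD}@db:5432/django
    depends_on:
      - db
  db:
    image: postgres:13
    environment:
      - POSTGRES_USER=django
      - POSTGRES_DB=django
      - POSTGRES_PASSWORD=${DB_PASSWORD}   # injected at runtime, never committed
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
```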

Learning objectives
After the tutorial, attendees should have an idea of how to organise a web application (including the web server, databases and other components) in Docker Compose. In addition, they will learn GitLab CI/CD, allowing them to deploy their web application automatically.


The coarse-grained schedule for the tutorial is:

  1. Introduction of the example application
    • How is it set up on bare-metal? 
    • How is it set up with Docker Compose? 
    • Some Docker Best Practices

  2. Container orchestration
    • Introduction of different orchestration tools 
    • Docker Compose configuration (of the example application)

  3. CI/CD
    • Introduction of CI/CD terminology 
    • GitLab CI configuration 
    • GitLab CI secrets 
    • GitLab Runners
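The CI/CD part can likewise be sketched as a minimal .gitlab-ci.yml (stage names and image tags are illustrative; the CI_REGISTRY_* variables are standard GitLab-provided ones, while everything else is an assumption about the example project):

```yaml
# Sketch of a GitLab CI pipeline for a containerized web app. DB_PASSWORD
# and any deployment credentials would be defined as masked CI/CD variables
# ("secrets") in the GitLab project settings, not in the repository.
stages:
  - build
  - deploy

build-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

deploy:
  stage: deploy
  script:
    - ssh deploy@example.org "cd /srv/app && docker compose pull && docker compose up -d"
  only:
    - main
```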

Audience and requirements

Maximum number of participants: 20

This tutorial is suited for developers, web developers, DevOps engineers and sysadmins. Attendees should have basic experience with Docker and Docker Compose and should know Git and GitLab (or a similar system). Basic Linux shell experience is also required.

Attendees should bring their own laptop with an SSH client and Git installed plus their favourite editor.


  • Stefan Bienert (Bioinformatician; SIB Swiss Institute of Bioinformatics; Switzerland)
  • Pablo Escobar López (Linux Sysadmin; SIB Swiss Institute of Bioinformatics; Switzerland)
  • Jaroslaw Surkont (Bioinformatician; SIB Swiss Institute of Bioinformatics; Switzerland)


intermediate level –– genomic data analysis –– single-cell –– R

In the biological and clinical context, the identification of molecular signatures and the corresponding feature extraction are two critical steps in understanding diverse biological processes. In particular, a signature is defined as a group of molecular features (e.g. genes or genomic regions) that is sufficient to identify a certain genotype or phenotype. For instance, expression signatures link a phenotype to a certain pattern of gene expression1,2, whereas enhancer signatures define subtypes based on the regulatory landscape3.

Non-negative Matrix Factorization (NMF) has been widely used in the analysis of genomic data to perform feature extraction and signature identification4,5. However, running even a basic NMF analysis requires the installation of multiple tools and dependencies, and involves a steep learning curve and substantial computing time. To mitigate these obstacles, we developed ButchR and ShinyButchR6, a novel NMF toolbox that provides a complete NMF-based analysis workflow, allowing the user to perform matrix decomposition using NMF, feature extraction, interactive visualization, identification of relevant signatures, and association with biological and clinical variables.

Learning objectives
The aim of this tutorial is to learn how to use ButchR to perform signature identification on different types of genomic data. To explore the results of an NMF analysis, we will provide a ready-to-use Docker image with RStudio, ButchR and pre-loaded publicly available datasets, including bulk and single-cell RNA-seq data, as well as an interactive application. The tutorial will show how to run an NMF-based analysis from start to end.
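For orientation before the tutorial: NMF factorizes a non-negative matrix V (features x samples) into two non-negative matrices, V ≈ W·H, where the columns of W are the signatures and the rows of H give each signature's exposure per sample. Below is a minimal pure-Python illustration using the classic multiplicative update rules (invented data; ButchR itself relies on far more robust implementations, including selection of the factorization rank):

```python
# Toy NMF via multiplicative updates (Lee & Seung): factor a non-negative
# matrix V (genes x samples) into W (genes x k) and H (k x samples).
# The data and the initial factors are made up for illustration.

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))

def nmf_step(V, W, H, eps=1e-9):
    # One round of multiplicative updates; keeps all entries non-negative
    Wt = transpose(W)
    num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
    H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(len(H[0]))]
         for a in range(len(H))]
    Ht = transpose(H)
    num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
    W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(len(W[0]))]
         for i in range(len(W))]
    return W, H

V = [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [0, 1, 5, 4]]
# Asymmetric non-negative starting factors (k = 2 signatures)
W = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
H = [[1.0, 0.6, 0.1, 0.2], [0.1, 0.2, 0.9, 1.0]]

before = frobenius_error(V, W, H)
for _ in range(100):
    W, H = nmf_step(V, W, H)
after = frobenius_error(V, W, H)
print(round(before, 1), round(after, 1))  # reconstruction error shrinks
```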


Time Activity
Session 1 - Introduction
09:00 – 09:30 Ice breaker: Course expectations
09:30 – 10:15 Introduction to Non-Negative Matrix Factorization (NMF) and its usage in genomics
10:15 – 10:45 Coffee break and discussion
Session 2 - Matrix decomposition
10:45 – 11:15 How to use ButchR with Docker
11:15 – 11:45 Pre-processing data to use with NMF
11:45 – 12:15 Matrix decomposition with ButchR
12:15 – 13:30 Lunch break
Session 3 - Results interpretation
13:30 – 14:00 Selection of optimal factorization rank
14:00 – 14:30 Signature identification
14:30 – 15:00 Feature extraction and enrichment analysis
15:00 – 15:30 Interactive analysis with ShinyButchR
Session 4 - Discussion
15:30 – 16:00 Discussion and concluding remarks
17:00 [BC]2 Welcome lecture

Audience and requirements

Maximum number of participants: 14

This tutorial is for computational biologists dealing with large scale omics datasets (e.g. RNA-seq, ATAC-seq, …) looking for solutions to reduce the dimensionality of the data to a small set of informative signatures.

The attendees are expected to bring their own laptop with Docker pre-installed. To avoid any delay in setting up the container during the practice sessions, the Docker image for the workshop should be downloaded beforehand by opening a command-line terminal (e.g. PowerShell or Terminal) and running the command "docker pull hdsu/butchr". A complete overview of how to install Docker can be found here: https://docs.docker.com/desktop/. In addition, a detailed explanation of how to use the ButchR Docker image can be found here: https://hub.docker.com/r/hdsu/butchr. Basic R coding skills will be helpful, although the tutorial will cover all the steps, from loading data to exporting results.

Upon arrival, the attendees will receive an R Markdown file with a step-by-step guide on how to use ButchR and ShinyButchR, including an example dataset and how to interpret the NMF results.


  • Carl Herrmann (Group Leader, University Clinics Heidelberg, Germany)
  • Andres Quintero (PhD candidate, University Clinics Heidelberg, Germany)


1. Szymczak, F., Colli, M. L., Mamula, M. J., Evans-Molina, C. & Eizirik, D. L. Gene expression signatures of target tissues in type 1 diabetes, lupus erythematosus, multiple sclerosis, and rheumatoid arthritis. Sci. Adv. 7, (2021).

2. Sotiriou, C. & Pusztai, L. Gene-Expression Signatures in Breast Cancer. N. Engl. J. Med. 360, (2009).

3. Gartlgruber, M. et al. Super enhancers define regulatory subtypes and cell identity in neuroblastoma. Nat. Cancer 2, (2021).

4. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature (2020). doi:10.1038/s41586-020-1943-3

5. Pal, S. et al. Isoform-level gene signature improves prognostic stratification and accurately classifies glioblastoma subtypes. Nucleic Acids Res. 42, e64 (2014).

6. Quintero, A. et al. ShinyButchR: Interactive NMF-based decomposition workflow of genome-scale datasets. Biol. Methods Protoc. 5, (2020).

7. Liu, L. et al. Deconvolution of single-cell multi-omics layers reveals regulatory heterogeneity. Nat. Commun. 10, (2019).


intermediate level –– biological sequences –– data analysis  –– deep learning –– python

The abundance of biological sequence data is growing exponentially, and is revolutionizing many fields of research, from medicine and agriculture to energy and manufacturing. Machine learning offers a powerful toolkit to computational biologists for interpreting and capturing value from this vast ocean of data. One particularly important machine learning task is to identify patterns from variable-length biological sequences that are associated with a functional outcome. This tutorial will introduce attendees to a technique for pattern recognition with a long tradition in bioinformatics—Hidden Markov Models (HMMs). With that as a backdrop, we will then introduce a modern approach to pattern recognition—Recurrent Neural Networks (RNNs). We will learn about these two algorithms in the context of a codon optimization problem, where we train each model to design a gene sequence from a protein sequence in an optimal way for expression in a new host organism. Within this well-understood context, we will explore how each model is structured and the associated assumptions. We will outline algorithms for exploiting the models, and compare the advantages and disadvantages of these two frameworks. We will gain practical experience by performing codon optimization with open-source software implementations of these models. We will finish by discussing the ways that RNNs are being leveraged in recent computational biology publications.

Learning objectives 
After this tutorial, participants will:

  1. Understand the structure of HMMs and RNNs, and the relative strengths of each approach.
  2. Have solved a bioinformatics codon optimization problem using both modeling approaches by means of freely available Python packages.
  3. Have connected their new knowledge with recent uses of RNNs in computational biology research.
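To make the codon-optimization task concrete, here is a toy best-path (Viterbi-style) search in plain Python: choose one codon per amino acid so that the sum of log codon-usage scores and log codon-pair (transition) scores is maximal. All frequencies and the pair-score rule are invented; in the tutorial itself the corresponding models are trained on real host genomes with hmmlearn and PyTorch:

```python
# Toy codon optimization as a best-path search over codon choices,
# one "hidden state" per protein position. Usage frequencies and the
# pair-score rule are made up for illustration only.
from math import log

codons = {"M": ["ATG"], "K": ["AAA", "AAG"], "V": ["GTG", "GTT"]}
usage = {"ATG": 1.0, "AAA": 0.7, "AAG": 0.3, "GTG": 0.4, "GTT": 0.6}

def pair_score(prev, nxt):
    # invented codon-pair preference: mild penalty for repeating
    # the same third-position nucleotide in consecutive codons
    return log(0.4) if prev[2] == nxt[2] else log(0.6)

def optimize(protein):
    # Viterbi recursion: best[c] = (best score ending in codon c, path)
    best = {c: (log(usage[c]), [c]) for c in codons[protein[0]]}
    for aa in protein[1:]:
        new_best = {}
        for c in codons[aa]:
            score, path = max(
                (s + pair_score(p, c), path) for p, (s, path) in best.items()
            )
            new_best[c] = (score + log(usage[c]), path + [c])
        best = new_best
    return max(best.values())[1]

print(optimize("MKV"))  # -> ['ATG', 'AAA', 'GTT']
```

The same recursion scales to real proteins and full codon tables; an HMM adds probabilistic semantics to these scores, while an RNN replaces the fixed transition table with a learned, context-dependent one.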


Time Activity
09:00 – 09:30 Overview of tutorial; set up computing environment
09:30 – 10:30 Introduction to Hidden Markov Models (HMMs). Discuss assumptions. Introduce the Forward-Backward and Viterbi algorithms for estimating probable hidden states given an HMM and an input sequence. Work through examples of HMM setup, use and interpretation with toy examples.
10:30 – 10:45 Coffee break
10:45 – 12:30 Introduction to codon optimization. Walk through an example of HMM setup based on a host genome, application to the codon optimization of an exogenous protein, and evaluation of the results, all using the hmmlearn package.
12:30 – 13:30 Lunch break
13:30 – 14:30 Introduction to Recurrent Neural Networks (RNNs). Discuss assumptions. Introduce back-propagation for parameter fitting. Work through examples of RNN setup, training, prediction and interpretation with toy examples.
14:30 – 15:30 Walk through an example of RNN setup and training based on a host genome, application to the codon optimization of an exogenous protein, and evaluation of the results, all using the PyTorch package.
15:30 – 16:00 Highlight recent computational biology literature using RNNs. Connect what we practiced with the literature, and expand on the possibilities of model application.
17:00 [BC]2 Welcome lecture

Audience and requirements

Maximum number of participants: 40

This tutorial is intended for participants with past experience using Python and familiarity with introductory molecular biology concepts (i.e. “the central dogma”). Participants should have an introductory foundation in statistics and/or machine learning as we will rely on ideas such as probability and inference. This tutorial will not cover the derivations of the algorithms under discussion, or require similar advanced mathematical skills.


  • Matthew Biggs (Computational Biologist at AgBiome Inc. & Adjunct Assistant Professor of Biostatistics at the University of North Carolina; USA)


intermediate level –– data visualization –– R and ggplot2

One of the biggest challenges in disseminating your research is visualizing the results in a way that is meaningful, easy to interpret and aesthetically pleasing. Oftentimes, the time dedicated to creating and optimizing figures can rival the time spent generating the experimental results themselves. In a point-and-click environment, you can spend hours or even days tweaking settings to get the perfect figure - only to realize that you now have to repeat the process for the remaining data. This is especially challenging when performing customizations or when pivoting your figures to adhere to guidelines from conferences, journals or other publishing platforms.

In this tutorial, we introduce an efficient and reproducible workflow in R for creating publication-ready figures. We will introduce ggplot2 syntax to create custom plots and explore how to determine the types of plots most appropriate for your data. We will then see how to ensure consistency between figures using custom theme and colour selections, with an emphasis on colourblind-friendly palettes from the RColorBrewer and viridis packages. We will also examine methods for enhancing plots with functions from the ggpubr and cowplot packages, especially regarding the layout and labelling of figures. Finally, we will conclude with an activity that uses what we have learned to reproduce a published figure.

Expected Goals 

  • Learn how to determine the type of plots that are best for your data
  • Appreciate the power and flexibility of ggplot2 to create custom plots
  • Know how to use custom functions and palettes to create figures with consistent themes, styles and colours
  • Understand how to use the R packages cowplot and ggpubr to easily add layouts and labels often required in published figures 
  • Know how to save plots in a variety of formats

Learning Objectives

  • Determine the plot types best for visualizing a given dataset
  • Define the syntax for creating a plot using ggplot2
  • Generate plots for various data types using ggplot2
  • Explain how to create multiple plots using the same themes, styles, and colours
  • Discuss how to quickly alter figures to meet a different set of requirements (different journal or conference)


Time Activity
09:00 – 09:10 Introduce the instructors and scope of the workshop (lecture)
09:10 – 09:30 Introduction to the dataset (discussion)
  • Discuss how to determine appropriate plotting methods for your data
  • Describe the types of relevant plots you would like to include/create
09:30 – 10:15 Explore ggplot2 syntax and plots (live coding)
  • Examine the ggplot2 syntax for a basic scatter plot
  • Customize the scatter plot by adding layers to the base plot
10:15 – 10:45 Coffee break
10:45 – 11:45 Discuss creating consistent plots (live coding)
  • Create functions for themes to use with all figures
  • Define colour palettes to keep colours consistent
11:45 – 12:15 Introduce features of cowplot for aligning and labelling plots (live coding)
12:15 – 13:30 Lunch break
13:30 – 14:15 Introduce features of ggpubr for adding statistical comparisons and ordering of plots (live coding)
14:15 – 15:10 Practice by walking through re-creating published/provided figure(s) (live coding)
15:10 – 15:50 Practice by changing code to adhere to a journal’s figure requirements (live coding)
15:50 – 16:00 Wrap-up and exit survey (lecture)
17:00 [BC]2 Welcome lecture

Audience and requirements

Maximum number of participants: 40

This tutorial is for researchers interested in using R to create publication-ready figures. It is a hands-on tutorial in which the data and code will be distributed to participants who wish to follow along. All tutorial lessons and materials will be hosted on GitHub pages. Participants will be required to have R and RStudio downloaded and installed on their personal computers, in addition to any required R packages. This tutorial assumes an intermediate level of R knowledge.


  • Mary Piper (Research Scientist and Associate Director of Training; Harvard T.H. Chan School of Public Health; USA)
  • Radhika Khetani (Director for Training; Harvard School of Public Health; USA)


Workshops encourage participants to discuss technical issues, exchange research ideas, and share practical experiences on focused or emerging topics in bioinformatics. Please note that some of the workshop schedules still need to be adapted to align with the start of the [BC]2 Welcome lecture.

The workshops "Toward a common framework for annotated, accessible, reproducible and interoperable computational models in biology" and "BioNetVisA: biological network reconstruction, data visualization and analysis in biology and medicine" will issue additional abstract submission calls to allow participants to present their work in the respective context. More information on how to submit your abstract to these workshops can be found in the workshops' descriptions.


beginner level –– systems biology –– modelling –– data annotation and curation

Computational models have long been used in Systems Biology to answer a variety of questions regarding the dynamical behaviours of complex systems. As the number of computational models rapidly increases, questions regarding the reproducibility and reusability of models, and their annotation in community-supported, standardised formats, are more pressing than ever. At [BC]2 2019, the Consortium for Logical Models and Tools (CoLoMoTo – http://colomoto.org) organized a workshop to develop community-driven guidelines and efforts for the curation and annotation of logical models (Niarakis et al., 2020). Organised by members of CoLoMoTo and of the Computational Modeling of Biological Systems community (SysMod - https://sysmod.info/), the proposed workshop aims to expand on these efforts and bring together scientists from broader computational communities (multiscale, multicellular and quantitative modelling) to harmonize practices and foster the interoperability and reusability of models.

Selection of contributions
Several experts on data annotation, model curation, and community standard development will be invited (see list of tentative speakers below). In addition, a call for abstract submissions will be issued (more information to follow soon).

List of tentative speakers

  • Claudine Chaouiya (Aix-Marseille University, FR): Logical models
  • Dagmar Waltemath (Greifswald University, DE): FAIR, COMBINE confirmed
  • Henning Hermjakob (EBI, UK): BioModels confirmed
  • Falk Schreiber (University of Konstanz, DE):  SBGN confirmed
  • Sarah Keating (UCL, UK): SBML confirmed
  • Edda Klipp (Humboldt University, DE): Metabolism, rxncon
  • Anne Siegel (IRISA, FR): Metabolic network analysis confirmed
  • David Nickerson (University of Auckland, NZ): CellML and SED-ML standards confirmed
  • James Glazier (University of Indiana, Bloomington, USA): Multiscale models confirmed


The workshop will be split into four sessions: the morning sessions will be dedicated to 1) model curation/annotation and community standards development, and 2) interoperability/reusability issues and tool requirements; the afternoon sessions will cover 3) COMBINE archives and SED-ML for logical models and beyond: requirements for global rules, and 4) model repositories: requirements for model deposit pre- or post-publication.

Time Activity
09:00 – 09:05 Welcome and introduction to the workshop
Session 1 - Model curation/annotation, and community standards development
09:05 – 09:25 Invited talk
09:25 – 09:45 Selected talk
09:45 – 10:05 Invited talk
10:05 – 10:15 Discussion
10:15 – 10:45 Coffee break
Session 2 - Interoperability/reusability issues and tool requirements
11:15 – 11:35 Invited talk
11:35 – 11:55 Selected talk
11:55 – 12:15 Invited talk
12:15 – 13:30 Lunch break
Session 3 - COMBINE archives and SED-ML for logical models and beyond: requirements for global rules
13:30 – 13:50 Invited talk
13:50 – 14:10 Selected talk
14:10 – 14:30 Invited talk
14:30 – 15:00 Coffee break
Session 4 - Model repositories: requirements for model deposit pre or post-publication
15:00 – 15:20 Invited talk
15:20 – 15:40 Selected talk
15:40 – 16:00 Invited talk
16:00 – 16:10 Discussion and conclusions
17:00 [BC]2 Welcome lecture

Audience and requirements

Maximum number of participants: 50

This workshop addresses students and researchers interested in learning about best practices in data/model curation/annotation and community standards development, giving them a unique opportunity to discover and discuss the state of the art in the field.

It brings together scientists involved in BioModels (a central repository of mathematical models of biological/biomedical processes), COMBINE (the COmputational Modeling in BIology NEtwork), CoLoMoTo (the Consortium for Logical Models and Tools), SysMod (the Computational Modeling of Biological Systems Community of Special Interest of the International Society for Computational Biology, ISCB), SBGN (the Systems Biology Graphical Notation project), SBML (the Systems Biology Markup Language), SED-ML (the Simulation Experiment Description Markup Language), and other relevant projects.


  • Anna Niarakis (Associate Professor; UEVE, Univ Paris-Saclay & INRIA Saclay; France)
  • Tomas Helikar (Associate Professor; University of Nebraska; USA)
  • Laurence Calzone (Research Scientist; Institut Curie, U900 INSERM & Mines ParisTech; France)
  • Sylvain Soliman (Researcher; Lifeware & INRIA Saclay; France)

A scientific committee has been assembled to select the presentations from the call for submissions: Denis Thieffry (ENS, Paris, FR); Rahuman S. Malik Sheriff (EMBL-EBI, London, UK); Ioannis Xenarios (UNIL, Lausanne, CH); Ina Koch (Johann Wolfgang Goethe-University, Frankfurt am Main, DE); Juilee Thakar (University of Rochester, New York, USA); Benjamin Hall (UCL, London, UK).


beginner level –– FAIR –– large-scale data analysis –– genomic and biomedical data

Vast data volumes, the lack of uniform data security standards, and the maze of infrastructure solutions and computational tools constitute significant hurdles in the life sciences' race towards efficient personalized healthcare on a global scale. Open Science, FAIR Policies, and broadly adopted community standards and best practices are widely recognized as effective methods to lower these hurdles.

This insight has motivated the establishment of the Global Alliance for Genomics and Health (GA4GH), an international standard-setting and policy-framing organisation dedicated to promoting a legal, technical and scientific framework for the ethical sharing and processing of personalized health data. Supported by 650+ organisations representing 50+ countries, the standards and policies set by the GA4GH are based on broad consensus across a wide range of interest groups and cultures. Switzerland, with its rich history and culture of federalism and the experience gained hosting the GA4GH Plenary Meeting in 2018, makes the [BC]2 conference a perfect venue for a workshop dedicated to advancing the development, establishment and promotion of GA4GH standards-based federated cloud solutions for the large-scale analysis of genomic and biomedical data.

The workshop will feature:

  • A visionary keynote lecture highlighting the benefits and risks of a globally federated research IT infrastructure
  • An introduction of the relevant GA4GH API standards and a high-level view of how these can play together
  • Technical sessions centered around the GA4GH Work Streams (1) Data Use & Researcher Identities (DURI), (2) Discovery, (3) Cloud, and (4) Data Security, in each of which two technical solutions essential or beneficial for federated cloud computing are briefly presented (ideally one from within the GA4GH community and one not previously associated with it)
  • A panel discussion of session chairs, speakers and other key participants on how the various building blocks can be integrated into coherent, viable and secure products for use in academia and industry
  • An open floor discussion to give all participants the chance to raise their concerns, share their ideas and give feedback on the panel discussion

The event will feature a range of invited experts from the GA4GH and related communities/initiatives* (speakers/key contributors to be announced), but in order to further extend the network beyond current contributors we are also explicitly soliciting applications for technical contributions from any interested parties (please send an email with a technical abstract, relevant references and a short motivation statement to one of the organizers; altogether <1 page).


Time Activity
09:00 – 09:15 Welcome
09:15 – 10:15 Visionary keynote
10:15 – 10:45 Coffee break
10:45 – 11:15 Introduction of relevant GA4GH APIs
11:15 – 11:45 Technical session 1: Data Use & Researcher Identities
11:45 – 12:15 Technical session 2: Discovery
12:15 – 13:30 Lunch break
13:30 – 14:00 Technical session 3: Cloud
14:00 – 14:30 Technical session 4: Data Security
14:30 – 15:15 Panel discussion
15:15 – 15:45 Open floor discussion
15:45 – 16:00
17:00 [BC]2 Welcome lecture

Audience and requirements

Maximum number of participants: 100

The expected audience includes (a) researchers who benefit from or require access to efficient federated computational analysis platforms to address important research questions, particularly in the field of personalized healthcare; (b) bioinformaticians, computational biologists, computer scientists and scientific software developers who are interested in tackling the scientific, technological and ethical challenges of large-scale biomedical data analysis openly and FAIRly, together with the global community; and (c) managers and administrators of scientific compute centers who are interested in providing their clients with access to federated, FAIR cloud computing infrastructure.


  • Michael Baudis (Professor and Group Leader; University of Zurich & SIB Swiss Institute of Bioinformatics; Switzerland)
  • Katrin Crameri (Director Personalized Health Informatics; SIB Swiss Institute of Bioinformatics; Switzerland)
  • Alexandre Kanitz (Co-lead of the ELIXIR Cloud Initiative; University of Basel & SIB Swiss Institute of Bioinformatics; Switzerland)
  • Shubham Kapoor (Lead System Architect; SIB Swiss Institute of Bioinformatics; Switzerland)

* Next to the general frameworks provided by the GA4GH, ELIXIR/SIB and the SPHN, related/similar initiatives, stakeholders and projects include the GA4GH Driver Projects and other national and international initiatives, such as the 1+ Million Genomes and Beyond 1 Million Genomes (B1MG) projects, CINECA, the European Open Science Cloud (EOSC), EUCANCan, H3Africa, various NIH initiatives, and the Personal Health Train, as well as various IT, bioinformatics and pharmaceutical companies. What unites them is a common vision of a globalized, affordable and ethical healthcare system that is able to tackle complex medical problems such as cancers, rare diseases and pandemics, and that is driven by scientific and technological innovation emerging from open discourse in a community effort.


beginner level –– biological networks –– molecular interactions and pathways  –– modelling –– annotation, curation and contextualisation

Today's biology is largely data-driven, thanks to high-throughput technologies that allow molecular and cellular aspects of life to be investigated on a large scale. Making biological sense of the resulting data requires interpreting it in the context of the biomolecular networks that govern cellular and physiological processes. In parallel to this technological revolution, the last decades have seen the accumulation of considerable knowledge about these processes and their role in health and disease.

The goal of the BioNetVisA workshop1 is to bring together the different actors of network biology: database providers, experimental biologists and clinicians involved in systems biology approaches, as well as computational biologists involved in data analysis and modelling. The participants will be exposed to the different paradigms of network biology and the latest achievements in the field. The BioNetVisA workshop also aims at identifying bottlenecks and proposing short- and long-term objectives for the community, as well as discussing the accessibility of available tools for a wide range of users in everyday standalone applications in biological and clinical labs. In addition, possibilities for collective efforts and future development directions will be discussed during the round-table panel.

In addition, a call for abstract submissions to this workshop will be issued (more information to follow soon).


Time Activity
Session 1
09:00 – 09:30 Henning Hermjakob (EMBL-EBI, Cambridge, UK)
09:30 – 10:00 Åsmund Flobak (NTNU, Trondheim, NO)
10:00 – 10:15 Selected talk
10:15 – 10:45 Coffee break
Session 2
10:45 – 11:15 Laura Cantini (IBENS – ENS, Paris, FR)
11:15 – 11:45 Valentina Boeva (ETH, Zurich, CH)
11:45 – 12:00 Selected talk
12:00 – 12:15 Selected talk
12:15 – 13:30 Lunch break
Session 3
13:30 – 14:00 Alexander Kel (geneXplain, Wolfenbüttel, DE)
14:00 – 14:30 Tomas Helikar (University of Nebraska-Lincoln, US)
14:30 – 15:00 Rupert W Overall (Technische Universität Dresden, DE)
15:00 – 15:15 Selected talk
15:15 – 15:30 Selected talk
15:30 – 15:45 Selected talk
15:45 – 16:00 Round table
17:00 [BC]2 Welcome lecture

Audience and requirements

Maximum number of participants: 70

The workshop targets computational systems biologists, molecular and cell biologists, clinicians, and a wide audience interested in an update and discussion on the current status of network biology, pathway databases and related analysis tools, including visualization, statistical analysis and dynamic modelling.

No computational background is required to attend the workshop. The round table panel planned at the end of the workshop will be a forum for live discussion around those topics.


  • Emmanuel Barillot (U900 Institut Curie - INSERM & Mines ParisTech; France)
  • Hiroaki Kitano (RIKEN Center for Integrative Medical Sciences; Japan)
  • Inna Kuperstein (U900 Institut Curie - INSERM & Mines ParisTech; France)
  • Andrei Zinovyev (U900 Institut Curie - INSERM & Mines ParisTech; France)
  • Samik Ghosh (Systems Biology Institute - Tokyo; Japan)
  • Robin Haw (Ontario Institute for Cancer Research; Canada)
  • Alfonso Valencia (Barcelona Supercomputing Centre; Spain)

1 BioNetVisA is an annual workshop series that brings together the different actors of network biology, from database providers, network creators, computational biologists and biotech companies involved in data analysis and modelling to experimental biologists and clinicians who use systems biology approaches. The participants are exposed to the different paradigms of network biology and the latest achievements in the field. The workshop takes place in the context of major international conferences in the field of computational biology and bioinformatics, such as [BC]2, ECCB or ISCB.