T7: Single-cell RNA-Seq Analysis

Organizers:

Michael Stadler, Atul Sethi, Panagiotis Papasaikas (FMI, Basel)
Vincent Gardeux, Bart Deplancke (EPFL, Lausanne)
 

Tutorial Summary:

Motivation 
Over the last few years, single-cell RNA sequencing (scRNAseq) has emerged as a revolutionary tool with the potential to explore biological heterogeneity at the most basic level of organismal organization. Several questions inaccessible in the context of bulk RNAseq can now be addressed, or at least probed, in a meaningful manner. Examples include the detailed mapping of the cellular composition of tissue types and the identification of novel cell types, the characterization of their individual transcriptomes, the discrimination of transcriptional variation arising at the single-cell versus the cell-ensemble level, and the contribution of individual cells to tissue differentiation, development and disease progression.

This promise comes with multiple technical and computational challenges. While scRNAseq data is structurally similar to bulk RNA-seq data, the paucity of starting material, combined with multiple confounded sources of variance, results in a low signal-to-noise ratio, exemplified by the high abundance of zeroes in the gene expression matrices. In this new setting, existing techniques need to be modified or novel approaches developed for downstream analyses.

Organisation
The proposed session consists of (A) a tutorial session and (B) a workshop session put together by members of two groups (from EPFL & FMI) working on scRNAseq data analysis:

A. Tutorial session consisting of two parts:

  1. A general overview of single-cell transcriptomics and single-cell-based sequencing technologies. In particular, we’ll discuss the limitations of bulk workflows that can be overcome with single-cell analyses, as well as the advantages and limitations of single-cell analyses in gathering quantitative data.
  2. A practical session highlighting several of the most critical and common issues associated with the computational analysis of scRNAseq data:
    • Characteristics, comparison and limitations of scRNAseq data generated from the different protocols and commercially available systems
    • Data pre-processing and quality control
    • Data visualization and detection of biologically meaningful subpopulations
    • Differential gene expression
    • Sources of biological and technical variation and circumvention of confounding effects

This tutorial is designed as a guided conversation through scRNAseq analyses, combining lecture and hands-on sessions. It intends to give the audience a feel for the data and walk them through major analysis techniques and concepts using illustrative examples and R-scripts that are applicable or extendable to the most commonly available types of scRNAseq data.

B. Workshop session consisting of short talks by speakers from different groups, each discussing specific analysis tools for scRNAseq data.

Expected Goals
The learning outcomes of this tutorial for the audience:

  • Gain basic knowledge about scRNAseq protocols and the kind of data they produce.
  • Perform basic QC, filtering (reads, cells, genes), and normalization of scRNAseq data.
  • Detect possible sources of technical and biological confounding variables (e.g. library complexity, cell cycle, etc.). Apply techniques to remove or account for these confounders in subsequent analyses and evaluate their strengths and weaknesses.
  • Identify scRNAseq-specific challenges in visualization and clustering for subpopulation detection, and in population marker identification.
  • Implement the aforementioned concepts with practical examples from publicly available scRNAseq datasets using custom R-scripts provided in the tutorial.
  • Evaluate the applicability of specific tools on different data types and problem settings/contexts.

Level
Working knowledge of R and RNA-seq data analysis is assumed. R-scripts will be provided for the hands-on session, leaving time to discuss concepts and challenges in the field.

Intended audience
Computational biologists, bioinformaticians, and molecular biologists involved in transcriptomic data analysis, with any level of experience and an interest in the analysis of scRNAseq data.

Prerequisites

  • Participants should bring Wi-Fi-enabled laptops and will connect to an R server set up by FMI to run analyses.
  • Internet access is required to reach the R server and the prepared datasets and code.

Tutorial Agenda:

Tuesday, September 12, 2017
Venue: Kollegienhaus, University of Basel, Room 212, Floor 2.

9:00 – 9:45 Tutorial (EPFL, theoretical introduction): motivation, technologies, state of the art
9:45 – 10:10 Workshop (Sebastien Smallwood, FMI Basel): Technical overview of single-cell –omics methods
10:10 – 10:30 Tutorial (FMI, hands-on): Initial data exploration: characteristics of expression data, quality control
10:30 – 11:00 Coffee break
11:00 – 12:30 Tutorial (FMI, hands-on): Filtering genes and cells, visualization of scRNAseq data, identification and removal of confounding factors (e.g. cell cycle, cell complexity)
12:30 – 13:30 Lunch break
13:30 – 15:00 Tutorial (FMI, hands-on): Subpopulation detection from scRNAseq data in the presence of confounders and differential gene expression
15:00 – 15:30 Coffee break
15:30 – 16:00 Workshop (Manfred Claassen, ETH Zurich): (Un-)supervised learning of cell population structure from single-cell snapshot data
16:00 – 16:30 Workshop (Dominic Grün, MPI Freiburg, DE): Revealing fate bias of multipotent progenitor cells by single-cell RNA-seq
16:30 – 17:00 Workshop (Vincent Gardeux, EPFL): Automated Single-cell Analysis Pipeline (ASAP)

Tutorial speakers:

EPFL, Lausanne
Vincent Gardeux is a senior research scientist in Bart Deplancke’s group at the EPFL. His research focuses on transcriptional regulation and computational methods for the analysis of single-cell RNA-seq datasets. He designed and implemented a web-based bioinformatics tool for single-cell data analysis, named ASAP.

Bart Deplancke is an associate professor and group leader in systems biology and genetics at the EPFL. His research focuses on microfluidics, single-cell genomics, and computational approaches to characterize the regulatory code in Drosophila and mammals, and to examine how variations in this code affect molecular and organismal diversity. He is in charge of a single-cell course at the EPFL together with Prof. David Suter.

FMI, Basel
Michael Stadler is a staff scientist and head of computational biology at FMI, Basel. His group collaborates with experimental biologists and studies the regulation of gene expression through the analysis and modeling of genome-wide datasets on various biological topics, including cancer progression and cellular differentiation. He has been teaching R and ‘omics data analysis at FMI and universities for ten years and is a member of the MetastasiX/SystemsX.ch project, which studies single-cell-level heterogeneity in breast cancer.

Atul Sethi is a postdoctoral fellow in Michael Stadler’s group investigating single-cell-level heterogeneity in breast tumors and metastases. He studied computational biology and bioinformatics at ETH Zurich. During his PhD in the group of Ruedi Aebersold, he worked on the integration of diverse omics data with protein interaction networks to prioritize protein biomarkers in ovarian cancer. He has lectured and organized hands-on sessions on R and Bioconductor at FMI and the University of Basel.

Panagiotis Papasaikas is a bioinformatics specialist at FMI, Basel. He received his PhD in computational biology from Carnegie Mellon University, working on machine learning and graphical-model-based techniques for the study of post-transcriptional regulation. As a post-doc and Research Associate at CRG, Barcelona, he worked on statistical methods for transcript-isoform quantification from RNAseq data and network modelling approaches for studying splicing regulation. He has organized and lectured in several statistics, R and high-throughput data analysis courses.

Workshop speakers:

FMI, Basel
Sebastien Smallwood is the head of the Functional Genomics platform at FMI. At FMI, and prior to that at the Babraham Institute, Cambridge, he has spearheaded the setup and successful implementation of state-of-the-art approaches for high-throughput biology. Notably, he has introduced novel single-cell genomic and epigenomic methodologies for the parallel assessment of transcriptional and epigenetic heterogeneity.

ETH, Zurich
Manfred Claassen is an assistant professor and group leader in computational biology at ETH. His research aims at elucidating the composition of heterogeneous cell populations and how these implement function in the context of cancer and immune biology by jointly evaluating single-cell and genome-wide measurements. His group builds on concepts from statistics, machine learning and mathematical optimization to develop probabilistic approaches to describe biological systems, learn these descriptions from data, and design experiments to validate hypotheses following from computational analyses.

MPI Freiburg
Dominic Grün is a group leader in quantitative single-cell biology at MPI. His research focuses on quantitative single-cell biology in order to elucidate mechanisms of cellular differentiation. He has made seminal contributions in the field of single-cell transcriptomics, highlighting the role of gene expression noise in regulatory networks involved in cellular differentiation.