T5: Reproducible research using Docker: A case study using high throughput sequencing tools

Organizer:

Walid H. Gharib (SIB Swiss Institute of Bioinformatics and Training Group Interfaculty Bioinformatics Unit (IBU), University of Bern)

Tutorial Website

Tutorial Summary:

Bioinformatics analyses usually involve a large number of software tools, reference datasets and pipelines used to produce the results. Reproducing an analysis is often a burden for other researchers because many pieces of the puzzle are missing from the published methodology. While the raw datasets are generally available, a clear workflow or pipeline detailing how the results were obtained is often missing.
To achieve reproducibility in computational biology, publishing clear, commented source code is a crucial step, but it is not enough: in almost every case the working environment lacks the tools and dependencies needed to run the code. The biggest obstacle to computational reproducibility is creating a reliable, standalone, multiplatform and lightweight working environment in which all the computational needs of a study are met.
Virtualization and containerization are two approaches to address this issue.
While virtualization (e.g. VirtualBox) is an option, it is memory-intensive and computationally expensive, scales poorly, and is usually difficult to couple with high-performance computing platforms. Containerization (e.g. Docker) is widely used as a lightweight, fast and scalable alternative to virtual machines, as it communicates directly with the kernel of the host operating system. It can easily be deployed on a high-performance computing cluster or on a cloud-based elastic computing platform such as Amazon Web Services.

Docker technology positions itself as a promising approach to reproducibility in computational biology research by:

  • Saving the time and expense of human and computational resources spent on analyses that have already been performed
  • Boosting communication between computational biologists working on similar topics
  • Enhancing transparency within the community
  • Granting the community open access to computational knowledge
  • Building upon previous discoveries rather than starting over

Expected goals
During this one-day tutorial, participants will practice basic Docker command-line functionality, e.g. setting up a Docker image, deploying images as “containers” and opening ports, using pre-installed high-throughput sequencing software tools. We will also introduce Amazon Web Services as a cloud-based platform for hosting the pre-built Docker containers. The knowledge acquired in this tutorial should allow participants to fetch and build reproducible workflows using Docker technology.
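As a rough illustration of the command-line steps named above (building an image, starting it as a container and publishing a port), the following minimal Python sketch assembles the corresponding docker commands without executing them. It is not taken from the tutorial material: the image tag "hts-tools:0.1", the container name "hts-demo" and the port mapping 8080:80 are hypothetical placeholders, and the helper functions are invented for this example.

```python
import shlex


def docker_build(tag, context="."):
    """Return the argv for building an image from the Dockerfile in `context`."""
    return ["docker", "build", "-t", tag, context]


def docker_run(image, name, ports=None, detach=True):
    """Return the argv for starting `image` as a named container.

    `ports` maps host ports to container ports; each pair becomes
    one `-p host:container` option.
    """
    cmd = ["docker", "run"]
    if detach:
        cmd.append("-d")  # run the container in the background
    cmd += ["--name", name]
    for host_port, container_port in (ports or {}).items():
        cmd += ["-p", f"{host_port}:{container_port}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical image tag, container name and port mapping.
    commands = [
        docker_build("hts-tools:0.1"),
        docker_run("hts-tools:0.1", "hts-demo", ports={8080: 80}),
    ]
    for cmd in commands:
        # Print the shell-quoted command instead of running it.
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch usable on a machine without Docker; on a laptop with Docker installed, the printed lines could be pasted into a terminal or passed to subprocess.run.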

Intended audience
Bioinformaticians and Biologists.

Level and Prerequisites
Knowledge of next-generation sequencing techniques is not required; however, basic Unix command-line knowledge is needed.

Technical requirements
Participants should bring their own laptops with Docker installed and register for an AWS account.

Tutorial Agenda:

Tuesday, September 12, 2017
Venue: Kollegienhaus, University of Basel

9:00 – 10:30 Session I
10:30 – 11:00 Coffee break
11:00 – 12:30 Session II
12:30 – 13:30 Lunch break
13:30 – 15:00 Session III
15:00 – 15:30 Coffee break
15:30 – 17:00 Session IV

See more details on the tutorial website.

Tutorial speaker:

Walid H. Gharib occupies two complementary positions. As a member of the NCCR RNA & Disease, he advises on next-generation sequencing (NGS) experimental design and conducts downstream genomics analysis at the Interfaculty Bioinformatics Unit (IBU), University of Bern. He is also a bioinformatics trainer at the Swiss Institute of Bioinformatics (SIB), mainly teaching NGS-related courses.