T2: Validating and annotating genomes from an evolutionary perspective with BUSCO and OrthoDB

Organizers:

Evgenia V. Kriventseva and Robert M. Waterhouse (University of Geneva and Swiss Institute of Bioinformatics, Switzerland)

Tutorial Summary:

The OrthoDB catalogue of orthologues represents a comprehensive resource of comparative genomics data to help researchers make the most of their newly-sequenced genomes. OrthoDB’s sets of Benchmarking Universal Single-Copy Orthologs, BUSCO, provide a rich source of data to assess the quality and completeness of these genome assemblies and their gene annotations. These resources and tools enable improved and extended orthology-based genome annotation and interpretation in a comparative genomics framework that incorporates the rapidly growing numbers of newly-sequenced genomes. Such comparative approaches are well established as immensely valuable for gene discovery and characterization, helping to build resources to support biological research.

Orthology delineation is a cornerstone of comparative genomics, offering evolutionarily-qualified hypotheses on gene function by identifying “equivalent” genes in different species. It also highlights shared and unique genes that offer clues to understanding species diversity, and provides the means to begin investigating key biological traits, for both large-scale evolutionary biology research and targeted gene and gene family studies. The success of such interpretative analyses relies on the comprehensiveness and accuracy of the input data, making BUSCO quality assessment an important part of the process of genome sequencing, assembly, and annotation.

Orthology-based approaches therefore offer a vital means not only to begin to interpret the increasing quantities of genomic data, but also to help prioritize improvements and to ensure that initial “draft” genomes develop into high-quality resources that benefit the entire research community. This tutorial offers insights into the approaches behind both OrthoDB and BUSCO, as well as the opportunity to learn about their applications to different types of genomics data.

Motivation
With advances in sequencing technologies comes a pressing need for reliable access to, and a comprehensive understanding of, available large-scale high-throughput comparative genomics methods and resources. This tutorial aims to explore concepts of orthology as a cornerstone of comparative genomics, using the OrthoDB catalogue for annotation of newly-sequenced genomes and BUSCO sets for quantifying completeness of de novo genome assemblies.

Expected Goals
Participants are expected to gain both a broad overview of the concepts, approaches, and challenges in the field of large-scale orthology delineation and deeper insights into how OrthoDB and BUSCO work and what they have to offer, both to researchers browsing the OrthoDB website and to power users of large orthology or BUSCO datasets.

Intended Audience
Students and researchers interested in learning about orthology in general, as well as those interested more particularly in the approaches behind OrthoDB and what it provides as a large-scale orthology resource. This tutorial should interest not only newcomers to the field, but also those who are already familiar with the concepts and want to learn more about the details and what OrthoDB has to offer. The complementary BUSCO part of the tutorial should be of particular interest to anyone working with genomics data such as newly assembled genomes, transcriptomes, or genome annotations.

Experience level and prerequisites
Participants should bring their own laptops if they wish to actively follow the demonstrations and practicals, or if they have specific questions for the instructors. No prior experience of orthology delineation algorithms or of running the BUSCO assessment tool is necessary. Taking some time to browse www.orthodb.org and http://busco.ezlab.org to become familiar with the websites is recommended, but not required.

Tutorial Agenda:

Tuesday, September 12, 2017
Venue: Kollegienhaus, University of Basel

9:00 – 10:30 Introduction to orthology: concepts and approaches. Overview of OrthoDB.
10:30 – 11:00 Coffee break
11:00 – 12:30 Practical on how to use OrthoDB
12:30 – 13:30 Lunch break
13:30 – 15:00 Introduction to BUSCO. Demonstration of BUSCO applications
15:00 – 15:30 Coffee break
15:30 – 17:00 Practical on BUSCO. Open floor Q&A

Tutorial speakers:

Evgenia V. Kriventseva did her Ph.D. at the European Bioinformatics Institute (EBI-EMBL), creating the CluSTr resource for classification of Swiss-Prot and TrEMBL proteins. She was then a functional annotation team leader at BASF Plant Science, Germany. She returned to EMBL to join Prof. Kafatos's group, working on vector genomics (AnoEST database). She then moved to UniGE/SIB, where she created the OrthoDB database. Evgenia gives courses at SIB and UniGE on bioinformatics resources and comparative genomics.

Robert M. Waterhouse read his undergraduate degree in Biochemistry at the University of Oxford. He then formalised his training in computational biology through an MSc in Bioinformatics at Imperial College London, followed by his doctoral studies focusing on the computational comparative analysis of insect genomes. He moved to the University of Geneva for his first postdoctoral position working in the Zdobnov group on arthropod genomics and the OrthoDB catalogue of orthologues. A Marie Curie International Outgoing Fellowship then took him to the Massachusetts Institute of Technology where he led the comparative analysis of multiple mosquito genomes. Returning to Geneva, he is developing his research with a focus on the comparative evolutionary and functional genomics of disease-vector mosquitoes and other insects. Robert has extensive experience of teaching workshops and tutorials, e.g. ETHZ ‘Triple A’ winter school on how to assemble, annotate and analyse whole genome sequence data; arthropod genomics biocuration workshops in South Africa; SIB spring school on bioinformatics and population genomics; the genome train workshop on learning about each stage of a genome sequencing project; as well as several VectorBase hands-on instruction workshops.