Bioinformatics for DNA metabarcoding and long-read amplicon data

Dates: 17-21 April, 2023
Location: University of Oslo, Department of Biosciences, Kristine Bonnevies Hus

Lecturers: Håvard Kauserud (UiO), Ramiro Logares (Barcelona), Frederic Mahé (France), Anders Krabberød (UiO), Torbjørn Rognes (UiO)
Credits: 3 ECTS
Registration: Closed

Registration deadline: 5th April

 

Course description

High-throughput sequencing of environmental DNA has become a powerful approach for mapping and exploring communities of both micro- and macroorganisms. One strategy is to sequence a selected PCR-amplified marker to obtain information about the taxonomic composition of a sample (DNA metabarcoding), and this course focuses on that strategy. Students will be introduced to important bioinformatics approaches, from processing of raw sequence data to establishing the OTU/sample matrix and retrieving the taxonomic identity of the sequences. Important themes will be (1) different sequencing techniques, (2) filtering and quality assessment of high-throughput sequence data, (3) error correction and/or clustering of high-throughput sequence data, and (4) taxonomic annotation of high-throughput sequence data. We will also touch upon some further downstream analyses, including evolutionary placement of short-read sequences onto backbone phylogenies, and in this context we will also cover long-read metabarcoding. A wide suite of tools will be presented, including DADA2, SWARM and VSEARCH. There will also be a few guest lectures on specific topics such as long-read metabarcoding and decontamination. The course will be a mix of lectures and hands-on sessions.

 

Course program

Monday: Introduction to DNA metabarcoding and important concepts. Introduction to UNIX, the command line and R. Start working with data filtering.

Tuesday: Start working with DADA2.

Wednesday: Continue with DADA2. Introduce and work with other approaches, such as SWARM and VSEARCH.

Thursday: Continue with SWARM and VSEARCH. Introduce the program LULU.

Friday: Focus on long-read sequencing, including phylogenetic placement of short reads on long-read phylogenies. We will also cover chimeras, an important problem with long-read sequences.

 

Learning outcomes and competence

Students will learn terms and concepts in the field of environmental sequencing, for both short-read and long-read data. Participants will be introduced to important programs and approaches, such as DADA2, SWARM and VSEARCH, as well as methods for phylogenetic analysis of short and long reads from environmental samples. Students will learn how to establish a final data matrix from raw sequence reads. Participants will also become more aware of the different types of errors and biases, and learn to apply the different methods critically.

 

Prerequisites

Some knowledge of UNIX, command-line usage and R will be very beneficial. The course will be somewhat hard to follow without any knowledge of these areas.

 

Evaluation

A report outlining how to analyse an environmental sequence dataset must be written after the course.