High Performance Computing in Bioinformatics

THE COURSE IS GIVEN DIGITALLY DUE TO COVID-19! All participants have received information about how classes will be given.

Dates: March 23rd – April 3rd.

Location: University of Oslo. The course will be held in auditorium Smalltalk (room 1416) in Ole-Johan Dahl’s building (Gaustadalléen 23B) on Mondays, Tuesdays and Wednesdays. On Thursdays and Fridays it will be held in the Large auditorium in the neighbouring Kristen Nygaard’s building (Gaustadalléen 23A).

Course code: INF9380. See the UiO course page for the course schedule, course material and practical information.

Organisers: Torbjørn Rognes, Abdulrahman Azab, Arvind Sundaram

Invited lecturers: TBA

Credits: 5 ECTS

Registration for NORBIS students can be found here

Registration is closed

Important notes about earning credits at UiO:

The deadline to sign up for earning credits at UiO is December 15th. If you know you will attend this course, you can apply HERE. Contact us at contact-norbis@uib.no if you need credits and missed the deadline.


Course description:

This course focuses on the application of high performance computing (HPC) to bioinformatics analysis. The main aim is to provide a background on how to use HPC clusters effectively for running computationally intensive or data-intensive bioinformatics applications. The course will mainly teach selected bioinformatics tools and workflows, and how to use HPC platforms to speed up intensive bioinformatics analyses and maximize their overall throughput. This includes, e.g., how to optimize the use of the available compute nodes, and how to adapt an application to the resources available on each compute node. The course will cover both how to use parallelism efficiently when writing your own programs and how to adapt and wrap existing tools in a manner that efficiently exploits the resources available on parallel architectures.

 

Course program:

We are planning this as an intensive two-week course with lectures and hands-on exercises 7 hours a day, Monday to Friday, 9-17, with a 1-hour lunch break. About half the time will be spent on lectures and the other half on exercises. In total, lectures and exercises are estimated at 70 hours.

Self-study / reading of the curriculum is estimated at 30 hours. Preparation time for the written report (home exam) is estimated at 40 hours. The total time required for the course is estimated at 140 hours.

Overview of sessions planned:

  • Introduction to High Performance Computing
  • Hands-on with server hardware
  • Performance efficiency analysis
  • High performance scientific computing
  • Code and data management with version control and hands-on
  • Parallel programming
  • Programming for HPC
  • Introduction to resource demanding bioinformatics applications
  • Bioinformatics workflows
  • Parallel programming with R
  • Workflows on supercomputers
  • Virtualisation (cloud and containers)
  • MPI with SLURM
  • Galaxy
  • Guided tour of supercomputer systems

 

Learning outcome:

After finishing the course the students should know:

  • Resource-intensive bioinformatics tools for, e.g., assembly, mapping/alignment, and multiple alignment. This includes the use of command-line tools and portal-based tools, e.g. Galaxy.
  • How those tools work, how this influences the runtime, and the possibilities for parallelizing the computation.
  • When to use parallelization and distribution.
  • The basic structure of HPC clusters, and how to run jobs on a cluster.
  • How to evaluate the use of resources on a cluster, and how to optimize the use of memory and CPUs.
  • How to write your own tools that work efficiently on parallel hardware.
  • How to adapt or write wrappers around existing tools to process large datasets efficiently using parallelization.

 

Prerequisites:

Basic Unix competence and basic knowledge of bioinformatics applications are required.