High Performance Computing in Bioinformatics
Dates: March 23rd – April 3rd.
Location: University of Oslo. The course will be held in auditorium Smalltalk (room 1416) in Ole-Johan Dahl’s building (Gaustadalléen 23B) on Mondays, Tuesdays and Wednesdays. On Thursdays and Fridays it will be held in the Large auditorium in the neighbouring Kristen Nygaard’s building (Gaustadalléen 23A).
Course code: INF9380. See the UiO course page here for the course schedule, course material and practical information.
Organisers: Torbjørn Rognes, Abdulrahman Azab, Arvind Sundaram
Invited lecturers: TBA
Credits: 5 ECTS
Registration for NORBIS students can be found here
Registration deadline: February 21st
Important notes about earning credits at UiO:
The deadline to sign up for earning credits at UiO is December 15th. If you know you will attend this course, you can apply HERE. Contact us at firstname.lastname@example.org if you need credits and missed this deadline.
This course focuses on the application of high performance computing (HPC) to bioinformatics analysis. The main aim is to provide a background on how to use HPC clusters effectively for running computationally intensive or data-intensive bioinformatics applications. Students will be taught selected bioinformatics tools and workflows, and how to use HPC platforms to speed up intensive bioinformatics analyses and maximize their overall throughput. This includes, for example, how to optimize the use of available compute nodes and how to adapt an application to the resources available on each compute node. The course will cover both how to use parallelism efficiently when writing your own programs and how to adapt and wrap existing tools in a manner that efficiently exploits the resources available on parallel architectures.
We are planning this as an intensive two-week course with lectures and hands-on exercises 7 hours a day, Monday to Friday, from 9 to 17 with a 1-hour lunch break. About half of the time will be spent on lectures and the other half on exercises. In total, lectures and exercises are estimated at 70 hours.
Self-study and reading of the curriculum is estimated at 30 hours. Preparation of the written report (home exam) is estimated at 40 hours. The total time required for the course is estimated at 140 hours.
Overview of sessions planned:
Introduction to High Performance Computing
Hands-on with server hardware
Performance efficiency analysis
High performance scientific computing
Code and data management with version control and hands-on
Programming for HPC
Introduction to resource demanding bioinformatics applications
Parallel programming with R
Workflows on supercomputers
Virtualisation (cloud and containers)
MPI with SLURM
Guided tour of supercomputer systems
After finishing the course the students should know:
- Resource-intensive bioinformatics tools for, e.g., assembly, mapping/alignment, and multiple alignment. This includes the use of command-line tools and portal-based tools, e.g. Galaxy.
- How those tools work, how this influences the runtime, and the possibilities for parallelizing the computation.
- When to use parallelization and distribution.
- The basic structure of HPC clusters, and how to run jobs on a cluster.
- How to evaluate the use of resources on a cluster, and how to optimize the use of memory and CPUs.
- How to write your own tools that work efficiently on parallel hardware.
- How to adapt or write wrappers around existing tools to process large datasets efficiently using parallelization.
Basic Unix competence and basic knowledge of bioinformatics applications are required.