High Performance Computing in Bioinformatics
Dates: April 16-27, 2018
Location: University of Oslo
The course will be held in auditorium 1416 Smalltalk in Ole-Johan Dahl’s building on all days except Wednesdays. On Wednesday 18 April the course will be held in auditorium 2269 Python in Ole-Johan Dahl’s building, while on Wednesday 25 April the course will be held in the Large auditorium in the neighbouring Kristen Nygaard’s building.
Course code: INF9380. See the UiO course page for the course schedule, course material and practical information.
Organisers: Torbjørn Rognes, Abdulrahman Azab, Arvind Sundaram
Invited lecturers: Martti Louhivuori and Jussi Enkovaara (CSC, Finland)
Credits: 5 ECTS
Important notes about earning credits at UiO:
The deadline to sign up for earning credits at UiO has now expired. Contact us at email@example.com if you need credits and missed this.
This course focuses on the application of high performance computing (HPC) to bioinformatics analysis. The main aim is to provide a background on how to use HPC clusters effectively for running computationally intensive or data-intensive bioinformatics applications. Students will be taught selected bioinformatics tools and workflows, and how to use HPC platforms to speed up intensive bioinformatics analyses and maximize their overall throughput. This includes, for example, how to optimize the use of the available compute nodes, and how to adapt an application to the resources available on each compute node.
We are planning this as an intensive two-week course with lectures and hands-on exercises 7 hours a day, Monday to Friday, 9-17, with a 1-hour lunch break. About half of the time will be spent on lectures and the other half on exercises. In total, lectures and exercises are estimated at 70 hours.
Self-study and reading of the curriculum are estimated at 30 hours. Preparation time for the written report (home exam) is estimated at 40 hours. The total time required for the course is estimated at 140 hours.
Overview of sessions planned:
- Introduction to High Performance Computing
- Hands-on with server hardware
- Performance efficiency analysis
- High performance scientific computing
- Code and data management with version control and hands-on
- Parallel programming
- Programming for HPC
- Introduction to resource demanding bioinformatics applications
- Bioinformatics workflows
- Parallel programming with R
- Workflows on the Abel supercomputer
- Virtualization (Cloud)
- Virtualization (Containers)
- MPI with SLURM
After finishing the course the students should know:
- Resource-intensive bioinformatics tools for tasks such as assembly, mapping/alignment, and multiple alignment. This includes both command-line tools and portal-based tools, e.g. Galaxy.
- How those tools work, how this influences their runtime, and to what extent the computation can be parallelized.
- When to use parallelization and distribution.
- The basic structure of HPC clusters, and how to run jobs on a cluster.
- How to evaluate the use of resources on a cluster, and how to optimize the use of memory and CPUs.
- How to write your own tools that work efficiently on parallel hardware.
- How to adapt or write wrappers around existing tools to process large datasets efficiently using parallelization.
Basic Unix competence and basic knowledge of bioinformatics applications are required.