High Performance Computing in Bioinformatics

Dates: March 14-25, 2022

Location: University of Oslo.

Course code: INF9380. See the UiO course page for the course schedule, course material and practical information.

Organisers: Torbjørn Rognes, Abdulrahman Azab, Arvind Sundaram

Invited lecturers:

Martti Louhivuori & Jussi Enkovaara, CSC (IT Center for Science), Finland.

Abdulrahman Azab, Chief Engineer, University Centre for Information Technology, UiO.

Arvind Sundaram, Research scientist, Norwegian Sequencing Centre, Oslo University Hospital.

Bjørn-Helge Mevik, Darren Starr, Leon Charl du Toit, Marcin Krotkiewski, Maiken Pedersen, Ole Widar Saastad, Sabry Razick, USIT, UiO

Credits: 5 ECTS

Important notes about earning credits at UiO:

PhD students not at the University of Oslo (UiO) who wish to earn credits for the course must apply for status as a visiting PhD candidate by 15th December 2021. Click here for more information. For NORBIS participants we have an extended deadline of 18th February 2022; to use it, you have to send the following information to contact-norbis@uib.no:

-Your name (first name and surname)
-Norwegian national ID number (11 digits). If you don’t have one, please send your date of birth
-Citizenship
-E-mail address
-Address
-Post code/City
-Telephone
-Official documentation of your admission to a PhD programme
-Confirmation from your supervisor/administration that the course(s) you wish to follow will be included as part of the training component in your PhD programme
-International applicants are encouraged to also enclose a Letter of Recommendation from a professor/research group at the University of Oslo

This requires a lot of extra work for all parties, so we encourage you to apply directly to UiO by 15th December. The application form for visiting PhD students opens on 15th November and you can find it here. All students should register for the course using Studentweb at UiO. In addition, you should also register for the course at NORBIS. The number of slots is limited to 30. Contact us at contact-norbis@uib.no if you need credits and missed the deadline.

Registration to NORBIS: Closed.

Evaluation: Practical student project (home exam) with hand-in of written report.

 

Course description

This course focuses on the application of high performance computing (HPC) to bioinformatics analysis. The main aim is to provide a background on how to use HPC clusters effectively for running computationally intensive or data-intensive bioinformatics applications. The course will mainly teach selected bioinformatics tools and workflows, and how to use HPC platforms to speed up intensive bioinformatics analyses and maximize their overall throughput. This includes, for example, how to optimize the use of the available compute nodes and how to adapt an application to the resources available on each compute node. The course will cover both how to use parallelism efficiently when writing your own programs and how to adapt and wrap existing tools so that they efficiently exploit the resources available on parallel architectures. The course will be identical to the University of Oslo course INF9380.

 

Course program

We are planning this as an intensive two-week course with lectures and hands-on exercises 7 hours a day, Monday to Friday, 9-17, with a one-hour lunch break. About half the time will be spent on lectures and the other half on exercises. In total, lectures and exercises are estimated at 70 hours.

 

Self-study and reading of the curriculum are estimated at 30 hours. Preparation of the written report (home exam) is estimated at 40 hours. The total time required for the course is estimated at 140 hours.

Overview of sessions planned:

– Introduction to High Performance Computing

– Hands on with server hardware – Performance efficiency analysis

– High performance scientific computing

– Code and data management with version control and hands-on

– Parallel programming

– Programming for HPC

– Introduction to resource demanding bioinformatics applications

– Bioinformatics workflows

– Parallel programming with R

– Workflows on supercomputers

– Virtualisation (cloud and containers)

– MPI with SLURM

– Galaxy

– Guided tour of supercomputer systems

 

 

Learning outcomes and competence

After finishing the course the students should know:

– Resource-intensive bioinformatics tools for, e.g., assembly, mapping/alignment, and multiple alignment. This includes the use of command-line tools and portal-based tools, e.g. Galaxy.

– How those tools work, how this influences their runtime, and to what extent the computation can be parallelised.

– When to use parallelisation and distribution.

– The basic structure of HPC clusters, and how to run jobs on a cluster

– How to evaluate the use of resources on a cluster, and how to optimize the use of memory and CPUs

– How to write your own tools that work efficiently on parallel hardware

– How to adapt or write wrappers around existing tools to process large datasets efficiently using parallelisation
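
As an illustration of the last outcome, here is a minimal Python sketch (not taken from the course material) of a wrapper that runs an existing command-line tool on several independent inputs at the same time. The function names run_tool, run_in_parallel and count_bases_command are hypothetical, and a Python one-liner stands in for a real bioinformatics tool.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_tool(command):
    """Run one external command and return its standard output as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def run_in_parallel(commands, max_workers=4):
    """Run independent commands concurrently; results keep the input order."""
    # Threads are sufficient here: each thread only waits for its child
    # process, so the actual work runs in parallel in the external processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_tool, commands))


def count_bases_command(sequence):
    """Hypothetical stand-in for a real tool: prints the sequence length."""
    return [sys.executable, "-c", "import sys; print(len(sys.argv[1]))", sequence]


if __name__ == "__main__":
    sequences = ["ACGT", "GATTACA", "CC"]
    outputs = run_in_parallel([count_bases_command(s) for s in sequences])
    print([int(out) for out in outputs])
```

On a cluster, max_workers would typically be set no higher than the number of CPU cores allocated to the job, so that the wrapper does not oversubscribe the node.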

 

 

Prerequisites

Basic Unix competence and basic knowledge of bioinformatics applications are required, as are basic programming skills, preferably in Python. The university offers Software Carpentry workshops and other courses that enable participants to fulfil these requirements.