Dates: 3-7 May, 2021
Location: The course is given online.
Lecturers: Kornel Labun, Eivind Valen, Håkon Tjeldnes, Adnan Niazi
Credits: 3 ECTS
Registration: HERE. NORBIS members will be prioritized. UPDATE: Please note the course is full, and new registrations are put on the waiting list.
Course description:
The R crash course is an introductory course; no previous programming knowledge is required. It runs for five days, from 9:00 to 16:00 each day. You will learn how to get started with R, how to write small, useful programs, how to analyze your data in reproducible ways, how to visualize your datasets with beautiful plots, and how to take advantage of thousands of R packages. The course is biology-oriented and will introduce you to the wonders of Bioconductor. You will also be expected to complete a few homework assignments to practice and hone your skills. The course is adapted from Software Carpentry workshops, so expect a high level of interaction and many hands-on exercises.
Course material is available here: r-crash-course.github.io
Course program:
Day 1
- R + RStudio installation (60 min)
- Interface of RStudio (45 min)
  - Describe the purpose and use of each pane in the RStudio IDE.
  - Locate buttons and options in the RStudio IDE.
  - Define a variable.
  - Assign data to a variable.
  - Manage a workspace in an interactive R session.
  - Use mathematical and comparison operators.
  - Call functions.
  - Manage packages.
- Exercises (30 min)
- Create self-contained projects in RStudio (15 min)
- Exercises (30 min)
- Lunch break (30 min)
- Seeking Help (20 min)
  - To be able to read R help files for functions and special operators.
  - To be able to use CRAN task views to identify packages to solve a problem.
  - To be able to seek help from your peers.
- Exercises (30 min)
- Data Structures (40 min)
  - To be aware of the different types of data.
  - To begin exploring data frames, and understand how they are related to vectors, factors and lists.
  - To be able to ask R about the type, class, and structure of an object.
- Exercises (120 min)
Day 2
- Exploring Data Frames (30 min)
  - Be able to add and remove rows and columns.
  - Be able to remove rows with NA values.
  - Be able to append two data frames.
  - Be able to articulate what a factor is and how to convert between factor and character.
  - Be able to find basic properties of a data frame, including size, class or type of the columns, names, and the first few rows.
- Exercises (30 min)
- Subsetting Data (30 min)
  - To be able to subset vectors, factors, matrices, lists, and data frames.
  - To be able to extract individual and multiple elements by index, by name, and using comparison operations.
  - To be able to skip and remove elements from various data structures.
- Exercises (45 min)
- Control Flow (45 min)
  - Write conditional statements with if() and else().
  - Write and understand for() loops.
- Lunch break (30 min)
- Exercises (90 min)
- Creating Publication-Quality Graphics (60 min)
  - To be able to use ggplot2 to generate publication-quality graphics.
  - To understand the basic grammar of graphics, including the aesthetics and geometry layers, adding statistics, transforming scales, and coloring or panelling by groups.
- Exercises (60 min)
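To give a flavour of the Day 2 topics (removing NA rows, subsetting by comparison, and control flow), here is a minimal sketch in base R. It is not taken from the course material: the data frame `samples` and its columns `id` and `expression` are invented purely for illustration.

```r
# Hypothetical toy data, invented for this illustration only
samples <- data.frame(
  id = c("s1", "s2", "s3", "s4"),
  expression = c(2.5, NA, 7.1, 4.0),
  stringsAsFactors = FALSE
)

# Remove rows that contain NA values
complete <- na.omit(samples)

# Subset rows using a comparison operation
high <- complete[complete$expression > 3, ]

# Label each remaining row with if()/else inside a for() loop
labels <- character(nrow(complete))
for (i in seq_len(nrow(complete))) {
  if (complete$expression[i] > 3) {
    labels[i] <- "high"
  } else {
    labels[i] <- "low"
  }
}
```

The loop is written out explicitly to show the control-flow syntax; the same labels could be produced without a loop using a vectorized call such as ifelse().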
Day 3
- Vectorization (30 min)
  - To understand vectorized operations in R.
- Exercises (30 min)
- Functions Explained (60 min)
  - Define a function that takes arguments.
  - Return a value from a function.
  - Check argument conditions with stopifnot() in functions.
  - Test a function.
  - Set default values for function arguments.
  - Explain why we should divide programs into small, single-purpose functions.
- Exercises (60 min)
- Lunch break (30 min)
- Writing Data (30 min)
  - To be able to write out plots and data from R.
- Exercises (60 min)
- Split-Apply-Combine (30 min)
  - To be able to use the split-apply-combine strategy for data analysis.
- Exercises (90 min)
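As an illustration of the Day 3 topics on functions and vectorization, the following is a small sketch in base R. The function name `celsius_to_kelvin` and its `offset` argument are hypothetical examples chosen here, not part of the course material.

```r
# Hypothetical example function: convert Celsius to Kelvin.
# `offset` has a default value, so callers can omit it.
celsius_to_kelvin <- function(temp_c, offset = 273.15) {
  # Check the argument condition; stops with an error if it is not met
  stopifnot(is.numeric(temp_c))
  # Arithmetic is vectorized: this works on a whole vector at once
  temp_c + offset
}

# Call the function on a vector of temperatures
kelvin <- celsius_to_kelvin(c(0, 25, 100))

# A simple test of the function against known values
test_ok <- isTRUE(all.equal(kelvin, c(273.15, 298.15, 373.15)))
```

Because the addition is vectorized, no loop is needed to convert several values, and passing a non-numeric argument such as "warm" stops with an error from stopifnot().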
Day 4
- Dataframe Manipulation with dplyr (60 min)
  - To be able to use the six main dataframe manipulation ‘verbs’ with pipes in dplyr.
  - To understand how group_by() and summarize() can be combined to summarize datasets.
  - Be able to analyze a subset of data using logical filtering.
- Exercises (120 min)
- Lunch break (30 min)
- Dataframe Manipulation with tidyr (30 min)
  - To understand the concepts of ‘long’ and ‘wide’ data formats and be able to convert between them with tidyr.
- Exercises (45 min)
- A short note on coding style (15 min)
- Producing Reports With knitr (30 min)
  - Value of reproducible reports
  - Basics of Markdown
  - R code chunks
  - Chunk options
  - Inline R code
  - Other output formats
- Exercises (90 min)
Day 5
- Making use of Bioconductor (180 min)
  - Installation of Bioconductor packages
  - Understanding IRanges
  - Understanding GRanges
  - Loading FASTQ files into R with ShortRead
  - Aligning reads using Biostrings
  - Plotting using ggbio
- Lunch break (30 min)
- Exercises (120 min)
- Writing Good Software (10 min)
- Specific packages students are interested in
- Explanation of homework assignment
Learning outcomes and competence
After the course, students will be proficient programmers in R and familiar with several CRAN and Bioconductor packages. Learning new R packages and programming skills independently should become feasible, as the course covers the steepest, beginner-level part of the learning curve. Each student should be able to approach problems algorithmically, breaking seemingly difficult issues into smaller steps that can be solved with basic knowledge. Students will be able to carry out typical bioinformatics tasks with an understanding of the underlying principles. They will also be able to transfer their knowledge to other programming languages (e.g. Python, Java) and may benefit from programming skills outside their scientific work.
Prerequisites
– Bring your own laptop.
– Install R (can be downloaded here) and RStudio (a development environment for R, download here).