Genealogies and Ancestral Recombination Graph

Time: April/May 2024

Location: Drøbak Research Station (Tollboden)

Workshop responsible: José Cerca (UiO) and Ole Kristian Tørresen (UiO)

Invited lecturers: Dr. Yan Wong (Oxford U, UK), Dr. Mark Ravinet (Nottingham U, UK), and Dr. Per Unneberg (SciLifeLab Uppsala, Sweden)

 

Workshop description

The Genealogies and Ancestral Recombination Graph workshop (GARG’w) is a 3-day, hands-on program that introduces participants to population genomics and to how recombination events shape genomes. The workshop centers on the tskit environment, a toolkit for simulating and analyzing genomic data (including whole-genome resequencing data), and uses it to explore inheritance patterns within populations. Participants will gain practical skills in working with genealogies and Ancestral Recombination Graphs (ARGs), and will learn how genetic data can reveal the historical events that have shaped the genetic ancestry of species and populations.

 

Workshop program

 

Day 1 – Morning: Introductions of the participants and their projects.  

Talk: Introduction – Pedigrees, Genealogy, Historical Background, Different ARG Definitions.

Provide an introduction to Ancestral Recombination Graphs (ARGs) and their significance in studying genealogical history within populations. Explain the relationship between ARGs and concepts such as pedigrees, genealogy, and genetic ancestry. Show how genetic data can be encoded on an ARG, introduce neutrality, and explain the equivalence between site-based and branch-based population genetic statistics. Discuss the various definitions of ARGs and approaches to constructing and simplifying them, setting the stage for a deeper understanding in subsequent parts of the workshop.

Practical: Intro to tskit (edge-labelled gARGs)

Introduce the tskit environment, a toolkit for handling and analyzing large-scale genetic data. Focus on edge-labelled gARGs (genome ARGs), the graph representation that tskit uses to store and analyze ARGs efficiently. Provide practical demonstrations to familiarize participants with using tskit for basic ARG analysis.
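To make the idea of an edge-labelled graph concrete, here is a minimal pure-Python sketch. It does not use the tskit API. It only borrows the convention that each edge carries a genomic interval (left, right) together with a parent and a child node. The toy node times, the genome length of 10, and the helper names local_tree and root_of are assumptions made for illustration.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    left: float   # start of the genomic interval (inclusive)
    right: float  # end of the genomic interval (exclusive)
    parent: int
    child: int


# Node times in generations ago; nodes 0, 1 and 2 are the samples.
node_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 2.0, 5: 3.0}

# Toy ARG on a genome of length 10 with a single breakpoint at position 5.
# Left of the breakpoint the tree is ((0,1),2); right of it, (0,(1,2)).
edges = [
    Edge(0, 5, 3, 0),
    Edge(0, 5, 3, 1),
    Edge(0, 5, 5, 3),
    Edge(0, 5, 5, 2),
    Edge(5, 10, 4, 1),
    Edge(5, 10, 4, 2),
    Edge(5, 10, 5, 4),
    Edge(5, 10, 5, 0),
]


def local_tree(edges, position):
    """Return the child -> parent map of the tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def root_of(tree, node):
    """Follow parent pointers from `node` until a node without a parent."""
    while node in tree:
        node = tree[node]
    return node


left_tree = local_tree(edges, 2.0)
right_tree = local_tree(edges, 7.0)
print("left of breakpoint: ", left_tree)
print("right of breakpoint:", right_tree)
```

Because every edge records where along the genome it applies, a single list of edges encodes all local trees at once, and the tree at any position is recovered by filtering on the interval.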

 

Day 1 – Afternoon  

Talk & practical: Simplification and collapsing of recombination nodes

Build a very simple Wright-Fisher simulator. Explain why simplifying ARGs is fundamental to forward simulation and makes analysis more efficient. Demonstrate techniques for collapsing recombination nodes to simplify ARGs without losing important genetic information. Engage participants in practical exercises to simplify genealogies using tskit.

Talk: Basic coalescent theory, coalescent with recombination, SPRs, etc.

Introduce participants to the fundamental principles of coalescent theory, a key concept in understanding genetic ancestry and population genetics. Explain how recombination events alter the coalescent process, producing Subtree Prune and Regraft (SPR) operations between adjacent local trees. Introduce the SMC and SMC’ (sequential Markov coalescent) approximations. Discuss the implications of recombination for the genealogy of populations, preparing attendees for ARG analysis.
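The simple Wright-Fisher simulator and the simplification step mentioned for this session could look roughly like the sketch below. It is a pure-Python illustration under strong assumptions: haploid individuals, no recombination, and hypothetical function names (wright_fisher, simplify) that are not part of tskit. Unlike tskit's own simplification, this version only discards lineages that are not ancestral to the samples and keeps intermediate nodes with a single child.

```python
import random


def wright_fisher(pop_size, generations, seed=1):
    """Forward-simulate a haploid Wright-Fisher population (no recombination).

    Returns (parent, current): `parent` maps each node ID to its parent's ID,
    and `current` lists the node IDs alive in the final generation.
    """
    rng = random.Random(seed)
    parent = {}
    current = list(range(pop_size))  # founders have no recorded parent
    next_id = pop_size
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            # Each offspring picks its parent uniformly from the previous generation.
            parent[next_id] = rng.choice(current)
            offspring.append(next_id)
            next_id += 1
        current = offspring
    return parent, current


def simplify(parent, samples):
    """Keep only the parent pointers on paths from the samples to the founders."""
    kept = {}
    for sample in samples:
        node = sample
        # Stop at a founder, or as soon as we join a path that is already kept.
        while node in parent and node not in kept:
            kept[node] = parent[node]
            node = parent[node]
    return kept


parent, final_generation = wright_fisher(pop_size=20, generations=50)
samples = final_generation[:5]
ancestry = simplify(parent, samples)
print(f"recorded nodes: {len(parent)}, kept after simplification: {len(ancestry)}")
```

The full record grows with population size times the number of generations, while the simplified record is bounded by the number of samples times the number of generations. That difference is why forward simulators simplify periodically rather than storing every individual that ever lived.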

Analysis: Single Site – Branch vs. Site Stats

Focus on analyzing single-site data within ARGs, comparing branch statistics with site statistics. Discuss the insights that can be gained from examining genetic variation at specific loci within ARGs. Illustrate the significance of single-site analysis in understanding recombination patterns. Explore how haplotype-based approaches can contribute to ARG analysis.
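The relationship between branch and site statistics can be illustrated on a single small tree. The sketch below is pure Python with a made-up three-sample tree and hypothetical helper names. It computes pairwise diversity in two ways: from branch lengths, and from mutations. To keep the example deterministic it assumes that every branch carries exactly its expected number of mutations (one per unit of branch length), which real data would only match on average.

```python
from itertools import combinations

# Toy tree ((0,1),2): child -> parent, with node times in generations ago.
parent = {0: 3, 1: 3, 3: 5, 2: 5}
node_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 5: 3.0}
samples = [0, 1, 2]


def ancestors(node):
    """Nodes on the path from `node` up to the root, including `node` itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def branch_distance(a, b):
    """Total branch length separating samples a and b."""
    anc_a = ancestors(a)
    mrca = next(n for n in ancestors(b) if n in anc_a)
    return (node_time[mrca] - node_time[a]) + (node_time[mrca] - node_time[b])


pairs = list(combinations(samples, 2))
branch_diversity = sum(branch_distance(a, b) for a, b in pairs) / len(pairs)

# Site mode: place mutations on branches. ASSUMPTION: each branch carries
# exactly mutation_rate * length mutations, i.e. its expected count.
mutation_rate = 1.0
mutated_branches = []  # one entry per mutation: the child node below the branch
for child, par in parent.items():
    n_mutations = round(mutation_rate * (node_time[par] - node_time[child]))
    mutated_branches.extend([child] * n_mutations)


def carries(sample, branch_child):
    """A sample inherits a mutation if the mutated branch is on its path to the root."""
    return branch_child in ancestors(sample)


def site_differences(a, b):
    return sum(carries(a, m) != carries(b, m) for m in mutated_branches)


site_diversity = sum(site_differences(a, b) for a, b in pairs) / len(pairs)
print(branch_diversity, site_diversity)
```

Under these assumptions the site statistic equals the mutation rate times the branch statistic. With randomly placed mutations the equality holds in expectation, which is the sense in which the two kinds of statistics are equivalent.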

 

Day 2 – Morning: Simulating & Visualizing ARGs  

Introduce techniques for simulating genealogies in forwards and backwards time, and the principle of recapitation. Discuss how to simulate multiple chromosomes and the Wright-Fisher vs Hudson coalescent. Explain how to record full ARGs during simulation to capture comprehensive genetic ancestry information. Showcase methods to visualize simulated ARGs, helping participants grasp the patterns and complexity of genealogies.  

Talk & Practical: msprime (including WF and Hudson models, multiple chromosomes, the record_full_arg parameter, and likelihood calculations)

Dive into the msprime library, a coalescent simulator for genealogies and genetic data. Provide hands-on experience with running simulations, generating ARGs for multiple chromosomes, and recording full ARG information for analysis. Introduce likelihood calculations, enabling participants to assess the fit between observed data and simulated ARGs.
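As background for the Hudson model, the following pure-Python sketch simulates the time to the most recent common ancestor under the standard coalescent without recombination and compares it with the textbook expectation 2(1 - 1/n). It is not msprime code. The function names, the seed, and the number of replicates are choices made for this illustration, and time is measured in coalescent units in which each pair of lineages coalesces at rate 1.

```python
import random


def simulate_tmrca(n_samples, rng):
    """Draw one time to the MRCA for `n_samples` lineages.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k * (k - 1) / 2 (one unit of rate per pair).
    """
    t = 0.0
    k = n_samples
    while k > 1:
        rate = k * (k - 1) / 2
        t += rng.expovariate(rate)
        k -= 1  # two lineages merge into one
    return t


def expected_tmrca(n_samples):
    """Sum of the mean waiting times 2 / (k * (k - 1)) for k = 2..n."""
    return 2 * (1 - 1 / n_samples)


rng = random.Random(42)
n = 10
replicates = 20000
mean_tmrca = sum(simulate_tmrca(n, rng) for _ in range(replicates)) / replicates
print(f"simulated mean: {mean_tmrca:.3f}, expected: {expected_tmrca(n):.3f}")
```

A discrete-time Wright-Fisher simulation run backwards would give similar values when the sample is small relative to the population, which is the regime where the continuous-time approximation is expected to be accurate.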

 

Day 2 – Afternoon  

Dominance of Recombination Nodes if Unsimplified

Discuss the significance of recombination nodes and how they come to dominate an ARG that is left unsimplified. Explain how unsimplified ARGs can impact forward simulations and the interpretation of genetic ancestry patterns.

stdpopsim

Introduce stdpopsim, a library of standardized, published demographic models for simulating ARGs under different scenarios. Discuss the advantages of using standardized demographic models in population genetics research.

SLiM

Explore the use of SLiM (Selection on Linked Mutations), a forward-time, individual-based simulator, for modeling complex evolutionary scenarios, including selection and demographic events. Highlight the versatility of SLiM in capturing various aspects of population genetics.

 

Day 3 – Morning: Inferring ARGs

Present different approaches for inferring ARGs from genetic data, emphasizing their strengths and limitations. Discuss popular tools such as ARGweaver (MCMC), Relate (tree construction), ARG-Needle (threading), and tsinfer (HMM matching).

Focus on tsinfer, including mismatch & sgkit

Focus on tsinfer as a tool for inferring ARGs with a Hidden Markov Model (HMM) matching approach. Demonstrate how tsinfer handles mismatch between haplotypes during matching, and its integration with sgkit for data handling and analysis.
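The idea behind HMM matching with mismatch can be sketched with a toy copying model in the style of Li and Stephens: a query haplotype is explained as a mosaic of panel haplotypes, with a penalty for switching between them (recombination) and a penalty for disagreeing with the copied haplotype (mismatch). The code below is a pure-Python illustration, not tsinfer's algorithm or API. The function name and the probabilities switch_prob and mismatch_prob are assumptions chosen for the example.

```python
import math


def viterbi_copying_path(panel, query, switch_prob=0.01, mismatch_prob=0.001):
    """Most likely sequence of panel haplotypes copied by `query`.

    Requires at least two panel haplotypes of the same length as `query`.
    """
    n_hap = len(panel)
    log_stay = math.log(1 - switch_prob)
    log_switch = math.log(switch_prob / (n_hap - 1))
    log_match = math.log(1 - mismatch_prob)
    log_mismatch = math.log(mismatch_prob)

    def log_emit(h, site):
        return log_match if panel[h][site] == query[site] else log_mismatch

    # Uniform prior over which haplotype is copied at the first site.
    score = [log_emit(h, 0) - math.log(n_hap) for h in range(n_hap)]
    back = []
    for site in range(1, len(query)):
        new_score, pointers = [], []
        for h in range(n_hap):
            # Best previous state g, paying a switch penalty when g != h.
            best_score, best_g = max(
                (score[g] + (log_stay if g == h else log_switch), g)
                for g in range(n_hap)
            )
            new_score.append(best_score + log_emit(h, site))
            pointers.append(best_g)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_hap), key=lambda h: score[h])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


panel = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
recombinant = [0, 0, 0, 0, 1, 1, 1, 1]  # copies haplotype 0, then haplotype 1
noisy = [0, 0, 0, 1, 0, 0, 0, 0]        # haplotype 0 with one discordant site

print(viterbi_copying_path(panel, recombinant))
print(viterbi_copying_path(panel, noisy))
```

With these settings a single discordant site is cheaper to explain as a mismatch than as two switches, so the noisy query is assigned to one haplotype throughout. Changing the ratio between the two probabilities shifts that balance, which is the kind of trade-off a mismatch setting controls.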

 

Day 3 – Afternoon: tsdate

Introduce tsdate, a method for estimating the ages of ancestral nodes in an inferred ARG. Explain how tsdate can provide valuable insights into the timing of ancestry and coalescence events in evolutionary history.

Playtime with Real Datasets

Provide participants with real genetic datasets to work with, applying the techniques learned throughout the workshop. Encourage exploration and analysis of ARGs from diverse populations and species to gain practical experience.

 

Learning outcomes and competence

By the end of this 3-day workshop, participants will have gained practical skills and knowledge in working with Ancestral Recombination Graphs using the tskit environment. They will be equipped to analyze, simulate, and interpret ARGs, contributing to their understanding of genetic ancestry and recombination events within populations.

 

Prerequisites 

Familiarity with basic genetic concepts (alleles, mutations, genetic variation, etc.). Basic knowledge of Python programming (though not mandatory, it will be helpful). Participants are required to bring their laptops.