This course is a fork (copy) of the Genomics Workshop from Data Carpentry which is adjusted for the Microbial Genomics course at Utrecht University. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop teaches data management and analysis for genomics research including:

best practices for organization of bioinformatics projects and data,
use of command-line utilities,
use of command-line tools to analyze sequence quality and perform variant calling,
and connecting to and using cloud computing.

Getting Started

This lesson assumes that learners have no prior experience with the tools covered in the workshop.
However, learners are expected to have some familiarity with biological concepts, including the concept of genomic variation within a population. Participants should bring their own laptops and plan to participate actively.

Data

This workshop uses data from a long term evolution experiment published in 2016: Tempo and mode of genome evolution in a 50,000-generation experiment by Tenaillon O, Barrick JE, Ribeck N, Deatherage DE, Blanchard JL, Dasgupta A, Wu GC, Wielgoss S, Cruveiller S, Médigue C, Schneider D, and Lenski RE. (doi: 10.1038/nature18959)

All of the data used in this workshop can be downloaded from Figshare. However to make things more convenient and limit the time to download large files, all data is provided locally in your cocalc personal home directory as well.
More information about this data is available on the Data page.

Workshop Overview

Monday 11 Nov 2024 (day 1)

Room BOL-1.128 (morning) & BOL-1.138 (afternoon)

Time	Lesson	Overview
09:00-09:45	Introduction to course	[ Michael Seidl ]
09:45-10:15	Introduction data carpentry days 1-2	[ Alex Bossers / Tim Dallman / Julian Paganini / Aldert Zomer]
10:15-12:45	Introduction command line (1-4) + code along	Students: Learn to navigate your file system, create, copy, move, and remove files and directories, and automate repetitive tasks using scripts and wildcards. [Julian/Alex]
12:45-13:15	Break	Move to other room!
13:15-13:45	Sequence data formats and QC	[Alex/Julian]
13:45-14:45	Introduction command line (5-6)	Students: Continue command line practicals or start Project organisation
14:45-16:00	Project organization and management	Students: Learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI sequence read archive (SRA) database.
16:15-17:00	Self study	Students

Tuesday 12 Nov 2024 (day 2)

Room BOL-1.138 (morning) & BOL-1.075 (afternoon)

Time	Lesson	Overview
09:00-09:15	Checkin and recap day 1	[Alex/Julian]
09:15-10:00	Introduction to LTEE	[Julian]
10:00-12:45	Data wrangling and processing	Students: Use command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation.
12:45-13:15	Break	Move to other room!
13:15-15:45	Data wrangling and processing	Students: Continue to the finish
15:45-16:00	Plenary status check, problems/difficulties to address?	All
16:15-17:00	Self study	Students

🔴 If all of the above is peanuts, move ahead and master your skills with the optional extra’s shown below!

Optional Additional Lessons (Extra self-study)

Lesson	Overview
Bash scripting exercises: grep	Learn grep and regular expressions with examples
Bash scripting exercises: sed	sed tutorial for advanced learners
Bash scripting exercises: awk	awk tutorial with many examples
Putting your grep/sed/awk knowledge to the test	Challenges for grep/sed/awk programmers. Requires a (free) user account at hackerrank
Intro to R and RStudio for Genomics	Use R to analyze and visualize between-sample variation. Please note that workshop materials for working with Genomics data in R are in “alpha” development. In these lessons they instruct to do a local install or go to Amazon for Rstudio. Alternative you can use a FREE instance in the cloud at https://rstudio.cloud/ (preferred option)
R for reproducible scientific analyses	Fundamentals in R and tidyverse
Genomics in the cloud (AWS)	Learn how to work with Amazon AWS cloud computing and how to transfer data between your local computer and cloud resources (setting-up own cloud instance instructions). Only necessary when you want to run your own instance of this course on an AWS instance!

Teaching Platform

This workshop is as original as possible from the Data Carpentries workshop series however it was partly redesigned to initially be run at the CoCalc platform hosted at Utrecht University, but from edition 2024 onwards we moved back to a classical (full) command line interface at ssh/sftp gemini.science.uu.nl! (solisid required)
If you want to run your own instance of a similar server used for this workshop, follow the directions to generate an Amazon version as described on DataCarpentry genomics or alternatively try the Windows Subsystem for Linux (WSL Ubuntu) on MS windows.

Genomics Workshop Overview