Introduction

Overview

Teaching: 20 min
Exercises: 10 min

Questions

Where does the dataset come from?

How do I log in?

Where are the files located?

Objectives

Understand the data

Choose login details

Familiarize yourself with the environment

Introduction

We will be making use of the files on the gemini.science.uu.nl server.

Dataset

An introduction to the dataset is given in the presentation. In total we will be analyzing 62 genomes, one of which is a closed reference genome (OXC141).

Follow the login instructions as before. The server we will be using has the host address gemini.science.uu.nl. Please log in over SSH (e.g. PuTTY, MobaXterm or ssh) using your solisid and password (ssh solisid@gemini.science.uu.nl).

Where are the files located

In your home folder (~/), you may find various files. Taking care of your files is your own responsibility. We will create the folders you will be using and download the read files that are part of this study. As assembling all the genomes in this study would be too time-consuming, each person will assemble only two genomes. We will combine everyone's outputs later on. Familiarize yourself a bit with the terminal in CoCalc.

Getting the Illumina read files

First we need to make an appropriate folder for your read files. In the example we will use a folder called “reads”.

$ cd ~
$ mkdir reads
$ ls

You will see that you have created the folder reads. Next we need to get the appropriate files from the server. Go to the website klif.uu.nl/klif/mgen/reads and download the appropriate files. Each sample has two files, a forward and a reverse read file, so for your two samples you will need four files in total. In the example I have picked the top two samples, but please take a look at the Google Sheets table and write your name in the appropriate field to find out which two samples are assigned to you.

$ cd ~/reads
$ wget https://klif.uu.nl/klif/mgen/reads/ERR326690_1.fastq.gz
$ wget https://klif.uu.nl/klif/mgen/reads/ERR326690_2.fastq.gz
$ wget https://klif.uu.nl/klif/mgen/reads/ERR326694_1.fastq.gz
$ wget https://klif.uu.nl/klif/mgen/reads/ERR326694_2.fastq.gz
$ ls
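The four wget commands above all follow one pattern: the base URL, the sample accession, then _1 (forward) or _2 (reverse). As a minimal sketch, assuming a bash shell and that your assigned samples use the same file naming, the URLs can be generated in a loop instead of typed by hand. The function name read_urls is made up for this illustration.

```shell
# Base URL taken from the wget commands above
base_url="https://klif.uu.nl/klif/mgen/reads"

# Print the forward (_1) and reverse (_2) read URLs for each sample accession
# given as an argument. read_urls is a hypothetical helper name.
read_urls() {
    for sample in "$@"; do
        for direction in 1 2; do
            echo "${base_url}/${sample}_${direction}.fastq.gz"
        done
    done
}

# Example accessions from above; replace them with the samples assigned to you
read_urls ERR326690 ERR326694

# To download instead of only printing, pipe the list into wget,
# which reads URLs from standard input when given "-i -":
# read_urls ERR326690 ERR326694 | wget -i -
```

Printing the URLs first lets you check them against the website before anything is downloaded.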

In the above example we have downloaded the read files. Why are there two files per sample? Please continue with the next part of the course, which is a lecture on the sequence quality of read files.
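Before moving on, it can be worth confirming that every forward file has its reverse partner and that no download was cut short. The following is a rough sketch, assuming a bash shell and the _1.fastq.gz / _2.fastq.gz naming used above. The function name check_pairs is hypothetical, and gzip -t only tests that the compressed file is intact, not that the reads themselves are of good quality.

```shell
# Check every forward read file in a folder for its reverse partner and
# test that both gzip archives are intact. check_pairs is a hypothetical name.
check_pairs() {
    dir="$1"
    status=0
    for forward in "$dir"/*_1.fastq.gz; do
        # If the pattern matched nothing, the literal pattern is left over
        [ -e "$forward" ] || { echo "no forward read files in $dir"; return 1; }
        reverse="${forward%_1.fastq.gz}_2.fastq.gz"
        if [ ! -e "$reverse" ]; then
            echo "MISSING  $reverse"
            status=1
        elif gzip -t "$forward" && gzip -t "$reverse"; then
            echo "OK       ${forward%_1.fastq.gz}"
        else
            echo "CORRUPT  ${forward%_1.fastq.gz}"
            status=1
        fi
    done
    return "$status"
}
```

Once the downloads have finished, running check_pairs ~/reads should report one OK line per sample. A MISSING or CORRUPT line suggests that the file needs to be downloaded again.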

Key Points

Sequencing S. pneumoniae patient isolates to determine associations of bacterial genes with disease severity

Microbial Genomics course 2022
