Summary and Schedule
RNA sequencing (RNA-seq) has revolutionized the field of genomics, enabling researchers to gain insights into gene expression, transcriptome dynamics, and molecular pathways. Bioconductor is an open-source software project that provides a rich set of tools for analyzing high-throughput genomic data, including RNA-seq data. This Carpentries-style workshop is designed to equip participants with the essential skills and knowledge needed to analyze RNA-seq data using the Bioconductor ecosystem. Throughout this workshop, you will delve into key concepts, including data preprocessing, quality control, differential gene expression analysis, visualization of results, and gene set analysis.
Prerequisites
- Familiarity with the Linux command-line environment. Preferably, you have completed the link to hpc workshop on Unix goes here workshop.
- Familiarity with R and Bioconductor. Preferably, you have completed the link to hpc workshop on R goes here workshop.
- Familiarity with statistical hypothesis testing, for example at the level of Chapter 6 of the book Modern Statistics for Modern Biology by Holmes and Huber.
- Familiarity with the biology of gene expression and RNA-seq, for example at the level of the review RNA sequencing: the teenage years by Stark et al.
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. Introduction to RNA-seq | What are the different choices to consider when planning an RNA-seq experiment? How does one process the raw FASTQ files to generate a table with read counts per gene and sample? Where does one find information about annotated genes for a given organism? What are the typical steps in an RNA-seq analysis? |
Duration: 01h 40m | 2. Downloading and organizing files | How do we process raw RNA-seq reads to generate count data? What are the key steps involved in quality control, alignment, and quantification of RNA-seq data? How do different tools and parameters affect the outcome of RNA-seq data processing? Why is it important to ensure accurate and consistent read mapping and quantification? |
Duration: 02h 15m | 3. Mapping reads to a reference genome | How can we map raw reads to a reference genome? What programs are available for mapping reads to a reference genome? What options and parameters need to be considered when mapping? How do we access the mapping results? |
Duration: 03h 00m | 4. Quantifying Gene Expression | How do we quantify gene expression from aligned reads? What tools can we use to count reads mapped to genes or other genomic features? What options do we need to consider for accurate read counting? How do we interpret read count outputs for downstream analyses? |
Duration: 03h 35m | 5. RStudio Project and Experimental Data | How do you use an RStudio project to manage your analysis project? What is the most effective way to organize directories for an analysis project? How do you download a dataset from the internet and save it as a file? |
Duration: 04h 05m | 6. Importing and annotating quantified data into R | How can one import quantified gene expression data into an object suitable for downstream statistical analysis in R? What types of gene identifiers are typically used, and how are mappings between them done? |
Duration: 06h 05m | 7. Exploratory analysis and quality control | Why is exploratory analysis an essential part of an RNA-seq analysis? How should one preprocess the raw count matrix for exploratory analysis? Are two dimensions sufficient to represent your data? |
Duration: 09h 05m | 8. Differential expression analysis | What are the steps performed in a typical differential expression analysis? How does one interpret the output of DESeq2? |
Duration: 10h 50m | 9. Extra exploration of design matrices | How can one translate biological questions and comparisons to statistical terms suitable for use with RNA-seq analysis packages? |
Duration: 11h 50m | 10. Gene set enrichment analysis | What is the aim of performing gene set enrichment analysis? What is the method of over-representation analysis? What are the commonly used gene set databases? |
Duration: 13h 35m | 11. Next steps | How can you go further from here? What other types of analyses can be done with RNA-seq data? |
Duration: 13h 55m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Ensure that you have the most recent versions of R and RStudio installed on your computer. For detailed instructions on how to do this, you can refer to the section “If you already have R and RStudio installed” in the Introduction to R episode of the Introduction to data analysis with R and Bioconductor lesson.
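As a quick check of which R version you are running, the following minimal sketch prints it from the R console. The threshold `minimum_r` is a hypothetical value chosen only for illustration; this lesson does not state a specific minimum version.

```r
# Report the version of R that is currently running
r_version <- getRversion()
message("R version: ", as.character(r_version))

# minimum_r is a hypothetical threshold for illustration only;
# the lesson itself only asks for the most recent release
minimum_r <- "4.3.0"
if (r_version < minimum_r) {
  message("Consider updating R before the workshop.")
}
```

In RStudio, the RStudio version itself is shown under Help > About RStudio.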
You will also need to install the following packages, which are used throughout the lesson.
R
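# Install the two helper packages from CRAN first
install.packages(c("BiocManager", "remotes"))

# Install the lesson packages. BiocManager::install() handles Bioconductor and
# CRAN packages, and GitHub repositories (here csoneson/ConfoundingExplorer)
# when the remotes package is available.
BiocManager::install(c("tidyverse", "SummarizedExperiment",
                       "ExploreModelMatrix", "AnnotationDbi", "org.Hs.eg.db",
                       "org.Mm.eg.db", "csoneson/ConfoundingExplorer",
                       "DESeq2", "vsn", "ComplexHeatmap", "hgu95av2.db",
                       "RColorBrewer", "hexbin", "cowplot", "iSEE",
                       "clusterProfiler", "enrichplot", "kableExtra",
                       "msigdbr", "gplots", "ggplot2", "simplifyEnrichment",
                       "apeglm", "microbenchmark", "Biostrings",
                       "SingleCellExperiment"))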
If you are attending a workshop, please complete all of the above before the workshop. Should you need help, an instructor will be available 30 minutes before the workshop commences to assist.
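To confirm that the installation succeeded, a minimal sketch like the following can be run in the R console. It assumes the package names listed above (the GitHub package is checked under its installed name, `ConfoundingExplorer`); the variable names `pkgs`, `installed`, and `missing_pkgs` are illustrative only.

```r
# Names of the packages as they appear once installed
pkgs <- c("BiocManager", "remotes", "tidyverse", "SummarizedExperiment",
          "ExploreModelMatrix", "AnnotationDbi", "org.Hs.eg.db",
          "org.Mm.eg.db", "ConfoundingExplorer", "DESeq2", "vsn",
          "ComplexHeatmap", "hgu95av2.db", "RColorBrewer", "hexbin",
          "cowplot", "iSEE", "clusterProfiler", "enrichplot", "kableExtra",
          "msigdbr", "gplots", "ggplot2", "simplifyEnrichment", "apeglm",
          "microbenchmark", "Biostrings", "SingleCellExperiment")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Collect and report anything that still needs to be installed
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) == 0) {
  message("All lesson packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as missing, rerunning the installation command for just those names is usually enough.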