How to read and normalize microarray data in R - RMA normalization | Bioinformatics 101

Bioinformagician

23 Feb 2022 · 16:44

Summary

TL;DR: In this educational video, the host walks through reading microarray data and performing RMA normalization on it in R. The tutorial begins with an overview of DNA microarrays, their terminology, and the workflow involved in gene expression analysis. The host then demonstrates how to retrieve data from NCBI GEO, process it, and apply RMA normalization. The video concludes with a guide to mapping probe IDs to gene symbols, giving viewers an end-to-end view of microarray data analysis in R.

Takeaways

😀 The video is a tutorial on reading and performing RMA normalization on microarray data in R.
🔬 The focus is on DNA microarrays, which are used to measure gene expression levels by hybridizing fluorescently labeled target sequences to complementary probes on a solid surface.
📈 Microarray data normalization is crucial to correct for systematic biases and variations in signal intensities across the array.
📚 The video introduces terminologies associated with microarrays, like probes, solid support, and signal quantification methods.
🧬 The workflow for microarray data involves RNA extraction, labeling, hybridization, washing, and signal intensity readout.
💡 Higher signal intensity indicates higher gene expression, as more RNA fragments hybridize with the probes.
🛠️ The video demonstrates the use of three R packages: tidyverse for data manipulation, GEOquery for retrieving data from NCBI GEO, and affy for RMA normalization.
🔍 The tutorial uses a breast cancer dataset (GSE148537) to illustrate the process of fetching, processing, and normalizing microarray data.
📁 The raw data files are large and contain information extracted from the probes, necessitating normalization for meaningful analysis.
🔑 After normalization, the probes need to be mapped to gene symbols to make the expression data more interpretable.
🔗 The video concludes with the merging of normalized expression data with gene symbols and mentions that the script and additional resources will be shared.
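
The fetch, read, and normalize steps listed above can be sketched roughly as follows. This is a minimal sketch, not the script from the video: the function name `fetch_and_normalize`, the `data_dir` argument, and the `<accession>_RAW.tar` archive name are assumptions made for illustration. It also assumes the Bioconductor packages GEOquery, affy, and Biobase are installed and that network access is available.

```r
# Sketch: fetch raw CEL files for a GEO series and RMA-normalize them.
# Hypothetical helper; nothing is downloaded until the function is called.
fetch_and_normalize <- function(accession = "GSE148537", data_dir = "data") {
  # Download the supplementary (raw) files into a folder named after the accession
  GEOquery::getGEOSuppFiles(accession)

  # Assumption: the raw bundle is named <accession>_RAW.tar
  raw_tar <- file.path(accession, paste0(accession, "_RAW.tar"))
  untar(raw_tar, exdir = data_dir)

  # Read every CEL file found in data_dir into a single AffyBatch object
  raw_data <- affy::ReadAffy(celfile.path = data_dir)

  # Run RMA: background correction, quantile normalization, summarization
  normalized_data <- affy::rma(raw_data)

  # Extract the probe set x sample table of log2 expression values
  as.data.frame(Biobase::exprs(normalized_data))
}
```

Calling `fetch_and_normalize()` would download the raw files for the breast cancer dataset and return one row per probe set and one column per sample. The download is large, so it is worth running once and saving the result.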

Q & A

What is the main topic of the video?
-The main topic of the video is how to read microarray data and perform RMA normalization on it in the R programming language.
What is a DNA microarray and how does it work?
-A DNA microarray is a collection of microscopic features that can be probed with target molecules. It works by allowing DNA sequences (probes) immobilized on a solid surface to hybridize with fluorescently labeled target sequences, such as mRNA from gene expression experiments. The signal intensity generated is then quantified to indicate gene expression levels.
What is the purpose of using an Affymetrix GeneChip?
-The Affymetrix GeneChip is used to genotype human samples and create gene expression profiles. It consists of a glass wafer with chips that contain millions of identical probes for hybridization with RNA fragments.
What are the steps in the microarray workflow mentioned in the video?
-The steps include RNA extraction and labeling with fluorescent tags, breaking down the labeled RNA into fragments, hybridization of the RNA with complementary probes on the array, washing off unbound RNA, and finally scanning the array to obtain signal intensities that indicate gene expression levels.
What is RMA normalization and why is it used?
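-RMA (Robust Multi-array Average) is a method that corrects for technical variability in microarray data in three steps: background correction, quantile normalization across arrays, and summarization of probe intensities into one log2 expression value per probe set. It is used to make gene expression measurements comparable across samples.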
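
To give a feel for the normalization idea, here is a toy base R sketch of quantile normalization, one of the steps RMA performs. It is an illustration only and not the implementation used by the 'affy' package: the function name `quantile_normalize` and the toy matrix are made up, and the input is assumed to have no ties or missing values.

```r
# Toy quantile normalization: force every array (column) to share
# the same distribution of values. Illustrative helper, not affy's code.
quantile_normalize <- function(x) {
  # x: numeric matrix, rows = probes, columns = arrays
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each probe within its array
  sorted <- apply(x, 2, sort)                         # each array sorted ascending
  target <- rowMeans(sorted)                          # mean of the k-th smallest values
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Two hypothetical arrays measured on very different intensity scales
toy <- matrix(c(2, 4, 6,
                30, 10, 20),
              nrow = 3,
              dimnames = list(paste0("probe", 1:3), c("A", "B")))

qn <- quantile_normalize(toy)
```

After this step both columns of `qn` contain the same set of values, so differences in overall array brightness no longer dominate the comparison; only the ordering of probes within each array is preserved.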
Which R packages are used in the video to manipulate data and perform RMA normalization?
-The video mentions using three R packages: 'tidyverse' for data manipulation, 'GEOquery' to retrieve data from NCBI GEO, and 'affy', which provides functions for RMA normalization.
What is the accession ID used in the video to fetch supplementary files from NCBI GEO?
-The accession ID used in the video is GSE148537, which is associated with a breast cancer dataset.
What are the types of files downloaded and used for the microarray data analysis in the video?
-The video discusses downloading '.CEL' files, which are raw data files generated by Affymetrix DNA microarray image analysis software and contain information extracted from the probes.
How are the RMA normalized expression data extracted from the normalized data object?
-The RMA normalized expression data is extracted using the 'exprs' function (defined in Biobase, which is loaded along with the 'affy' package), and the result is saved to a variable called 'normalizedExpression'.
What is the final step in the analysis after obtaining the RMA normalized expression data?
-The final step is to map the probe IDs to gene symbols using the feature data (probe annotation) of the series matrix object retrieved with the 'GEOquery' package, which allows for a more informative comparison of gene expression across samples.
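
The mapping step amounts to a join on the probe ID. Below is a small base R sketch with made-up data: the probe IDs, gene symbols, and column names are all hypothetical, and `merge` is used here in place of whatever tidyverse join the video's script uses. In a real analysis the annotation table would come from the feature data of the GEO series matrix rather than being typed in by hand.

```r
# Hypothetical normalized expression table: probe IDs as row names
normalized_expr <- data.frame(
  sample1 = c(7.1, 9.4, 5.2),
  sample2 = c(6.8, 9.9, 5.0),
  row.names = c("probe_a", "probe_b", "probe_c")
)

# Hypothetical annotation table linking probe IDs to gene symbols
feature_data <- data.frame(
  ID = c("probe_a", "probe_b", "probe_c"),
  gene_symbol = c("GENE1", "GENE2", "GENE3"),
  stringsAsFactors = FALSE
)

# Move the probe IDs from row names into a regular column so they can be joined on
expr_with_ids <- data.frame(ID = rownames(normalized_expr),
                            normalized_expr,
                            row.names = NULL)

# Keep only probes present in both tables (an inner join on ID)
annotated <- merge(expr_with_ids, feature_data, by = "ID")
```

Each row of `annotated` now carries the probe ID, the per-sample expression values, and the gene symbol, which is the shape needed for comparing genes across samples.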
How can viewers access the script and additional resources mentioned in the video?
-The script and additional resources, including links to papers and the dataset used, will be available in the video description, with a direct link to the GitHub repository where the script is uploaded.