Epigenetics 3: Histone Modification and ChIP-seq
Summary
TLDR: This course delves into epigenetics, focusing on how histone modifications regulate gene expression without altering DNA sequences. It explains the role of histones in chromatin structure and the impact of various post-translational modifications on gene transcription. The script introduces Chromatin Immunoprecipitation (ChIP) followed by sequencing (ChIP-seq) as a tool for studying these modifications genome-wide. It also covers the analytical challenges and methods for processing and interpreting ChIP-seq data, using Angelman syndrome, a neurogenetic disorder, as a case study to illustrate the practical application of these techniques.
Takeaways
- 🧬 Epigenetics is the study of gene expression changes not encoded in the DNA sequence, including DNA methylation, histone modification, and non-coding RNA activity.
- 🌟 Histone modification is a significant part of epigenetic regulation, affecting how DNA is packed around histones and influencing gene transcription.
- 🔬 Nucleosomes, made up of histones, are the basic repeating units of chromatin, with DNA wrapped around core histones H3, H4, H2A, and H2B.
- 📐 Histone modifications can alter DNA packing density and include various types such as acetylation, phosphorylation, ubiquitination, and methylation.
- 🔑 Histone modifications can either activate or repress transcription, with certain modifications linked to active transcription and others to gene silencing.
- 🔍 Chromatin Immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) is a powerful tool for identifying genome-wide profiles of histone modifications and transcription factor binding sites.
- 🧲 The specificity of antibodies used in ChIP is crucial for high-quality data, as they bind to specific histone modifications or proteins attached to DNA.
- 📈 ChIP-seq data analysis involves identifying enriched genome fragments, which can indicate transcription factor binding sites, chromatin remodeling, or gene transcription events.
- 🛠️ Accurate peak detection in ChIP-seq is challenging and involves comparing signal enrichment against a control to determine the significance of binding sites.
- 🧬 Histone modifications are associated with specific widths and profiles of peaks in ChIP-seq data, which can be studied in the context of diseases like Angelman syndrome.
- 📚 The course encourages further exploration of epigenomics data analysis, suggesting deeper dives into projects related to Angelman syndrome and asthma for practical experience.
Q & A
What is the main focus of the course 'Epigenetics Three: Histone Modification and Chromatin Immunoprecipitation'?
-The course focuses on understanding how gene expression is controlled by histones and how histone modifications can be studied using specialized protocols like chromatin immunoprecipitation (ChIP).
What are the mechanisms included in epigenetic regulation mentioned in the script?
-Epigenetic regulation includes DNA methylation, histone modification, the activity of non-coding RNAs such as microRNAs, and the regulatory function of non-coding repeating regions found in the DNA.
What is the significance of histones in epigenetic regulation?
-Histones are proteins around which double-stranded DNA is wrapped. Their modification plays a major role in gene expression by influencing how densely the histones are grouped together or spread apart, affecting the accessibility of the DNA and thus gene transcription.
What is a nucleosome, and what is its composition?
-A nucleosome is the basic repeating unit of chromatin where 146 base pairs of DNA are wrapped around an octamer of core histones, consisting of pairs of H3, H4, H2A, and H2B. The N-terminal tails of these histones protrude out and are subject to various post-translational modifications.
What types of post-translational modifications can histones undergo?
-Histones can undergo modifications such as acetylation, phosphorylation, ubiquitination, and methylation of lysine and arginine.
What is the role of histone modifications in gene transcription?
-Histone modifications can either activate or repress gene transcription. For example, acetylation was the first modification linked with active transcription, while some histone methylation events have been associated with transcription activation or gene silencing.
What is the purpose of Chromatin Immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq)?
-ChIP-seq is a powerful tool used to identify genome-wide profiles of transcription factor binding sites, histone modifications, and nucleosome positioning, providing insights into the regulatory mechanisms of gene expression.
How does the specificity of an antibody impact ChIP-seq data quality?
-The specificity of the antibody, whether it is monoclonal or polyclonal and the organism it is specific for, is crucial in generating high-quality ChIP-seq data, as it ensures that the correct protein-DNA complexes are selected for sequencing.
What is the main challenge in analyzing ChIP-seq data?
-The main challenge in ChIP-seq data analysis is accurately detecting enriched genome fragments, known as peaks, which represent regions where the protein of interest is bound to the DNA. This requires proper peak calling and accounting for sequencing and mapping errors.
How does the script relate ChIP-seq analysis to the study of diseases like Angelman syndrome?
-The script uses Angelman syndrome as an example to illustrate how ChIP-seq analysis can help understand the epigenetic deregulation responsible for the disorder, by studying histone modifications and their impact on gene expression in affected individuals.
What are some of the techniques used in the ChIP-seq workflow for peak detection and peak shift?
-Techniques such as MACS (Model-based Analysis of ChIP-Seq), the BinS segmentation method, and iSeq (a Bayesian hidden Ising model) are used in the ChIP-seq workflow for accurate peak detection and for determining the position of protein-DNA binding sites.
How can researchers use ChIP-seq data to study the regulatory mechanisms of specific genes?
-Researchers can analyze the enrichment of reads in specific regions of the genome, which can correspond to genes of interest. By comparing ChIP-seq profiles of different samples, they can identify differential profiles of histone modifications and understand their regulatory roles in specific conditions or diseases.
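The transcript later notes that the most reliable regions are those where every segmentation method reports an enrichment score clearly above zero. A minimal Python sketch of that consensus filter follows. The column names (`BinS`, `iSeq`, `MACS2`), the row layout, the threshold, and all scores are hypothetical and are not the platform's actual output format.

```python
from typing import Dict, List


def consensus_segments(
    rows: List[Dict[str, float]],
    methods: List[str],
    min_score: float = 0.0,
) -> List[Dict[str, float]]:
    """Keep only rows where every listed method scores above min_score."""
    return [row for row in rows if all(row[m] > min_score for m in methods)]


# Hypothetical per-segment scores on one chromosome (illustrative values only).
rows = [
    {"start": 961200, "end": 961800, "BinS": 2.1, "iSeq": 3.4, "MACS2": 5.0},
    {"start": 1200000, "end": 1200400, "BinS": 0.0, "iSeq": 1.2, "MACS2": 0.8},
]

kept = consensus_segments(rows, ["BinS", "iSeq", "MACS2"])
for row in kept:
    print(row["start"], row["end"])
```

Raising `min_score` is one simple way to approximate "significantly greater than zero"; a real analysis would use each tool's own significance measure.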
Outlines
🧬 Epigenetics and Histone Modification Overview
This paragraph introduces the concept of epigenetics, focusing on how gene expression is controlled by histones and their modifications. It explains that epigenetics involves mechanisms that alter gene expression without changing the DNA sequence, including DNA methylation, histone modification, and non-coding RNA activity. The paragraph delves into the role of histones as proteins around which DNA is wrapped, forming nucleosomes. It describes how histone modifications, such as acetylation and methylation, can affect the compaction of DNA and, consequently, gene expression. The importance of specific histone modifications in transcription activation or gene silencing is highlighted, emphasizing the complexity of this regulatory machinery.
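The modifications described here are usually written in a compact notation such as H3K4me3: the histone (H3), the modified residue and its position (lysine 4), and the modification type and count (trimethylation). As a rough illustration, the Python sketch below splits such names into their parts. The `HistoneMark` type, the `parse_mark` helper, and the small set of supported suffixes (`me`, `ac`, `ph`, `ub`) are assumptions made for this example, not a complete nomenclature.

```python
import re
from typing import NamedTuple, Optional


class HistoneMark(NamedTuple):
    histone: str          # e.g. "H3"
    residue: str          # one-letter amino acid code, e.g. "K" for lysine
    position: int         # position of the residue on the histone tail
    modification: str     # "me", "ac", "ph" or "ub" in this sketch
    count: Optional[int]  # 1-3 for methylation states, None if absent


# Histone, residue letter, position, modification, optional count.
_MARK = re.compile(r"^(H(?:1|2A|2B|3|4))([A-Z])(\d+)(me|ac|ph|ub)([123])?$")


def parse_mark(name: str) -> HistoneMark:
    """Split a mark name such as 'H3K4me3' into its components."""
    match = _MARK.match(name)
    if match is None:
        raise ValueError(f"unrecognised histone mark: {name!r}")
    histone, residue, position, modification, count = match.groups()
    return HistoneMark(
        histone, residue, int(position), modification,
        int(count) if count else None,
    )


print(parse_mark("H3K4me3"))
print(parse_mark("H3K27me3"))
```

Keeping the count optional lets the same parser handle names such as `H3K27ac`, where no number follows the modification.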
🔬 Chromatin Immunoprecipitation (ChIP) and Sequencing
This section discusses the technique of Chromatin Immunoprecipitation (ChIP) followed by high-throughput sequencing, known as ChIP-seq, as a method to study histone modifications genome-wide. It details the ChIP protocol, starting with cross-linking to bind proteins to DNA, followed by DNA fragmentation and the use of antibodies to extract DNA fragments with proteins of interest. The paragraph explains the process of sequencing these fragments and analyzing the data to identify binding sites of the proteins. It also touches on the importance of antibody specificity and the challenges of peak detection and analysis in ChIP-seq data.
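Because fragments are sequenced from their 5' ends, reads on the positive strand pile up on one side of a binding site and reads on the negative strand on the other. The distance between the two pileups can be estimated by sliding one strand against the other and counting matches. The Python sketch below shows this idea on toy data; `estimate_peak_distance`, its `max_distance` parameter, and the read positions are hypothetical, and this simple cross-correlation is not the algorithm of any specific tool named in the course.

```python
from collections import Counter
from typing import Iterable


def estimate_peak_distance(
    plus_starts: Iterable[int],
    minus_starts: Iterable[int],
    max_distance: int = 300,
) -> int:
    """Return the offset d (0..max_distance) that best aligns the two strands.

    For each candidate d, count read pairs where a negative-strand 5' end
    lies exactly d bases downstream of a positive-strand 5' end.
    """
    plus_counts = Counter(plus_starts)
    minus_counts = Counter(minus_starts)
    best_distance, best_score = 0, -1
    for d in range(max_distance + 1):
        score = sum(
            n * minus_counts.get(pos + d, 0) for pos, n in plus_counts.items()
        )
        if score > best_score:
            best_distance, best_score = d, score
    return best_distance


# Toy data: every negative-strand read sits 150 bases downstream.
plus = [100, 101, 102, 500, 501]
minus = [p + 150 for p in plus]

distance = estimate_peak_distance(plus, minus)
shift = distance // 2  # each strand is moved half the distance toward the centre
print(distance, shift)
```

Shifting positive-strand reads right and negative-strand reads left by `shift` brings both pileups onto the presumed binding site.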
🧠 Angelman Syndrome and ChIP-seq Analysis
The third paragraph focuses on applying ChIP-seq analysis to study Angelman syndrome, a neurogenetic disorder linked to epigenetic deregulation. It describes the process of preparing and analyzing ChIP-seq data, including the use of control samples to identify differential profiles of histone modifications. The paragraph provides a hands-on approach to analyzing ChIP-seq data, mentioning the use of specific software and algorithms for alignment, peak detection, and segmentation. It also discusses the significance of identifying enriched regions and the potential insights these can provide into the epigenetic regulation of diseases like Angelman syndrome.
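Comparing a sample against a control comes down to asking whether a region holds more reads than the control would predict. The Python sketch below tests fixed windows with a Poisson upper-tail probability. It is a generic illustration under stated assumptions: the window counts, library sizes, `alpha` cutoff, and pseudo-count are hypothetical, and the code does not reproduce the segmentation methods used in the course.

```python
import math
from typing import List, Sequence, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); suitable for moderate lam only."""
    if k <= 0:
        return 1.0
    cdf, term = 0.0, math.exp(-lam)  # term starts at P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)        # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def enriched_windows(
    chip: Sequence[int],
    control: Sequence[int],
    chip_total: int,
    control_total: int,
    alpha: float = 1e-5,
    pseudo: float = 1.0,
) -> List[Tuple[int, float]]:
    """Return (window index, p-value) for windows enriched over the control."""
    scale = chip_total / control_total      # normalise for library size
    hits = []
    for i, (c, b) in enumerate(zip(chip, control)):
        expected = (b + pseudo) * scale     # pseudo-count avoids a zero rate
        p = poisson_sf(c, expected)
        if p < alpha:
            hits.append((i, p))
    return hits


# Hypothetical read counts per window for a sample and its control.
chip_counts = [5, 4, 60, 6]
control_counts = [5, 5, 5, 5]
hits = enriched_windows(chip_counts, control_counts, 1000, 1000)
print([i for i, _ in hits])
```

Scaling by library size matters whenever the two libraries were sequenced to different depths; without it, a deeper sample would look enriched everywhere.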
📊 Analyzing ChIP-seq Data and Interpreting Results
The final paragraph discusses the analysis of ChIP-seq data, emphasizing the importance of accurate peak detection and the interpretation of results. It describes a simple pipeline for ChIP-seq data analysis, including alignment, segmentation, and the use of various algorithms to identify enriched regions. The paragraph also mentions the use of gene ontology terms for enrichment analysis to understand the biological significance of the identified regions. It concludes with an invitation to explore the data further and provides resources for deeper study, including guides for accessing and analyzing multi-omics data.
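Gene ontology enrichment asks whether a term is annotated to more of the genes in the enriched regions than chance would explain. A plain hypergeometric test is one common way to frame that question, sketched below in Python. The function name and the toy numbers are hypothetical, and dedicated tools apply their own statistics and multiple-testing corrections, so this is only an outline of the idea.

```python
from math import comb


def hypergeom_sf(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Toy example: 10 genes in total, 3 carry the term, and all 3 genes
# found in enriched regions carry it.
p_value = hypergeom_sf(k=3, population=10, annotated=3, drawn=3)
print(p_value)
```

A small p-value suggests the term is over-represented among the genes in enriched regions; with many terms tested, the p-values would need correction for multiple testing.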
Keywords
💡Epigenetics
💡Histone Modification
💡Chromatin Immunoprecipitation (ChIP)
💡Nucleosome
💡Post-translational Modifications
💡Transcription
💡ChIP-Seq
💡Peak Calling
💡Angelman Syndrome
💡Gene Ontology (GO) Enrichment
💡KLHL17
Highlights
Epigenetics involves mechanisms that alter gene expression without changing the DNA sequence, including DNA methylation, histone modification, and non-coding RNA activity.
Histone modification is a key aspect of epigenetic regulation, influencing how DNA is packed around histones and affecting gene transcription.
Nucleosomes, composed of histones, are the basic repeating units of chromatin, with DNA wrapped around them.
Post-translational modifications of histone tails, such as acetylation and methylation, can either activate or repress gene transcription.
Histone modifications can have a combinatorial effect on processes like transcription, DNA repair, and apoptosis.
Chromatin Immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-seq) is a powerful tool for identifying genome-wide profiles of histone modifications.
The specificity of antibodies used in ChIP is crucial for generating high-quality data, with factors like monoclonal vs. polyclonal and organism specificity being important.
ChIP-seq data analysis involves cleaning raw reads, aligning them to the reference genome, and identifying enriched regions indicative of protein binding.
Peak calling in ChIP-seq experiments is essential for accurately identifying binding sites, with algorithms like MACS used for this purpose.
Angelman syndrome, a neurogenetic disorder, is linked to epigenetic deregulation affecting neuron differentiation.
Hands-on analysis of ChIP-seq data can help understand the epigenetic mechanisms behind diseases like Angelman syndrome.
Different histone modifications are associated with specific widths and profiles of peaks in ChIP-seq data, influencing gene expression levels.
Gene ontology terms can be used to assess the significance of enriched regions identified in ChIP-seq analysis.
KLHL17, a gene involved in neural development, was identified through ChIP-seq as having enrichment of reads, indicating its potential role in neuronal function.
ChIP-seq analysis can reveal differential profiles of histone modifications, aiding in understanding condition-specific epigenetic regulation.
The course provides practical guides for further exploration of epigenomics data analysis, including projects on Angelman syndrome and asthma.
Data integration from multiple sources, such as ChIP-seq, methyl-seq, and RNA-seq, is crucial for a comprehensive understanding of epigenetic regulation.
Transcripts
welcome to epigenetics three histone
modification and chromatin
immunoprecipitation in this course we
will explore how gene expression is
controlled by histones and how histone
modification can be studied using
specialized protocols like chromatin
immunoprecipitation we will also discuss
analytical approaches to study genome
wide histone modification data or ChIP-seq
and specific challenges these
methods are designed to address before
we jump in let's first do a quick
overview of epigenetic regulation and
discuss the significance of histone
modification as you remember from
previous courses epigenetics is the
study of mechanisms that cause changes
in gene expression but that are not
encoded in the DNA sequence itself these
mechanisms include DNA methylation
histone modification activity of
non-coding RNAs such as microRNAs and
the regulatory function of non-coding
repeating regions found in the DNA
histone modification is a major aspect
of epigenetic regulation histones are
proteins that the double-stranded DNA is
wrapped around on these images you can
see how the DNA string has small bumps
or beads on it these visible beads are
nucleosomes that are made up of histones
when these beads were originally
discovered scientists confused them with
genes until it was eventually proven that
the string-like structure that is wrapped
around and not the beads themselves
actually contains the genetic code
histones can be either grouped together
forming lumps of DNA or as we can see in
the picture on the right more relaxed
and spread out it turns out that the
modification of histones plays a major
role in how densely the histones are
grouped together or spread apart
moreover additional modifications of
histone tails can have other effects on
gene transcription one important aspect
of histones is that they can be changed
to alter how much packing the DNA is
capable of there are several key
modifications that take place as a
result of changes to groups of atoms
that are at the ends of histones these
changes can be of several types and will
have a positive or negative charge that
either attract them closer together or
force them apart as a result the DNA
region that is wrapped around the
histones can be more or less accessible
causing variation in gene expression the
nucleosome is considered to be the basic
repeating unit of chromatin in which 146
base pairs of DNA are wrapped around an
octamer of core histones consisting of
pairs of H3 H4 H2A and H2B N-terminal
tails of these histones protrude out of
the nucleosome and are subject to a
variety of post translational
modifications such as acetylation
phosphorylation ubiquitination
and lysine and arginine methylation
acetylation was the first of these
modifications to be linked with active
transcription and subsequently
phosphorylation of histone H3 was found
to cooperate with acetylation in
transcriptional activation some histone
methylation events have also been
associated with transcription activation
and others with gene silencing histone
modifications can function either
individually or combinatorially to
govern such processes as transcription
replication DNA repair and apoptosis it
has been proposed that a methyl group of
atoms increases packing and an acetyl
group decreases packing also a phosphoryl
group can be attached to the histones
and causes a decrease in packing these
modifications and their impact on
transcription and other processes are
being actively studied and as our
understanding grows researchers
appreciate how complex this machinery
works in real life as we mentioned
histones are organized in octamers
forming nucleosomes the four histones
are called H2A H2B H3 and H4 DNA is
wrapped around with structure of four
times with histone h1 holding everything
together the most studied modification
of histones are methylation and
acetylation of the core histone tails as you
can see modification to the tails of
histones is linked to the length of the
tail itself when a specific type of the
modification is studied it will be
referred to as H3K4me3 or H3K27me3 the
name simply refers to the position on
histone tail and the type of
modification it is
here you can see a table of histone
modifications and their reported impact
on transcription for example you might
read about an H3K4 trimethylation signal
this means that the H3 histone tail is being
modified by the addition of methyl groups
different signals act as different
regulatory mechanisms and sometimes work
together in the complex patterns the
function of specific histone
modifications is an active field of
research and many of these modifications
are not yet known on the right you can
see how these modifications can appear
in ChIP-seq data against the control
sample to understand this better let's
discuss chromatin immunoprecipitation
for studying histone modification
chromatin immunoprecipitation followed
by high-throughput sequencing or ChIP-seq
is a powerful tool to identify
genome-wide profiles of transcription
factor binding sites histone
modifications and nucleosome positioning
to understand the way this data is
generated we have to look at the process
in greater detail and appreciate the
biological elements involved in various
steps of the ChIP-seq protocol
chromatin immunoprecipitation uses
antibodies designed to bind to specific
proteins of interest or POIs
these can be histones or other complexes
attached to the DNA first cross-linking
ensures the proteins are tightly bound
to DNA so that DNA shattering does not
remove them
DNA is shattered leaving fragments of
DNA with or without protein of interest
on them special antibodies bind to
fragments of DNA with proteins of
interest the antibodies are typically
attached to magnetic beads so that it
can be used to extract fragments of DNA
with protein antibody complexes the
complexes can then be removed and
libraries for sequencing are prepared
the sequence reads will contain tags
that are used to denote the position
next to where the protein was bound to
DNA as a result peaks of reads will
accumulate next to positions of DNA
bound to protein in the analysis
these positions are identified peaks
quantified and the position of the protein
binding site is determined accumulation
of reads with tags is analyzed for
accurate detection of binding sites to
detect accumulation properly a second
library called input DNA is prepared as a
control abundance of reads versus control
provides an assessment of abundance
significance then reads aligned to
positive and negative strands are
analyzed for overlap helping select the
exact position of POI binding sites a
key step for chromatin
immunoprecipitation is the design of an
antibody that binds to a specific
protein for example you might have a kit
for histone H3 or even more specific to
H3K27 or H3K4 specificity of the
antibody whether it is monoclonal or
polyclonal and the organism it is
specific for are all key factors in
generating high quality data the whole
process of data preparation looks like
this proteins of interest like histones
are bound to DNA DNA is fragmented
resulting in DNA fragments that have
protein of interest on them and others
that do not they might have other
proteins on them or nothing at all only
DNA with proteins of interest bound to
them are then
selected with specially designed
antibodies the DNA is then separated
from the protein and the DNA fragments
sequenced as a result reads that were in
the region bound to the protein of
interest can be aligned to the reference
genome and regions with abundance of
reads are analyzed when we deal with
ChIP-seq data we are analyzing sequences
of short reads that came from DNA
fragments that were attached to protein
of interest we selected after we
sequence the raw reads they need to be
cleaned from adapter sequences and PCR
duplicates then the cleaned reads
are aligned to the positive and negative
strands of the DNA separately this is
done because when the DNA is wrapped
around the protein of interest and
fragmented there will be a slight shift
to the left or right left from the
positive strand and right from the
negative strand that needs to be
accounted for then the challenge is to
accurately identify Peaks what is
commonly referred to as peak calling peak
calling is performed on the two strands
separately which leads to the next step
determining the position of the POI
center by shifting the positive and
negative strand reads and identifying
the intersecting segments finally the
intersecting segments are annotated and
marked as positions where the DNA was
wrapped around a POI the major objective
of ChIP-seq analysis is to detect genome
fragments that are enriched with
upregulated signals these fragments
could be transcription factor binding
sites chromatin remodeling or gene
transcription events the main
algorithmic challenge is to accurately
detect enriched genome fragments
generating a whole genome landscape for
the signal to do that let's review
several techniques related to peak
detection and peak shift that are part
of the ChIP-seq workflow probably the
most discussed issue in ChIP-seq
experiments is the best method to find
true peaks in the data and precisely
identify a binding site a peak is a site
where multiple reads have been mapped to
and produce a pileup to accurately assess
the pileup significance a control library
is often used also referred to as input DNA
ChIP sequencing is most often performed
with single-end reads and ChIP fragments
are sequenced from the five prime ends
only this creates two distinct
peaks one on each strand with the
binding site falling in the middle of
these Peaks the distance from the middle
of the peaks to the binding site is
often referred to as the shift as the
DNA is wrapped around the histones and
can be more or less accessible various
portions of the DNA are affected these
include the promoter region
transcription start sites and the
introns and exons of genes other
regulatory regions like enhancers and
microRNAs are encoded in introns or in
proximity to coding regions each one of
these elements can have a significant
effect on the level of gene expression
alternative splicing as well as other
downstream regulation of pathways the
size of the region and its location are
both important
researchers are reporting that different
histone modifications are associated
with specific widths and profiles of
peaks that can be detected in ChIP-seq
data often times specific regions like
enhancers and transcription factors or
micro RNAs are being studied in cancer
neurodegenerative diseases and other
conditions researchers are studying one
or multiple types of modifications and
the elements they regulate by comparing
ChIP-seq profiles of groups of samples
accurate identification of differential
profiles of histone modification can
lead to the understanding of epigenetic
regulation specific to a condition now
that we understand the way ChIP-seq
data is prepared and analyzed let us try
to perform an analysis ourselves in this
hands-on section we will build a simple
pipeline to analyze the data and discuss
methods that were used in pipeline
results Before we jump in to building
our pipeline let's review the options we
have for analysis the two options are
for analysis of Peaks in a given sample
or for detecting differences between two
conditions in our case we will use the input
DNA option to identify areas that have
histone modifications by comparing a
condition versus control the condition
we will study is Angelman syndrome an
inherited neurodegenerative disease
Angelman syndrome also known as AS is
a neurogenetic disorder caused by deletion
of the maternally inherited UBE3A
allele and is characterized by
developmental delay
intellectual disability ataxia seizures
and a happy affect it is believed to be
caused by epigenetic deregulation of a
region that is responsible for proper
neuron differentiation we will use a
pair of samples from the study to try a
hands-on analysis of ChIP-seq data the
full article is available here let's
navigate to server.t-bio.info and
open the areas of analysis here you will
see a dedicated section for analysis of
ChIP-on-chip and ChIP-seq data when you
are in select single-end reads SE and
Homo sapiens GRCh38 for reference genome
we already have two FASTQ files in
SRA format archive of sequence reads
uploaded to the server so we don't have
to wait for lengthy uploads over various
connection speeds SRR5990885 is a healthy
control sample and SRR5990882
is a cell line from
a patient with Angelman syndrome we can
place the healthy in the sample group and
the sick in input DNA essentially we will get a
contrast between the two so it does not
matter which one goes where our first
pipeline will be very simple for now
let's skip the pre-processing steps and
discuss two central processing steps
alignment and segmentation alignment or
mapping as it is also called can be done
with several methods the main goal is to
align reads to the reference genome we
selected Homo sapiens GRCh38 Bowtie 2 is
a newer version of the Bowtie algorithm
that was published in 2012 the second
main step in the analysis of mapped reads
is accurate detection of enriched
regions for this purpose we will use
iSeq a method that uses a Bayesian
hidden Ising model for ChIP-seq data to
ignore falsely enriched regions caused
by sequencing and/or mapping errors
another more complete pipeline is shown
here the main change is with including
PCR clean to remove PCR duplicates and
using multiple methods for segmentation
these include BinS a segmentation
method developed by the Tauber Bioinformatics
Research Center and MACS2 the popular
method for analysis of ChIP-seq data
the BinS algorithm is based on a method
developed by Brodsky et al in 2010
called the binary search approach to
whole genome data analysis this divide and
conquer type algorithm detects a series
of non-intersecting fragments of various
lengths with locally optimal scores
the original version of MACS was
published by Yong Zhang and Tao Liu
from the lab of Xiaole Shirley Liu at the
Dana-Farber Cancer Institute Boston here
we will use version two developed and
maintained by Tao Liu as a result of the
second pipeline we can use mapping
statistics as well as a table of each
chromosome
let's select chromosome one to see what
the output looks like in the output
table you will see a number of columns
corresponding to outputs by each method
BinS iSeq and MACS2 the most reliable
results will be when each method
gives an enrichment score significantly
greater than zero for example these
positions highlighted between
961,200
and 961,800 on chromosome one since the
output is already annotated we can see
this segment corresponds with
ENSG00000187961
this is KLHL17 kelch-like
family member 17 let's look at this gene
in greater detail the protein encoded by
this gene is expressed in neurons of
most regions of the brain it contains an
n-terminal BTB domain which mediates
dimerization of the protein and a
C-terminal kelch domain which mediates
binding to f-actin this protein may play
a key role in the regulation of actin
based neuronal function KLHL17 has been
reported to be involved in neural
development for example in their study
of infantile spasms IS Paciorkowski
et al link the KLHL17 gene to
expansion of epilepsy phenotypes as you
can see in this chart where enrichment
scores from iSeq
were plotted KLHL17 is only one of the
genes identified by enrichment of reads
as you remember these reads mark a
protein of interest attached to the DNA
reading deeper into the method section
of the paper we can find what antibody
they used for preparing their ChIP-seq
data the protein of interest was
H3K4me3 if we plot the scores on the
scatterplot with y-axis and the position
on the chromosome we will see that KLHL17
is at the start of the chromosome one
of the first enriched regions identified
by all three algorithms but that is not
the only region that we can see highly
enriched in this type of methylation of
H3 histone we also see another region
around the 150 million position that
stands out one of the ways we can gauge
the significance of the whole region is
to use DAVID for enrichment with gene
ontology terms now that you have this
data you are welcome to play around with
these enriched regions to see what you
can discover thank you for taking this
course and we hope that you enjoyed
learning about epigenomics data analysis
if you want to take things further
we recommend diving deeper into the
Angelman project or this other project
focused on asthma to help you get
started with these datasets we've
prepared guides on accessing all of the
data that include ChIP-seq bisulfite-seq
and RNA-seq datasets in the
project you will find data guides and
explanations of methods you can rely on when
analyzing and integrating this
multi-omics data