Epigenetics3: Histone Modification and ChIP-seq

OmicsLogic
1 Feb 2019 18:16

Summary

TLDR: This course delves into epigenetics, focusing on how histone modifications regulate gene expression without altering DNA sequences. It explains the role of histones in chromatin structure and the impact of various post-translational modifications on gene transcription. The script introduces Chromatin Immunoprecipitation (ChIP) followed by sequencing (ChIP-seq) as a tool for studying these modifications genome-wide. It also covers the analytical challenges and methods for processing and interpreting ChIP-seq data, using Angelman syndrome as a case study to illustrate the practical application of these techniques in understanding neurogenetic disorders.

Takeaways

  • 🧬 Epigenetics is the study of gene expression changes not encoded in the DNA sequence, including DNA methylation, histone modification, and non-coding RNA activity.
  • 🌟 Histone modification is a significant part of epigenetic regulation, affecting how DNA is packed around histones and influencing gene transcription.
  • 🔬 Nucleosomes, made up of histones, are the basic repeating units of chromatin, with DNA wrapped around core histones H3, H4, H2A, and H2B.
  • 📐 Histone modifications can alter DNA packing density and include various types such as acetylation, phosphorylation, ubiquitination, and methylation.
  • 🔑 Histone modifications can either activate or repress transcription, with certain modifications linked to active transcription and others to gene silencing.
  • 🔍 Chromatin Immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) is a powerful tool for identifying genome-wide profiles of histone modifications and transcription factor binding sites.
  • 🧲 The specificity of antibodies used in ChIP is crucial for high-quality data, as they bind to specific histone modifications or proteins attached to DNA.
  • 📈 ChIP-seq data analysis involves identifying enriched genome fragments, which can indicate transcription factor binding sites, chromatin remodeling, or gene transcription events.
  • 🛠️ Accurate peak detection in ChIP-seq is challenging and involves comparing signal enrichment against a control to determine the significance of binding sites.
  • 🧬 Histone modifications are associated with specific widths and profiles of peaks in ChIP-seq data, which can be studied in the context of diseases like Angelman syndrome.
  • 📚 The course encourages further exploration of epigenomics data analysis, suggesting deeper dives into projects related to Angelman syndrome and asthma for practical experience.

Q & A

  • What is the main focus of the course 'Epigenetics Three: Histone Modification and Chromatin Immunoprecipitation'?

    -The course focuses on understanding how gene expression is controlled by histones and how histone modifications can be studied using specialized protocols like chromatin immunoprecipitation (ChIP).

  • What are the mechanisms included in epigenetic regulation mentioned in the script?

    -Epigenetic regulation includes DNA methylation, histone modification, the activity of non-coding RNAs such as microRNAs, and the regulatory function of non-coding repeating regions found in the DNA.

  • What is the significance of histones in epigenetic regulation?

    -Histones are proteins around which double-stranded DNA is wrapped. Their modification plays a major role in gene expression by influencing how densely the histones are grouped together or spread apart, affecting the accessibility of the DNA and thus gene transcription.

  • What is a nucleosome, and what is its composition?

    -A nucleosome is the basic repeating unit of chromatin where 146 base pairs of DNA are wrapped around an octamer of core histones, consisting of pairs of H3, H4, H2A, and H2B. The N-terminal tails of these histones protrude out and are subject to various post-translational modifications.

  • What types of post-translational modifications can histones undergo?

    -Histones can undergo modifications such as acetylation, phosphorylation, ubiquitination, and methylation of lysine and arginine.

  • What is the role of histone modifications in gene transcription?

    -Histone modifications can either activate or repress gene transcription. For example, acetylation was the first modification linked with active transcription, while some histone methylation events have been associated with transcription activation or gene silencing.

  • What is the purpose of Chromatin Immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq)?

    -ChIP-seq is a powerful tool used to identify genome-wide profiles of transcription factor binding sites, histone modifications, and nucleosome positioning, providing insights into the regulatory mechanisms of gene expression.

  • How does the specificity of an antibody impact ChIP-seq data quality?

    -The specificity of the antibody, whether it is monoclonal or polyclonal and the organism it is specific for, is crucial in generating high-quality ChIP-seq data, as it ensures that the correct protein-DNA complexes are selected for sequencing.

  • What is the main challenge in analyzing ChIP-seq data?

    -The main challenge in ChIP-seq data analysis is accurately detecting enriched genome fragments, known as peaks, which represent regions where the protein of interest is bound to the DNA. This requires proper peak calling and accounting for sequencing and mapping errors.

  • How does the script relate ChIP-seq analysis to the study of diseases like Angelman syndrome?

    -The script uses Angelman syndrome as an example to illustrate how ChIP-seq analysis can help understand the epigenetic deregulation responsible for the disorder, by studying histone modifications and their impact on gene expression in affected individuals.

  • What are some of the techniques used in the ChIP-seq workflow for peak detection and peak shift?

    -Techniques such as MACS (Model-based Analysis of ChIP-Seq), BinH's peak calling method, and the Penn algorithm are used in the ChIP-seq workflow for accurate peak detection and determining the position of protein-DNA binding sites.

  • How can researchers use ChIP-seq data to study the regulatory mechanisms of specific genes?

    -Researchers can analyze the enrichment of reads in specific regions of the genome, which can correspond to genes of interest. By comparing ChIP-seq profiles of different samples, they can identify differential profiles of histone modifications and understand their regulatory roles in specific conditions or diseases.

Outlines

00:00

🧬 Epigenetics and Histone Modification Overview

This paragraph introduces the concept of epigenetics, focusing on how gene expression is controlled by histones and their modifications. It explains that epigenetics involves mechanisms that alter gene expression without changing the DNA sequence, including DNA methylation, histone modification, and non-coding RNA activity. The paragraph delves into the role of histones as proteins around which DNA is wrapped, forming nucleosomes. It describes how histone modifications, such as acetylation and methylation, can affect the compaction of DNA and, consequently, gene expression. The importance of specific histone modifications in transcription activation or gene silencing is highlighted, emphasizing the complexity of this regulatory machinery.

05:02

🔬 Chromatin Immunoprecipitation (ChIP) and Sequencing

This section discusses the technique of Chromatin Immunoprecipitation (ChIP) followed by high-throughput sequencing, known as ChIP-seq, as a method to study histone modifications genome-wide. It details the ChIP protocol, starting with cross-linking to bind proteins to DNA, followed by DNA fragmentation and the use of antibodies to extract DNA fragments with proteins of interest. The paragraph explains the process of sequencing these fragments and analyzing the data to identify binding sites of the proteins. It also touches on the importance of antibody specificity and the challenges of peak detection and analysis in ChIP-seq data.

10:03

🧠 Angelman Syndrome and ChIP-seq Analysis

The third paragraph focuses on applying ChIP-seq analysis to study Angelman syndrome, a neurogenetic disorder linked to epigenetic deregulation. It describes the process of preparing and analyzing ChIP-seq data, including the use of control samples to identify differential profiles of histone modifications. The paragraph provides a hands-on approach to analyzing ChIP-seq data, mentioning the use of specific software and algorithms for alignment, peak detection, and segmentation. It also discusses the significance of identifying enriched regions and the potential insights these can provide into the epigenetic regulation of diseases like Angelman syndrome.

15:04

📊 Analyzing ChIP-seq Data and Interpreting Results

The final paragraph discusses the analysis of ChIP-seq data, emphasizing the importance of accurate peak detection and the interpretation of results. It describes a simple pipeline for ChIP-seq data analysis, including alignment, segmentation, and the use of various algorithms to identify enriched regions. The paragraph also mentions the use of gene ontology terms for enrichment analysis to understand the biological significance of the identified regions. It concludes with an invitation to explore the data further and provides resources for deeper study, including guides for accessing and analyzing multi-omics data.

Keywords

💡Epigenetics

Epigenetics is the study of changes in gene expression that do not involve alterations to the underlying DNA sequence. It is a key theme of the video, as it sets the stage for understanding how gene expression can be influenced by mechanisms other than genetic mutation. In the script, epigenetic regulation is discussed in the context of histone modification, DNA methylation, and the role of non-coding RNAs.

💡Histone Modification

Histone modification refers to chemical changes to histone proteins around which DNA is wound. These modifications play a crucial role in the regulation of gene expression by affecting how tightly or loosely the DNA is packaged. The script emphasizes the significance of histone modification as a major aspect of epigenetic regulation, with various types of modifications impacting gene transcription differently.

💡Chromatin Immunoprecipitation (ChIP)

ChIP is a technique used to study protein-DNA interactions, such as where a specific protein binds to DNA. It is central to the video's discussion on how histone modifications can be studied. The script describes the ChIP process in detail, including the use of antibodies to precipitate DNA fragments with specific histone modifications.

💡Nucleosome

A nucleosome is the basic unit of chromatin, consisting of a segment of DNA wound around a core of histone proteins. The script explains that the accessibility of DNA to transcription factors is influenced by how the nucleosomes are positioned and modified, which in turn affects gene expression.

💡Post-translational Modifications

Post-translational modifications are changes made to proteins after their translation from mRNA, such as acetylation, phosphorylation, and methylation. In the context of the video, these modifications occur on the N-terminal tails of histones and can influence the compaction state of chromatin and gene expression.
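
Specific marks are usually written in a compact form such as H3K4me3 or H3K27me3: the histone, the modified residue and its position on the tail, then the type of modification. The following Python sketch illustrates that naming convention. The function name `parse_histone_mark` and the small lookup tables are assumptions made for illustration and are not part of the course material.

```python
import re

# Illustrative lookup tables (assumptions for this sketch, not exhaustive).
RESIDUES = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}
MODIFICATIONS = {
    "me": "methylation",
    "ac": "acetylation",
    "ph": "phosphorylation",
    "ub": "ubiquitination",
}

# histone, residue letter, position on the tail, modification, optional count
MARK_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)([KRST])(\d+)(me|ac|ph|ub)([123])?$")


def parse_histone_mark(name: str) -> dict:
    """Split a mark name such as 'H3K4me3' into its components."""
    match = MARK_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, mod, degree = match.groups()
    return {
        "histone": histone,
        "residue": RESIDUES[residue],
        "position": int(position),
        "modification": MODIFICATIONS[mod],
        # Count suffix (the 3 in 'me3'); None when the name has no suffix.
        "degree": int(degree) if degree else None,
    }


mark = parse_histone_mark("H3K27me3")
```

Here `mark` describes a trimethylation of lysine 27 on histone H3, which is how the name is read in the transcript.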

💡Transcription

Transcription is the process by which the genetic information in DNA is copied into RNA, specifically mRNA, which can then be translated into proteins. The video discusses how histone modifications can either activate or repress transcription, thereby controlling gene expression.

💡ChIP-Seq

ChIP-Seq is a method that combines chromatin immunoprecipitation with high-throughput DNA sequencing to identify the genome-wide distribution of proteins that bind to DNA. The script describes the ChIP-Seq process as a powerful tool for studying histone modifications and their impact on gene regulation.
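
With single-end reads sequenced from the 5' ends, the script notes that reads form two peaks, one per strand, with the binding site between them. A minimal Python sketch of that idea is shown below: it searches for the offset that best lines up forward-strand and reverse-strand read starts and places the binding site halfway. The function `estimate_strand_offset` and the toy coordinates are hypothetical; real tools use more elaborate models.

```python
from collections import Counter


def estimate_strand_offset(forward_starts, reverse_starts, max_offset=300):
    """Return the offset d (in bp) that best aligns the two strand pile-ups.

    For each candidate d, count how many reverse-strand 5' positions coincide
    with a forward-strand 5' position moved d bases downstream.
    """
    reverse_counts = Counter(reverse_starts)
    best_offset, best_score = 0, -1
    for d in range(max_offset + 1):
        score = sum(reverse_counts[pos + d] for pos in forward_starts)
        if score > best_score:  # ties keep the smallest offset
            best_offset, best_score = d, score
    return best_offset


# Toy data (made up): forward reads pile up near 1000, reverse reads near 1150.
forward = [995, 1000, 1000, 1005]
reverse = [1145, 1150, 1150, 1155]

offset = estimate_strand_offset(forward, reverse)
shift = offset / 2  # distance from each strand's peak to the binding site
binding_site = sum(forward) / len(forward) + shift
```

Half of the estimated offset corresponds to what the script calls the shift.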

💡Peak Calling

Peak calling in ChIP-Seq analysis refers to the identification of regions in the genome where there is a high concentration of reads, indicating where the protein of interest is bound to the DNA. The script discusses the importance of accurate peak calling for determining the exact positions of protein binding sites.
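
As a rough illustration of assessing enrichment against a control library, the Python sketch below compares read counts per genomic window in a ChIP sample with counts from a control and keeps windows whose ChIP count is unlikely under a Poisson background. The window counts, the `alpha` threshold, and the pseudocount are assumptions for this sketch. Peak callers such as MACS use more refined background models.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the cumulative sum."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_enriched_windows(chip_counts, control_counts, alpha=1e-3, pseudocount=1.0):
    """Return indices of windows whose ChIP count exceeds the control-based expectation."""
    # Scale the control to the ChIP library size (simple depth normalization).
    scale = sum(chip_counts) / max(sum(control_counts), 1)
    enriched = []
    for i, (chip, ctrl) in enumerate(zip(chip_counts, control_counts)):
        expected = (ctrl + pseudocount) * scale
        if poisson_sf(chip, expected) < alpha:
            enriched.append(i)
    return enriched


# Made-up read counts per fixed-size window.
chip = [3, 4, 40, 5, 2, 4]
control = [4, 3, 5, 4, 3, 5]
peaks = call_enriched_windows(chip, control)
```

Only the third window (index 2) stands out against the control in this toy example.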

💡Angelman Syndrome

Angelman syndrome, also known as AS, is a neurogenetic disorder discussed in the script as an example of how epigenetic deregulation can lead to disease. The video uses this syndrome to illustrate the practical application of ChIP-Seq in understanding the epigenetic mechanisms behind specific conditions.

💡Gene Ontology (GO) Enrichment

GO enrichment is a method used to determine which biological processes, molecular functions, or cellular components are overrepresented among a list of genes. In the script, GO enrichment is mentioned as a way to gauge the significance of the regions identified through ChIP-Seq analysis, particularly in the context of Angelman syndrome.
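
Overrepresentation of a GO term is commonly scored with a one-sided hypergeometric test. The Python sketch below shows that calculation under stated assumptions: the function name and all gene counts are hypothetical, and the course may use a different statistic.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn from `population` genes,
    of which `annotated` carry the GO term (one-sided hypergeometric test)."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes in total, 200 annotated with the term,
# 100 genes near enriched regions, 8 of which carry the annotation.
p_value = hypergeom_enrichment_p(20000, 200, 100, 8)
```

With about one annotated gene expected by chance, observing eight gives a small p-value, which is the sense in which a term is called enriched.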

💡KLHL17

KLHL17 is a gene mentioned in the script that has been implicated in neural development and is associated with actin-based neuronal function. The video uses KLHL17 as an example of a gene identified through ChIP-Seq analysis, demonstrating how such analysis can reveal important regulatory genes.

Highlights

Epigenetics involves mechanisms that alter gene expression without changing the DNA sequence, including DNA methylation, histone modification, and non-coding RNA activity.

Histone modification is a key aspect of epigenetic regulation, influencing how DNA is packed around histones and affecting gene transcription.

Nucleosomes, composed of histones, are the basic repeating units of chromatin, with DNA wrapped around them.

Post-translational modifications of histone tails, such as acetylation and methylation, can either activate or repress gene transcription.

Histone modifications can have a combinatorial effect on processes like transcription, DNA repair, and apoptosis.

Chromatin Immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-seq) is a powerful tool for identifying genome-wide profiles of histone modifications.

The specificity of antibodies used in ChIP is crucial for generating high-quality data, with factors like monoclonal vs. polyclonal and organism specificity being important.

ChIP-seq data analysis involves cleaning raw reads, aligning them to the reference genome, and identifying enriched regions indicative of protein binding.

Peak calling in ChIP-seq experiments is essential for accurately identifying binding sites, with algorithms like MACS used for this purpose.

Angelman syndrome, a neurogenetic disorder, is linked to epigenetic deregulation affecting neuron differentiation.

Hands-on analysis of ChIP-seq data can help understand the epigenetic mechanisms behind diseases like Angelman syndrome.

Different histone modifications are associated with specific widths and profiles of peaks in ChIP-seq data, influencing gene expression levels.

Gene ontology terms can be used to assess the significance of enriched regions identified in ChIP-seq analysis.

KLHL17, a gene involved in neural development, was identified through ChIP-seq as having enrichment of reads, indicating its potential role in neuronal function.

ChIP-seq analysis can reveal differential profiles of histone modifications, aiding in understanding condition-specific epigenetic regulation.

The course provides practical guides for further exploration of epigenomics data analysis, including projects on Angelman syndrome and asthma.

Data integration from multiple sources, such as ChIP-seq, methyl-seq, and RNA-seq, is crucial for a comprehensive understanding of epigenetic regulation.
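
The read-cleaning step listed among the highlights includes removing PCR duplicates. A minimal Python sketch of one common approach follows: it treats aligned reads with the same chromosome, 5' position, and strand as duplicates and keeps one of each. The tuple layout and the function name are assumptions for illustration, not the method used on the course platform.

```python
def remove_duplicate_reads(reads):
    """Keep the first read seen for each (chromosome, 5' position, strand)."""
    seen = set()
    unique = []
    for chrom, start, strand in reads:
        key = (chrom, start, strand)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Made-up aligned reads: the second entry duplicates the first.
reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "-"),
    ("chr2", 100, "+"),
]
deduped = remove_duplicate_reads(reads)
```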

Transcripts

play00:00

welcome to epigenetics three histone

play00:03

modification and chromatin

play00:05

immunoprecipitation in this course we

play00:08

will explore how gene expression is

play00:09

controlled by histones and how histone

play00:12

modification can be studied using

play00:13

specialized protocols like chromatin

play00:16

immunoprecipitation we will also discuss

play00:18

analytical approaches to study genome

play00:20

wide histone modification data or

play00:23

ChIP-seq and specific challenges these

play00:25

methods are designed to address before

play00:27

we jump in let's first do a quick

play00:29

overview of epigenetic regulation and

play00:31

discuss the significance of histone

play00:33

modification as you remember from

play00:36

previous courses epigenetics is the

play00:38

study of mechanisms that cause changes

play00:40

in gene expression but that are not

play00:42

encoded in the DNA sequence itself these

play00:45

mechanisms include DNA methylation

play00:47

histone modification activity of

play00:49

non-coding RNAs such as microRNAs and

play00:53

the regulatory function of non-coding

play00:55

repeating regions found in the DNA

play00:58

histone modification is a major aspect

play01:00

of epigenetic regulation histones are

play01:03

proteins that the double-stranded DNA is

play01:06

wrapped around on these images you can

play01:08

see how the DNA string has small bumps

play01:10

or beads on it these visible beads are

play01:13

nucleosomes that are made up of histones

play01:15

when these beads were originally

play01:17

discovered scientists confused them with

play01:19

genes until it was eventually proven that

play01:21

the string-like structure that is wrapped

play01:23

around and not the beads themselves

play01:25

actually contain the genetic code

play01:27

histones can be either grouped together

play01:29

forming lumps of DNA or as we can see in

play01:32

the picture on the right more relaxed

play01:33

and spread out it turns out that the

play01:35

modification of histones plays a major

play01:37

role in how densely the histones are

play01:39

grouped together or spread apart

play01:41

moreover additional modifications of

play01:44

histone tails can have other effects on

play01:46

gene transcription one important aspect

play01:49

of histones is that they can be changed

play01:51

to alter how much packing the DNA is

play01:53

capable of there are several key

play01:55

modifications that take place as a

play01:57

result of changes to groups of atoms

play01:59

that are at the ends of histones these

play02:01

changes can be of several types and will

play02:04

have a positive or negative charge that

play02:06

either attract them closer together or

play02:08

force them apart as a result the DNA

play02:11

region that is wrapped around the

play02:13

histones can be more or less accessible

play02:15

causing variation in gene expression the

play02:18

nucleosome is considered to be the basic

play02:20

repeating unit of chromatin in which 146

play02:24

base pairs of DNA are wrapped around an

play02:26

octamer of core histones consisting of

play02:28

pairs of H3 H4 H2A and H2B N-terminal

play02:35

tails of these histones protrude out of

play02:36

the nucleosome and are subject to a

play02:38

variety of post-translational

play02:39

modifications such as acetylation

play02:42

phosphorylation ubiquitination

play02:44

and lysine and arginine methylation

play02:47

acetylation was the first of these

play02:49

modifications to be linked with active

play02:51

transcription and subsequently

play02:53

phosphorylation of histone h3 was found

play02:56

to cooperate with acetylation in

play02:58

transcriptional activation some histone

play03:01

methylation events have also been

play03:03

associated with transcription activation

play03:04

and others with gene silencing histone

play03:08

modifications can function either

play03:09

individually or combinatorially to

play03:12

govern such processes as transcription

play03:14

replication DNA repair and apoptosis it

play03:19

has been proposed that a methyl group of

play03:20

atoms increases packing and an acetyl

play03:23

group decreases packing also a phosphoryl

play03:27

group can be attached to the histones

play03:28

and causes a decrease in packing these

play03:31

modifications and their impact on

play03:33

transcription and other processes are

play03:35

being actively studied and as our

play03:38

understanding grows researchers

play03:39

appreciate how complex this machinery

play03:41

works in real life as we mentioned

play03:44

histones are organized in octamers

play03:46

forming nucleosomes the four histones

play03:48

are called H2A H2B H3 and H4 DNA is

play03:53

wrapped around this structure of four

play03:55

times with histone H1 holding everything

play03:57

together the most studied modification

play03:59

of histones are methylation and

play04:01

acetylation of the core histone tails as you

play04:04

can see modification to the tails of

play04:07

histones is linked to the length of the

play04:08

tail itself when a specific type of the

play04:11

modification is studied it will be

play04:13

referred to as H3K4me3 or H3K27me3 the

play04:20

name simply refers to the position on

play04:22

histone tail and the type of

play04:24

modification it is

play04:26

here you can see a table of histone

play04:28

modifications and their reported impact

play04:30

on transcription for example you might

play04:33

read about H3K4 methylation three signal

play04:36

this means that the H3 histone tail is being

play04:39

modified by addition of three methyl

play04:41

groups different signals act as different

play04:44

regulatory mechanisms and sometimes work

play04:46

together in the complex patterns the

play04:49

function of specific histone

play04:50

modifications is an active field of

play04:52

research and many of these modifications

play04:54

are not yet known on the right you can

play04:57

see how these modifications appear

play04:59

in ChIP-seq data against the control

play05:01

sample to understand this better let's

play05:04

discuss chromatin immunoprecipitation

play05:06

for studying histone modification

play05:09

chromatin immunoprecipitation followed

play05:12

by high-throughput sequencing or

play05:14

ChIP-seq is a powerful tool to identify

play05:16

genome-wide profiles of transcription

play05:19

factor binding sites histone

play05:21

modifications and nucleosome positioning

play05:23

to understand the way this data is

play05:25

generated we have to look at the process

play05:27

in greater detail and appreciate the

play05:29

biological elements involved in various

play05:31

steps of the ChIP-seq protocol

play05:35

chromatin immunoprecipitation uses

play05:37

antibodies designed to bind to specific

play05:39

proteins of interest or POIs

play05:42

these can be histones or other complexes

play05:44

attached to the DNA first cross-linking

play05:47

ensures the proteins are tightly bound

play05:49

to DNA so that DNA shattering does not

play05:52

remove them

play05:53

DNA is shattered leaving fragments of

play05:55

DNA with or without protein of interest

play05:58

on them special antibodies bind to

play06:01

fragments of DNA with proteins of

play06:03

interest the antibodies are typically

play06:05

attached to magnetic beads so that it

play06:07

can be used to extract fragments of DNA

play06:10

with protein antibody complexes the

play06:13

complexes can then be removed and

play06:15

libraries for sequencing are prepared

play06:18

the sequence reads will contain tags

play06:20

that are used to denote the position

play06:22

next to where the protein was bound to

play06:25

DNA as a result peaks of reads will

play06:28

accumulate next to positions of DNA

play06:31

bound to protein in the analysis

play06:33

these positions are identified peaks

play06:35

quantified and the position of the protein

play06:38

binding site is determined accumulation

play06:40

of reads with tags is analyzed for

play06:43

accurate detection of binding sites to

play06:45

detect accumulation properly a second

play06:48

library called I DNA is prepared as a

play06:50

control abundance of reads vs. control

play06:53

provides assessment of abundance

play06:55

significance then reads aligned to

play06:58

positive and negative strands are

play06:59

analyzed for overlap helping select the

play07:01

exact position of POI binding sites a

play07:04

key step for chromatin

play07:06

immunoprecipitation is the design of an

play07:08

antibody that binds to a specific

play07:10

protein for example you might have a kit

play07:12

for histone H3 or even more specific to

play07:15

H3K27 or H3K4 specificity of the

play07:19

antibody whether it is monoclonal or

play07:21

polyclonal and the organism it is

play07:23

specific for are all key factors in

play07:26

generating high quality data the whole

play07:29

process of data preparation looks like

play07:31

this proteins of interest like histones

play07:33

are bound to DNA DNA is fragmented

play07:36

resulting in DNA fragments that have

play07:38

protein of interest on them and others

play07:40

that do not they might have other

play07:42

proteins on them or nothing at all only

play07:45

DNA with proteins of interest bound to

play07:48

them are

play07:48

eventually selected with specially designed

play07:51

antibodies the DNA is then separated

play07:54

from the protein and the DNA fragments

play07:56

sequenced as a result reads that were in

play07:59

the region bound to the protein of

play08:01

interest can be aligned to the reference

play08:03

genome and regions with abundance of

play08:05

reads are analyzed when we deal with

play08:08

ChIP-seq data we are analyzing sequences

play08:10

of short reads that came from DNA

play08:12

fragments that were attached to protein

play08:14

of interest we selected after we

play08:16

sequence the raw reads they need to be

play08:18

cleaned from adapter sequences and PCR

play08:21

duplicates then the cleaned reads

play08:24

are aligned to the positive and negative

play08:26

strands of the DNA separately this is

play08:29

done because when the DNA is wrapped

play08:30

around the protein of interest and

play08:32

fragmented there will be a slight shift

play08:34

to the left or right left from the

play08:36

positive strand and right from the

play08:38

negative strand that needs to be

play08:40

accounted for then the challenge is to

play08:42

accurately identify Peaks what is

play08:44

commonly referred to as peak calling peak

play08:48

calling is performed on the two strands

play08:49

separately which leads to the next step

play08:52

determining the position of the POI

play08:54

center by shifting the positive and

play08:56

negative strand reads and identifying

play08:58

the intersecting segments finally the

play09:01

intersecting segments are annotated and

play09:03

marked as positions where the DNA was

play09:05

wrapped around a POI the major objective

play09:08

of ChIP-seq analysis is to detect genome

play09:10

fragments that are enriched with

play09:11

upregulated signals these fragments

play09:13

could be transcription factor binding

play09:15

sites chromatin remodeling or gene

play09:17

transcription events the main

play09:19

algorithmic challenge is to accurately

play09:21

detect enriched genome fragments

play09:23

generating a whole genome landscape for

play09:25

the signal to do that let's review

play09:27

several techniques related to peak

play09:29

detection and peak shift that are part

play09:31

of the ChIP-seq workflow probably the

play09:34

most discussed issue in ChIP-seq

play09:35

experiments is the best method to find

play09:37

true peaks in the data and precisely

play09:40

identify a binding site a peak is a site

play09:42

where multiple reads have been mapped to

play09:44

and produce a pile-up to accurately assess

play09:47

the pile-up significance a control library

play09:50

is often used also referred to as I DNA

play09:52

ChIP sequencing is most often performed

play09:54

with single-end reads and ChIP fragments

play09:58

are sequenced from the 5' ends

play10:00

only this creates two distinct

play10:02

peaks one on each strand with the

play10:04

binding site falling in the middle of

play10:06

these Peaks the distance from the middle

play10:09

of the peaks to the binding site is

play10:10

often referred to as the shift as the

play10:13

DNA is wrapped around the histones and

play10:15

can be more or less accessible various

play10:17

portions of the DNA are affected these

play10:19

include the promoter region

play10:21

transcription start sites and the

play10:22

introns and exons of genes other

play10:24

regulatory regions like enhancers and

play10:27

microRNAs are encoded in introns or in

play10:30

proximity to coding regions each one of

play10:32

these elements can have a significant

play10:34

effect on the level of gene expression

play10:35

alternative splicing as well as other

play10:38

downstream regulation of pathways the

play10:41

size of the region and its location are

play10:43

both important

play10:45

researchers are reporting that different

play10:47

histone modifications are associated

play10:49

with specific widths and profiles of

play10:51

Peaks that can be detected in chip seek

play10:53

data often times specific regions like

play10:56

enhancers and transcription factors or

play10:58

micro RNAs are being studied in cancer

play11:01

neurodegenerative diseases and other

play11:03

conditions researchers are studying one

play11:05

or multiple types of modifications and

play11:07

the elements they regulate by comparing

play11:10

ChIP-seq profiles of groups of samples

play11:12

accurate identification of differential

play11:14

profiles of histone modification can

play11:16

lead to the understanding of epigenetic

play11:19

regulation specific to a condition now

play11:21

that we understand the way ChIP-seq

play11:23

data is prepared and analyzed let us try

play11:25

to perform an analysis ourselves in this

play11:28

hands-on section we will build a simple

play11:30

pipeline to analyze the data and discuss

play11:33

methods that were used in pipeline

play11:35

results Before we jump in to building

play11:38

our pipeline let's review the options we

play11:40

have for analysis the two options are

play11:42

for analysis of Peaks in a given sample

play11:45

or for detecting differences between two

play11:47

conditions in our case we will use the I

play11:50

DNA option to identify areas that have

play11:52

histone modifications by comparing a

play11:54

condition versus control the condition

play11:57

we will study is Angelman syndrome an

play11:59

inherited neurodegenerative disease

play12:02

Angelman syndrome also known as AS is

play12:06

a neurogenetic disorder caused by deletion

play12:09

of the maternally inherited UBE3A

play12:12

allele and is characterized by the

play12:14

development delay

play12:16

intellectual disability ataxia seizures

play12:19

and a happy affect it is believed to be

play12:22

caused by epigenetic deregulation of a

play12:24

region that is responsible for proper

play12:26

neuron differentiation we will use a

play12:29

pair of samples from the study to try a

play12:31

hands-on analysis of ChIP-seq data the

play12:34

full article is available here let's

play12:37

navigate to server.t-bio.info and

play12:41

open the areas of analysis here you will

play12:44

see a dedicated section for analysis of

play12:46

ChIP-on-chip and ChIP-seq data when you

play12:49

are in select single-end reads SE and

play12:52

Homo sapiens GRCh38 for reference genome

play12:56

we already have two FASTQ files in

play12:59

SRA format archive of sequence reads

play13:02

uploaded to the server so we don't have

play13:04

to wait for lengthy uploads over various

play13:06

connection speeds. SRR5990885

play13:09

is a healthy

play13:11

control sample, and SRR5990882

play13:14

is a cell line from

play13:16

a patient with Angelman syndrome. We can

play13:19

place the healthy sample in the sample group and

play13:21

the sick one in IDNA. Essentially we will get a

play13:24

contrast between the two, so it does not

play13:26

matter which one goes where. Our first

play13:29

pipeline will be very simple. For now

play13:31

let's skip the pre-processing steps and

play13:34

discuss two central processing steps:

play13:36

alignment and segmentation. Alignment, or

play13:40

mapping as it is also called, can be done

play13:42

with several methods. The main goal is to

play13:44

align reads to the reference genome we

play13:46

selected, Homo sapiens GRCh38. Bowtie2 is

play13:51

a newer version of the Bowtie algorithm

play13:53

that was published in 2012. The second

play13:56

main step in the analysis of mapped reads

play13:58

is accurate detection of enriched

play14:00

regions. For this purpose we will use

play14:02

iSeq, a method that uses a Bayesian

play14:05

hidden Ising model for ChIP-seq data to

play14:08

ignore falsely enriched regions caused

play14:10

by sequencing and/or mapping errors.
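
To make the idea of an "enriched region" concrete, here is a minimal sketch in Python. It is not the Bayesian hidden Ising model mentioned above; it swaps in a much simpler stand-in: count read start positions in fixed-width bins and take a depth-scaled log2 ratio of sample to control. The function names, the 200 bp bin size, the pseudocount and the toy positions are all assumptions made for illustration, not values from the study or the server.

```python
import math
from collections import Counter


def bin_counts(read_starts, bin_size):
    """Count reads per fixed-width bin, keyed by bin index."""
    return Counter(pos // bin_size for pos in read_starts)


def log2_enrichment(sample_starts, control_starts, bin_size=200, pseudocount=1.0):
    """Return {bin_start: log2 ratio of sample to depth-scaled control}."""
    sample = bin_counts(sample_starts, bin_size)
    control = bin_counts(control_starts, bin_size)
    # Scale control counts so both libraries are compared at the same depth.
    scale = len(sample_starts) / max(len(control_starts), 1)
    scores = {}
    for b in sorted(set(sample) | set(control)):
        s = sample.get(b, 0) + pseudocount
        c = control.get(b, 0) * scale + pseudocount
        scores[b * bin_size] = math.log2(s / c)
    return scores


# Hypothetical read start positions on one chromosome (toy data).
sample_reads = [961250, 961300, 961310, 961420, 961500, 961610, 970000, 980000]
control_reads = [961250, 965000, 970000, 975000, 980000, 985000, 990000, 995000]

scores = log2_enrichment(sample_reads, control_reads)
# Bins whose score clears an arbitrary illustrative cutoff of 0.5.
enriched = [start for start, score in scores.items() if score > 0.5]
print(enriched)
```

A plain ratio like this treats every bin independently, so a single stray read can look like signal. Model-based callers exist precisely to borrow information from neighbouring bins and suppress such false enrichment.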

play14:13

Another, more complete pipeline is shown

play14:16

here. The main change is including

play14:19

PCR clean to remove PCR duplicates and

play14:21

using multiple methods for segmentation.

play14:24

These include BinS, a segmentation

play14:26

method developed by the Tauber

play14:28

Bioinformatics

play14:29

Research Center, and MACS2, the popular

play14:31

method for analysis of ChIP-seq data.
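
As a rough illustration of what removing PCR duplicates means, the Python sketch below keeps one read per (chromosome, start, strand) and prefers the read with the highest mapping quality. The transcript does not describe how the server's PCR clean step decides what counts as a duplicate, so this criterion, the `AlignedRead` record and the toy reads are assumptions.

```python
from typing import Iterable, List, NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical minimal record for a mapped single-end read."""
    name: str
    chrom: str
    start: int   # 5' mapped position
    strand: str  # "+" or "-"
    mapq: int    # mapping quality


def remove_duplicates(reads: Iterable[AlignedRead]) -> List[AlignedRead]:
    """Keep one read per (chrom, start, strand), preferring the highest MAPQ."""
    best = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return sorted(best.values(), key=lambda r: (r.chrom, r.start, r.strand))


reads = [
    AlignedRead("r1", "chr1", 961250, "+", 30),
    AlignedRead("r2", "chr1", 961250, "+", 42),  # same position and strand as r1
    AlignedRead("r3", "chr1", 961250, "-", 40),  # opposite strand, so not a duplicate
    AlignedRead("r4", "chr1", 961400, "+", 20),
]

unique_reads = remove_duplicates(reads)
print([r.name for r in unique_reads])
```

Duplicates matter for segmentation because a stack of identical reads produced by amplification inflates the count at one position and can be mistaken for enrichment.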

play14:34

The BinS algorithm is based on a method

play14:36

developed by Brodsky et al. in 2010

play14:39

called the binary search approach to

play14:41

whole-genome data analysis. This

play14:44

divide-and-conquer type algorithm detects a series

play14:46

of non-intersecting fragments of various

play14:48

lengths with locally optimal scores.

play14:51

The original version of MACS was

play14:54

published by Yong Zhang and Tao Liu

play14:56

from the lab of Xiaole Shirley Liu at the

play14:59

Dana-Farber Cancer Institute, Boston. Here

play15:02

we will use version 2, developed and

play15:04

maintained by Tao Liu. As a result of the

play15:08

second pipeline we can use mapping

play15:09

statistics as well as a table for each

play15:11

chromosome.

play15:12

Let's select chromosome 1 to see what

play15:15

the output looks like. In the output

play15:18

table you will see a number of columns

play15:19

corresponding to outputs by each method:

play15:22

BinS, iSeq and MACS2. The most reliable

play15:26

results will be when each method

play15:29

gives an enrichment score significantly

play15:31

greater than zero, for example these

play15:33

positions highlighted between

play15:35

961,200

play15:37

and 961,800 on chromosome 1. Since the

play15:42

output is already annotated, we can see

play15:45

this segment corresponds with

play15:48

ENSG00000187961.

play15:51

This is KLHL17, kelch-like

play15:55

family member 17. Let's look at this gene

play15:58

in greater detail. The protein encoded by

play16:02

this gene is expressed in neurons of

play16:04

most regions of the brain. It contains an

play16:06

N-terminal BTB domain, which mediates

play16:09

dimerization of the protein, and a

play16:12

C-terminal kelch domain, which mediates

play16:14

binding to F-actin. This protein may play

play16:18

a key role in the regulation of

play16:20

actin-based neuronal function. KLHL17 has been

play16:24

reported to be involved in neural

play16:26

development. For example, in their study

play16:28

of infantile spasms (ISS), Paciorkowski

play16:32

et al. link the KLHL17 gene to

play16:36

expansion of epilepsy phenotypes. As you

play16:39

can see in this chart, where enrichment

play16:41

scores from iSeq

play16:43

were plotted, KLHL17 is only one of the

play16:46

genes identified by enrichment of reads.

play16:48

As you remember, these reads mark a

play16:50

protein of interest attached to the DNA.
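
The reliability rule mentioned earlier, that a region is most trustworthy when every method gives it a score above zero, can be sketched as a simple filter over the output table. In this Python sketch the column names, the scores and the zero cutoff are placeholders, not values from the actual server output.

```python
from typing import Dict, List


def consensus_regions(
    rows: List[Dict[str, float]],
    methods: List[str],
    min_score: float = 0.0,
) -> List[Dict[str, float]]:
    """Keep rows where every listed method scores above min_score."""
    return [row for row in rows if all(row[m] > min_score for m in methods)]


# Hypothetical rows from a per-chromosome output table (scores are made up).
rows = [
    {"start": 961200, "end": 961400, "BinS": 2.1, "iSeq": 0.9, "MACS2": 5.4},
    {"start": 961400, "end": 961600, "BinS": 1.7, "iSeq": 0.8, "MACS2": 3.2},
    {"start": 961600, "end": 961800, "BinS": 1.2, "iSeq": 0.7, "MACS2": 2.9},
    {"start": 975000, "end": 975200, "BinS": 0.0, "iSeq": 0.6, "MACS2": 1.1},
    {"start": 982400, "end": 982600, "BinS": 0.4, "iSeq": 0.0, "MACS2": 0.0},
]

hits = consensus_regions(rows, ["BinS", "iSeq", "MACS2"])
print([(r["start"], r["end"]) for r in hits])
```

Requiring agreement between methods trades sensitivity for confidence: regions called by only one algorithm are dropped, which suits a first look at the data.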

play16:52

Reading deeper into the methods section

play16:54

of the paper, we can find what antibody

play16:57

they used for preparing their ChIP-seq

play16:59

data. The protein of interest was

play17:02

H3K4me3. If we plot the scores on the

play17:08

scatterplot with y-axis and the position

play17:10

on the chromosome, we will see that

play17:12

KLHL17 is at the start of the chromosome, one

play17:15

of the first enriched regions identified

play17:17

by all three algorithms. But that is not

play17:19

the only region that we can see highly

play17:21

enriched in this type of methylation of

play17:23

H3 histone. We also see another region

play17:26

around the 150 million position that

play17:29

stands out. One of the ways we can gauge

play17:32

the significance of the whole region is

play17:33

to use DAVID for enrichment with gene

play17:35

ontology terms. Now that you have this

play17:38

data you are welcome to play around with

play17:40

these enriched regions to see what you

play17:41

can discover. Thank you for taking this

play17:44

course, and we hope that you enjoyed

play17:46

learning about epigenomics data analysis.

play17:49

If you want to take things further,

play17:51

we recommend diving deeper into the

play17:53

Angelman project or this other project

play17:55

focused on asthma. To help you get

play17:57

started with these datasets we've

play17:59

prepared guides on accessing all of the

play18:01

data that include ChIP-seq, bisulfite-seq

play18:03

and RNA-seq datasets. In the

play18:06

project you will find data guides and

play18:08

explanations of methods you can rely on when

play18:10

analyzing and integrating this

play18:13

multi-omics data.

Related Tags
Epigenetics, Histone, Gene Expression, ChIP-Seq, Nucleosome, Immunoprecipitation, DNA Methylation, Transcription, Epigenetic Regulation, Chromatin Analysis