Next Generation Sequencing - A Step-By-Step Guide to DNA Sequencing.

ClevaLab
4 Dec 2022 07:37

Summary

TLDR: The video script traces genome sequencing from the first human genome, which took 32 years to finish (1990 to 2022), to Next Generation Sequencing (NGS), which can now sequence a person's genome in about a day. NGS relies on the human reference DNA sequence created by the Human Genome Project and sequences billions of DNA strands simultaneously, whereas the Sanger method used previously sequences one strand at a time. The script explains library preparation, sequencing by synthesis on Illumina instruments, and the importance of read depth and coverage in applications ranging from diagnosing diseases to ecological research.

Takeaways

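  • The Human Genome Project set out to sequence all 3.2 billion bases of the human genome. It started in 1990 and had completed 85 percent of the first genome by 2003; the remaining gaps were filled in 2022, 32 years after the start.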
  • Next Generation Sequencing (NGS) can sequence a person's entire genome in just one day, a significant advancement from the 32-year timeline of the Human Genome Project.
  • NGS allows for the simultaneous sequencing of billions of DNA strands, in contrast to Sanger sequencing which sequences only one strand at a time.
  • The success of NGS is built upon the foundation of the human reference DNA sequence established by the Human Genome Project.
  • NGS involves cutting DNA into small pieces and sequencing them, then assembling the sequences based on the reference genome.
  • The process begins with sample collection and purification of DNA or RNA, with RNA needing to be reverse-transcribed into DNA before sequencing.
  • A 'library' of short DNA fragments is prepared from the purified DNA, cut to a specified size and tagged with adapters for sequencing.
  • The sequencing process on Illumina instruments uses 'sequencing by synthesis' on a flow cell, where DNA fragments are amplified to form clusters for detection.
  • Each sequencing cycle involves the addition of fluorescently tagged nucleotides, recording their color to determine the DNA sequence, and repeating until complete.
  • Paired-end sequencing provides two reads from the same DNA fragment, improving alignment confidence and analysis of longer stretches of DNA or RNA.
  • Read depth and coverage are key metrics in sequencing, with different applications requiring different depths for effective analysis.
  • NGS has a wide range of applications, from diagnosing diseases and guiding treatments to various research fields, and can sequence various types of RNA and DNA, including non-coding RNAs and methylation sites.

Q & A

  • What was the duration of the Human Genome Project from its start to the completion of 85 percent of the first genome?

    -The Human Genome Project started in 1990 and took until 2003 to complete 85 percent of the first genome, which is a duration of 13 years.

  • How long did it take to fully sequence the human genome after the initial 85 percent was completed?

    -After completing 85 percent of the first genome in 2003, it took an additional 19 years to fully sequence the human genome by 2022.

  • What is the difference in sequencing time between the methods used during the Human Genome Project and Next Generation Sequencing (NGS)?

    -The Human Genome Project took 32 years to sequence the human genome, whereas with Next Generation Sequencing (NGS), it takes only a day to sequence a person's entire genome.

  • How does Next Generation Sequencing (NGS) differ from Sanger sequencing in terms of the number of DNA strands sequenced at once?

    -With Sanger sequencing, only one DNA strand can be sequenced at a time, whereas NGS allows for the simultaneous sequencing of billions of DNA strands.

  • What is the significance of the human reference DNA sequence in the context of NGS?

    -The human reference DNA sequence, created by the Human Genome Project, is crucial for NGS as it provides a basis for assembling the sequences of the small pieces of DNA that are cut and sequenced.

  • What is the process of creating a library in NGS and why is it necessary?

    -A library in NGS is a collection of short DNA fragments from a long stretch of DNA. It is created by cutting the DNA into short pieces of a specified size, adding adapter sequences to each fragment, and removing any non-bound adapters. This process is necessary to prepare the DNA for sequencing and to include an index for sample identification.

  • What is the purpose of the PCR step in the NGS process and when is it used?

    -The PCR step in the NGS process is used to increase the library amount. It is applied depending on the application when a higher quantity of the library is required for sequencing.

  • Which company's sequencing instruments are predominantly used in NGS, and what method do they employ?

    -Illumina's sequencing instruments are predominantly used in NGS, and they employ a method called sequencing by synthesis.

  • What is the purpose of the clonal amplification step in the NGS sequencing process?

    -The clonal amplification step is necessary to increase the signal of each unique library fragment to a level that is detectable by the sequencing instrument. It is achieved through a PCR process that forms clusters of identical DNA fragments.

  • How does the sequencing by synthesis method work in Illumina's NGS instruments?

    -In the sequencing by synthesis method, fluorescent nucleotides with different color tags and terminators are added to the flow cell along with DNA polymerase. Only one nucleotide is sequenced at a time, and the complementary base binds to the sequence before the camera records the color of each cluster. The process repeats for the number of reads set on the sequencer.

  • What are the two essential metrics in sequencing mentioned in the script, and what do they represent?

    -The two essential metrics in sequencing mentioned in the script are read depth and coverage. Read depth is the number of reads for a nucleotide, and average read depth is the average depth across the region sequenced. Coverage refers to the aim of having no missing areas across the target DNA.

Outlines

00:00

Revolution in Genome Sequencing with NGS

The first paragraph discusses the shift in genome sequencing capabilities from the Human Genome Project to Next Generation Sequencing (NGS). The project started in 1990 and had completed 85 percent of the first genome by 2003; the remaining gaps were filled in 2022, so the first genome took 32 years in total, whereas NGS can now sequence a person's genome in a day. This technology allows for the simultaneous sequencing of billions of DNA strands, contrasting with the Sanger sequencing method used in the past, which sequenced only one strand at a time. The paragraph also explains the process of creating a DNA library for sequencing, including purification, checking for purity, reverse transcription of RNA, cutting DNA into short pieces, and adding adapters. It details the sequencing process using Illumina instruments, which involves denaturing the library into single strands, binding them to oligonucleotides on a flow cell, and amplifying the fragments to form clusters. The sequencing itself uses fluorescent nucleotides that each carry a terminator, with each cycle adding one nucleotide and recording its color, for the number of cycles set on the sequencer. The paragraph concludes with the sequencing of indices for sample identification and, for paired-end sequencing, the reverse strand.

05:03

NGS Applications and Sequencing Metrics

The second paragraph delves into the applications and metrics of NGS. It highlights the ability to sequence up to 384 samples simultaneously using dual indices, and the process of filtering out bad reads, including those with overlaps or low intensity. The paragraph explains demultiplexing, which sorts reads from each sample based on indexes, and mapping these reads to a reference genome. It also discusses the significance of read depth and coverage in sequencing, emphasizing the importance of having no missing areas in the target DNA. The paragraph outlines various applications of NGS, such as diagnosing cancer and rare diseases, guiding cancer treatments, and research in diverse fields like ecology, botany, and medical science. It mentions the sequencing of whole genomes, transcriptomes, exomes, target genes, and various types of RNA, including non-coding RNAs. Additionally, it touches on the sequencing of cell-free DNA, single cells, and epigenetic markers like methylation or protein binding sites.
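
The demultiplexing step described in this paragraph can be sketched in Python. This is a minimal illustration and not the software that runs on a sequencer: the sample sheet, the index sequences, and the `demultiplex` function are hypothetical, and only exact matches of both indices are accepted.

```python
from collections import defaultdict

# Hypothetical sample sheet: (index 1, index 2) -> sample name.
SAMPLE_SHEET = {
    ("ACGTACGT", "TTGCAAGC"): "sample_A",
    ("GGATCCAA", "CATGCATG"): "sample_B",
}


def demultiplex(reads, sample_sheet):
    """Sort reads into per-sample bins using their dual indices.

    `reads` is an iterable of (index1, index2, sequence) tuples.
    Reads whose index pair is not in the sample sheet go to "undetermined".
    """
    bins = defaultdict(list)
    for index1, index2, sequence in reads:
        sample = sample_sheet.get((index1, index2), "undetermined")
        bins[sample].append(sequence)
    return dict(bins)


reads = [
    ("ACGTACGT", "TTGCAAGC", "TTAGGCA"),
    ("GGATCCAA", "CATGCATG", "CCGATAA"),
    ("ACGTACGT", "CATGCATG", "GGGTTTA"),  # index pair not in the sample sheet
]
demuxed = demultiplex(reads, SAMPLE_SHEET)
```

Requiring both indices to match is what lets a unique dual index pair identify a sample; a read with an unexpected pair is set aside rather than assigned to either sample.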

Keywords

Human Genome Project

The Human Genome Project was an international scientific research project aimed at determining the sequence of DNA nucleotides that make up the human genome, as well as identifying and mapping the genes present in the genome. In the video, it is mentioned as the initial effort that took 32 years to sequence the first human genome, highlighting the significant advancements in sequencing technology since its inception.

Next Generation Sequencing (NGS)

Next Generation Sequencing, also known as high-throughput sequencing, is a technology that allows for the rapid sequencing of large numbers of DNA strands simultaneously. The video emphasizes the dramatic increase in sequencing speed due to NGS, which can now sequence a person's entire genome in just a day, as opposed to the 32 years it took for the first genome with the Human Genome Project.

Sanger Sequencing

Sanger Sequencing is a method of DNA sequencing that was used during the Human Genome Project. It is named after its developer, Frederick Sanger, and is based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. The script mentions that Sanger Sequencing was limited to sequencing one DNA strand at a time, which made the process much slower compared to NGS.

Reference Genome

A reference genome is a representation of the complete genetic information of a species, typically based on the DNA sequence of a single individual or a composite of multiple individuals. The video script explains that NGS relies on a reference genome to assemble the sequences of small DNA pieces, which is a direct result of the work done in the Human Genome Project.
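
As a rough illustration of how short reads are placed on a reference, the toy Python sketch below looks up each read by exact string match. This is a simplifying assumption: real aligners tolerate mismatches and use indexed search, and the `map_reads` function and the sequences are hypothetical.

```python
def map_reads(reference, reads):
    """Return {read: [0-based start positions]} for exact matches only."""
    positions = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            # Keep searching after the previous hit to find repeats.
            start = reference.find(read, start + 1)
        positions[read] = hits
    return positions


reference = "ATGCGTACGTTAGC"
reads = ["GCGTA", "ACGTT", "TTAGC", "CCCCC"]
mapped = map_reads(reference, reads)
# "CCCCC" does not occur in the reference, so it stays unmapped.
```

The three mapped reads overlap each other along the reference, which is how the short pieces are pieced back together into one longer sequence.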

DNA Library

In the context of NGS, a DNA library is a collection of short DNA fragments that have been prepared from a larger DNA source. The script describes the process of creating a library, which involves cutting DNA into specified sizes, adding adapters, and removing any non-bound adapters to prepare the DNA for sequencing.

Adapters

Adapters in NGS are short sequences of DNA that are added to the ends of DNA fragments in a library. They contain the information needed for sequencing, such as the sites where the flow cell oligonucleotides and sequencing primers bind, and include an index to identify the sample. The video script mentions that adapters are crucial for the sequencing process and for organizing the data from multiple samples.
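
Library preparation can be pictured with a short Python sketch. Everything here is a simplification for illustration: real fragmentation is random rather than fixed-width, the adapter and index sequences are made up, and the layout (adapter, index, insert, adapter) is a hypothetical arrangement rather than an actual kit design.

```python
def prepare_library(dna, fragment_size, adapter_5p, adapter_3p, index):
    """Cut `dna` into fixed-size pieces and attach adapters plus a sample index."""
    fragments = [
        dna[i:i + fragment_size] for i in range(0, len(dna), fragment_size)
    ]
    # Keep only fragments of the specified size (size selection).
    sized = [f for f in fragments if len(f) == fragment_size]
    return [adapter_5p + index + f + adapter_3p for f in sized]


library = prepare_library(
    dna="ATGCGTACGTTAGCAA",
    fragment_size=5,
    adapter_5p="AAAA",  # hypothetical adapter sequence
    adapter_3p="TTTT",  # hypothetical adapter sequence
    index="CG",         # hypothetical sample index
)
```

Every fragment in the library carries the same index, so reads from this sample can later be told apart from reads of other samples sequenced on the same flow cell.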

Sequencing by Synthesis

Sequencing by synthesis is a method used in NGS where DNA is sequenced by adding labeled nucleotides one at a time and detecting the incorporation of each nucleotide. The video script describes how this method is used in Illumina sequencing instruments, where fluorescent nucleotides are added, and the sequence is read by detecting the color of each cluster.
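
The cycle-by-cycle logic can be sketched as a small Python simulation of a single cluster. The color assigned to each base is a hypothetical choice made for this example, and the template is assumed to be given in the order the polymerase encounters it.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical color assignment, chosen only for this illustration.
COLOR_OF_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_OF_COLOR = {color: base for base, color in COLOR_OF_BASE.items()}


def sequence_by_synthesis(template, cycles):
    """Simulate one cluster: one base is incorporated and imaged per cycle."""
    colors = []
    for cycle in range(min(cycles, len(template))):
        # The terminator ensures exactly one complementary base is added.
        incorporated = COMPLEMENT[template[cycle]]
        # The camera records the color of the cluster.
        colors.append(COLOR_OF_BASE[incorporated])
        # The terminator and tag are then removed before the next cycle.
    read = "".join(BASE_OF_COLOR[color] for color in colors)
    return colors, read


colors, read = sequence_by_synthesis("TACG", cycles=3)
```

The read is built from the recorded colors alone, which mirrors how the instrument infers the sequence from images rather than from the DNA directly. The number of cycles sets the read length.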

Flow Cell

A flow cell is a component of the sequencing instrument used in NGS. It is a glass surface where DNA fragments are immobilized and sequenced. The script explains that oligonucleotides bound to the flow cell's surface interact with the adapters of the DNA library, facilitating the sequencing process.

Paired-End Sequencing

Paired-end sequencing is a technique in NGS where two reads are generated from the ends of the same DNA fragment, providing more information about the region of interest. The video script explains that this method allows for better alignment and increased confidence in the sequencing results, especially for longer stretches of DNA or RNA.
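
A minimal Python sketch of how the two reads relate to one fragment, assuming read 1 comes from the start of the forward strand and read 2 from the start of the reverse strand. The function names and the fragment are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def paired_end_reads(fragment, read_length):
    """Return (read1, read2) taken from opposite ends of the same fragment."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


fragment = "ATGCGTACGTTAGC"
read1, read2 = paired_end_reads(fragment, read_length=4)
```

Because read 2 is the reverse complement of the far end of the fragment, an aligner can expect the two reads to map to opposite strands a fragment length apart, which is the extra information that raises alignment confidence.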

Read Depth

Read depth in sequencing refers to the number of times a particular nucleotide is sequenced or the number of reads covering a specific region of the genome. The video script mentions that an average read depth of 30x is considered good for whole genome sequencing, while a higher read depth like 1500x is suitable for detecting rare mutation events in cancer.
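
To get a feel for these depths, average read depth can be approximated as (number of reads × read length) / size of the region sequenced. The Python sketch below rearranges that to estimate how many reads a target depth requires. The 150-base read length and the 1-megabase cancer panel are assumptions made for the example, and the estimate ignores duplicate, unmapped, and unevenly distributed reads.

```python
def reads_needed(target_depth, region_size_bp, read_length_bp):
    """Estimate the reads required to reach an average depth over a region."""
    total_bases = target_depth * region_size_bp
    return -(-total_bases // read_length_bp)  # ceiling division on integers


# Whole genome: 3.2 billion bases at 30x with assumed 150-base reads.
wgs_reads = reads_needed(30, 3_200_000_000, 150)

# Hypothetical 1-megabase targeted cancer panel at 1500x.
panel_reads = reads_needed(1500, 1_000_000, 150)
```

Under these assumptions a 30x whole genome needs about 640 million reads, while a 1500x panel covering one megabase needs about 10 million, which is why very high depths are practical only for small target regions.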

Coverage

Coverage in sequencing is the extent to which the target DNA has been sequenced, with the goal of having no missing areas. The script explains that coverage is an essential metric to ensure comprehensive sequencing of the target genome, which is crucial for accurate analysis and interpretation of the sequencing data.
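
The difference between depth and coverage can be shown with a small Python sketch that counts reads over each position of a target region. The alignments are invented for illustration, and each one is given as a hypothetical (start, length) pair with 0-based starts.

```python
def depth_per_base(region_length, alignments):
    """Count how many reads cover each position of the target region."""
    depth = [0] * region_length
    for start, length in alignments:
        for pos in range(start, min(start + length, region_length)):
            depth[pos] += 1
    return depth


alignments = [(0, 5), (3, 5), (4, 5)]  # three 5-base reads on a 12-base region
depth = depth_per_base(12, alignments)

average_depth = sum(depth) / len(depth)
# Fraction of positions covered by at least one read.
coverage = sum(1 for d in depth if d > 0) / len(depth)
```

Here the average depth is above 1x, yet the last three positions have no reads at all, so coverage is incomplete. A good average depth alone therefore does not guarantee that there are no missing areas.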

Highlights

The Human Genome Project sequenced all 3.2 billion bases of the human genome, taking 32 years to complete.

Next Generation Sequencing (NGS) can sequence a person's entire genome in just one day, a significant speed increase compared to 32 years.

NGS allows for the simultaneous sequencing of billions of DNA strands, unlike Sanger sequencing which sequences one strand at a time.

The completion of the Human Genome Project provided a reference DNA sequence essential for the functioning of NGS.

NGS involves cutting DNA into small pieces and sequencing them, then assembling the sequences based on the reference genome.

Both DNA and RNA can be sequenced using NGS, with RNA requiring reverse transcription into DNA first.

A DNA library is prepared for sequencing, consisting of short DNA fragments from a long stretch of DNA.

Adapters with sequencing information and sample indexes are added to each DNA fragment in the library.

PCR may be used to increase the library amount depending on the application.

Illumina sequencing instruments use sequencing by synthesis on a flow cell's glass surface.

DNA oligonucleotides matching the library adapters are bound to the flow cell surface for sequencing.

Clonal amplification forms clusters of unique library fragments for increased sequencing signal.

Fluorescent nucleotides with different color tags and terminators are used for stepwise sequencing.

The sequencing process involves cycles of base pairing, fluorescence recording, and terminator removal.

Paired-end sequencing provides two reads from the same fragment, enhancing alignment confidence.

Read depth and coverage are essential metrics in sequencing, with different applications requiring different depths.

NGS has a wide range of applications, from diagnosing cancer to ecological and medical research.

NGS can sequence various types of RNA, including non-coding RNAs like microRNAs and long non-coding RNAs.

Advanced NGS techniques allow for the sequencing of cell-free DNA, single cells, and detection of methylation or protein binding sites.

Transcripts

00:00

ClevaLab. The Human Genome Project uncovered all 3.2 billion bases of the human genome.

00:08

This project started in 1990 and took until 2003 to complete 85 percent of the first genome. But,

00:17

in 2022, the gaps got filled and the sequence became complete. So in total, sequencing the

00:24

human genome took 32 years. Now, with Next Generation Sequencing or NGS, it takes only

00:31

a day to sequence a person's entire genome. One day is a dramatic speed increase compared to 32

00:39

years! The difference is due to the number of DNA strands sequenced at once. Billions of DNA strands

00:46

get sequenced simultaneously using NGS. However, only Sanger sequencing was available for the

00:52

Human Genome Project. With Sanger sequencing, only one strand can get sequenced at a time. However,

00:59

NGS only works because the Human Genome Project created a human reference DNA sequence. The

01:07

basic principle behind NGS is that DNA can be cut into small pieces and sequenced. The sequences of

01:14

these small pieces then get assembled based on the reference genome. NGS can be used to sequence both

01:21

DNA and RNA. First, samples get collected, and the DNA or RNA gets purified. Next, the DNA or RNA

01:30

gets checked to ensure it's pure and undegraded. RNA first needs to be reverse-transcribed into

01:36

DNA before it can get sequenced. A library then gets prepared from the DNA. A library

01:42

is a collection of short DNA fragments from a long stretch of DNA. Libraries get made by cutting the

01:49

DNA into short pieces of a specified size. This cutting gets done by using high-frequency sound

01:56

waves or enzymes. Then sequences of DNA called adapters get added to each end of a DNA fragment.

02:03

These adapters contain the information needed for sequencing. They also include an index to identify

02:10

the sample. Finally, any non-bound adapters get removed, and the library is complete. Depending

02:17

on the application, there can be a PCR step to increase the library amount. A successful library

02:23

will be of the correct size. It will also be of a high enough concentration for sequencing. The

02:29

main sequencing instruments used in NGS are from Illumina. These instruments use a method called

02:35

sequencing by synthesis. The sequencing occurs on a glass surface of a flow cell. Short pieces

02:42

of DNA, called oligonucleotides, are bound to the surface of the flow cell. These oligonucleotides

02:49

match the adapter sequences of the library. First, the library gets denatured to form single DNA

02:56

strands. Then this library gets added to the flow cell and attaches to one of the two oligos. The

03:03

strand that attaches to the oligo is the forward strand. Next, the reverse strand gets made, and

03:09

the forward strand gets washed away. The library is now bound to the flow cell. If sequencing

03:15

started now, the fluorescent signal would be too low for detection. So each unique library fragment

03:21

needs to get amplified to form clusters. This clonal amplification is by a PCR that happens

03:28

at a single temperature. Annealing, extension and melting occur by changing the flow cell solution.

03:35

First, the strands bind to the second oligo on the flow cell to form a bridge. The strands get

03:41

copied. Then these double-stranded fragments get denatured. This copying and denaturing

03:47

repeats over and over. Localized clusters get made, and finally, the reverse strands get cut.

03:53

These strands get washed away, leaving the forward strand ready for sequencing. The

03:59

sequencing primer binds to the forward strands. Next, fluorescent nucleotides G, C, T and A get

04:05

added to the flow cell along with DNA polymerase. Each nucleotide has a different color fluorescent

04:12

tag and a terminator. So only one nucleotide can get sequenced at a time. First, the complementary

04:18

base binds to the sequence. Then the camera reads and records the color of each cluster. Next, a

04:25

new solution flows in and removes the terminators. The nucleotides and DNA polymerase flow in again,

04:31

and another nucleotide gets sequenced. These read cycles continue for the number of reads set on the

04:37

sequencer. Once complete, these read sequences get washed away. Then the first index gets sequenced,

04:44

and washed away. If only a single read is needed, the sequencing ends here. But, for paired-end

04:51

sequencing, the second index is sequenced, as well as the reverse strand of the library. There

04:56

is no primer for the second index read. Instead, a bridge gets created so that the second oligo acts

05:03

as the primer. The second index is then sequenced. These two index reads use unique dual indices.

05:10

These allow the use of up to 384 samples in the same flow cell. Next, the reverse strand gets

05:17

made, and the forward strands are cut and washed away. The reverse strands are then sequenced.

05:23

Once the sequencing is complete, any bad reads get filtered out. These include the clusters that

05:29

overlap, lead or lag with sequencing or are of low intensity. The clusters cannot overlap on a patterned

05:36

flow cell, but there can be more than one library fragment per nanowell. These polyclonal wells will

05:43

also get filtered out. Next, the reads passing the filter get demultiplexed. Demultiplexing uses the

05:50

attached indexes to identify and sort reads from each sample. Finally, the reads get mapped to the

05:57

reference genome. The different reads align to the reference genome, overlapping each other.

06:03

Paired-end sequencing creates two sequencing reads from the same library fragment. During sequence

06:09

alignment, the algorithm knows that these reads belong together. Longer stretches of DNA or RNA

06:15

can get analyzed with greater confidence that the alignment is correct. Read depth is an essential

06:21

metric in sequencing. Read depth is the number of reads for a nucleotide. Average read depth is

06:28

the average depth across the region sequenced. For whole genome sequencing, a 30x average read depth

06:34

is good. A 1500x average read depth is suitable for detecting rare mutation events in cancer.

06:42

Another essential metric is coverage. The aim is to have no missing areas across the target DNA.

06:49

NGS gets used in a wide variety of applications. In diagnosing cancer and rare disease, treatment

06:56

guidance for cancers, and many research areas from ecology to botany to medical science. Both DNA and

07:04

RNA can be sequenced. It could be the whole genome or transcriptome, just the coding regions (called

07:10

exomes) of the DNA, or target genes in the DNA or RNA. All types of RNA can be sequenced including

07:18

non-coding RNAs such as microRNAs and long non-coding RNA. In addition, cell-free DNA,

07:26

single cells, as well as methylation or protein binding sites can also get sequenced.

Related Tags
Genome Sequencing, Human Genome Project, Next Generation Sequencing, NGS Technology, DNA Analysis, RNA Sequencing, Cancer Research, Rare Diseases, Molecular Biology, Genetic Mutations, Biotechnology