FUNCTIONALLY PROFILING METAGENOMES AND... - Eric Franzosa - Late-Breaking Research - ISMB 2016

ISCB
26 Aug 2020 · 16:13

Summary

TLDR: In this talk, Eric Franzosa discusses advancements in profiling microbial communities from shotgun sequencing data, focusing on taxonomic and functional profiling. The tool MetaPhlAn 2 is highlighted for taxonomic profiling, while HUMAnN2 is introduced for functional profiling, using a tiered read mapping strategy to improve accuracy and speed. HUMAnN2 enables species-level functional profiling, offering insights into both the composition and activity of microbial communities, with applications demonstrated on metagenomes from the Human Microbiome Project.

Takeaways

  • 🌟 The script discusses advancements in profiling microbial communities from shotgun sequencing data, focusing on taxonomic and functional profiling.
  • 🔍 Taxonomic profiling identifies species and higher-level clades present in a microbial community, while functional profiling identifies gene families and pathway composition.
  • 🚀 MetaPhlAn 2 is a tool developed for taxonomic profiling that searches reads against a pre-selected set of marker genes unique to each clade or species for efficiency and accuracy.
  • 🛠 HUMAnN2 introduces a 'tiered read mapping' strategy to improve the accuracy, speed, and resolution of functional profiling by leveraging species identification for more targeted searches.
  • 🔑 The first tier of HUMAnN2's method rapidly identifies species in a community by mapping reads against species-specific marker genes.
  • 🌐 The second tier builds a custom database for the sample by concatenating the pan-genomes of detected species, allowing for a detailed search of the remaining reads against these genomes.
  • 📚 HUMAnN2's tiered search is more specific and sensitive, leading to higher accuracy and faster processing times compared to traditional comprehensive searches.
  • 📈 The method was tested using synthetic metagenomes created from sets of bacterial genomes, demonstrating high sensitivity and precision even for low-abundance species.
  • 🌿 HUMAnN2 has been applied to real-world data, profiling hundreds of human metagenomes, where the accelerated pan-genome search tier explains up to about 60% of reads and translated search explains roughly 15% more.
  • 🔬 The tool allows for the identification of metabolic pathways that are signatures for particular body areas, providing insights into the taxonomic resolution of functions within communities.
  • 📊 HUMAnN2 also enables the distinction between functional potential (DNA level) and functional activity (RNA level), showing that the two are not always the same.
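
The last takeaway can be made concrete with a per-pathway RNA/DNA ratio. This is only a minimal sketch, not HUMAnN2's own normalization: the function name `relative_expression`, the pathway labels, and the abundance values are all hypothetical.

```python
def relative_expression(dna, rna):
    """Return the RNA/DNA ratio per pathway.

    Each profile is first normalized to sum to 1 so that sequencing depth
    does not affect the ratio. Pathways with zero DNA abundance are skipped.
    """
    dna_total = sum(dna.values())
    rna_total = sum(rna.values())
    if dna_total == 0 or rna_total == 0:
        raise ValueError("both profiles need non-zero total abundance")
    ratios = {}
    for pathway, dna_abundance in dna.items():
        if dna_abundance == 0:
            continue
        dna_fraction = dna_abundance / dna_total
        rna_fraction = rna.get(pathway, 0.0) / rna_total
        ratios[pathway] = rna_fraction / dna_fraction
    return ratios


# Hypothetical profiles: both pathways are equally encoded (DNA),
# but one is far more heavily transcribed (RNA).
dna_profile = {"pathway_1": 50.0, "pathway_2": 50.0}
rna_profile = {"pathway_1": 90.0, "pathway_2": 10.0}
ratios = relative_expression(dna_profile, rna_profile)
```

A ratio near 1 means a pathway is expressed roughly in proportion to its encoded abundance; values well above or below 1 are where potential and activity diverge.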

Q & A

  • What are the two primary questions in microbial community profiling from sequencing data?

    -The two primary questions are 'who is there', which refers to taxonomic profiling to identify species and higher-level clades, and 'what are those species doing', which is functional profiling to identify gene families and pathway composition.

  • What is the computational challenge in profiling microbial communities from sequencing data?

    -The computational challenge lies in the need to search short reads from shotgun sequencing against a vast database of microbial reference genomes, which is computationally intensive and prone to errors due to potential spurious mappings.

  • What is MetaPhlAn 2 and how does it address the taxonomic profiling challenge?

    -MetaPhlAn 2 is a tool developed for taxonomic profiling that searches reads against a pre-selected set of marker genes unique to each clade or species, rather than the entire database, making the profiling process more efficient and accurate.

  • What is the concept behind the tool HUMAnN2 and its tiered read mapping strategy?

    -HUMAnN2 implements a tiered read mapping strategy, which starts with rapidly identifying species in a community by mapping reads against species-specific marker genes, followed by a custom database search of pan-genomes of detected species, and finally a comprehensive translated search for any remaining reads.

  • How does HUMAnN2 improve the accuracy and speed of functional profiling?

    -HUMAnN2 improves accuracy by ensuring reads are mapped to the correct species-specific genes, reducing spurious mappings. It improves speed by using a reduced database for the initial search, thus processing the metagenome much faster than a comprehensive search against an exhaustive database.

  • What is the advantage of HUMAnN2's tiered search over traditional comprehensive searches?

    -HUMAnN2's tiered search is more specific and sensitive, placing reads in the correct locations, and thus increasing overall accuracy. It also processes the metagenome faster due to the use of a reduced pan-genome database.

  • How does HUMAnN2 provide species-level resolution in functional profiling?

    -HUMAnN2's tiered search allows for the reconstruction of functions on a species-by-species basis within the community, providing insights into which species are performing specific functions, even for low-abundance species.

  • How are synthetic metagenomes or metatranscriptomes used to evaluate HUMAnN2's performance?

    -Synthetic metagenomes or metatranscriptomes are created by taking sets of bacterial genomes and pulling synthetic sequencing reads from them, allowing for the evaluation of HUMAnN2's accuracy and performance against expected profiles.

  • What are the different patterns of species contribution to conserved pathways observed in the human microbiome using HUMAnN2?

    -Different patterns include a complex attribution where multiple species contribute to a conserved pathway in varying mixtures across individuals, a per-person dominant attribution where one species dominates the contribution to the pathway for each individual, and a universal dominant pattern where one species consistently provides the pathway across the population.

  • How does HUMAnN2 distinguish between functional potential and functional activity in a community?

    -HUMAnN2 can analyze both DNA and RNA data, allowing it to distinguish between the functional potential (abundance of encoded pathways) and functional activity (actual expression of those pathways) within a community, showing that the two are not always the same.

  • How can one access and learn more about HUMAnN2 and its related tools?

    -HUMAnN2 can be found by searching its name online, and it is installable from source, via pip as a Python package, or via Homebrew. There is a detailed user manual and an active user group on Google Groups for further support and information.

Outlines

00:00

🌱 Advances in Microbial Community Profiling

Franzosa discusses the lab's work on profiling microbial communities from shotgun sequencing data, focusing on taxonomic and functional profiling. Taxonomic profiling identifies species and clades, while functional profiling examines gene families and pathways within the community. The challenge lies in the computational intensity and potential for error due to the vast size of data and databases. The lab developed MetaPhlAn 2 to address taxonomic profiling by searching reads against a reduced, pre-selected set of marker genes unique to each clade or species, resulting in faster and more accurate taxonomic composition profiling.

05:00

🔍 HUMAnN2: Tiered Read Mapping for Enhanced Functional Profiling

The talk introduces HUMAnN2, a method that improves the speed and accuracy of functional profiling by implementing tiered read mapping. The process begins with a rapid identification of species using a small, specific dataset of marker genes. This is followed by a custom database search using the pan-genomes of detected species, excluding genomes with no evidence of presence. The final tier involves a comprehensive translated search for unmapped reads. HUMAnN2's approach is shown to be more accurate and faster, with the ability to reconstruct functions on a species-by-species basis, providing both community totals and species-specific insights.

10:00

🧬 Synthetic Metagenomes and Real-World Data Analysis

To evaluate HUMAnN2's effectiveness, synthetic metagenomes were created by drawing synthetic sequencing reads from sets of bacterial genomes. The method was tested on a challenging dataset of 20 common gut species with a roughly thousand-fold abundance range and several congeneric species. HUMAnN2 demonstrated higher specificity and sensitivity compared to a traditional comprehensive search, resulting in improved accuracy and about a 7x speedup on this example. The method was also applied to real-world data from the Human Microbiome Project, showing similar performance with faster processing times and the ability to identify signature metabolic pathways conserved in specific body areas.
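
The evaluation described above compares what a method reports against an expected profile. A minimal sketch of that comparison, assuming calls are represented as sets of (species, gene) pairs, is given here; the function name `precision_sensitivity` and the example calls are hypothetical and are not taken from the talk's benchmark.

```python
def precision_sensitivity(expected, observed):
    """Compare sets of (species, gene) calls against a gold standard.

    precision   = true positives / everything that was reported
    sensitivity = true positives / everything that should have been reported
    """
    true_pos = len(expected & observed)
    false_pos = len(observed - expected)
    false_neg = len(expected - observed)
    precision = true_pos / (true_pos + false_pos) if observed else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if expected else 0.0
    return precision, sensitivity


# Hypothetical gold standard and method output.
expected_calls = {("species_A", "gene_1"), ("species_A", "gene_2"), ("species_B", "gene_1")}
observed_calls = {("species_A", "gene_1"), ("species_B", "gene_1"), ("species_B", "gene_3")}
precision, sensitivity = precision_sensitivity(expected_calls, observed_calls)
```

Here the spurious call (`species_B`, `gene_3`) lowers precision, while the missed call (`species_A`, `gene_2`) lowers sensitivity, which mirrors the two error types the talk attributes to spurious mapping.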

15:01

🌐 HUMAnN2's Broader Impact on Microbiome Research

HUMAnN2 has been used to analyze hundreds of human metagenomes, revealing distinct metabolic pathway signatures for different body areas. The method allows for species-level resolution of function, showing different patterns of species contribution to conserved pathways across individuals. It also distinguishes between functional potential and activity within a community. HUMAnN2 is available for use and is part of a suite of tools developed by the Huttenhower lab for microbiome analysis, which will be further discussed in upcoming presentations.
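
The three attribution patterns mentioned in the talk (complex, per-person dominant, universal dominant) can be illustrated with a toy classifier over species-stratified pathway abundances. This is only a sketch under stated assumptions: the 50% dominance threshold, the rule that any sample without a dominant species makes the cohort "complex", and all names and values are hypothetical, not the criteria used in the talk.

```python
def dominant_species(contributions, threshold=0.5):
    """Return the species contributing more than `threshold` of a sample's
    pathway abundance, or None if no single species dominates."""
    total = sum(contributions.values())
    if total == 0:
        return None
    species, amount = max(contributions.items(), key=lambda item: item[1])
    return species if amount / total > threshold else None


def attribution_pattern(samples, threshold=0.5):
    """Classify a cohort of per-sample {species: abundance} dicts."""
    dominants = [dominant_species(sample, threshold) for sample in samples]
    if any(d is None for d in dominants):
        return "complex"
    if len(set(dominants)) == 1:
        return "universal dominant"
    return "per-person dominant"


# Hypothetical cohorts, one dict per person.
mixed = [{"sp_A": 3, "sp_B": 3, "sp_C": 2}, {"sp_B": 2, "sp_D": 2, "sp_E": 2}]
per_person = [{"sp_A": 9, "sp_B": 1}, {"sp_B": 8, "sp_C": 1}]
universal = [{"sp_X": 9, "sp_Y": 1}, {"sp_X": 7, "sp_Z": 2}]
patterns = [attribution_pattern(c) for c in (mixed, per_person, universal)]
```

In all three hypothetical cohorts the community total is similar from person to person; only the way that total is divided among species changes.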

📚 Availability and Further Resources for HUMAnN2

HUMAnN2 can be found with a simple Google search and is installable via pip as a Python package or via Homebrew. The lab provides a detailed user manual and an active user group for support. The presentation also highlights the broader ecosystem of tools for microbiome analysis developed by the Huttenhower lab, with a technology track presentation planned for the following day. The team acknowledges the contributions of collaborators and the Human Microbiome Project for the valuable data used in their research.

Keywords

💡Microbial Communities

Microbial communities refer to diverse groups of microorganisms that live in a specific environment and interact with each other. In the context of the video, the focus is on profiling these communities using shotgun sequencing data to understand their composition and function. The script discusses how to identify the species within these communities and what roles they play, highlighting the importance of taxonomic and functional profiling.

💡Shotgun Sequencing

Shotgun sequencing is a method used in genomics to sequence DNA by breaking it into small, random pieces and then determining the sequence of each piece. The script mentions this technique as the starting point for profiling microbial communities, where short reads are generated and then compared against databases to identify the species and functions present.

💡Taxonomic Profiling

Taxonomic profiling is the process of identifying and categorizing the different species within a microbial community. The script explains that this involves answering the question 'who is there' by identifying the species and higher-level clades present in the community, which is a fundamental step in understanding the community's composition.

💡Functional Profiling

Functional profiling is the identification of the functions performed by the species within a microbial community, such as the gene families and pathway composition. The script emphasizes this aspect as the main focus of the talk, explaining how it differs from taxonomic profiling and is crucial for understanding what the species in a community are doing.

💡MetaPhlAn 2

MetaPhlAn 2 is a tool mentioned in the talk for taxonomic profiling. It is designed to alleviate some of the computational challenges associated with searching reads against a large database by using a pre-selected set of marker genes that are unique for each clade or species. This allows for efficient and accurate profiling of the community's taxonomic composition.
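
The "name tag" idea behind marker genes can be sketched in a few lines. This is a toy illustration, not MetaPhlAn 2's implementation: exact substring matching stands in for real read alignment, and the function name `detect_species`, the `min_hits` parameter, and the sequences are hypothetical.

```python
def detect_species(reads, markers, min_hits=1):
    """Report species whose marker genes recruit at least `min_hits` reads.

    `markers` maps a species name to a list of marker gene sequences that
    are assumed to be unique to that species.
    """
    hits = {species: 0 for species in markers}
    for read in reads:
        for species, marker_seqs in markers.items():
            # Toy stand-in for alignment: the read must occur verbatim.
            if any(read in seq for seq in marker_seqs):
                hits[species] += 1
    return {species for species, count in hits.items() if count >= min_hits}


# Hypothetical markers and reads.
marker_genes = {"yellow": ["ACGTACGTTT"], "green": ["GGGCCCAAAT"]}
sample_reads = ["ACGTAC", "CGTTT", "TTTTTT"]
present = detect_species(sample_reads, marker_genes)
```

Because each marker is specific to one species, a single recruited read is already evidence of presence, and the database stays small because only markers, not whole genomes, are searched.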

💡Tiered Read Mapping

Tiered read mapping is the strategy implemented in HUMAnN2, as discussed in the talk. It involves searching reads through successive tiers of databases, starting with a small, highly specific dataset of species-specific marker genes, followed by a custom database built from the pan-genomes of detected species, and finally a more traditional comprehensive translated search against a protein database. This approach aims to improve the accuracy, speed, and resolution of functional profiling.
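
The three tiers can be sketched as a small pipeline. This is a toy model under explicit assumptions, not HUMAnN2's code: exact substring matching stands in for both nucleotide alignment and translated search, tier 2 is run on all reads for simplicity, and the function names, database layout, and sequences are hypothetical.

```python
def map_read(read, database):
    """Return the first (species, gene) key whose sequence contains the read."""
    for key, sequence in database.items():
        if read in sequence:
            return key
    return None


def tiered_search(reads, markers, pangenomes, comprehensive):
    """Toy three-tier search. Every database maps (species, gene) -> sequence."""
    # Tier 1: taxonomic prescreen against species-specific marker genes.
    detected = set()
    for read in reads:
        hit = map_read(read, markers)
        if hit is not None:
            detected.add(hit[0])

    # Tier 2: custom database holding only pan-genomes of detected species.
    custom_db = {key: seq for key, seq in pangenomes.items() if key[0] in detected}
    assigned, leftover = [], []
    for read in reads:
        hit = map_read(read, custom_db)
        if hit is not None:
            assigned.append((read, hit[0], hit[1]))
        else:
            leftover.append(read)

    # Tier 3: comprehensive fallback; hits carry no species attribution.
    unmapped = []
    for read in leftover:
        hit = map_read(read, comprehensive)
        if hit is not None:
            assigned.append((read, "unclassified", hit[1]))
        else:
            unmapped.append(read)
    return detected, assigned, unmapped


# Hypothetical databases and reads.
markers = {("blue", "m1"): "AAAACCCCGG", ("orange", "m2"): "TTTTGGGGCC", ("green", "m3"): "ACACACACAC"}
pangenomes = {("blue", "geneA"): "AAAACCCCGGTTAACC", ("orange", "geneB"): "TTTTGGGGCCATATAT",
              ("green", "geneC"): "ACACACACACGTGTGT"}
comprehensive = {("unclassified", "famX"): "GTGTGTGTCA"}
reads = ["AAAACC", "GGTTAA", "GGGGCC", "GTGTGT", "NNNNNN"]
detected, assigned, unmapped = tiered_search(reads, markers, pangenomes, comprehensive)
```

In this example the read `GTGTGT` would have matched the green pan-genome, but green is excluded after the prescreen, so the read falls through to the comprehensive tier and is reported as unclassified. Reads that map nowhere are returned separately, matching the talk's description of setting them aside for possible assembly.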

💡Pan-genome

A pan-genome refers to the complete set of genes found within all the isolates of a given species. In the script, the concept is used to describe the custom database created for each sample after the initial taxonomic prescreen step, which includes the genomes or pan-genomes of the species believed to be present in the community.

💡HUMAnN2

HUMAnN2 is a method developed for improving the accuracy and speed of functional profiling in microbial communities. The talk describes it as implementing tiered read mapping and being able to reconstruct functions on a species-by-species basis within the community, providing both community totals of different functions and the ability to attribute these functions to specific species.

💡Synthetic Metagenomes

Synthetic metagenomes are artificial datasets created for testing and validation purposes. In the talk, they are used to evaluate the performance of HUMAnN2 by simulating a complex microbial community with a wide range of species abundances and closely related (congeneric) species, allowing the researchers to assess the accuracy and sensitivity of the method against a known expected profile.
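
A staggered synthetic community of this kind can be sketched as follows. This is an illustrative assumption rather than the procedure used in the talk: abundances are spaced geometrically so the most abundant species is `fold_range` times the least abundant, reads are drawn uniformly along each genome without sequencing errors, genome length is ignored when choosing a source species, and all names and sequences are hypothetical.

```python
import random


def staggered_abundances(n_species, fold_range=1000.0):
    """Geometrically spaced relative abundances summing to 1,
    with max/min equal to `fold_range`."""
    if n_species < 2:
        raise ValueError("need at least two species for a staggered design")
    weights = [fold_range ** (i / (n_species - 1)) for i in range(n_species)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_reads(genomes, abundances, n_reads, read_len, seed=0):
    """Draw error-free reads; returns a list of (source species, read)."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    names = list(genomes)
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=abundances, k=1)[0]
        genome = genomes[name]
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((name, genome[start:start + read_len]))
    return reads


# Twenty species spanning a thousand-fold abundance range, as in the talk.
abundances = staggered_abundances(20)

# Tiny hypothetical genomes to show the read sampler.
toy_genomes = {"species_a": "ACGTACGTAC", "species_b": "GGGGCCCCAA"}
toy_reads = sample_reads(toy_genomes, [0.9, 0.1], n_reads=50, read_len=4)
```

Because each read records its source species, the expected gene and pathway profile can be derived directly from the sampled reads and compared with what a profiling method reports.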

💡Metabolic Pathways

Metabolic pathways are series of linked chemical reactions in a cell that produce a certain product or carry out a specific function. The talk discusses how HUMAnN2 collapses gene family abundances into pathway abundance and coverage within a microbial community, providing insights into the functional capabilities of the community and how different species contribute to these pathways.
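
Collapsing gene families into pathways can be sketched as below. The talk does not spell out HUMAnN2's actual pathway scoring, so this sketch makes simplifying assumptions: pathway abundance is taken as the mean abundance of the member gene families, and coverage as the fraction of member gene families detected. The function name `pathway_profile` and all inputs are hypothetical.

```python
def pathway_profile(gene_abundance, pathways):
    """Collapse species-stratified gene abundances into pathway summaries.

    gene_abundance: {(species, gene_family): abundance}
    pathways:       {pathway_name: [member gene families]}
    """
    species_list = sorted({species for species, _ in gene_abundance})
    profile = {}
    for pathway, members in pathways.items():
        by_species = {}
        for species in species_list:
            values = [gene_abundance.get((species, g), 0.0) for g in members]
            if any(values):
                by_species[species] = sum(values) / len(members)
        # Community-level abundance of each member gene family.
        community = [
            sum(gene_abundance.get((s, g), 0.0) for s in species_list)
            for g in members
        ]
        profile[pathway] = {
            "abundance": sum(community) / len(members),
            "coverage": sum(1 for v in community if v > 0) / len(members),
            "by_species": by_species,
        }
    return profile


# Hypothetical gene family abundances and pathway definitions.
genes = {("sp_A", "g1"): 4.0, ("sp_A", "g2"): 2.0, ("sp_B", "g1"): 2.0}
definitions = {"pathway_P": ["g1", "g2"], "pathway_Q": ["g1", "g3"]}
profile = pathway_profile(genes, definitions)
```

With a mean-based score the per-species contributions add up to the community total, which matches the stratified output layout described in the talk. A coverage below 1, as for `pathway_Q` here, signals that some member gene families were never observed.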

Highlights

Franzosa discusses advancements in profiling microbial communities from shotgun sequencing data.

Two primary questions in microbial community profiling: taxonomic profiling and functional profiling.

Taxonomic profiling identifies species and clades present in a microbial community.

Functional profiling focuses on gene families and pathway composition within the community.

MetaPhlAn 2 is a tool for taxonomic profiling using pre-selected marker genes.

MetaPhlAn 2's efficiency comes from reduced database size and specificity of marker genes.

HUMAnN2 introduces a tiered read mapping strategy for functional profiling.

Tiered read mapping starts with rapid species identification using marker genes.

Second tier involves building a custom database from detected species' pan genomes.

Remaining reads are searched against a comprehensive protein database in the final tier.

HUMAnN2 provides functional profiles stratified by contributing organism and unclassified abundance.

HUMAnN2 can quantify gene families and collapse their abundance into pathways.

Synthetic metagenomes are used to validate HUMAnN2's accuracy and performance.

HUMAnN2 outperforms a traditional comprehensive search in precision, sensitivity, and speed.

HUMAnN2 enables species-level resolution of functional profiles in microbial communities.

The method has been applied to hundreds of human metagenomes from the Human Microbiome Project.

HUMAnN2 can identify metabolic pathway signatures specific to different body areas.

Species-level profiling reveals different patterns of contribution to conserved pathways.

RNA data analysis with HUMAnN2 shows a distinction between functional potential and activity.

HUMAnN2 is available as a Python package and part of a broader suite of microbiome analysis tools.

The Huttenhower lab offers a technology track presentation for an overview of their tools.

Transcripts

play00:00

[I'm here in] Orlando talking about some of the

play00:02

work

play00:03

in our lab profiling microbial

play00:05

communities from shotgun sequencing data

play00:07

and we'll be talking about some recent

play00:08

advancements in that area today

play00:12

so when we're talking about the

play00:14

microbial community there's two primary

play00:16

questions or types of questions we're

play00:17

trying to answer from sequencing data

play00:20

one is this question of who is there

play00:22

which is the issue of taxonomic

play00:24

profiling of identifying the species and

play00:26

higher level clades that are present in

play00:28

that microbial community

play00:30

and the second question which will be

play00:31

the closer focus for today

play00:33

is a question of what those species are

play00:34

doing which is functional profiling

play00:36

identifying the gene families

play00:39

and pathway composition of that

play00:40

community and both of these because

play00:42

they're starting from sequence

play00:44

data are classic bioinformatics problems

play00:46

in sequence search so that's where we

play00:48

begin our story um so to actually do

play00:52

this to actually do either of these

play00:53

types of profiling

play00:54

we're interested in searching short

play00:56

reads a shotgun sequence metagenome or

play00:58

meta transcriptome

play01:00

against a vast database of microbial

play01:02

reference genomes

play01:03

and as you can imagine as the size of

play01:05

these data metagenomes

play01:07

increases as the size of this database

play01:09

increases with new isolate genomes being

play01:11

sequenced every year

play01:12

this is a very computationally intensive

play01:14

problem and there's also a lot of

play01:16

opportunity for error here if we're

play01:18

spuriously mapping

play01:19

reads where they don't belong and so

play01:21

this is really where we find ourselves

play01:22

in trying to solve these profiling

play01:24

questions

play01:26

previously we developed a technique for

play01:28

taxonomic profiling to alleviate some of

play01:31

those issues

play01:31

and more specifically what we're doing

play01:33

there in this tool called MetaPhlAn 2

play01:36

is instead of searching reads against

play01:38

the entire database

play01:39

is to search them against a pre-selected

play01:41

set of marker genes that are unique for

play01:43

each clade or species

play01:45

across the community so for example here

play01:48

we have isolated this gene here that's

play01:51

well conserved within the yellow species

play01:53

we always see it in isolates of that

play01:54

species

play01:56

but it's not seen anywhere else so if

play01:58

that gene recruits a read if a read maps

play02:00

to that gene in the community

play02:01

it's sort of like a little name tag

play02:03

telling us that the yellow species

play02:05

is there and we can use this technique

play02:07

to very efficiently and accurately

play02:09

profile the taxonomic composition of a

play02:11

community this is a reduced database so

play02:13

it gives us a nice speed

play02:14

uh bonus and we've also pre-selected

play02:16

these genes to be very specific so we

play02:18

know when they recruit reads

play02:20

that, um, that they're being assigned

play02:23

at the correct place

play02:24

the issue in functional

play02:26

profiling is we're not interested in

play02:27

just a subset of genes but rather we're

play02:29

interested in all the genes and pathways

play02:30

in the community

play02:32

but in today's talk what i would like to

play02:34

to go over is how we can leverage this

play02:36

idea

play02:36

of being able to rapidly and accurately

play02:38

identify the species in a community

play02:40

in order to improve the accuracy speed

play02:43

and resolution of functional profiling

play02:48

so the method that we developed for that

play02:50

is called HUMAnN2 and it implements a

play02:51

strategy called

play02:52

tiered read mapping and i'll go through

play02:54

what that means now

play02:56

so the idea here is that we're starting

play02:58

from a shotgun sequenced

play02:59

metagenome or meta transcriptome from a

play03:02

microbial community and then we're going

play03:04

to search these reads through a set of

play03:06

tiers of different searches against

play03:08

different databases

play03:10

the first search tier is what i just

play03:11

described in the previous slide

play03:13

we're going to attempt to rapidly

play03:14

identify the species in this community

play03:17

by mapping these reads against the small

play03:19

and highly specific data set

play03:21

of species-specific marker genes and so

play03:23

you can see here in this example

play03:25

that reads are being recruited to this

play03:26

marker gene from the blue species

play03:28

and the orange species here indicating

play03:30

that they are likely to be present in

play03:32

this community

play03:32

but not to this green species indicating

play03:34

that it's likely absent from that

play03:36

community

play03:37

and this is our first search tier the

play03:39

second search tier which is really the

play03:41

meat of this whole process then

play03:43

is to build a custom database for this

play03:45

sample by concatenating the pan genomes

play03:48

of species that we detected in this

play03:49

taxonomic prescreen step

play03:51

so now what we're going to do is do a

play03:53

detailed search of all the remaining

play03:54

reads

play03:55

against the genomes or pan genomes of

play03:58

species we believe to be present in the

play03:59

community

play04:00

we're throwing away this genome here

play04:02

we're not including this genome here

play04:03

because we had no

play04:05

evidence that it was actually present in

play04:06

the community and although i'm just

play04:08

showing one here being excluded

play04:09

in reality this process is excluding

play04:11

many many pan genomes that we won't

play04:13

search through in this second tier of

play04:14

the search

play04:16

in the last tier of the search anything

play04:19

that doesn't map in this process we'll

play04:21

then

play04:21

let flow into a more traditional

play04:23

comprehensive search strategy so we'll

play04:25

try to explain as much as we can

play04:27

as quickly and as specifically as

play04:28

possible and then what's left over we'll

play04:30

take a

play04:31

a traditional approach and just search

play04:32

comprehensively by translated search

play04:35

against a protein database at the very

play04:37

end of this some reads will still map

play04:39

nowhere they don't map to any reference

play04:41

and these are set aside for possible

play04:43

assembly downstream and outside of

play04:45

HUMAnN2

play04:47

so the end result of this tiered search

play04:50

are functional profiles of a metagenome

play04:52

or meta transcriptome looking something

play04:54

like this

play04:54

where we have a particular function in

play04:56

this case a gene family

play04:57

and it's stratified by both contributing

play05:00

organism as well as unclassified

play05:02

abundance that we couldn't assign to a

play05:04

particular species

play05:05

that adds up to a total here this is for

play05:08

gene families

play05:09

once we've actually quantified the gene

play05:10

families in the community

play05:12

for genes like imp dehydrogenase that

play05:14

participate in a metabolic pathway we

play05:16

can collapse the abundance of multiple

play05:18

genes

play05:19

into a smaller subset of pathways which

play05:21

is

play05:23

more tractable to work with downstream

play05:25

and we end up with similar looking data

play05:26

where we get for each pathway

play05:28

an abundance at the community level as

play05:30

well as stratified by the species that

play05:32

contributed to that pathway

play05:33

and a measure of pathway coverage which

play05:35

is a measure of our confidence that the

play05:37

pathway is actually complete within this

play05:39

particular sample

play05:41

so these are what the outputs look like

play05:43

for a typical run of HUMAnN2 on a

play05:45

metagenome or metatranscriptome

play05:48

to evaluate that this actually works and

play05:50

it's behaving the way that we expect it

play05:51

to

play05:52

we were able to create synthetic

play05:54

metagenomes or meta transcriptomes

play05:56

by taking sets of bacterial genomes and

play05:59

pulling synthetic sequencing reads from

play06:01

them

play06:02

and so in the example i'll go over here

play06:03

we have a selection of 20 bacterial

play06:06

species that are the most commonly

play06:07

occurring species in the human gut

play06:09

microbiome

play06:10

and what we've done is to sample reads

play06:12

i think the color there just shifted if

play06:14

there's anything we're able to do

play06:16

with that on the projector if not you

play06:18

can adjust your eyes

play06:20

um so we have sampled these reads in a

play06:22

staggered composition here

play06:24

such that the most abundant species in

play06:27

this

play06:27

synthetic metagenome is about a thousand

play06:30

times more abundant than the least

play06:31

abundant species and so this makes for a

play06:33

challenging

play06:34

problem here in that the species have a

play06:36

really broad range of

play06:38

abundances also you can see

play06:41

challenging us here the fact that we

play06:42

have

play06:42

congeneric species multiple species

play06:44

within the same genus

play06:46

and so there's a lot of homology we

play06:47

expect among these congeneric species

play06:49

which can make

play06:50

mapping reads to specific species

play06:52

within that genus more difficult

play06:55

so once we have this synthetic

play06:56

metagenome we can create an expected

play06:58

profile of what genes and pathways we

play07:00

expect to observe

play07:01

and then analyze this metagenome using

play07:03

different methods and see how well they

play07:04

do

play07:05

when we actually analyze this using a

play07:07

traditional method a traditional

play07:08

comprehensive search

play07:10

we actually see a lot of error due to

play07:12

spurious mapping we have

play07:13

reads that are mapping where they're not

play07:15

supposed to across those broad databases

play07:17

which hurts our precision

play07:18

as well as reads that were supposed to

play07:19

map to a gene and wound up mapping

play07:21

somewhere else which hurts our

play07:22

sensitivity

play07:24

in contrast HUMAnN2's tiered search is

play07:26

both more specific

play07:27

and more sensitive in terms of putting

play07:30

reads in the right place which gives us

play07:32

a nice boost in overall accuracy

play07:36

in addition because the tiered search is

play07:38

trying to explain as much as possible

play07:40

using that reduced pan genome database

play07:42

it tends to process the metagenome a lot

play07:44

faster than the comprehensive search

play07:46

you're spending more time working with a

play07:48

small database than you are the large

play07:49

database with the tiered search

play07:51

and so in this particular synthetic

play07:52

example it was about a 7x speedup

play07:56

but the last thing which is really one

play07:57

of the key advantages of HUMAnN2 here is

play07:59

that in addition to getting

play08:01

us to community totals of different

play08:03

functions which both methods can do

play08:05

HUMAnN2's tiered search is able to

play08:06

reconstruct functions on a species by

play08:08

species basis within the community

play08:11

and what we can see here is our

play08:12

sensitivity for those functions across

play08:13

species is very very high

play08:15

down to about 1x coverage once we get

play08:17

below 1x coverage

play08:19

you're actually not sampling the entire

play08:20

genome anymore and so while the

play08:22

gold standard says you should be able to

play08:23

find everything

play08:24

because reads weren't necessarily

play08:25

sampled from every gene we don't see

play08:27

them

play08:28

however it's critical to note that our

play08:30

precision remains very high even for

play08:32

these low abundant species meaning that

play08:33

their genomes are in this database

play08:35

they're in this custom database

play08:37

but they're not recruiting reads that

play08:38

they're not supposed to so we were very

play08:40

happy with the overall performance of

play08:41

the method here

play08:44

in terms of performance on real world

play08:46

data we've used HUMAnN2 to profile

play08:48

hundreds of human metagenomes from the

play08:49

Human Microbiome Project

play08:51

and there we tend to see similar

play08:52

performance that we're able to move very

play08:55

quickly through these metagenomes

play08:56

in the pan genome search stage about an

play08:58

order to two orders of magnitude faster

play09:00

than in the translated search

play09:03

now that doesn't work that well for us

play09:05

if we don't end up explaining a lot of

play09:07

reads in pan genome search

play09:09

and indeed what we find is that up to

play09:11

about 60 percent of reads in a typical

play09:13

metagenome are explained by the pangenome

play09:15

mapping during that fast step

play09:17

with about 15 percent of additional

play09:19

reads explained when we then

play09:20

take the rest of the reads and push them

play09:22

through HUMAnN2's translated search

play09:24

so we're really explaining the majority

play09:26

of what we can explain during this

play09:28

accelerated

play09:29

initial tier of the search

play09:33

so that's that's some benchmarking and

play09:35

accuracy stats for you

play09:37

moving into some actual science we've

play09:38

then looked at the profiles that HUMAnN2

play09:40

produced

play09:41

from those HMP metagenomes and we've

play09:44

isolated metabolic pathways that we call

play09:46

signatures for particular body areas

play09:49

meaning that they're really well

play09:50

conserved within a particular body area

play09:52

and tend to be absent from other body

play09:54

areas so in this example at the top

play09:56

rhamnose degradation we see that it's

play09:58

quite abundant and conserved in

play10:00

gut metagenomes across individuals and

play10:02

fairly rare at these three other

play10:04

microbiome body sites so this is sort of

play10:07

a grand overview

play10:09

if we zoom in on that top example here

play10:11

to see how this looks sample by sample

play10:13

we can see that indeed HUMAnN2 is

play10:15

showing us a lot of very consistent

play10:16

abundance for this pathway

play10:18

of about six parts per thousand across

play10:21

the gut metagenomes and then it drops

play10:23

off very quickly thereafter that we

play10:24

don't really see much abundance for

play10:25

those pathways at the other

play10:27

sites in addition this is a little

play10:30

tricky to see here but this light gray

play10:32

down here is the

play10:32

unclassified amount of the uh the

play10:35

pathway that's

play10:36

identified during the translated search

play10:38

the rest of this in darker gray

play10:40

outside that box is actually being

play10:42

assigned to a particular species so in this

play10:44

particular example

play10:45

not only are we seeing this pathway

play10:46

consistently across gut metagenomes but

play10:48

we're able to assign it

play10:50

assign the majority of the copies of

play10:51

that pathway to particular species

play10:54

and when we actually take those signature

play10:56

pathways and dig into that species level

play10:58

attribution

play10:59

we see some interesting patterns so for

play11:01

example if we zoom in on what's going on

play11:03

in that little box there for the gut

play11:04

metagenomes and take a look at the

play11:06

species attribution

play11:08

we can see different patterns of how

play11:09

species contribute to a conserved

play11:11

pathway or how a pathway is conserved

play11:13

across different individuals

play11:15

in the case of this gut pathway rhamnose

play11:18

degradation

play11:19

the attribution is actually relatively

play11:21

complex in that although the total is

play11:23

fairly well conserved across individuals

play11:25

we see very different mixtures of

play11:27

species contributing the pathway from

play11:29

one person to another

play11:31

most individuals have a handful of

play11:33

species that are contributing

play11:34

and those aren't necessarily the same

play11:36

from one person to the next

play11:38

suggesting that the overall abundance of

play11:40

this pathway seems to be more conserved

play11:41

than its taxonomic attribution

play11:44

i'll contrast that with another

play11:45

mechanism where we can observe a per

play11:47

person

play11:47

dominant attribution of the pathway so

play11:50

for example this is peptidoglycan

play11:51

biosynthesis from the vaginal microbiome

play11:54

here again we see a relatively constant

play11:56

abundance across individuals

play11:58

but a very different pattern of

play11:59

attribution that each individual is

play12:01

dominated by about

play12:02

one species mostly in the genus

play12:04

Lactobacillus

play12:05

and so while they all wind up with about

play12:07

the same abundance of the pathway it

play12:09

tends to be contributed by just

play12:10

one species per person that may differ

play12:12

between people

play12:15

a last mechanism is a universal dominant

play12:17

pattern of attribution and

play12:19

an example of this is trehalose

play12:20

degradation on the skin

play12:22

this is a pathway that's provided by

play12:24

Propionibacterium acnes it's not really

play12:26

seen

play12:26

at other body sites and on the skin this

play12:29

pathway is fairly consistently

play12:30

and completely provided by just

play12:33

Propionibacterium

play12:34

acnes across the population so unlike

play12:36

the previous two examples

play12:38

where different species could contribute

play12:40

the pathway in different mixtures

play12:41

here on the skin we're really just

play12:43

seeing this one very common skin bug

play12:45

P. acnes contributing this

play12:47

signature pathway for the skin across

play12:49

individuals

play12:50

and so HUMAnN2's tiered search combined

play12:52

with the species level profiling

play12:54

allows us to see this sort of taxonomic

play12:56

resolution to function

play12:57

that we haven't been able to do in

play12:59

previous approaches
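
The three attribution patterns just described (complex, per-person dominant, universal dominant) can be told apart from a species-stratified table for a single pathway. The sketch below is a hypothetical classifier written for illustration; the function name, the thresholds, and the toy numbers are assumptions and are not taken from HUMAnN2 or from the data shown in the talk.

```python
from collections import Counter

def attribution_pattern(stratified, dominance=0.7, majority=0.5, consensus=0.9):
    """Classify how species contribute one pathway across people.

    stratified: {sample: {species: pathway abundance attributed to it}}
    dominance:  share of a sample's pathway that one species must supply
                to count as dominating that sample (assumed threshold)
    majority:   fraction of samples that must be dominated before the
                pattern is called "dominant" at all (assumed threshold)
    consensus:  fraction of samples that must share the same top species
                to call the pattern universal (assumed threshold)
    """
    top_species = []
    dominated = 0
    for contributions in stratified.values():
        total = sum(contributions.values())
        if total == 0:
            continue  # pathway not attributed to any species in this sample
        species, amount = max(contributions.items(), key=lambda kv: kv[1])
        top_species.append(species)
        if amount / total >= dominance:
            dominated += 1

    n = len(top_species)
    if n == 0:
        return "undetected"
    if dominated / n < majority:
        return "complex"
    most_common_count = Counter(top_species).most_common(1)[0][1]
    if most_common_count / n >= consensus:
        return "universal dominant"
    return "per-person dominant"

# Toy tables with placeholder species names.
gut_like = {
    "p1": {"sp_A": 0.3, "sp_B": 0.3, "sp_C": 0.4},
    "p2": {"sp_B": 0.5, "sp_D": 0.5},
    "p3": {"sp_A": 0.2, "sp_C": 0.4, "sp_E": 0.4},
}
vaginal_like = {
    "p1": {"sp_L1": 0.95, "sp_X": 0.05},
    "p2": {"sp_L2": 0.90, "sp_X": 0.10},
    "p3": {"sp_L1": 0.92, "sp_L2": 0.08},
}
skin_like = {
    "p1": {"sp_P": 0.98, "sp_X": 0.02},
    "p2": {"sp_P": 0.98, "sp_X": 0.02},
    "p3": {"sp_P": 0.98, "sp_X": 0.02},
}
for name, table in [("gut", gut_like), ("vaginal", vaginal_like), ("skin", skin_like)]:
    print(name, attribution_pattern(table))
```

The point of the sketch is that total pathway abundance never enters the classification: all three toy tables sum to the same total per person, and only the split across species differs.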

play13:01

as a final biological example we'll move

play13:03

outside of the human microbiome project

play13:04

to another cohort that

play13:06

two of my colleagues Jason and Galeb

play13:08

work on

play13:10

this is a cohort of health professionals

play13:11

within the boston area where we have

play13:13

both

play13:14

metagenomes and metatranscriptomes of

play13:17

the human gut

play13:18

and Jason and Galeb have profiled these

play13:20

samples using HUMAnN2

play13:22

and when we look at the DNA-level

play13:24

contributions for pathways in this

play13:26

uh

play13:26

group we see a lot of things that are

play13:28

similar to that first mechanism i showed

play13:30

a

play13:30

complex attribution where a particular

play13:33

pathway is relatively conserved in the

play13:35

gut but can be contributed by multiple

play13:37

organisms per person and those

play13:39

potentially differ between people

play13:41

so this is looking at the DNA level

play13:43

we see that this is also a fairly

play13:45

conserved pattern between individuals of

play13:47

this mixture of bugs

play13:49

when we look at the RNA data however we

play13:51

see a very different picture

play13:52

that there's more of a gradient of some

play13:54

people having a complex attribution

play13:56

pattern in the RNA pool

play13:57

whereas in others the RNA pool is

play13:59

completely dominated by a single species

play14:01

Faecalibacterium prausnitzii

play14:04

and so here what we're seeing from HUMAnN2

play14:05

is this ability to distinguish

play14:07

the functional potential of a community

play14:10

in this case

play14:12

numbers or relative copy numbers of

play14:13

encoded pathways across bugs

play14:16

with functional activity the actual

play14:18

relative expression of that

play14:20

pathway within a community and see

play14:22

that the two are not always the same
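
The contrast between functional potential (DNA) and functional activity (RNA) comes down to comparing each species' share of a pathway in the two pools. The following is a minimal sketch under assumptions: the function names and the toy abundances are invented for illustration, and real analyses would start from paired stratified profiles of the same sample.

```python
def species_shares(contributions):
    """Convert per-species pathway abundances into fractions of the total."""
    total = sum(contributions.values())
    if total == 0:
        return {}
    return {species: value / total for species, value in contributions.items()}

def potential_vs_activity(dna, rna):
    """Pair each species' share of a pathway in the DNA pool with its RNA share.

    dna, rna: {species: abundance of the pathway attributed to that species}
    Returns {species: (dna_share, rna_share)}.
    """
    dna_share = species_shares(dna)
    rna_share = species_shares(rna)
    species = sorted(set(dna_share) | set(rna_share))
    return {sp: (dna_share.get(sp, 0.0), rna_share.get(sp, 0.0)) for sp in species}

# Toy sample: three species encode the pathway, one of them expresses most of it.
dna = {"sp_A": 2.0, "sp_B": 2.0, "sp_C": 4.0}
rna = {"sp_A": 9.0, "sp_B": 0.5, "sp_C": 0.5}
comparison = potential_vs_activity(dna, rna)
for sp, (d, r) in comparison.items():
    print(f"{sp}: DNA share {d:.2f}, RNA share {r:.2f}")
```

In this toy sample the DNA pool looks like a mixture, while a single species supplies most of the transcripts, which is the kind of discrepancy the talk describes.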

play14:25

so in summary HUMAnN2 implements this

play14:28

tiered approach which is a new approach

play14:30

to functional profiling that aims to

play14:32

explain as many reads as possible with

play14:33

progressively

play14:35

broader and less specific databases this

play14:38

approach is more accurate and a lot

play14:39

faster than the traditional approach of

play14:41

just doing a comprehensive search

play14:42

against an exhaustive database

play14:45

and lastly we get stratification of our

play14:47

results by species for free in this

play14:49

process which allows us to access both

play14:51

those questions of who's there

play14:53

and what they're doing at the same time

play14:56

which is increasingly what we want to

play14:57

know not just what a community is doing

play14:59

but what species are actually performing

play15:01

those functions in the community
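
The tiered idea summarized above (explain as many reads as possible with the most specific database first, then pass only the leftovers to broader ones) can be sketched as a simple loop. This is a schematic under strong assumptions, not HUMAnN2's implementation: exact dictionary lookups stand in for real nucleotide and translated alignment, and the tier names, databases, and reads are made up.

```python
def tiered_search(reads, tiers):
    """Assign reads using an ordered list of tiers, most specific first.

    reads: {read_id: sequence}
    tiers: list of (tier_name, lookup) pairs, where lookup(sequence)
           returns an annotation or None. Reads left unexplained by one
           tier are the only ones passed on to the next.
    """
    assignments = {}
    remaining = dict(reads)
    for tier_name, lookup in tiers:
        still_unexplained = {}
        for read_id, sequence in remaining.items():
            hit = lookup(sequence)
            if hit is None:
                still_unexplained[read_id] = sequence
            else:
                assignments[read_id] = (tier_name, hit)
        remaining = still_unexplained
    return assignments, sorted(remaining)

# Hypothetical stand-ins for a sample-specific pangenome database and a
# broad protein database; real searches use alignment, not exact matches.
pangenome_db = {"ACGT": "sp_A|gene_X"}
protein_db = {"ACGT": "gene_X", "TTGA": "gene_Y"}

reads = {"r1": "ACGT", "r2": "TTGA", "r3": "GGGG"}
tiers = [
    ("pangenome", pangenome_db.get),
    ("translated", protein_db.get),
]
assignments, unexplained = tiered_search(reads, tiers)
print(assignments)
print(unexplained)
```

Read `r1` is explained in the first tier and therefore carries a species label for free, `r2` is only explained by the broader search and stays unclassified at the species level, and `r3` remains unexplained.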

play15:04

HUMAnN2 is available now if this is

play15:06

something that's of interest to you it's

play15:07

the first hit if you google HUMAnN2.

play15:10

it's installable from source, via pip as a

play15:12

python package and

play15:14

via homebrew we have a fairly detailed

play15:16

user manual as well as a very active

play15:18

user group on google groups where you

play15:20

and i can converse by email if you'd

play15:22

like

play15:23

and so i'd encourage you to try it out

play15:25

HUMAnN2 is part of a broader menagerie

play15:28

of tools that the Huttenhower lab has

play15:30

put together for analyzing microbiomes

play15:32

both in terms of profiling data

play15:34

profiling metagenomes from raw

play15:35

sequencing as well as doing downstream

play15:37

statistical analysis on those results if

play15:40

you'd like to learn more about that

play15:41

overall system

play15:42

we'll be doing a technology track

play15:43

presentation tomorrow evening that will

play15:45

give a broader survey of these

play15:47

tools than i've done now

play15:49

and also if you stay tuned my colleague

play15:50

Ali will be presenting one of our

play15:52

statistical methods for the analysis of

play15:53

paired high-dimensional data sets

play15:56

so a big thanks to the lab especially

play15:58

the HUMAnN2 team highlighted there in

play16:00

green for

play16:02

having a lot of fun with this project

play16:03

our collaborators on HUMAnN2 and

play16:05

also the entire human microbiome project

play16:07

for providing a lot of excellent data

play16:09

for us to work with

play16:10

thank you
