Technological Advancements in public health bioinformatics

The Bioinformatics Lab
20 Sept 2023 (25:22)

Summary

TLDR: In this episode of the Bioinformatics Lab podcast, Kevin Libuit and Andrew Page discuss the evolution of technology adoption in pathogen genomics. They explore the journey from software packages to cloud-based solutions, emphasizing the impact on interoperability and reproducibility. They reflect on the challenges of software installation, the rise of package managers like Bioconda, and the game-changing introduction of workflow managers. The conversation also touches on the importance of open-source tools for public health and the potential of machine learning and AI to revolutionize the field.

Takeaways

  • 🌐 The narrative of tech adoption in pathogen genomics has evolved from software packages to cloud-based solutions, impacting the field significantly.
  • 🛠️ Early challenges included lengthy software installation processes and managing dependencies, which have improved with advancements in package managers.
  • 📦 The introduction of Debian Med was a pivotal step, providing a sustainable and maintainable way to handle software packages in bioinformatics.
  • 🔧 The significance of easy software installation was highlighted as a key factor in adoption, with tools like Homebrew and Bioconda streamlining the process.
  • 🐍 Language shifts from Perl to Python have influenced the field, with Python emerging as a preferred language for its mathematical capabilities and community support.
  • 🔄 Workflow managers like Make, Snakemake, and Nextflow have standardized the way complex analyses are conducted, enhancing interoperability and collaboration.
  • 🌐 The adoption of cloud computing has been a game-changer, offering flexible, scalable resources that can be spun up quickly, without the need for extensive procurement processes.
  • 🔒 Security and legislative restrictions have influenced cloud adoption, with some countries limiting data to specific regions, affecting the scalability of bioinformatics workflows.
  • 🧬 The discussion highlighted the importance of open bioinformatics ecosystems, emphasizing the need for open-source tools and databases to ensure long-term support and accessibility.
  • 🤖 The future of pathogen genomics is anticipated to involve significant adoption of AI and machine learning, which may change the field in ways that are currently hard to predict.

Q & A

  • What is the main topic discussed in the podcast?

    -The main topic discussed in the podcast is the narrative of technology adoption in pathogen genomics, including software packages, containers, workflow languages, and the impact of cloud-based technologies on the field.

  • What does the podcast highlight about the early days of software installation in bioinformatics?

    -The podcast highlights that in the early days, installing software could take a significant amount of time and effort, sometimes requiring a dedicated person to manage the process due to the complexity of dependencies and compatibility issues.

  • What role did package managers play in the evolution of tech adoption in bioinformatics?

    -Package managers played a crucial role by simplifying the installation process of software and their dependencies, making it easier for users to access and use bioinformatics tools.

  • How did the introduction of workflow managers change the field of bioinformatics?

    -Workflow managers allowed for the standardization and streamlining of bioinformatics processes, enabling users to string together different tools in a cohesive way, which improved interoperability and collaborative development.

  • What is the significance of containerization in bioinformatics?

    -Containerization, through technologies like Docker and Singularity, has made it easier to manage software dependencies and ensured portability of tools across different computing environments, thus enhancing reproducibility and ease of use.

  • What is the impact of cloud computing on the field of pathogen genomics as discussed in the podcast?

    -Cloud computing has allowed for faster access to computational resources, reducing the need for long procurement processes and enabling researchers to scale up their analyses quickly and efficiently.

  • Why is the adoption of open-source tools and platforms emphasized in the podcast?

    -The podcast emphasizes the adoption of open-source tools and platforms because they promote accessibility, collaborative development, and long-term sustainability, which are crucial for public health and research applications.

  • How does the podcast view the future of AI and machine learning in pathogen genomics?

    -The podcast views the future of AI and machine learning in pathogen genomics as transformative, with the potential to change the field in ways that are not yet fully understood, but will likely lead to increased efficiency and new analytical capabilities.

  • What challenges are associated with cloud adoption mentioned in the podcast?

    -The podcast mentions challenges such as data sovereignty issues, where some countries have legislation that restricts the use of cloud services to within their borders, and the sensitivity of clinical data that cannot be moved across certain boundaries.

  • What is the significance of graphical user interfaces (GUIs) in making bioinformatics more accessible?

    -Graphical user interfaces (GUIs) are significant because they abstract the technical complexities of bioinformatics, allowing users to focus on analysis and interpretation rather than command-line operations, thus making the field more accessible to a broader range of professionals.

  • How does the podcast reflect on the importance of open bioinformatics ecosystems?

    -The podcast reflects on the importance of open bioinformatics ecosystems by discussing how they facilitate the distribution of tools, enable collaboration, and ensure that resources developed with public funding are available for public health laboratories worldwide.

Outlines

00:00

🌐 Evolution of Tech Adoption in Pathogen Genomics

Kevin Libuit and Andrew Page discuss the historical narrative of technology adoption in pathogen genomics, starting from software packages and moving to cloud-based solutions. They reflect on the evolution from manual software installation to packaged distributions such as Debian Med, and on the impact of containerization and workflow languages on interoperability and reproducibility in the field. The conversation emphasizes the importance of making software easy to install to encourage usage, and the past challenges of dependency management and software installation.

05:02

🛠️ Transition to Workflow Managers and Containerization

The discussion shifts to the advent of workflow managers, starting with Makefiles and evolving to more sophisticated tools like Snakemake and Nextflow. The speakers highlight the pivotal role of workflow managers in standardizing analysis processes and enabling the seamless integration of tools. They also touch upon the transition from traditional HPC clusters to containerization with Docker and Singularity, which solved dependency and portability issues, and the initial resistance to and subsequent acceptance of these technologies in the field.

10:02

🌩️ The Impact of Cloud Computing on Bioinformatics

Kevin and Andrew explore the transformative effect of cloud computing on bioinformatics, discussing how it has simplified the procurement and management of computational resources. They recount personal experiences with cloud adoption, the ease of scaling up resources, and the challenges faced due to legislative restrictions in some countries. The conversation also covers the importance of open-source tools and the risks associated with closed-source solutions, emphasizing the need for open access to maintain and extend critical tools for public health.

15:04

🌱 Tech Adoption in Wet Lab Sequencing and Data Sharing

The speakers delve into the adoption of different sequencing platforms, from Illumina to Oxford Nanopore, and the implications for data generation and benchmarking. They express concern over the lack of inclusion of certain technologies in benchmarks and the potential biases this may introduce. The discussion also addresses data sharing practices, the challenges of cross-border data transfers due to security and privacy concerns, and the need for standardized outputs to facilitate data integration and analysis.

20:07

🌟 The Future of AI and Machine Learning in Pathogenomics

Looking ahead, Kevin and Andrew anticipate the significant impact of AI and machine learning on pathogenomics. They discuss the initial forays into AI with tools like Pangolin and the preference for interpretable methods over black-box models. The conversation speculates on how AI could change the field, potentially automating routine tasks and enhancing data analysis, while also raising questions about trust, interpretability, and the need for policy frameworks to guide the adoption of these advanced technologies.

25:07

🔚 Wrapping Up the Discussion on Tech Adoption

In the final part of the conversation, the hosts summarize their discussion on tech adoption, highlighting the dynamic nature of the field and the continuous evolution of tools and platforms. They express optimism for the future, with the anticipation of less manual work and more efficient data analysis through AI, while acknowledging the ongoing challenges in understanding and trusting AI-driven decision-making processes. The episode concludes with a commitment to continue exploring these topics in future discussions.

Keywords

💡Bioinformatics

Bioinformatics is the application of computer software and hardware to store, retrieve, analyze, and represent biological information. In the context of the video, bioinformatics is central to the discussion of technological advancements in pathogen genomics. The hosts discuss how the adoption of various software and hardware technologies has transformed the field, making it easier to analyze genetic data and contribute to public health.

💡Tech Adoption

Tech adoption refers to the process by which new technologies are integrated into a field or industry. The video narrative traces the evolution of tech adoption in bioinformatics, from software packages to cloud-based solutions. It illustrates how these technologies have become essential tools for researchers in pathogen genomics, impacting areas like interoperability and reproducibility.

💡Pathogen Genomics

Pathogen genomics is the study of the genetic material of pathogens, which are disease-causing microorganisms. The video discusses the impact of technological advancements on this field, highlighting how easier software installation, improved workflow management, and cloud computing have facilitated research and analysis in pathogen genomics.

💡Package Managers

Package managers are tools that automate the process of installing, upgrading, and managing software packages. In the script, Debian packages (including Debian Med), Homebrew, and Bioconda are mentioned as part of the tech adoption narrative, emphasizing how they simplified the installation of software and dependencies, which was previously a complex and time-consuming task.
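To make the dependency problem concrete, here is a minimal Python sketch of one core job a package manager performs: working out an install order in which every package comes after the things it depends on. The package names and the `DEPENDENCIES` map are hypothetical, and real tools such as apt, conda, or Mamba also solve version constraints, which this sketch does not attempt.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each package lists what must be installed first.
DEPENDENCIES = {
    "assembler": {"python", "zlib"},
    "python": {"zlib", "openssl"},
    "zlib": set(),
    "openssl": set(),
}


def install_order(dependencies):
    """Return the packages in an order where dependencies always come first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for package in install_order(DEPENDENCIES):
        print(f"installing {package}")
```

Doing this by hand for a couple of hundred dependencies is roughly the week-long installation chore the hosts describe; a package manager automates the ordering and the downloads.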

💡Containers

Containers are a type of software that packages an application and its dependencies together. The video mentions Docker and Singularity as examples of container technologies that have improved portability and dependency management in bioinformatics, allowing researchers to run complex software stacks in isolated environments.

💡Workflow Managers

Workflow managers are tools that help in designing and executing complex workflows in a structured manner. The hosts discuss the transition from individual tool management to using workflow managers like Nextflow and Snakemake, which allow for the creation of reproducible and scalable analysis pipelines in bioinformatics.
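The "recipe style" idea behind Make (and, by extension, Snakemake and Nextflow) can be sketched in a few lines of Python: you name the endpoint you want, each rule says which inputs it needs and how to produce its output, and steps whose outputs are already up to date are skipped. This is only an illustration under stated assumptions: the rule table, file names, and the `align` and `call_variants` functions are hypothetical placeholders that write text files, not real bioinformatics tools.

```python
import tempfile
from pathlib import Path


# Placeholder steps: real rules would call an aligner or a variant caller.
def align(inputs, output):
    output.write_text("aligned:" + inputs[0].read_text())


def call_variants(inputs, output):
    output.write_text("variants:" + inputs[0].read_text())


# Each rule maps an output file (the endpoint) to its inputs and its action.
RULES = {
    "sample.bam": (["sample.fastq"], align),
    "sample.vcf": (["sample.bam"], call_variants),
}


def build(target, workdir, log):
    """Build `target` make-style: build inputs first, skip up-to-date outputs."""
    path = workdir / target
    if target not in RULES:
        # Source files have no rule; they simply have to exist.
        if not path.exists():
            raise FileNotFoundError(f"no rule to make {target}")
        return
    input_names, action = RULES[target]
    for name in input_names:
        build(name, workdir, log)
    inputs = [workdir / name for name in input_names]
    stale = not path.exists() or any(
        i.stat().st_mtime > path.stat().st_mtime for i in inputs
    )
    if stale:
        action(inputs, path)
        log.append(target)  # record which steps actually ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "sample.fastq").write_text("READS")
        steps = []
        build("sample.vcf", workdir, steps)
        print("steps run:", steps)
```

Asking for `sample.vcf` is enough; the alignment step is discovered from the rules rather than scripted by hand, which is the standardization the hosts credit workflow managers with.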

💡Cloud Computing

Cloud computing refers to the delivery of computing services, including server time and data storage, over the Internet. In the video, cloud computing is highlighted as a significant tech adoption that has enabled rapid scaling of computational resources, making it easier for bioinformatics labs to access the power they need for pathogen genomics research.

💡Interoperability

Interoperability is the ability of different systems and components to exchange data and operate together. The video emphasizes the importance of tech adoption in promoting interoperability in bioinformatics, allowing for the seamless integration of various tools and workflows, which is crucial for collaborative research and data sharing.

💡Reproducibility

Reproducibility in scientific research means that experiments can be independently replicated to confirm results. The script discusses how tech adoption, particularly in the form of standardized workflows and containerization, has enhanced reproducibility in bioinformatics by ensuring that analytical processes can be consistently applied across different environments.

💡Public Health

Public health is the science of protecting and improving the health of communities through education, research, and policy. The video connects tech adoption in bioinformatics to its impact on public health, arguing for the importance of open-source tools and databases in supporting public health initiatives, such as disease surveillance and outbreak response.

Highlights

Narrative of tech adoption in pathogen genomics from software packages to cloud-based services.

Impact of tech adoption on interoperability and reproducibility in the field.

Early challenges in software installation and dependency management.

Evolution from manual software installation to packaged distributions like Debian Med.

The importance of easy software installation for user adoption.

The rise of Homebrew and the shift to Bioconda for easier software management.

Mamba's role in expediting the installation process in bioinformatics.

Adoption of workflow managers like Make, Snakemake, and Nextflow.

The transition from single tool integration to comprehensive workflow management.

The significance of Galaxy for accessible bioinformatics workflows.

The move towards cloud-based solutions and their impact on resource accessibility.

The critical role of open-source in public health and pathogen genomics.

Challenges and benefits of adopting different sequencing platforms.

The influence of cloud computing on the scalability and flexibility of bioinformatics.

The future of tech adoption with a focus on machine learning and AI in pathogenomics.

The potential of AI to transform bioinformatics workflows and decision-making.

The necessity for policy and trust in AI-driven bioinformatics.

Transcripts

play00:00

all right welcome I'm Kevin Libuit

play00:01

joined by Andrew Page we're from Theiagen

play00:04

and this is the bioinformatics Lab

play00:06

podcast

play00:09

today we're going to be talking about

play00:11

sort of narrative of tech adoption over

play00:13

the years in pathogen genomics uh from

play00:17

you know software packages to Containers

play00:19

workflow languages uh and now at this

play00:22

point of cloud-based GUIs and the

play00:25

different ways in which that's impacting

play00:26

our field

play00:27

and Andrew you having much more time in in

play00:29

the field than I have so you have maybe

play00:31

a longer time Horizon I say that with

play00:33

absolute respect for your wisdom and

play00:35

experience in the field

play00:37

um but you have a broader perspective of

play00:39

what this has looked like because again

play00:41

whenever I talk about the sort of tech

play00:42

adoption story I kind of started package

play00:45

managers but of course there's beyond

play00:47

that you know before uh but all this

play00:50

conversation of how do we make this how

play00:51

do we mature this as a field for

play00:53

interoperability reproducibility and all

play00:56

the likes so when you think about that

play00:58

the narrative of tech adoption

play01:00

surely it starts before package managers

play01:03

oh yeah like I mean I remember the dark

play01:05

days where you'd have to hire someone

play01:07

you know just to sit there installing

play01:09

software because it would be it would

play01:10

take a week to install one piece of

play01:12

software you know please be editing it

play01:13

and trying to find 200 different you

play01:16

know dependencies for a compiler that

play01:17

you know was last released five years

play01:19

ago and so we've come up we've come a

play01:21

very long way

play01:23

um even then you know like languages and

play01:25

everything like I started off in I guess

play01:27

Ruby and then Perl

play01:30

um for bioinformatics and you know

play01:32

languages change all the time and no one

play01:35

uses Perl these days unfortunately well

play01:38

me uh but anyway you know and then

play01:40

python so you know things change over

play01:43

time and uh we do get better and

play01:45

actually I genuinely I think that um

play01:48

Perl

play01:49

isn't as good for mathematics as python

play01:52

you know I've after maybe about 10 years

play01:54

I've come to that conclusion of years in

play01:56

it but then I see like my son is using

play01:58

rust and he's like oh this is amazing

play02:00

you know like and uh it's much faster

play02:02

and it probably is actually but I'm not

play02:04

gonna give up python you know

play02:06

um for some new fad language it's only

play02:08

been around a couple years uh anyway so

play02:10

I digress like it you know

play02:13

so if you just take the most basic

play02:15

installing software and package managers

play02:16

like

play02:17

the big thing a few years ago was just

play02:21

you know with Debian uh yeah and uh

play02:23

Ubuntu's Debian packages so can you get

play02:25

into can you get something in a Debian

play02:27

package you know because once it's in

play02:28

there then you're sort of for life you

play02:30

know

play02:31

um and Debian Med was the big one for

play02:33

our fields which is um for biomedical

play02:37

resources go in there

play02:39

and yeah

play02:41

phenomenally difficult if you start

play02:44

packaging for Debian Med they will

play02:47

assign you like a mentor if someone who

play02:49

can guide you through the process over a

play02:51

few months wow because it's quite you

play02:54

know they want these things to to work

play02:56

universally and you know be working for

play02:59

a long time so a few months to guide you

play03:02

through the process of building your

play03:03

Debian packages you know which would

play03:05

then be sustainable in the long term and

play03:07

uh kind of easy to maintain and

play03:11

gosh so that was probably first gun

play03:14

during 2015 and because I just happened

play03:16

to work with a guy who's one of these

play03:18

maintainers and

play03:20

that the advantage of that is when you

play03:22

go to command line and just have

play03:23

apt-get install blah like Roary or

play03:26

something like that and that's that

play03:28

lowers the bar so much that it makes it

play03:31

trivial and if you make your software

play03:32

easy to install then of course people

play03:34

will use it and that's that's a key

play03:37

magic trick you know to uh to getting

play03:39

people to use your software is to make

play03:40

it trivial to install and then you know

play03:43

you go one step further and then we had

play03:45

a Homebrew do you remember that yeah I

play03:48

do that's that's about when I entered

play03:49

the the chat because yeah which is uh

play03:52

which is a little bit you know another

play03:54

system they made a bit easier to go and

play03:56

install software uh but it kind of fell

play03:59

off the rails

play04:01

um they became unwieldy conda came along

play04:03

bioconda and that's been kind of the

play04:05

default now for people to use

play04:07

um because it just works and you know

play04:09

everyone has kind of gathered around

play04:10

that it's quite easy to install stuff

play04:13

dependencies are you know are reasonably

play04:15

okay at the moment and you can get

play04:17

into obviously a bit of a nightmare

play04:19

sometimes

play04:21

um and then Mamba has really helped the

play04:23

installation process you know rather

play04:25

than waiting an hour to install one bit of

play04:28

software because things got so complex

play04:30

with resolving dependencies you know it

play04:32

now works it's pretty quick so you know

play04:34

we've come a long way

play04:36

um we've also come along with workflow

play04:38

managers what was the first workflow

play04:40

manager used

play04:42

make I I think I would consider make the

play04:45

first kind of workflow manager I was

play04:47

writing make files where I was like

play04:48

writing shell scripts of you know just

play04:52

executables and then I would compile

play04:54

them to not compile them I guess but I

play04:56

would kind of curate the workflow into a

play04:58

make file and then I would Define

play04:59

endpoints at the beginning and then it

play05:01

would be you know that's how the

play05:03

workflow worked it was just recipe style

play05:05

it would define the endpoint define the

play05:07

process is very much the same components

play05:09

I mean and then that's where I started

play05:11

learning about snakemake which was

play05:12

really make in Python

play05:15

and then where I really started running

play05:17

in workflow managers was uh use of

play05:20

nextflow and I think it was actually Erin

play05:22

Young again on the StaPH-B side who kind

play05:24

of introduced us to um workflow managers

play05:27

and specifically uh maybe was

play05:31

Kelsey Florek actually I don't remember

play05:33

exactly but

play05:34

they introduced us to the concepts of

play05:36

workflow managers so we kind of went

play05:37

from the conversation of how do I get a

play05:39

single tool

play05:41

to work properly on my on my machine

play05:43

ensure that the dependencies are

play05:45

consistent throughout the different

play05:46

environments but then workflow managers

play05:48

changed it in that we were able to

play05:50

string them all together in a really

play05:51

cohesive way sorry one second

play05:56

um but

play05:58

that became kind of a really pivotal

play06:01

point again I'm speaking on the StaPH-B

play06:03

side but this is something that I think

play06:04

is echoed throughout the field of it's

play06:07

so much more standardized and how we're

play06:08

doing these these these analysis you

play06:10

don't have to write your own python

play06:12

logic to string data from you know your

play06:15

aligner to your variant caller to

play06:18

whatever it is Downstream for

play06:19

characterization

play06:20

rather there was a consistent language

play06:23

there somebody outside of it of of your

play06:25

laboratory can look at it and understand

play06:26

okay this is a nextflow workflow I

play06:29

understand that the architecture of this

play06:32

this entire repository and moreover than

play06:34

that I could take the modules from that

play06:36

and build it into my workflow in a

play06:38

pretty seamless way so that that was a

play06:40

huge jump in terms of interoperability

play06:42

collaborative developments that we saw

play06:44

in StaPH-B uh and we never really looked

play06:47

back from that that became just a status

play06:48

quo

play06:49

the first one I used was um called

play06:52

vertebrate resequencing uh code base it

play06:55

didn't even have a name like VR codebase

play06:58

was the name and it was built for the

play07:00

Human Genome Project not human that

play07:03

hasn't Gina a Human Genome Project

play07:07

um in the Sanger Institute so this was this

play07:10

was before people were doing things at

play07:12

scale and genomics so you know you're

play07:14

talking more than 10 years ago and said

play07:16

I love you know nextflow didn't exist a

play07:18

lot of these other things did not exist

play07:19

yeah and uh so they had to be built you

play07:23

know and uh Sanger was uh at that

play07:25

time you know would be producing maybe

play07:28

you know 20 30 percent of the world's sequencing

play07:30

from one place so you know they had a

play07:33

scale different people didn't have and

play07:35

so that's what we adopted and uh we were

play07:36

then adapting it for pathogens you know

play07:39

which is quite different to human

play07:40

because you have to do a lot of things

play07:42

uh you've got a lot of small little

play07:44

things and that's very different to

play07:46

human which is you've got a few big

play07:47

things that you want to do things on

play07:50

um so it's like you know it's a flipped

play07:52

problem and of course that breaks

play07:54

everything and uh yeah so anyway that

play07:57

was the first one I ever worked on a

play07:58

highly complex

play07:59

um changing anything was very difficult

play08:01

it was all in code you know I was on

play08:03

GitHub whatever go to his own code and

play08:05

it was all in Perl as well and you know

play08:08

everything was monitored by say writing

play08:11

little files to disk you know and

play08:12

checking them around whatever

play08:14

so it worked but you know the growing

play08:17

pains came in and so then the next

play08:18

Generation came out

play08:20

um

play08:21

which again was a super lovely beautiful

play08:24

Perl

play08:26

um but you know again it was it was more

play08:28

cloud computing focused and it did

play08:31

things very well but but then you know I

play08:34

had seen Galaxy a few times like oh this

play08:36

is actually pretty cool so when I went

play08:38

to Department that was first thing I

play08:40

brought in was Galaxy because you know

play08:41

you got a web server that anyone can use

play08:44

it can run on us on a cluster or on

play08:46

cloud and it's got all the tools you

play08:49

know kind of linked in and built in

play08:50

because people you know again wrap-up

play08:53

tools simple XML files wrap the tool up

play08:56

and make it work and you know can

play08:57

be brought in with conda or whatever and then it

play09:00

allowed a lot of people to do complex

play09:01

workflows a lot of people who don't know

play09:04

the command line moving them away from

play09:06

you know the this kind of black screen

play09:08

and a blinking cursor and they don't

play09:10

know what to do you know it's uh to oh

play09:12

yeah I can just click here I can search

play09:14

for

play09:15

aligner and then suddenly you can learn

play09:18

your data and you know everything Flows

play09:19

In from the sequencers and then it can

play09:21

self-service uh to do their thing and I

play09:24

think I really do think uh GUIs you

play09:27

know they're worth their weight in gold you can

play09:29

see the amount of stuff that people can

play09:30

do so even in Excel you know you've got

play09:32

power users of Excel who can do

play09:35

phenomenal things if you just give them

play09:37

something straightforward easy to use

play09:39

good usability they can work wonders

play09:42

when it comes to the command line

play09:44

obviously that's very powerful for doing

play09:45

things at scale and speed and for

play09:48

joining things together but you know you

play09:50

have a very high barrier for entry there

play09:53

for for an ordinary person even even

play09:55

someone who works in bioinformatics it can

play09:57

be a high barrier to entry because you

play09:58

might often have to read through lots of

play10:00

documentation to figure out how exactly

play10:02

does this fit in what exact format does

play10:04

this come out in you shouldn't have to

play10:06

solve that over and over again you know

play10:07

it's nice when you can just kind of have

play10:10

them linked together you know someone

play10:11

solves it once maybe defines the outputs

play10:15

say um for nextflow or for for Galaxy

play10:18

and then it's there and it's wherever

play10:20

whatever more Works

play10:22

um

play10:24

yeah anyway uh

play10:27

workflow managers are fantastic as are

play10:29

things like solving dependencies with conda

play10:31

I just love that I mean Jesus Christ

play10:34

like the the pain the absolute pain you

play10:38

used to have to go through to install

play10:40

software and uh you know if you use

play10:44

clusters like the kind of old school

play10:47

HPC you know like basically physics

play10:50

clusters from back in the day you know

play10:51

massive you know machines in a big room

play10:54

with blinking lights

play10:56

actually installing stuff on those is a

play10:58

right pain in the ass right because one

play11:01

base operating system and you you kind

play11:04

of install stuff and load and stuff you

play11:05

know like the the operating system we

play11:07

add that particular you know and

play11:09

whatever is installed probably you know

play11:11

five years ago that that's the version

play11:13

you're stuck with that's the version of

play11:14

compilers are stuck with and you can't

play11:16

change anything you know so actually

play11:18

having the ability to run Docker and

play11:20

Singularity and whatever is is amazing

play11:22

and that I remember the arguments people

play11:24

had you know about allowing Docker to

play11:26

run on a cluster like that no that's

play11:28

security risk you know we can't be

play11:29

having that Singularity did help as a

play11:32

little bit with that but

play11:33

um yeah it's uh we've come a long way

play11:35

and now we're in the cloud like I mean

play11:37

Jesus that's amazing yeah you can just

play11:39

spin something up in a few minutes you

play11:41

don't have to wait two years to go to a

play11:43

procurement process and buy stuff you

play11:44

can just bang There You Go have

play11:45

resources run stuff and you have full

play11:48

control over it then you can tear it

play11:50

down and those are that's often the the

play11:52

kind of three

play11:54

in terms of tech adoption those are like

play11:56

the three turning points I always have

play11:57

in my mind it first is sort of maybe I'd

play12:00

put a slash in package managers and

play12:03

containers where it was solving the

play12:04

problem of dependencies and install and

play12:06

portability of their single Tools in

play12:08

itself and that's solving a lot of the

play12:10

problems that you described and then so

play12:12

now we can all if I have a tool if I

play12:13

have an assembler I can run it in a

play12:15

really reproducible way in my

play12:16

environment and you can run in the same

play12:18

way in your environment then the sort of

play12:20

next big Tech adoption was the workflow

play12:21

managers not only do I have the same

play12:23

tools I can string them in a way that

play12:25

makes it plug and play and as you

play12:28

mentioned too these workflow managers

play12:30

not only help to standardize that but

play12:31

they also standardize the the running of

play12:33

them I can run it on an HPC I can run it

play12:35

on a local VM or I can run it in the

play12:37

cloud and all the workflow managers now

play12:39

have built-in capabilities

play12:42

for that scalability and so that you

play12:45

briefly mentioned it is that that third

play12:47

layer is the the GUI the the web

play12:49

applications that kind of wrap all these

play12:51

things together you know and my first

play12:53

experience with that was uh using Galaxy

play12:55

actually through GenomeTrakr they

play12:57

adopted Galaxy as a platform and I

play12:58

realized oh my goodness I don't have to

play13:00

teach CLI any longer rather I could

play13:02

teach them to click a couple buttons I

play13:04

can show them the workflows and I can

play13:05

focus on the results and the analysis

play13:08

and interpretation rather than you know

play13:11

your Linux environment your library

play13:13

directory structure this CD means you're

play13:15

changing folders or something like that

play13:17

you know and and I think I'm stealing

play13:20

this from I think it was uh Peter

play13:22

um van Heusden in Cape Town where

play13:25

he said a line and I don't think he

play13:28

meant to you know codify this but I

play13:30

thought it was a really powerful line

play13:31

and in public health

play13:33

uh they should be doing less

play13:34

bioinformatics and more public health

play13:36

and I thought oh that's kind of a nice

play13:37

little line it's like yeah we should

play13:39

really be working to abstract the sort

play13:41

of technical nuances of bioinformatics

play13:43

and really instead provide

play13:45

them with tools that allow them to do

play13:46

Public Health

play13:48

but I think GUIs the graphical user

play13:50

interfaces behind that are really

play13:52

what allows us to kind of traverse

play13:54

that problem where you don't have to be

play13:57

a card-carrying bioinformatician to do

play13:58

bioinformatics rather you can use

play14:00

bioinformatics as a tool to inform your

play14:02

public health that you're implementing

play14:04

there and you know obviously we've seen

play14:06

that I've mentioned Galaxy Terra has been

play14:08

an incredible resource where you've seen

play14:10

that kind of come to play

play14:11

um and then you see again I'm stealing

play14:13

these terms from different people Ali

play14:15

Black and her ten recommendations for

play14:16

pathogenomics she coined this term of an

play14:19

open bioinformatics ecosystem that's

play14:21

built off all these Technologies we've

play14:23

talked about containerized algorithms

play14:25

that are written in the standardized

play14:27

workflows that are made accessible

play14:28

through these GUI portals and it's like

play14:31

oh okay once you find that mix we've

play14:33

seen that uh that model allow us to

play14:35

distribute bioinformatics tools to

play14:37

Laboratories across the world we've seen

play14:40

that happen in so many different ways

play14:41

you know we recently put out a

play14:43

publication where Terra from the

play14:46

Broad Institute has fit that model

play14:47

of the open bioinformatics ecosystem in

play14:50

that it's got a really well maintained

play14:52

graphical user interface a cloud backend

play14:54

with GCP it's got the Dockstore

play14:57

repository which is you know coming out

play14:59

of the GA4GH universe there

play15:02

um and then it's uh

play15:04

because it's standardized workflows you

play15:06

can also have the standardized outputs

play15:07

so that you can then transfer these

play15:09

outputs into different systems be it

play15:11

transferred to you know SRA and NCBI for

play15:14

distribution and international

play15:15

accessibility you can also transfer

play15:17

these results into maybe more secure

play15:19

environments where you might be

play15:20

combining things with sensitive metadata

play15:22

for genomic Epi investigation so

play15:24

watching the tech adoption happen it's

play15:26

gotten us to a point where now it's just

play15:28

these resources that uh we've all been

play15:31

toiling with and trying to make sure

play15:32

that they work on a machine now are in the

play15:34

hands of laboratorians so that they can

play15:37

use these things generate the results

play15:38

and make sense of the data in real time

play15:40

and inform what's happening on the you

play15:43

know either infectious disease side be

play15:45

it public health clinical food safety or

play15:48

otherwise

play15:49

and you can see what happens where you

play15:50

know when things go wrong when things

play15:52

are not open like BioNumerics

play15:55

um

play15:55

the company has decided okay there's not

play15:57

enough money to be made to maintain this

play15:59

you know for the public health world so

play16:02

we're just gonna shut it down and that's

play16:04

it and that's a closed-source locked

play16:07

away highly critical tool for public

play16:09

health and it's it's going and had that

play16:12

been open it would have been very

play16:13

different because then obviously you

play16:15

know the community could take it up and

play16:17

you know keep maintaining it and extend

play16:19

it and and whatnot and keep it alive but

play16:21

if a commercial company holds the rights

play16:23

to that you know the source code and how

play16:25

it works and all the all the

play16:27

infrastructure behind it then it's a

play16:28

it's a problem and so we really do need

play16:31

to have open bioinformatics open tools open

play16:33

databases open everything

play16:36

absolutely and that's definitely been

play16:38

part of the ethos in you know how we

play16:40

work professionally with Public Health

play16:41

labs is that we feel that if it's funded

play16:43

and supported by public health and it's

play16:45

applicable to a single Public Health lab

play16:46

it's very likely going to be something

play16:48

that other public health Laboratories

play16:49

could also find utility in you know for

play16:51

example we work with Laboratories in

play16:53

Mozambique uh helping develop you know

play16:55

assays for HIV sequencing and Analysis

play16:58

these same resources that we're working

play17:00

with APHL Global Health to develop and

play17:02

innovate upon in in Mozambique we're

play17:04

watching being proliferated to

play17:05

laboratories in the U.S. who have the

play17:07

same interests and again getting back to

play17:09

that ethos of we're going to develop

play17:11

this tool publicly funded it needs to be

play17:13

open source and openly accessible because

play17:15

it doesn't need to be closed off that's

play17:16

not necessarily the business model

play17:17

that's conducive to ongoing support

play17:21

long-term of uh Public Health laboratory

play17:23

pathogen genomics here

play17:26

absolutely and on the tech adoption front

play17:29

actually I was at a conference day the

play17:31

other day and the what was it

play17:34

Science Meets Policy Using Next

play17:36

Generation Sequencing to Tackle Foodborne

play17:38

Threats at the European food standards

play17:40

Authority I know it's really really

play17:42

eye-opening because all the different

play17:43

you know each country in Europe you know

play17:45

taking a slightly different way of doing

play17:47

things but you know obviously everyone

play17:48

is doing public health and everyone is

play17:50

doing food safety and you want to meet

play17:52

the same end goals more or less and it's

play17:54

very interesting to see how people are

play17:56

approaching different problems within

play17:58

the context of their own country some

play18:00

you know are going very much on the uh

play18:03

I suppose a short reads you know buying

play18:06

stuff to do cgMLST share the cgMLST

play18:10

results and others you know very much

play18:13

more well you know we'll release the data

play18:14

or we won't release the data or whatnot

play18:16

and so yeah very interesting to see how

play18:19

people are approaching the same problem

play18:21

yeah Tech adoption it's big yeah we've

play18:24

only really talked about it on the sort

play18:25

of dry lab side of things but also on

play18:27

the wet lab side of things how are

play18:28

people adopting different sequencers

play18:30

different platforms from Illumina to ONT

play18:34

I think you have some perspective on

play18:36

Ion Torrent adoption and things

play18:37

like this too

play18:38

so yeah I was a bit horrified to see um

play18:41

some benchmark sets and they had like

play18:43

ONT sorry they had Ion Torrent data

play18:46

but not ONT and I was like come on like

play18:48

you know I understand the last time I

play18:50

saw one was actually uh it was wrapped in

play18:52

plastic waiting for disposal in an

play18:54

underground car park

play18:59

it's the first sequencing data I ever

play19:01

generated was actually on an Ion Torrent

play19:03

PGM if you're familiar with that yeah

play19:06

yeah

play19:07

they had the little kind of Xbox logos

play19:10

or something on them as well yeah they

play19:12

even had I think a slot for your iPhone

play19:14

if you wanted to you know put some uh

play19:16

tunes on while you're preparing those

play19:17

libraries it was it was an iPod or

play19:20

something something similar that is a

play19:22

really old connector as well yes yeah

play19:24

yeah yeah yeah yeah it was uh that was

play19:26

my first uh time uh generating

play19:28

sequencing data we had those Ion Torrent

play19:30

chips uh I I I'm diverting there but

play19:34

that was my first tech adoption into it

play19:36

was Ion Torrent and it was interesting

play19:37

because you know again they're not the I

play19:39

don't think we're saying anything that's

play19:40

wildly controversial but you know known

play19:42

about the data but every time I would

play19:44

put the data out there people talk about

play19:45

the

play19:47

the error profile kind of associated with

play19:49

what was being generated there so it is

play19:51

interesting watching benchmarks uh

play19:52

getting generated where where you have a

play19:55

single technology with a known error

play19:56

profile without maybe also adding some

play19:59

context with either Illumina data or

play20:01

ONT data especially for benchmarks

play20:04

yeah and what was very eye-opening was that

play20:06

some countries don't allow people to use

play20:08

the cloud for security reasons

play20:10

um so it very much limits them in what

play20:13

they can do in terms of bioinformatics and in

play20:14

terms of scaling up as well they can't

play20:16

just go we need more bang there you go

play20:19

that's a huge point in talking about

play20:21

tech adoption is cloud computing that

play20:24

because that was a big conversation in

play20:26

infrastructure development across the US

play20:28

that we were always a part of in StaPH-B

play20:30

and people were doing pretty much

play20:32

everything you could see in terms of

play20:33

on-prem servers HPCs cloud working with

play20:38

academic HPCs and all the like but then

play20:40

it just became so obvious that cloud was

play20:42

was the solution for all the reasons it

play20:45

is in every other industry

play20:46

um so that that's been a big thing but

play20:48

but it took a while like even me in

play20:50

Virginia it was like two and a half

play20:51

three years of discussions with our IT

play20:54

before we were given our AWS accounts

play20:56

but I think we're at a critical mass

play20:58

where we've seen so many Laboratories

play20:59

kind of break the barrier have the

play21:01

conversations with their IT that those

play21:03

conversations are shortening

play21:04

dramatically and you're seeing wide

play21:05

adoption across really the world I know

play21:08

in Academia like um when I came into the

play21:10

Quadram it was all

play21:12

um traditional HPC you know big cluster

play21:15

in uh in a server room on-prem and we

play21:19

moved over to OpenStack you know which

play21:21

is a private Cloud which is a million

play21:23

times better and more flexible so you

play21:25

know that's where we start today and I'm

play21:27

sure in the future you know it'll

play21:28

become public Cloud because it's very

play21:30

easy to go from an OpenStack private

play21:31

Cloud to a public Cloud

play21:33

um

play21:34

but what during the pandemic was fantastic

play21:36

was that uh everyone all the um COVID

play21:40

sequencing was being done and uploaded

play21:41

to uh an academic cloud called CLIMB MRC

play21:45

CLIMB which is uh you know OpenStack

play21:48

based on three different universities in

play21:50

the UK and so instantly resources are

play21:53

available you know like no one had to

play21:54

sit around and say oh you know can we

play21:57

rent some resources from here or borrow

play21:58

some from here it's just like fine there

play22:00

you go

play22:01

um all available and then the public

play22:03

health authorities

play22:05

um like UKHSA were using Azure

play22:08

um and you know everyone was just it was

play22:10

just cloud cloud cloud you know in a

play22:12

private public and there's no sitting

play22:15

around you know waiting to buy

play22:17

vast amounts of storage it was just like well

play22:19

here's what we can do right now and solve

play22:22

these problems I know other countries uh

play22:24

struggle because there's legislation

play22:26

that says I don't know they can only um

play22:29

if they were to use the cloud it can

play22:31

only be within their country or within

play22:33

one particular region so it kind of

play22:35

makes things a bit more difficult

play22:36

obviously Amazon Google and Azure uh

play22:39

so Microsoft are doing a good job by

play22:41

having data centers all around the world

play22:42

but it is a problem and then when you

play22:45

get down to clinical stuff a lot of stuff

play22:46

can't even leave like the hospital you

play22:48

know some really sensitive data you

play22:49

know you just can't cross that barrier

play22:51

when I was at Quadram we uh the building

play22:53

had the

play22:55

we shared with local hospital and

play22:57

obviously research and hospital we had

play22:59

two separate physical networks in one

play23:01

building and one you know one kind of

play23:04

server room as well you know two

play23:06

separate physical networks because they

play23:08

had to keep you know the clinical stuff

play23:10

totally separate from the research stuff

play23:12

research is seen as you know seen as

play23:14

more loosey-goosey and high risk than that

play23:16

clinical stuff which is fair enough

play23:18

but anyway it hopefully with time people

play23:22

will adopt the cloud a bit more

play23:24

and before we end the last frontier I

play23:28

guess we talked about the

play23:29

historical Tech adoption now looking

play23:31

forward I think you know you and I have

play23:32

had a couple episodes on this but what

play23:34

we're seeing is the big Tech adoption

play23:36

that's really going to be changing the

play23:37

field is you know machine learning and

play23:38

AI and how it's going to impact things

play23:40

and we saw a little murmurs of it even

play23:42

like with Pangolin there was

play23:44

pangoLEARN and it was like okay how are

play23:46

people going to deal with this sort of

play23:47

black box of decision making and it

play23:49

wasn't taken to kindly I mean it was

play23:51

taken in its utility and practicality

play23:53

and speed but it was definitely a huge

play23:55

preference to the phylogenetic placement

play23:56

because people could make sense of the

play23:58

UShER uh approach versus machine

play24:00

learning but it's going to be

play24:02

interesting over the next couple years

play24:03

where more of this adoption of machine

play24:05

learning and AI in general there

play24:09

um or more specifically I guess it is

play24:11

going to be interesting to see how uh it

play24:13

impacts our field I don't know if you

play24:14

have a hot take in the last couple

play24:16

minutes before we close out in Tech

play24:18

adoption of AI and uh in pathogenomics

play24:22

I honestly think it's going to change

play24:24

things in ways we can't even comprehend

play24:27

right now you know it'll just be there

play24:29

it'll always be on it'll just be part of

play24:31

what we do and I'm excited for the

play24:34

future you know I'm excited to do less

play24:35

work I'm I love now when I program that

play24:38

uh it saves me so much time having to

play24:41

type stuff out because you know like it

play24:43

can you know pick up all the obvious

play24:45

stuff that I I'm probably gonna do

play24:47

yeah because we're already using it in

play24:49

active writing active programming I'm

play24:51

really interested to see how the big

play24:52

tools kind of come in here because a lot

play24:54

of the what we do is a lot of classic

play24:55

categorization that seems ripe for the

play24:58

taking for these kinds of Technologies

play25:00

but I think it's going to be the policy

play25:02

and adoption of how do we

play25:04

to what level of trust we put to this in

play25:06

knowing that we can't really necessarily

play25:08

just open up and understand how we came

play25:10

to these decisions so

play25:12

that's what keeps our jobs exciting

play25:15

absolutely all right good episode we'll

play25:18

uh continue on this conversation I'm

play25:19

sure in future ones in the coming weeks

Related Tags
Bioinformatics, Tech Adoption, Pathogen Genomics, Workflow Managers, Cloud Computing, Public Health, Open Source, Machine Learning, AI Impact, Healthcare Tech