How Sonrai Analytics leverages ML to accelerate Precision Medicine (L300) | AWS Events
Summary
TLDR In this presentation, Jonah Craig, a startup solutions architect in Ireland, introduces how he supports Sonrai Analytics, a startup using AWS services to make cancer treatment more efficient. Sonrai Analytics works with large volumes of medical data and uses machine learning to shorten the time to cancer diagnosis and treatment. By leveraging AWS managed services, the small team pushes the technical cutting edge to drive innovation in healthcare. The presentation also walks through how they use Amazon SageMaker to train, deploy, and monitor machine learning models.
Takeaways
- 🌟 Sonrai Analytics uses AWS technology to reduce cancer drug trial times, bringing efficiencies to the healthcare system.
- 👔 Jonah Craig works as a startup solutions architect in Ireland, supporting startups of all sizes.
- 🛠️ By leveraging AWS managed services, startup teams can make effective use of limited resources to tackle technical challenges.
- 🔧 The machine learning loop consists of four fundamental steps: data preparation, model training, model deployment, and monitoring with orchestration.
- 💾 Because Sonrai Analytics handles petabytes of data, it needs effective cost management and a scalable architecture.
- 🧬 Their clients develop new drugs to treat cancer, using AI to identify the right treatment for the right patient.
- 🔬 In one computer vision use case, they built AI that detects cancer cells in microscope images.
- 🚀 They use the AWS service stack to streamline everything from training to inference and to optimize model performance.
- 🌐 Global deployment is possible: they use AWS data centers to provide a fully segregated instance for each customer.
- 🛡️ Using AWS services with strong data protection and privacy controls keeps them compliant with regulations such as GDPR.
- 🔑 The AWS Activate program helped them move quickly through early business development and technical validation.
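The four-step loop in the takeaways above (prepare, train, deploy, monitor) can be sketched end to end. This is a deliberately tiny, pure-Python illustration, not Sonrai's actual pipeline: a mean-threshold classifier stands in for a trained model, and drift is flagged when the live positive rate jumps (for example from 20% to 40%, the kind of shift mentioned later in the talk).

```python
def train(samples):
    """'Model training': learn a decision threshold (the mean feature value)."""
    return sum(samples) / len(samples)

def predict(threshold, x):
    """'Inference': call a sample positive if its feature exceeds the threshold."""
    return x > threshold

def positive_rate(threshold, batch):
    """Fraction of a batch predicted positive -- the metric we monitor."""
    hits = sum(1 for x in batch if predict(threshold, x))
    return hits / len(batch)

def drifted(baseline_rate, live_rate, factor=2.0):
    """'Monitoring': flag drift when the live rate jumps by `factor` or more."""
    return live_rate >= baseline_rate * factor

# Data preparation: toy feature values (say, tumour-cell fraction per tile).
model = train([0.1, 0.2, 0.3, 0.4])

# Deployment + monitoring: a 20% baseline doubling to 40% raises the alarm.
assert drifted(0.2, 0.4)
assert not drifted(0.2, 0.25)
```

In production, orchestrating this whole loop is what SageMaker and MLOps tooling handle; the point here is only the shape of the loop.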
Q & A
What is the AWS certification challenge?
-The AWS certification challenge is a program for learning AWS services and features; it lets you work toward a range of AWS certifications, including machine learning and solutions architecture.
What kind of company is Sonrai Analytics?
-Sonrai Analytics is a startup that uses AWS technology to reduce cancer drug trial times, improving efficiency in healthcare.
What is Jonah Craig's current role?
-Jonah Craig is a startup solutions architect based in Ireland who supports startups such as Sonrai Analytics.
What kinds of data does Sonrai Analytics work with?
-Sonrai Analytics handles healthcare and life sciences data and needs to process petabytes of it.
Which AWS services does Sonrai Analytics use?
-Sonrai Analytics uses a broad range of AWS services, including SageMaker, AWS HealthOmics, Athena, Glue, Lambda, Fargate, and ECS.
What is the key point of the cancer treatment algorithms Sonrai Analytics works on?
-Sonrai Analytics develops algorithms for precision medicine, focusing on matching each patient with the right cancer treatment.
What does SageMaker Studio do?
-SageMaker Studio manages the entire process from data preparation to model deployment, providing an environment where machine learning engineers and data scientists can collaborate.
What serverless compute does Sonrai Analytics use on AWS?
-AWS serverless compute services, such as Lambda and Fargate, run code without requiring you to manage servers.
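To make that concrete, here is a minimal sketch of the serverless pattern described in the talk: a Lambda handler fired when a slide image lands in S3. The event parsing follows the standard S3 event notification shape, while `extract_metadata` is a hypothetical stand-in for the real metadata logic.

```python
import json
import os.path

def extract_metadata(bucket: str, key: str) -> dict:
    # Hypothetical stand-in: real code would open the pyramidal TIFF
    # and read its headers with a whole-slide-imaging library.
    return {"bucket": bucket, "key": key, "ext": os.path.splitext(key)[1]}

def handler(event: dict, context=None) -> dict:
    """Entry point for an S3-triggered AWS Lambda function."""
    results = [
        extract_metadata(
            record["s3"]["bucket"]["name"],
            record["s3"]["object"]["key"],  # URL-encoded in real events
        )
        for record in event.get("Records", [])
    ]
    return {"statusCode": 200, "body": json.dumps(results)}
```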
What data storage challenges does Sonrai Analytics face?
-Sonrai Analytics needs to manage petabytes of data efficiently while optimizing costs, so it uses S3 lifecycle management to move data between hot and cold storage.
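A minimal sketch of such a hot/cold policy with boto3, assuming an age-based rule (21 days) and hypothetical bucket and prefix names. Note that S3 lifecycle transitions are age-based; tiering driven by access patterns would use S3 Intelligent-Tiering instead.

```python
def hot_to_cold_rule(prefix: str, days: int = 21) -> dict:
    """One S3 lifecycle rule: after `days`, move objects under `prefix`
    from S3 Standard ('hot') to the Glacier storage class ('cold')."""
    return {
        "ID": f"cold-after-{days}-days",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }

def apply_rule(bucket: str, rule: dict) -> None:
    """Apply the rule; needs boto3 and AWS credentials, so not run here."""
    import boto3
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [rule]},
    )
```

Moving data back to hot is then a restore request (or a copy back to S3 Standard) when a pathologist or training job needs the image again.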
What is the goal of the AI algorithms Sonrai Analytics develops?
-Sonrai Analytics develops AI algorithms to speed up cancer treatment and assist pathologists, shortening the time to diagnosis and treatment for patients.
Outlines
🎤 Opening the presentation and introducing AWS certification
Jonah Craig opens the presentation by explaining the AWS certification challenge and why he took part. He introduces himself as a startup solutions architect in Ireland who works with customers such as Sonrai Analytics, which uses AWS technology to reduce cancer drug trial times. Jonah outlines the machine learning loop of data preparation, model training, model deployment, and monitoring, and stresses how efficiently Sonrai Analytics runs these processes.
🔬 Data science and AI product development at Sonrai Analytics
Dr. Matthew Alise, head of data science at Sonrai Analytics, describes how the company came to build its AI products. Originally a molecular biologist, he moved into bioinformatics as the demand for data analysis grew, and has been building AI products since joining Sonrai. Sonrai develops algorithms for precision medicine on AWS to support the development of cancer treatments, including computer vision AI that assists pathologists.
💊 Cancer treatment data analysis and the business challenge for AI
Matthew explains how cancer microscope slides are digitized and AI automates work that pathologists used to do by hand. Applying AI addresses the shortage and high cost of pathologists and supports the shift to digitized slides. Their strategy was to build a feature extractor from a minimal patient dataset, improve model performance through automated data preparation, and finally integrate the model into a workflow that clinicians can adopt.
🛠️ Building a data science foundation with AWS technology
Jared leads Sonrai's engineering team, building the data engineering and infrastructure. Sonrai provides a trusted research environment and handles patient data in line with regulations such as GDPR. The team uses AWS services to support complex, varied data types and to keep patients safe through traceable analysis and discovery.
🌐 Building a data architecture on AWS services
Jared explains how Sonrai builds its data architecture on AWS services to support analytics and AI/ML. Services such as AWS HealthOmics, Athena, and Glue let them process large volumes of raw data efficiently for analysis and visualization, while AWS serverless features provide flexible scaling and cost-efficient infrastructure.
🚀 Growing a startup on the AWS ecosystem
Jonah emphasizes that Sonrai Analytics focuses on product development by leaning on AWS managed services. Through AWS support and the Activate program, Sonrai has grown quickly and stayed competitive, and the team uses AWS documentation and training resources to tackle technical challenges and adopt new services rapidly.
🔮 Future outlook and the evolution of AWS services
Jared and Matthew describe Sonrai Analytics' next focus: AWS HealthOmics, to make raw data processing more efficient. They also plan to build foundation models and new algorithms that use large language models. As AWS services keep evolving, Sonrai aims for further technical innovation.
Keywords
💡AWS
💡Machine Learning
💡Sonrai Analytics
💡Data Scientist
💡SageMaker
💡Model Training
💡Data Preparation
💡Model Deployment
💡MLOps
💡Precision Medicine
Highlights
Introduction to the AWS certification challenge, encouraging people to pursue AWS certifications, including machine learning and solutions architecture.
Jonah Craig works as a startup solutions architect in Ireland, partnering with customers such as Sonrai Analytics.
Sonrai Analytics uses AWS technology to reduce cancer drug trial times and improve the efficiency of the healthcare system.
Jonah introduces the fundamentals of machine learning: data preparation, model training, deployment, and monitoring.
Overview of the Amazon SageMaker service and how it encapsulates the machine learning loop.
The challenges facing Sonrai Analytics: data storage, cost management, training their own foundation models, and scaling to meet business demand.
Introduction to AWS AI services such as AWS HealthOmics for the healthcare and life sciences space.
How SageMaker Studio supports the full process from data preparation to model deployment.
Jared and Matthew present Sonrai's work from the engineering and data science perspectives.
Matthew's background: his path from molecular biologist to data science.
How Sonrai Analytics was founded and its applications in precision medicine.
A use case in which computer vision helps pathologists analyze cancer tissue slides.
Technical details of using AWS services for data storage, processing, and model training.
Jared discusses the engineering challenges, including data engineering and infrastructure.
How AWS services are used to process and analyze large-scale biotech data.
How Sonrai uses AWS services to accelerate drug development and clinical trial workflows.
Jared and Matthew highlight the potential and future outlook of AWS services in healthcare.
A call for audience participation, including feedback on the talk and information about upcoming workshops.
Transcripts
Hello folks, can you hear me okay? If you can take a moment to get seated, we'll kick off our presentation today. The first thing I'll say is that wearing this jacket is not entirely my own choice: it's tied in with the AWS certification challenge. So if you're looking to educate yourself on AWS certifications, whether that's machine learning or solutions architecture, it's a really great way to learn hands-on how the AWS Cloud works.

So I'm going to kick things off. My name is Jonah Craig and I work as a startup solutions architect here in Ireland, and I have the absolute privilege of working with customers like Sonrai Analytics. We support startups whether they're two people in their parents' basement or garage, all the way up to startups who have scaled to 100 or 200-plus employees.
So it's really good fun, and again Sonrai have been on that journey, and it's been great to support them through last year and this year. We like to take the most cutting-edge work and really bring it to you today from an AI/ML perspective. I love working with Sonrai for two really key reasons. The first is that they're using AWS technology to reduce cancer drug trial times. I'm sure many of us in this room, including myself, have had cancer affect our lives in some shape or form, whether through family or friends, and I know every day when I have a meeting with Jared or Matt it's extremely motivating to get stuck in and help them grow, because they're bringing efficiencies into the healthcare system.
I also love working with them because, from a technology perspective, what they're doing is truly cutting edge. As a startup team you don't have unlimited resources, and you really need to leverage things like AWS managed services; I'm going to let them talk through that and break it down.
So my job today is to set a framework, if you like, for machine learning, and make it simple; you can use this framework as you listen to their side of the story. The machine learning loop, when we break it down to a foundational concept, starts with data preparation, arguably the most important part of any machine learning problem. It does not matter if you have the most cutting-edge algorithms: the data really is everything, so making sure you've got access to data and making it machine-learning friendly is key. We then move on to model training. This is where we select a model and start training it on the data we have. It's the expensive part of machine learning, so cost optimization really is a key part of this. Then, when we're ready and we've picked the optimum model, we can deploy it into production and monitor it. Here we can talk about things like model drift, which may mean that over time the data coming in is slightly different from what the model was trained on. And this is where the secret sauce comes in: orchestration, MLOps. Our Amazon SageMaker service encapsulates this loop, so if you're building a machine learning solution you can do all of this and chain it together. Sonrai again have these efficiencies in place, and what they can do with a small team is just breathtaking.

I'm going to outline as well the key high-level challenges for Sonrai. The first one is data storage. Sonrai Analytics are in the healthcare and life sciences space, and they're taking the foundation model trend and bringing it to this vertical, so they're dealing with petabytes of data; ensuring they've got an effective cost management system for that is crucial, and being able to handle that scale is key. They're also training and hosting their own foundation models, which again is extremely cutting edge; they're using services such as SageMaker to help with this, but I'm going to let the guys dive deep into how they're actually doing it. And then finally, scaling to meet business demand: whether you're a startup or an enterprise, you need to be able to scale your technical architecture to meet customer demand, and Sonrai can do that, whether it's a big enterprise client or a new startup joining their system, because they've architected it in the correct way.

So you'll hear a lot today about our AWS services, and today we're going to focus a bit on SageMaker, so keep that in mind through their presentation. We have a whole host of AI services; we've got AWS HealthOmics here, which applies mostly to the HCLS space, but no matter your vertical, whether you're in fintech or anything else, there are AI services there that can do a lot of the heavy lifting of AI in the back end. I just wanted to paint that picture in case you're not in the HCLS space, because there's a lot you can take away from our service stack.

Let's talk a little bit about SageMaker before I hand over. If you keep that machine learning loop in mind, you can see that SageMaker Studio takes care of everything from data preparation all the way through to the deployment of the model. What's really helpful is that it is designed for a whole host of people using the service, so ML engineers and data scientists can all collaborate in one place, which really helps efficiencies because you're not using, say, 20 third-party services for each part of the process. Again, the results can be like Sonrai and the way that they handle this from an AWS architecture standpoint.
I get asked all the time: this sounds great, but how do I actually get started with something hands-on that I can build for my own company? I always point people in the right direction: the best place to start is with a workshop. Get hands-on; this QR code will take you to the SageMaker immersion day, where we've got examples whether you want to experiment with SageMaker for the first time or build a pipeline and orchestrate a whole end-to-end solution. We then encourage you to build a POC on your side, and we have AWS Partners, whom I'm sure you've heard a lot about today, so don't be afraid to engage them: they've solved these problems before, and they can help you deliver and teach your team, so it's a really fast way to do it, and then straight into production. So I wanted to set the scene for Sonrai. I'm going to introduce Dr Matthew Alise, who is the head of data science. It's been a pleasure working with you, Matt, and I'll leave it to you.

Afternoon, guys, can you hear me? Great. So I'm going to set the scene a little bit more than what Jonah's done; thank you so much for the introduction and also the invitation to present. We're doing a double act today: I'm doing the data science and my colleague Jared is doing the engineering. We're both sides of the same coin, and I can't do my job without him and vice versa: I build the ML, abstracted away from the infrastructure, and Jared essentially does the plumbing for us so that we can get access to data and train our models. I'm going to tell you a bit about what we do, why we do it, and how we do it, but first I'll give you a little bit about myself and my path to data science. I actually originally started as a molecular biologist in the lab about 10 years ago, so I probably looked a little bit more like a mad scientist: not a gold jacket, but a long white one, with test tubes and mixing reagents, and I worked in a precision medicine lab. Back then there were a lot of new technologies for sequencing DNA, proteins, and so on, and what happened around 2015, which coincides with the advent of AI and Nvidia technologies, is that we were generating so much data that the bottleneck wasn't the generation, it was the analysis and the utilization of it. So I transitioned into what's called bioinformatics, which is really just a subset of data science within the health field. I joined Sonrai about five years ago, and we've been building AI products ever since.

So now we can dive into Sonrai. Sonrai is a Queen's University Belfast spin-out; it was a fairly long drive down today, up early, but it's great to be here. What we've developed is a number of different algorithms hosted on AWS platforms for our clients. Our clients are developing drugs to treat cancer. Within precision medicine, the important thing is who should get the drug: you don't give everyone the same drug, because drugs can be toxic and there's no point in giving a drug to a patient who won't respond. So we identify new markers to apply the right treatment, and we're helping develop diagnostics, digital diagnostics, using AI, ultimately to enable our clients to save patient lives.
So that's Sonrai. Why do we do it? Precision medicine includes a lot of different disease types, but one of the main ones we focus on is cancer. I don't think I need to tell everyone in the room that cancer is bad; it's very unpleasant. One in two people will develop cancer in their lifetime, and hundreds of thousands of people die across the UK and Ireland every year. But as I said on the previous slide, there are a lot of organizations developing therapies to help patients, and the amount of data they're generating in the lab is very amenable to AI. There's a lot of digital transformation happening across biotech and right through healthcare, and we're here to help. So I'm going to spotlight a use case focusing on computer vision, a little bit like the previous talk, but frame it in the business context and then dive into the AWS tech stack that we actually used, right from training through to inference. Then I'm going to pass over to my colleague Jared, who's going to spotlight some of the engineering challenges behind that; it's also going to tie in nicely with the training loop which Jonah highlighted.

So, this particular use case. I'll point to the images first. The top image there is actually a microscope glass slide of a patient's cancer; this is colon cancer. It's a glass slide, and for a long time it was what we refer to as analog: you looked down a microscope, the pathologist would say, okay, there's tumor; they would get a Sharpie pen, literally a Sharpie pen, draw on the glass slide, and say that's where you need to extract your DNA from, or your RNA, or whatever it is, so that you can run a test for your diagnosis. But we're now starting to scan these slides, so they're digitized. They're really large images, pyramidal TIFFs, as you can see from the image below, and because they're now digitized, they're amenable to AI, and we can essentially replace that Sharpie pen with our own AI-generated proposal, what's called a region of interest. This is important for a number of reasons: pathologists are scarce, there are actually fewer and fewer pathologists as time goes on, and they're very expensive, about $100 per slide, and you'll see that the AI we apply is incredibly cost effective. So that's the business challenge. We asked ourselves: can AI help with this? Spoiler alert: it can. The technical requirement is: can we train a model entirely within AWS to detect cancer from these large images and assist pathologists? Now, we work in a regulated environment; we're one of the few companies building AI as medical devices, so "assist" is an important word: we're not replacing pathologists. And the business output of this is to save millions in operational costs, if there are hundreds of thousands of those images and you're paying a pathologist $100 a go.

So what was the strategy? We started with the minimum number of patients we could really work with to generate a feature extractor and a model, so we had to ingest a large dataset: 2,000 patients, which is actually quite a lot. I know that in other domains that's a very small number, but 2,000 patients are actually quite hard to get hold of. Then we had to automate the data preparation, because this data came from the lab. Then we entered that train-and-evaluate loop and iterated, and once the model reached sufficient performance we could deploy it within a user workflow. That's the important part: a model on its own is not useful; it has to be in an adoptable workflow for clinicians to use. Ultimately, what we got out of this was a model which outperforms our competitors, which is great, with a scalable, cloud-native deployment, so we can actually scale to meet the needs of healthcare: it's not one patient at a time, we can scale using serverless compute to go across thousands of patients. The turnaround time Jared's actually going to speak to; we've got some really impressive metrics there, as well as the cost savings for clients. But we're at an AWS conference, so now I'm going to speak to some of the technology behind our training.
Remember, these are precision medicine labs, so the training data actually comes from these pathology instruments. We set up a managed SFTP endpoint, the data lands in S3, and then Jared's team have an event-driven architecture in place which uses Lambdas to extract the metadata from these really large images so that they can be viewed by pathologists. Our machine learning pipelines are a little bit different from other domains in the sense that pathologists have to be able to look at the images before they can go to the next stage, so a human in the loop is really important; you might need 10 pathologists, or a thousand pathologists, looking at different images. We're also going to talk through some of the really cool Lambda-based architecture for viewing these really big images. The next step on the data preparation side is tile extraction. You saw tiles within those images: you cannot pass these single large images through a GPU, they're just too big, so what you need to do is tile them up, or chunk them up. The pipelines we originally started with on premise were incredibly slow; with cloud-native technology you can instantiate thousands of instances so that it's done in hours instead of days or even months. That's the data preparation side: the data lands on S3, and we use Fargate tasks orchestrated by ECS to do that. Then the data is ready for us to start training and the ML team can get stuck in. We have our Docker image for training parked on ECR, and we use SageMaker for our training jobs.
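A SageMaker training job of the kind just described, a custom Docker image from ECR reading tiles from S3 and writing the model artifact back to S3, can be sketched with the low-level boto3 API. Every name, ARN, URI, and instance type below is a hypothetical placeholder, not Sonrai's configuration.

```python
def training_job_request(job_name: str, image_uri: str, role_arn: str,
                         train_s3: str, output_s3: str) -> dict:
    """Assemble a boto3 `create_training_job` request for a custom
    training image hosted on ECR, reading from and writing back to S3."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,      # custom Docker image on ECR
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3,           # extracted tiles land here
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3},  # model artifact
        "ResourceConfig": {
            "InstanceType": "ml.g4dn.xlarge",  # a GPU type; actual choice varies
            "InstanceCount": 1,
            "VolumeSizeInGB": 100,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 24 * 3600},
    }

def launch(request: dict) -> None:
    """Start the job; needs boto3 and AWS credentials, so not run here."""
    import boto3
    boto3.client("sagemaker").create_training_job(**request)
```

The API call is wrapped in `launch` only so the sketch can be read and tested without AWS credentials.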
We did have an on-premise solution which we utilized a lot, but I don't think I need to tell people why using the cloud is really important for scalability. It means we didn't have queues of machine learning engineers waiting for GPU access, and at a startup that's really important. We can also scale down our use of compute as needed, we can scale our experimentation, we get our metrics, and we output our model back to S3. One of the things we're looking to do is move to the model registry that's part of SageMaker; we have our own artifact repository on S3 to handle the nuances of what we do, and because we build medical devices it's just about timing to move towards embracing some of the new features in SageMaker. On model performance: once it's reached that stage, there is a lot of rigor that goes into training some of these models. We follow the FDA's GMLP, good machine learning practice: we have to involve people right from oncologists to machine learning engineers, doctors, even nurses and survivors where we can, to input on this and sign off the performance of the model. Once that's done, we can start to get ready for production, and this is where our inference comes into play. Our S3 model artifact, approved for production, is baked into an inference container. It sounds like common sense, but we are not allowed to have any training logic in our inference containers whatsoever. Most people would not, but we're actually mandated by the regulatory bodies that that cannot happen: there can be no superfluous logic in the inference containers, so we have a separate one, hosted on ECR. It's built, and that's when, from my perspective, our team starts to think about handing over to Jared, where his team will ship this out. CDK is used for the infrastructure; we have blue/green deployment, so we can bring one application down and wait for the other; and we use serverless compute, so Lambda triggers and Fargate tasks orchestrated by ECS. We are going to be moving to inference on SageMaker: those tasks do not have GPUs yet, and we want GPUs so we can further improve the turnaround time, so SageMaker is what we'll be doing for inference, as our serverless compute as well. The final part is monitoring, and that's very important to me too. I've greatly simplified all of the infrastructure in place for that; it's not just CloudWatch, but that's a really good place for us to home in on. We need to understand the usage patterns of our algorithms, and Jonah mentioned model drift metrics: if we start to see a jump from 20% cancer incidence to 40%, it might tell us there's something wrong, and we need to go to the lab and understand if something has changed with the data that's being produced. All of these insights allow for the next iterations and training a more robust model. So that's a really good segue for me to hand over to Jared, my counterpart, who's going to talk about the engineering challenges around what we do.

Thank you Matt, thank you Jonah. I'm Jared, and I lead the engineering team: a team of cloud engineers, full-stack engineers, and quality assurance engineers. As Matt said, we basically handle the plumbing, which is the data engineering and the infrastructure that underpins the AI and machine learning analysis.
I'm going to spotlight how we use AWS at a high level. On what we provide to our customers: we build our own algorithms, but our main job is actually giving pharma and biotech the tools for them to develop drugs, to develop new treatments. Our primary use case that we deploy is called the trusted research environment. That is a prepackaged cloud environment that allows them to have all their tools, all their data, and all their users in a regulated environment. That's really important: Sonrai's infrastructure is heavily dictated by things like GDPR and how we handle patient data, and it was a big reason why we picked AWS, so that we can use certified data centers where we can trust that our data will be protected.

Our customers are seeking to discover new cancer drugs, new treatments. They want to develop these models and algorithms, and they typically want to take their data assets through the clinical trial so they can get to market, and we cover the full end-to-end life cycle of that. This means handling three key areas. On the discovery side, it's about how we actually analyze the complex data. It can be really big and really unwieldy, and unlike business data it's way more varied: it tends to be messy, and you tend to have very niche modalities that require completely different tools. This forced us to look at a data mesh architecture and at a variety of different AWS services that suit each individual data type. For example, AWS HealthOmics is very good at handling bulky data directly from lab instruments, and it can use popular frameworks such as Nextflow, which is really important for handling all of the different types of data that can come out, whether it's tabular, genomic, protein, or imaging-based data. We use Athena as an extremely powerful serverless query engine that we can build on to run our analytics and our visualizations, and we can use Glue to handle things like ETL, converting CSV to Parquet. Typically, as a general process, we need to take large, bulky data, which might for example be 100 gigabytes of raw data, turn it into 40 terabytes of working data, and then end up with surprisingly small CSVs and Parquet files, the resolved data where you can make analytics and findings. But you need to link it all together: you need traceability, and you basically need auditing, so that we can know the full history of what happened to this data and know the proper process was carried out, because ultimately patient safety is at risk.

After we make our discoveries and handle our data, we now want to develop it. Sonrai and our customers make a lot of use of SageMaker and SageMaker notebooks, and we can use that to hook directly into other AWS services, including running LLMs, but our primary use is spinning up applications, training and tuning models, and preparing new drugs, treatments, or algorithms for eventual clinical release. On the deployment side, we have two use cases. Our customers need to deploy the things they build, so we can give them an operational window where they can use things like API Gateway, Docker, and AppSync to spin up their compute and connect their data sources. But also, on a very practical level, we need to deploy across the world; we need a partner like AWS with data centers that we can trust. We can use AWS Organizations to create a separate account for each and every client: it's fully segregated, we do not share anything, there's no risk of databases crossing over or people sharing compute; every single customer gets a completely different instance. And we can use Control Tower to completely automate the deployment of it: we can monitor remotely and roll out updates. We also use CDK, the Cloud Development Kit, which lets us treat the cloud infrastructure the way we treat our JavaScript code: it's completely peer-reviewed, completely controlled, and we have absolute certainty of what version is going out and that each customer is using the correct version.

Spotlighting some interesting challenges, and I think Emily did a great job pointing to this earlier as well, there are going to be a lot of things that resonate with us. I'm going to walk through a real use case with a client who was looking to hold a petabyte, a thousand terabytes, of imaging data. These images are about 5 to 12 GB each, and when you extract them they're more like 50 GB, so they get very unwieldy very quickly. The big challenge with this type of data is that you need both high availability and low cost, and those are typically at opposite ends of the spectrum. The way we handled that was to use S3's native lifecycle management, so we're able, very easily, both via APIs and by automatically triggering after, say, three weeks of inactivity, to cycle data between what we call hot and cold. Hot is S3 Standard, for when a pathologist needs to view the image, run model training, or execute algorithms, so we need it highly available. Cold is like having cold storage, where maybe you've already processed the data but legally you need to hold it, or maybe you're not ready to analyze it and want to batch it later. A good way we like to think about this is the manufacturing term just-in-time: when you have a physical warehouse, you don't want to hold inventory for the sake of it; you want to be very efficient about when you bring it in and when you load it out, and that's how you can make a significant amount of cost savings. In this case alone, this led to 600,000 in savings just from holding the data. One of the first surprising things when I joined Sonrai was that storage was going to be a bigger cost problem than compute, which I really did not see coming, but once you get to this scale it really shows. And interestingly, we could have made more cost savings, because we could have got it down to 25 or 50 terabytes hot, and again that would have had a dramatic impact.

Another interesting challenge Sonrai faced when we were building our AI, our soon-to-be-certified medical devices, is that we were using an on-premise server, as Matt mentioned. In order to handle this data we need to pull the individual SVS or TIFF files, extract the metadata, pull out 500,000 to 600,000 individual tiles, store them, and prepare them for training. It's just very computationally intensive, and we had a pretty expensive on-premise server, but ultimately we were bottlenecked by CPUs and GPUs, and based on the current cadence we were looking at a six-month delay just to get the data ready, just to do the data preparation stage. To get around this we created an MVP cloud-native tile extractor. It was written in Rust, and it used Lambdas and Fargate, and the end result was that we basically just dumped all of the images into one S3 bucket, set up an automation policy to spin up an individual task per image, and six months became six hours. It iterated through the full list and gave us everything we needed, and on a practical level for Sonrai it was the difference between meeting the deadline and failing. It also means that we and our customers can now build these things 6 to 12 months faster, and this scales with the data, so the more data you have, the more important this is. It was also incredibly cost effective: way cheaper than the cost of the electricity would have been to run our server.

Another interesting challenge Matt mentioned is that we have to keep the human in the loop, so we can't just hold this data behind the scenes. It would be a lot cheaper and more efficient if it was just spinning up boxes in the back, but we actually need to make it available. Matt mentioned working in a lab; we essentially provide a digital microscope, so we need to make those images available in a web browser, and if you can imagine, from the image, you can zoom in and out and pan left, right, up, and down.
and when you're doing that you are
essentially trying to view 50 gab worth
of Imaging data there's about 5 or
600,000 images they're about 80
kilobytes each we're also trying to
avoid not having to store an extra 50 CU
then that one pyte now becomes 10 pyte
when the time you fully extracted so
what we did was create a cloud native
image viewer where we spun up thousands
of Lambda indications we linked it up
with appsync for graph query to a
JavaScript front end and essentially we
are lazy loading and rendering ing as we
spin up a Lambda it reaches in to the
archive format pulls out the tile that
we need we're not technically holding N3
storage we're not even paying S3 cost in
that because it's just a a temporary
transfer in a live session and the
result is is pretty much instantaneous I
can pan around I'm lazy loading as I
move around the screen I'm only using
the data that I need it's entirely
serverless it's it's it's instantaneous
as well and uh it's also extremely cost
effective we're talking fractions of a
penny for things like this to run over a
couple of hours
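As a rough sketch of that lazy-loading path (not Sonrai's actual code): each Lambda invocation maps a tile coordinate to a byte range inside the archive and issues a single ranged S3 GET, so only the roughly 80 KB tile in view is ever transferred, and nothing extracted is ever written back to storage. The grid width, bucket name, and flat archive layout below are all assumptions for illustration.

```python
TILE_BYTES = 80 * 1024   # ~80 KB per tile, as described in the talk
TILES_PER_ROW = 768      # assumed grid width (~590,000 tiles in total)

def tile_byte_range(col, row, tiles_per_row=TILES_PER_ROW, tile_bytes=TILE_BYTES):
    """Map a (col, row) tile coordinate to a byte range in a flat tile archive."""
    index = row * tiles_per_row + col
    start = index * tile_bytes
    return start, start + tile_bytes - 1

def handler(event, context=None, s3_client=None):
    """Hypothetical Lambda entry point: serve exactly one tile per invocation."""
    col, row = event["col"], event["row"]
    start, end = tile_byte_range(col, row)
    header = f"bytes={start}-{end}"
    if s3_client is None:
        # No AWS client injected (e.g. local testing): just report the range.
        return {"statusCode": 200, "range": header}
    # Ranged GET: S3 returns only this slice of the archive object,
    # so the full 50 GB slide is never downloaded or re-stored.
    obj = s3_client.get_object(
        Bucket="example-slide-archive",   # hypothetical bucket
        Key=event["slide_key"],
        Range=header,
    )
    return {"statusCode": 200, "body": obj["Body"].read()}
```

A front end behind AppSync would call this once per visible tile; panning simply triggers more invocations for the newly exposed coordinates.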
And a big thing for Sonrai as well was that a lot of open source solutions and competitors, to get around this big data storage problem, sacrifice image quality: they heavily compress the images, because how else do you get that amount of data onto a server? We were able to completely avoid that. We don't compress at all; we use the full original-size image, and for us that is potentially the difference between an AI correctly estimating whether you have lung cancer or not. So we sacrificed no quality, but we got instantaneous speed and we avoided serious storage costs.

This now loops back to Matthew's key point about why we do this. I've worked in healthcare a long time; I used to build medical device software in hospitals as well. Healthcare basically has a scalability problem. In a nutshell: there are constantly more and more patients, there are fewer doctors, fewer nurses, fewer pathologists, and there's less money to go around per patient, and that is not going to get better any time soon. So lots of companies like ourselves are looking at how we digitize this, how we use AI and cloud to automate. A very telling example is the MSI algorithm that we built to detect colon cancer. The current gold standard takes about two weeks in a lab; it goes through multiple people and there are lots of physical processes involved. We can do the equivalent in three minutes, and in fact, in the space of time that I've been speaking to you, we could have done the entirety of Ireland, and it would have been surprisingly cost effective: we would have just run millions of samples, testing 100,000 at a time, and AWS would have handled it with absolutely no problem.
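The "run millions, testing 100,000 at a time" pattern maps naturally onto something like SageMaker batch transform, which fans inference out over many instances. Here is a hedged sketch of such a job request; the model name, buckets, and instance settings are illustrative assumptions, not Sonrai's actual MSI deployment.

```python
def msi_transform_request(batch_id, instance_count=10):
    """Build a SageMaker CreateTransformJob request for one batch of samples."""
    return {
        "TransformJobName": f"msi-batch-{batch_id}",
        "ModelName": "msi-colon-cancer-model",   # hypothetical deployed model
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    # Hypothetical prefix holding this batch's inputs:
                    "S3Uri": f"s3://example-slides/batch-{batch_id}/",
                }
            },
            "SplitType": "Line",   # one record per line in each input file
        },
        "TransformOutput": {
            "S3OutputPath": f"s3://example-results/batch-{batch_id}/",
        },
        "TransformResources": {
            "InstanceType": "ml.g4dn.xlarge",
            "InstanceCount": instance_count,   # fan the batch out in parallel
        },
    }

# In production (assumption):
# boto3.client("sagemaker").create_transform_job(**msi_transform_request("0001"))
```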
Also, in terms of the practical stuff we mentioned: pathologists need access to a thousand terabytes, petabytes of data, at a time. The current reality is that they're passing around physical hard drives, losing data, and struggling to find things. We can hold that data and make it highly available, and it means the pathologist can go through more cases, which means more patients can get a diagnosis. And of course, everything we do has to be regulated and governed, and it's all about patient safety. We are an ISO 13485 certified company, so we've had external guidance on how we do machine learning, what our practices are, and that we are prioritizing patient safety and positive cancer outcomes. And I want to talk a bit
about AWS and why we chose them. We're a very young startup, five to six years old, so we had a fresh chance to look at providers, and one of the big things that appealed to us was that AWS has a massive ecosystem. There are so many services we can latch on to, and as Jonah mentioned at the start of the talk, we're a small but ambitious team: we can't boil the ocean. We don't want to build things we don't need to build; we want to build the things we care about. For example, we didn't need to build our own authentication system because we can use Cognito, and we didn't need to build ways to manage certain infrastructure when we can use Athena and HealthOmics. This allows us to be really precious with our time, and it's ultimately how we compete against bigger companies: we pick the things that matter.

Another key thing I mentioned is that we deploy across the world, so having a large cloud provider with respected data centers, where we can reliably ship out the same product, is just fundamentally required. The flexibility of AWS is also really useful: there are always a hundred ways to do things, and it gives us the freedom to build the things we want. I also can't speak highly enough about the support we've had from Jonah, Sinon, and other members of the AWS team; we torture them regularly. We've had access to training for SageMaker, we've had beta access to HealthOmics, and it allows us to get a competitive head start. Whenever we're having trouble, product specialists get brought in who can help us identify whether our cyber security is up to spec and make sure we're doing things the right way. It just allows us to move fast and build fast, which we have to do; we have no choice but to move quickly in the market.

In order to get to where we are today, we took advantage of AWS Activate. In the early days it gave us the ability to explore and create POCs and minimum viable products; it meant we could afford to experiment and get something in front of customers. It was really critical to our early business days. In a similar vein, we can look at Marketplace as a way to ask: do we need to build this, or can we buy it? Can I take something off the shelf? Can I deal with this awkward customer request? Is this what Sonrai really wants to do? If not, let's just find it and slot it in, and that way we get to keep moving and protect our road map, which again is really important for a company of our size. And lastly, a probably fairly undervalued statement: AWS does tend to have really good documentation. We're constantly trying new things and hitting our heads against the wall, so being able to access blogs, videos, and training material is the difference between meeting a deadline and not.

So, final slide for me: I just
want to talk about the future, and there are some things we've already pointed to. HealthOmics is probably the biggest focus for us this year. We've already got it up and running, and we've had a lot of great support from the AWS team on it. As mentioned earlier, it's for handling the bulky raw data that tends to come from instruments, which is a really big problem for our customers; in practice you can be dealing with gigabytes of data that become terabytes when you're using it. We actually went through three different prototypes. It's still early days, but we went with AWS Batch originally, which is the route most commonly recommended in the open-source community, and for the exact same run the cost was $2,000, while HealthOmics came in at $22. That's because of things like dedicated Omics compute, and because you're avoiding networking and data costs with NAT gateways, so there are lots of advantages. There are also security and infrastructure benefits: we don't have to manage infrastructure, we can rely upon APIs, and it just makes our life easier.

Thankfully, I don't have to explain foundation models, because Emily gave a fantastic presentation on them, but this is something Sonrai very much wants to double down on.
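To make the AWS Batch versus HealthOmics comparison above a little more concrete before moving on: starting a HealthOmics workflow run is a single API call against managed compute, rather than infrastructure you operate yourself. The sketch below only builds the request; every identifier in it (workflow ID, role ARN, S3 URIs, parameter names) is a hypothetical placeholder, not Sonrai's configuration, and a real call needs AWS credentials.

```python
def omics_run_request(workflow_id, input_fastq_uri, output_uri, role_arn):
    """Build an AWS HealthOmics StartRun request for a private workflow."""
    return {
        "workflowId": workflow_id,
        "workflowType": "PRIVATE",       # a customer-owned workflow
        "roleArn": role_arn,             # IAM role HealthOmics assumes
        "parameters": {"input_fastq": input_fastq_uri},  # workflow-defined
        "outputUri": output_uri,         # where run outputs are written
        "name": f"run-{workflow_id}",
    }

# In production (assumption):
# boto3.client("omics").start_run(**omics_run_request(
#     "1234567",
#     "s3://example-raw/sample1.fastq.gz",
#     "s3://example-results/runs/",
#     "arn:aws:iam::111122223333:role/OmicsRunRole",
# ))
```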
Imaging is very complicated. I mentioned the expense issue with data storage and how computation-intensive tile extraction is; that now gives us a foundation on which to build a foundation model. It means we can potentially develop this as a way to rapidly build new cancer diagnosis algorithms and to make them what we call multi-indicator based, so that we can more easily identify other kinds of cancer in the body, and allow our customers to build these things as well. And last but not least, everyone on the planet has been talking about large language models. Sonrai has used them too, to try and boost discovery, and we are working with our own clients to see how we can link our APIs into Bedrock so that customers can get new drugs to market quicker. That's our purpose: to constantly speed things up, reduce the barriers, and overcome the technical hurdles. Thank you very much; I'm going to pass back to Jonah.
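As a closing aside on the Bedrock integration just mentioned: linking an API into Bedrock ultimately comes down to an InvokeModel call. The sketch below only assembles the request; the model ID and prompt are illustrative, and an actual call needs the boto3 bedrock-runtime client plus model access in the account.

```python
import json

def bedrock_invoke_request(prompt,
                           model_id="anthropic.claude-3-haiku-20240307-v1:0"):
    """Build an InvokeModel request in the Anthropic-on-Bedrock messages format."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }
    return {"modelId": model_id, "body": json.dumps(body)}

# In production (assumption):
# client = boto3.client("bedrock-runtime")
# resp = client.invoke_model(**bedrock_invoke_request(
#     "Summarise the key findings in this assay report: ..."))
# answer = json.loads(resp["body"].read())
```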
[Applause]
Jared, Matthew: absolutely incredible. Just seeing what you've achieved with the team is brilliant; every month I check in with you folks it's a new story, and you've progressed something that might take, you know, a year. So really, really appreciate it. And definitely note what Sonrai Analytics do extremely well: they really lean into the AWS managed services, SageMaker, Lambda, Fargate. What that means is they can focus their time on developing their own product and business needs, and spend less time managing infrastructure, and you can see that in the way they're going. Again, we talked a little bit about the AI services; check them out. There are loads of general ones, there are verticalized ones, and there's a whole stack you can easily find on the AWS website.

I have one final ask for the audience: it takes about 30 seconds. If you wouldn't mind scanning the QR code, it helps speakers such as myself, Jared, and Matthew talk at these events, so we'd really appreciate the feedback from yourselves. On a final note, from a startup perspective, if anyone wants to get hands-on with some generative AI workshops: we are running a generative AI workshop in Dublin on the 22nd of May, where you can bring yourself and a team and we're going to build a generative AI chatbot. If you want to get involved with that, you can find me after the talk. Jared, Matthew, and myself are going to be at the Startup Loft over there, so when you go out the door, turn right. We'll take any questions if you have any, and if you want to learn from the two of them or myself, we'd be really happy to dive right into those conversations. We'll be there for about half an hour. Thank you so much for being a great audience.
[Applause]