10-Minute Tutorial: Patient-Level Prediction or "PLP" (Jenna Reps)
Summary
TLDR: The video introduces the OHDSI GitHub repository for the PatientLevelPrediction (PLP) package, a tool for developing and validating predictive models in healthcare. It covers the basics of prediction, focusing on disease onset, progression, and treatment response. The video offers a demo using the Eunomia package, showcasing how to design models, select features, and perform internal validation. It also highlights customization options and a new Shiny app for visualizing model results.
Takeaways
- The video introduces the OHDSI GitHub repository for PatientLevelPrediction, a package for developing and validating prediction models.
- The purpose of the package is to predict the risk of future outcomes based on current data, focusing mainly on prognostic models, though it can also support diagnostic models.
- The 'index' concept is discussed: a point in time such as a doctor's visit, the start of pregnancy, or the end of pregnancy, from which predictions can be made about future health outcomes.
- The package learns patterns from medical records to predict future risks, such as the likelihood of a stroke within a certain timeframe.
- It is clarified that the package is not for causal inference but for personalizing risk predictions for future or current outcomes.
- The five main categories of prediction are disease onset/progression, treatment choice, treatment response, safety, and treatment adherence.
- Creating a model design involves specifying the target population, the outcome, and the time at risk for predictions.
- Customization options are available for feature engineering, sampling, pre-processing, and model fitting within the package.
- The importance of data splitting for model development and performance estimation is emphasized, with options for customizing the split process.
- The demonstration uses the Eunomia package, which contains data for testing and running PatientLevelPrediction without needing external data.
- The video concludes with an overview of a new Shiny app interface for viewing model results, diagnostics, and performance metrics such as ROC and calibration plots.
Q & A
What is the main purpose of the PatientLevelPrediction package discussed in the video?
-The PatientLevelPrediction package is designed for developing and validating prediction models that estimate the risk of future outcomes based on baseline features extracted from medical records.
What types of prediction models does the package support?
-The package supports both classification and survival models, focusing on personalized risk prediction for outcomes such as disease onset, progression, treatment choice, treatment response, and safety.
What is the significance of the 'index' in the context of the script?
-The 'index' refers to a specific point in time, such as a patient's visit to a doctor or the start of a condition, from which baseline features are considered for predicting future outcomes.
How does the package handle the prediction of disease onset or progression?
-For disease onset or progression, the package considers the target population, the outcome to be predicted, and the time at risk. It uses data from medical records up to the index to predict the risk of a future event within a specified time frame.
What customization options does the package offer for model development?
-The package allows for customization in various stages, including feature engineering, sampling, pre-processing, model fitting, and data splitting for internal validation. Users can write custom code to tailor these processes to their specific needs.
What is the role of the 'model design' in the package?
-The 'model design' specifies the components necessary for prediction, including the target population, the outcome, and the time at risk. It guides the package in creating a prediction model tailored to the user's specific requirements.
How does the package handle feature extraction from medical records?
-Users can specify what features or covariates they want to extract from the data, such as demographics, gender, age groups, recent drug usage, and conditions. The package also supports the use of vocabulary hierarchies for grouping these features.
What is the Eunomia package mentioned in the video, and how is it used?
-The Eunomia package provides a dataset for testing and demonstration purposes. It allows users to run PatientLevelPrediction without their own data, helping them understand how the package functions.
What is the purpose of the Shiny app interface mentioned in the video?
-The Shiny app interface is a new feature that provides an interactive way to view and analyze the results of the prediction models. It allows users to explore model diagnostics, performance summaries, and various plots for a deeper understanding of the model outcomes.
How does the package handle the storage and retrieval of model results?
-The package uses an SQLite database to store all the models and their results. This allows for easy retrieval and comparison of different models and their performances.
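As a generic illustration of why a single SQLite store makes comparing models easy, here is a minimal Python sketch. The toy `performance` table and its columns are made up for illustration; the actual schema PLP writes is different:

```python
import sqlite3

# In-memory database standing in for the results store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE performance (model_id INTEGER, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO performance VALUES (?, ?, ?)",
    [(1, "AUROC", 0.72), (2, "AUROC", 0.68), (1, "calibration_slope", 0.95)],
)

# Retrieve and rank all models on a shared metric with one query.
rows = conn.execute(
    "SELECT model_id, value FROM performance WHERE metric = 'AUROC' ORDER BY value DESC"
).fetchall()
print(rows)  # [(1, 0.72), (2, 0.68)]
```

Because every model's results land in one database, cross-model comparisons reduce to queries like this rather than loading separate result files.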
What kind of diagnostics does the package provide to assess the model design and data?
-The package provides diagnostics to check for any issues in the model design and the data being used. This helps ensure that the models are developed correctly and that the data is suitable for the intended predictions.
Outlines
Introduction to the PatientLevelPrediction Package
The video introduces the OHDSI GitHub repository for the PatientLevelPrediction package, a tool for developing and validating predictive models in healthcare. It explains the purpose of the package: answering questions about predicting future outcomes based on data recorded up to an index point. The speaker outlines the types of prediction questions, such as disease onset, progression, treatment choice, treatment response, and safety, and emphasizes that the package is not for causal inference but for personalizing risk predictions. The video also mentions recent updates to the package and gives a brief overview of how it can be used to predict risks, such as the chance of having a stroke within a year, given a patient's medical records.
Customizing Prediction Models with Model Design
This paragraph delves into the technical aspects of creating a model design for predictions, which involves specifying the target population, outcome, and time at risk. It discusses how to define the target population using a cohort ID and how to set the outcome and time span for predictions. The script explains additional settings like restricting the data to certain time points or sampling a subset of the data for efficiency. It also covers how to specify features or covariates for the model, including demographic information, drug history, and conditions. The paragraph further explains the options for feature engineering, sampling techniques, pre-processing steps, model selection, and data splitting for model development and validation.
Fitting Models and Internal Validation Process
The final section describes the process of fitting models and conducting internal validation using the PatientLevelPrediction package. It details the steps to run multiple prediction models by specifying database details, model designs, and cohort definitions. The outputs include data objects, model fit results, diagnostics for model checking, and an SQLite database storing all models. The speaker also gives a quick overview of a new Shiny app interface for exploring model results, including diagnostics, performance summaries, and various plots for model evaluation, before ending the demonstration due to time constraints.
Keywords
OHDSI GitHub Repository
Prediction Models
Prognostic Models
Index
Cohort
Outcome
Time at Risk
Feature Engineering
Model Design
Internal Validation
Shiny App
Highlights
Introduction to the OHDSI GitHub repository for the PatientLevelPrediction package, a tool for developing and validating prediction models.
Explanation of the purpose of the PatientLevelPrediction package in creating models to predict the risk of future outcomes based on baseline features.
Differentiation between prognostic models and diagnostic models, with diagnostic models predicting outcomes at a current point in time.
Description of the index point in healthcare, such as the start of a condition or treatment, as a reference for risk prediction.
The process of learning patterns from medical records to predict future outcomes, such as the risk of stroke.
Clarification that the package is not for causal inference but for personalizing risk predictions.
Introduction of the five main categories of prediction: disease onset/progression, treatment choice, treatment response, safety, and adherence.
Demonstration of using the Eunomia package for testing and running PatientLevelPrediction without requiring your own data.
Explanation of creating a model design with target population, outcome, and time span for prediction within the package.
Details on how to specify features and covariates for patient prediction models, including demographic information and medical history.
Inclusion of feature engineering capabilities within the package, allowing for custom code integration.
Options for data sampling and pre-processing, such as normalization and removal of redundant features, within the model design.
Support for both classification and survival models in the PatientLevelPrediction package.
Customization options for model fitting, including hyperparameter search and default settings.
Process for splitting data into development and validation sets for internal validation of models.
Overview of the Shiny app interface for viewing and exploring model results, diagnostics, and performance.
Final demonstration of the PatientLevelPrediction package's output, including personalized risk predictions and model performance metrics.
Transcripts
This is the OHDSI GitHub repository for the PatientLevelPrediction package. This is the package that's available for developing prediction models and validating them. I'm going to run through quickly what prediction is and the types of questions we can answer, and then I'll give you a bit of a demo of the package and the latest code in it. It's been getting updated quite a lot recently, so some of it may look a little different even to people who've been using this code.

The question is how to come up with models that can predict the risk of some future outcome. These are mainly prognostic models, which is what I tend to focus on, but the package can also be useful for diagnostic models, where you're trying to predict at a current point in time. Generally what happens is you've got some index, and this is a point where maybe someone comes in to visit their doctor, or maybe they start or end a pregnancy, or they have some sort of condition. So there are different points over time, when you're interacting with the healthcare system, where you may want to try to predict someone's risk of some future outcome given some baseline features. If we can look at their medical records at time zero and see what's been recorded, can we learn patterns in the conditions, drugs, procedures, observations and devices they've had at any time up to the index, to predict the risk of some future outcome? The idea is that if I see my doctor at a certain point in time, they can look at my records and say, "you've got a risk of two and a half percent of having a stroke in the next year." This is the type of question that prediction answers. If you're trying to do causal inference, then you're in the wrong package; but if you want to try to personalize a risk of some future outcome, or some current outcome, then patient-level prediction is the thing for you.
There are five main categories of prediction that we tend to focus on. The most common one, which I mentioned earlier, is disease onset and progression. For every single prediction you're going to have three components: the things in green, which is the target population; the things in red, which is the outcome you want to predict; and the things in purple, which is the time span during which you're trying to predict the outcome occurring. The target population is who you want to do the prediction for; for disease onset it may be people who have some new disease. The outcome is what you're trying to predict. And the time at risk is when you want to predict it: it may be within a year, for example, or within the next three years of having some illness.

Treatment choice is another prediction we see used pretty often when we do propensity score modelling. This is saying: within a target population of drug one or drug two users, which people have drug one on day zero? So the target population is the union of the two drug-user groups, the outcome is just one of the drugs, and the time at risk is time zero. Treatment response is basically: among people who have some chronically used drug, can we predict some desired outcome in some future time period, i.e. are people going to respond well to the drug treatment? Treatment safety: within people who have a drug, can we predict some known adverse event within some time period, for example while they're taking the drug? And then we can also look at treatment adherence: within people who are given some drug, who stays on the drug and takes it as recommended during some time period? So these are the different prediction questions we can answer. Having described what prediction is, let's now get into what it looks like in the code.
I'm going to be demonstrating this with the Eunomia package. This is a package that has some data in it that you can use to test and see how things work, so even if you don't have data available to you, you can still test and run the package and see how it works. If you do have data, you could point this towards your data and run some models using the real data. If I load Eunomia, it's basically going to create four different cohorts, three of which are target cohorts, I believe, and one is an outcome: it's created new drug user cohorts, people who have a GI bleed, and another drug cohort. I'm going to be running this demonstration predicting gastrointestinal bleed for new users of these various drugs.

The way we've set up PatientLevelPrediction now, you need to create a model design for your prediction. The model design has the three components I mentioned. The target population, who you want to predict for: this is an ID for a cohort, and generally your cohort will be generated in ATLAS, but there's basically a table somewhere that says "here's the set of people who have this cohort definition ID" and gives you an index time. Then we have an outcome ID: you can see here the outcome ID corresponds to three, which is the GI bleed, and the target ID is four, which is the NSAIDs. So that's the target population and the outcome. And although I mentioned that you need a time at risk when you're trying to predict, that's actually specified in the population settings. In the population settings we're saying that we're going to be doing the prediction from the day they had the NSAID to 365 days after that, so within a year of the NSAID.
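The time-at-risk logic above amounts to labelling each patient in the target cohort by whether the outcome occurs within a window after their index date. The package itself is in R; as a language-agnostic sketch of the idea (the function name and cohort structure here are illustrative, not the PLP API):

```python
from datetime import date, timedelta

def label_within_tar(index_date, outcome_dates, tar_start=1, tar_end=365):
    """Return 1 if any outcome falls inside the time-at-risk window
    [index + tar_start days, index + tar_end days], else 0."""
    lo = index_date + timedelta(days=tar_start)
    hi = index_date + timedelta(days=tar_end)
    return int(any(lo <= d <= hi for d in outcome_dates))

# A patient indexed on 2020-01-01 with a bleed 90 days later is a case:
print(label_within_tar(date(2020, 1, 1), [date(2020, 3, 31)]))  # 1
# An outcome two years later falls outside the one-year window:
print(label_within_tar(date(2020, 1, 1), [date(2022, 1, 1)]))   # 0
```

These binary labels, together with the baseline covariates, are what the classification models are fitted on.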
Then we also have a few extra things I didn't really mention. One is the restrict PLP data settings. This lets you, for example, sample patients: if in your dataset you had 10 million people who had the NSAIDs, you may not want to extract all of that because it's going to take some time, so here you could pick a number and say "I only want to sample a million people", and that will randomly sample a million people from the 10 million and make it quicker to download. You can also specify time points for the study: you may only want to look within 2020, or within 2019 to 2021, so you can specify start and end dates here to restrict that. And then we've also got settings where you can say you only want to look at the first exposure to the drug, or that you want people to have at least 365 days of prior observation; you can specify that in these restrict settings.
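As a generic illustration of what those restrict settings do (this is not the PLP API; the toy cohort structure is made up), you can think of them as filters plus a random subsample applied before feature extraction:

```python
import random
from datetime import date, timedelta

# A toy cohort: (person_id, index_date, days_of_prior_observation).
cohort = [
    (i, date(2019, 1, 1) + timedelta(days=i % 1000), 100 + (i % 600))
    for i in range(10_000)
]

# Restrict to index dates in a study window, with >= 365 days prior observation.
restricted = [
    row for row in cohort
    if date(2019, 1, 1) <= row[1] <= date(2021, 12, 31) and row[2] >= 365
]

# Randomly subsample to cap extraction cost.
random.seed(42)
sample = random.sample(restricted, k=min(1_000, len(restricted)))
print(len(sample))  # 1000
```

The key point is that these restrictions happen up front, so only the smaller, eligible cohort is ever extracted from the database.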
There's also the option of doing further restrictions. There's some redundancy here, but the same things, like first-exposure-only and some other inclusion criteria, are specified in the population settings, and you can look at the documentation for all the different settings. The next thing you need is to specify what features you want to create for the patients: what covariates, what descriptors. I mentioned that plot at the beginning where you had the index and you were looking at anything that happened up to the index; here you specify what you're extracting from the data. So I said I want demographics: I want gender, I want age groups (these are five-year age groups), I want any drugs they've had in the last year and also any conditions they've had, and I'm going to be using the vocabulary hierarchies to look at groupings for those.
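A minimal sketch of what such baseline covariates look like, assuming a made-up patient record (this is generic Python for illustration, not the FeatureExtraction API):

```python
def five_year_age_group(age):
    """Map an age to a five-year band label, e.g. 37 -> '35-39'."""
    lo = (age // 5) * 5
    return f"{lo}-{lo + 4}"

def make_covariates(patient, drug_vocabulary):
    """Binary/categorical baseline features from events up to the index."""
    features = {
        "gender_female": int(patient["gender"] == "F"),
        "age_group": five_year_age_group(patient["age"]),
    }
    # One indicator per drug concept observed in the year before the index.
    for drug in drug_vocabulary:
        features[f"drug_{drug}_prior_365d"] = int(drug in patient["drugs_prior_365d"])
    return features

patient = {"gender": "F", "age": 37, "drugs_prior_365d": {"aspirin"}}
print(make_covariates(patient, ["aspirin", "warfarin"]))
# {'gender_female': 1, 'age_group': '35-39',
#  'drug_aspirin_prior_365d': 1, 'drug_warfarin_prior_365d': 0}
```

In the real package the "vocabulary" side comes from the OMOP concept hierarchies, so a single grouped feature can cover many related drug or condition codes.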
We've made it so you can add feature engineering now. I'm basically not going to do any feature engineering here, but if you wanted to, you can write custom code and plug it in to do whatever feature engineering you want on the downloaded data. We've also got the option of doing sampling, so you can oversample or undersample, and again we've made it so you can add custom approaches; you can look at the documentation to see how to do that. Cynthia from Erasmus is actually working on a study to investigate the sample settings, so before you use them you may want to wait for her results, because she's got some interesting results coming.

Then we've also got whether you want to pre-process. Some of the models require you to normalize your data when you fit them, and you may want to do other pre-processing like removing rare features or removing things that are redundant. You've got an option here of specifying that: you can say whether you want to normalize, whether you want to remove rare features, and whether you want to remove covariates that are redundant, so if there are two covariates that are completely correlated you can get rid of one of them.
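As a generic sketch of those three pre-processing steps on a plain NumPy feature matrix (this mirrors the idea, not the package's implementation; duplicate columns are used as a simple proxy for "completely correlated"):

```python
import numpy as np

def preprocess(X, rare_threshold=0.01):
    """Drop rare binary features, drop duplicate columns, then normalize."""
    X = np.asarray(X, dtype=float)
    # 1. Remove rare features: columns present in fewer than 1% of rows.
    keep = X.mean(axis=0) >= rare_threshold
    X = X[:, keep]
    # 2. Remove redundant features: keep the first of any identical columns
    #    (a simple proxy for perfectly correlated covariates).
    _, first_idx = np.unique(X, axis=1, return_index=True)
    X = X[:, np.sort(first_idx)]
    # 3. Normalize each remaining column to [0, 1] by its maximum value.
    col_max = X.max(axis=0)
    col_max[col_max == 0] = 1.0
    return X / col_max

# Columns 0 and 1 are identical, so one of them is dropped.
X = np.array([[1, 1, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0]])
print(preprocess(X).shape)  # (4, 2)
```

Whether you normalize and which filters you apply depends on the model: penalized regressions typically need normalized inputs, while tree-based models are less sensitive.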
Then here you're specifying what model you want to fit. We support classification and survival models right now. The main one, the one I tend to use, is lasso logistic regression, so I can specify here that I'm setting lasso logistic regression. You can put in information for the hyperparameter search and all the settings used when fitting it, or you can just use the defaults; I'm going to use the defaults here, but again you can look at the documentation if you're interested. And the last thing is how you want to split your data into the data used to develop the model and the data used to estimate its performance for internal validation. Again, I'm just going to use the default settings, but you can write custom code to customize this however you want. A lot of these options have been written in a way where you can add customization, so you don't have to use the default settings; for example, for the covariate settings you can write custom code. We've tried to make it very adaptable and very customizable.
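The PLP package does all of this in R; as a hedged sketch of the same two ideas in Python with scikit-learn (an L1-penalized "lasso" logistic regression evaluated on a held-out split, using synthetic data, not the package's defaults):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic binary covariates with two genuinely informative features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 50)).astype(float)
logit = X[:, 0] * 2.0 + X[:, 1] * 1.5 - 2.0
y = (rng.random(2000) < 1 / (1 + np.exp(-logit))).astype(int)

# Internal-validation split: develop on 75%, estimate performance on 25%.
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Lasso (L1) logistic regression; C controls the regularization strength.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_dev, y_dev)
risk = model.predict_proba(X_val)[:, 1]  # predicted risk per person
print(round(roc_auc_score(y_val, risk), 2))
```

The L1 penalty shrinks most of the 48 noise coefficients to exactly zero, which is why lasso suits the very wide, sparse covariate matrices that come out of medical records.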
If I run this now for the NSAIDs, it's going to give me a model design. I've also created exactly the same model design but, instead of using the NSAIDs as the target, using a different drug, with the same design otherwise, so I can run that too. Then I need to put in my database details; for Eunomia these are the settings, but if you have your own database in the common data model format you can put in your own settings here.

Now, to actually fit these models and do internal validation, all I have to do is run "run multiple PLP". I put in the database details and my list of model designs, and you could have as many model designs as you want, so you can fit a lot of models through this process. I'm also putting in cohort definitions here just so that, when I'm saving, I know what the names were, because I didn't actually specify names anywhere, only IDs; this is just a way of putting the names in so that when I've got my results I know which cohort IDs corresponded to which cohorts. Then I put in the save directory.

If I run this now, it should start running; it's basically running the whole process to fit two models, and things start popping up in this folder. I pre-ran this as the "plp demo", so you can see what comes out. You get your data: I had two target cohorts, two different drugs, so I have two different data objects, and then two different models, model one for analysis one and model two for analysis two. In each you have your results, with diagnostics to see whether there are any issues in your model design or the data you're using, and you also have the model that was fit: your PLP result object is the actual fitted model. You also get a log that tells you how things progressed, and if there are errors they will show in the log. And the last thing, which is new, is an SQLite database that basically puts all the models into one database. I'll give you a quick overview of the Shiny app in a moment; hopefully this run should be finishing soon, but I might just stop it, since it would be quicker to show you what I already did.
This is what ran before, so this is basically what's output. You'll see that for every person there's a row: a row corresponds to a person, and you have an ID, a date, and then information about them, like whether they had the outcome. What's added once you've run the model is this value, the predicted risk. This is saying this person had a 37% risk of the outcome according to the model, and if we scroll down we'll find people who have a 34% risk of the outcome. So you can see people are getting a personalized risk.
I just quickly want to show you the Shiny app. (Sorry, I've got to change this to the right view first.) This is the new Shiny app interface. It's loading all the results of the two models, and it's going to tell you that there are two designs: there was one outcome, one time at risk, but two target populations. It's going to tell you the model design ID and the number of databases used for development. You can look at the diagnostics to see whether it passes, and here we can see it passes. You can view other characteristics of the diagnostics, and you can also view the model. You can go through and have a look at how well the model did and at the settings; going back to the summary, you can see a summary of the performance, like discrimination and how big the data were. If you go to the explorer, you can look at plots: the ROC plot and the calibration plot for each of these results. So this is the new viewer that's going to be put in soon.
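The ROC and calibration plots shown in the viewer can be computed from any vector of predicted risks and observed outcomes. A generic sketch with scikit-learn (not the Shiny app's code; the risks here are simulated to be well calibrated by construction):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(1)
risk = rng.random(5000)                    # predicted risks in [0, 1]
y = (rng.random(5000) < risk).astype(int)  # outcomes drawn at the stated risk

# Discrimination: points on the ROC curve plus the area under it.
fpr, tpr, _ = roc_curve(y, risk)
auc = roc_auc_score(y, risk)
print(round(auc, 2))

# Calibration: observed outcome fraction vs mean predicted risk in 10 bins.
obs_frac, mean_pred = calibration_curve(y, risk, n_bins=10)
print(float(np.max(np.abs(obs_frac - mean_pred))) < 0.1)  # True: near the diagonal
```

Plotting `fpr` against `tpr` gives the ROC plot, and plotting `mean_pred` against `obs_frac` gives the calibration plot; a well-calibrated model hugs the diagonal.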
And with that, I believe I'm at time, so I will stop sharing.