How to detect drift and resolve issues in your Machine Learning models?

NannyML
28 Mar 2024, 50:40

Summary

TLDR: This detailed presentation dives into the critical concept of data drift, also known as covariate shift, exploring its impact on machine learning models post-deployment. The speaker begins with a theoretical overview, followed by a discussion on the implications of data drift and covariate shift on model performance. Emphasizing practical applications, the talk introduces algorithms for detecting covariate shift and concludes with a hands-on Jupyter notebook demonstration. This demonstration, accessible on GitHub, walks through a real dataset to identify where and how drift occurred, showing how drift analysis helps keep models accurate and reliable in production.

Takeaways

  • 📊 Detecting data drift, also known as covariate shift, is crucial for understanding model failures in production environments.
  • 📝 The webinar covers a theoretical introduction, followed by practical demonstrations on identifying and addressing data drift.
  • 🤖 Algorithms for detecting variation in data are discussed, showcasing methods to identify when and where drift occurs.
  • 🛠️ A practical deep dive using a Jupyter notebook illustrates the process of detecting data drift in a real dataset.
  • 💳 The importance of monitoring models, especially in scenarios like predicting mortgage defaults, highlights the necessity of understanding data drift.
  • 📈 The concept of covariate shift is defined as changes in the joint distribution of model inputs, which can significantly impact model performance.
  • ⚡ Univariable and multivariable detection methods are explored for identifying drift, with strengths and weaknesses discussed for each approach.
  • 🔍 The webinar emphasizes ML model monitoring, outlining steps from performance monitoring to issue resolution to protect business impact.
  • 📱 Practical tips on using univariable and multivariable detection techniques provide insights into effectively identifying and analyzing data drift.
  • 📚 Recommendations for further learning and engagement, such as accessing the webinar recording and exploring open-source libraries, are provided.

Q & A

  • What is data drift and how does it affect model performance?

    -Data drift, also known as covariate shift, occurs when the distribution of model input data changes significantly after the model has been deployed. It can negatively impact the model's performance by making its predictions less accurate.

  • How can machine learning practitioners detect data drift?

    -Practitioners can detect data drift by using specific algorithms designed to monitor and analyze changes in data distribution. These algorithms can assess whether there's a significant shift in the features' distribution or the relationship between features.

  • What is the importance of monitoring machine learning models in production?

    -Monitoring is crucial for maintaining the business impact of the model, reducing its risk, and ensuring its predictions remain reliable over time. It helps identify when the model's performance degrades due to data drift or other factors.

  • Can you give an example of a proxy target used by banks to predict mortgage defaults?

    -Banks might use a proxy target such as whether a person is delayed by six months or more in mortgage repayments after two years from the loan's start as an indicator of defaulting, rather than waiting the entire mortgage duration.

  • Why is it challenging to directly evaluate the quality of predictions for certain models?

    -For models predicting outcomes over long periods, like mortgage defaults, direct evaluation is impractical because it requires waiting years for the actual outcomes. Hence, proxy targets and performance estimation techniques become necessary.

  • What are the two main reasons machine learning models fail after deployment?

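    -The webinar names covariate shift (data drift) as one of the two main reasons and the easiest one to spot. The other is commonly described as concept drift, a change in the relationship between the model inputs and the target. Either can lead to significant drops in performance if the model cannot adapt to the change.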

  • What is univariable drift detection and its limitations?

    -Univariable drift detection involves assessing changes in the distribution of individual features. Its limitations include the inability to detect changes in correlations between features, which can result in high false-positive rates for alerts.

  • How does the Jensen-Shannon distance help in drift detection?

    -The Jensen-Shannon distance is recommended as the default univariable method because it detects most significant shifts and is robust against outliers, though it can be overly sensitive to small drifts.

  • Why is multivariable drift detection important and what are its challenges?

    -Multivariable drift detection is crucial for capturing changes in the relationship between features or the entire data set, which univariable detection misses. However, it requires at least two features to work and can be less interpretable.

  • What practical steps were followed in the Jupyter notebook tutorial for detecting data drift?

    -The tutorial involved training a simple model, simulating its deployment, and then using specific algorithms to analyze data drift in production data, highlighting how to identify and address issues affecting model performance.

Outlines

00:00

🔍 Introduction to Data Drift Detection in Machine Learning Models

This segment introduces the concept of data drift and its impact on machine learning models post-deployment. The speaker outlines the webinar structure, starting with theoretical aspects, moving to the definition and risks associated with data drift and covariate shift, and concluding with practical detection methods. The importance of understanding data drift is emphasized with a real-world example of predicting mortgage defaults in banking. The speaker explains the challenges in evaluating model performance over time due to delayed outcomes, underlining the necessity of estimating model performance indirectly. The initial steps towards ML model monitoring and the detection of covariate shift, a common reason for model failure, are discussed.

05:00

📊 Deep Dive into Root Cause Analysis and Univariate Drift Detection

This paragraph transitions to the importance of root cause analysis following the identification of performance issues in deployed models. The focus is on understanding data drift, its causes, and its effects on model performance. The discussion then shifts to univariate drift detection methods, highlighting their ability to capture distribution changes in individual features while acknowledging their limitations, such as the inability to detect changes in feature correlations and the high rate of false positives. Various methods available for univariate drift detection are briefly mentioned, with a promise of more detailed exploration in future webinars.
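To see why per-feature alerts pile up, a back-of-the-envelope calculation helps. The numbers below are assumptions chosen for illustration, not figures from the webinar: 100 monitored features, a 5% false alert rate per feature per check, and independence between the tests.

```python
# Hypothetical setup: 100 monitored features, each univariate test raising
# a false alert on 5% of checks, with the tests treated as independent.
n_features = 100
per_feature_false_alert_rate = 0.05

# Probability that at least one feature alerts on a day with no real drift.
p_any_false_alert = 1 - (1 - per_feature_false_alert_rate) ** n_features

# Expected number of false alerts per daily check.
expected_false_alerts = n_features * per_feature_false_alert_rate

print(f"P(at least one false alert) = {p_any_false_alert:.3f}")
print(f"Expected false alerts per day = {expected_false_alerts:.1f}")
```

Under these assumptions some alert fires on almost every check, which is consistent with the advice to use univariate detection for drilling down after a performance problem has been found rather than as a standalone alarm.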

10:02

🤖 Exploring Multivariate Drift Detection Techniques

The narrative progresses to multivariate drift detection, addressing its ability to capture linear relationships and distribution changes among multiple features. The limitations of multivariate methods are acknowledged, including their reliance on at least two features and reduced interpretability. The procedure for applying multivariate detection using PCA (Principal Component Analysis) is outlined, emphasizing the significance of comparing reference and analysis data sets to identify structural changes in the data. The summary highlights how multivariate detection complements univariate methods by providing a broader analysis of data drift.

15:04

📝 Practical Application and Analysis Using a Jupyter Notebook

The speaker provides a hands-on demonstration of detecting data drift using a Jupyter notebook, walking through the process of training a simple machine learning model and identifying drift in a simulated production data set. The tutorial covers the initial model training, the preparation of data for drift detection, and the utilization of NannyML, an open-source library, for both univariate and multivariate drift analysis. This practical approach illustrates how to identify significant drifts and their impact on model performance, leading to actionable insights for model improvement.

20:05

🚀 Concluding Remarks and Q&A Session

The webinar concludes with a Q&A session, where the speaker addresses questions about the robustness of the Jensen-Shannon distance metric to outliers, the potential for using deep autoencoders for more nuanced drift detection, and the application of univariate drift tests to time series data. The speaker emphasizes the flexibility and future roadmap of NannyML for incorporating advanced features like variational autoencoders. The audience is encouraged to contribute to the open-source project and reminded of upcoming webinars for further learning.

Keywords

💡Data Drift

Data Drift refers to the change in the model input data distribution over time, which can impact the performance of machine learning models once deployed in production. This concept is crucial in the context of the video, where the speaker discusses how monitoring for data drift is essential for maintaining the accuracy and reliability of models in real-world applications. For example, the speaker mentions that changes in customer information or credit scores over time can lead to data drift, affecting the model's ability to predict mortgage defaults accurately.

💡Covariate Shift

Covariate Shift is a specific type of data drift where the distribution of input variables (covariates) changes while the conditional distribution of the target variable given the input variables remains the same. In the video, the presenter highlights covariate shift as a common reason machine learning models fail after deployment. By understanding and detecting covariate shift, developers can take corrective measures to adjust the model and improve its performance over time.
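The definition above can be written compactly. The notation is an assumption introduced here for illustration: X denotes the model inputs, Y the target, and the subscripts "ref" and "prod" the reference and production periods.

```latex
P_{\text{ref}}(X) \neq P_{\text{prod}}(X),
\qquad
P_{\text{ref}}(Y \mid X) = P_{\text{prod}}(Y \mid X)

% The target distribution can still move, because it depends on P(X):
P(Y) = \int P(Y \mid X = x)\, P(X = x)\, dx
```

The second line is why a shift in the inputs alone can change the share of positive and negative cases the model sees, even when the input-to-target relationship is untouched.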

💡ML Monitoring

ML Monitoring involves tracking and evaluating the performance of machine learning models in production to identify and rectify issues like data drift or model degradation. The speaker outlines a structured monitoring flow, emphasizing the importance of continuous observation for the models to ensure they continue to deliver expected business outcomes and reduce the risk associated with model predictions.

💡Root Cause Analysis

Root Cause Analysis in the context of machine learning involves identifying the underlying reasons for the drop in model performance. This can include investigating data drift, covariate shifts, or other operational issues. The video explains how conducting a thorough root cause analysis is a critical step after detecting performance issues, as it guides the data science team on how to effectively address and resolve the problems.

💡Univariable Detection

Univariable Detection refers to the process of analyzing individual model input features to identify any changes in their distributions over time. This method is part of the toolkit for detecting data drift, as discussed in the video. It's particularly useful for pinpointing specific features that may contribute to a model's declining performance, although it cannot detect changes in the relationships between features.

💡Multivariable Detection

Multivariable Detection involves analyzing changes in the relationships between multiple model input features to identify data drift. The video explains how this method can capture linear changes in feature relationships and is crucial for detecting scenarios where individual feature distributions might not change, but their interactions do, potentially impacting model performance.
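A small simulation illustrates the scenario described above, where each feature looks unchanged on its own while the relationship between features flips. This is a sketch under stated assumptions: the data are synthetic, and the helper names `correlated_pairs`, `pearson` and `summarize` are hypothetical, not part of any library discussed in the video.

```python
import math
import random
import statistics


def correlated_pairs(n, rho, rng):
    """Draw n (feature_1, feature_2) pairs, each standard normal, with correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return pairs


def pearson(pairs):
    """Pearson correlation coefficient between the two features."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def summarize(pairs):
    """Per-feature (mean, std) plus the correlation between the features."""
    xs, ys = zip(*pairs)
    return {
        "feature_1": (statistics.fmean(xs), statistics.pstdev(xs)),
        "feature_2": (statistics.fmean(ys), statistics.pstdev(ys)),
        "correlation": pearson(pairs),
    }


rng = random.Random(42)
reference = summarize(correlated_pairs(5000, 0.8, rng))   # features move together
analysis = summarize(correlated_pairs(5000, -0.8, rng))   # relationship flipped

for name, stats in (("reference", reference), ("analysis", analysis)):
    print(name)
    for key in ("feature_1", "feature_2"):
        mean, std = stats[key]
        print(f"  {key}: mean={mean:.2f} std={std:.2f}")
    print(f"  correlation={stats['correlation']:.2f}")
```

The per-feature means and standard deviations stay close between the two periods, so a feature-by-feature comparison has little to flag, while the correlation changes sign.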

💡Jensen-Shannon Distance

Jensen-Shannon Distance is a method used in univariable detection to measure the similarity between two probability distributions. It is presented in the video as a default method for detecting significant shifts in data due to its robustness against outliers and its effectiveness in capturing most significant drifts. The speaker describes how it calculates the average difference between the distributions of a single feature over different time periods.
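The following is a minimal, dependency-free sketch of the Jensen-Shannon distance between two samples of one feature. It is not the library's implementation; the binning scheme (20 equal-width bins), the base-2 logarithm and all function names are assumptions made for illustration. The synthetic data mirror the example in the talk: a reference distribution with peaks near 0 and 5, and a drifted one with peaks near 0 and 10.

```python
import math
import random


def shared_histograms(reference, analysis, n_bins=20):
    """Bin both samples on the same equal-width grid and return relative frequencies."""
    lo = min(min(reference), min(analysis))
    hi = max(max(reference), max(analysis))
    width = (hi - lo) / n_bins

    def to_frequencies(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so that the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    return to_frequencies(reference), to_frequencies(analysis)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; empty bins of p add nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence: 0 for identical, 1 for disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))


def bimodal_sample(n, second_peak, rng):
    """Equal mixture of N(0, 1) and N(second_peak, 1)."""
    return [rng.gauss(0.0 if rng.random() < 0.5 else second_peak, 1.0) for _ in range(n)]


rng = random.Random(0)
reference = bimodal_sample(5000, 5.0, rng)   # peaks near 0 and 5
no_drift = bimodal_sample(5000, 5.0, rng)    # same distribution, new sample
drifted = bimodal_sample(5000, 10.0, rng)    # second peak moved to 10

d_same = jensen_shannon_distance(*shared_histograms(reference, no_drift))
d_drift = jensen_shannon_distance(*shared_histograms(reference, drifted))
print(f"JS distance, no drift: {d_same:.3f}")
print(f"JS distance, drifted:  {d_drift:.3f}")
```

Because the measure compares binned frequencies, a handful of extreme values only moves a little probability mass, which gives some intuition for the robustness to outliers mentioned above. In practice a library routine would normally replace the hand-written functions.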

💡PCA (Principal Component Analysis)

PCA, or Principal Component Analysis, is a technique mentioned in the context of multivariable detection for data drift. It reduces the dimensionality of data by transforming it into a set of linearly uncorrelated variables called principal components. The video demonstrates how PCA is used to capture changes in the data structure by compressing the data, reconstructing it, and comparing the resulting reconstruction error between the reference and analysis data.

💡Reconstruction Error

Reconstruction Error is the difference between the original dataset and its reconstruction after being compressed and decompressed, as in PCA. The video utilizes this metric in multivariable detection to quantify the extent of data drift. A significant change in reconstruction error indicates a substantial shift in the data structure, informing data scientists of potential issues affecting model performance.
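The idea can be sketched for the smallest possible case: two features compressed to one principal component. This is a hand-rolled illustration, not the method as implemented in any library; the closed-form eigenvector only works for two features, the data are synthetic, and the function names are hypothetical. A real workflow would typically use a PCA implementation from a numerical library.

```python
import math
import random
import statistics


def correlated_rows(n, rho, rng):
    """Synthetic rows of two standard-normal features with correlation rho."""
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return rows


def fit_pca_1d(rows):
    """Learn scaling and the first principal axis from 2-feature reference rows."""
    xs, ys = zip(*rows)
    mean = (statistics.fmean(xs), statistics.fmean(ys))
    std = (statistics.pstdev(xs), statistics.pstdev(ys))
    scaled = [((x - mean[0]) / std[0], (y - mean[1]) / std[1]) for x, y in rows]
    n = len(scaled)
    # Covariance matrix [[a, b], [b, c]] of the standardized data.
    a = sum(x * x for x, _ in scaled) / n
    b = sum(x * y for x, y in scaled) / n
    c = sum(y * y for _, y in scaled) / n
    # Largest eigenvalue and its eigenvector, closed form for a symmetric 2x2 matrix.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return mean, std, (vx / norm, vy / norm)


def reconstruction_error(rows, mean, std, axis):
    """Mean Euclidean distance between scaled rows and their 1-D reconstruction."""
    total = 0.0
    for x, y in rows:
        sx, sy = (x - mean[0]) / std[0], (y - mean[1]) / std[1]
        score = sx * axis[0] + sy * axis[1]          # compress to one number
        rx, ry = score * axis[0], score * axis[1]    # decompress back to two features
        total += math.hypot(sx - rx, sy - ry)
    return total / len(rows)


rng = random.Random(1)
reference = correlated_rows(2000, 0.9, rng)
model = fit_pca_1d(reference)                        # fitted on reference data only

err_ref = reconstruction_error(reference, *model)
err_same = reconstruction_error(correlated_rows(2000, 0.9, rng), *model)
err_drift = reconstruction_error(correlated_rows(2000, -0.9, rng), *model)
print(f"reference: {err_ref:.3f}  no drift: {err_same:.3f}  drifted: {err_drift:.3f}")
```

The compression is learned once on the reference data and then reused, so new data with the same structure reconstructs about as well as the reference, while data whose feature relationship has changed reconstructs noticeably worse.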

💡Model Degradation

Model Degradation refers to the phenomenon where the performance of a machine learning model deteriorates over time due to changes in the underlying data or environment. The video underscores the importance of ML monitoring and regular evaluations to detect signs of model degradation early, allowing for timely interventions to maintain the model's accuracy and relevance in making predictions.

Highlights

Introduction to detecting data drift, covariate shift, and their impacts on model performance after deployment.

Explaining how banks predict mortgage defaults using proxy targets.

The challenge of evaluating prediction quality due to the long wait for actual outcomes.

Overview of ML monitoring flow: maintaining business impact, reducing model risk, and increasing model visibility.

Defining covariate shift as a change in the joint model input distribution and its impact on model performance.

Introduction to univariable drift detection algorithms and their utility in pinpointing data drift.

Discussing the Jensen-Shannon distance method for detecting significant shifts in data.

Practical demonstration of drift detection using a Jupyter notebook and a real dataset.

The significance of handling year and month data correctly to avoid model performance issues.

The importance of preprocessing time series data for drift detection to get meaningful results.

Recommendations for using PCA in multivariable drift detection for robust and stable results.

The potential future inclusion of deep autoencoders in multivariable drift detection for capturing nonlinear relationships.

The role of domain knowledge in interpreting multivariable drift detection results.

The importance of monitoring and understanding data drift to ensure the sustained performance of ML models in production.

Invitation to contribute to the open source library for ML monitoring and drift detection.

Transcripts

play00:00

uh today we're going to talk about how

play00:02

to detect data drift also known as

play00:04

covariate shift and how we can really use

play00:07

that knowledge to try to figure out what

play00:09

has gone wrong with our models after it

play00:11

has been deployed to production and I'm

play00:14

gonna have a theoretical part first and

play00:16

then I'm gonna go over what is data

play00:19

drift and covariate shift and how it

play00:21

can impact performance and then we'll go

play00:23

over the algorithms that you can use

play00:25

within NannyML to detect covariate shift

play00:28

and then we'll finish with a practical

play00:31

deep dive in a Jupyter notebook that I

play00:33

prepared that you can also access on

play00:36

GitHub uh just going through a real-life

play00:40

data set where we will see that there's

play00:42

some drift and we'll try to spot how it

play00:44

happened and where it is

play00:46

uh so let's get started with uh

play00:49

setting the stage

play00:50

just to give you a kind of a perspective

play00:52

why it's important so imagine that

play00:54

you're trying to predict uh Bank uh so

play00:56

imagine you work at a bank you

play00:58

try to predict mortgage defaults uh so

play01:00

in order to develop models that can

play01:02

predict whether somebody is going to

play01:04

default on a loan or not you're going to

play01:05

take their credit scores uh their

play01:07

customer information and uh hopefully

play01:10

you build a model that actually predicts

play01:12

loan defaults reasonably well and to

play01:15

give you a bit of information about how

play01:17

it works in practice normally you would

play01:19

want to wait entire duration of the

play01:21

mortgage let's say 20 years or 30 years

play01:24

uh to get your targets but that is not

play01:26

really practical and for a vast majority

play01:29

of

play01:30

uh of customers you can already know

play01:33

whether they're going to default or

play01:35

not and in reality most banks use some

play01:38

kind of proxy targets for that as an

play01:40

example we can say that if a person is

play01:42

delayed six months or more in the

play01:45

repayment of the mortgage after two

play01:47

years uh

play01:49

of the loan start the loan beginning then we

play01:53

can say that this person is practically

play01:54

defaulting so that can be the target

play01:57

and here of course it means that we

play01:59

still need to wait two years

play02:01

um after we've deployed our model and

play02:03

made the prediction uh to really see

play02:05

whether this prediction was correct or

play02:06

not which means we cannot easily

play02:08

evaluate the quality of the predictions

play02:10

and that means that we need to somehow

play02:13

try to

play02:15

um estimate that performance and I

play02:17

already made a I already gave a webinar

play02:19

last week about that uh so if you wanna

play02:23

learn more then

play02:25

um ping me uh on LinkedIn after the webinar

play02:28

and I'll share the link with the

play02:30

uh with the recording and now we're

play02:33

going to assume that we actually have

play02:34

access to targets and somehow something

play02:36

has gone wrong and we need to figure out

play02:39

what went wrong and we'll learn the

play02:41

steps so the first step is going over

play02:43

the ML monitoring flow so something that I

play02:45

already kind of covered which is what do

play02:48

we what should we do first what's the

play02:50

second what's the third how do we go

play02:52

from starting monitoring to resolving

play02:54

any issues and making sure that we can

play02:56

actually protect the business impact of

play02:58

our models then I'm gonna Define

play03:01

covariate shift which is one of the two main

play03:03

reasons why machine learning models can

play03:05

fail and the easiest one to spot and

play03:07

then vast majority of the webinar is

play03:10

going to be about actually delving

play03:12

deeper into the algorithms we have at

play03:14

our disposal to try to figure out where

play03:17

is data drift how strong it is and whether

play03:19

it's potentially linked with the drop in

play03:21

performance and for that we have

play03:23

univariable detection where we look at one

play03:25

feature at a time and we have the

play03:27

multivariable detection and we'll try to

play03:29

look at a group of features or even

play03:31

the entire data set at once to try to

play03:33

figure out whether there is some

play03:35

significant drift in our data

play03:38

so let's get started with the first part

play03:40

the monitoring flow and I'll just

play03:42

quickly go over that so we have the

play03:44

three goals of monitoring first we want

play03:47

to maintain the business impact of the

play03:48

model this is kind of an obvious thing

play03:50

where you develop your machine learning

play03:52

models they serve a purpose and hopefully

play03:54

Drive business impact and if the model

play03:57

is deteriorating in production and uh the vast

play03:59

majority of them do deteriorate in

play04:01

production or they degrade in production

play04:03

we need to do things we need to know

play04:05

what's going on and then we need to take

play04:07

actions to maintain the business impact

play04:08

of that model second thing is reducing

play04:11

the risk of the model because the

play04:12

predictions are uncertain and there is a

play04:15

certain risk that every machine learning

play04:17

model basically imparts on the

play04:19

organization and as long as the model is

play04:22

predicting reasonably well or predicting

play04:25

in line with expectations this risk is known

play04:27

however if the model degrades this risk can

play04:30

really balloon out of proportion so what

play04:33

we want to do with monitoring is to

play04:34

really know and quantify the risk of the

play04:37

model and the last thing for you as a

play04:39

data scientist or for you as a data

play04:42

science team you want to increase the

play04:43

visibility of the model to either

play04:47

basically gain recognition and make sure

play04:49

that your work is well rewarded by

play04:51

either hopefully getting promotions or

play04:53

getting raises or getting higher

play04:56

budget allocations working

play04:58

now for the process we start with

play05:00

performance monitoring and that's

play05:02

something that I covered in the

play05:03

previous webinar where we want to make

play05:04

sure that we know the performance of our

play05:06

models at all times whether ground truth

play05:09

is available or not then we go into the

play05:11

root cause analysis so seemingly

play05:12

something has gone wrong and only if

play05:14

something has gone wrong do you need

play05:15

to figure out what has gone wrong so

play05:17

then we can go into the issue

play05:20

resolution and try to resolve the

play05:22

problem and today we're focusing mostly

play05:24

on the second part which is the root cause

play05:26

analysis and to actually do root cause

play05:29

analysis we'll have to look at data drift

play05:31

and what exactly changed in our data and

play05:34

hopefully you can also find out the

play05:37

actual causes of the drop in performance

play05:40

so that is the flow and now we can

play05:43

start with the second part which is the

play05:45

covariate shift

play05:49

so let's start with the defining

play05:51

covariate shift and we can quickly

play05:55

define covariate shift as the change in

play05:58

the joint model input distribution so

play06:00

imagine that you have multiple features

play06:02

uh comprising of your model inputs and

play06:06

if this joint distribution changes in

play06:08

any significant way then we can talk

play06:10

about covariate shift to give you a simple

play06:12

example here imagine that we have just

play06:14

one feature and if that distribution

play06:17

that we sample uh from the population so we

play06:21

have some kind of sampling function

play06:22

that sampling function from

play06:24

population to our sample changes we have

play06:26

covariate shift and that means that not only

play06:29

the model input distribution will

play06:31

change but potentially also the target

play06:33

distribution is going to change because

play06:35

we might move from a region where let's

play06:37

say there are more positive class

play06:38

instances to a region where there are more

play06:40

negative class instances like in the

play06:43

example here we see that we have in

play06:45

total more negative instant negative

play06:47

class instances compared to uh the

play06:50

before the shift and that also might

play06:53

potentially impact the performance of

play06:55

the model if we move from regions where

play06:57

the model is supposed to perform well

play06:58

because maybe it's very easy to separate

play07:00

classes to regions where the model is not

play07:03

going to perform so well maybe because

play07:05

it is hard to separate the classes or

play07:08

maybe because it did not have enough

play07:10

data to really learn the correct pattern

play07:12

and of course the same applies to

play07:14

regression problems but it's a bit

play07:16

easier to

play07:17

talk about binary problems and binary

play07:20

classification problems and to visualize

play07:22

binary classification problems so let's

play07:24

stick with that

play07:26

uh now uh of course as I mentioned we're

play07:28

talking about a joint probability

play07:30

distribution uh so just to give you an

play07:33

example why this joint part is important

play07:35

uh there are kinds of covariate shift

play07:38

where if you look at every single feature

play07:40

separately you will not really see a

play07:43

difference so imagine it here due to

play07:45

some kind of error on the data

play07:48

preparation or data engineering side or

play07:50

maybe somewhere in your data pipelines

play07:52

Upstream feature one and feature 2 gets

play07:54

switched sometime between week 10

play07:57

and week 16 and you look at those

play07:59

distributions and if you look at the

play08:01

distribution separately it's gonna look

play08:04

basically exactly the same and if you

play08:07

look here the distribution of week

play08:10

10 and the distribution of week 16

play08:12

are

play08:13

basically the same

play08:16

but almost to a rounding error and the

play08:18

same follows here but of course our joint

play08:21

distribution is completely different and

play08:23

the model is going to make terrible

play08:25

mistakes because it assumes that feature

play08:27

one is feature two and vice versa so

play08:29

this is something that we absolutely

play08:31

need to

play08:33

be able to detect and it's something

play08:36

that can actually

play08:37

let us identify what is the potential

play08:40

root cause of the performance drop

play08:43

so now we know uh what are potential

play08:46

problems uh with machine learning models

play08:49

in production that they deteriorate and

play08:51

one of the main causes is covariate shift

play08:53

and how it looks like what it is now

play08:56

let's talk about how to detect it so

play08:58

let's assume that we have a model there

play09:01

is a problem we need to figure out what

play09:02

exactly went wrong so the first avenue

play09:05

that we can go with is the univariable

play09:07

detection

play09:08

and what does it do so first of all it's

play09:10

going to capture the change in

play09:12

distribution of a single feature

play09:14

um and because it's going to look at the

play09:17

distribution of a single feature within

play09:19

NannyML it's going to automatically

play09:21

kind of loop over all features in

play09:24

a parallelized way and you will see all

play09:28

features separately and how the

play09:29

distributions of those features are

play09:31

changing but there's of course a few

play09:34

things that it cannot do and the most

play09:36

important one is that it cannot detect

play09:38

changes in correlations between features

play09:40

or any really change in relationship

play09:42

between features whether they're

play09:43

correlations or maybe something a bit

play09:45

less linear and because imagine that you

play09:49

have a model with 70 features or 100

play09:51

features which is pretty standard in

play09:53

models deployed to production if you

play09:55

look at every single feature separately

play09:57

it's quite likely that some of them will

play10:00

shift even though it doesn't impact

play10:02

performance or it's not the root cause

play10:04

of the problems that we might have

play10:07

identified with our performance

play10:08

monitoring and that means that it can

play10:11

suffer from high false positive rates if you

play10:13

have 100 features and you want to do

play10:15

your monitoring daily you'll almost

play10:17

certainly get a lot of false positives

play10:19

of things that are meaningfully changing

play10:21

but they are not really the uh real

play10:24

problem so you will just drown in alerts

play10:27

and this is really one of the main

play10:29

reasons why univariable detection cannot

play10:31

be used as a kind of quality assurance

play10:34

sort of tool and you need to do

play10:36

performance monitoring

play10:38

and instead you should use univariable

play10:40

detection to really drill down on

play10:43

where the problem exactly might be after

play10:45

you've identified uh issues using

play10:47

performance monitoring

play10:49

so now we have in our open source

play10:53

Library six different methods for

play10:55

univariable detection and at some point I

play10:58

will make um

play11:00

another webinar probably going over all

play11:02

of those in depth and discussing the

play11:04

pros and cons but that is really going

play11:07

to take at least one hour so this time

play11:09

I'm gonna Focus just on one that we

play11:11

recommend as a default and we have as a

play11:13

default and in our library and this is

play11:16

the Jensen-Shannon distance and why we

play11:19

recommend those is first of all uh it is

play11:22

a method that's quite good at detecting

play11:24

most of the significant shifts so if you

play11:26

have a significant shift in your data

play11:29

um you know feature uh it's going to be

play11:31

able to detect those most uh like most

play11:35

of the time almost all the time it is

play11:38

also quite robust against outliers and

play11:40

unlike some other methods and one huge or a few

play11:43

huge outliers or a few anomalies in the

play11:46

data will not necessarily trigger that

play11:48

so that already reduces the false

play11:51

positive rate or false alert rate

play11:53

and of course no method is perfect

play11:56

that's why we have six of them not just this

play11:58

one and one of the main issues with

play12:01

Jensen-Shannon distance is that it is

play12:03

potentially a bit too sensitive to small

play12:06

drifts so you're still getting uh that

play12:09

false alerts here and there but as

play12:11

long as you use it only as or mainly as

play12:13

a root cause analysis tool or

play12:16

um

play12:17

kind of dig into what's going on tool and

play12:20

you should be fine

play12:22

and now I'm gonna go over this method a

play12:25

bit more in depth talking about the

play12:26

inputs the results and the kind of

play12:28

intuition of how this method actually

play12:30

works and what it do what it does so

play12:33

let's start with practical things so

play12:35

what are the things that we need to have

play12:36

so the first thing we need to have is we

play12:38

need to get the features that we

play12:40

actually want to uh analyze that we want

play12:43

to try to see if there is any drift

play12:45

um so we just provide column names for

play12:48

which uh we want to do our analysis and

play12:51

and then in our univariate drift calculator

play12:54

which is kind of a high level interface

play12:57

for any of the univariable detection

play12:59

features uh we can specify and actually

play13:02

you need to specify

play13:03

um

play13:04

what methods do we want to use for drift

play13:06

detection you can use more than one as a

play13:10

list and the same we need to do for

play13:12

categorical methods because some methods

play13:14

work only for continuous data, some methods

play13:16

work only for categorical data one of

play13:18

the main strengths of Jensen-Shannon is

play13:20

that it actually works reasonably well

play13:22

for both categorical and continuous

play13:24

features so we can use it for both

play13:28

and now uh let's move to how this thing

play13:33

actually works uh so here I have a kind

play13:37

of a presentation that should build a

play13:39

bit of intuition about what's going on

play13:41

so we have uh here probability

play13:44

distribution functions uh the reference

play13:47

one is let's say your test data for

play13:50

which you know everything was fine and

play13:52

the distribution looked like it should

play13:54

look, so we have a bimodal

play13:57

distribution here, comprised

play14:00

of kind of two normal distributions,

play14:03

one centered around zero and another

play14:05

centered at around five, and this is our

play14:08

reference distribution now let's say

play14:11

that we wait a few weeks, a few months, and we

play14:14

detect that there is some issue with

play14:16

performance, so then we're going to want to

play14:17

compare our reference distribution to

play14:21

the let's say a week or a day for which

play14:23

we know that there is a problem and for

play14:25

that we'll look at our analysis

play14:27

distribution for a given feature and we

play14:30

see that it has significantly changed

play14:32

and there is one peak exactly at zero

play14:36

and there is another Peak and this time

play14:38

at 10 so maybe again some problem

play14:40

Upstream in our data engineering

play14:42

pipelines when something was maybe not

play14:45

scaled or scaled inversely or just

play14:47

multiplied by two for some reason and we

play14:50

see that there is significant change and

play14:52

what JS actually provides us is

play14:56

an average distance between the

play15:02

reference and analysis

play15:04

distributions for this feature, so

play15:07

we see here that the analysis

play15:10

distribution is completely flat and

play15:11

reference distribution is quite high so

play15:13

it's going to take this difference and

play15:15

measure it, and then it's going to do the

play15:17

same thing here, between analysis and

play15:20

reference and between reference and analysis,

play15:22

so in essence it's actually quite

play15:24

similar to KL divergence,

play15:27

but the difference here is that it

play15:29

doesn't only look from the perspective

play15:31

of difference between reference and

play15:33

Analysis but also from the difference

play15:36

between analysis and reference uh so

play15:38

it's kind of like a KL Divergence but

play15:41

from uh both sides

play15:45

and that is basically what our JS

play15:47

distance tells us
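To make the "KL divergence from both sides" intuition concrete, here is a minimal sketch of the Jensen-Shannon distance using only the Python standard library. Strictly speaking, the measure averages the KL divergence of each distribution from their 50/50 mixture and then takes the square root. The function names and the binned example distributions are illustrative assumptions, not NannyML's implementation.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q is non-zero wherever p is non-zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: sqrt of the mean KL divergence to the mixture."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    js_div = 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)
    return math.sqrt(max(js_div, 0.0))  # guard against tiny negative rounding

# Hypothetical binned probabilities over the same four bins.
reference = [0.5, 0.0, 0.5, 0.0]  # two peaks, in bins 0 and 2
analysis = [0.5, 0.0, 0.0, 0.5]   # second peak has moved to bin 3

same = js_distance(reference, reference)
shifted = js_distance(reference, analysis)
disjoint = js_distance([1.0, 0.0], [0.0, 1.0])
print(same, shifted, disjoint)
```

With base-2 logarithms the distance stays between 0 (identical distributions) and 1 (no overlap at all), which matches the 0 to 1 range mentioned in the talk.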

play15:49

uh so now uh we kind of have a at least

play15:53

cursory intuition of how the method

play15:54

works now let's look at the results uh

play15:57

so what we get here is our distance

play16:00

metric, so the Jensen-Shannon distance goes

play16:02

between 0 and 1, and we see here that it

play16:05

is quite low, so we know that for this

play16:08

specific example uh everything is doing

play16:10

fine we have our threshold if this

play16:12

threshold is exceeded we know that there

play16:14

is a significant

play16:16

um increase in our drift and this

play16:19

threshold is generally automatically

play16:22

set by NannyML, automatically defined

play16:24

by NannyML,

play16:25

um

play16:26

using the variance of this Jensen-Shannon

play16:30

distance, or any other distance,

play16:32

in our reference data set, or let's say

play16:35

our test data set, and if there are some

play16:37

alerts, then of course we will see

play16:39

also the alerts
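As a rough illustration of a threshold derived from how the metric behaves on reference data, the sketch below sets an upper threshold at the mean plus three standard deviations of the per-chunk values. The three-sigma multiplier, the function names, and the numbers are assumptions made for this example; the talk only says the threshold is based on the variance of the metric on the reference set.

```python
import statistics

def upper_threshold(reference_values, n_std=3.0):
    """Mean plus n_std population standard deviations of the reference metric."""
    mean = statistics.fmean(reference_values)
    std = statistics.pstdev(reference_values)
    return mean + n_std * std

def find_alerts(analysis_values, threshold):
    """Flag every chunk whose drift metric exceeds the threshold."""
    return [value > threshold for value in analysis_values]

# Hypothetical per-chunk Jensen-Shannon distances.
reference_js = [0.04, 0.05, 0.06, 0.05, 0.05]
analysis_js = [0.05, 0.06, 0.30]

threshold = upper_threshold(reference_js)
alerts = find_alerts(analysis_js, threshold)
print(round(threshold, 4), alerts)
```

Only the last analysis chunk is flagged here, because small fluctuations that also occur in the reference period stay under the threshold.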

play16:41

uh and that's really kind of an overview

play16:44

of what you can do with univariable drift

play16:46

detection methods and an example of one

play16:48

and how it works now let's go to

play16:51

multivariable detection

play16:57

so here

play16:58

let's start again at what does it

play17:00

actually do so the first thing uh it

play17:03

does is it captures any linear change in

play17:05

relationship between features, so this is

play17:07

kind of the answer to the problems of

play17:10

univariable detection methods that

play17:12

couldn't actually check and detect any

play17:14

kind of uh change in the correlations

play17:16

between features uh the idea of

play17:18

multivariable detection is that it actually

play17:20

can capture that so it can capture the

play17:23

change in my second example that you saw

play17:25

with distribution 1 and distribution 2

play17:28

switching places

play17:29

uh it also captures changes in single

play17:32

feature distributions so it kind of does

play17:34

the same thing as the univariable

play17:37

detection methods

play17:39

but on a more Global level uh and of

play17:42

course

play17:44

there are some drawbacks, and here

play17:46

I want to mention two: one, it

play17:48

requires at least two features to work,

play17:50

because of the way the algorithm works,

play17:52

which I'll explain in a minute: we will be

play17:54

doing

play17:55

dimensionality reduction at some point,

play17:57

and to reduce dimensions you need

play17:59

dimensions to reduce, which is not

play18:03

gonna be a problem for almost any data

play18:05

set because you always work with

play18:06

multiple features in machine learning

play18:08

apart from maybe some kind of Time

play18:09

series analysis

play18:11

and the other issue is that because

play18:13

we're looking at multiple features at

play18:14

once uh it's not as easily interpretable

play18:17

but the good news is that we can

play18:19

select a subset of features on our data

play18:22

to kind of try to narrow down where the

play18:25

change in data structure has actually

play18:28

happened so we still get a bit of

play18:30

interpretability there and at the end we

play18:32

can come back to univariable detection to

play18:34

see whether the

play18:36

change in data structure is due to a

play18:39

change in single feature distributions or

play18:40

correlations between features

play18:43

so again let's go over the input so the

play18:46

first thing here

play18:47

and the only thing, fortunately,

play18:50

is that we need to provide feature columns

play18:52

for our data reconstruction drift calculator,

play18:55

and we'll just provide the data for our

play18:58

features we could also put in our model

play19:01

predictions

play19:03

but that would mean that we're not

play19:06

really looking at only the covariate

play19:07

shift but the entire distribution of

play19:09

model inputs and uh predictions at the

play19:13

same time which is kind of a different

play19:14

thing

play19:15

so we recommend that you take uh that

play19:17

you put only your features for

play19:19

multivariable detection

play19:22

and the results again quite similar to

play19:24

before when we have our reconstruction

play19:27

error which is a measure of drift and I

play19:30

will go into how we can obtain it and

play19:33

walk you through step by step of the

play19:34

algorithm that we use

play19:37

um and then again you see confidence

play19:39

bands, which is basically how confident we are

play19:41

that our drift metric is within a certain

play19:44

range, these are the ranges, and we have

play19:47

the thresholds, and the thresholds are

play19:49

again automatically determined based on the

play19:53

behavior of the metric on the reference

play19:56

data set and if your

play20:00

if there is drift in your data you will

play20:03

see that these thresholds will be

play20:05

exceeded in time and then we have alerts

play20:07

the small red diamonds that show that

play20:11

something has gone wrong

play20:12

uh now let's start building the

play20:15

intuition of what's going on in this

play20:17

algorithm

play20:18

um it looks a bit complex but actually

play20:20

the premise is quite easy

play20:23

so the main idea is that we're going to

play20:27

train or learn a compression and decompression

play20:30

part so let's say that we're gonna train

play20:32

something like an auto encoder

play20:34

and um this Auto encoder is going to

play20:38

learn the actual structure of the data

play20:41

in order to minimize the Reconstruction

play20:43

loss so the loss between the original

play20:45

data and the reconstructed data and the

play20:48

key thing here is that we're going to

play20:50

rely on this compressor to correctly

play20:53

learn the structure of the data and then

play20:55

we are going to measure the loss of this

play20:58

uh let's say autoencoder

play21:01

to measure how strongly the

play21:03

structure of the data has changed, so

play21:06

this is really the high level intuition

play21:07

so imagine here we have the compression

play21:11

part, the compressing part of the

play21:13

autoencoder, we compress the data to a latent

play21:15

space and we decompress it, and

play21:19

then what we're going to do is we're

play21:20

going to compare the original data with

play21:22

the reconstructed data, like I do here, and

play21:24

for every place where they don't really

play21:26

align we're gonna be able to compute the

play21:29

Reconstruction error so now that you

play21:33

have the general idea that we're gonna

play21:34

rely on this Auto encoder to learn the

play21:38

structure of data and then we will

play21:40

measure how good this Auto encoder is on

play21:42

new data to see whether there is

play21:45

significant change in the data structure

play21:47

let's go into the actual step-by-step

play21:50

algorithm

play21:52

so how do we train it so first we'll

play21:55

have to prepare the data I'm gonna walk

play21:57

you through how we train the

play21:59

algorithm in NannyML, and we actually

play22:01

don't use autoencoders, we use the

play22:03

simplest possible thing, which is PCA,

play22:05

because we don't need to capture the

play22:08

structure of the data fully, we just need to

play22:10

capture it well enough that if there is

play22:12

change in the data the loss of the

play22:15

encoder decoder changes so it doesn't

play22:17

need to be low to start with so I'm

play22:19

going to prepare the data to make sure

play22:21

it can work with PCA we're going to

play22:22

impute missing values we're going to

play22:24

encode categorical features and at the

play22:26

end we're going to scale the data then

play22:28

we are going to train, or fit, a PCA on our

play22:33

reference data so it could be the test

play22:34

set it could be any part of your data

play22:37

any period in the data for which you

play22:39

know everything is fine and there is no

play22:41

significant data drift and performance

play22:43

is satisfactory

play22:44

after that, what we're going to do is

play22:48

we're going to compress and decompress,

play22:50

so we're going to transform and inverse

play22:52

transform this reference data using

play22:55

our trained PCA, and this trained PCA again

play22:57

was trained on reference data then we're

play23:00

going to compute the distance between

play23:01

the original and the reconstructed

play23:03

points, so the points that go through the

play23:06

compression and decompression using PCA, and

play23:10

we're just gonna do the simplest thing,

play23:11

we're going to compute the Euclidean

play23:14

distance between every pair of points,

play23:17

and then we're going to compute the

play23:19

so-called reconstruction error, which

play23:21

means that we just take all the

play23:22

distances, all the Euclidean

play23:24

distances between those points and we

play23:26

take the average of that, and from that we

play23:28

know what is the general loss or the

play23:30

general reconstruction error of our PCA

play23:32

on reference data it's not going to be

play23:35

zero because it's lossy: we still need

play23:37

to reduce dimensionality, so we're losing

play23:38

Dimensions which means we are losing

play23:40

information most likely

play23:42

so we have some kind of reference

play23:44

reconstruction error

play23:47

then when we uh go into the drift

play23:50

detection mode uh what we're going to do

play23:52

is we're going to look at our analysis

play23:55

data so the data for which we want to

play23:57

figure out whether there is drift and

play24:00

we're going to compress and decompress

play24:02

this analysis data using the trained PCA

play24:05

and the important thing here is that

play24:06

we do not retrain the PCA we use the PCA

play24:09

that we fitted on our reference data and

play24:12

this PCA still remembers and of course

play24:15

in quotes uh what is the structure of

play24:17

our reference data and then we're going

play24:19

to do the same thing, we're going to compute

play24:20

the distances between original and

play24:22

reconstruction points and we're gonna

play24:24

compute the Reconstruction error again

play24:26

this will be average between those

play24:28

distances, and then we're going to

play24:30

compare this reconstruction error

play24:32

with the reconstruction error we got

play24:34

from the reference data and in a moment

play24:37

I'll show how it actually looks in a

play24:39

notebook

play24:40

but the idea is that if this

play24:41

reconstruction error goes up it means

play24:44

that the model has captured a

play24:47

structure of the data that is no longer

play24:49

suitable for compression and

play24:51

decompression so the structure of the

play24:53

data has significantly changed in a way

play24:55

that the compression and decompression

play24:58

mechanism no longer works correctly and

play25:01

again, on the other hand, if

play25:04

the

play25:05

reconstruction error actually

play25:07

significantly goes down, then it means

play25:10

that the structure of the data again has

play25:12

changed in a way that the compression and

play25:15

decompression mechanism is better

play25:17

suited than it was on our reference data,

play25:22

so that again means that we have

play25:24

significant data drift or covariate shift
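The steps described above (scale, fit PCA on reference, compress and decompress, average the Euclidean distances) can be sketched with the standard library for the smallest possible case: two features reduced to one principal component. This is an illustrative toy under stated assumptions, not NannyML's code; the function names and the synthetic data are made up, and a real pipeline would also impute missing values and encode categorical features first.

```python
import math
import random
import statistics

def fit_pca_1d(points):
    """Learn scaling and the first principal axis from 2-feature reference data."""
    xs, ys = zip(*points)
    mean = (statistics.fmean(xs), statistics.fmean(ys))
    std = (statistics.pstdev(xs), statistics.pstdev(ys))
    scaled = [((x - mean[0]) / std[0], (y - mean[1]) / std[1]) for x, y in points]
    var_x = statistics.fmean([sx * sx for sx, _ in scaled])
    var_y = statistics.fmean([sy * sy for _, sy in scaled])
    cov = statistics.fmean([sx * sy for sx, sy in scaled])
    # Closed-form direction of the largest eigenvector of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * cov, var_x - var_y)
    return mean, std, (math.cos(angle), math.sin(angle))

def reconstruction_error(points, model):
    """Average Euclidean distance between scaled points and their reconstruction."""
    mean, std, axis = model
    distances = []
    for x, y in points:
        sx, sy = (x - mean[0]) / std[0], (y - mean[1]) / std[1]
        score = sx * axis[0] + sy * axis[1]        # compress to one dimension
        rx, ry = score * axis[0], score * axis[1]  # decompress back to two
        distances.append(math.hypot(sx - rx, sy - ry))
    return statistics.fmean(distances)

rng = random.Random(0)

def sample(sign, n=500):
    """Synthetic data where the second feature is sign * first feature + noise."""
    points = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        points.append((x, sign * x + rng.gauss(0, 0.1)))
    return points

reference = sample(+1)
analysis = sample(-1)  # similar single-feature distributions, flipped correlation

model = fit_pca_1d(reference)  # fitted once on reference, never refitted
ref_error = reconstruction_error(reference, model)
ana_error = reconstruction_error(analysis, model)
print(ref_error, ana_error)
```

The analysis data has roughly the same distribution per feature as the reference data, so a univariable check would see little, but the reconstruction error rises sharply because the relationship between the two features has changed.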

play25:28

now on to the tutorial, so

play25:32

just to walk you through again, if you go

play25:35

to NannyML

play25:37

and if you go to examples here in the

play25:39

repository, and then if you click on

play25:42

webinars,

play25:43

you will see the webinar we have

play25:47

right now, today, how to detect drift and

play25:49

resolve issues

play25:51

in your ML models, and then just click

play25:54

on the drift detection notebook

play25:56

and I already have it open here and you

play25:58

can easily follow you can also run it

play26:00

yourself

play26:02

so let's get started I'll give you like

play26:05

a minute or two for people who actually

play26:06

want to follow live

play26:12

okay

play26:20

let's cut these two minutes short, let's

play26:22

get going

play26:24

uh so first I'm going to walk you

play26:26

through kind of the entire process

play26:28

starting from training the model just to

play26:30

train a simple model that actually runs

play26:33

on data and actually predicts something

play26:34

and then we're gonna simulate uh the

play26:37

deployment, where we will have a production

play26:38

data set that we have no access to not

play26:41

during training not during evaluation

play26:43

not during testing and it just comes

play26:45

after and we'll see what happens with it

play26:47

we'll be able to see that there is some

play26:49

issues with performance, and then

play26:51

we will go into the main part of the

play26:54

tutorial which is how to actually detect

play26:56

data drift so let's start with importing

play26:59

NannyML and just a few things to load the

play27:01

data and things like pandas we're going

play27:04

to load the data

play27:06

it is a data set that's available on

play27:08

OpenML, it's a very nice

play27:11

non-profit organization that has a lot

play27:13

of very cool real-life data sets so very

play27:17

highly recommended that you give it a go

play27:19

uh then we are loading the data and what

play27:22

we see here is that we have

play27:24

um

play27:25

you um

play27:27

features about when certain things

play27:29

happen and what we're actually trying to

play27:31

figure out is how many people

play27:34

signed up for a bike sharing application, or

play27:37

actually shared a bike, during a given

play27:41

um

play27:43

season, day, I think it's day, oh, it's hour,

play27:45

during a given hour, and we're gonna

play27:49

just quickly pre-process the data here

play27:50

uh so first thing we're gonna Define our

play27:53

Target which is the count and then we'll

play27:55

have to drop some features first of all

play27:57

we have to drop the casual and

play27:58

registered features because they

play28:01

actually are a very bad data leak,

play28:04

in that they just add up to our count, so

play28:06

it wouldn't really make sense to build a

play28:07

model if these two features actually

play28:09

just give us the answer

play28:11

and uh because I'm really lazy I'm just

play28:14

gonna also drop everything that is not a

play28:16

number already which is weather and

play28:18

season

play28:19

and I additionally drop the count

play28:23

because that is just the target so let's

play28:26

do that then we're gonna split the data

play28:28

and we're going to split the data into our

play28:30

training, testing, and production sets,

play28:33

because we're not going to optimize on

play28:34

the hyper parameters there is no need to

play28:37

really do anything with the validation

play28:39

set so I'm just going to split the data

play28:42

and I'm gonna use a Time series split

play28:45

here so we're gonna start with things

play28:47

that happened earlier and then we're

play28:49

gonna test on things that happen later

play28:51

and then the last part of the data set

play28:53

is gonna be reserved for production
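A time-ordered split like the one described here can be sketched as follows. The 50/25/25 proportions and the function name are assumptions for illustration; the notebook's actual cut points are not stated in the talk.

```python
def chronological_split(rows, train_frac=0.5, test_frac=0.25):
    """Split time-ordered rows into train, test and production without shuffling."""
    n = len(rows)
    train_end = int(n * train_frac)
    test_end = int(n * (train_frac + test_frac))
    return rows[:train_end], rows[train_end:test_end], rows[test_end:]

# Hypothetical hourly observations, already sorted by time.
hours = list(range(100))
train, test, production = chronological_split(hours)
print(len(train), len(test), len(production))
```

Because nothing is shuffled, every test row is later than every training row, and production only contains data the model has never seen during training or testing.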

play28:56

and now we just simply train the model

play28:58

with default parameters, we're training an

play29:00

LGBMRegressor, the most standard thing

play29:03

you can potentially come up with, and we're

play29:05

training that on train data, we're gonna

play29:07

make predictions and we're gonna predict

play29:09

on training testing and production

play29:12

um

play29:13

so kind of simulating that we actually

play29:15

have a separate set that we wouldn't see

play29:17

at all and then we need to prepare the

play29:20

data for NannyML, this is kind of where

play29:21

the NannyML part starts, so what we're

play29:24

gonna do first is we need to Define our

play29:27

reference data set and again as I

play29:29

already mentioned a few times perfect

play29:31

data set for that is the test data set

play29:33

so that is exactly what we're gonna do

play29:35

we're gonna take the test predictions,

play29:37

test features, and test ground truth,

play29:41

and we define it as our reference

play29:43

data set, and then to analyze, we can

play29:46

analyze just specific part of the data

play29:49

when something is going wrong but why

play29:50

not analyze everything to have a bit

play29:53

fuller view, since the data set is not

play29:55

that big so here for the analysis

play29:58

um

play29:59

we're gonna take our

play30:01

um production data and the features and

play30:03

the predictions for the time being

play30:05

assuming we don't have very easy access

play30:07

to ground truth, but right after,

play30:10

when we fit our performance calculator

play30:12

let's just assume we do have access to

play30:15

that we could also estimate our

play30:16

performance without access to ground

play30:18

truth but just to show you that there

play30:20

are really issues with performance,

play30:23

not something that we're just estimating, not

play30:25

hallucinated issues,

play30:29

there are actual issues, and if we try to

play30:31

compute our performance using MSE or MAE

play30:34

we will actually see that there is

play30:36

issues with our model

play30:39

so we're going to

play30:41

add targets to our analysis and then

play30:43

we're going to just perform and

play30:45

calculate the performance taking the

play30:48

performance calculator and calculating

play30:50

the performance obviously on analysis

play30:53

with targets
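To show what calculating realized performance per chunk amounts to, here is a small standard-library sketch that computes MAE and MSE on consecutive chunks once targets are available. The function names, the chunk size, and the numbers are illustrative assumptions, not the notebook's data or NannyML's performance calculator.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def chunked_performance(y_true, y_pred, chunk_size):
    """Compute MAE and MSE separately for each consecutive chunk of rows."""
    results = []
    for start in range(0, len(y_true), chunk_size):
        t = y_true[start:start + chunk_size]
        p = y_pred[start:start + chunk_size]
        results.append({"start": start, "mae": mae(t, p), "mse": mse(t, p)})
    return results

# Hypothetical hourly counts: the model keeps predicting the old level.
y_true = [10, 12, 11, 13, 30, 34, 32, 36]
y_pred = [11, 12, 10, 13, 12, 13, 12, 14]

performance = chunked_performance(y_true, y_pred, chunk_size=4)
for row in performance:
    print(row)
```

Looking at the metric per chunk rather than as one average over the whole set is what makes a late spike visible.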

play30:54

so let me just run it and

play30:57

we'll see here that

play31:00

during testing everything was fine so we

play31:03

felt quite confident that we could deploy

play31:06

the model

play31:07

uh even though it kind of climbed up at

play31:10

the end but probably when uh we train

play31:13

the model we just look at the total uh

play31:16

or average performance in our test data

play31:18

set we don't try to segment in time or

play31:20

do other things like that so the problem

play31:22

looked fine

play31:24

uh but once we deployed we'll see that

play31:26

there are huge spikes in both MSE and MAE, so

play31:30

whatever method we use we see that

play31:31

there's some issues with it

play31:34

now what might have happened uh let's

play31:37

start with univariable detection, so

play31:40

um again I will uh make sure just here

play31:43

that our predictions are true that's

play31:46

categorical to make sure that we get the

play31:48

right kind of plot

play31:51

and then our column that we want to look

play31:54

at I'm just going to take all the model

play31:56

features, all the columns that the

play31:58

model has looked at

play32:00

um just to see what is going on

play32:02

step by step, feature by feature

play32:05

and then as I already mentioned here

play32:07

we're gonna take our column names as

play32:09

column names and we're gonna compute our

play32:12

recommended metric uh or method which is

play32:16

Jensen-Shannon distance

play32:20

um

play32:21

and to do that we will use our

play32:24

drift calculator to fit on reference,

play32:26

and the only thing that it actually does

play32:28

is that it's taking the reference data

play32:30

and for some methods it's going to bin

play32:33

it, so you don't have to store the

play32:34

entire thing uh when you compare it with

play32:37

your analysis data so if you want to

play32:40

just fit your calculator, let's say you

play32:42

have a few terabytes or 200 terabytes of

play32:45

data you can still fit the model

play32:49

or your calculator and then you'll be

play32:52

able to store just the histograms of the

play32:55

data which makes storage possible easy

play32:57

and then you can easily compare it with

play33:00

your analysis data which normally will

play33:02

be much smaller
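The idea of keeping only a histogram of the reference data can be sketched like this. The equal-width binning, the clamping of out-of-range values into the edge bins, and the function names are assumptions for illustration; the talk does not specify how the bins are chosen.

```python
def bin_probabilities(values, edges):
    """Share of values in each equal-width bin; out-of-range values are clamped."""
    low, high, n_bins = edges
    counts = [0] * n_bins
    for value in values:
        index = int((value - low) / (high - low) * n_bins)
        counts[min(max(index, 0), n_bins - 1)] += 1
    return [count / len(values) for count in counts]

def fit_histogram(reference_values, n_bins=10):
    """Return the bin definition and the reference histogram to store."""
    edges = (min(reference_values), max(reference_values), n_bins)
    return edges, bin_probabilities(reference_values, edges)

# Hypothetical feature values.
reference = list(range(100))
edges, reference_hist = fit_histogram(reference, n_bins=4)

# Later, only `edges` and `reference_hist` are needed, not the raw reference data.
analysis_hist = bin_probabilities([200.0] * 10, edges)
print(reference_hist, analysis_hist)
```

The stored object has a fixed size regardless of how many reference rows there were, and the two histograms share bins, so they can be fed directly into a distance such as Jensen-Shannon.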

play33:03

so I will fit it and then we'll

play33:06

calculate the results for the

play33:09

Jensen-Shannon distance on our analysis data

play33:12

and we'll display the results turning

play33:15

that into Data frame so one way of

play33:17

viewing results is to manually go over

play33:20

them

play33:22

um as a data frame you can take the data

play33:24

frame and you can do whatever you want

play33:26

with it that you will be able to do just

play33:28

as a typical pandas data frame, so what

play33:30

you see here is that we have our indices

play33:33

we're going to split our data by index

play33:35

for our analysis data and our production

play33:37

data

play33:39

oh sorry for our reference data and our

play33:42

analysis data so in other words test and

play33:46

production and and then we're gonna have

play33:49

a kind of funny multi-index, I think, where

play33:52

for every feature we have a potential

play33:55

list of methods because we can select

play33:57

more than just one method as you can see

play33:58

here and we will see what is the

play34:01

threshold above which we consider that

play34:03

something has gone wrong and what is the

play34:05

value of an actual metric and what is

play34:09

potentially lower threshold as some

play34:10

methods might have a lower threshold as

play34:12

well and whether we see that as

play34:14

something problematic so whether there

play34:16

is alert is true so you could just take

play34:18

it and plug it, for example, into your

play34:20

retraining

play34:22

pipeline, and automatically

play34:25

retrain if you see that specific things

play34:28

are drifting too much, and you want to

play34:29

retrain and see whether automatic

play34:31

retraining actually fixes the model, so there

play34:33

is some potential for uh nice automation

play34:35

here
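One way such automation could look is sketched below: collect the per-chunk alert flags and trigger retraining only when a feature alerts in several chunks. The record layout, the `min_alerting_chunks` rule, and the function name are hypothetical and are not the layout of NannyML's results data frame.

```python
def features_with_alerts(results, min_alerting_chunks=2):
    """Return features whose drift alert fired in at least N chunks."""
    counts = {}
    for row in results:
        if row["alert"]:
            counts[row["feature"]] = counts.get(row["feature"], 0) + 1
    return sorted(f for f, c in counts.items() if c >= min_alerting_chunks)

# Hypothetical flattened drift results, one record per feature and chunk.
results = [
    {"feature": "temp", "chunk": 0, "value": 0.05, "alert": False},
    {"feature": "temp", "chunk": 1, "value": 0.21, "alert": True},
    {"feature": "temp", "chunk": 2, "value": 0.25, "alert": True},
    {"feature": "workingday", "chunk": 0, "value": 0.01, "alert": False},
    {"feature": "workingday", "chunk": 1, "value": 0.12, "alert": True},
    {"feature": "workingday", "chunk": 2, "value": 0.02, "alert": False},
]

drifting = features_with_alerts(results)
should_retrain = bool(drifting)
print(drifting, should_retrain)
```

Requiring more than one alerting chunk is a simple guard against the occasional false alert mentioned earlier in the talk.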

play34:36

but the other way of

play34:39

viewing the results is just with plots

play34:41

and the way we do it is uh we'll take

play34:44

the results, we will just say dot plot,

play34:46

and the thing we actually want

play34:49

to view is the drift, so we're gonna say

play34:51

that the kind here is drift, and we'll

play34:54

just show it, just like we would show

play34:56

any normal matplotlib or Plotly figure

play34:59

and again exactly what we said before

play35:02

and there's a few weird things going on

play35:05

that hopefully help us build

play35:07

understanding of why the error should

play35:10

increase, so first of all the year is

play35:12

always drifting, in every single chunk of

play35:14

our data the year is drifting, which

play35:16

actually makes sense because it's always

play35:18

monotonically increasing, so

play35:20

it's never going to be the same as the

play35:23

distribution of our entire test set

play35:25

which already probably gives us some

play35:27

idea of uh the mistakes we might have

play35:30

made or have made during development

play35:32

sorry during training and preparation of

play35:36

this algorithm of this model

play35:39

then we see the same thing

play35:41

happens for month, which is that it always

play35:44

drifts but in that case it might make

play35:46

sense because we're looking at kind of a

play35:48

univariable distribution, and this

play35:50

data lasts more than a year, but for

play35:53

every period in our data every Chunk we

play35:56

probably had only one or two months so

play36:00

it's drifting it's fine it's drifting

play36:02

it's just cyclical data but it's good to

play36:03

see that this is actually happening

play36:07

then we have hour, which is equally

play36:09

distributed, there's no changes there,

play36:12

we have whether those are holidays or not,

play36:13

and we see that there are

play36:16

um probably again months when we see a

play36:19

lot of holidays uh and it might actually

play36:22

mean that there is something wrong with

play36:24

performance when there's a lot of holidays,

play36:26

uh but we're not sure yet

play36:29

uh for weekday we see a very weird thing

play36:32

going on is that this data for testing

play36:36

was probably in some way uh ordered but

play36:39

for production this data was completely

play36:41

not ordered and it's completely random

play36:42

and it looks exactly as the data for

play36:47

the entire uh test set

play36:51

but again none of those actually exceeds

play36:54

our drift threshold, so we're sure that the

play36:57

distribution of our weekday has not

play36:59

changed significantly but it's something

play37:01

to look into when you develop your model

play37:03

uh and if you see something like that

play37:05

then you probably should think about whether

play37:07

this is actually something that you're

play37:09

happy with

play37:11

and then for working day no changes it

play37:14

all looks very good it all stays below

play37:16

the threshold

play37:17

for the temperature we see that there

play37:19

are significant changes and especially

play37:21

there is a big jump here so again we

play37:24

have cyclical data and we already

play37:25

started to build the intuition that we

play37:27

probably built our models uh to take

play37:29

into account Cycles in an incorrect way

play37:32

and there's something wrong going on

play37:34

when we have

play37:35

different parts of the

play37:38

year-long cycles

play37:41

the temperature the same thing humidity

play37:44

kind of similar thing

play37:46

uh wind speed similar thing so there is

play37:49

something going on with Cycles we

play37:51

already managed to figure out that we

play37:53

probably did not really think through how

play37:55

we were gonna treat temporal data when we

play37:59

developed that model

play38:00

so temporal data then let's continue our

play38:03

investigation, and instead of just doing

play38:05

univariable detection, let's do

play38:07

multivariable detection on all features

play38:09

to get a general view of what is that

play38:13

um drift for the entire data set but

play38:15

also let's drop two things that we

play38:17

already know are drifting and have a

play38:19

reason to drift, and there's probably issues

play38:21

with those, so year and month

play38:24

so I'm going to run that as well and

play38:27

then we're gonna initialize our data

play38:29

reconstruction drift calculator, our

play38:31

multivariable detection calculator,

play38:34

just putting in all column names for the

play38:36

time being, so not those, but the ones we defined

play38:38

earlier

play38:40

then we're going to fit it on reference,

play38:42

so we're going to train our

play38:45

um PCA model on reference that's going

play38:48

to learn how the data looks like and

play38:50

then we are going to calculate so we're

play38:53

going to run it through

play38:54

um the uh trained PCA on our analysis

play38:58

data

play39:01

and what we see there is a huge increase

play39:05

in the error so definitely there is

play39:09

drift and this drift probably actually

play39:11

is one of the causes of the drop in

play39:13

performance but now let's drop uh our

play39:17

year and month, so let's do the same

play39:19

thing let's fit it again without a year

play39:22

and the month

play39:24

and what happens, you see that there is

play39:26

actually not really any drift, so what it

play39:28

tells us is that all important drift and

play39:31

all important changes in the model, all

play39:32

covariate shift in the model, is really

play39:34

captured in the month and the year

play39:37

columns, and we shouldn't have trained a

play39:41

model, and especially we should not have

play39:44

trained a gradient boosting model, on

play39:46

year because maybe there is a trend in

play39:49

our application maybe more and more

play39:51

people start sharing bikes using our

play39:54

application which is something that you

play39:56

would hope for as a business

play39:58

um but we did not take that into account

play40:00

and our gradient boosting model just

play40:02

assumed that uh the higher the year the

play40:06

more people on average are going to

play40:08

actually

play40:11

uh sure the bikes but it just took the

play40:14

cut on the last data it shows and it

play40:16

failed to extrapolate the strength

play40:18

further because it's not something that

play40:20

um tree-based models can easily do and

play40:23

that drop in performance is due to us

play40:25

taking our year and month into a

play40:28

consideration here so this is something

play40:30

that we will have to think about now as

play40:32

data scientists and try to uh kind of

play40:35

redevelop the model in a way that

play40:37

doesn't take a year and month into

play40:40

account in such a simplistic way

play40:43

and that's it we know what happened we

play40:45

know why it went wrong and then we can

play40:47

start working on

play40:49

resolution which in that case would be

play40:51

full model redevelopment because

play40:53

retraining won't help here we know that

play40:55

the model would fail later on when we

play40:57

see the next year
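[Editor's sketch] The point about tree-based models not extrapolating a trend can be shown with a tiny hand-written one-feature regression tree. The years and rental counts below are made up for illustration and are not the tutorial's dataset; `fit_tree` and `predict` are hypothetical helpers. A gradient boosting model is a sum of trees like this one, so it inherits the same flat behaviour outside the training range.

```python
def fit_tree(xs, ys, depth):
    """Fit a tiny 1-D regression tree; each leaf stores the mean target."""
    mean = sum(ys) / len(ys)
    if depth == 0 or len(set(xs)) < 2:
        return mean  # leaf
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Sum of squared errors if we split at this threshold.
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold)
    threshold = best[1]
    left_pairs = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right_pairs = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    return (
        threshold,
        fit_tree([x for x, _ in left_pairs], [y for _, y in left_pairs], depth - 1),
        fit_tree([x for x, _ in right_pairs], [y for _, y in right_pairs], depth - 1),
    )

def predict(tree, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree

# Hypothetical training data: demand grows by roughly 100 rentals per year.
years = [2011, 2011, 2012, 2012, 2013, 2013]
rentals = [95, 105, 195, 205, 295, 305]

tree = fit_tree(years, rentals, depth=2)
for year in (2013, 2014, 2015):
    # Years after the last training year fall into the same leaf as 2013.
    print(year, predict(tree, year))
```

The prediction stays at the level of the last training year no matter how far ahead we ask, which is why retraining on the same features only postpones the failure to the following year.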

play41:00

um and that is it that is the end of the

play41:02

tutorial

play41:03

so now thanks for listening uh we are

play41:06

slowly nearing the end of the webinar uh

play41:09

uh we are still kind of you know

play41:12

starting open source Library so I would

play41:15

very much appreciate it if you give

play41:17

NannyML a try and give us a star every

play41:19

Star matters

play41:20

um they give a bit more visibility to

play41:23

our library which helps more people

play41:26

learn about NannyML which we see as a

play41:29

good thing because monitoring is

play41:30

important

play41:31

uh so that's that and now before we go

play41:35

to Q&A uh just one more thing we have

play41:38

yet another webinar on the next Thursday

play41:41

and this time it's going to be my

play41:42

co-founder William uh talking about what

play41:45

data do you actually need to monitor

play41:47

ML models in production so it's going to

play41:49

be even more Hands-On and hopefully it's

play41:53

gonna help you uh not only understand uh

play41:56

what's going on with monitoring but

play41:57

easily get started with it in your

play42:00

job

play42:01

and now uh let's get to the Q&A I'm gonna

play42:05

leave it up here and you can just scan

play42:08

uh this QR code to be forwarded to a

play42:11

form and we'll send you the recording of

play42:13

This webinar later on

play42:17

thank you so much Wojtek for this

play42:19

awesome presentation it was very

play42:22

informational

play42:24

uh we have a few questions here and

play42:28

um just a reminder that you can drop

play42:30

your questions in the Q&A and we will

play42:32

answer all of them live

play42:34

so the first question for you Wojtek is

play42:37

uh what does it mean that the JS

play42:39

distance is robust to outliers

play42:43

um is it that it doesn't it does not

play42:45

detect outliers as drifts I think

play42:48

exactly that that is exactly what it

play42:50

means if you have a few outliers in your

play42:53

data and this is something that is not

play42:55

important to you from business

play42:57

perspective or from data science

play42:58

perspective uh Jensen-Shannon is going

play43:01

to be robust uh to those and will not

play43:04

actually flag those as drift on the

play43:06

other hand if it's something that you

play43:08

know is going to be important for

play43:10

your data and even a few outliers can

play43:12

very strongly impact the

play43:15

business outcomes or your own metrics

play43:17

you should use something else such as

play43:19

Wasserstein distance or it's also

play43:21

called earth mover's distance which is very

play43:25

sensitive to outliers and that is one of

play43:27

the reasons we have multiple methods in

play43:29

our library because there is no method

play43:31

that really fits all the use cases and

play43:35

we also have in our docs

play43:38

um quick summary of the strengths and

play43:41

weaknesses of our of our methods so then

play43:45

you can take a look there and pick the

play43:47

one that fits your use case best
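[Editor's sketch] The contrast drawn in this answer can be reproduced with a small pure-Python comparison of the Jensen-Shannon distance and the Wasserstein (earth mover's) distance. The data is synthetic, the binning scheme (ten bins taken from the reference range, with out-of-range values counted in the end bins) is an assumption of this sketch, and none of the helper names come from NannyML.

```python
import math

def histogram(sample, edges):
    """Relative frequency per bin; out-of-range values go to the end bins."""
    counts = [0] * (len(edges) - 1)
    for value in sample:
        index = sum(value >= edge for edge in edges[1:-1])
        counts[index] += 1
    return [count / len(sample) for count in counts]

def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, bounded between 0 and 1."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))

def wasserstein_distance(a, b):
    """1-D Wasserstein distance for two samples of equal size."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

# Hypothetical feature: 1000 values spread evenly over 0..99.
reference = [i % 100 for i in range(1000)]
with_outliers = [10_000] * 5 + reference[5:]   # five extreme outliers
shifted = [value + 20 for value in reference]  # the whole distribution moves

edges = list(range(0, 101, 10))
p_reference = histogram(reference, edges)

js_outliers = js_distance(p_reference, histogram(with_outliers, edges))
js_shifted = js_distance(p_reference, histogram(shifted, edges))
w_outliers = wasserstein_distance(reference, with_outliers)
w_shifted = wasserstein_distance(reference, shifted)

print("JS:          outliers", js_outliers, "shift", js_shifted)
print("Wasserstein: outliers", w_outliers, "shift", w_shifted)
```

Under these assumptions the five outliers barely move the Jensen-Shannon distance but dominate the Wasserstein distance, while a shift of the whole distribution is clearly visible to both.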

play43:51

awesome thanks a lot uh I will

play43:54

um after the Q&A I will also share

play43:57

the link to the docs so that you can have

play43:59

access to the documentation that

play44:01

Wojtek just mentioned

play44:04

um the second question is in

play44:06

multivariate drift detection why not

play44:08

using deep autoencoder based

play44:10

reconstruction error instead of PCA

play44:13

reconstruct ah Jesus reconstruction to

play44:17

detect also non-linear relationship

play44:19

changes as well

play44:21

so this is something that is on our

play44:24

roadmap and it's something that we'll do

play44:25

sooner or later and the reasons we uh

play44:29

the reason we didn't go with it for the

play44:31

time being are twofold first uh PCA is

play44:34

pretty robust works out of the box

play44:37

and it always has very stable Behavior

play44:41

which is not true for variational

play44:43

autoencoders if we go with normal

play44:45

autoencoders there's no chance we're gonna

play44:47

get anything that is in any way meaningful

play44:49

because even very small change in data

play44:51

distribution will result in huge

play44:54

reconstruction error for variational

play44:57

autoencoders they are significantly

play44:58

more stable so it is possible to do it

play45:01

but we need our algorithms to work on

play45:04

any kind of tabular data no matter what

play45:07

is the size of the data no matter how

play45:09

it's distributed and to do that we need

play45:12

to have quite Advanced automl hidden

play45:15

inside NannyML and it's much easier to

play45:18

make a PCA work with any kind of data

play45:21

distribution and any kind of tabular

play45:23

data than variational autoencoders so

play45:25

we will have them because like you said

play45:27

they are much better at detecting

play45:29

non-linear relationship changes

play45:32

um we don't have them now because we need to

play45:34

start somewhere and we started with

play45:36

something that's robust and we know it's

play45:37

going to work reasonably well for

play45:39

everything uh and in the future we will

play45:43

have variational autoencoders most likely as

play45:46

part of our library and of course if

play45:49

you're interested in that you are most

play45:50

welcome to contribute and uh feel free

play45:53

to join our Slack feel free to

play45:54

contribute on GitHub and if you want to

play45:57

go ahead and try to implement those that

play45:59

would be great

play46:02

awesome cool answer thanks a lot

play46:06

um another question was well not really

play46:09

a question but a comment saying very

play46:11

well explained how to use the PCA so

play46:14

thanks a lot Eileen I hope I'm

play46:16

pronouncing yes I'm hoping I'm

play46:18

pronouncing your name correctly but

play46:20

thanks a lot for that comment we really

play46:22

appreciate the feedback

play46:24

um we have uh

play46:27

another question that is a repetition so

play46:30

what does it mean for the JS distance to

play46:33

be robust to outliers and then another

play46:36

question

play46:37

um is

play46:39

um can you use the univariate drift

play46:41

tests for a Time series

play46:44

uh yes if you detrend and decycle them

play46:47

first so this is something again that is

play46:49

potentially on our roadmap which is

play46:51

explicit support for time series if your

play46:54

time series is already stationary

play46:56

there's no trends and the cycles are

play46:58

removed then you can use that kind of

play47:02

data that's prepared in that way that

play47:04

you don't have cycles don't have trends

play47:06

and you can put them directly in

play47:08

univariate detection and you will get

play47:10

reasonable results as you saw here in

play47:13

the tutorial if we don't do that and we

play47:16

leave the trend in or we leave the

play47:17

cycle in we'll get unreliable results

play47:19

so you have to first do a bit of data

play47:23

pre-processing uh to do it but it's

play47:25

absolutely possible
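[Editor's sketch] The pre-processing mentioned in this answer, removing the trend and the cycle before running univariate drift tests, could look roughly like the following. The series is synthetic, the weekly period of 7 is assumed to be known, and `detrend` and `decycle` are hypothetical helpers that use a simple least-squares line and per-position averages; real data may call for other methods such as differencing.

```python
def detrend(series):
    """Remove a least-squares linear trend from the series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]

def decycle(series, period):
    """Subtract the average value of each position within the cycle."""
    phase_means = []
    for phase in range(period):
        values = series[phase::period]
        phase_means.append(sum(values) / len(values))
    return [y - phase_means[t % period] for t, y in enumerate(series)]

def half_gap(series):
    """Difference between the mean of the second half and the first half."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return sum(second) / len(second) - sum(first) / len(first)

# Hypothetical daily series: upward trend plus a weekly pattern, 20 weeks long.
weekly_pattern = [0, 1, 3, 1, 0, -2, -3]
raw = [0.5 * t + weekly_pattern[t % 7] for t in range(140)]

cleaned = decycle(detrend(raw), period=7)

gap_raw = half_gap(raw)          # large: the trend looks like drift
gap_cleaned = half_gap(cleaned)  # close to zero after pre-processing
print(gap_raw, gap_cleaned)
```

Comparing the first and second halves of the raw series would report a large change that is only the trend, while the cleaned series is almost identical across both halves and is a more sensible input for a univariate drift test.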

play47:29

nice

play47:31

everything is possible huh

play47:34

JK JK

play47:37

uh okay we have uh one final question

play47:40

for you Wojtek um how can you use

play47:43

multivariate drift detection to get

play47:45

interpretable results

play47:47

uh yeah so what I recommend is first

play47:50

just run everything uh then you might

play47:53

want to run it on uh sorry you might

play47:55

want to run univariate drift detection see

play47:58

which features tend to behave weirdly

play48:00

and then you can do the thing that I did

play48:02

in the tutorial which is just exclude

play48:04

those and only run it on things that

play48:07

don't change from the univariate

play48:10

perspective and if your multivariate

play48:12

drift detection also doesn't detect

play48:14

anything there then you can be quite

play48:16

sure that the reason for your drop in

play48:18

performance is not there because there's

play48:20

no changes in the relationship between

play48:22

features and there is no change in the

play48:25

actual distribution of the features so

play48:27

then you are just left with the misbehaving

play48:30

features that you see on the univariate

play48:31

level and what you should do then is use

play48:35

domain knowledge to try to bundle the

play48:39

ones where we know that correlations

play48:40

also matter maybe the correlation

play48:43

between age and income is something that

play48:45

is actually very strongly

play48:47

influencing the model but the

play48:48

correlation between age and location

play48:51

doesn't matter that much in that case

play48:54

um select only few features two three

play48:56

four features that you know that the

play48:59

correlation between them actually matters

play49:00

for the model

play49:01

and you can do that using things like

play49:04

SHAP explainable AI where you can look at

play49:06

combinations of features and their

play49:08

importance and then look at um

play49:12

the subset of features that are either

play49:14

important or the interactions between

play49:16

them are important and run multivariate

play49:18

there and basically you should be able

play49:20

to narrow down uh whether univariate drift

play49:24

actually happens and it's just the change in

play49:27

distribution that influences performance

play49:28

or whether you find a pair of features

play49:31

that say don't drift by themselves

play49:33

but if you put them in a multivariate

play49:37

detection you will see drift then you

play49:39

know that there's actual change in

play49:41

correlation between those features and

play49:43

that's how you get interpretable results

play49:44

that you would miss with univariate

play49:46

detection

play49:50

awesome very complete answer thanks a

play49:54

lot

play49:55

um great then

play49:58

um I don't think we have any more

play50:00

questions to address today

play50:02

so Wojtek thanks a lot for your

play50:05

presentation I really appreciate it

play50:08

um thanks a lot everyone that joined us

play50:11

today uh as Wojtek said uh we are an open

play50:14

source library so don't forget to star

play50:16

us on GitHub if you find this uh webinar

play50:18

useful uh it really means a lot to have

play50:21

one more star there and

play50:24

um if you just a reminder that uh if you

play50:28

do want this recording please scan this

play50:30

QR code leave us your email and we will

play50:32

send it by the end of this week and if

play50:35

you'd like to uh join another webinar

play50:38

you will have a chance