How Artificial Intelligence Works for Risk Management

Pirani
25 Aug 2023 · 47:20

Summary

TLDR: In this webinar, Nicole Koza and data scientist Faguna explore the integration of artificial intelligence in risk management. They define AI, delve into machine learning and deep learning, and discuss their applications in client segmentation and risk recommendation systems built on ChatGPT. The session emphasizes the importance of aligning AI techniques with business goals, the significance of data cleaning, and the practical steps for achieving accurate client segmentation. It also highlights the potential of natural language processing in understanding and generating human language for risk assessment.

Takeaways

  • πŸ˜€ The webinar focuses on the theoretical and practical aspects of artificial intelligence in risk management.
  • πŸ” The speaker, Faguna, introduces the concept of artificial intelligence (AI) as the ability to reason, solve problems, and learn from experience without human intervention.
  • πŸ€– AI is composed of two main areas: machine learning and deep learning, which are responsible for developing and training algorithms and models.
  • πŸ“š The importance of data science is highlighted, emphasizing the need for a clear business goal before applying AI techniques.
  • πŸ“ˆ Two types of machine learning are discussed: supervised learning, which requires a target variable, and unsupervised learning, which does not.
  • 🏦 An example of supervised learning is identifying fraudulent transactions in a bank's dataset, while unsupervised learning might be used for customer segmentation in a retail company.
  • 🀝 The benefits of machine learning for segmentation include accuracy, automation, scalability, adaptability, and personalization.
  • πŸ” Two clustering algorithms are mentioned: k-means, which is simple and efficient but sensitive to initialization, and Birch, which is robust to outliers and suitable for large datasets.
  • 🧹 The critical step of data cleaning is discussed, including outlier detection, categorical variable encoding, data normalization, and principal component analysis.
  • πŸ“Š Two methods for determining the optimal number of segments are presented: the elbow method, which uses inertia, and the silhouette method, which considers cohesion and separation.
  • πŸ’‘ The platform demonstration shows how to perform client segmentation using the discussed techniques and tools, emphasizing the importance of analyzing results from a business perspective.

Q & A

  • What is the main focus of the webinar presented by Nicole Koza and data scientist Faguna?

    -The main focus of the webinar is to explore the theoretical and practical aspects of artificial intelligence in risk management, including applications of artificial intelligence, machine learning for segmentation, and natural language processing techniques.

  • What is the definition of artificial intelligence as discussed in the webinar?

    -Artificial intelligence is defined as the ability of a computer program or application to reason, solve problems, understand complex ideas, learn quickly, and learn from experience, without the need for human intervention.

  • What are the two main sub-areas of artificial intelligence mentioned in the script?

    -The two main sub-areas of artificial intelligence mentioned are machine learning and deep learning, which are responsible for developing and training the algorithms and models used in AI applications.

  • Why is data science important in the context of artificial intelligence applications?

    -Data science is important because it helps to identify clear business goals or problems that AI applications aim to solve or opportunities they aim to leverage. It ensures that AI techniques are used as tools to achieve specific business objectives rather than being an end in themselves.

  • What are the two types of machine learning mentioned in the script?

    -The two types of machine learning mentioned are supervised learning, which requires a target variable, and unsupervised learning, which does not require a target variable and is often used for pattern recognition or anomaly detection.
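
A minimal sketch of the distinction, assuming scikit-learn and synthetic data (the feature meanings and labels are invented for illustration, not taken from the webinar):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))      # pretend: amount, hour, client tenure

# Supervised: a target column labels each row, e.g. fraudulent (1) or not (0).
y = (X[:, 0] > 1.5).astype(int)    # toy stand-in for known fraud flags
model = LogisticRegression().fit(X, y)
print(model.predict(X[:5]))        # predicts the label for unseen transactions

# Unsupervised: no target; the algorithm finds structure on its own.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:5])              # cluster assignment per observation
```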

  • Can you explain the purpose of client segmentation using machine learning techniques?

    -The purpose of client segmentation using machine learning is to identify groups of clients with similar characteristics, which can help in understanding the customer base better, discovering unknown patterns, complying with regulations, and improving transaction monitoring.

  • What are the advantages of using machine learning for client segmentation compared to manual methods?

    -The advantages of using machine learning for client segmentation include higher accuracy with large datasets, automation, scalability, adaptability, personalization of segmentation characteristics, and improved time efficiency.

  • What are the two clustering algorithms mentioned in the script, and what are their main differences?

    -The two clustering algorithms mentioned are k-means and Birch. K-means is simpler and more efficient but can be sensitive to initialization and outliers. Birch is more robust to outliers and better suited for large datasets and high-dimensional data.
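
One way to compare the two algorithms, sketched with scikit-learn on synthetic blobs (data and parameters are illustrative):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, Birch
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=2000, centers=4, random_state=42)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
birch = Birch(n_clusters=4).fit(X)  # summarizes the data in a CF-tree, then clusters

for name, labels in [("k-means", kmeans.labels_), ("Birch", birch.labels_)]:
    print(name, round(silhouette_score(X, labels), 3))
```

Fitting both and keeping the better-scoring result mirrors the webinar's advice to compare methodologies before committing to one.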

  • Why is data cleaning an essential step in the machine learning process for segmentation?

    -Data cleaning is essential because it ensures the quality of the data used for training models. Steps like outlier detection, categorical variable encoding, data normalization, and principal component analysis help to prepare the data for accurate and meaningful analysis.
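
A hedged sketch of what those four steps could look like in code; the column names and thresholds are invented, and other libraries could serve equally well:

```python
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

df = pd.DataFrame({
    "age":     [25, 34, 51, 29, 78],
    "income":  [1200, 3400, 2100, 4800, 90000],  # 90000 looks like an outlier
    "country": ["ES", "FR", "ES", "FR", "ES"],
})

# 1. Outlier detection: set outliers aside for separate analysis, don't delete them.
inlier = IsolationForest(random_state=0).fit_predict(df[["age", "income"]]) == 1
outliers, clean = df[~inlier], df[inlier]

# 2. Categorical encoding: country names become numeric indicator columns.
encoded = pd.get_dummies(clean, columns=["country"])

# 3. Normalization: put age (20-80) and income (thousands) on a comparable scale.
scaled = StandardScaler().fit_transform(encoded)

# 4. PCA: keep only the components carrying most of the variance.
reduced = PCA(n_components=0.95).fit_transform(scaled)
```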

  • What are the two methods mentioned for determining the optimal number of segments in a dataset?

    -The two methods mentioned for determining the optimal number of segments are the elbow method, which uses the inertia metric, and the silhouette method, which considers cohesion and separation parameters.
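
Both methods can be computed in a single loop over candidate segment counts; a sketch with scikit-learn on synthetic data:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=5, random_state=1)

for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))

# Elbow method: pick the k where inertia stops dropping sharply.
# Silhouette method: pick the k with the highest score (cohesion vs. separation).
```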

  • What is the role of natural language processing (NLP) in risk management applications?

    -In risk management, NLP can be used for sentiment analysis to understand customer feedback, text classification to categorize risks, and chatbots to interact with clients and provide risk recommendations or information.
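
The webinar shows no code for these tasks, but as an illustration, the first two are nearly one-liners with the Hugging Face transformers library and its default models (an assumption, not the platform's actual stack):

```python
from transformers import pipeline

# Sentiment analysis: is this customer comment positive or negative?
sentiment = pipeline("sentiment-analysis")
print(sentiment("The claims process was slow and frustrating."))

# Text classification (zero-shot): which risk category fits this description?
classifier = pipeline("zero-shot-classification")
print(classifier(
    "Unauthorized wire transfer flagged on a dormant account.",
    candidate_labels=["fraud", "operational risk", "compliance"],
))
```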

  • What is the significance of GPT in the context of the presented risk management application?

    -GPT, or Generative Pre-trained Transformer, is significant because it can generate new text data based on the input provided. In the risk management application, it uses various client-related variables to suggest potential risks, aiding in the identification and mitigation of such risks.

  • How does the platform use ChatGPT to assist clients in identifying potential risks?

    -The platform uses ChatGPT by inputting variables such as industry, process, and risk system into the model's prompt. ChatGPT then generates responses and recommendations for potential risks based on the pre-trained data it has from the internet.
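
A hypothetical reconstruction of that prompt assembly using the OpenAI Python client; the webinar shows no code, so the model name, wording, and helper function here are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_risks(industry: str, process: str, risk_system: str, language: str) -> str:
    # Only general variables enter the prompt; no client-specific data is sent,
    # matching the privacy precaution described in the webinar.
    prompt = (
        f"List potential {risk_system} risks for a company in the {industry} "
        f"industry, within its {process} process. Answer in {language}."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model, not named in the webinar
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(suggest_risks("banking", "technology", "operational risk", "English"))
```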

  • What are the precautions taken to ensure client data privacy when using ChatGPT in the platform?

    -To ensure client data privacy, the platform does not provide specific client information to ChatGPT. Instead, it uses general variables like industry and process, relying on the model's pre-trained data to generate risk recommendations.

Outlines

00:00

πŸ‘‹ Introduction to Risk Management and AI

The script begins with Nicole Koza welcoming participants to a webinar on artificial intelligence for risk management. She introduces the agenda, which includes defining artificial intelligence (AI), discussing machine learning for segmentation, and exploring natural language processing with ChatGPT. Nicole also invites attendees to visit the Pirani Academy for learning materials and encourages questions throughout the session. The session's first speaker, a data scientist, outlines the day's topics, emphasizing the importance of understanding AI's theoretical and practical aspects and their applications in risk management.

05:02

πŸ€– Understanding Artificial Intelligence Composition

The speaker delves into the definition of artificial intelligence, defining intelligence as the ability to reason, solve problems, and learn from experience, and "artificial" as delegating such tasks to a computer program rather than a human. They explain the composition of AI, highlighting the importance of machine learning and deep learning in developing algorithms and models. The speaker also discusses the role of data science in aligning AI applications with clear business goals, emphasizing that AI is a tool to aid in solving complex problems rather than a standalone solution.

10:04

πŸ“š Machine Learning Techniques: Supervised and Unsupervised Learning

The script explains the two primary methods of machine learning: supervised and unsupervised learning. Supervised learning is illustrated with examples from banking and retail sectors, where historical data is used to predict outcomes such as fraudulent transactions or sales amounts. Unsupervised learning is characterized by the absence of a target variable, with clustering and segmentation as common applications. The speaker also discusses the importance of identifying business problems before applying machine learning techniques.

15:06

πŸ” Applications of Machine Learning in Anti-Money Laundering

The speaker introduces a specific application of machine learning in anti-money laundering, focusing on client segmentation to identify similar groups for better understanding and compliance with regulations. They discuss the advantages of using machine learning for scalability, adaptability, and time efficiency, especially when dealing with large datasets. The script also mentions different clustering algorithms, such as k-means and Birch, and their suitability for various data sizes and dimensions.

20:08

🧼 Data Cleaning and Clustering Algorithms

This paragraph emphasizes the importance of data cleaning in the machine learning process, outlining steps such as outlier detection, categorical variable encoding, data normalization, and principal component analysis. The speaker explains that these steps are crucial for accurate results, regardless of the sophistication of the algorithms used. They also discuss the k-means and Birch algorithms, providing insights into their functionality and ideal use cases.

25:10

πŸ“Š Optimal Segment Selection and Platform Demonstration

The script describes methods for determining the optimal number of segments in a dataset, including the elbow method and the silhouette metric. These methods help in finding a balance between too many and too few segments for effective client understanding. The speaker then transitions to demonstrating the platform where these steps can be performed, including data selection, cleaning, and the application of machine learning models.

30:12

πŸ“ Natural Language Processing and Its Applications

The speaker introduces natural language processing (NLP) as a subfield of AI that combines computer science and linguistics to process and understand human language. They discuss various applications of NLP, such as sentiment analysis, text classification, and chatbots, which are used to analyze sentiments in social media, classify risks, and interact with customers, respectively.

35:14

πŸ—£οΈ Chat GPT and Its Risk Management Applications

The script explores the use of ChatGPT, a generative pre-trained transformer model, in risk management. The speaker explains how ChatGPT can be used to recommend potential risks based on various client-related variables. They also mention other applications of ChatGPT, such as risk classification and suggesting mitigation actions and possible causes of risks.

40:16

🏒 Platform Features and Risk Recommendation System

The speaker demonstrates the platform's features for risk management, including the use of ChatGPT to suggest potential risks based on industry, process, and other client variables. They show how users can select variables and receive risk suggestions, which can then be used to create new risk entries in the system. The script highlights the platform's ability to provide suggestions across various industries and company processes.

45:19

πŸ“‰ Conclusion and Future Improvements

In conclusion, the speaker summarizes the importance of aligning AI applications with business goals and the necessity of thorough data cleaning in machine learning. They also emphasize the value of comparing different methodologies to find the best solution for a company's needs. The script ends with an invitation for participants to complete a survey to help improve future webinars and thanks them for their participation.

Keywords

πŸ’‘Artificial Intelligence (AI)

Artificial Intelligence refers to the simulation of human intelligence in computers that are programmed to think like humans and mimic their actions. In the context of the video, AI is the overarching theme, focusing on its applications in risk management. The script discusses how AI can be utilized as a tool to assist with complex problem-solving and efficiency in tasks, exemplified by the use of AI in machine learning models and natural language processing for risk assessment.

πŸ’‘Risk Management

Risk management is the process of identifying, assessing, and prioritizing potential risks to an organization's capital and earnings. The video's theme revolves around using AI for risk management, specifically in the context of anti-money laundering and operational risks. The script mentions the practical applications of AI in identifying fraudulent transactions and classifying risks, which are key components of managing an organization's exposure to potential losses.

πŸ’‘Machine Learning

Machine learning is a subset of AI that provides systems the ability to learn and improve from experience without being explicitly programmed. The script explains machine learning techniques such as supervised and unsupervised learning, which are used for tasks like client segmentation and transaction monitoring in the field of risk management.

πŸ’‘Supervised Learning

Supervised learning is an approach in machine learning where the algorithm is trained on a labeled dataset, learning to predict outcomes for new data. In the script, it is exemplified by training a model to predict fraudulent transactions in a bank's dataset, where the target variable is the classification of transactions as fraudulent or not.

πŸ’‘Unsupervised Learning

Unsupervised learning is a type of machine learning where the data used to train the algorithm is not labeled. The script discusses its use in scenarios like client segmentation, where the algorithm identifies patterns or groups without a predefined target variable, helping to uncover unknown patterns in the data.

πŸ’‘Client Segmentation

Client segmentation is the process of dividing a customer base into groups with similar characteristics. In the video, it is highlighted as a goal for understanding clients better and for compliance with regulations. The script describes using machine learning algorithms to identify similar groups of clients for more effective risk monitoring and personalized services.

πŸ’‘Natural Language Processing (NLP)

Natural Language Processing is a field of AI that enables computers to understand, interpret, and generate human language. The script discusses NLP in the context of applications like sentiment analysis and chatbots, particularly focusing on using GPT (Generative Pre-trained Transformer) models to create a risk recommendation system that can understand and generate text related to potential risks.

πŸ’‘Generative Pre-trained Transformer (GPT)

GPT is a type of deep learning model that has been pre-trained on a large corpus of text data, enabling it to generate human-like text. The script explains the use of GPT in risk management, specifically for creating a system that suggests potential risks based on various client and company characteristics, without needing to input sensitive client information directly.

πŸ’‘Data Science

Data science is a field that uses scientific methods, processes, and algorithms to extract knowledge and insights from data. In the script, data science is mentioned as an essential part of AI applications, emphasizing the need for a clear business goal before applying machine learning or deep learning techniques, ensuring that these tools are used to solve specific business problems effectively.

πŸ’‘Principal Component Analysis (PCA)

PCA is a statistical technique used to emphasize the variation and bring out strong patterns in a dataset. The script refers to PCA as a step in the data cleaning process, where it helps identify the most relevant characteristics for analysis by reducing the dimensionality of the data set and excluding variables that do not contribute to the segmentation process.
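
A small sketch of why this works: a column that is identical for every client carries no variance, so PCA ranks it last and it can be dropped (synthetic data for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(size=100),   # income: varies across clients
    rng.normal(size=100),   # age: varies across clients
    np.zeros(100),          # country: the same for everyone
])

pca = PCA().fit(X)
print(pca.explained_variance_ratio_)  # the constant column explains ~0% of variance
```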

πŸ’‘K-means

K-means is a clustering algorithm used in machine learning to partition a dataset into a specified number of clusters. The script explains the process of using K-means for client segmentation: the algorithm assigns clients to clusters based on their similarity (the number of clusters itself is chosen beforehand, for example with the elbow or silhouette method), helping in understanding and managing client relationships more effectively.
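
To make the iterate-until-stable mechanism concrete, here is a bare-bones k-means loop; it ignores edge cases such as empty clusters, and in practice sklearn.cluster.KMeans would be used instead:

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 100):
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)]  # random initialization
    for _ in range(iters):
        # assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None] - centers, axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):          # no more movement: done
            break
        centers = new_centers
    return labels, centers

labels, centers = kmeans(np.random.default_rng(7).normal(size=(60, 2)), k=3)
```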

Highlights

Introduction to the webinar on artificial intelligence for risk management by Nicole Koza and data scientist Faguna.

Invitation to learn more about risk management through the Pirani Academy and a provided link.

Discussion on the theoretical and practical aspects of artificial intelligence (AI) in risk management.

Definition of AI as the ability to reason, solve problems, and learn from experience without human intervention.

Explanation of the composition of AI, including machine learning, deep learning, and data science.

Emphasis on having a clear business goal before applying AI techniques like machine learning.

Introduction to supervised and unsupervised learning in machine learning.

Example of using supervised learning for fraud detection in banking transactions.

Explanation of unsupervised learning for segmentation and anomaly detection without a target variable.

Application of machine learning for anti-money laundering through client segmentation.

Importance of data cleaning in the machine learning process for accurate results.

Description of the k-means clustering algorithm for segmenting clients into groups.

Introduction of the Birch algorithm for large datasets and its robustness to outliers.

Discussion on the advantages of machine learning for scalability, adaptability, and personalization in client segmentation.

Process of choosing the optimal number of segments using the elbow method and silhouette metric.

Demonstration of the platform's capability to perform segmentation steps and data cleaning.

Overview of the platform's features for compliance, operational risk, information security, and anti-money laundering.

Introduction to natural language processing (NLP) as a subfield of AI for text data processing and generation.

Examples of NLP applications in sentiment analysis, text classification, and chatbots.

Explanation of GPT (Generative Pre-trained Transformer) model and its use in risk management.

Application of ChatGPT for recommending potential risks in a risk management system.

Description of the process for using ChatGPT to suggest risks based on industry, process, and other variables.

Summary emphasizing the importance of focusing on business goals and the significance of data cleaning in AI applications.

Transcripts

00:02

Nicole: Hello everyone, welcome to our risk management school. My name is Nicole Koza and I'm a member of the Pirani team. I'm excited to be here with you today to explore the theoretical and practical aspects of the topics we will cover. To get us into this session on artificial intelligence for risk management, we are happy to have with us Faguna, data scientist at Pirani. Hi Faguna, how are you?

00:32

Faguna: Hi everyone, thanks for being part of this webinar. How are you?

00:38

Nicole: Great, excited to get started. Before we begin, I would love to invite everyone to continue learning about the world of risk management by visiting the Pirani Academy, where you will find valuable learning materials. I will leave the link in the chat now so you can see the website. There you are, done. And if you have any questions during the webinar, please feel free to ask them through the question-and-answer section; we'll be happy to answer them at the end. Now let's begin with this exciting topic. Over to you.

01:23

Faguna: Okay, thank you again everyone for being part of this webinar. Today we're going to be talking about artificial intelligence and some of its applications in risk management. The idea is to first define the concept of artificial intelligence and understand how it is composed. Then we are going to explain our first application, machine learning for segmentation. And finally we are going to talk about natural language processing techniques and an application we have that uses ChatGPT to create a risk recommendation system. That's the agenda for today. If you have questions, Nicole, feel free to interrupt me and we will try to answer them in the moment. I'm going to explain the theory first, and then we'll go to the platform to show you all the steps.

02:40

I think the best way to understand the concept of artificial intelligence is to first define both words separately. What do we mean by intelligence? That's the ability to reason, to solve problems, to understand complex ideas, and to learn quickly and from experience. And why do we say it is artificial? Because we don't want to dedicate a human resource to tasks or problems that we can get done using a computer program or an artificial intelligence application. The idea is that we can reserve human intelligence for more complex problems and delegate the simpler tasks to a computer program or application that can help us. Obviously we don't expect this artificial intelligence to do the whole job; we are going to use it as a tool that helps us achieve better results or get tasks done more quickly.

04:13

Another important thing is to understand how artificial intelligence is composed. There are two main sub-areas, called machine learning and deep learning, and they are responsible for developing and training the algorithms and models that we use in the final artificial intelligence application. These areas use a huge amount of data to train the models, and those models are what we use in the final application. A really common example nowadays is ChatGPT: we can consider ChatGPT the final result, the final artificial intelligence application that we use to process and generate text. But we need to understand that this application was trained beforehand on a huge amount of text data using deep learning techniques. So that's how this whole field is composed.

05:44

Another important piece is the other circle that you can see on the right, called data science. Why does it matter? Because we always need to have a business goal: a business problem we want to solve or a business opportunity we want to take advantage of, and that needs to be really clear. We should never treat machine learning or deep learning as a goal in itself; these are just tools and techniques that help us achieve a business goal, for example solving a business problem. So the process should always be to first identify these business problems or opportunities and try to fix them in a simple way; then, if the problem is really complex, yes, let's use machine learning, but only if the technique is going to help us achieve a better result. I consider that really important.

07:03

Before explaining our first application, another thing we need to know is that in machine learning there are two ways of training a model, called supervised learning and unsupervised learning. In supervised learning, the most important part is that we need a target variable. I will explain this with an example. Suppose we are a bank and we have a dataset with a lot of transactions from our clients, and in this dataset we have a column that identifies whether each transaction is fraudulent or not. In that case our target variable is that column, the one that classifies transactions as fraudulent or not fraudulent. We can use the whole dataset to train a machine learning model using supervised learning techniques, and we want that model to be able to predict in the future whether a new transaction is fraudulent or not. That's the concept of a target variable.

08:29

Another example: suppose we are a retail company with a dataset of products, and we have a column that shows the amount of sales for each product. In this case our target variable, what we want to predict, is that column, the amount of sales. Using this historical data we can train a model, and the model will learn to predict the future amount of sales for each product. Yes, Nicole, do you have a question? No? Okay, sorry.

09:16

So that's the case of supervised learning. What happens in the other case, unsupervised learning? Here we don't have a target variable. The most common example is an application for clustering or segmentation. Suppose we have a dataset with characteristics of all of our clients and we are trying to find a pattern, to find groups with similar characteristics, but we don't have a target variable. That's why it's called unsupervised learning.

10:09

As I told you, the most common use is clustering, where we want to find patterns. Another really common use case is cybersecurity or anomaly detection. For example, we have a dataset with all the transactions our clients made in our bank, but we don't have the column that classifies those transactions as fraudulent or not. In that case we can apply an unsupervised learning algorithm that identifies uncommon transactions or unusual patterns, and that can help us finally identify a potentially fraudulent transaction. That's the main difference between the two.

11:03

Now that we know this, we can start with our first application. In this case it belongs to the anti-money laundering section that we have in our platform, and the application is the use of machine learning techniques for segmentation of our clients. Remember what I told you at the beginning: we always need to keep the business goal in mind. Our objective is to find groups of clients that are similar to each other and different from the others. And what's the purpose of doing this? It will help us understand the customers we are dealing with (it can also be applied to suppliers or products, for example), it will help us discover patterns that we didn't know about, and it's a really good first step towards transactional monitoring. Also, in some companies performing a segmentation is an obligation, so it helps us comply with regulations.

12:31

So that's the goal: to identify groups of clients that are similar to each other. And when I say similar, I mean that they are close to each other. If we plot our dataset on a graph, like the one you can see on the right, we can see that some observations are closer to each other and farther away from others. That's the way these algorithms work: they calculate distances between observations, and in that way they identify patterns or groups.

13:23

Another important thing is to know the advantages we can obtain by applying machine learning techniques. Obviously, if we are working with a dataset of 100 clients, what I would tell you is to just do a manual segmentation; never forget that the goal is to segment our clients, and with 100 clients it will be easier to do it manually. But if you are dealing with a large amount of data, for example 2,000 clients, then yes, you gain more by applying a machine learning technique, and I consider that you can achieve a higher level of accuracy in those cases.

14:25

Other advantages are automation, scalability, and adaptability. In our platform you will be able to perform all the steps really easily, and if your data grows, say from 2,000 clients to 10,000, you can apply the same algorithms and they will still work well; you won't suffer the loss of time efficiency you would face doing it manually. You can also apply these algorithms to segment clients, suppliers, or products. Another good thing is personalization: you can decide which characteristics to use in the segmentation, for example whether or not to consider the country of your clients, or their income, then apply the algorithms and compare results. And obviously the time saved by applying these techniques is considerable. Those are some of the advantages.

15:57

Now I'm going to talk about the clustering algorithms that we have in the platform. One of them is called k-means, and this algorithm tries to partition our dataset into K different groups; later we'll talk about how to identify the optimal number of segments, that is, the value of K. The way it works is to first pick K points at random in our dataset and then repeatedly perform calculations to identify which observations are closest to each of those initial points, which we call the cluster centers or centroids. This process repeats many times, maybe a hundred, until there are no more movements and we can say: okay, these are the final segments. What's good about k-means is that it's really simple and really efficient; on the other hand, it can be somewhat sensitive to initialization and to outliers, but we are going to explain a way to remove those outliers before running the algorithm. That's the idea behind k-means.

18:01

The other one is called Birch, and this algorithm is better for larger datasets. What it does is first identify observations that are really similar to each other and build a summary of the dataset by aggregating those similar observations; then, using that summary, it decides what the final segments or clusters will be. This algorithm is more robust to outliers and better suited for high-dimensional data and larger datasets, so in those cases we should use it. But what I recommend is to use both and compare results; that's really important. Decide which result is better for you and for your business use case.

19:19

Another important step in this process is to perform good data cleaning, because this is the most important step in the whole process. You can use the most advanced algorithms and machine learning techniques, but if you didn't perform good data cleaning, you are not going to obtain a good result anyway. In the platform you will see that we have four steps, and you can perform them directly there. The first one is outlier detection. Why is this important? Because, for example, in our dataset of clients we may identify a client whose income is very different from the others. This is going to generate a distortion in the distances between observations, as you can see in the graph on the right: the presence of an outlier creates a distortion that makes it seem as if the whole remaining group of observations belongs to the same cluster, but that isn't true; it's just a distortion generated by the presence of the outlier. So it's important to detect outliers and remove them from the process, but never to delete them: we should form a separate segment with the outliers and analyze them separately.

21:17

Another important step is categorical variable encoding. For example, if we have a column with the country of each client, one client may belong to Spain and another to France, but we can't measure distance between those words. So what we do is express these categorical variables as numerical ones.

21:54

Then another step is data normalization. Why is this important? Because we may have one column with the age of the clients, taking values from 20 to 80, and another column with the income of the clients, taking values from, say, one thousand to five thousand. It would be really difficult for the algorithm to calculate distances between numbers like 20 and numbers like five thousand, so we express all the numerical data on a scale that makes the columns comparable.

22:52

And finally, another important step is principal component analysis. This is an algorithm in itself, one that identifies which of our clients' characteristics are the most important and relevant for the analysis. A really simple example: maybe we have the country column and all of our clients belong to Spain. The algorithm will tell us: look, all your clients are from the same country, you are not going to be able to segment clients using this variable, so you should not include it in the process. In that case we exclude that column, which is also good for the process because we reduce the dimensionality of the whole dataset. So those are the four steps we use.

24:06

As I told you, another important thing is to choose the optimal number of segments. Here we are going to use two different methods; we always try to offer different alternatives, just as we talked about applying both the k-means and Birch algorithms and comparing results. One of the methods is called the elbow method, and it uses a metric called inertia, which is the average distance from each point to its nearest cluster center: it measures the distance from each client to the center of the cluster that client belongs to.

25:14

What happens here is that if we have, for example, 100 clients, the lowest inertia we can achieve is with 100 clusters, because each client becomes a cluster by itself. But we need to find an optimal point, and the optimal point is where the inertia starts to level off. We have to choose a sensible number because the final goal of the segmentation is to create groups of clients; we can't just use 100 clusters and leave every client in a separate cluster. So we look for that optimal point.

26:08

Another method we have is the silhouette metric, which is really common. It uses two parameters, one called cohesion and one called separation. The cohesion parameter measures how close the observations within a cluster are; in this picture, it calculates the distances among all the observations of the red cluster. The separation parameter calculates the distances from the observations of the red cluster to those of the blue cluster. The optimal situation is like picture number one, where the observations within one cluster are really similar to each other and at the same time really far away from, or different from, the other clusters. That's how we can choose the optimal number of segments.

27:47

Now I'm going to show you the platform and how you can perform all these steps there. Okay, this is the platform. Here you will find different sections: one for compliance, another for operational risk, another for information security risk, and another for anti-money laundering, which is what we were talking about. (Sorry, my screen share went off for a moment.) Okay, so in this section we can create and perform all the steps of the segmentation. We can create a new one; I'm going to show you a segmentation I have already created so it goes more quickly, but here is a use case for client segmentation.

29:07

Nicole: Sorry, I left the link to the Pirani tool in the chat so you can access our software and do the practical exercise along with us. With this link you will be able to access a 15-day free trial, so you can test the five systems that we have in Pirani.

Faguna: Okay, perfect. In this section you will be able to perform all the steps. We start by choosing the variables we want to include in our segmentation process. What is important here is to first calculate the population of each variable, a metric that indicates how many non-null, non-empty values we have in that variable. A recommendation is to always use the characteristics that are over 80 percent populated; characteristics that are empty, or populated below 80 percent, should be excluded.

30:40

The second step is to perform the data cleaning we were talking about. Here you can run all four steps: transform categorical data to numerical, then detect the outliers in the data. An important point, as I told you: we are not going to delete the outliers, we just remove them from the segmentation process and store them in a separate CSV file, so you can analyze those outliers on their own and decide whether it was just a mistake in the data or maybe a really important client that you should analyze separately. Then comes the data normalization we discussed, and finally the principal component analysis to identify the most important characteristics to include in the model.

31:52

When we finish this part, we can choose the model we prefer. As I told you, I think the best thing is to try two different segmentations, one using k-means and another using the Birch algorithm, then analyze and compare the results to decide which one is better for your use case. After we choose the model, we calculate the optimal number of segments. Here, for example, I have already calculated it: using the elbow method, it identifies that the optimal number should be six, but obviously we should try this method and also the silhouette method. Finally, when we have done all the steps, we execute the segmentation.

33:08

When this process concludes, we receive a PDF file, which I can show you here, with all the important characteristics and descriptive statistics of your segments. You can see the amount of data you have, the number of segments, and some characteristics of your numerical variables, for example the characteristics of cluster one and cluster two. It's important that you start analyzing all the results here: try to look at them from a business perspective and work towards the goal, which is to understand your clients better. And that's how you can perform it in the platform.

34:13

Now let's go back to the presentation. I don't know if there are any questions, but we can continue with the final part, which is natural language processing techniques and the ChatGPT application. It's important to know what this means. It's another sub-area of artificial intelligence, and its main purpose is to combine computer science with linguistics to obtain this new field called natural language processing. The main objective is not only to be able to process text data (before, we were talking only about numerical data), but also for the model to understand what the text means and to be able to generate new human language.

35:36

That's the main goal, and some examples of these applications follow. One is sentiment analysis: many companies use these techniques to analyze the sentiment of tweets, for example when someone writes a tweet about their company and we try to understand whether the sentiment behind it is positive or negative. This is also really useful for comments or reviews on a company's platform, to understand whether we are offering a good service and whether the sentiment of those comments is positive or negative.

36:30

Another common application is text classification. In risk management, an application would be to classify risks: we can read all the risks that people write and classify them into different categories, to understand, for example, which category is more dangerous or more likely to materialize. And another really common application is the chatbot, where users ask questions and receive answers.

37:17

In our platform we decided to apply the ChatGPT model, but it's important to understand what the name means; maybe a lot of people don't know what the letters G, P, and T stand for. The G refers to generative, which means the model is able to create or generate new data points. Remember that in this case we are talking about text data, so the model can generate new paragraphs or new text.

38:03

The P refers to pre-trained: the model was pre-trained on a huge amount of text data available on the internet. And the T refers to Transformer, which is the deep learning, or neural network, architecture working behind the model. You don't need to know it in detail, but an important thing about it is its attention mechanism, which allows the model to process a big paragraph of text, identify the keywords and the context of the whole paragraph, and retain that information. This was really revolutionary at the time.

39:10

There are several applications of ChatGPT in risk management. One of them is what I'm going to show you in the platform: a recommendation system for potential risks. We noticed that a lot of new clients accessed the platform and couldn't identify the risks they were exposed to, or maybe they knew the risks but couldn't write them properly in the platform, so they would end up calling our consultants. That's why we decided to create this system to recommend potential risks.

40:02

This system uses different key variables about the client's company: for example, the industry of the client, the process within the company, the risk system it refers to (whether we are talking about compliance, operational risk, or anti-money laundering risks), and of course the language of the client.

40:36

The process is like the picture we have on the slide: all these key variables are input into the prompt of the ChatGPT model, and then ChatGPT uses all the information, all the text data from the internet that was used to pre-train the model, to generate a response and make recommendations of potential risks to the client.

41:11

As I told you, this is one application, but we also have others, for example risk classification: we can use this model to classify risks into different categories. Another one we are working on is making suggestions of mitigation actions and possible causes of risks, so the idea is not only to recommend potential risks but also to recommend actions to mitigate them, and maybe their possible causes. Now I'm going to show you where you can find this in the platform.

42:00

Okay, so here we can go to another section, for example the operational risk section. Here we have the risks module, where we can create new ones. Before this application, people would just start writing here; now we offer risk suggestions using ChatGPT so you can get some ideas. Here you select, for example, the industry you are working in, and you can also select the process; for example, we are going to choose the technology process. Let's press suggest, and ChatGPT starts working, using all the information from the internet. Here are the suggestions; you can read them, and they may be helpful for identifying potential risks. If you want to use one of these recommendations, you just click it, and that's all: you can create a new risk this way.

43:35

You will also find suggestions for a lot of industries, such as financial, insurance, and retail, and for a lot of company processes, for example marketing, commercial, and customer service. There is a lot you can use. Well, that's all of this presentation, that's all I have to show you. I don't know if there are any questions.

44:09

Nicole: We have one question: does ChatGPT keep the information we provide?

Faguna: Well, no. In this case we are not giving ChatGPT the information we have from our clients; we are just using these key variables, for example the industry and the process, and that's all. We rely on the information behind the ChatGPT model to help us, but we are not giving it live client information. We do this because we don't know what security lies behind it and whether the data is kept private or not. So that's our approach.

45:08

Nicole: Great. I think we have no more questions, so could you give our participants a brief summary of your presentation?

Faguna: Okay, yes. To summarize, it's important not to lose the focus on the business goal: we need to first understand the business problem or opportunity we want to address, and then start thinking about whether we need machine learning or artificial intelligence for it. If the problem is really complex, let's do it; if not, keep it simple and always focus on solving the problem.

45:58

Another important thing: if you are performing a segmentation, do all the steps and give due importance to the data cleaning part, because that's the most important part. As I told you, you can use all the algorithms you want, but if you didn't perform good data cleaning, you are not going to get a good result. And obviously, always compare results across the different methodologies; in the end you need to compare and choose which one is better for your company. That's it, and if you need risk recommendations, just use our ChatGPT section to get them.

46:51

Nicole: Thank you so much for your valuable participation and for joining us. Finally, I would love to invite everyone to take the short survey that will appear when this meeting ends; it will help us a lot to continue improving our webinars. Thank you so much for your participation, and see you soon. Bye.


Related Tags
Artificial Intelligence, Risk Management, Webinar, Machine Learning, Data Science, Natural Language, ChatGPT, Segmentation, AML, Compliance