How Artificial Intelligence Works for Risk Management
Summary
TL;DR: In this webinar, Nicole Koza and data scientist Faguna explore the integration of artificial intelligence into risk management. They define AI, delve into machine learning and deep learning, and discuss their applications in client segmentation and risk recommendation systems using ChatGPT. The session emphasizes the importance of aligning AI techniques with business goals, the significance of data cleaning, and the practical steps for achieving accurate client segmentation. It also highlights the potential of natural language processing in understanding and generating human language for risk assessment.
Takeaways
- 😀 The webinar focuses on the theoretical and practical aspects of artificial intelligence in risk management.
- 🔍 The speaker, Faguna, introduces the concept of artificial intelligence (AI) as the ability to reason, solve problems, and learn from experience without human intervention.
- 🤖 AI is composed of two main areas: machine learning and deep learning, which are responsible for developing and training algorithms and models.
- 📚 The importance of data science is highlighted, emphasizing the need for a clear business goal before applying AI techniques.
- 📈 Two types of machine learning are discussed: supervised learning, which requires a target variable, and unsupervised learning, which does not.
- 🏦 An example of supervised learning is identifying fraudulent transactions in a bank's dataset, while unsupervised learning might be used for customer segmentation in a retail company.
- 🤝 The benefits of machine learning for segmentation include accuracy, automation, scalability, adaptability, and personalization.
- 🔍 Two clustering algorithms are mentioned: k-means, which is simple and efficient but sensitive to initialization, and Birch, which is robust to outliers and suitable for large datasets.
- 🧹 The critical step of data cleaning is discussed, including outlier detection, categorical variable encoding, data normalization, and principal component analysis.
- 📊 Two methods for determining the optimal number of segments are presented: the elbow method, which uses inertia, and the silhouette method, which considers cohesion and separation.
- 💡 The platform demonstration shows how to perform client segmentation using the discussed techniques and tools, emphasizing the importance of analyzing results from a business perspective.
Q & A
What is the main focus of the webinar presented by Nicole Koza and data scientist Faguna?
-The main focus of the webinar is to explore the theoretical and practical aspects of artificial intelligence in risk management, including applications of artificial intelligence, machine learning for segmentation, and natural language processing techniques.
What is the definition of artificial intelligence as discussed in the webinar?
-Artificial intelligence is defined as the ability of a computer program or application to reason, solve problems, understand complex ideas, learn quickly, and learn from experience, without the need for human intervention.
What are the two main sub-areas of artificial intelligence mentioned in the script?
-The two main sub-areas of artificial intelligence mentioned are machine learning and deep learning, which are responsible for developing and training the algorithms and models used in AI applications.
Why is data science important in the context of artificial intelligence applications?
-Data science is important because it helps to identify clear business goals or problems that AI applications aim to solve or opportunities they aim to leverage. It ensures that AI techniques are used as tools to achieve specific business objectives rather than being an end in themselves.
What are the two types of machine learning mentioned in the script?
-The two types of machine learning mentioned are supervised learning, which requires a target variable, and unsupervised learning, which does not require a target variable and is often used for pattern recognition or anomaly detection.
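The distinction between the two learning types can be sketched in a few lines. The example below is illustrative, not from the webinar: it assumes Python with scikit-learn and invented synthetic data, training a supervised classifier against a target column and an unsupervised k-means model without one.

```python
# Illustrative comparison of supervised vs. unsupervised learning.
# Library choice (scikit-learn) and the synthetic data are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))             # two features per transaction
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # target column: fraud / not fraud

# Supervised: the target variable y guides training.
clf = LogisticRegression().fit(X, y)
pred = clf.predict(X[:5])

# Unsupervised: no target; the algorithm finds structure on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

The only structural difference is whether a target column `y` is passed to `fit`; everything downstream (evaluation, interpretation) changes with it.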
Can you explain the purpose of client segmentation using machine learning techniques?
-The purpose of client segmentation using machine learning is to identify groups of clients with similar characteristics, which can help in understanding the customer base better, discovering unknown patterns, complying with regulations, and improving transaction monitoring.
What are the advantages of using machine learning for client segmentation compared to manual methods?
-The advantages of using machine learning for client segmentation include higher accuracy with large datasets, automation, scalability, adaptability, personalization of segmentation characteristics, and improved time efficiency.
What are the two clustering algorithms mentioned in the script, and what are their main differences?
-The two clustering algorithms mentioned are k-means and Birch. K-means is simpler and more efficient but can be sensitive to initialization and outliers. Birch is more robust to outliers and better suited for large datasets and high-dimensional data.
Why is data cleaning an essential step in the machine learning process for segmentation?
-Data cleaning is essential because it ensures the quality of the data used for training models. Steps like outlier detection, categorical variable encoding, data normalization, and principal component analysis help to prepare the data for accurate and meaningful analysis.
What are the two methods mentioned for determining the optimal number of segments in a dataset?
-The two methods mentioned for determining the optimal number of segments are the elbow method, which uses the inertia metric, and the silhouette method, which considers cohesion and separation parameters.
What is the role of natural language processing (NLP) in risk management applications?
-In risk management, NLP can be used for sentiment analysis to understand customer feedback, text classification to categorize risks, and chatbots to interact with clients and provide risk recommendations or information.
What is the significance of GPT in the context of the presented risk management application?
-GPT, or Generative Pre-trained Transformer, is significant because it can generate new text data based on the input provided. In the risk management application, it uses various client-related variables to suggest potential risks, aiding in the identification and mitigation of such risks.
How does the platform use ChatGPT to assist clients in identifying potential risks?
-The platform uses ChatGPT by inputting variables such as industry, process, and risk system into the model's prompt. ChatGPT then generates responses and recommendations for potential risks based on the data it was pre-trained on.
What precautions are taken to ensure client data privacy when using ChatGPT in the platform?
-To ensure client data privacy, the platform does not provide specific client information to ChatGPT. Instead, it uses general variables like industry and process, relying on the model's pre-trained knowledge to generate risk recommendations.
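As a rough illustration of this privacy-preserving approach, the hypothetical helper below (not the platform's actual code; all names and wording are invented) composes a prompt from general variables only. The resulting string is what would be sent to a model such as ChatGPT.

```python
# Hypothetical prompt builder -- names and wording are illustrative,
# not the platform's actual code. No client-specific data (names,
# accounts, amounts) ever enters the prompt.
def build_risk_prompt(industry: str, process: str, risk_system: str) -> str:
    return (
        f"Suggest potential risks for a company in the {industry} industry, "
        f"for its {process} process, managed under a {risk_system} system. "
        "List each risk with a short description."
    )

prompt = build_risk_prompt("banking", "customer onboarding",
                           "anti-money laundering")
print(prompt)
```

Because only general descriptors appear in the prompt, the model's answers come entirely from its pre-trained knowledge rather than from any client record.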
Outlines
👋 Introduction to Risk Management and AI
The script begins with Nicole Koza welcoming participants to a webinar on artificial intelligence for risk management. She introduces the agenda, which includes defining artificial intelligence (AI), discussing machine learning for segmentation, and exploring natural language processing with ChatGPT. Nicole also invites attendees to visit the Pirani Academy for learning materials and encourages questions throughout the session. The session's first speaker, a data scientist, outlines the day's topics, emphasizing the importance of understanding AI's theoretical and practical aspects and their applications in risk management.
🤖 Understanding Artificial Intelligence Composition
The speaker delves into the definition of artificial intelligence, distinguishing between human intelligence and AI's ability to learn from experience without human intervention. They explain the composition of AI, highlighting the importance of machine learning and deep learning in developing algorithms and models. The speaker also discusses the role of data science in aligning AI applications with clear business goals, emphasizing that AI is a tool to aid in solving complex problems rather than a standalone solution.
📚 Machine Learning Techniques: Supervised and Unsupervised Learning
The script explains the two primary methods of machine learning: supervised and unsupervised learning. Supervised learning is illustrated with examples from banking and retail sectors, where historical data is used to predict outcomes such as fraudulent transactions or sales amounts. Unsupervised learning is characterized by the absence of a target variable, with clustering and segmentation as common applications. The speaker also discusses the importance of identifying business problems before applying machine learning techniques.
🔍 Applications of Machine Learning in Anti-Money Laundering
The speaker introduces a specific application of machine learning in anti-money laundering, focusing on client segmentation to identify similar groups for better understanding and compliance with regulations. They discuss the advantages of using machine learning for scalability, adaptability, and time efficiency, especially when dealing with large datasets. The script also mentions different clustering algorithms, such as k-means and Birch, and their suitability for various data sizes and dimensions.
🧼 Data Cleaning and Clustering Algorithms
This paragraph emphasizes the importance of data cleaning in the machine learning process, outlining steps such as outlier detection, categorical variable encoding, data normalization, and principal component analysis. The speaker explains that these steps are crucial for accurate results, regardless of the sophistication of the algorithms used. They also discuss the k-means and Birch algorithms, providing insights into their functionality and ideal use cases.
📊 Optimal Segment Selection and Platform Demonstration
The script describes methods for determining the optimal number of segments in a dataset, including the elbow method and the silhouette metric. These methods help in finding a balance between too many and too few segments for effective client understanding. The speaker then transitions to demonstrating the platform where these steps can be performed, including data selection, cleaning, and the application of machine learning models.
📝 Natural Language Processing and Its Applications
The speaker introduces natural language processing (NLP) as a subfield of AI that combines computer science and linguistics to process and understand human language. They discuss various applications of NLP, such as sentiment analysis, text classification, and chatbots, which are used to analyze sentiments in social media, classify risks, and interact with customers, respectively.
🗣️ ChatGPT and Its Risk Management Applications
The script explores the use of ChatGPT, a generative pre-trained transformer model, in risk management. The speaker explains how ChatGPT can be used to recommend potential risks based on various client-related variables. They also mention other applications of ChatGPT, such as risk classification and suggesting mitigation actions and possible causes of risks.
🏢 Platform Features and Risk Recommendation System
The speaker demonstrates the platform's features for risk management, including the use of ChatGPT to suggest potential risks based on industry, process, and other client variables. They show how users can select variables and receive risk suggestions, which can then be used to create new risk entries in the system. The script highlights the platform's ability to provide suggestions across various industries and company processes.
📉 Conclusion and Future Improvements
In conclusion, the speaker summarizes the importance of aligning AI applications with business goals and the necessity of thorough data cleaning in machine learning. They also emphasize the value of comparing different methodologies to find the best solution for a company's needs. The script ends with an invitation for participants to complete a survey to help improve future webinars and thanks them for their participation.
Keywords
💡Artificial Intelligence (AI)
💡Risk Management
💡Machine Learning
💡Supervised Learning
💡Unsupervised Learning
💡Client Segmentation
💡Natural Language Processing (NLP)
💡Generative Pre-trained Transformer (GPT)
💡Data Science
💡Principal Component Analysis (PCA)
💡K-means
Highlights
Introduction to the webinar on artificial intelligence for risk management by Nicole Koza and data scientist Faguna.
Invitation to learn more about risk management through the Pirani Academy and a provided link.
Discussion on the theoretical and practical aspects of artificial intelligence (AI) in risk management.
Definition of AI as the ability to reason, solve problems, and learn from experience without human intervention.
Explanation of the composition of AI, including machine learning, deep learning, and data science.
Emphasis on having a clear business goal before applying AI techniques like machine learning.
Introduction to supervised and unsupervised learning in machine learning.
Example of using supervised learning for fraud detection in banking transactions.
Explanation of unsupervised learning for segmentation and anomaly detection without a target variable.
Application of machine learning for anti-money laundering through client segmentation.
Importance of data cleaning in the machine learning process for accurate results.
Description of the k-means clustering algorithm for segmenting clients into groups.
Introduction of the Birch algorithm for large datasets and its robustness to outliers.
Discussion on the advantages of machine learning for scalability, adaptability, and personalization in client segmentation.
Process of choosing the optimal number of segments using the elbow method and silhouette metric.
Demonstration of the platform's capability to perform segmentation steps and data cleaning.
Overview of the platform's features for compliance, operational risk, information security, and anti-money laundering.
Introduction to natural language processing (NLP) as a subfield of AI for text data processing and generation.
Examples of NLP applications in sentiment analysis, text classification, and chatbots.
Explanation of the GPT (Generative Pre-trained Transformer) model and its use in risk management.
Application of ChatGPT for recommending potential risks in a risk management system.
Description of the process for using ChatGPT to suggest risks based on industry, process, and other variables.
Summary emphasizing the importance of focusing on business goals and the significance of data cleaning in AI applications.
Transcripts
Hello everyone, welcome to our risk management school. My name is Nicole Koza and I'm a member of the Pirani team. I'm excited to be here with you today to explore the theoretical and practical aspects of the different topics we will see. To guide us through this session on artificial intelligence for risk management, we are happy to have with us the data scientist at Pirani. Hi Faguna, how are you?

Hi everyone, thanks for being part of this webinar. How are you?

Cool, just excited to get started. Before we begin, I would love to invite everyone to continue learning about the world of risk management by visiting the Pirani Academy, where you will find valuable learning materials. I will leave the link in the chat now so you can see the website. There you are, done. If you have any questions during the webinar, please feel free to ask them through the question-and-answer section; we'll be happy to answer them at the end. Now let's begin with this exciting topic. Over to you.
Okay, thank you again everyone for being part of this webinar. Today we're going to be talking about artificial intelligence and some of its applications in risk management. The plan is to first define this concept of artificial intelligence and understand how it is composed; then we'll explain our first application, machine learning for segmentation; and finally we'll talk about natural language processing techniques and an application we also have, using ChatGPT to create a risk recommendation system. So that's the agenda for today. Obviously, if there are questions, Nicole, feel free to interrupt me and we'll try to answer them in the moment. I'm going to explain the theory first, and then we'll go to the platform to show you all the steps.
Okay, so let's start. I think the best way to understand this concept of artificial intelligence is to first define both words separately. What do we mean by intelligence? That's the ability to reason, to solve problems, to understand complex ideas, and to learn quickly and from experience. And why do we call it artificial? Because we don't want to employ a human resource for those tasks or problems that can be done by a computer program or an artificial intelligence application. The idea is that we can use human intelligence for more complex problems and delegate the simpler tasks to a computer program or application that can help us. Obviously we don't expect this artificial intelligence to do the whole job, but we are going to use it as a tool that helps us achieve better results or get the task done more quickly.
Another important thing is to understand how artificial intelligence is composed. You need to know that there are two main sub-areas, called machine learning and deep learning, and these two areas are responsible for developing and training the algorithms and models that we use in the final artificial intelligence application. These areas use huge amounts of data to train the models, and those models are what we use in the final application. A really common example nowadays is ChatGPT: we can consider ChatGPT the final result, the final artificial intelligence application we use to process and generate text, but we need to understand that this application was trained beforehand on a huge amount of text data using deep learning techniques. So that's how this whole box is composed.
Another important thing is the other circle you can see on the right, called data science. Why does it matter? Because we should always have a clear business goal: a business problem we want to solve, or a business opportunity we want to take advantage of. We should never treat machine learning or deep learning as a goal in itself; they are just tools and techniques we can use to help us achieve a business goal, for example to solve a business problem. So the process should always be to first identify these business problems or opportunities and try to fix them in a simple way; then, if we think the problem is really complex, yes, let's use machine learning, but only if the technique is going to help us achieve a better result. I consider that really important.
Before explaining our first application, another thing we need to know is that in machine learning there are two ways of training a model, called supervised learning and unsupervised learning. In supervised learning, the most important part is that we need a target variable. I'll explain with an example. Suppose we are a bank. We have a dataset with a lot of transactions from our clients, and in this dataset there is a column that identifies whether each transaction is fraudulent or not. In that case, our target variable is that column, the one that classifies the transactions into fraudulent and not fraudulent. We can use this whole dataset to train a machine learning model with supervised learning techniques, and we want the model to be able to predict, in the future, whether a new transaction is fraudulent or not. That's the concept of the target variable. Another example: suppose we are a retail company. We have a dataset with a lot of products and a column that shows the amount of sales for each product. In this case, our target variable, what we want to predict, is that column, the amount of sales. Using this historical data we can train a model, and the model will learn to predict the future amount of sales for each product. Yes, Nicole, do you have a question? No? Okay, sorry.
So that's the case of supervised learning. What happens in the other case, unsupervised learning? Here we don't have a target variable. What we have is just a dataset; the most common example is an application for clustering or segmentation. Suppose we have a dataset with characteristics of all of our clients and we are trying to find a pattern, to find groups with similar characteristics, but we don't have a target variable; that's why it's called unsupervised learning. As I told you, the most common use is clustering, where we want to find patterns, and another really common use case is cybersecurity or anomaly detection. For example, we have a dataset with all the transactions our clients made in our bank, but we don't have the column that classifies those transactions as fraudulent or not. In that case we can apply an unsupervised learning algorithm that flags uncommon transactions or strange patterns, and that can help us finally identify a possibly fraudulent transaction. That's the main difference between the two.
Now that we know this, we can start with our first application. In this case it's for the anti-money laundering section that we have in our platform, and the application is the use of machine learning techniques for segmentation of our clients. Remember what I told you at the beginning: we always need to have this business goal in mind. So our objective is to find groups of clients that are similar to each other and different from others. And what's the purpose of doing this? It will help us understand the customers we are dealing with (this can also be applied, for example, to suppliers or products), it will help us discover patterns we didn't know, and it's a really good first step towards transactional monitoring. Also, in some companies performing a segmentation is an obligation, so with it we can comply with regulations. So that's the goal: to identify groups of clients that are similar to each other. When I say similar, I mean that they are close to each other, because if we plot our dataset in a graph, like you can see on the right, we can see that some observations are closer to each other and farther away from others. That's the way these algorithms work: they calculate the distance between observations, and in that way they identify patterns or groups. So we need to know that.
Another important thing is to know the advantages we can obtain by applying machine learning techniques. Obviously, if we are working with a dataset of, say, 100 clients, what I would tell you is to just do a manual segmentation, because never forget the goal is to segment our clients; with 100 clients I consider a manual segmentation easier. But if you are dealing with a large amount of data, for example 2,000 clients, then yes, you can gain more from applying a machine learning technique. In those cases I consider that you can achieve a higher level of accuracy. Other advantages are automation, scalability and adaptability: in our platform you will be able to perform all the steps really easily, and if your data grows, for example from 2,000 clients to 10,000, you can apply the same algorithms and they will still work; you are not going to suffer the loss of time efficiency you would have doing it manually. Another good thing is that you can apply these algorithms to segment clients, suppliers or products. And another good thing is personalization: you can decide which characteristics to use in the segmentation, for example whether to consider the country of your clients, or whether to consider their income or not, and you can apply the algorithms with different choices and compare results. And obviously the time saved by applying these techniques is considerable. These are some of the advantages.
Now I'm going to talk about the clustering algorithms that we have in the platform. One of them is called k-means, and this algorithm tries to partition our dataset into K different groups. Later we'll talk about how to identify the optimal number of segments, I mean the K number of segments, but the idea of this algorithm is to partition our dataset into K different groups. The way it works is to first pick K points randomly in our dataset, and then repeatedly perform calculations to identify which observations are closest to each initial point, which we call the cluster center or the centroid of the cluster. This process is repeated, maybe hundreds of times, until there are no more movements and we can say: okay, these are the final segments. What's good about this algorithm is that it's really simple and really efficient. On the other hand, it's somewhat sensitive to initialization and also sensitive to outliers, but we'll explain a way to remove those outliers before running the algorithm. That's the idea behind k-means.
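The k-means procedure described above can be sketched as follows. scikit-learn is an assumption here (the webinar does not say what the platform uses internally), and the `n_init` restarts address the initialization sensitivity the speaker mentions.

```python
# k-means sketch on synthetic "client" data (scikit-learn assumed).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two obvious groups of clients in a two-feature space.
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

# n_init repeats the random initialization several times and keeps the
# best run, mitigating the sensitivity to initialization noted above.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # one centroid near (0, 0), one near (5, 5)
```

Each point's segment is available afterwards in `km.labels_`, which is what a platform would join back onto the client table.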
The other one is called Birch, and this algorithm is better for larger datasets. What it does is first identify observations that are really similar to each other and build a summary of the dataset by aggregating those similar observations; then, using this summary, it decides which will be the final segments or clusters. This algorithm is more robust to outliers, for example, and it's better for high-dimensional data and larger datasets, so in those cases we should use it. But what I recommend is to use both algorithms and compare results, that's really important, and then decide which result is better for you and for your business use case.
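Following the speaker's advice to run both algorithms and compare, here is a minimal sketch (scikit-learn assumed, synthetic data invented for illustration):

```python
# Run Birch and k-means on the same synthetic data and compare results
# with the silhouette score (scikit-learn assumed).
import numpy as np
from sklearn.cluster import Birch, KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               rng.normal(4, 0.5, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
bi = Birch(n_clusters=2).fit(X)  # summarizes the data in a tree first

# Higher silhouette = tighter, better-separated clusters.
for name, labels in [("k-means", km.labels_), ("birch", bi.labels_)]:
    print(name, round(float(silhouette_score(X, labels)), 2))
```

On real client data the two algorithms can disagree; a neutral metric such as the silhouette score gives a concrete way to choose between them, alongside the business-perspective review the speaker recommends.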
Okay, another important step in this process is to perform a good data cleaning, because this is the most important step in the whole process. You can use the most advanced algorithms and the most advanced machine learning techniques, but if you didn't perform a good data cleaning, you are not going to obtain a good result anyway. In the platform you'll see that we have four steps, and you can perform them directly in the platform. The first one is outlier detection. Why is this important? Because, for example, in our dataset of clients we might identify a client whose income is very different from the others. This will generate a distortion in the distances between the observations, like you can see in the graph on the right: the presence of an outlier creates a distortion that makes it seem as if the whole other group of observations belongs to the same cluster, but this isn't true; it's just a distortion generated by the outlier. So it's important to detect outliers and remove them from the process, but never to delete them: we should form a separate segment with the outliers and analyze them separately.
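One common way to flag such an outlier, shown here purely as an illustration since the webinar does not specify the platform's exact method, is the interquartile-range (IQR) rule:

```python
# Illustrative outlier check on an income column using the IQR rule
# (the webinar does not specify the platform's exact method).
import numpy as np

income = np.array([1200, 1500, 1800, 2100, 2400, 2700, 3000, 250000])

q1, q3 = np.percentile(income, [25, 75])
iqr = q3 - q1
mask = (income >= q1 - 1.5 * iqr) & (income <= q3 + 1.5 * iqr)

clean = income[mask]      # clients used for segmentation
outliers = income[~mask]  # kept in a separate segment, analyzed apart
print(outliers)           # [250000]
```

Note that the extreme income is set aside rather than deleted, matching the speaker's point that outliers should form their own segment for separate analysis.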
Another important step is categorical variable encoding. Suppose we have a column with the country of each client, so one client belongs to Spain, for example, and another to France. We can't measure distance between these words, so what we do is express this categorical variable as numerical ones. Then another step is data normalization. Why is this important? Suppose we have one column with the age of the clients, which takes values from 20 to 80, and another column with the income of each client, which might take values from one thousand to five thousand. It would be really difficult for the algorithm to calculate distances between numbers like 20 and numbers like five thousand, so what we do is express all numerical data on a scale that makes the columns comparable.
Finally, another important step is principal component analysis. This is also an algorithm in itself, one that identifies which of all the characteristics of our clients are the most important and most relevant for the analysis. A really simple example: maybe we have this country column and all of our clients belong to Spain. The algorithm will tell us: look, all your clients are from the same country, you won't be able to segment clients using this variable, so you should not include it in the process. In that case we exclude the column, and that is also good for the process because we reduce the dimensionality of the whole dataset. So those are the four steps that we are using.
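The encoding, normalization and PCA steps described above can be chained roughly like this (scikit-learn assumed; the toy client table is invented for illustration):

```python
# Encoding, normalization and PCA chained on a toy client table
# (scikit-learn assumed; data invented for illustration).
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.decomposition import PCA

country = np.array([["Spain"], ["France"], ["Spain"], ["France"]])
numeric = np.array([[25, 1000.0],   # age, income
                    [40, 3000.0],
                    [60, 5000.0],
                    [35, 2000.0]])

# 1. Categorical encoding: words like "Spain" become 0/1 columns.
country_num = OneHotEncoder().fit_transform(country).toarray()

# 2. Normalization: put age (20-80) and income (1000-5000) on one scale.
scaled = StandardScaler().fit_transform(numeric)

# 3. PCA: keep the components that carry most of the variance.
features = np.hstack([country_num, scaled])
reduced = PCA(n_components=2).fit_transform(features)
print(reduced.shape)  # (4, 2)
```

The reduced matrix is what would then be fed to a clustering algorithm such as k-means or Birch.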
well as I told you another important
thing is to choose the optimal number of
segments
in this case we are going to use like
two methods two different methods we
always try to offer like different
Alternatives as we were talking about
the algorithm kamins and built to apply
both of them and then compare results
well in this case to choose the optimal
number of segments we are going to use
two methods
One of them is called the elbow method, and it uses a metric called inertia. This metric is the sum of the squared distances from each point to its nearest centre, that is, the centre of the cluster the point belongs to. So what it does is measure how far each client is from the centre of the cluster that client belongs to. Now, if we have for example 100 clients, the lowest inertia we can achieve is with 100 clusters, because each client would then be a cluster in itself. But we need to find an optimal point, and that optimal point is where the inertia starts to level off. We need to choose a number like that because the final goal of the segmentation is to create groups of clients; we cannot just use 100 clusters and have every client sit in a different cluster. So we are going to find this optimal point.
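The inertia metric behind the elbow method can be sketched directly. This is a minimal 2-D version with hypothetical points; scikit-learn's `KMeans` exposes the same quantity as its `inertia_` attribute:

```python
def inertia(points, centers, labels):
    # sum of squared distances from each point to its assigned cluster center
    return sum(
        (px - centers[l][0]) ** 2 + (py - centers[l][1]) ** 2
        for (px, py), l in zip(points, labels)
    )

points = [(1, 1), (1, 2), (8, 8), (9, 8)]

# one cluster: everything assigned to the global mean -> high inertia
print(inertia(points, [(4.75, 4.75)], [0, 0, 0, 0]))   # 99.5

# two clusters: inertia drops sharply -- this sharp drop is the "elbow"
print(inertia(points, [(1.0, 1.5), (8.5, 8.0)], [0, 0, 1, 1]))  # 1.0
```

Plotting inertia against the number of clusters k and picking the k where the curve flattens is exactly the elbow method.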
Another method that we have is the silhouette metric, which is really common to use. This metric uses two different parameters: one is called cohesion and the other is called separation. The cohesion parameter tries to measure how close to each other the observations within a cluster are. For example, in this picture it calculates the distances between all the observations of the same cluster, the one in red. The separation parameter tries to calculate the distance from the observations of the red cluster to the blue cluster. The optimal situation we are trying to achieve is like picture number one, where the observations within one cluster are really similar to each other and at the same time really far away from, or different to, the other clusters. So this is how we can choose the optimal number.
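A per-point silhouette score combines those two parameters as (b − a) / max(a, b), where a is the cohesion term and b the separation term. A minimal two-cluster sketch with hypothetical coordinates (scikit-learn's `silhouette_score` averages this value over all points):

```python
def silhouette(point, own_cluster, other_cluster):
    # cohesion a: mean distance to the other points of the same cluster
    # separation b: mean distance to the points of the nearest other cluster
    dist = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    a = sum(dist(point, p) for p in own_cluster) / len(own_cluster)
    b = sum(dist(point, p) for p in other_cluster) / len(other_cluster)
    return (b - a) / max(a, b)  # close to 1 = tight and well separated

red = [(1, 1), (1, 2)]
blue = [(8, 8), (9, 8)]
print(round(silhouette((1, 0), red, blue), 3))  # 0.863
```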
Well, now I'm going to show you the platform and how you can perform all these steps there.
Okay, this is the platform. Here you are going to find different sections: for example, we have a section for compliance, another one for operational risk, another one for information security risk, and another one for anti-money laundering, which is what we were talking about.
So here in this section — sorry, my screen share went off for a moment — okay, here we can create and perform all the steps of the segmentation. We will be able to create a new one; I'm going to show you a segmentation I have already created, so it will be quicker, but this is where a new client segmentation case starts.

By the way, I left the link to this Pirani tool in the chat, so you can access our software and do the practical exercise. With this link you will be able to access a 15-day free trial, so you can test the five systems that we have in Pirani.
Okay, perfect.
Well, in this section you will be able to perform all the steps. We are going to start by choosing the variables that we want to include in our segmentation process. What is important here is to first calculate the population of each variable: this is a metric that indicates how many non-null, non-empty values we have in that variable.
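The population metric can be computed in one short function (a hypothetical sketch; with pandas, `df.notna().mean()` gives the same ratio per column):

```python
def population(values):
    # share of non-null / non-empty entries in a variable;
    # the rule of thumb in the talk: keep variables above 80 %
    filled = [v for v in values if v not in (None, "")]
    return len(filled) / len(values)

industry = ["retail", "banking", None, "retail", ""]
print(population(industry))  # 0.6 -> below 80 %, so exclude this variable
```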
A recommendation is to always use the characteristics that are over 80 percent populated; if a characteristic is mostly empty, or below 80 percent, we should exclude it. Then the second step is to perform the data cleaning we were talking about. Here you will be able to perform all four steps: first transform categorical data into numerical data, then detect the outliers in the data. An important point, as I told you: we are not going to remove or eliminate the outliers entirely, we are just removing them from the segmentation process. We store them in a separate CSV file, so you can analyse these outliers separately and decide whether each one was just a mistake in the data, or maybe a really important client that you should analyse on its own.
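A common way to flag such outliers is the interquartile-range rule; here is a minimal sketch (the talk does not specify which detection method the platform actually uses, so this is an illustrative assumption):

```python
import statistics

def split_outliers(values):
    # IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    # Outliers are set aside for separate analysis, not deleted.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    clean = [v for v in values if lo <= v <= hi]
    outliers = [v for v in values if v < lo or v > hi]
    return clean, outliers

incomes = [1000, 1200, 1500, 1800, 2000, 50000]
clean, outliers = split_outliers(incomes)
print(outliers)  # the 50000 client goes to its own file for review
```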
Another important thing is the data normalization we were talking about, and finally the principal component analysis to identify the most important characteristics to include in the model.
Then, when we finish this part, we can choose the model we prefer. As I told you, I think the best thing is to try two different segmentations, one using k-means and another one using the Birch algorithm, and then analyse and compare the results to decide which one is better for your use case.
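That side-by-side comparison can be sketched with scikit-learn, which implements both algorithms (the data points here are toy values for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans, Birch
from sklearn.metrics import silhouette_score

# two obvious groups of clients in (income, age) space, already scaled
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
              [2.0, 2.1], [2.2, 1.9], [1.9, 2.2]])

# fit both algorithms on the same data and compare silhouette scores
for model in (KMeans(n_clusters=2, n_init=10, random_state=0),
              Birch(n_clusters=2)):
    labels = model.fit_predict(X)
    print(type(model).__name__, round(silhouette_score(X, labels), 2))
```

Whichever model gives the higher silhouette score, and the more interpretable segments from a business perspective, is the one to keep.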
Okay, then after we choose the model, we calculate the optimal number of segments. Here, for example, I have already calculated it: using the elbow method, it identifies that the optimal number should be six. But obviously we should try with this method and also with the silhouette method. Then, finally, when we have done all the steps, we execute the segmentation.
When this process concludes, we receive a PDF file — I can show you here — with all the important characteristics and descriptive statistics of your segments. You can see the amount of data you have, the number of segments, and some characteristics of your numerical variables: the characteristics of cluster one, of cluster two, and so on. It's important that you start here, analysing all the results, trying to find a business perspective on them and to achieve the goal, which is to understand your clients better. And well, this is how you can perform it in the platform. Now we are going to go back to the presentation.
I don't know if you have any questions, but now we can continue with the final part, which is the natural language processing techniques and the ChatGPT application.
Well, here it's important to know what this means. This is also a sub-area of artificial intelligence, and its main purpose is to combine computer science with linguistics to obtain this field called natural language processing. The main objective here is not only that we are able to process text data — because before we were talking only about numerical data, while here we are processing text — but also that the model can understand what that text means, and even generate new human language.
So that's the main goal, and some examples of these applications are, first, sentiment analysis: many companies use these techniques to analyse the sentiment in tweets, for example when someone writes a tweet about their company, to understand whether the sentiment behind that tweet is positive or negative. This is also really useful for comments about a service on our company's platforms, to understand whether we are offering a good service and whether the sentiment of those comments is positive or negative. Another common application is text classification; an application in risk management would be to classify risks: we can read all the risks that people write and classify them into different categories, to understand which category is more dangerous or more likely to happen, for example. And another really common application is the chatbot, where it's very common to ask questions and receive an answer.
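As a toy illustration of text classification for risk management, here is a keyword-based sketch (the categories and keyword lists are hypothetical; a production system would use a trained NLP model rather than keyword matching):

```python
# hypothetical keyword lists per risk category
CATEGORIES = {
    "fraud": ["fraud", "unauthorized", "phishing"],
    "technology": ["outage", "server", "software"],
    "compliance": ["regulation", "sanction", "reporting"],
}

def classify_risk(description):
    # count keyword hits per category and return the best match
    text = description.lower()
    scores = {cat: sum(w in text for w in words)
              for cat, words in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"

print(classify_risk("Server outage caused by a software update"))  # technology
```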
Well, now: in our platform we decided to apply the ChatGPT model, but it's important to understand what the name means — maybe a lot of people don't know what the letters G, P and T stand for. The G refers to Generative, and this means that the model is able to create or generate new data points. Remember that in this case we are talking about text data, so the model is able to generate new paragraphs or new text. The P refers to Pre-trained; this is because it was pre-trained using a huge amount of text data available on the internet.
And the T refers to Transformers: this is the deep learning, or neural network, architecture that works behind the model. You don't need to know this in detail, but an important thing about it is the attention mechanism it has. This allows the model, for example, to process a big paragraph of text, identify the keywords or the context of the whole paragraph, and retain that information — and this was really revolutionary at the time.
Well, there are some applications of ChatGPT in risk management. One of them is what I'm going to show you in the platform: a recommendation of potential risks. We identified that a lot of new clients access the platform and could not identify the risks they were exposed to, or maybe they knew the risks but could not write them properly in the platform, so they ended up calling our consultants instead. So we decided to create this system to recommend potential risks.

This system uses different key variables about the client's company. For example, the key variables are the client's industry, the process in the company, the risk system it refers to — for example whether we are talking about compliance, operational risk or anti-money laundering risks — and, of course, the client's language. The process is like the picture we have here: all these key variables are an input into the prompt of the ChatGPT model, and then GPT draws on all the text data from the internet that was used to pre-train it. So ChatGPT uses all this information to generate a response and make some recommendations of potential risks to the client.
But as I told you, this is one application; we also have other ones. For example, risk classification: we can use this model to classify risks into different categories. Another one that we are working on is making suggestions of mitigation actions and possible causes of the risks — so the idea is not only to recommend potential risks, but also to recommend actions to mitigate a risk, and maybe its possible causes. That's what we're working on. So now I'm going to show you where you can find this in the platform.
Okay, so here we can go to another section, for example the operational risk section. Here we have the risks module, where we can create new risks. Before this application, people would just start writing something here; now we offer these risk suggestions using ChatGPT, so you can get some ideas.
Here you are going to select, for example, the industry you are working in — for example this one — and you can also select the process; let's choose this technology process. Then let's just press Suggest, and ChatGPT is going to start working using all the information from the internet. And here are the suggestions; you can read them, and maybe they can be helpful to use directly or to identify potential risks.
If you want to use one of these recommendations, you just click on it, and that's all — in this way you create a new risk. You will also find suggestions for a lot of industries, such as financial, insurance and retail, and for a lot of processes in the company, for example marketing, commercial and customer service. There's a lot that you can use. And well, that's all of this presentation, that's everything I have to show you. I don't know if there are any questions.
We have one question: does ChatGPT keep the information we provide?

Well, no. In this case we are not giving ChatGPT the information that we have, for example, from our clients. We are just using the key variables — for example, the industry and the process — and that's all. We use the knowledge behind the ChatGPT model to help us, but we are not actually giving it client information. We do it this way because we don't know what the security behind it is, or whether the data is kept private or not. So that's what we are doing.
Great. I think we have no more questions, so could you do a brief summary of your presentation for our participants?
Okay, yeah. So, to summarise, we can say that it's important not to lose the focus, or the goal, of the business. We need to first understand the business problem or the business opportunity that we want to solve, and then start thinking about whether we need machine learning or artificial intelligence for it. If the problem is really complex, let's do it; but if not, just keep it simple and always focus on solving the problem.
Another important thing, if you are performing a segmentation, is to do all the steps and give due importance to the data cleaning part, because that's the most important part. As I told you, you can use all the algorithms you want, but if you don't perform a good data cleaning, you are not going to get a good result. And then, obviously, always compare results using both methodologies; in the end you need to compare them and choose which one is better for your company.
Well, that's it — and if you need some recommendations of risks, just use our ChatGPT section to get them.
Thank you so much for your valuable participation in joining us. Finally, I would like to invite everyone to take part in the short survey that will appear when this meeting ends; it will help us a lot to continue improving our webinars. Thank you so much for your participation, and see you soon. Thank you, bye.
[Music]