Make Your Applications Smarter with Oracle Machine Learning and AutoML | Oracle DatabaseWorld AI Edition

Oracle
30 May 2024 · 31:26

Summary

TLDR: Hi, I'm Marcos Arancibia. Today I'll show you how to make your applications smarter with Oracle Machine Learning and AutoML. Running machine learning models inside the database improves the analysis of your customers and processes and cuts the cost of external packages. With Oracle's AutoML UI, you can easily build and deploy models and use them through a variety of APIs. A demo of a customer churn prediction model illustrates how useful this is. Oracle Machine Learning eliminates the overhead of data movement and enables fast project delivery.

Takeaways

  • 😀 As part of the product management team for Autonomous Database, this session shows how to make applications smarter using Oracle Machine Learning and AutoML.
  • 📈 Infusing machine learning into applications improves the analysis of customers and processes.
  • 🛠 Build machine learning models with the AutoML UI and use them through APIs to improve application capabilities.
  • 🔒 Running models inside the database increases the reliability and security of machine learning models.
  • 💡 Machine learning use cases exist in every industry, covering customer segmentation, loyalty, customer churn, and more.
  • 🔧 AutoML eliminates repetitive tasks by automating algorithm selection, sampling, feature selection, and model tuning.
  • 📊 Processing data directly inside the Oracle database cuts the cost and time of data movement.
  • 🚀 The Oracle MovieStream example shows how to predict customer churn and offer promotions to at-risk customers.
  • 📝 The session explains how to build a model with the AutoML UI and run scoring and deployment in Python and SQL.
  • 🔍 Data and model monitoring features help detect data drift and guard against model degradation.

Q & A

  • What is automated machine learning (AutoML)?

    -AutoML is a technology that automates the process of building, evaluating, and tuning machine learning models. It reduces the manual work data scientists spend selecting algorithms and optimizing models.

  • What are the main benefits of Oracle Machine Learning?

    -The benefits include running machine learning directly inside the database, which reduces data movement and simplifies the solution architecture. It also improves reliability and security and makes models easier to build and deploy.

  • In what use cases is Oracle Machine Learning used?

    -Oracle Machine Learning is used across industries for use cases such as customer segmentation, loyalty, customer attrition (churn), next best offer, cross-selling, and predictive maintenance.

  • What are the benefits of Oracle Autonomous Database?

    -Oracle Autonomous Database runs the Oracle database in the cloud and automates infrastructure management such as patching and backups, so users can focus on analytics and development.

  • Which algorithms were used to predict customer churn?

    -Customer churn prediction used a decision tree, a random forest, and a support vector machine (SVM) with a linear kernel.

  • How does AutoML work?

    -AutoML automates algorithm selection, adaptive sampling, feature selection, and model tuning. It selects the best algorithm and model parameters, making data scientists more efficient.

  • What are Oracle Machine Learning Notebooks?

    -Oracle Machine Learning Notebooks is an interactive notebook environment where data scientists build, evaluate, and deploy machine learning models, running code in languages such as Python and SQL.

  • What is model tuning in AutoML?

    -Model tuning in AutoML is the process of automatically adjusting each model's parameters to improve accuracy, yielding higher-performing machine learning models.

  • What is Oracle Machine Learning Services?

    -Oracle Machine Learning Services deploys machine learning models as REST endpoints and runs model predictions through an API, making it easy for application developers to integrate machine learning models.

  • Why is model monitoring important?

    -Model monitoring is important for detecting data changes and model degradation over time. It helps maintain model performance and trigger timely updates or retraining.

Outlines

00:00

😀 Making Applications Smarter

Marcos Arancibia, part of the product management team for Autonomous Database, introduces how to make applications smarter with Oracle Machine Learning and AutoML. Key points: integrating machine learning into applications improves the analysis of customers and processes, and the AutoML UI builds machine learning models usable through various APIs while improving reliability and security. He also notes that machine learning use cases exist in every industry and describes the different algorithm classes and the large whitelist of algorithms inside the Oracle database.

05:02

🚀 The Oracle MovieStream Use Case

The Autonomous Database runs the Oracle database in the cloud, automating infrastructure management such as patching and backups. Using Oracle MovieStream, a fictional Netflix competitor, as the example, the session explains how machine learning identifies why customers leave (churn) in order to retain them. It shows cleaning data from a data lake, then building and deploying machine learning models, and how AutoML makes data scientists more efficient.

10:03

🛠️ AutoML Experiment Demo

A demo of running an AutoML experiment inside Oracle Cloud Infrastructure: selecting the data source, reviewing features, defining the prediction target, and tuning models. A specific algorithm (random forest) is selected, and the demo shows how to inspect its feature importance and accuracy, as well as how to run scoring with SQL and Python.

15:04

💻 Generating and Deploying the Model Notebook

How to turn the model produced by AutoML into a Python notebook and save it in a reusable form. The notebook records the experiment, the data preparation, and the optimized settings. The session also shows how to deploy the model as a REST endpoint so other users can start using it immediately.

20:04

📊 Scoring and Using the Model

A demonstration of predicting customer churn probability using the generated notebook and the REST endpoint. Scoring runs in Python and SQL, returning results with customer IDs and predicted probabilities. Micro-batch processing for fast scoring is also covered.

25:05

🧪 Monitoring Models and Data

Introduction of the monitoring features of Oracle Machine Learning Services, which guard against data drift and model degradation. The UI monitors changes in data and models and evaluates them with chosen metrics, including drift in individual features and monitoring of multiple algorithms at once.

30:08

🌟 What's Next for Oracle Machine Learning

The roadmap for the Oracle Machine Learning components, including GPU availability, LLM inferencing, and embeddings, to be announced over the rest of the year. The session closes by re-emphasizing the benefits and reliability of integrating machine learning into applications.


Keywords

💡Oracle Machine Learning

Oracle Machine Learning is Oracle's machine learning platform. It enables data analysis and predictive modeling. The video explains the process of using Oracle Machine Learning to make applications smarter.

💡AutoML

AutoML, short for automated machine learning, refers to automatically selecting and building the best machine learning model without data scientists manually tuning algorithms. The video shows how AutoML is used to build a customer churn model and improve an application's analytical capability.

💡Customer Churn

Customer churn refers to customers ceasing to use a product or service. The video explains how a classification algorithm is used to predict churn, identify important customers, and increase the likelihood that they keep using the service.

💡Predictive Maintenance

Predictive maintenance is a strategy that uses machine learning to predict equipment failures and schedule maintenance, reducing downtime. The video cites it as an application of machine learning that plays an important role in keeping equipment healthy.

💡Classification Algorithm

A classification algorithm is a machine learning technique that assigns data to predefined categories. In the video, a classification algorithm is used to build a model that predicts whether customers will stay or churn.

💡Algorithm Whitelist

An algorithm whitelist is a list of specifically approved algorithms. The video notes that the whitelist of algorithms available inside the Oracle database was assembled based on the algorithms customers use most frequently.

💡Oracle Converged Database

Oracle Converged Database is Oracle's converged database solution. The video emphasizes that Oracle Machine Learning is part of the converged database, making it convenient to build machine learning models directly inside the database.

💡AutoML UI

The AutoML UI exposes AutoML functionality through a graphical user interface. The video walks through using the AutoML UI to build machine learning models and automate parameter tuning and optimization.

💡Data Mining

Data mining is the process of extracting meaningful information and patterns from large volumes of data. In the video, Oracle Data Miner is mentioned as a tool used for data mining when building machine learning models and running predictive analysis.

💡Model Reliability and Security

Model reliability and security refer to guaranteeing that machine learning models are accurate and safe. The video emphasizes that running models inside the database increases reliability and security and reduces dependence on external packages and engines.

Highlights

Introduction to making applications smarter with Oracle machine learning and AutoML.

Key benefits: improved analysis of customers and processes by infusing machine learning into applications.

Building machine learning models using AutoML UI and integrating them through different APIs.

Oracle machine learning models increase reliability and security by running inside the database.

Machine learning use cases span across industries involving customers, products, equipment, and employees.

Examples of use cases include customer segmentation, loyalty, churn, predictive maintenance, and employee attributes.

Different classes of algorithms used for customer lifetime value, loan analysis, probability of default, etc.

Overview of Oracle machine learning algorithms available inside every Oracle database.

Demonstration of Oracle MovieStream use case to predict customer churn using machine learning.

Explanation of AutoML process: algorithm selection, adaptive sampling, feature selection, and model tuning.

AutoML eliminates repetitive tasks, saving time even for experienced data scientists.

Detailed steps of using AutoML in Oracle machine learning UI, including data source selection and model configuration.

Comparison of different models and evaluation metrics within Oracle machine learning UI.

Deployment of machine learning models through REST endpoints and scoring with dynamic SQL.

Use of Python and R for building and deploying machine learning models, emphasizing flexibility for data scientists.

Introduction of Oracle machine learning services for data and model monitoring to avoid data drift and ensure model reliability.

Roadmap for Oracle machine learning components, including upcoming features and enhancements.

Summary of session takeaways: critical intelligence for applications, ease of building models with AutoML UI, and database security.

Transcripts

play00:05

Hi, everyone. Welcome. I am Marcos Arancibia, part of the product management team for Autonomous Database. Today you're going to see how to make your applications smarter with the use of Oracle Machine Learning and AutoML. The key takeaways for today: you'll see how infusing machine learning into your applications makes them much better at analyzing your customers and processes. Building machine learning models using the AutoML UI lets you use them through different APIs and make your applications work better. And you'll see how to increase the reliability and security of machine learning models, which comes from the models running inside the database. That basically lowers the cost and the usage of third-party packages or engines that you would otherwise need to move data to.

play00:57

Machine learning use cases are part of every industry. All industries have customers and/or products and/or equipment and/or employees, and there are many different types of use cases. If you have customers, you are worried about customer segmentation, loyalty, and customer attrition or churn. If you have products, you're looking for the next best offer, cross-selling, things like that. With equipment, you're trying to identify predictive maintenance opportunities and things along those lines. And with employees, you're looking at their best attributes, trying to find similarly good employees.

play01:34

What these use cases have in common is that each maps to different classes of techniques. When you're doing customer lifetime value, loan analysis, or probability of default, you are using one or more of these classes of algorithms: classification, regression, anomaly detection, and so on. Today we're going to look at a customer churn use case, which belongs to the classification family. And every technique has its own algorithms. This is a very large list, and this whitelist of algorithms is available inside every Oracle database as part of Oracle Machine Learning. We have algorithms for classification and regression, and the list was assembled basically by looking at what our customers were using most frequently.

play02:37

As part of the AI and ML ecosystem across Oracle, machine learning in the database sits within machine learning for data platforms, right there in the middle, and that's what we're going to focus on today. We're also part of the Oracle converged database: if you have an Oracle database, we are in there, no matter where it runs. On the left-hand side I have the components listed. The SQL APIs are available everywhere. Python and R are available on the Autonomous Database side and also in the database on-premises, in Base Database Service, on Exadata, and so on. Three components are exclusive to Autonomous: Oracle Machine Learning Notebooks, the OML AutoML UI, and Oracle Machine Learning Services, our REST endpoint server. Oracle Data Miner, again, is available against all of these platforms as well. So we're very flexible here.

play03:46

The idea for us, and the reason we created Oracle Machine Learning in the first place, is that there's a gravity to your data, to large volumes of data. Why would you take terabytes of data out of the database to crunch numbers on a small platform with open-source algorithms, when you can do that directly where the data resides? Saving and eliminating all the data movement and simplifying the solution architecture is what we were going for. That's where we run those roughly 30 different algorithms. At the end of the day, that saves a lot of time: the data access time and the result export time are eliminated, but so is the crunch time for machine learning modeling, data preparation, and exploration, because those things run faster inside the database, where your data is. So it gives you better time to production on your project. We're going to use the Autonomous Database today to demonstrate this process.

play05:01

Basically, the Autonomous Database helps you run the Oracle database in the cloud without requiring you to do any patching or backups, or to worry about any of those activities. We manage the infrastructure, we automate all the database processes and management, and we manage all the data center operations. All of that is automated for you, so you have more time to develop and more time to work with analytics. That's the idea.

play05:39

I'm going to use a use case now: Oracle MovieStream. Think of it as your Netflix competitor, where I have a lot of customers that I need to take care of. I need to acquire more customers, but keeping the customers I have is probably the cheapest source of continuing revenue. So I'll be evaluating these customers and trying to identify which ones are the most important in terms of customers I can keep. To understand whether I can keep a customer, I need to understand why they leave, and the reason they leave is what we call churn. I'm going to identify the probability of a customer leaving, the probability to churn, using machine learning, so that I can potentially offer that customer something like a pizza promotion to try to make them come back or stay with our service.

play06:48

For that, I'm going to assume that I have a data lake in Oracle Cloud Infrastructure with lots of different sources: enterprise applications, MovieStream events, third-party data. I also have object storage with a landing zone, a gold zone, and a sandbox. The idea is that I'm going to use the Autonomous Database to work with that data once it's loaded, cleaned up, and prepared for me. I'll be showing you the modeling portion and the deployment portion of the machine learning components. There are many other self-service tools available in Autonomous Database, for loading, transformations, graph, and so on, but I'm going to focus specifically on the machine learning side.

play07:35

The problem with traditional machine learning is that data scientists like myself usually start the process by selecting one of those algorithms we talked about before, one at a time. You evaluate that algorithm against the data and come up with one model. Then you have to create and optimize that model, playing with all the different components and parameters each model has, until you get one model, or a handful of versions that are the best for that model. And then you rinse and repeat: you restart the entire process again with the next algorithm.

play08:29

What AutoML, or automated machine learning, does is eliminate that repetitive task, all of that process of building and evaluating every different algorithm. It uses something called auto algorithm selection: it looks at all of these algorithms and determines which ones actually work best for your problem and your data. The next step is adaptive sampling: it identifies the right size of sample, makes it proportional, and does the analysis it needs so each model performs at its best. Then it does auto feature selection. Things like eye color, the customer's name, or shoe size shouldn't matter when evaluating whether the customer is going to stay or not, so it eliminates those automatically. Finally, it goes into auto model tuning, which increases the accuracy of each model by tweaking its parameters. That process enables even an experienced data scientist to save time as well.
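To make those four stages concrete, here is a minimal OML4Py sketch of the same pipeline. This is an assumption-laden outline, not the demo's actual code: the connection details and the CUSTOMER_CHURN table name are placeholders, and the oml.automl class signatures follow my reading of the OML4Py documentation, so verify them against your release.

# Hedged sketch of the AutoML stages described above, using OML4Py.
# Placeholders: connection credentials, table CUSTOMER_CHURN, target IS_CHURNER.
import oml
from oml import automl

oml.connect(user="MOVIESTREAM", password="...", dsn="mydb_medium")

churn = oml.sync(table="CUSTOMER_CHURN")        # proxy object; data stays in-db
X, y = churn.drop("IS_CHURNER"), churn["IS_CHURNER"]

# 1. Auto algorithm selection: rank the in-database algorithms for this data.
alg = automl.AlgorithmSelection(mining_function="classification",
                                score_metric="accuracy", parallel=2)
top3 = alg.select(X, y, k=3)                    # top three, as in the demo

# 2./3. Adaptive sampling runs internally; feature selection prunes columns
# (eye color, shoe size, ...) that do not help the prediction.
fs = automl.FeatureSelection(mining_function="classification",
                             score_metric="accuracy")
selected = fs.reduce("rf", X, y)                # "rf" = random forest

# 4. Auto model tuning: search hyperparameters for the chosen algorithm.
mt = automl.ModelTuning(mining_function="classification")
results = mt.tune("rf", X[:, selected], y, score_metric="accuracy")
best_model = results["best_model"]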

play09:52

I'm going to simulate being that data scientist right there in the middle, and I'm going to work on deploying this model in several different ways. You'll see the different capabilities and UIs we have available for that. And finally, I'll talk about how the app developer can use all of this to actually create the application.

play10:22

So, let me start my demo. Here is Oracle Cloud Infrastructure, and I'm inside the Autonomous Database console. In the console there is something called Database Actions, where you can create users and things like that; I'm going to click on "View all database actions". In Database Actions you can see a lot of different tools: tools for data loading, transformations, management, and so on. At the top, under the development tools, there is Oracle Machine Learning. When you click on that, you get to the Oracle Machine Learning UI, where I'm signed in as the user MovieStream. I'm going to click on AutoML and look at the AutoML experiments. There is an experiment there already, but I'm going to create a brand new one to show you step by step: "We want to detect customers about to leave," and I can put in any comments.

play11:41

The first thing I need to do is pick a data source. On the right I have the schemas my Oracle database user has access to, and all the tables in them. I have access to this table, so I'm just going to click on it. The system now shows me, down below, all of the different features I have: things like age; average number of transactions; transactions three, four, and five months ago; sales, same thing; discounts; average number of transactions in the last quarter; credit balance; customer ID; education; email; and gender. Then the movie genres: how many action movies the customer watches, how many adventure, animation, biography, comedy, and so on. And at the end, even more, like household size and income level.

play12:51

The most important feature in this case is the churner flag. It tells me whether a customer left last month or stayed, where "left" means zero watches on the service last month. The churner definition is something every company, every business, defines differently. You could have defined it as, say, the customer reduced their watch levels by 90%, which is an indication that they're probably watching movies somewhere else. So this is just one example of a definition that every company makes differently. In this case, I have a binary zero/one target that says the customer left: last month he didn't watch any movies with me. And finally, there are a few other columns like marital status, location (latitude and longitude), years as a customer, and things like that.

play13:45

Very well. Next I define the prediction target. What I want to predict is whether the customer is going to stay or leave, so that's IS_CHURNER, and for the case ID I'm going to use the customer ID. Because I selected the churner column, and it is a binary zero/one, the system automatically detects the prediction type of classification. It could have chosen regression if I were trying to predict something else, like a customer's credit limit.

play14:17

Back to the features: I can manually remove anything I don't want the system to even try to use. First name and last name, for example; I know they have nothing to do with whether a customer is going to stay, and if they did appear to, the model would probably be wrong. So you don't want to use those.

play14:38

Then, in the additional settings, I reduce the maximum number of top models to three, so we look at the top three models here. The maximum duration is fine at the default. For the database service level: Autonomous Database has low, medium, and high service levels, and medium allows me to use parallelism, so I'm going to switch it to medium. Then the model metric: a data scientist might choose a different metric, like accuracy, area under the ROC curve, F1, precision, or recall, depending on what they're optimizing for. The list of algorithms is something you can prune if you want. For example, companies in financial services that work with credit scoring are not allowed in the USA to use neural network algorithms to define their credit scores; they need to use something else, like GLM or decision trees. So you could uncheck neural network here and not make use of it, but let the experiment run through the other algorithms and see which are best.

play15:49

Having said that, I'm going to click Start and choose Faster Results, which makes the run faster. If I selected Better Accuracy, it would do a little more fine tuning and take a little more time. So I'm going to let it run; my experiment is starting right now.

play16:09

After a basic initialization phase, it starts working on the second step, the leaderboard, where you see the selection of the best algorithms. The first step is algorithm selection. After the algorithms are selected (remember that I requested three), it automatically performs adaptive sampling, then feature selection, and then a final model tuning and a prediction impact analysis of every feature for all the models. I can see the first initial balanced accuracy for that table. Once algorithm selection completes, the leaderboard shows it selected a decision tree, a random forest, and a support vector machine with a linear kernel; these three are the ones it identified initially as the best. Now it's going to run an optimization on them. I'm going to leave this experiment running and open another one that already ran (you can see the first one still running at the top). That other one completed in around three minutes, and you can see all of the steps it took. It turned out that the random forest was the best model for balanced accuracy.

play17:48

If you click on that model's link, it shows all of the different features and the weights that matter for that model in particular. It's basically saying that age, gender, education, how many thriller movies the customer watches, average number of transactions in the last quarter, years of residence, credit balance, genre family, and genre war are the features that make this model work.

play18:22

The confusion matrix tells me the errors. Where I predicted customers were not going to churn and they did not, the model was correct; 64% of the population falls into that group. Another 28% of the population falls where I predicted they would churn and they did. Where the model missed was the 6% of people I thought would churn but didn't. That's not terrible: if I'm offering a pizza promotion, I'll just be sending pizza to some people who weren't necessarily going to churn, which is okay. Now, the remaining 1% might be more critical to look at later, because those are the ones I could not predict: they actually churned, but I did not predict they would. Again, it's small compared to the other groups, but these are things I can follow up on later.

play19:21

Now that the model is built, I can rename it to something meaningful. I already did that with CHURN_PRED, so I can reuse that model; it's already in the database under that name, and I can use it for things like scoring immediately. I can also select metrics and compare models: that shows me metrics I was not optimizing for, so I can see how they look once the models are done.

play19:58

Then I can create a notebook that takes this model and builds an entire notebook, written in Python, using all of the steps AutoML actually performed. All of the optimizations, every single thing AutoML was doing, gets translated into Python, and you'll see that notebook soon. The other thing I can do here is deploy the model. If I give it a URI name, a version, and a namespace, it goes to what we call Oracle Machine Learning Services, our REST server for Autonomous Database customers. With that, you can immediately score against that model using REST. We're going to see all of that.

play20:54

First things first: if I go to the Notebooks section, I can see the notebook that was generated for me. It contains some comments about the original model and the experiment, and then it walks through the data. Step by step, it imports the Oracle Machine Learning package for Python. Inside a Python session, it creates what we call a proxy object that points to the table with the selected columns. It's basically creating a view behind the scenes, a query over the columns that AutoML identified as the best ones, the ones needed by this random forest. The proxy object does not pull the data back into Python's memory; it's just a pointer to the data in the database, to that new view. It prepares the data for building the model. And finally, I have the exact settings that the AutoML process identified for that random forest, so I can reproduce exactly the random forest model that I have: the impurity metric being used, the minimum percent split per node, the maximum tree depth. All of these are settings that random forest requires, identified by the AutoML process. Then it builds the model, and you can take that model and work with it, rename it, run it on a different server, and so on.
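As a condensed sketch of what such a generated notebook contains (the column list and hyperparameter values below are illustrative stand-ins, not the values the experiment actually recorded):

import oml

# Proxy object over a view of just the columns AutoML kept; no data is
# pulled into Python memory, this is a pointer to data in the database.
churn = oml.sync(query="""
    SELECT CUST_ID, AGE, GENDER, EDUCATION, GENRE_THRILLER, GENRE_FAMILY,
           GENRE_WAR, AVG_NTRANS_M3, YRS_RESIDENCE, CREDIT_BALANCE, IS_CHURNER
    FROM   MOVIESTREAM_CHURN""")

train, test = churn.split(ratio=(0.8, 0.2), seed=1)

# Settings the AutoML process identified for the random forest
# (impurity metric, split percentage, number of trees); values illustrative.
rf = oml.rf(tree_impurity_metric="TREE_IMPURITY_GINI",
            tree_term_minpct_split="0.1",
            rfor_num_trees="50")
rf = rf.fit(train.drop("IS_CHURNER"), train["IS_CHURNER"],
            case_id="CUST_ID", model_name="CHURN_PRED")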

play22:47

Now, how can I use that? First, let me show you a scoring process, where I can use that model for scoring. Another user, not necessarily my own, imports the OML library and syncs a proxy object over the entire table. Then I bring in my random forest model, the one I called CHURN_PRED, and I can just run a prediction. In the output of that prediction, I now have the probability to churn: the customer ID, the information about the customer, the probability that this customer is going to leave, and a zero/one prediction.
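A minimal sketch of that scoring step, reusing the fitted rf model and the churn proxy from the notebook sketch above (names remain illustrative, and a database connection is assumed):

# Score every customer; supplemental_cols keeps the customer ID in the output.
pred = rf.predict(churn.drop("IS_CHURNER"),
                  supplemental_cols=churn[["CUST_ID"]])

# Per-class probabilities, i.e. the probability to churn.
proba = rf.predict_proba(churn.drop("IS_CHURNER"),
                         supplemental_cols=churn[["CUST_ID"]])
print(proba.head())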

play23:39

The same thing works with SQL; I can repeat that process there. I have PREDICTION and PREDICTION_PROBABILITY, scoring functions native to Oracle SQL, applied to CHURN_PRED. This is dynamic SQL, a SELECT * that dynamically scores every customer available on that table, and I get their probability to churn right there.
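The same dynamic scoring in SQL, run here through an OML4Py cursor so this page keeps to one language. PREDICTION and PREDICTION_PROBABILITY are the standard in-database SQL scoring functions; the table name is again a stand-in:

import oml

cr = oml.cursor()
cr.execute("""
    SELECT CUST_ID,
           PREDICTION(CHURN_PRED USING *)             AS pred,
           PREDICTION_PROBABILITY(CHURN_PRED USING *) AS prob_churn
    FROM   MOVIESTREAM_CHURN""")
for cust_id, pred, prob in cr.fetchmany(10):   # first ten customers
    print(cust_id, pred, round(prob, 3))
cr.close()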

play24:08

The deployment I just did goes to the REST endpoint, to Oracle Machine Learning Services, so I now have that model there. I get a token by passing my Oracle database username and password, and now I can predict. Remember that I called the deployment CHURN_PRED, so the CHURN_PRED REST endpoint is now live. Acting as the application developer, I just pass the input records, some data about a customer. I click send, the data goes out and comes back, and the result returns in 47 milliseconds, which is very fast between my home here in Miami and the server in the northeast. And I see the probabilities: the probability that this particular customer we sent is going to churn is 84%.
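A sketch of that REST exchange using Python's requests library. The host below is a placeholder, and the endpoint paths and payload fields (token at /omlusers/api/oauth2/v1/token, scoring at /omlmod/v1/deployment/<uri>/score, inputRecords, topNdetails) follow my reading of the OML Services REST API; check them against the current documentation before relying on them.

import requests

BASE = "https://adb.example-region.oraclecloud.com"   # placeholder host

# 1. Exchange database credentials for a bearer token.
tok = requests.post(f"{BASE}/omlusers/api/oauth2/v1/token",
                    json={"grant_type": "password",
                          "username": "MOVIESTREAM",
                          "password": "..."}).json()["accessToken"]

# 2. Score one record against the live CHURN_PRED endpoint, asking for
#    the top five prediction details (the "reasons" shown in the demo).
payload = {"topNdetails": 5,
           "inputRecords": [{"AGE": 44, "GENDER": "M", "GENRE_WAR": 12,
                             "GENRE_THRILLER": 52, "YRS_RESIDENCE": 3}]}
r = requests.post(f"{BASE}/omlmod/v1/deployment/CHURN_PRED/score",
                  headers={"Authorization": f"Bearer {tok}"},
                  json=payload)
print(r.elapsed.total_seconds(), r.json())   # probabilities + top-5 reasons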

play25:08

More than that: because topNdetails here is five, I'm asking for the top five prediction details, the top five reasons why the model thinks this customer is going to churn. I can see the reasons and their weights: age, the number of war movies the customer watches, the number of thriller movies, years of residence, and genre family.

play25:36

This can also use micro-batches. In this case, I have 100 customers as an example in a micro-batch. I run that scoring the same way and get all of the results, still subsecond: 100 customers with all their probabilities to churn (the display shows index 101 because it starts counting at zero). So your application can handle all of that, and you can run this process very fast.

play26:11

Back to my slides: what do I do after that process? We saw that we could use Python and R; we saw Python being used as the engine, but we can use R as well. We basically empower all data scientists to use the native language they want to program in while still using all of the in-database capabilities. In addition, for the Autonomous Database, the REST endpoint can be used as well.

play26:41

Remember that I mentioned the AutoML UI is part of the Autonomous Database. Well, you also have AutoML available in Python, and that can run anywhere; it doesn't need to be the Autonomous Database. It can run on-premises, or on Exadata. You have the Python AutoML API that you can run, and then, given the model name, you can score with dynamic SQL, like you see here on the right.
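In the Python API, the whole experiment collapses to roughly one call. A sketch, reusing the X and y proxies from the earlier AutoML sketch and assuming the oml.automl.ModelSelection signature from my reading of the docs:

from oml import automl

# ModelSelection wraps algorithm selection, sampling, feature selection,
# and tuning in a single call; returns the best fitted in-database model.
ms = automl.ModelSelection(mining_function="classification",
                           score_metric="accuracy", parallel=2)
best_model = ms.select(X, y, case_id="CUST_ID", k=1)
print(best_model)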

play27:15

You can also use PL/SQL to build a model, something we have supported all the way back to the Oracle Database 11g release. On the left is in-database model building using PL/SQL, and on the right I reuse that model in Python: I can take a decision tree model and run inference with it in Python. I can also build a decision tree with Python and score it in SQL, or even use R to build an in-database decision tree model, like I'm doing on the left, and then again score with SQL. So there's a lot of flexibility for data scientists.
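For the PL/SQL path, a hedged sketch using DBMS_DATA_MINING.CREATE_MODEL2, sent here from Python through an OML4Py cursor so this page keeps to one language (the model, table, and column names are stand-ins):

import oml

plsql = """
DECLARE
  v_setlist DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlist('ALGO_NAME') := 'ALGO_DECISION_TREE';
  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'CHURN_DT',
    mining_function     => 'CLASSIFICATION',
    data_query          => 'SELECT * FROM MOVIESTREAM_CHURN',
    set_list            => v_setlist,
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'IS_CHURNER');
END;"""
cr = oml.cursor()
cr.execute(plsql)   # the model is then visible to SQL, Python, and R alike
cr.close()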

play27:53

Finally, one of the most flexible capabilities we have is embedded execution. You can write your own R or Python code, use whatever third-party package you're looking for, and we package it up. We spawn the R or Python session needed; the database controls that. The database injects the required data into that session, runs the code you wrote, and brings back the results, whether into the database, to object storage, to a file service, or wherever you need. So it is, again, very flexible and very powerful.
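A small sketch of embedded Python execution via OML4Py's table_apply. The user function below is illustrative; it runs in a Python engine spawned on the database side, with the table delivered to it as a pandas DataFrame:

import oml

def churn_rate(df):
    # Executes inside the database-spawned Python engine; any third-party
    # package installed on that side could be imported here.
    import pandas as pd
    return pd.DataFrame({"N_CUSTOMERS": [len(df)],
                         "CHURN_RATE": [df["IS_CHURNER"].mean()]})

res = oml.table_apply(data=oml.sync(table="MOVIESTREAM_CHURN"),
                      func=churn_rate)
print(res)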

play28:41

Recently, we released data and model monitoring in Oracle Machine Learning Services. What we're trying to guard against is this: data changes over time. Your customers' behavior changes, the economy changes, what people are doing changes, things trend up and down. All of this affects the incoming data, so we want to catch data drift; and model monitoring likewise watches the model itself. In addition to these capabilities in OML Services, we added UIs for them. There's a data monitoring UI where you look at drift over time and at the features, whether they're changing over time, evaluating them with specific metrics and statistics, such as the population stability index, that you can choose for the processes you're watching. On the model monitoring side, you can monitor several algorithms at once: you look at all these different models, some drifting more than others depending on the model, and you can evaluate them, look at feature impact, and look at predictive impact over time.

play30:08

Last but not least, here is the roadmap for the Oracle Machine Learning components: the things being launched and announced, and that we'll be working on for the rest of the year, including GPU availability for things like LLM inferencing and embeddings. I hope you had a good time watching this session.

play30:38

These are the key takeaways. Machine learning can infuse your applications with critical intelligence about your customers and processes. Building these machine learning models using the AutoML UI is very easy, and you can deploy them to different APIs. And we talked about the reliability and security of building these models in the database; they are objects inside the database. These are some of the links you can follow for more information: LiveLabs, a workspace in Slack where you can find us, LinkedIn, and things like that. With that, thank you very much for watching the session; I appreciate it, and have a good day.
