How to accelerate machine learning development with BigQuery ML

Google Cloud
11 Oct 2022 18:20

Summary

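TLDR: Abhinav Khushraj and Candice Chen present BigQuery ML on Google Cloud. Since its launch in 2018, it has been adopted by large enterprises, startups, and individual data professionals. Its mission remains to make powerful ML capabilities easy to unlock so that anyone can be a data scientist. BigQuery ML simplifies data management, removes the need to handle infrastructure, and makes deployment easy. The session introduces new capabilities: MLOps integration, support for unstructured data, importing models such as TensorFlow Lite and XGBoost, inference on remote models, improved forecasting, and enhanced feature engineering. In the Twiddy case study, combining unstructured and structured data helps improve search results and moves the content team toward data-driven decisions.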

Takeaways

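  • 🚀 Since its launch in 2018, BigQuery ML has been well received by customers; enterprises of all sizes, startups, and individual data professionals use it in innovative ways.
  • 📈 BigQuery ML's mission is to make powerful machine learning (ML) capabilities easy to unlock so that anyone can be a data scientist.
  • 🔧 Traditional ML workflows involve moving data, provisioning infrastructure, and finding people with the right skills; BigQuery ML addresses these challenges.
  • 💡 In BigQuery ML, you train a model with a simple SQL query that points at a dataset already stored in BigQuery.
  • 📊 BigQuery ML uses BigQuery's scale and compute to handle data ranging from 1,000 rows to millions of rows.
  • 🔗 MLOps is essential for machine learning at enterprise scale, and BigQuery ML is tightly integrated with Vertex AI Model Registry.
  • 🧠 BigQuery ML can run vision models to unlock value from unstructured data such as images, text, and documents.
  • 🔌 BigQuery ML can import models trained on other platforms, including TensorFlow, TensorFlow Lite, and XGBoost.
  • 📉 In addition to the univariate time series model ARIMA Plus, BigQuery ML now supports multivariate time series forecasting with external regressors.
  • ⚙️ Feature engineering is at the heart of great machine learning, and BigQuery ML adds new numerical and categorical functions for it.
  • 🌟 In the Twiddy case study, BigQuery ML combines unstructured and structured data to predict click-through rates, changing how marketing analytics is done.
  • 📝 With BigQuery ML, Twiddy's content team can take a data-driven approach to selecting and editing photos.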

Q & A

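  • When did BigQuery ML become available?

    -BigQuery ML was launched in 2018.

  • What new capabilities are planned for BigQuery ML?

    -The session announces MLOps integration, analysis of unstructured data, importing models such as TensorFlow Lite and XGBoost, inference on remote models, stronger forecasting, and improved feature engineering.

  • What is MLOps, and why does it matter?

    -MLOps is the set of processes that support running machine learning at enterprise scale. Managing, comparing, and monitoring models keeps ML work reliable and efficient as the number of models grows.

  • What does unstructured data analysis in BigQuery ML cover?

    -It covers data such as images, text, and documents. BigQuery ML lets you run vision models on this data, for example to identify objects in an image, so that it can feed model predictions.

  • How do I use an externally trained model, such as TensorFlow Lite or XGBoost, in BigQuery ML?

    -Import the model with CREATE OR REPLACE MODEL, then run inference with ML.PREDICT.

  • What is inference on remote models?

    -It lets BigQuery ML run inference against a model that is already deployed on another platform (for example, Vertex AI or Cloud Functions). This makes model types that BigQuery ML does not support natively usable from BigQuery.

  • What is multivariate time series forecasting in BigQuery ML?

    -It forecasts a metric such as future revenue from its own history plus other factors that influence it. With ARIMA Plus XREG, you supply additional features alongside the target metric.

  • What new feature engineering capabilities does BigQuery ML provide?

    -BigQuery ML adds new numerical and categorical functions, and transform clauses are now exported together with the model, which simplifies online deployment.

  • How did Twiddy use BigQuery ML to predict click-through rates?

    -Twiddy created an object table that points to house images stored in Google Cloud Storage, imported a vision model, and converted the images into numeric embeddings. It then combined the embeddings with business data, built a machine learning model, and predicted click-through rates.

  • How did Twiddy improve photo selection and editing?

    -Using the model's results, Twiddy learned which image characteristics make users more likely to click, and its content team now takes a data-driven approach to selecting and editing photos.

  • What should I do if I am interested in the new BigQuery ML features?

    -Use the link shown at the bottom of the page to open the interest form and sign up. The Google Cloud team will then get in touch.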

Outlines

00:00

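🚀 Introduction to BigQuery ML and its new features

Abhinav Khushraj and Candice Chen introduce BigQuery ML. Launched in 2018, it has served customers with a wide range of data needs. BigQuery ML lets users build machine learning models and run predictions with SQL. The new capabilities introduced are MLOps integration, Explainable AI, pipelines, unstructured data support, importing external models such as TensorFlow Lite and XGBoost, inference on remote models, multivariate time series forecasting, and new feature engineering functions.

05:01

📈 The new BigQuery ML features in detail

This part explains the new features in detail: the importance of MLOps and the integration with Vertex AI Model Registry, working with unstructured data, importing external models, remote model inference, multivariate time series forecasting, and stronger feature engineering. The speakers present these features as ways to simplify data science and help solve business problems.

10:02

🏠 The Twiddy case study: combining unstructured and structured data

Candice Chen presents the Twiddy case study. Twiddy is a vacation rental company that faced three challenges preventing it from improving its website's search results. With BigQuery ML, Twiddy combined images with structured data to build a model that predicts the click-through rate of rental properties. The process consists of creating an object table that points to images stored on GCS, importing a vision model, and combining the image embeddings with business data. As a result, Twiddy can analyze images with a machine learning model, and its content team can take a data-driven approach.

15:04

🎯 Twiddy's results and what is next for BigQuery ML

With the prediction results, Twiddy can make more informed A/B testing decisions aimed at improving click-through rate, and its content team can select and edit photos to engage users better. The session closes with a quote from Adswerve about Twiddy's new capabilities. Anyone interested can sign up through the interest form to try the new features.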

Keywords

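💡BigQuery ML

BigQuery ML is a machine learning capability of BigQuery on Google Cloud. It lets users build predictive models and analyze large datasets through SQL queries. The video explains that BigQuery ML has been widely adopted since its launch, is being used in innovative ways, and is gaining many new features.

💡Data Integration

Data integration means bringing data from different sources and formats together in one place so that analysis and model building become easier. The video emphasizes that BigQuery ML lets you combine structured and unstructured data without moving the data out of BigQuery.

💡MLOps

MLOps is the set of processes and tools that support running machine learning at enterprise scale. The video introduces the tight integration between BigQuery ML and Vertex AI Model Registry, which makes it possible to manage, compare, and monitor models.

💡Unstructured Data

Unstructured data is data in forms such as images, text, and documents. The video explains that BigQuery can now handle unstructured data and that BigQuery ML can use it to build machine learning models.

💡Model Import

Model import is the process of bringing an existing machine learning model into a new platform. The video introduces the ability to import models such as TensorFlow Lite and XGBoost into BigQuery ML and run inference inside BigQuery.

💡Prediction

Prediction means using a machine learning model to estimate outcomes for new data. The video explains that BigQuery ML supports business predictions such as demand and revenue forecasting, and that multivariate time series forecasting is being added.

💡Feature Engineering

Feature engineering is the process of transforming data into a form a model can learn from well, in order to improve model performance. The video explains that BigQuery ML adds new numerical and categorical functions to strengthen feature engineering.

💡Remote Model

A remote model is a machine learning model that is already deployed at an endpoint outside BigQuery. The video explains that BigQuery ML can now run inference against remote models, so models deployed outside BigQuery become usable from it.

💡ARIMA Plus

ARIMA Plus is BigQuery ML's time series forecasting model. The video explains that BigQuery ML already supports ARIMA Plus and now adds multivariate time series forecasting with external regressors, which makes forecasts more robust.

💡Visualization

Visualization is the practice of representing data visually so that it is easier to understand. In the video, Twiddy predicts the click-through rate of its properties with BigQuery ML and charts which image characteristics drive clicks, so that the content team can make data-driven decisions.

💡SQL

SQL is a language for working with databases: querying, updating, and managing data. The video emphasizes that BigQuery ML runs machine learning tasks through SQL, so business analysts can get insights from data without specialized data science skills.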

Highlights

BigQuery ML was launched in 2018 and has seen great adoption across various customer sizes and data needs.

BigQuery ML aims to simplify the process of unlocking ML capabilities, enabling anyone to become a data scientist.

New powerful capabilities are being introduced to BigQuery ML to enhance its functionality.

BigQuery ML allows for training models directly within BigQuery using a simple SQL query.

The platform provides scalability and compute capabilities for models ranging from 1,000 to millions of rows.

BigQuery ML supports batch prediction on new incoming data with SELECT * FROM ML.PREDICT.

Users can perform ML workflows without moving data from BigQuery, addressing a significant challenge in AI.

BigQuery ML leverages Google's massive compute and storage infrastructure, reducing the need for users to manage infrastructure.

MLOps is crucial for enterprise-scale machine learning, and BigQuery ML is now integrated with Vertex AI Model Registry for model management.

BigQuery ML introduces over 20 ML operators for use within Vertex AI managed pipelines.

BigQuery ML now supports handling unstructured data such as images, texts, and documents.

Users can import custom TensorFlow-based vision models for object detection and other vision capabilities.

BigQuery ML will support importing models beyond TensorFlow, including TensorFlow Lite and XGBoost.

The platform allows inference on remote models deployed on Vertex AI or Cloud Functions.

BigQuery ML introduces multivariate time series forecasting with ARIMA Plus XREG for more robust predictions.

Feature engineering is enhanced with new numerical and categorical functions, and transform clauses are exported with models.

Twiddy, a vacation rental company, used BigQuery ML to combine structured and image data to predict click-through rates for their listings.

Twiddy's content team can now make data-driven decisions for photo editing and selection based on ML model insights.

BigQuery ML's SQL alignment allows business analysts to utilize their existing skills for advanced image analysis and ML tasks.

Transcripts

play00:00

[MUSIC PLAYING]

play00:06

ABHINAV KHUSHRAJ: Hi.

play00:07

I'm Abhinav Khushraj and I'm a product lead for BigQuery ML.

play00:11

CANDICE CHEN: Hello, everyone.

play00:12

My name is Candice Chen.

play00:14

I am the product manager for BigQuery at Google Cloud.

play00:18

ABHINAV KHUSHRAJ: Today, we're going

play00:19

to talk about BigQuery ML.

play00:21

Since we launched BigQuery ML in 2018, we have come a long way.

play00:25

Lots of great adoption from customers of all sizes,

play00:28

with large enterprise customers with complex data

play00:30

needs or small startups or young companies or even

play00:34

individual data professionals.

play00:37

And we've been able to see the new and innovative ways they've

play00:40

been using this.

play00:41

And it has just blown our mind.

play00:44

Today's session, we're going to talk about all the new stuff

play00:47

that we're going to bring to BigQuery ML--

play00:48

all the new, powerful capabilities

play00:50

that we're introducing to make it even more powerful.

play00:53

It has always been our mission to make BigQuery ML the easiest

play00:57

way to unlock powerful ML capabilities

play00:59

so that anyone can be a data scientist.

play01:02

And so first, I will start with a brief overview

play01:04

of what is BigQuery ML.

play01:06

And then show you all the cool goodies

play01:08

that we're launching this year.

play01:09

And then in the end, we'll see a wonderful customer demo

play01:12

that Candice is going to show us where

play01:14

unstructured data and structured data

play01:15

comes together to solve real customer pain points.

play01:19

So let's go ahead.

play01:21

A typical ML workflow is where an organization

play01:25

is spending a lot of time to bring

play01:27

their data from transaction systems or traditional legacy

play01:30

systems by doing ETL and investing

play01:32

a lot of time and energy and doing that work so that they

play01:36

have a single place to do analytics--

play01:38

in this case, BigQuery.

play01:39

And once you have it all there, it's great.

play01:41

You can run all kinds of analytics if it's BI.

play01:44

But the moment you want to do AI and ML--

play01:47

oh boy, there are lots of challenges.

play01:49

The first thing you have to do is take the data out.

play01:51

After putting in all that effort,

play01:52

now you have to take this data out,

play01:54

which means you're losing data governance.

play01:57

You'll have to build new data pipelines to build consistency

play02:00

from the old or the new source.

play02:02

It just becomes much harder.

play02:04

Another problem is you have to train these models.

play02:06

And you have to figure out what infrastructure

play02:07

you need-- whether you need a large set of Kubernetes

play02:10

environment and whatnot.

play02:11

Or you have to hire skilled professionals--

play02:14

people who know data science and Python and R and all of that.

play02:18

So that's another problem-- infrastructure and skill set.

play02:20

And finally, another infrastructure problem

play02:22

is where do you deploy this model?

play02:24

You have to deploy this model and have

play02:26

the right scale and infrastructure

play02:27

so it can deliver the final value at deployment time.

play02:31

And these are the problems that BigQuery ML solves.

play02:33

It makes it easy to manage data.

play02:36

It makes it easy to not have to deal with infrastructure.

play02:38

And it makes it easy to not have to deploy anywhere.

play02:41

It can be done right inside BigQuery for batch prediction.

play02:45

And the way it does it is this.

play02:47

First step, we just come and train

play02:49

a model by writing the simple SQL query CREATE OR REPLACE

play02:52

MODEL.

play02:53

And you point to the data set that's already inside BigQuery.

play02:56

You've already done all the work to bring the data.

play02:58

Why take it out again?

play03:00

Train the model right inside BigQuery.

play03:01

BigQuery has tremendous scale capability

play03:04

and compute capability for storage

play03:06

and compute that you put to work when you train this model.

play03:08

So whether it's a data with 1,000 rows or millions of rows,

play03:12

you don't have to worry about it.

play03:14

BigQuery takes care of it.

play03:15

And once the model training is done,

play03:17

you can start doing prediction.

play03:18

You can call SELECT * FROM ML.PREDICT and do prediction

play03:22

on new data that's coming in to predict maybe forecasting

play03:26

or maybe predict new segments for your customer, and so on.
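
The train-then-predict flow described here can be sketched in BigQuery SQL roughly as follows. This is a minimal illustration rather than the query shown in the talk: the dataset `mydataset`, the tables `customers` and `new_customers`, and all column names are hypothetical.

```sql
-- Hypothetical dataset, tables, and columns; only the statement shapes matter.

-- Step 1: train a model directly on data that already lives in BigQuery.
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',      -- one of the built-in model types
  input_label_cols = ['churned']    -- the column the model learns to predict
) AS
SELECT
  tenure_months,
  monthly_spend,
  support_tickets,
  churned
FROM `mydataset.customers`;

-- Step 2: batch prediction on new rows, also inside BigQuery.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.churn_model`,
  (
    SELECT tenure_months, monthly_spend, support_tickets
    FROM `mydataset.new_customers`
  )
);
```

Both statements run as ordinary BigQuery queries, so no data leaves the warehouse and no training infrastructure has to be provisioned.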

play03:29

And so, in summary, what BigQuery ML really

play03:32

brings as value to the table is you do not have to go and look

play03:36

for real advanced ML skills, like Python and R.

play03:39

SQL is a much more widely known language,

play03:42

and users can use that to do ML.

play03:44

Second, you can execute all your ML workflows

play03:47

without moving data from BigQuery.

play03:49

Moving data is the biggest problem.

play03:51

Every AI problem becomes a data problem.

play03:53

And so by having to do it inside BigQuery, inside the database,

play03:56

it really takes away a big barrier to ML development

play04:01

and ML adoption.

play04:02

And finally, infrastructure.

play04:03

BigQuery takes advantage of the massive scale of Google's

play04:07

compute infrastructure and storage infrastructure

play04:09

to be able to not have to think about what infrastructure you

play04:13

need to do ML.

play04:14

And that's powerful.

play04:15

Now, this is stuff that we've been doing since 2018.

play04:17

And so now we're going to launch a whole bunch

play04:19

of new capabilities that will just blow your mind.

play04:22

So let's start with MLOps.

play04:24

MLOps is critical for doing machine

play04:26

learning at enterprise scale.

play04:28

If you do not have MLOps, you're training lots of models.

play04:31

You need to manage models.

play04:32

You need to compare models.

play04:33

How do you do all of that?

play04:35

And BigQuery ML is now tightly integrated with Vertex AI Model

play04:39

Registry, so any model that you create

play04:41

in BigQuery ML can be registered and monitored

play04:44

using Vertex AI Model Registry.
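
As a sketch of what registering a model looks like, the statement below uses the Vertex AI options as I understand them from the current `CREATE MODEL` documentation. The talk does not show this syntax, and the dataset, table, model ID, and alias are hypothetical.

```sql
-- Hypothetical names throughout. The three Vertex AI options ask BigQuery ML
-- to register the trained model in Vertex AI Model Registry.
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['churned'],
  model_registry = 'VERTEX_AI',                   -- register on creation
  vertex_ai_model_id = 'churn_model',             -- ID shown in the registry
  vertex_ai_model_version_aliases = ['staging']   -- optional version alias
) AS
SELECT tenure_months, monthly_spend, support_tickets, churned
FROM `mydataset.customers`;
```

Once registered, the model appears alongside other Vertex AI models, where it can be compared, deployed to an endpoint, or monitored.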

play04:48

Once you have the models available in Vertex AI,

play04:50

you can either deploy one quick deployment right inside Vertex

play04:53

AI to its online deployment endpoints

play04:55

or you can do comparison and evaluation.

play04:58

Or you can even do down the road monitoring

play05:00

and a bunch of other capabilities

play05:02

that Vertex AI provides out of the box.

play05:04

And not only that, you will also have the ability

play05:06

to do Explainable AI for models that

play05:09

were built with XAI capability.

play05:12

Another important piece related to operationalizing

play05:16

is the ability to use pipelines. BigQuery ML has introduced

play05:22

more than 20 ML operators that will enable

play05:25

you to use those operators inside Vertex AI managed

play05:28

pipelines.

play05:29

Vertex AI managed pipelines are KFP-based, Kubeflow-based

play05:32

pipelines that can be used by organizations and professionals

play05:37

to make ML easier to manage.

play05:39

And with BigQuery ML, you can now

play05:41

do that right inside Vertex.

play05:43

And another cool thing about it is

play05:45

if you are a Python aficionado and not a SQL person,

play05:48

you can now use BigQuery ML operators

play05:50

to create your BigQuery ML train and predict using BigQuery ML.

play05:56

So this is, again, going to open up more ways users can use

play05:59

and a tighter story between Vertex AI and BigQuery ML.

play06:05

Insights from unstructured data.

play06:06

This is the big news for us today.

play06:09

So far, BigQuery has always been a leader

play06:12

in managing structured and semi-structured data.

play06:15

Earlier today, Thomas Kurian announced

play06:17

that BigQuery can now handle unstructured data,

play06:20

such as images, texts, docs--

play06:22

you name it.

play06:23

And what's the point of having all of that data if you cannot

play06:26

do something with it?

play06:27

So BigQuery ML will now help you unlock

play06:30

that value for the unstructured data

play06:31

that BigQuery will help you manage.

play06:34

You can do model predictions with vision

play06:37

to figure out objects in an image

play06:39

by bringing in your own custom, TensorFlow-based vision models

play06:43

to do object detection and a slew of other vision

play06:46

kind of capability.

play06:47

Down the road, we intend to unlock other kinds

play06:50

of capability beyond vision.

play06:51

And we look forward to making that available.

play06:53

But today, Candice is also going to show

play06:55

you a demo of this particular feature.

play07:00

Now, BigQuery ML already has support

play07:04

for many, many types of models.

play07:06

But a popular request from our customers

play07:09

is, hey, how can I do inference on models that I've already

play07:11

trained?

play07:12

I have a legacy system.

play07:13

I want to train using that.

play07:14

Or I built a model in another platform

play07:16

and I'd like to bring that in here.

play07:18

How can I do that?

play07:19

And so BigQuery ML will now be able to import models

play07:23

beyond TensorFlow.

play07:24

We already support TensorFlow import.

play07:25

And now we'll take it to TensorFlow Lite and XGBoost.

play07:28

By typing a simple CREATE OR REPLACE

play07:30

MODEL and point to the model.

play07:32

And you can bring the model in.

play07:33

Then you can do ML.PREDICT, just like you always do.

play07:36

And all the compute is being done inside BigQuery.
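
A sketch of the import flow, assuming model files have already been uploaded to Cloud Storage. The bucket paths, model names, and feature columns are hypothetical, and the option values follow my reading of the current documentation for imported models, not anything shown in the talk.

```sql
-- Hypothetical bucket paths and names.

-- Import a TensorFlow Lite model; no training query is needed.
CREATE OR REPLACE MODEL `mydataset.imported_tflite_model`
OPTIONS (
  model_type = 'TENSORFLOW_LITE',
  model_path = 'gs://my-bucket/models/tflite/*'
);

-- Import an XGBoost model. INPUT and OUTPUT declare the schema in case the
-- model file does not carry feature names and types.
CREATE OR REPLACE MODEL `mydataset.imported_xgb_model`
INPUT (f1 FLOAT64, f2 FLOAT64)
OUTPUT (predicted_label FLOAT64)
OPTIONS (
  model_type = 'XGBOOST',
  model_path = 'gs://my-bucket/models/xgboost/*'
);

-- Inference then looks the same as for a natively trained model.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.imported_xgb_model`,
  (SELECT f1, f2 FROM `mydataset.scoring_input`)
);
```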

play07:40

Alternatively, if you've got a model that

play07:42

is too big or too large or it is already

play07:44

deployed at an end point and you do not have the ability

play07:47

to bring that into BigQuery, BigQuery ML

play07:50

will now have the ability to do inference on remote models.

play07:53

So if you have a model that is--

play07:55

you can deploy a model on Vertex AI or Cloud Functions.

play07:58

And then come to BigQuery and just point to it

play08:00

by saying CREATE OR REPLACE MODEL and point to that model

play08:03

endpoint.

play08:04

And then when you do ML.PREDICT, the prediction

play08:07

is going to take the data from BigQuery,

play08:08

take it all the way to the end point, do the prediction,

play08:11

bring the results back.

play08:12

You can also do evaluation of your model

play08:14

with that same remote model capability.
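
The remote model flow might look like the sketch below, which targets a Vertex AI endpoint. The connection name, endpoint URL, and column names are hypothetical placeholders, and the syntax reflects the remote model documentation as I understand it today, which may differ from the preview described in the talk.

```sql
-- Hypothetical connection, endpoint, and columns.
-- INPUT and OUTPUT describe the request and response schema of the endpoint.
CREATE OR REPLACE MODEL `mydataset.remote_sentiment_model`
INPUT (review_text STRING)
OUTPUT (score FLOAT64)
REMOTE WITH CONNECTION `us.my_connection`
OPTIONS (
  endpoint = 'https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/endpoints/1234567890'
);

-- Rows are sent to the endpoint and the predictions come back as columns.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.remote_sentiment_model`,
  (SELECT review_text FROM `mydataset.reviews`)
);
```

The model itself stays where it is deployed; BigQuery only holds a pointer to it plus the declared schema.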

play08:17

So all in all, inference

play08:20

with imported models and remote models

play08:22

will open up a whole range of other model types

play08:25

beyond the model types that BigQuery ML supports.

play08:28

Forecasting is a very important business problem.

play08:32

Customers want to predict--

play08:33

what is my future demand going to look

play08:35

like for a certain product?

play08:36

Or what are future revenues going

play08:37

to look like for a certain SKU?

play08:39

And BigQuery has natively supported

play08:42

ARIMA Plus, which is one of the most popular BigQuery ML

play08:45

models.

play08:46

And today, we are going to take that to the next level.

play08:49

Until now, we could only do forecasting with a single input

play08:52

variable.

play08:53

So if you have a revenue that you want to predict,

play08:55

you can only input the past revenue along

play08:58

with the timestamp.

play08:59

And then it will predict the future revenue.

play09:01

Now, a common request from customers

play09:04

is that-- hey, look, my revenue is impacted not just

play09:06

by past revenue but I have other factors that

play09:08

impact that revenue.

play09:09

How can I bring that in?

play09:10

And that is basically multivariate time series

play09:12

forecasting.

play09:14

You will have the ability to use ARIMA Plus

play09:17

XREG, which is ARIMA Plus with external regressors.

play09:20

So you can now provide additional features

play09:23

beyond just the target metric.

play09:25

Like in this case, Feature 1, 2, and 3

play09:26

can be inputted along with the target metric

play09:28

to give you a forecasted target metric.

play09:30

So a lot more robustness to your forecasting because

play09:34

of the external regressors.
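
A minimal sketch of multivariate forecasting with external regressors, assuming a daily sales table. The table and column names are hypothetical; the three regressor columns stand in for the "Feature 1, 2, and 3" on the slide.

```sql
-- Hypothetical tables and columns.
-- Every selected column other than the timestamp and the target
-- is treated as an external regressor.
CREATE OR REPLACE MODEL `mydataset.revenue_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS_XREG',
  time_series_timestamp_col = 'day',
  time_series_data_col = 'revenue'
) AS
SELECT day, revenue, ad_spend, avg_price, store_visits
FROM `mydataset.daily_sales`;

-- Forecasting needs the future values of the regressors as input.
SELECT *
FROM ML.FORECAST(
  MODEL `mydataset.revenue_forecast`,
  STRUCT(30 AS horizon, 0.9 AS confidence_level),
  (
    SELECT day, ad_spend, avg_price, store_visits
    FROM `mydataset.planned_inputs`
  )
);
```

The practical consequence is that the regressors must be known or planned for the forecast window (for example, budgeted ad spend), unlike univariate ARIMA Plus, which needs only the history of the target.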

play09:36

This is a very customer-centric feature.

play09:38

And I am so excited that the customers can now unlock more

play09:41

capability with forecasting.

play09:44

Feature engineering is at the heart

play09:46

of great machine learning.

play09:47

A lot of the time goes into feature engineering--

play09:49

taking the data and making it represented in a manner

play09:52

that the model training results in great models.

play09:55

And we already had fantastic support

play09:58

for various ML functions.

play09:59

And we are taking it to the next level today

play10:02

by introducing a host of new numerical and categorical

play10:05

functions here.

play10:06

Further, when you export BigQuery ML models,

play10:09

the transform clauses will also be exported along with them.

play10:13

So this is going to really simplify

play10:15

how you do online deployment with models that you've

play10:17

trained inside BigQuery.
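
To illustrate, a `TRANSFORM` clause with a few numerical and categorical preprocessing functions, followed by an export, could look like this. The talk does not list which functions are new, so the ones below are simply functions I am confident exist; table, column, and bucket names are hypothetical.

```sql
-- Hypothetical tables, columns, and bucket.
-- The TRANSFORM clause stores the preprocessing with the model, so the same
-- steps are applied automatically at prediction time.
CREATE OR REPLACE MODEL `mydataset.churn_model_fe`
TRANSFORM (
  ML.STANDARD_SCALER(monthly_spend) OVER () AS monthly_spend_scaled,   -- numerical
  ML.QUANTILE_BUCKETIZE(tenure_months, 4) OVER () AS tenure_bucket,    -- numerical to categorical
  ML.FEATURE_CROSS(STRUCT(plan_type, region)) AS plan_region,          -- categorical cross
  churned                                                              -- label passed through
)
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['churned']
) AS
SELECT monthly_spend, tenure_months, plan_type, region, churned
FROM `mydataset.customers`;

-- Export the trained model to Cloud Storage for deployment elsewhere.
EXPORT MODEL `mydataset.churn_model_fe`
OPTIONS (uri = 'gs://my-bucket/exported/churn_model_fe/');
```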

play10:19

So as you can see, a lot of powerful capabilities

play10:22

are coming your way.

play10:23

The first two features are already here

play10:25

and you're welcome to use right away.

play10:28

The rest of the features are either

play10:30

coming in Q4 this year or early next year.

play10:33

And we can't wait to get it in your hands

play10:35

and hear from you what you think about those.

play10:37

If you're interested in any of these,

play10:39

use the link presented at the bottom of this page

play10:42

to express your interest and we'll be in touch with you.

play10:45

Over to you, Candice, now, to show us

play10:46

a super cool demo of how unstructured data

play10:49

and structured data can come together

play10:51

to really dramatically change how marketing analytics

play10:56

is done.

play10:57

So over to you, Candice.

play10:59

CANDICE CHEN: Thank you, Abhinav.

play11:00

Today, I'd like to show you how Adswerve and Twiddy are

play11:04

able to run SQL on their images and create machine learning

play11:08

models by using BigQuery ML.

play11:11

So Adswerve is a tech service provider that bring AI to data.

play11:17

And Twiddy is one of the clients that they work with.

play11:19

It's a vacation rental company in the Outer Banks

play11:22

of North Carolina.

play11:24

So if you go to Twiddy's website today,

play11:26

you can find your dream vacation home.

play11:29

You can do a search like how many

play11:30

bedrooms that you want to have, how close it is to the beach,

play11:34

whether it has an ocean view or not.

play11:36

And then they will generate a list of recommended vacation

play11:40

homes based on your criteria that

play11:41

come with the beautiful images that you saw in this photo.

play11:46

So as of today, Twiddy actually faced three major challenges

play11:50

that prevented them from further improving the search results.

play11:54

The first one-- they are relying mainly

play11:56

on structured data, which is the search data that they get

play11:59

from their users on the website to help them

play12:02

generate which properties they recommend to their users.

play12:06

And secondly, their content team is merely

play12:09

using a manual process to select the photos and edit the photos.

play12:13

And then lastly, even if they want

play12:15

to utilize those images, to be honest,

play12:18

it's pretty challenging to analyze those images using

play12:21

machine learning model, given the complexity

play12:23

of the work and the requirements on the data science skills.

play12:27

So how Twiddy was able to tackle this challenge

play12:30

and go from using just structured data to be

play12:32

able to use both structured data and image

play12:35

data to predict the click-through rate

play12:37

of their rental properties.

play12:39

So now let's find out.

play12:40

Here, the steps that they took, starting with they

play12:43

had those images they already stored on GCS.

play12:46

So first step they took is to create an object table, which

play12:51

essentially points you to all of those house images

play12:53

that you stored on GCS.

play12:55

So this is actually a brand-new capability

play12:58

that we are announcing today, which we're super

play13:00

excited about, which means now you

play13:02

can access those unstructured data right inside BigQuery.

play13:06

And second step, they imported a vision model

play13:09

to help to generate image embeddings.

play13:11

What it does is to simply translate those images

play13:15

into a series of numbers.

play13:17

Then you can use them as inputs for your machine learning

play13:19

models.

play13:21

So now you have your image embeddings.

play13:22

Combine it with all of your business data from a website.

play13:26

You can start to build a machine learning model

play13:28

and then make some predictions.

play13:30

So all of this you can do inside using SQL in BigQuery.

play13:35

So let me walk you through step by step.

play13:37

So the first one, they create an external table, which

play13:41

essentially helps you to point to all of those images

play13:44

you have on GCS.

play13:45

As you can see in the output table on the right side,

play13:48

you can see each row represents an object stored on GCS.

play13:52

Now the table is created.

play13:54

You can just query the table just like any other table

play13:57

right inside BigQuery.
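
As a sketch of this first step, an object table over a Cloud Storage bucket can be declared as an external table with a connection. The bucket path, dataset, and connection name are hypothetical, and the syntax reflects the object table documentation as I understand it.

```sql
-- Hypothetical bucket, dataset, and connection.
-- object_metadata = 'SIMPLE' makes this an object table: one row per file.
CREATE OR REPLACE EXTERNAL TABLE `mydataset.house_images`
WITH CONNECTION `us.my_connection`
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://my-bucket/house-images/*']
);

-- Each row describes one object in the bucket.
SELECT uri, content_type, size, updated
FROM `mydataset.house_images`
LIMIT 10;
```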

play13:59

And the next step, they imported a vision model.

play14:02

In this case, they use a TensorFlow model

play14:05

called ImageNet to help to generate the image embeddings.

play14:10

So once you have the model ingested, you just

play14:13

call ML.PREDICT to generate the image embeddings.

play14:16

In this case, they even use principal component analysis

play14:20

to further reduce those embeddings down to 39 features.

play14:24

Then they can use for further analysis.
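
The embedding step might be sketched as follows. The model path is hypothetical, the 224x224 input size is an assumption about the vision model, and the alias `input` must match the model's actual input tensor name. The talk does not show the exact query Twiddy used.

```sql
-- Hypothetical model path and names.
-- Import a TensorFlow SavedModel that maps an image to an embedding vector.
CREATE OR REPLACE MODEL `mydataset.image_embedding_model`
OPTIONS (
  model_type = 'TENSORFLOW',
  model_path = 'gs://my-bucket/models/image_embedding/*'
);

-- Decode and resize each image from the object table, then run the model.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.image_embedding_model`,
  (
    SELECT
      uri,
      ML.RESIZE_IMAGE(ML.DECODE_IMAGE(data), 224, 224, FALSE) AS input
    FROM `mydataset.house_images`
  )
);
```

The reduction to 39 features mentioned above would be a further step on this output; it is left out here because the talk does not describe how it was set up.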

play14:27

And the next step.

play14:28

Now you have your image embeddings in one table,

play14:31

you can combine it together with your search data.

play14:34

Then you combine them into one single table.

play14:37

So I think that's where we think the magic happens.

play14:40

So you can have everything-- structured, unstructured data--

play14:43

all in one single place.

play14:45

You can view, analyze, and [INAUDIBLE] machine

play14:48

learning on top of that.

play14:49

In this case, Twiddy used a model called Wide and Deep

play14:53

for training and for prediction.

play14:56

Now we can finally use ML.PREDICT

play14:58

to predict the click-through rate of the rental property

play15:01

listings.

play15:02

So on the right side, as you can see,

play15:04

the number highlighted represents the probability

play15:07

for a user to click those rental properties for the Twiddy

play15:10

users.
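
A sketch of the training and prediction step, assuming a combined feature table with one row per listing and a boolean label `clicked`. All table and column names are hypothetical; `DNN_LINEAR_COMBINED_CLASSIFIER` is the BigQuery ML model type for Wide and Deep classification.

```sql
-- Hypothetical tables and columns.
-- listing_features holds the image-derived features joined with search data.
CREATE OR REPLACE MODEL `mydataset.ctr_model`
OPTIONS (
  model_type = 'DNN_LINEAR_COMBINED_CLASSIFIER',   -- Wide and Deep
  input_label_cols = ['clicked'],
  enable_global_explain = TRUE                     -- allows ML.GLOBAL_EXPLAIN later
) AS
SELECT * EXCEPT (listing_id)
FROM `mydataset.listing_features`;

-- predicted_clicked_probs holds the probability for each class.
SELECT
  listing_id,
  predicted_clicked,
  predicted_clicked_probs
FROM ML.PREDICT(
  MODEL `mydataset.ctr_model`,
  (SELECT * FROM `mydataset.new_listing_features`)
);
```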

play15:11

So now you wonder-- so how can we understand these numbers?

play15:15

And what Twiddy can do about it.

play15:17

So what they did was to use a function

play15:19

called ML.GLOBAL_EXPLAIN to help us understand

play15:22

all of those features that they use for the model.

play15:25

Which of them can help us explain the final prediction

play15:28

numbers?
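
Global feature attributions can be read with a query like the one below. The model name and the feature-name prefixes are hypothetical, the model must have been trained with `enable_global_explain = TRUE`, and the second query is only one plausible way to arrive at a share such as the one quoted next, not necessarily how Twiddy computed it.

```sql
-- Hypothetical model name and feature-name prefixes.

-- One row per feature with its overall attribution.
SELECT feature, attribution
FROM ML.GLOBAL_EXPLAIN(MODEL `mydataset.ctr_model`)
ORDER BY attribution DESC;

-- Share of total attribution that comes from image-derived features.
SELECT
  SUM(
    IF(
      STARTS_WITH(feature, 'principal_component')
        OR STARTS_WITH(feature, 'image_label'),
      attribution,
      0
    )
  ) / SUM(attribution) AS image_feature_share
FROM ML.GLOBAL_EXPLAIN(MODEL `mydataset.ctr_model`);
```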

play15:29

So you can see a whole list on the left side here.

play15:32

What we found that's really interesting is any feature

play15:35

that's related to the image--

play15:37

like principal component, which is essentially

play15:39

the image embedding together with the image label--

play15:42

help to represent 57% of the explanation reasons, which

play15:49

is really high.

play15:50

Much higher than Twiddy initially anticipated.

play15:53

And to also illustrate this further on the right side,

play15:57

you can see the average probability of a user

play16:00

to click those images whenever those image labels were

play16:04

featured in a specific photo.

play16:06

So you can see the four quadrants here.

play16:08

And the top two quadrants, you can

play16:10

see if you include some water--

play16:12

swimming pool, beach-- daytime, it's

play16:15

more likely to get more clicks from the users.

play16:17

So in the bottom quadrant, if you

play16:20

include elements like porch, stairway, it's

play16:23

probably not as impactful.

play16:24

So this is something Twiddy should probably do less of.

play16:28

So now with this insight, they are

play16:30

ready to take it to the next level.

play16:32

So for the content team, now they

play16:34

can translate those findings and make better decisions

play16:37

for their photo editing and selection.

play16:39

So in this case, it's exactly the same house

play16:42

in both two images.

play16:44

Initially, they used a nighttime photo.

play16:47

But as you remember, the result is telling us

play16:50

from the machine learning model, our users

play16:53

are more likely to click the daytime photos.

play16:55

So now they have replaced it with a photo on the right side

play16:58

here, as you see.

play17:00

So to summarize what we learned so far from Twiddy's use case.

play17:04

So with all of those great prediction results, now

play17:07

Twiddy is able to make a more informed decision

play17:10

to do the A/B testing to help them improve

play17:13

their click-through rate.

play17:14

The content team can take a more data-driven approach

play17:18

when it comes to which photo to select,

play17:21

how to edit those photos to better engage with their users.

play17:25

And all of this, you can just use

play17:27

SQL, which aligns really well with their business analyst

play17:31

skills.

play17:31

And then lastly, I want to share with you a quote from Adswerve.

play17:35

"Twiddy now has the capability to use advanced image analysis

play17:39

to stay competitive in an ever-changing landscape

play17:42

of vacation rental providers-- and can

play17:45

do this using their in-house SQL skills."

play17:47

I want to thank Adswerve and Twiddy for sharing

play17:51

your feedback with us.

play17:52

And looking forward to hearing more results from you.

play17:55

So for any of you who are interested to try out

play17:58

this feature or any BigQuery ML features

play18:00

that we shared with you today, please use this interest

play18:03

form to sign up.

play18:04

We look forward to working closely with you.

play18:06

And thanks, everyone, for joining.

play18:08

[MUSIC PLAYING]
