How to accelerate machine learning development with BigQuery ML
Summary
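TLDR: Abhinav Khushraj and Candice Chen discuss Google Cloud's BigQuery ML. Since its launch in 2018, it has been adopted by enterprises, startups, and individual data professionals. BigQuery ML continues to pursue its mission of being the easiest way to unlock powerful ML capabilities so that anyone can do data science. It simplifies data management, removes the need to manage infrastructure, and makes deployment easy. The session introduces new features: MLOps integration, use of unstructured data, import of models such as TensorFlow Lite and XGBoost, inference on remote models, improved forecasting, and enhanced feature engineering. In the Twiddy case study, combining unstructured and structured data helped improve search results and moved the content team toward data-driven decisions.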
Takeaways
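- 🚀 Since its launch in 2018, BigQuery ML has been widely adopted, with enterprises of all sizes, startups, and individual data professionals using it in innovative ways.
- 📈 BigQuery ML's mission is to be the easiest way to unlock powerful machine learning (ML) capabilities so that anyone can be a data scientist.
- 🔧 Traditional ML workflows involve many challenges, such as moving data, provisioning infrastructure, and finding people with the right skills; BigQuery ML addresses these.
- 💡 With BigQuery ML, you train a model with a simple SQL query that points at a dataset already stored in BigQuery.
- 📊 BigQuery ML draws on BigQuery's scale and compute capacity to handle data ranging from 1,000 rows to millions of rows.
- 🔗 MLOps is essential for running machine learning at enterprise scale, and BigQuery ML is tightly integrated with Vertex AI Model Registry.
- 🧠 BigQuery ML can bring in vision models to unlock the value of unstructured data such as images, text, and documents.
- 🔌 BigQuery ML can now import models trained on other platforms, including TensorFlow, TensorFlow Lite, and XGBoost.
- 📉 In addition to the univariate time series forecasting model ARIMA Plus, BigQuery ML now supports multivariate time series forecasting with external regressors.
- ⚙️ Feature engineering is at the heart of great machine learning, and BigQuery ML supports it with new numerical and categorical functions.
- 🌟 In the Twiddy case study, BigQuery ML was used to combine unstructured and structured data to predict click-through rate, changing how marketing analytics is done.
- 📝 With BigQuery ML, Twiddy's content team can take a data-driven approach to selecting and editing photos.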
Q & A
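When did BigQuery ML become available?
-BigQuery ML was launched in 2018.
What new features are planned for BigQuery ML?
-BigQuery ML plans to offer many new features, including MLOps integration, analysis of unstructured data, import of models such as TensorFlow Lite and XGBoost, inference on remote models, enhanced forecasting, and improved feature engineering.
What is MLOps, and why is it important?
-MLOps is the set of practices for operating machine learning at enterprise scale. When you train many models, you need to manage, compare, and monitor them, and MLOps covers that work.
What is the unstructured data analysis that BigQuery ML provides?
-Unstructured data analysis covers data such as images, text, and documents. BigQuery ML lets you bring in vision models, for example to identify objects in an image, so that this data can be used for model prediction.
How do you use externally trained models such as TensorFlow Lite or XGBoost in BigQuery ML?
-You import the model with CREATE OR REPLACE MODEL, pointing at the model, and then run inference with ML.PREDICT.
What is inference on remote models?
-Inference on remote models lets you run predictions from BigQuery ML against a model that is already deployed on another platform (for example, Vertex AI or Cloud Functions). This makes it possible to use model types that BigQuery ML does not support natively.
What is multivariate time series forecasting in BigQuery ML?
-Multivariate time series forecasting predicts a future value, such as revenue, using not only past revenue but also other factors that affect it. With ARIMA Plus XREG, you can supply additional features alongside the target metric.
What new feature engineering capabilities does BigQuery ML provide?
-BigQuery ML introduces new numerical and categorical functions, and transform clauses are now exported together with the model, which simplifies online deployment.
How did Twiddy use BigQuery ML to predict click-through rate?
-Twiddy created an object table pointing to house images stored in Google Cloud Storage, imported a vision model, and converted the images into numerical embeddings. It then combined these with its business data to build a machine learning model that predicts click-through rate.
How did Twiddy improve its photo selection and editing?
-Using the model's results, Twiddy learned which image features make users more likely to click, and its content team now takes a data-driven approach to selecting and editing photos.
What should you do if you are interested in the new BigQuery ML features?
-Use the link shown at the bottom of the page to open the interest form and sign up. The Google Cloud team will then get in touch.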
Outlines
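🚀 Introduction to BigQuery ML and its new features
Abhinav Khushraj and Candice Chen talk about BigQuery ML. Launched in 2018, it has served customers with a wide range of data needs. BigQuery ML is a service for building machine learning models and running predictions with SQL. The new features introduced are MLOps integration, Explainable AI, pipelines, use of unstructured data, import of external models such as TensorFlow Lite and XGBoost, inference on remote models, multivariate time series forecasting, and new feature engineering functions.
📈 The new BigQuery ML features in detail
The new features are explained in detail: the importance of MLOps and the integration with Vertex AI Model Registry, how to work with unstructured data, importing external models, remote model inference, multivariate time series forecasting, and enhanced feature engineering. These features are presented as ways to simplify data science and help solve business problems.
🏠 The Twiddy case study: using unstructured and structured data
Candice Chen presents the Twiddy case study. Twiddy is a vacation rental company that faced three challenges preventing it from improving the search results on its website. Using BigQuery ML, Twiddy combined images with structured data to build a model that predicts the click-through rate of its rental properties. The process consists of creating an object table that points to images stored on GCS, importing a vision model, and combining the image embeddings with business data. As a result, Twiddy can analyze images with a machine learning model, and its content team takes a data-driven approach.
🎯 Twiddy's results and what comes next for BigQuery ML
With the prediction results from BigQuery ML, Twiddy can make more informed decisions about A/B tests aimed at improving click-through rate, and its content team can select and edit photos to engage users better. A quote from Adswerve about Twiddy is shared. Anyone interested can sign up through the link to try the new features.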
Keywords
💡BigQuery ML
💡Data Integration
💡MLOps
💡Unstructured Data
💡Model Import
💡Prediction
💡Feature Engineering
💡Remote Model
💡ARIMA Plus
💡Visualization
💡SQL
Highlights
BigQuery ML was launched in 2018 and has seen great adoption across various customer sizes and data needs.
BigQuery ML aims to simplify the process of unlocking ML capabilities, enabling anyone to become a data scientist.
New powerful capabilities are being introduced to BigQuery ML to enhance its functionality.
BigQuery ML allows for training models directly within BigQuery using a simple SQL query.
The platform provides scalability and compute capabilities for models ranging from 1,000 to millions of rows.
BigQuery ML supports batch predictions on new incoming data using SELECT * FROM ML.PREDICT.
Users can perform ML workflows without moving data from BigQuery, addressing a significant challenge in AI.
BigQuery ML leverages Google's massive compute and storage infrastructure, reducing the need for users to manage infrastructure.
MLOps is crucial for enterprise-scale machine learning, and BigQuery ML is now integrated with Vertex AI Model Registry for model management.
BigQuery ML introduces over 20 ML operators for use within Vertex AI managed pipelines.
BigQuery ML now supports handling unstructured data such as images, texts, and documents.
Users can import custom TensorFlow-based vision models for object detection and other vision capabilities.
BigQuery ML will support importing models beyond TensorFlow, including TensorFlow Lite and XGBoost.
The platform allows inference on remote models deployed on Vertex AI or Cloud Functions.
BigQuery ML introduces multivariate time series forecasting with ARIMA Plus XREG for more robust predictions.
Feature engineering is enhanced with new numerical and categorical functions, and transform clauses are exported with models.
Twiddy, a vacation rental company, used BigQuery ML to combine structured and image data to predict click-through rates for their listings.
Twiddy's content team can now make data-driven decisions for photo editing and selection based on ML model insights.
BigQuery ML's SQL alignment allows business analysts to utilize their existing skills for advanced image analysis and ML tasks.
Transcripts
[MUSIC PLAYING]
ABHINAV KHUSHRAJ: Hi.
I'm Abhinav Khushraj and I'm a product lead for BigQuery ML.
CANDICE CHEN: Hello, everyone.
My name is Candice Chen.
I am the product manager for BigQuery at Google Cloud.
ABHINAV KHUSHRAJ: Today, we're going
to talk about BigQuery ML.
Since we launched BigQuery ML in 2018, we have come a long way.
Lots of great adoption from customers of all sizes,
with large enterprise customers with complex data
needs or small startups or young companies or even
individual data professionals.
And we've been able to see the new and innovative ways they've
been using this.
And it has just blown our mind.
Today's session, we're going to talk about all the new stuff
that we're going to bring to BigQuery ML--
all the new, powerful capabilities
that we're introducing to make it even more powerful.
It has always been our mission to make BigQuery ML the easiest
way to unlock powerful ML capabilities
so that anyone can be a data scientist.
And so first, I will start with a brief overview
of what is BigQuery ML.
And then show you all the cool goodies
that we're launching this year.
And then in the end, we'll see a wonderful customer demo
that Candice is going to show us where
unstructured data and structured data
comes together to solve real customer pain points.
So let's go ahead.
A typical ML workflow is where an organization
is spending a lot of time to bring
their data from transaction systems or traditional legacy
systems by doing ETL and investing
a lot of time and energy and doing that work so that they
have a single place to do analytics--
in this case, BigQuery.
And once you have it all there, it's great.
You can run all kinds of analytics and BI.
But the moment you want to do AI and ML--
oh boy, there are lots of challenges.
The first thing you have to do is take the data out.
After putting in all that effort,
now you have to take this data out,
which means you're losing data governance.
You'll have to build new data pipelines to build consistency
from the old or the new source.
It just becomes much harder.
Another problem is you have to train these models.
And you have to figure out what infrastructure
you need-- whether you need a large set of Kubernetes
environment and whatnot.
Or you have to hire skilled professionals--
people who know data science and Python and R and all of that.
So that's another problem-- infrastructure and skill set.
And finally, another infrastructure problem
is where do you deploy this model?
You have to deploy this model and have
the right scale and infrastructure
so it can deliver the final value at deployment time.
And these are the problems that BigQuery ML solves.
It makes it easy to manage data.
It makes it easy to not have to deal with infrastructure.
And it makes it easy to not have to deploy anywhere.
It can be done right inside BigQuery for batch prediction.
And the way it does it is this.
First step, we just come and train
a model by writing the simple SQL query CREATE OR REPLACE
MODEL.
And you point to the data set that's already inside BigQuery.
You've already done all the work to bring the data.
Why take it out again?
Train the model right inside BigQuery.
BigQuery has tremendous scale capability
and compute capability for storage
and compute that you put to work when you train this model.
So whether it's a data with 1,000 rows or millions of rows,
you don't have to worry about it.
BigQuery takes care of it.
And once the model training is done,
you can start doing prediction.
You can call SELECT * FROM ML.PREDICT and do prediction
on new data that's coming in to predict maybe forecasting
or maybe predict new segments for your customer, and so on.
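The train-then-predict flow just described can be sketched as two SQL statements. The Python below only assembles the statement text so it can be read and checked without a BigQuery project; the dataset, model, table, and label names are hypothetical, and logistic regression is an assumed model type chosen for illustration (the talk does not name one here).

```python
# Minimal sketch: build the two BigQuery ML statements described in the talk.
# Every name passed in below (dataset, model, tables, label column) is hypothetical.


def train_sql(model, train_table, label):
    """Build a CREATE OR REPLACE MODEL statement for a logistic regression."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{train_table}`"
    )


def predict_sql(model, new_table):
    """Build a batch prediction query over a table of new rows."""
    return f"SELECT *\nFROM ML.PREDICT(MODEL `{model}`, TABLE `{new_table}`)"


if __name__ == "__main__":
    print(train_sql("mydataset.churn_model", "mydataset.customers", "churned"))
    print(predict_sql("mydataset.churn_model", "mydataset.new_customers"))
```

The printed statements could be pasted into the BigQuery console or submitted through a client library; both training and prediction then run inside BigQuery.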
And so, in summary, what BigQuery ML really
brings as value to the table is you do not have to go and look
for real advanced ML skills, like Python and R.
SQL is a much more widely known language,
and users can use that to do ML.
Second, you can execute all your ML workflows
without moving data from BigQuery.
Moving data is the biggest problem.
Every AI problem becomes a data problem.
And so by having to do it inside BigQuery, inside the database,
it really takes away a big barrier to ML development
and ML adoption.
And finally, infrastructure.
BigQuery takes advantage of the massive scale of Google's
compute infrastructure and storage infrastructure
to be able to not have to think about what infrastructure you
need to do ML.
And that's powerful.
Now, this is stuff that we've been doing since 2018.
And so now we're going to launch a whole bunch
of new capabilities that will just blow your mind.
So let's start with MLOps.
MLOps is critical for doing machine
learning at enterprise scale.
If you do not have MLOps, you're training lots of models.
You need to manage models.
You need to compare models.
How do you do all of that?
And BigQuery ML is now tightly integrated with Vertex AI Model
Registry, so any model that you create
in BigQuery ML can be registered and monitored
using Vertex AI Model Registry.
Once you have the models available in Vertex AI,
you can either deploy one quick deployment right inside Vertex
AI to its online deployment endpoints
or you can do comparison and evaluation.
Or you can even do down the road monitoring
and a bunch of other capabilities
that Vertex AI provides out of the box.
And not only that, you will also have the ability
to do Explainable AI for models that
were built with XAI capability.
Another important piece related to operationalizing
is the ability to use pipelines. BigQuery ML has introduced
more than 20 ML operators that will enable
you to use those operators inside Vertex AI managed
pipelines.
Vertex AI managed pipelines are KFP-based, Kubeflow-based
pipelines that can be used by organizations and professionals
to make ML easier to manage.
And with BigQuery ML, you can now
do that right inside Vertex.
And another cool thing about it is
if you are a Python aficionado and not a SQL person,
you can now use BigQuery ML operators
to create your BigQuery ML train and predict using BigQuery ML.
So this is, again, going to open up more ways users can use
and a tighter story between Vertex AI and BigQuery ML.
Insights from unstructured data.
This is the big news for us today.
So far, BigQuery has always been a leader
in managing structured and semi-structured data.
Earlier today, Thomas Green announced
that BigQuery can now handle unstructured data,
such as images, texts, docs--
you name it.
And what's the point of having all of that data if you cannot
do something with it?
So BigQuery ML will now help you unlock
that value for the unstructured data
that BigQuery will help you manage.
You can do model predictions with vision
to figure out objects in an image
by bringing in your own custom, TensorFlow-based vision models
to do object detection and a slew of other vision
kind of capability.
Down the road, we intend to unlock other kinds
of capability beyond vision.
And we look forward to making that available.
But today, Candice is also going to show
you a demo of this particular feature.
Now, BigQuery ML already has support
for many, many types of models.
But a popular request from our customers
is, hey, how can I do inference on models that I've already
trained?
I have a legacy system.
I want to train using that.
Or I built a model in another platform
and I'd like to bring that in here.
How can I do that?
And so BigQuery ML will now be able to import models
beyond TensorFlow.
We already support TensorFlow import.
And now we'll take it to TensorFlow Lite and XGBoost.
By typing a simple CREATE OR REPLACE
MODEL and point to the model.
And you can bring the model in.
Then you can do the ML.PREDICT, just like you always do.
And all the compute is being done inside BigQuery.
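As a hedged sketch of the import step described above: an imported model is declared with CREATE OR REPLACE MODEL plus a model type and a Cloud Storage path. The helper below assembles that statement; the model name and `gs://` path are hypothetical. Depending on the model file, an XGBoost import may additionally need its input and output columns declared, which this sketch leaves out.

```python
# Minimal sketch: build an import statement for a model trained elsewhere.
# The model name and Cloud Storage path used in the example are hypothetical.

IMPORTABLE_TYPES = {"TENSORFLOW", "TENSORFLOW_LITE", "XGBOOST"}


def import_model_sql(model, model_type, model_path):
    """Build a CREATE OR REPLACE MODEL statement that imports a saved model."""
    if model_type not in IMPORTABLE_TYPES:
        raise ValueError(f"unsupported model type in this sketch: {model_type}")
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', model_path = '{model_path}')"
    )


if __name__ == "__main__":
    print(import_model_sql("mydataset.imported_xgb", "XGBOOST",
                           "gs://my-bucket/models/model.bst"))
```

Once imported, the model is queried with ML.PREDICT in the same way as a model trained in BigQuery.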
Alternatively, if you've got a model that
is too big or too large or it is already
deployed at an end point and you do not have the ability
to bring that into BigQuery, BigQuery ML
will now have the ability to do inference on remote models.
So if you have a model that is--
you can deploy a model on Vertex AI or Cloud Functions.
And then come to BigQuery and just point to it
by saying CREATE OR REPLACE MODEL and point to that model
endpoint.
And then when you do ML.PREDICT, the prediction
is going to take the data from BigQuery,
take it all the way to the end point, do the prediction,
bring the results back.
You can also do evaluation of your model
with that same remote model capability.
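A remote model can be sketched as a CREATE OR REPLACE MODEL statement that declares the input and output columns, names a BigQuery connection, and points at the endpoint. The statement shape below follows the remote model syntax as I understand it; the feature was still upcoming at the time of the talk, so details may differ. The connection name, endpoint URL, and column names are all hypothetical, and only a Vertex AI endpoint is shown.

```python
# Minimal sketch: build a remote model declaration pointing at a deployed endpoint.
# Connection, endpoint URL, and column names are hypothetical placeholders.


def remote_model_sql(model, connection, endpoint, inputs, outputs):
    """inputs and outputs are lists of (column_name, bigquery_type) pairs."""

    def cols(pairs):
        return ", ".join(f"{name} {sql_type}" for name, sql_type in pairs)

    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"INPUT ({cols(inputs)})\n"
        f"OUTPUT ({cols(outputs)})\n"
        f"REMOTE WITH CONNECTION `{connection}`\n"
        f"OPTIONS (endpoint = '{endpoint}')"
    )


if __name__ == "__main__":
    print(remote_model_sql(
        "mydataset.remote_scorer",
        "my-project.us.my_connection",
        "https://us-central1-aiplatform.googleapis.com/v1/projects/my-project"
        "/locations/us-central1/endpoints/1234567890",
        inputs=[("bedrooms", "INT64"), ("nightly_price", "FLOAT64")],
        outputs=[("score", "FLOAT64")],
    ))
```

ML.PREDICT against such a model sends the rows to the endpoint and returns the results to BigQuery.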
So all in all, inference with imported models
and with remote models
will open up a whole range of other model types
beyond the model types that BigQuery ML supports.
Forecasting is a very important business problem.
Customers want to predict--
what is my future demand going to look
like for a certain product?
Or what are future revenues going
to look like for a certain SKU?
And BigQuery has natively supported
ARIMA Plus, which is one of the most popular BigQuery ML
models.
And today, we are going to take that to the next level.
Until now, we could only do forecasting with a single input
variable.
So if you have a revenue that you want to predict,
you can only input the past revenue along
with the timestamp.
And then it will predict the future revenue.
Now, a common request from customers
is that-- hey, look, my revenue is impacted not just
by past revenue but I have other factors that
impact that revenue.
How can I bring that in?
And that is basically multivariate time series
forecasting.
You will have the ability to use ARIMA Plus
XREG, which is ARIMA Plus with external regressors.
So you can now provide additional features
beyond just the target metric.
Like in this case, Feature 1, 2, and 3
can be inputted along with the target metric
to give you a forecasted target metric.
So a lot more robustness to your forecasting because
of the external regressors.
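The multivariate setup described above can be sketched as follows: the training query selects the timestamp, the target metric, and the extra feature columns, and the forecast query supplies future values of those features. The table and column names are hypothetical; `ARIMA_PLUS_XREG` is the model type name BigQuery ML uses for ARIMA Plus with external regressors.

```python
# Minimal sketch: build training and forecasting statements for ARIMA_PLUS_XREG.
# Table and column names in the example are hypothetical.


def xreg_train_sql(model, table, timestamp_col, target_col, feature_cols):
    """Train on the timestamp, the target metric, and the external regressors."""
    columns = ", ".join([timestamp_col, target_col, *feature_cols])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'ARIMA_PLUS_XREG',\n"
        f"  time_series_timestamp_col = '{timestamp_col}',\n"
        f"  time_series_data_col = '{target_col}'\n"
        ") AS\n"
        f"SELECT {columns}\n"
        f"FROM `{table}`"
    )


def xreg_forecast_sql(model, future_table, horizon):
    """Forecast `horizon` steps ahead; future_table holds future feature values."""
    return (
        "SELECT *\n"
        f"FROM ML.FORECAST(MODEL `{model}`, "
        f"STRUCT({int(horizon)} AS horizon), TABLE `{future_table}`)"
    )


if __name__ == "__main__":
    print(xreg_train_sql("mydataset.revenue_forecast", "mydataset.daily_revenue",
                         "day", "revenue", ["feature_1", "feature_2", "feature_3"]))
    print(xreg_forecast_sql("mydataset.revenue_forecast",
                            "mydataset.future_features", 30))
```

Unlike the univariate case, the forecast step needs a table of future feature values, because the regressors are inputs to the prediction.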
This is a very customer-centric feature.
And I am so excited that the customers can now unlock more
capability with forecasting.
Feature engineering is at the heart
of great machine learning.
A lot of the time goes into feature engineering--
taking the data and making it represented in a manner
that the model training results in great models.
And we already had fantastic support
for various ML functions.
And we are taking it to the next level today
by introducing a host of new numerical and categorical
functions here.
Further, when you export BigQuery ML models,
the transform clauses will also be exported along with them.
So this is going to really simplify
how you do online deployment with models that you've
trained inside BigQuery.
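To illustrate what a transform clause looks like, here is a sketch that embeds preprocessing in the model definition. The talk does not name the new functions, so this uses two existing BigQuery ML preprocessing functions, `ML.STANDARD_SCALER` and `ML.QUANTILE_BUCKETIZE`, as stand-ins; the table, column names, and model type are hypothetical.

```python
# Minimal sketch: a model definition whose TRANSFORM clause carries the
# feature engineering. Table and column names are hypothetical.


def transform_model_sql(model, table, label):
    """Build a CREATE OR REPLACE MODEL statement with a TRANSFORM clause."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "TRANSFORM (\n"
        "  ML.STANDARD_SCALER(nightly_price) OVER () AS nightly_price_scaled,\n"
        "  ML.QUANTILE_BUCKETIZE(distance_to_beach_m, 4) OVER () AS distance_bucket,\n"
        f"  {label}\n"
        ")\n"
        f"OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['{label}']) AS\n"
        f"SELECT nightly_price, distance_to_beach_m, {label}\n"
        f"FROM `{table}`"
    )


if __name__ == "__main__":
    print(transform_model_sql("mydataset.ctr_model", "mydataset.listings", "clicked"))
```

Because the transforms are part of the model, the same preprocessing is applied at prediction time, which is why exporting them with the model simplifies online deployment.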
So as you can see, a lot of powerful capabilities
are coming your way.
The first two features are already here
and you're welcome to use right away.
The rest of the features are either
coming in Q4 this year or early next year.
And we can't wait to get it in your hands
and hear from you what you think about those.
If you're interested in any of these,
use the link presented at the bottom of this page
to express your interest and we'll be in touch with you.
Over to you, Candice, now, to show us
a super cool demo of how unstructured data
and structured data can come together
to really dramatically change how marketing analytics
is done.
So over to you, Candice.
CANDICE CHEN: Thank you, Abhinav.
Today, I'd like to show you how Adswerve and Twiddy are
able to run SQL on their images and create machine learning
models by using BigQuery ML.
So Adswerve is a tech service provider that brings AI to data.
And Twiddy is one of the clients that they work with.
It's a vacation rental company in the Outer Banks
of North Carolina.
So if you go to Twiddy's website today,
you can find your dream vacation home.
You can do a search like how many
bedrooms that you want to have, how close it is to the beach,
whether it has an ocean view or not.
And then they will generate a list of recommended vacation
homes based on your criteria that
come with the beautiful images that you saw in this photo.
So as of today, Twiddy actually faced three major challenges
that prevented them from further improving the search results.
The first one-- they are relying mainly
on structured data, which is the search data that they get
from their users on the website to help them
generate which properties they recommend to their users.
And secondly, their content team is merely
using a manual process to select the photos and edit the photos.
And then lastly, even if they want
to utilize those images, to be honest,
it's pretty challenging to analyze those images using
machine learning model, given the complexity
of the work and the requirements on the data science skills.
So how was Twiddy able to tackle this challenge
and go from using just structured data to being
able to use both structured data and image
data to predict the click-through rate
of their rental properties?
So now let's find out.
Here, the steps that they took, starting with they
had those images they already stored on GCS.
So first step they took is to create an object table, which
essentially points you to all of those house images
that you stored on GCS.
So this is actually a brand-new capability
that we are announcing today, which we're super
excited about, which means now you
can access those unstructured data right inside BigQuery.
And second step, they imported a vision model
to help to generate image embeddings.
What it does is to simply translate those images
into a series of numbers.
Then you can use them as inputs for your machine learning
models.
So now you have your image embeddings.
Combine it with all of your business data from a website.
You can start to build a machine learning model
and then make some predictions.
So all of this you can do inside using SQL in BigQuery.
So let me walk you through step by step.
So the first one, they create an external table, which
essentially helps you to point to all of those images
you have on GCS.
As you can see in the output table on the right side,
you can see each row represents an object stored on GCS.
Now the table is created.
You can just query the table just like any other tables
right inside BigQuery.
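The object table step can be sketched as a CREATE EXTERNAL TABLE statement over a Cloud Storage path. The table name, connection, and bucket path below are hypothetical; the demo's actual names are not given in the talk.

```python
# Minimal sketch: declare an object table over images stored on GCS.
# Table name, connection, and bucket path are hypothetical.


def object_table_sql(table, connection, uri_pattern):
    """Build a CREATE EXTERNAL TABLE statement where each row is one GCS object."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  object_metadata = 'SIMPLE',\n"
        f"  uris = ['{uri_pattern}']\n"
        ")"
    )


if __name__ == "__main__":
    print(object_table_sql("mydataset.house_images",
                           "my-project.us.my_connection",
                           "gs://my-bucket/houses/*"))
```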
And the next step, they imported a vision model.
In this case, they use a TensorFlow model
called ImageNet to help to generate the image embeddings.
So once you have the model imported, you just use
ML.PREDICT to generate the image embeddings.
In this case, they even use principal component analysis
to further reduce those embeddings down to 39 features.
Then they can use for further analysis.
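The embedding and dimensionality-reduction steps might look like the sketch below. It assumes an already imported vision model and an object table, both with hypothetical names. The alias `image` must match the imported model's input name, and many vision models also require resizing, neither of which the talk specifies. The PCA statement assumes the embeddings are available as numeric columns in a hypothetical table.

```python
# Minimal sketch: generate image embeddings, then reduce them with PCA.
# Model and table names are hypothetical; the input alias `image` is an assumption.


def embeddings_sql(vision_model, object_table):
    """Run the imported vision model over decoded images from the object table."""
    return (
        "SELECT *\n"
        "FROM ML.PREDICT(\n"
        f"  MODEL `{vision_model}`,\n"
        f"  (SELECT uri, ML.DECODE_IMAGE(data) AS image FROM `{object_table}`)\n"
        ")"
    )


def pca_sql(pca_model, embeddings_table, components=39):
    """Train a PCA model that keeps `components` principal components."""
    return (
        f"CREATE OR REPLACE MODEL `{pca_model}`\n"
        f"OPTIONS (model_type = 'PCA', num_principal_components = {int(components)}) AS\n"
        f"SELECT * FROM `{embeddings_table}`"
    )


if __name__ == "__main__":
    print(embeddings_sql("mydataset.vision_model", "mydataset.house_images"))
    print(pca_sql("mydataset.embedding_pca", "mydataset.image_embeddings"))
```

The default of 39 components mirrors the number mentioned in the demo.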
And the next step.
Now you have your image embeddings in one table,
you can combine it together with your search data.
Then you combine them into one single table.
So I think that's where we think the magic happens.
So you can have everything-- structured, unstructured data--
all in one single place.
You can view, analyze, and [INAUDIBLE] machine
learning on top of that.
In this case, Twiddy used a model called Wide and Deep
for training and for prediction.
Now we can finally use ML.PREDICT
to predict the click-through rate of the rental property
listings.
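A sketch of the training and prediction step: in BigQuery ML, Wide and Deep classifiers are created with the model type `DNN_LINEAR_COMBINED_CLASSIFIER`. The table names and the `clicked` label are hypothetical, and `enable_global_explain` is turned on here as an assumption so that global explanations can be requested afterward.

```python
# Minimal sketch: train a Wide and Deep classifier on the combined table and
# predict click probability. Table names and the label column are hypothetical.


def ctr_model_sql(model, training_table, label):
    """Build the training statement for a Wide and Deep classifier."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'DNN_LINEAR_COMBINED_CLASSIFIER',\n"
        f"  input_label_cols = ['{label}'],\n"
        "  enable_global_explain = TRUE\n"
        ") AS\n"
        f"SELECT * FROM `{training_table}`"
    )


def ctr_predict_sql(model, listings_table):
    """Build the batch prediction query over the listings to score."""
    return f"SELECT *\nFROM ML.PREDICT(MODEL `{model}`, TABLE `{listings_table}`)"


if __name__ == "__main__":
    print(ctr_model_sql("mydataset.ctr_model", "mydataset.search_with_images", "clicked"))
    print(ctr_predict_sql("mydataset.ctr_model", "mydataset.current_listings"))
```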
So on the right side, as you can see,
the number highlighted represents the probability
for a user to click those rental properties for the Twiddy
users.
So now you wonder-- so how can we understand these numbers?
And what Twiddy can do about it.
So what they did was to use a function
called global explain to help us understand
all of those features that they use for the model.
Which of them can help us explain the final prediction
numbers?
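The global explanation step can be sketched as an `ML.GLOBAL_EXPLAIN` query, which returns one attribution per feature, followed by a small calculation of how much of the total attribution comes from image-related features. The model name, feature names, and attribution values below are made up for illustration and are not Twiddy's results.

```python
# Minimal sketch: query global feature attributions, then compute the share
# that comes from image-related features. All names and numbers are made up.


def global_explain_sql(model):
    """Build the query that returns (feature, attribution) rows for a model."""
    return f"SELECT *\nFROM ML.GLOBAL_EXPLAIN(MODEL `{model}`)"


def attribution_share(rows, is_image_feature):
    """rows: iterable of (feature, attribution) pairs.

    Returns the fraction of total attribution carried by image features.
    """
    rows = list(rows)
    total = sum(attribution for _, attribution in rows)
    if total == 0:
        raise ValueError("total attribution is zero")
    image = sum(a for feature, a in rows if is_image_feature(feature))
    return image / total


if __name__ == "__main__":
    print(global_explain_sql("mydataset.ctr_model"))
    made_up_rows = [("principal_component_1", 2.0), ("image_label", 1.0), ("bedrooms", 3.0)]
    share = attribution_share(
        made_up_rows,
        lambda f: f.startswith("principal_component") or f == "image_label",
    )
    print(f"image-related share: {share:.0%}")
```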
So you can see a whole list on the left side here.
What we found that's really interesting is any feature
that's related to the image--
like principal component, which is essentially
the image embedding together with the image label--
help to represent 57% of the explanation reasons, which
is really high.
Much higher than Twiddy initially anticipated.
And to also illustrate this further on the right side,
you can see the average probability of a user
to click those images whenever those image labels were
featured in a specific photo.
So you can see the four quadrants here.
And the top two quadrants, you can
see if you include some water--
swimming pool, beach-- daytime, it's
more likely to get more clicks from the users.
So in the bottom quadrant, if you
include elements like porch, stairway, it's
probably not as impactful.
So this is something Twiddy should probably do less of.
So now with this insight, they are
ready to take it to the next level.
So for the content team, now they
can translate those findings and make better decisions
for their photo editing and selection.
So in this case, it's exactly the same house
in both images.
Initially, they used a nighttime photo.
But as you remember, the result is telling us
from the machine learning model, our users
are more likely to click the daytime photos.
So now they have replaced it with a photo on the right side
here, as you see.
So to summarize what we learned so far from Twiddy's use case.
So with all of those great prediction results, now
Twiddy is able to make a more informed decision
to do the A/B testing to help them improve
their click-through rate.
The content team can take a more data-driven approach
when it comes to which photo to select,
how to edit those photos to better engage with their users.
And all of this, you can just use
SQL, which aligns really well with their business analyst
skills.
And then lastly, I want to share with you a quote from Adswerve.
"Twiddy now has the capability to use advanced image analysis
to stay competitive in an ever-changing landscape
of vacation rental providers-- and can
do this using their in-house SQL skills."
I want to thank Adswerve and Twiddy for sharing
your feedback with us.
And looking forward to hearing more results from you.
So for any of you who are interested to try out
this feature or any BigQuery ML features
that we shared with you today, please use this interest
form to sign up.
We look forward to working closely with you.
And thanks, everyone, for joining.
[MUSIC PLAYING]